[Issue] BigInt serialization

Serializing a bigint into a DataView currently requires many bit shifts, which turn the operation into O(n^2). The only way to avoid this is to call .toString(16) and then re-parse the string byte by byte. This feels like the wrong way to do it, and it certainly does not come without overhead.

Solutions

BigInt.prototype.toArray(little? = false, array?)

Returns a Uint8Array representing the bytes that make up the BigInt, with an additional explicit leading 0x00 byte for positive integers and 0xFF for negative ones where needed (i.e. 128 would be represented as [0, 128] to disambiguate it from -128, which would be [255, 128]; numbers like 64 or -64 do not need this disambiguation)
little writes the bytes in reverse order, i.e. least-significant byte first (little-endian)
If array is specified then it will be filled and returned instead, provided it is big enough (a RangeError is thrown otherwise)

BigInt.fromArray(array: TypedArray, little? = false, offset?, length?)

Same as above, but constructs a BigInt from an array.
little reads the bytes in reverse order, i.e. little-endian

BigInt.prototype.magnitude(): number

Returns the magnitude of this BigInt in bits
The result gives a hint as to the size of the array that would be returned by toArray(): for a magnitude M, the returned array has a byte length of (M+7)>>3. (Consequently, the magnitude includes 1 extra bit for the sign, so 1n.magnitude() === 2)
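
For illustration, here is how the proposed API might be used; this is only a sketch of the intended semantics described above, not an existing built-in:

```js
const n = -128n;

const bytes = n.toArray();                    // Uint8Array [255, 128]: big-endian, sign byte included
const roundTripped = BigInt.fromArray(bytes); // -128n again

n.magnitude();                                // 9 (8 magnitude bits + 1 sign bit)
// so toArray() needs (9 + 7) >> 3 === 2 bytes

const out = new Uint8Array(2);
n.toArray(false, out);                        // fills and returns out instead of allocating
```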

Let me know if I am missing anything!

4 Likes

Uint8Array.setFromHex is coming, and should be much faster than parsing the string yourself, but yes it's still slower than writing the BigInt directly would be because of the intermediate string.

For BigInt values less than 2**64 there's DataView.prototype.setBigUint64, though that doesn't cover larger values without additional work.

Between the two of those, however, I don't think the remaining use case comes up enough to be worth solving. Writing the last 64 bits and then shifting over by 64 bits and repeating should be fairly efficient. I don't know why you think that it's O(n^2).
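
For concreteness, a minimal sketch of that "write the low 64 bits, shift, repeat" approach (little-endian output, non-negative values only; writeBigIntNaive is just an illustrative name):

```js
// Writes `value` into `view` as consecutive little-endian 64-bit words.
// Assumes value >= 0n and that view is large enough.
function writeBigIntNaive(view, value) {
  let offset = 0;
  while (value > 0n) {
    view.setBigUint64(offset, BigInt.asUintN(64, value), true); // low 64 bits
    value >>= 64n; // this shift copies the whole remaining bigint each iteration
    offset += 8;
  }
  return offset; // number of bytes written
}
```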

3 Likes

A bigint with 64,000 bits will require 1,000 shifts, but each of those shifts copies up to 64,000 bits; that's how it's O(n^2).
64,000 bits would require copying roughly 32 Mbit of memory
64 million bits would require copying roughly 32 Tbit of memory

2 Likes

I'm curious, are there really applications that use bigints that large?

You could also divide and conquer the outer loop, to achieve O(n log n). Instead of shifting 64M-bit numbers 64 bits at a time, chop the number into 16M-bit chunks, and each of those into 256k-bit chunks, and each of those into 4k-bit chunks. Then instead of doing 1M shifts of a ~32M-bit number, you'll only do 4 of those plus 256 shifts of an 8M-bit number plus 16k shifts of a ~128k-bit number plus 1M shifts of a ~2k-bit number.
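
A rough sketch of that divide-and-conquer idea (little-endian output, non-negative value, bit count a multiple of 64; writeBigIntChunked is just an illustrative name):

```js
// Splits the value roughly in half by bit count and recurses, so each level of
// recursion copies every bit at most once: O(n log n) overall instead of O(n^2).
function writeBigIntChunked(view, value, offset, bits) {
  if (bits <= 64) {
    view.setBigUint64(offset, BigInt.asUintN(64, value), true);
    return;
  }
  const lowBits = 64 * Math.floor(bits / 128); // word-aligned split point near the middle
  const low = BigInt.asUintN(lowBits, value);  // low half: one copy of ~bits/2
  const high = value >> BigInt(lowBits);       // high half: one copy of ~bits/2
  writeBigIntChunked(view, low, offset, lowBits);
  writeBigIntChunked(view, high, offset + lowBits / 8, bits - lowBits);
}
```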

2 Likes

in ungap structured-clone, bigint is serialized as easily as bigint.toString(): structured-clone/esm/serialize.js at 68e41b1f898343db1bd6b681e947e08cb7500299 · ungap/structured-clone · GitHub

AFAIK a bigint can also exceed Array boundaries (Arrays are bound to the Int32 world), so ...

  • why do you think Array is the right primitive to represent BigInt
  • if that's not about arrays but typed arrays, why wouldn't the method be called toBuffer, or something that doesn't involve arrays? Would toUint8Array and fromUint8Array be better? ... or toView and fromView?
  • why don't you pass a specialized instance that carries just the string representation?

curious about what serialize means here, happy to read answers :wave:

It's not about arrays or typed arrays; both arrays and strings are a higher-level view of raw binary data. Uint8Arrays are a relatively neutral way of representing binary data (1 element = 1 byte). In reality, an application may choose to encode the number in its own format (e.g. LEB128 or VLQ), but providing one universal method for efficiently getting the bytes of a BigInt is much better than providing twenty, one for each commonly used format

1 Like

should "little" be true by default?
Internally implementations stores the value it in the array of 64 bit integers. each 64bit integer is in little-endian byte order. Well, on x86 there is BSWAP instruction, which can reorder bytes quickly.

Perhaps toArray should also accept an offset; then again, new Uint8Array(buffer, offset, length) can be used instead. Same for fromArray

1 Like

Accepting an offset for toArray makes sense
However, on your first point, consider that in any application that seeks performance, the second parameter would be supplied and therefore implicitly the first. Consider also that while the 64-bit words would be stored little-endian on most platforms, the array as a whole may not necessarily be little-endian (each u64 might be stored lowest-byte-to-highest-byte while the whole number is stored highest-u64-to-lowest-u64), in which case defaulting little to true would not make much sense. Having little default to false also matches DataView's methods

O(n log n) with lots of overhead and recursion is still a lot worse than an O(n) memcpy. In addition, how would you know the magnitude of the number in order to pick the best number of steps?

2 Likes

You can always replace recursion with a loop and a stack. Yes, it's worse than O(n); my point was that you don't have to suffer O(n^2) to do it right now.

You can find that in O(n log n) as well. (Too bad BigInt doesn't have .bit_length or .log2.)
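
For example, a bit length can be computed in userland in roughly that complexity with an exponential probe followed by a binary search over shift amounts; each shift is O(n) and only O(log n) of them are needed (bitLength here is a hypothetical helper, not a built-in):

```js
function bitLength(value) { // value assumed >= 0n
  if (value === 0n) return 0;
  let low = 0n, high = 1n;
  while (value >> high !== 0n) { // exponential probe for an upper bound
    low = high;
    high <<= 1n;
  }
  while (high - low > 1n) {      // binary search: smallest shift that yields 0n
    const mid = (low + high) >> 1n;
    if (value >> mid !== 0n) low = mid;
    else high = mid;
  }
  return Number(high);
}
```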

2 Likes

You can accomplish reading or writing only part of the array by creating a subarray:

BigInt.fromArray(array.subarray(offset, offset + length))

bigint.toArray(false, array.subarray(offset, offset + length))

BigInt.prototype.magnitude()
It looks like it is the same as bitLength(abs(bigint)) + 1. So maybe BigInt.bitLength or BigInt.prototype.bitLength should be provided instead.

There are two main applications for arithmetic on integers that large:

  • Cryptography commonly sees 4096-bit primes and sometimes larger, though implementations usually use byte buffers directly.
  • Statistical analysis deals with a lot of operations where numbers grow exponentially if not even faster (like factorially: n choose r = n! / (r! (n-r)!)), and the result of even seemingly innocent intermediate operations (like 10000! ≈ 10^35659.454... ≈ 2^118458.134...) can quickly blow past that. These sometimes show up in gambling contexts, but most often in applied scientific computing (such as in urban planning and financial analysis) where accuracy can't be sacrificed (usually for legal reasons).

From what I've read elsewhere, some of the first people to move to bigints in the first place were gambling and financial companies.

Also, there's precedent for having such a "to byte array" method on arbitrary-precision integers: Java's BigInteger.toByteArray(), Python's int.to_bytes(), and C#'s BigInteger.ToByteArray(), to name a few.

3 Likes

Conversion of bigints to and from byte arrays is useful for polynomial multiplication using Kronecker Substitution.

Imagine we have the following polynomials:
A(x) = 6x^3 + 3x^2 + 2x + 1
B(x) = 3x^3 + 5x^2 + 3x + 2
They can be encoded as integers by evaluating at x = 1000:
A(1000) = 6003002001
B(1000) = 3005003002
The product is:
6003002001n * 3005003002n === 18039039034017007002n
Now we can decode the result from the integer by splitting into groups of 3 digits:
A(x)*B(x) = 18x^6 + 39x^5 + 39x^4 + 34x^3 + 17x^2 + 7x + 2
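
The worked example above, sketched in code with base 1000 so each coefficient occupies one 3-digit group (this only works while coefficients stay below 1000):

```js
const base = 1000n;
const encode = (coeffs) =>
  coeffs.reduce((acc, c, i) => acc + c * base ** BigInt(i), 0n);

const A = [1n, 2n, 3n, 6n]; // 6x^3 + 3x^2 + 2x + 1, constant term first
const B = [2n, 3n, 5n, 3n]; // 3x^3 + 5x^2 + 3x + 2

let product = encode(A) * encode(B); // 6003002001n * 3005003002n

const coeffs = [];
while (product > 0n) { // decode by splitting into groups of 3 digits
  coeffs.push(product % base);
  product /= base;
}
// coeffs: [2n, 7n, 17n, 34n, 39n, 39n, 18n] -> 18x^6 + 39x^5 + ... + 7x + 2
```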

This polynomial multiplication can benefit from fast integer multiplication.

In practice, a binary representation is used, with two's complement for negative coefficients. Encoding and decoding take about 2/3 of the total time for small-integer polynomials when done with recursion.

2 Likes

Is there any further discussion about this? It seems like a relatively useful addition for using BigInt uniformly with ArrayBuffer and the like, including wasm.

Previously, https://es.discourse.group/t/bigint-enhancements discussed nearly the same issue and it wasn't resolved. Is there a blocking issue there?

Unfortunately I did not find any formal proposal about this; the only thing I could find is the "further proposal plan" in proposal-bigint-math#Vision.

Welcome to the forum. I'm the author of that BigInt Math proposal you linked to. I would like to clarify that the section to which you linked is not really a "further proposal plan" but rather merely a vision for possible future proposals for BigInts.

There has been no further discussion about standardized BigInt serialization directly into ArrayBuffers for now. This is probably because, as far as I can tell, this use case is quite rare. Right now TC39 is focusing on more frequent needs.

If you need to serialize BigInts into ArrayBuffers, I would recommend that you create your own code that does so; please consider also publishing to NPM. The more specific real-world uses for serialization that TC39 sees, the more likely TC39 will prioritize it. Welcome again to the forum.

2 Likes

I don't agree with the mentality that priority should be given to the proposals / ideas with "frequent need". This sets the precedent that it's okay for a language to have gaping holes and unfinished features as long as no one is using them.

Actually it's worse than that, people would be using them, but instead, there seems to be a bias towards implementing higher-level and convenience features instead of lower-level building blocks, and this is prevalent across the entire javascript ecosystem (if you're not convinced look no further than WebRTC). You spend your energy tackling the most "frequent" needs, forgetting to realise that these problems would resolve themselves if the foundation was solid to begin with. This adds a lot of bloat to the language and its environments without fundamentally solving anything.

This specific issue exists because there is no solid solution to the problem proposed above. Instead, we see redundant features like toReversed or the pipeline operator make it to shipping (don't get me wrong, it's a good feature, but that doesn't take away from the fact that it adds absolutely nothing to Javascript and takes priority away from other issues).

Counterargument: WebAssembly, Float16Array and Records/Tuples are all great additions that make Javascript more complete as a language and directly address many serious hurdles that often heavily impact performance or code complexity

Hi, thanks for your response. I think there might be a misunderstanding here over what I said about TC39 proposal triage.

  • I didn’t say anything about TC39 prioritizing high-level convenience features more than lower-level building blocks.
  • In fact, the opposite has been happening: over the past several years, several TC39 members have been pushing for prioritizing lower-level features that bring new capabilities more than higher-level developer-convenience APIs that are already possible in userspace.
  • Relevant here are the JS0/JSSugar proposal and the follow-up Things Should Layer proposal.

Standardizing serialization of BigInts into ArrayBuffers would not be a "foundational" feature.

  • Built-in serialization would be a higher-level convenience feature because it is already possible in user code.
  • Built-in serialization is not a "foundational" feature; it is a convenience feature.

Whether a feature is "higher-level" and for developer "convenience" or "lower-level" and "foundational", the feature must address common or important real-world use cases.

  • Even if built-in serialization were foundational, if it would be a very rare use case, then it deserves to be considered after proposals that have more common use cases and which would help more programs.
  • And TC39 and the engine implementors, having a limited amount of time, must triage foundational proposals based on how many programs would benefit from them.

Also, WebRTC has very little to do with TC39 or the core JavaScript language. It is a W3C web API. TC39 did not create and does not manage WebRTC, so I’m confused on how it’s relevant to TC39’s proposal triage.

The pipe operator has been dormant for three years because its champions have been prioritizing other, higher-priority, lower-hanging fruit more than the pipe operator. See pipe’s most recent update.

proposal-change-array-by-copy (toReversed etc.) was indeed for developer convenience.

  • But proposal-change-array-by-copy’s use cases are probably orders of magnitude more common than serialization of BigInts to ArrayBuffers would be.
  • Plus, like I said, TC39 has gradually been focusing on lower-level proposals that bring new capabilities in general.
  • TC39 generally scrutinizes higher-level proposals that bring only developer convenience more skeptically. proposal-change-array-by-copy did face this increased scrutiny and happened to pass it.
  • There are numerous other active proposals that deal with patching up low-level holes in JavaScript’s foundations.
    • Two examples of TC39 improving JavaScript’s foundations:
      1. Patching up the private-field hole in Object.preventExtensions, making it possible for them to have immutable memory layouts.
      2. Structs.
    • Both of these proposals would make interoperability with Wasm garbage collection much easier.
    • Both of these proposals would affect many more programs than the developer convenience of serializing BigInts to ArrayBuffers.
    • My understanding is that most active TC39 proposals today are like those two examples: "low-level", "foundational" "building block" proposals that bring true new capabilities to developers.

Thanks for writing. I hope what I said makes some sense. I apologize if I misunderstood anything you said.

1 Like

I read your whole comment and I sincerely hope you're right; that would be a massive step in the right direction.
That being said, a building-block feature is about more than whether something is "already possible". Consider that Javascript is Turing complete: everything is already technically possible in userspace. These "holes", then, refer to incomplete features, the main problem being that in many cases they render the rest of the feature mostly useless, as the increased complexity/overhead of trying to integrate a userspace solution into a language-level construct outweighs the benefit of using said feature at all. In my case, having a bigint that cannot be serialized fast is a lot worse than having no bigint at all.

Let's focus a bit more on this issue: BigInt serialization is possible in userspace; however, it's not possible in O(n). If your program needs to serialize or otherwise split very large bigints, you have 4 options:

  1. Accept that your code will block for minutes straight because of a severe oversight in the design of BigInts
  2. Accept that your program is not feasible to write in pure Javascript and try for a native or wasm implementation depending on the environment
  3. Ditch BigInt and make your own large integer class
  4. Raise an issue like I did to address this once and for all

As far as I am aware the current most common approaches are 2. and 3., and what you are suggesting could only fall under 1.

Hope this helps clarify where my frustration originates

toReversed was also originally part of the Records and Tuples proposal. It was split off so it could be landed independently. This was done so that if Tuples were added to the language they could share the same common interface with Arrays.

(cite: I was the champion of the proposal)

2 Likes