[Issue] BigInt serialization

Serializing a bigint into a DataView currently requires many bit shifts that turn the operation into O(n^2). The only way to avoid this is with .toString(16), then re-parse the string byte by byte. This feels like the wrong way to do it, and certainly does not come without overhead.

Solutions

BigInt.prototype.toArray(little? = false, array?)

Returns a Uint8Array representing the bits that make up the BigInt, with an additional explicit zero for positive integers and one for negative (i.e 128 would be represented as [0, 128] to disambiguate from -128 which would be [255, 128]. Numbers like 64 or -64 do not need this disambiguation)
little essentially writes the array in reverse
If array is specified then it will be filled and returned instead, provided it is big enough (a RangeError is thrown otherwise)

BigInt.fromArray(array: TypedArray, little? = false, offset?, length?)

Same as above but constructs from an array.
little essentially reads the array in reverse

BigInt.prototype.magnitude(): number

Returns the magnitude of this BigInt in bits
The result gives a hint as to the size of the array that would be returned by toArray(): for a magnitude M, the array returned has a byte length of (M+7)>>3. (As a result, the magnitude should have 1 added for the sign bit, 1n.magnitude() == 2 )

Let me know if I am missing anything!

3 Likes

Uint8Array.setFromHex is coming, and should be much faster than parsing the string yourself, but yes it's still slower than writing the BigInt directly would be because of the intermediate string.

For BigInt values less than 2**64 there's DataView.prototype.setBigUint64, though that doesn't cover larger values without additional work.

Between the two of those, however, I don't think the remaining use case comes up enough to be worth solving. Writing the last 64 bits and then shifting over by 64 bits and repeating should be fairly efficient. I don't know why you think that it's O(n^2).

3 Likes

A bigint with 64,000 bits will require 1000 shifts, but each of these shifts copy 0-64000 bits, that's how it's O(n^2)
64,000 bits would require copying 3Mbit of memory
64 million bits would require copying 3Tbit of memory

2 Likes

I'm curious, are there really applications that use that large bigints?

You could also divide and conquer the outer loop, to achieve O(n log n). Instead of shifting 64M-bit numbers 64 bits at a time, chop the number into 16M-bit chunks, and each of those into 256k-bit chunks, and each of those into 4k-bit chunks. Then instead of doing 1M shifts of a ~32M-bit number, you'll only do 4 of those plus 256 shifts of an 8M-bit number plus 16k shifts of a ~128k-bit number plus 1M shifts of a ~2k-bit number.

2 Likes

in ungap structured-clone bigint is serialized as easily as bigint.toString(); structured-clone/esm/serialize.js at 68e41b1f898343db1bd6b681e947e08cb7500299 ยท ungap/structured-clone ยท GitHub

AFAIK bigint also exceeds Array boundaries (bound to Int32 world) so ...

  • why do you think Array is the right primitive to represent BigInt
  • if that's not about arrays but Typed Arrays, why wouldn't the method be called toBuffer or something that doesn't involve arrays? toUint8Array and fromUInt8Array would be better? ... or toView and fromView ?
  • why don't you pass a specialized instance that carries just the string representation?

curious about what serialize means in here, happy to read answers :wave:

it's not about arrays or typed arrays, both arrays and strings are a higher level view of raw binary data. Uint8Arrays are a relatively neutral way of representing binary data (1 element = 1 byte). In reality, an application may choose to encode the number in their own format (e.g LEB128 or VLQ) but providing one universal method for efficiently getting the bytes of a BigInt is much better than 20 for each possible commonly used format

1 Like

should "little" be true by default?
Internally implementations stores the value it in the array of 64 bit integers. each 64bit integer is in little-endian byte order. Well, on x86 there is BSWAP instruction, which can reorder bytes quickly.

toArray should also accept offset perhaps, well.. new Uint8Array(buffer, offset, length) can be used instead. Same for fromArray

1 Like

Accepting offset for toArray makes sense
However, for your first point, consider that in any application that seeks performance the second parameter would be supplied and therefore implicitly the first. Consider also that while the 64 bit numbers would be stored little-endian on most platforms, the entire array may not necessarily be stored little endian (u64 might be stored lowest-byte-to-highest-byte but the whole number might be highest-u64-to-lowest-u64) in which case setting little to true by default would not make much sense. Having little set false by default also matches with DataView's methods

O(n log n) with lots of overhead and recursion is still a lot worse than an O(n) memcpy. In addition, how would you know the magnitude of the number, in order to detect the best possible number of steps to make?

2 Likes

You can always replace recursion with a loop and a stack. Yes, it's worse than O(n), my point was that you don't have to suffer O(n*n) to do it right now.

You can find that in O(n log n) as well. (too bad BigInt doesn't have .bit_length or .log2)

2 Likes

You can accomplish reading only a part of the array by creating a subarray:

BigInt.fromArray(array.subarray(offset, length))

BigInt.toArray(array.subarray(offset, length))

BigInt.prototype.magnitude()
It looks like it is the same as bitLength(abs(bigint)) + 1. So may be BigInt.bitLength or BigInt.prototype.bitLength should be provided instead.

There's two main applications for arithmetic on integers that large:

  • Cryptography commonly see 4096-bit primes and sometimes larger, though they usually use byte buffers directly.
  • Statistical analysis deals with a lot of operations where numbers grow exponentially if not even faster (like factorially: n choose r = n! / r!(n-r)!), and the result of even seemingly innocent intermediate operations (like 10000! โ‰ˆ 10^35659.454... โ‰ˆ 2^118458.134..) can quickly blow past that. These sometimes show up in gambling contexts, but most often in applied scientific computing (such as in urban planning and financial analysis) where accuracy can't be sacrificed (usually for legal reasons).

From what I've read elsewhere, some of the first people to move to bigints in the first place were gambling and financial companies.

Also, there's precedent for having such a "to byte array" method on arbitrary-precision integers. To name a few:

3 Likes