[DataView] variable int

Variable-width integers, or varints, encode unsigned integers using anywhere from one to many bytes, with small values using fewer bytes.

Each byte in a varint has a continuation bit that indicates whether the byte that follows it is part of the varint. This is the most significant bit (MSB) of the byte (sometimes also called the sign bit). The lower 7 bits are the payload; the resulting integer is built by concatenating the 7-bit payloads of its constituent bytes.
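As a sketch of how this works in practice (assuming the little-endian 7-bit grouping used by protobuf-style LEB128; encodeUvarint and decodeUvarint are hypothetical helper names, not an existing API):

```javascript
// LEB128-style unsigned varint (sketch): least significant 7-bit group first,
// MSB of each byte set when more bytes follow.
function encodeUvarint(value) {
  const bytes = [];
  let v = BigInt(value);
  do {
    let byte = Number(v & 0x7fn); // low 7 bits are the payload
    v >>= 7n;
    if (v > 0n) byte |= 0x80;     // continuation bit: more bytes follow
    bytes.push(byte);
  } while (v > 0n);
  return Uint8Array.from(bytes);
}

function decodeUvarint(bytes, offset = 0) {
  let value = 0n;
  let shift = 0n;
  let read = 0;
  let byte;
  do {
    byte = bytes[offset + read++];
    value |= BigInt(byte & 0x7f) << shift; // append the 7-bit payload
    shift += 7n;
  } while (byte & 0x80);                   // stop when the continuation bit is clear
  return { value, read };
}

console.log(encodeUvarint(300n));                  // → [172, 2]
console.log(decodeUvarint(Uint8Array.of(172, 2))); // → { value: 300n, read: 2 }
```

So 300n fits in two bytes instead of the eight that setBigUint64 would use.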

A few modern encoders/decoders use this to their advantage: protobuf, for instance, and Google's structuredClone algorithm uses it in IndexedDB.

I think it would be cool if we could have this built into DataView as both a getter and a setter.


To clarify: you would like to assign an arbitrary BigInt to a DataView in one single operation, not limited to 64 bits.

That's correct. Bit 7 (the MSB) would be a continuation flag indicating that you need to read the next byte, so smaller number === fewer bytes.

Maybe it doesn't have to be limited to just BigInt, but could also work for regular numbers too? But I guess using BigInt for this purpose makes more sense.

Currently, if you have a BigInt and want to store its value in a typed array, you have to use setBigUint64, which always takes 8 bytes (64 bits). But if the value is only 4n, it can safely be stored in just one byte instead of taking up eight.


Also, it would be useful to know how many bytes you are reading/writing, so that you know how many bytes to skip to reach the next offset. So a return value like { value: 300n, read: 2 } is needed.
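To illustrate why the byte count matters when varints are packed back to back (decodeUvarint here is a made-up stand-alone helper, not an existing DataView method):

```javascript
// With variable widths you cannot compute the next offset without knowing how
// many bytes the previous read consumed.
// decodeUvarint is a hypothetical helper, not part of DataView.
function decodeUvarint(bytes, offset = 0) {
  let value = 0n, shift = 0n, read = 0, byte;
  do {
    byte = bytes[offset + read++];
    value |= BigInt(byte & 0x7f) << shift;
    shift += 7n;
  } while (byte & 0x80);
  return { value, read };
}

// Two varints packed back to back: 300n (2 bytes) then 5n (1 byte).
const buf = Uint8Array.of(172, 2, 5);
const first = decodeUvarint(buf, 0);           // { value: 300n, read: 2 }
const second = decodeUvarint(buf, first.read); // { value: 5n, read: 1 }
```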

I like this idea because it can also be applied to bools. If we ever get a built-in BooleanArray or Uint1Array (inheriting from TypedArray), then each bool would only "allocate" 1 bit instead of the usual byte.

However, I don't understand the point of having a "signal/metadata" bit and 7 bits of payload; it just overcomplicates the data structure and turns it into something similar to a linked list, which has O(n) (sequential) access times.

I never considered Booleans, but heck yeah, that would be nice to have as well: easy access to read/write one bit in one byte. But that wasn't the reason why I created this issue, and it's a bit off topic from this feature request.

overcomplicates the data structure and turns it into something similar to a linked-list, which has O(n)

Even if it's a more complicated structure, it's still a widely popular format that some frameworks use. It's a more compact way of storing arbitrarily large integers that can save a bunch of bytes and make the payload smaller. It's also more flexible to work with, because your data can grow to any size you like.

There is an npm package called varint (varint - npm) that does exactly this. It's downloaded over 600k times per week and has 300+ packages that depend on it.

You could also factor in protobufjs, which has over 9M downloads/week and has this built in. And Google uses this format in the structured clone algorithm.

Google has a good article on variable-width integers here explaining how they work.


Thank you for explaining, and for the links to the docs. I now realize it's actually O(1)! The "linked bytes" are only linked within a CPU word/register, but each varint in an array has O(1) access. This is because regular Arrays are O(1), and varints have a max size of ~8~ 10 bytes, giving an upper bound of O(10), which is equivalent to O(1); accessing a varint in an Array is therefore O(11) = O(1). It's cheap! Not a linked list (especially because LLs use pointers that point somewhere in an unordered heap, while arrays are ordered and varints use continuation bits).

Looks like there was some discussion about more APIs for this back in 2017, but they did not make the MVP.

I don't know if there needs to be any upper limit of ~8~ 10 bytes...
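For context on where that figure comes from (a back-of-envelope check, not quoted from any spec): an n-bit unsigned value split into 7-bit groups needs at most ceil(n / 7) encoded bytes, which is why 64-bit varints top out at 10 bytes.

```javascript
// Upper bound on the encoded size of an n-bit unsigned varint: ceil(n / 7)
// bytes, since each encoded byte carries only 7 payload bits.
const maxVarintBytes = (bits) => Math.ceil(bits / 7);
console.log(maxVarintBytes(32)); // 5
console.log(maxVarintBytes(64)); // 10
```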

It's currently possible to create any arbitrary BigInt, such as
20282409603651670423947251285887n, but there is no way to store it in an ArrayBuffer using any DataView method.
The highest possible number you can store with DataView.prototype.setBigUint64() is
2n ** 64n - 1n (18446744073709551615n).
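For what it's worth, storing a wider BigInt today means manually splitting it into 64-bit limbs (a sketch; storeBigInt is a made-up helper, and the number of limbs has to be tracked out of band):

```javascript
// Today's workaround (sketch): split an arbitrary unsigned BigInt into 64-bit
// limbs and write each with setBigUint64, little-endian limb order.
// storeBigInt is a hypothetical helper, not part of DataView.
function storeBigInt(dv, offset, value) {
  let v = value;
  let bytes = 0;
  do {
    dv.setBigUint64(offset + bytes, v & 0xffffffffffffffffn, true);
    v >>= 64n;
    bytes += 8;
  } while (v > 0n);
  return bytes; // the caller must remember how many bytes were written
}

const dv = new DataView(new ArrayBuffer(32));
const written = storeBigInt(dv, 0, 20282409603651670423947251285887n);
// written === 16: two 64-bit limbs for a ~104-bit value
```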

So a method to store any arbitrary number, no matter how small or big it might be, shouldn't have any limitation? Maybe an optional option to specify the maximum number of bytes/bits to read/write would be OK?

const dv = new DataView(new ArrayBuffer(1024))
const offset = 0
const largeN = 20282409603651670423947251285887n
dv.setUvarInt(offset, largeN)
dv.getUvarInt(offset) === largeN // true
dv.getUvarInt(offset, max) !== largeN // `max` would be the optional byte limit

console.log(new Uint8Array(dv.buffer))
// [255, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 63, 0, 0, 0, 0, ...]

I agree with this, especially because of standardization. Encoding/decoding BigInts to/from TypedArrays is no trivial task, not just because of the sign, but because of all the possible endianness combinations.
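As a small illustration of the byte-order half of this (using DataView's existing littleEndian flag), the same 8 bytes decode to different values depending on the order chosen:

```javascript
// Same bytes, different byte order. DataView's littleEndian flag only covers
// byte order within a word, not bit order or word order.
const view = new DataView(Uint8Array.of(1, 0, 0, 0, 0, 0, 0, 0).buffer);
console.log(view.getBigUint64(0, true));  // 1n (little-endian: first byte is least significant)
console.log(view.getBigUint64(0, false)); // 72057594037927936n (big-endian: 2n ** 56n)
```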

Usually, devs use little-endian words with LE bytes and big-endian bits. But there's also the "all BE" encoding. And there's also LE word + BE byte + BE bit, and more esoteric encodings like BLB, LLL, and BBL.

There are 8 possible combinations of these encodings, and that's assuming all word sizes are the same. If we distinguish between 16b, 32b, and 64b, the total number jumps exponentially, and the problem gets even worse if we add 128b.

If there were a standard encoding, no dev would have to go through the cognitive overhead and potential performance penalties of choosing a "bad encoding". Choosing a good encoding means that almost no explicit order conversion is necessary, though that ignores the implicit conversions done when transmitting data between devices with different endianness.

I know I'm worrying too much about this, and I probably went too far, but it's something we should keep in mind.