Proposal for Array.prototype methods for multi-dimensional arrays

Happy New Year everyone!

I wrote some utilities for multi-dimensional array processing, aiming to provide a blueprint for a new ECMAScript proposal.

The NPM package is here:

And the source code in TypeScript is here:

Feel free to open any issues and pull requests. Let’s improve the package together!

Motivation

There is currently no recommended way to build a new array of a given length, except the Array.from hack, and a single method to do so is being suggested here: Provide an easy way to create a new array, filled via a mapping function

But the Array.from entirely covers the usage of this method, so if there is ever a proposal for it, the proposal would be hard to get accepted.

Afterwards, I came up with some methods for nested arrays:

[2, 3].buildShape((x, y) => x * 3 + y) // [[0, 1, 2], [3, 4, 5]]

As 1D arrays are also n-dimensional arrays, you can essentially do the following:

[5].buildShape(n => n + 1) // [1, 2, 3, 4, 5]

Subsequently it becomes a superset of the method mentioned in the above thread, while being able to extend Array.prototype with more useful utilities.

Related thread: Include a built-in matrix / table / column primitive

Finally

Because I am not too familiar with writing a specification, I hope anyone could help me start the proposal up. Thank you for taking the time to read the entire post!

2 Likes

Multi-dimensional arrays are not a thing in JS. An array of arrays is not a 2D array. You cannot enforce the shape, cannot create views, etc. It doesn't qualify as ndarray.

Imagine if TypedArray constructors that take buffer, would also take stride argument. That'd get you 1D views (along any axis) into "fake ndarray in a Buffer", but it still wouldn't give you ND views. For that a whole new set of *NDArray classes would probably be needed.

2 Likes

Many JavaScript developers will create an array of nested arrays with fixed sizes to represent a 2D data structure. While it's true that JavaScript doesn't provide built-in mechanisms to enforce the length of these nested arrays, I don't think there's an issue with calling these sort of arrays "multi-dimensional", as that's effectively how they're being used. Yes, they're a bit different from a C++ or Java's version of multi-dimensional arrays, because in those languages you would set a size constraint on each dimension of the array, but then again, a JavaScript array is also different from an array in those languages for the same reason - JavaScript arrays have dynamic sizes while theirs do not.

However, you do bring up a good point about whether or not we want to introduce a new multi-dimensional data-structure instead of adding utility methods that operate on nested arrays, which is a good discussion worth having.

Most of the methods @graphemecluster is proposing are utility methods that are to help with any kind of nested array data-structure one might have, the inner arrays don't necessarily have to be the same size to use them, in fact, the only methods that really require a concept of fixed inner-array lengths would be the buildShape(), shape(), and shapeAtOrigin(), where buildShape() produces a multi-dimensional array, and both shape() and shapeAtOrigin just analyze one and returns the lengths of each dimension (and also documents what it'll do if an "improper" multi-dimensional array were given).

2 Likes

Well, just think these are some utilities for “arrays as elements in an array” – I am not intended to make a new type or something. Calling it “multi-dimensional array” is just a convenience; please try not to nitpicking the name. (*・ω・)ノ

Of course there are already some libraries functioning like a real ndarray, like this one. But people will try to avoid using a new class with a small 2D array, like in a simple board game programming logic. So what I am trying to do is just to provide an easy way to manage these “nested array”.

Thank you for helping me explain my thoughts to @lightmare, my English might not be clear enough. What you said is exactly what I’m thinking.

@lightmare did point out a good point: mixing “ndarray”, “multi-dimensional array” and “nested array” might cause confusions to the user, and I shall change all of them to “nested array”. These are only faults in the explanation of my library though; it should not affect what I am proposing.

2 Likes

@graphemecluster - here's just a few thoughts I have on some of your proposed methods. I think I would find a number of them to be incredibly useful, but there's also many others in which I'm skeptical about their overall utility.

buildShape() would be really nice to have. There's been countless times when I've needed to create an n X m array, and it's a little tedious to do using Array.from().

The .shapeAtOrigin() method honestly doesn't seem overly helpful. .shapeAtOrigin() is a little nice, because it provides a more declarative way to do a common task, but the type of code it's replacing isn't all that verbose or difficult to write.

// Instead of code like this:
function doThing(map) {
  for (let y = 0; y < map.length; ++y) {
    for (let x = 0; x < map[0].length; ++x) {
      ...
    }
    ...
  }
}

// We can write code like this:
function doThing(map) {
  const [height, width] = map.shapeAtOrigin()
  for (let y = 0; y < height; ++y) {
    for (let x = 0; x < width; ++x) {
      ...
    }
    ...
  }
}

So yes, using .shapeAtOrigin() improves the code a little bit, but the improvements are minimal, and it requires readers to know what the shapeAtOrigin() function does, while the first version is something that anyone can read and understand.

The problem with .shape() is that, 99% of the time, .shapeAtOrigin()-like behavior is all I need - I'm usually working with "multi-dimensional arrays" where each subarray is the same length, so I'm ok making this assumption and simply checking the length of the first element of each array. The difference between .shape() and .shapeAtOrigin() is only noticeable if I'm dealing with a nested array, whose subarrays may have different sizes. From what I can tell, .shape() will mostly just provide a convenient way to find the biggest length of a certain dimension, which will probably be occasionally useful, but I'm not I'm convinced that it's a common enough need to warrant a new method.

.nestedMap() seems like a really nice method to have, I've often needed this sort of thing. I wonder if it would be helpful, with some of these nested methods, to have the ability to specify a nesting depth. Usually, you're working with a fixed number of dimensions anyways, and adding the ability to specify a max-depth would allow these nested array to contain anything, including subarrays, without messing the logic up.

[['x', ['z', 'a'], 'yz'], ['a', 'bxy', 'c']]
  .nestedMap(x => x.length, { dimensions: 2 })
// That produces this:
[[1, 2, 2], [1, 3, 1]]

Note how .nestedMap() didn't traverse into ['z', 'a'] when mapping over the data, instead, because it knew it was only operating on two dimensions, it passed the entire ['z', 'a'] array as an argument.

.nestedForEach() looks like a helpful method as well. Even better would perhaps be to provide a function like .nestedEntries() that returns an iterator which yields a coordinate/value tuple or something, that way one can do nested iteration in a for-of loop.

.nestedSplit() and .nestedJoin() are interesting ideas, but it's not anything I'm too excited about. It feels to me that, if I ever have a scenario where I'm able to use a function like .nestedJoin(), it's more of a coincidence that I'm wanting to do a join on both the inner and outer dimensions.

The only time I've used .fill() is with the Array(length).fill(null) pattern, to create an array of a certain size (a pattern that can also be achieved via Array.from({ length }), () => null)). I've never used it on a pre-existing array, because I tend to avoid mutating my arrays. So, I personally wouldn't find .nestedFill() to be that useful, but maybe others would enjoy its use. Same thing goes for .nestedFillMap(), I'd rather receive a new array than mutate the current one.

The problem with .nestedIncludes(), .nestedFind(), .nestedFindLast(), .nestedSome(), and .nestedEvery() is that all of these behaviors can be achieved today by simply using .flat() first, e.g. yourArray.flat().includes(). It's true that there's a performance overhead for this sort of thing, but this could potentially be solved via the iterator-helpers proposal. It looks like that proposal doesn't currently provide an iterator.flat() function, but if we went and advocated for that, we would get the performance benefit of all these nested functions with a single new function (with .nestedFindLast() being the exception, that would need to be solved some other way).

As for .nestedIncludesFromLast(), .nestedSomeFromLast(), and .nestedEveryFromLast(), I'm not too gung-ho about introducing nested "fromLast" variants unless we have non-nested from-last versions of these functions as well.

Next, you have .nestedIndexOf(), .nestedLastIndexOf(), .nestedFindIndex(), and .nestedFindLastIndex(), which all seem pretty nice, and I would love to have functions like those available.

So, from your repository, the methods I'm seeing that I would care most about are as follows:

  • .buildShape()
  • .nestedMap()
  • .nestedForEach()
  • .nestedIndexOf()
  • .nestedLastIndexOf()
  • .nestedFindIndex()
  • .nestedFindLastIndex()

Perhaps others would be of a different opinion.

2 Likes

Let me add: I sometimes saw things like array.forEach(row => row.forEach(col => col.forEach(item => doSomethingWith(item)) while reading other’s code, that’s why I also included some utilities other than buildShape and shape.

As a simple example of the usage of .nestedSplit and .nestedJoin, I found these methods useful while splitting and joining a simple CSV-like string with [/\r?\n|\r/, ","] (not exactly though, as escapes are not handled).

I also occasionally saw someone saving rows of number data with a custom “pseudo-format” like

1,3,6;10,15,21,28;4,5,7,8,10
2,4,6;3,14,15,9,26;2,7,18,28

3,3,4;2,3,5,8,13;1,2,3,6,7,9

So these methods my help them build and break up the piece of string easily (in the above example, with ["\n\n", "\n", ";", ","] instead of string.split("\n\n").map(section => section.split("\n").map(row => row.split(";").map(part => part.split(",").map(number => +number))))).

1 Like

Well, I actually doubt whether all of the methods should be included in the proposal. And even if we are going to include all of them, we should separate them into 2 to 3 pieces of proposals.

Pinging @ljharb, @claudiameadows and @jschoi for more help and feedback; sorry for any disturbance.

@graphemecluster - your split example was pretty interesting. A chunk of code with that much nesting gets pretty gross-looking. I tried rewriting it using the upcoming pipeline operator, because the pipeline operator is supposed to help linearize this sort of code, but found that to be an incredibly gross solution as well. I'm realizing that a lot of the current-day solutions to these nested-array-processing methods suffer from this issue of having deeply-nested code that's awkward to read and write.

While I'd certainly love to see new methods to help with common scenarios, such as building a new multi-dimensional array, or mapping over one, I'm realizing that I'd also like to see a more-generic solution to help write readable, callback-heavy nested code. I opened up this issue on the pipeline proposal to propose a change that would help with the more general case of working with nested callback logic.

3 Likes

First, I'm glad your published package does not modify Array.prototype with nonstandard things :-)

Second, a proposal needs to start, first and foremost, with a problem statement - something that establishes the problem, so the committee can agree that it's a problem worth solving with changes to the language. I'm personally not super persuaded by this use case - i feel like it's a very uncommon one overall, despite likely being very common in specific fields (please withhold the inevitable outrage that statement will generate from those of you that do experience the problem frequently). Long before a possible solution is suggested, I'd want to see lots of concrete, non-contrived use cases, with examples of the code you'd need to achieve it now.

2 Likes

To echo what Jordan said, in my experience, I’ve had the most success with proposals when I first consolidate specific use cases from real-life open-source codebases. That’s what I’ve tried to do with all my current proposals (links go directly to specific sections):

These “real-world examples” sections are almost always the first section I write.

To give an example of a more preliminary proposal, Context blocks for JavaScript · GitHub is in really early days. It’s very unfinished, but even if I did nail down its precise specification, I wouldn’t dare present the proposal until I have as many examples from real-world codebases as I can. That’s why the Gist is starting out as a list of use cases—the problem space—while deferring shaping the solution to later.

You probably would want to try to do the same. Search for open-source codebases that may use these things.

I also recently presented and failed to achieve consensus for a grab bag of helper functions (notes/oct-28.md at f368c224708e2c8fa4a1a1647a008d3f11226bac · tc39/notes · GitHub). I was asked to split them up into separate proposals. These array helpers probably would need to be the same.

(My personal bandwidth is currently limited, so I myself would not be able to champion any of these in the near future; my apologies.)

2 Likes

So, @graphemecluster, perhaps the best shot here would be to try and focus on a single method we'd want the most. We could add at the end of the proposal a "potential future enhancements" section to describe how we can further take this sort of nesting-array logic in future proposals, similar to how the pattern matching proposal does it.

I could potentially help with some of the research, to see how much certain patterns get used in the wild, once we've decided on a single area of focus.

@jschoi - I see that many committee members understandably get bogged down with how much is currently on their plates to manage. I'm wondering if there's stuff we can do as a community to help carry some of that burden - probably not, I assume a lot of it has to do with moderation, attending meetings, writing up technical specifications (I have no skills in that), etc. But still, I think it's worth asking. (And, It's not that I'm hoping to specifically clear your plate, so you can help with this specific spec or anything.)

Yeah, like you said, there might not be much the community can do. The best case scenario for collaboration would be a TC39 member signaling that they are interested enough in a proposal to present it if it already existed, but that they do not the time to create it. Then a driven person outside of TC39 creating the explainer and doing the research to find real-world examples, and then the champion, if they so choose, would present it to plenary.

James DiGioia did something like that for the pipe operation some years ago. Before him, there was also collaboration between @gilbert and @littledan. See proposal-pipeline-operator/HISTORY.md at 51994df6ad9b3f2e8cfb85a5c07f604b04dfd99f · tc39/proposal-pipeline-operator · GitHub.

I don’t recall how the collaboration with @gilbert occurred (@littledan was always very open to community proposals—it was part of his full-time job in fact, as far as I recall)…Come to think of it, I also had been writing proposal explainers and specifications for the pipe operator as a community member, years before I joined a TC39 member organization—but @littledan was still the formal champion.

In any case, I would say that, in the end, it’s about presenting ideas to potential champions (TC39 members) and hoping that one would declare interest in championing it, if an explainer got written.

And to bootstrap that process, the best way I can think of would be to compile real-world examples. Though even then it’s not guaranteed, of course. Sorry if that is discouraging; I appreciate the offer.

1 Like

Sorry for being away from here for 4 weeks and thank you all for your comments!

I've only joined this forum for a moment, and it is my apologies if I'm causing any trouble for this community by doing something I am new to.

Now I do understand how important real-world examples are, but I'm bad at doing research on them, so I could only leave this job up to any volunteer here. But in case no one has the time, I would probably have to leave this proposal as is.

@ljharb, yes, though somehow specific, I believe they are common operations in building board games and academic utilites.

@theScottyJam, I'd like to focus on methods doing what buildShape, nestedMap and nestedForEach (along with nestedFill and nestedFillMap if possible, because they have a similar nature) do as a start.

@jschoi, no worries, you proposals are really worth reading and learning from. By the way it may sound like a weird question, but I am curious about where are you from because of your surname. (Please kindly ignore it if you feel uncomfortable saying it.)

Oh yes, I should have included the upToAxis (or name it maxDepth) parameter to all the methods except .buildShape(), .nestedSplit() and .nestedJoin().

For your example, most of the time people should be encouraged to use .nestedForEach() instead especially in functional programming (unless you need to do something after the inner for loop):

const [height, width] = map.shapeAtOrigin();
map.nestedForEach((item, [y, x]) => console.log(item, y / height, x / width));

Additionally there is one more reason to use the .shape() or .shapeAtOrigin() method: both map.length and map[0].length are computed multiple times in the top example but not in the bottom one (well, they are cheap operations though). You probably don't want to write const height = map.length, width = map[0].length;.

Update the package to include the maxDepth parameter, suggested by this comment: Proposal for Array.prototype methods for multi-dimensional arrays - #7 by theScottyJam.