stdLib proxies for globalThis mutation protection

Background and Objective

This idea is inspired by a "use initial" directive idea I had brought up previously. The "use initial" directive attempted to allow developers to write normal-looking JavaScript code while the engine automatically applied some magic to make it safe from prototype/globalThis pollution. It was an interesting conversation, and I'm glad we explored the possibility space, but overall, it seems we would need a whole lot of magic, and would create a bunch of odd pitfalls, in order to make it happen.

The purpose of this thread is to try and do something similar, but to loosen some of the restrictions I had placed on the "use initial" idea. There will still be a fair amount of magic involved in this idea, but the amount of magic required should be much less, after some simplifications have been applied. I am mostly trying to explore this possibility space to see how viable such a solution would be. I fully understand if people don't want to actually travel down this route.

So now, instead of trying to allow users to write globalThis-mutation safe code that looks as clean as normal JavaScript, the goal is to allow users to write globalThis-mutation safe code that looks as clean and concise as a language like ReScript.

In JavaScript, if I want to map over an array, I can just access the map function on the Array's prototype.

console.log([2, 3, 4].map(x => x + 1))

In ReScript, the array methods aren't found on a prototype; instead, you must access them statically on an Array2 namespace, which in turn is on a Js namespace. It's a little more verbose, but it's something many functional fans are used to - they're always dealing with static functions like this.

Js.log([2, 3, 4] -> Js.Array2.map(x => x + 1))

// or

module Array2 = Js.Array2
Js.log([2, 3, 4] -> Array2.map(x => x + 1))

If we accept that this level of verbosity is acceptable, then we can loosen some of the restrictions placed on the "use initial" directive to come up with something that's much simpler. The end result should allow JavaScript developers to write robust code that's at about the same verbosity level as a ReScript program, and no more.


The proposal

This is a simplification of the apiStructure idea from "use initial" (we won't need any of the coerceKey nonsense I had previously talked about in that thread). I'm going to lean on the FixedMap proposal to make this work (a FixedMap is just an immutable map).

When I talk about an "API definition", I'll be referring to any set of nested FixedMaps that follow a particular interface. Specifically, it's a FixedMap that contains objects as keys and FixedMaps as values. These next-layer FixedMaps in turn contain property names as keys and anything as a value. JavaScript will ship with a default API definition (the "standard API definition") which can be found at globalThis.stdLibDefinition. The standard API definition describes all property lookup operations available within the language. For example:

stdLibDefinition.get(Object).get('toString') === Object.toString
stdLibDefinition.get(Math).get('max') === Math.max
stdLibDefinition.get(Array.prototype).get('map') === Array.prototype.map
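Since FixedMap doesn't exist yet, ordinary Maps can stand in for it in a rough userland sketch. The shape of an API definition would then look something like this (my own illustrative construction, not spec text):

```javascript
// Ordinary Maps standing in for the proposed (immutable) FixedMap.
// The outer map's keys are objects; each value is an inner map from
// property names to the captured members.
const exampleDefinition = new Map([
  [Math, new Map([['max', Math.max]])],
  [Array.prototype, new Map([['map', Array.prototype.map]])],
])

// Lookups mirror the stdLibDefinition examples above.
exampleDefinition.get(Math).get('max') === Math.max // true
exampleDefinition.get(Array.prototype).get('map') === Array.prototype.map // true
```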

Polyfills can't mutate the stdLibDefinition, but they can create a new one from the existing one and replace globalThis.stdLibDefinition with it.
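Sketched with ordinary Maps as FixedMap stand-ins, a polyfill's copy-and-replace step might look like this (the groupBy name is just an illustrative placeholder, not a real proposal):

```javascript
// The existing definition (normally this would be globalThis.stdLibDefinition).
const oldDefinition = new Map([
  [Array.prototype, new Map([['map', Array.prototype.map]])],
])

// A placeholder polyfill function, purely for illustration.
const groupBy = function groupBy(callback) { /* ... */ }

// Copy every layer, then add the polyfill to the copy. The original
// definition is left untouched; the copy is what would be installed
// as the new globalThis.stdLibDefinition.
const newDefinition = new Map(
  [...oldDefinition].map(([obj, props]) => [obj, new Map(props)])
)
newDefinition.get(Array.prototype).set('groupBy', groupBy)
```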

Next, we provide a function, createAPIProxy(), that takes an API definition as input and returns a proxy object that mimics the property lookups defined by that API definition.

For example, you can run "const stdLib = createAPIProxy(stdLibDefinition)" to receive a proxy that's centered on globalThis. If you then access stdLib.Array, the proxy will use the API definition to find what object lives at Array, and will give you back that object, wrapped in a proxy. Thus, if you do stdLib.Array.prototype.map, a number of lookups will be done in the API definition until you arrive at the map function, which will be given back to you, wrapped in a proxy. Only property lookups have special behavior; every other operation passes through to the value being wrapped by the proxy. For example, calling a function behaves as normal.
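To show the machinery is buildable, here's a userland sketch of createAPIProxy (names and details are my assumptions, with Maps approximating FixedMap). It simplifies one thing: leaf values with no entry in the definition are returned directly rather than staying wrapped. The proxy also targets an empty shadow object, because proxy invariants forbid the get trap from returning a different value for non-writable, non-configurable properties like Array.prototype; a full implementation would need a proper membrane to keep wrapped namespaces callable.

```javascript
function createAPIProxy(definition, root = globalThis) {
  const wrap = value =>
    definition.has(value)
      ? new Proxy(Object.create(null), {
          // Resolve every property read through the definition, never
          // through the real (possibly polluted) object.
          get: (_shadow, key) => wrap(definition.get(value).get(key)),
        })
      : value
  return wrap(root)
}

// A tiny definition covering just globalThis -> Array -> prototype -> map.
const miniDefinition = new Map([
  [globalThis, new Map([['Array', Array]])],
  [Array, new Map([['prototype', Array.prototype]])],
  [Array.prototype, new Map([['map', Array.prototype.map]])],
])

const stdLib = createAPIProxy(miniDefinition)
stdLib.Array.prototype.map === Array.prototype.map // true
```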

Putting this all together, you end up with a solution that's almost as concise as the ReScript version (with help from both the pipeline operator and the bind-this syntax proposals):

const { Array } = createAPIProxy(stdLibDefinition)

const addOne = items => items
  |> %::Array.prototype.map(x => x + 1)
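Since neither the pipeline operator nor the bind-this syntax has shipped, here's roughly what that example desugars to in today's JavaScript (with map captured directly rather than through a proxy, just to keep the sketch short):

```javascript
// %::Array.prototype.map(...) roughly means: call the captured map
// with the piped-in value as its receiver.
const { map } = Array.prototype

const addOne = items => map.call(items, x => x + 1)

addOne([2, 3, 4]) // [3, 4, 5]
```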

With one more addition, we can actually reach the same verbosity level as ReScript. I'm going to introduce a simple -> operator: x->y would be the same as x.prototype.y.

const { Array } = createAPIProxy(stdLibDefinition)

const addOne = items => items
  |> %::Array->map(x => x + 1)

And there we go! That's not too different from the ReScript code.

module Array2 = Js.Array2

let addOne = (items) => items
  -> Array2.map(x => x + 1)

(Note that I could have chosen many functional languages as an example to compare against, I just chose ReScript, as that's a language I've toyed around with recently).

What this does

I mostly just explained the underlying machinery; now I want to explain what it does and why it works. Take this example:

const stdLib = createAPIProxy(stdLibDefinition)

const addOne = items => items
  |> %::stdLib.Array->map(x => x + 1)

globalThis.Array.prototype.map = () => 'I broke it!'
addOne([2, 3, 4]) // [3, 4, 5]

The important thing in the above example is that the addOne function is immune to future globalThis pollution. Replacing globalThis.Array.prototype.map has no effect on addOne(), because addOne() resolves map through a proxy backed by the frozen stdLibDefinition, not through the (now polluted) real prototype chain.
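A self-contained way to see this, with Maps standing in for FixedMap and a plain lookup helper in place of the full proxy machinery:

```javascript
// Capture the definition before any pollution happens.
const capturedDefinition = new Map([
  [Array.prototype, new Map([['map', Array.prototype.map]])],
])
const lookup = key => capturedDefinition.get(Array.prototype).get(key)

const addOne = items => lookup('map').call(items, x => x + 1)

// Pollute the real prototype *after* the definition was captured...
const originalMap = Array.prototype.map
Array.prototype.map = () => 'I broke it!'

addOne([2, 3, 4]) // still [3, 4, 5]

Array.prototype.map = originalMap // undo the pollution
```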

Is it worth it?

The main improvement this proposal provides over current-day solutions to prototype pollution is that you no longer need to individually capture each standard-library function you wish to use up front. Instead, you can grab a stdLib proxy and always find what you're looking for on that proxy.

Providing a giant, default stdLib definition may seem like a lot of work, but I believe it wouldn't be any more difficult than the current getIntrinsic proposal. A stdLib definition object would also cover most of the use cases that getIntrinsic fulfills, which means this could be viewed as a competing proposal to getIntrinsic. The only things remaining would be the proxy magic (which doesn't introduce any new behaviors, as such proxies can be built in userland as well) and the new -> operator, and I can easily understand if that operator doesn't gain much traction.

> Is this deniable? I could see the value of it (with the actual global object exposed as a membrane proxy of it), provided it's still mutable for things like SES that have to deny access to things like Date.now and for polyfills to have something they can still work with (they'd just mutate this instead of globalThis). The use case would be dodging prototype-polluting libraries while still being able to use polyfills and such, though admittedly this isn't really all that compelling of a motivation.

> Is this deniable?

Yes, it's deniable. At any point in time, a polyfill could replace the built-in stdLibDefinition object with a custom one that has the polyfill included. Any code that runs afterwards and uses createAPIProxy(stdLibDefinition) would then automatically receive the polyfill. Any code that ran before the replacement won't have that polyfill.

> The use case would be dodging prototype-polluting libraries while still being able to use polyfills and such, though admittedly this isn't really all that compelling of a motivation.

This is certainly reasonable. I was mostly trying to explore what it would look like if we were to try and satisfy this use case, and perhaps, through discussion, we can arrive at a simpler solution, but maybe that won't happen. Certainly, I understand if the language authors are not willing to actually implement this sort of complexity.