'use initial' directive, for globalThis mutation protection

@ljharb
Sure it would be easy to leak, but since when is it incumbent on the engine to deter sloppy programmers from being sloppy? I've asked this question before. It is a programmer's job to be precise about their intentions. Having the language police the developer is an awkward concept to me. But then again, maybe I'm just too old of a programmer.

In either case, I don't see that as an issue. The same goes for use of array literal syntax. That'd be an obvious and highly visible mistake, just like leaking the copied constructors. The leaks could be easily plugged if the constructors and objects returned from getIntrinsic() were frozen. It would be impossible to modify them, and as such they would pose no risk of losing robustness.

As for the memory and performance overhead, that's minimal since the engine would only need to keep one frozen copy of the API per engine context. It's not as though the engine needs to create new copies on the fly at every request. That's just the most naive approach to doing this.

@theScottyJam

That's true IFF arrayOfLength() did some form of strict identity testing against the constructor or prototype. Kind of a pointless check, but I wouldn't put it past anyone to do. Otherwise, an array created with _Array would function identically to one created with Array. So no issue there. See above about freezing the clean-room copy and protecting it from mutation.

The generate() function could get around the parameter problem by substituting out the prototype of the passed-in array with the clean-room copy, performing its operations, then restoring the original, potentially monkey-patched prototype. Or it could just copy the data into a clean _Array instance.

No matter what problem you throw at it, there's a similar solution. The hiccup here is mostly just ergonomics. Mind you, I don't place a particularly high value on ergonomics. I'm more of a function-over-form person. Make it work first. Then make the UX good. When faced with the idea that engine developers frown on adding more directives to the language, you've got to go for the next best thing. The functionality is needed, so why not go for a solution that gives you the best chance at acceptance while improving the ergonomics as much as reasonably possible?

Well, it already works. Picking off functions from the prototype, call-binding them, then using those picked-off functions. Everything we're trying to add is entirely ergonomics-related and nothing more. So, if the proposed solution is only a little less tedious than what we have today, and if it introduces additional pitfalls to be aware of, then it's not all that great of a solution.
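For the record, that pattern can be made concrete with a minimal sketch (`safeMap` is just an illustrative name):

```javascript
// Capture the built-in at load time, before any later code can patch it.
// Function.prototype.call.bind(method) produces an "uncurried" version:
// safeMap(receiver, fn) behaves like receiver.map(fn) with the original map.
const safeMap = Function.prototype.call.bind(Array.prototype.map);

// Hostile or sloppy code later replaces the prototype method...
Array.prototype.map = () => 'patched';

// ...but the captured function is unaffected.
safeMap([1, 2, 3], x => x + 1); // [2, 3, 4]
```

This is exactly the tedium the proposal is trying to remove: every method a robust library uses must be picked off and call-bound like this before any untrusted code runs.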

Yeah, this would be a solution to the problems I was posing earlier.

The only worry I would have is that libraries that care enough to be robust against global mutations probably care deeply about backwards compatibility as well, and would likely view returning instances with frozen prototypes as non-backwards-compatible. You could imagine an end user who tried to polyfill some missing Array methods at first load, and who would be surprised when they're unable to use their polyfilled methods on the return values of your API.


I also now prefer @aclaymore's build-time step idea over my directive idea. I think the problem I was trying to solve could easily be solved at build time without losing much. In fact, it could even be improved upon - if the type system knows your function only accepts an array, it could automatically add an assertion that the parameter really is an array (something my solution would need anyway, so you don't start calling .map() on a non-array, normal object that the user passed in).

Likewise, if the engine developers refuse to add another directive, "then it's not all that great of a solution."

Yes, I can. To that I'd say "Mission Accomplished!" Why? Because by forcefully using the original API despite someone polyfilling something, you've declared the intent to ignore all such polyfills. This goes back to what I said before about the urge to protect sloppy programmers from themselves. I don't get it. Nor do I see the need. I'll spare us all the philosophical rant about this. :wink:

That has its place, but it's not solving the same problem, just a loosely related one. If you're not using some kind of transpiler, there's no such thing as build time. So you're talking about something that isn't going to be part of ES in that case.

It's worth mentioning that tc39/proposal-ses (the draft proposal for SES, Secure EcmaScript) specifically requires globals to be deniable to be polyfilled, and further restrictions placed on it likewise will require globals to remain deniable and transparently overwritable. And I've seen, in the TC39 meeting notes, numerous proposals in this vein die because reconciling them with the requirements of SES proved too difficult.


That would mean virtualization being denied, which is a non-starter.

It's a layering question. Code that runs first needs to be able to mutate the environment such that code that follows cannot observe what was the environment before it starts running. Mutation includes removing or changing anything that is available on the global object, and these mutations should be inescapable.

getIntrinsic does not violate virtualization by allowing itself to be virtualized. I would love to find a mechanism that improves the developer ergonomics of using the state of the environment when their code started running, but it cannot break the virtualization requirement. It probably would have to look like an easier way to dispatch captured functions onto objects.

If you don't care about performance, you could write a helper that proxies a target, and only invokes the intrinsics of the same name, but it'd have to behave as a sort of membrane to work properly.
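For illustration, here is a rough (non-membrane) sketch of such a helper. `capturedProtos` and `withIntrinsics` are hypothetical names, the snapshot is assumed to be taken before any third-party code runs, and a real version would also need to wrap returned objects to behave like a proper membrane:

```javascript
// Snapshot of intrinsic methods, keyed by prototype, taken at load time.
const capturedProtos = new Map([
  [Array.prototype, { map: Array.prototype.map, slice: Array.prototype.slice }],
]);

// Proxy a target so that property gets for known intrinsic methods resolve
// against the snapshot instead of the (possibly patched) live prototype.
function withIntrinsics(target) {
  return new Proxy(target, {
    get(obj, key, receiver) {
      for (const [proto, methods] of capturedProtos) {
        if (proto.isPrototypeOf(obj) && Object.hasOwn(methods, key)) {
          return methods[key].bind(obj);
        }
      }
      return Reflect.get(obj, key, receiver);
    },
  });
}

const arr = withIntrinsics([1, 2, 3]);
Array.prototype.map = () => 'patched';
arr.map(x => x * 2); // still [2, 4, 6]
```

Note the performance cost: every property access pays for a proxy trap plus a snapshot scan, which is why this only works "if you don't care about performance".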


Thanks everyone for the input.

I think I might have an idea on how this virtualization problem can be solved.

Instead of using a straight directive, what if we used a "parameterized directive", something akin to what the operator overload proposal is doing.

with stdLib from globalThis.getIntrinsic

// This code is now safe
export function addOne(items) {
  if (!Array.isArray(items)) throw new TypeError('Must pass in an array')
  return items.map(x => x + 1)
}

The "with stdLib from <expression>" statement will cause the code block to enter "use initial" mode. It takes, as a parameter, the getIntrinsic function. Any time you try to access a property from globalThis, or any built-in, JavaScript will first check getIntrinsic for this value. If getIntrinsic says the value does not exist, then JavaScript will fall back to normal property lookup behavior. If getIntrinsic() does return something, then that's the value that JavaScript will give back from the property lookup.

Thus, the "with stdLib" statement would be fully customizable. First-run code can replace getIntrinsic with a different function to customize what gets returned. Later-run code can easily capture the current state of the available standard library by grabbing getIntrinsic, and easily use that snapshot with this new statement. Even @ljharb's polyfills would be able to work with this, as the first polyfill that loads can share the snapshot of getIntrinsic with other polyfills, and then they all can parameterize this with-stdLib statement with the snapshot (the 'use initial' directive as originally proposed could not fulfill @ljharb's use case).

It's possible that getIntrinsic isn't the best option for this sort of parameter, and that we may want to come up with some new, default function or frozen object or something that can be used to parameterize this statement in a customizable way.

What does this mean?

Let's start with "property from globalThis".
I'll assume you mean using the identifier Array. How is this different than creating a new scope with a regular with to shadow the global (minus the this context horror)?

Now let's move to "access any built-in", and take your example of map.

I'll assume you mean when calling a prototype method of an object, for which the prototype object is registered in the intrinsics? Let's assume that is how you identify an instance of an intrinsic. If your prototype is not registered, it means what you have is another type of object, or a derived object, e.g. if Array was replaced by something completely different, or the Array constructor of another realm.

// `hasIntrinsic(foo)` such that `getIntrinsic(ident) === foo` for some `ident` value
hasIntrinsic(Object.getPrototypeOf(items));

I think it's worth mentioning that the intrinsics returned by the primordial getIntrinsics are realm-local. In your example if the provided items is an array instance of another realm, you will pass Array.isArray but won't find it's an intrinsic because of the realm mismatch. items instanceof Array might be more appropriate in this case.

Now that you've identified an instance of an intrinsic, how should it do the map property lookup? I'm gonna assume that if the prototype is registered as %Array.prototype%, it would lookup %Array.prototype.map%?

This could work, but it's fairly intrusive, and I'm not sure it'd be worth the cost. It seems like even worse dynamic behavior than what a basic with allows. And there are other cases it wouldn't catch, like the derived-objects case.


class MyArray extends Array {
  map(...args) {
    console.log('Gotcha', ...args);
    return super.map(...args);
  }
}

new MyArray() instanceof Array; // true

The prototype of the derived class can always override properties of the ancestor.


It'll mostly use the semantics I shared from my first post, but using the value provided to "with stdLib" for all property lookups. I'll try to explain the important bits here.

This is a good point. I leaned on getIntrinsic() to explain how it could work, but I do think it would be better to parameterize it with some sort of frozen, static object. There could be multiple ways of doing this, but I think a straightforward option would be to use unchangeable maps (which aren't currently a thing, but there is a proposal out there).

with stdLib from new FixedMap([
  [globalThis, new FixedMap([
    ['Object', Object],
  ])],
  [Object, new FixedMap([
    ['toString', () => 'myCustomToString'],
  ])],
])

Here, we are declaring that, in this scope, we want to have a very minimal standard library. In fact, there's only going to be one thing available on globalThis with certainty, and that's Object. And there's only one thing available with certainty on Object, and that's a custom toString method we provide. (Other properties could be found on globalThis or Object, but only via the fallback, normal property lookup.) In the same block, we can then execute this chunk of code:

Object.toString() // 'myCustomToString'

Internally, when JavaScript sees that we're referencing some global called Object, it'll first look for it within the frozen structure we provided. It'll do so by basically executing this:

theFrozenStructure.get(globalThis)?.get('Object')

It'll see that a value got returned, so that's what it'll provide. Then, when you access .toString() on Object, it'll again do this:

theFrozenStructure.get(Object)?.get('toString')

to get back the custom toString function. At which point, you can call the toString function.

The pattern is this. Whenever you want to do a property lookup, JavaScript will first execute something like this:

theFrozenStructure.get(<the object you're operating on>)?.get(<the property name>)

and if a value is returned, that's what will be given back to the executing code. Otherwise, it'll fall back to a normal property lookup.

A default frozen structure will be provided to every environment, say, under globalThis.apiStructure. Most people will generally use the new "with stdLib" with the default implementation, like so: "with stdLib from apiStructure". Polyfills are unable to mutate the apiStructure, but they can clone pieces of it to construct a new apiStructure object, which they can then use to replace globalThis.apiStructure.
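That clone-and-replace step could look something like this sketch, with plain (mutable) Maps standing in for the proposed frozen FixedMap; `withPolyfill` and the structure shape are hypothetical:

```javascript
// Derive a new structure from an existing one, with one extra entry,
// without mutating the original.
function withPolyfill(apiStructure, target, name, fn) {
  const next = new Map(apiStructure);
  const entry = new Map(next.get(target) ?? []);
  entry.set(name, fn);
  next.set(target, entry);
  return next; // a polyfill would then replace globalThis.apiStructure with this
}

// Example: add a groupBy polyfill to the Array entry of a minimal structure.
const base = new Map([[Array, new Map([['isArray', Array.isArray]])]]);
const patched = withPolyfill(base, Array, 'groupBy', () => 'polyfilled');

base.get(Array).has('groupBy');    // false — the original is untouched
patched.get(Array).get('groupBy'); // the polyfill
```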

This sort of system should hopefully be easier to optimize, and should prevent property lookup logic from getting overly dynamic.


When it comes to properties on globalThis, it is no different than using "with". Except for the fact that "with" would only provide protection one layer deep, which isn't much protection. The "with stdLib" statement would allow you to safely do stuff such as "Math.max(2, 3)" as well, because it's capable of safely finding the original implementation of max (as provided to "with stdLib"), even if it was overwritten later on.

hmm, this is a good point I didn't think about. You're right that instanceof would be the better option in that example. But, that would also be an unfortunate limitation if these "safe" functions can only support data generated from the same realm.

It should be possible to work around this issue though. One solution would be with the collection normalization proposal, which lets us coerce map keys. Let's say, by default, the FixedMaps that get used utilize a special key-normalization function to normalize all objects from any realm to unique sentinels. All Array objects, from all realms, will normalize to the same sentinel. And all Math objects, from any realm, would also be normalized to a unique sentinel. Every built-in would normalize to some unique sentinel. That way, when a property lookup happens with a particular value from some realm, the correct thing will always be given back. Or, something like that.

Some sort of normalization thing would need to happen anyways, in order to make it so Array subtypes will match up with the Array class in this lookup logic, which I'll discuss next. (Perhaps FixedMap() wasn't the best choice for describing the shape of this apiStructure, but it works).

The prototype of the derived class can always override properties of the ancestor.

So, the way I'm proposing this, you won't be able to override built-in methods with a derived class, when you're in a "with stdLib" block. This means, for example, that ({ toString: () => 'abc' }).toString() === '[object Object]' while String({ toString: () => 'abc' }) === 'abc' (though, we could perhaps make an exception for the object prototype, but I would vote that we don't). It has to be this way, otherwise this proposal won't be providing any protection. Someone can just subclass the expected object, customize whatever they want, pass it in, and get past all of the defenses.

I think this will be the number one thing people would have to be aware of when they use "with stdLib", and it's also why "with stdLib" isn't intended for the general public. It's only meant for those who really need protection from globalThis pollution, but still want to code with JavaScript in a natural way. Note that this does not mean that the updated map function in your example is completely inaccessible, it just means you deliberately have to pull back the curtains to get to it, e.g. by doing Object.getPrototypeOf(yourSubClassedArray).map().



I've had another thought. I originally proposed the 'use initial' idea so that, if the property being looked up isn't a built-in property, it falls back to doing a normal property lookup. Part of the reason I did this was so that new polyfills could still be applied, but this sort of thing isn't really needed anymore. I think the complexity of this idea can be reduced if we took out this fallback action. If you have an object from stdLib (i.e. an object that is found within the provided structure), then property lookup will always happen via this special, safe way. If you don't, then a normal property lookup will always be performed. This basically means that "with stdLib" literally sets the standard library for you. Whatever you provide to it becomes the globals you work with. If someone from outside passes you an array, and the standard library you set only has a flatMap() function, then the only thing you can do with that array is call flatMap().

Under this scenario, we would certainly have to make an exception for Object. Either property lookups on Object will have a fallback to normal property lookup, or you simply can't customize the functions available on Object via "with stdLib".

I really don't understand the benefits here. It is not desirable for individual modules to have different sets of globals. First-run code sets up the environment, and everything else relies on it - why does it need to be opted into per-module or per-scope?

My dynamic behavior comment was about the "intercept" mechanism itself, not the way the intercept mechanism is configured. I used getIntrinsic as basis but I agree it's not appropriate, and a system that allows interleaving of user code in lookup would be a non-starter in my book.

Right, but how would you discover identities of other-realm intrinsics in the first place? Thankfully, legacy realms are only available through host APIs, and ShadowRealm won't allow passing around objects, so there would be no identity discontinuity.

I don't follow. Is the normalizer some user code? That would allow interleaving.

"normalize all objects from any realm" doesn't mean much. An object on its own is not "from a realm". Functions however are linked to their realm. An object is initially accessible in the realm of the function that created it, until it is passed around (to simplify). My point is that unless you have access to the "frozen map" of the other realm, you can't recognize these objects, and then if you did, you'd have to somehow map that to the equivalent entry in the local frozen map.

Since we're dreaming about a way to interject in property lookup of objects without a proxy mechanism, maybe there could also be a way to request a "no-cross-realm call" guard.

Define built-in again. The mechanism you're proposing works through a frozen map that defines which object identities should be interjected for property get. From its point of view, there is no "built-in", and there can never be such an implicit recognition to support virtualization.

Then I don't understand how the mechanism works. Can you clarify at what point a method of an object is recognized as "overriding a built-in"?

That's not entirely true. It does protect objects created by a library against prototype pollution happening after the library is loaded. That's still valuable. Is it more valuable than the cost of this mechanism? I doubt it, but it's an interesting thought experiment. Of course, no mechanism will protect you against calling methods on foreign objects, but it never could. There is no way, for example, that any proposal will ever be allowed to pierce and bypass a proxy.

Once again, this does not mean anything. The only thing you can check is object identities of an object or some object on its prototype chain, and redirect the property get in case of a match.

I believe the goal is to write code that is defensive against modifications to the environment after your code has loaded, while still writing that code the "normal way", without having to uncurry every single method.


It certainly is. I see that it's, unfortunately, becoming increasingly complex, but it's still interesting to try and figure out what it would look like if we were to add such a feature to JavaScript. I'm also realizing that there are pieces of "normal JavaScript" that I'm trying to preserve that perhaps don't really need preserving. I'm trying to work out if I can come up with a simpler formulation that's still friendly to use, but doesn't try as hard to preserve the normal parts of JavaScript. We'll see if I can get anywhere with that.

Sorry, I'm probably flip-flopping definitions as I go along. I'll try to stick to this terminology from now on:

  • starting built-ins: The built-ins that the host provides, when your script first loads.
  • stdLib: the set of "built-ins" you supplied with "with stdLib". Normally these are the same as the starting built-ins, but it could contain polyfills and what-not as well.

In that particular comment you were referencing, I was talking about "starting builtins". The initial definition of apiStructure would contain FixedMaps with key-normalization functions that normalize starting built-ins to unique sentinels. Any other values would stay as-is after the normalization (user-defined values don't need cross-realm protection). And, perhaps we also expose these normalization functions, so that end-users can construct their own special FixedMaps when they're trying to add polyfills.

Something along those lines anyways.

If, for example, each starting global is embedded with a hidden field that contains a unique, identifying sentinel. Any array, no matter which realm it was created from, would have this sentinel embedded into it. Array.isArray()'s implementation would simply be to check if the object passed in had the array sentinel.

A starting Array built-in from one realm would contain the same hidden sentinel as a starting Array built-in from another realm. The same is true for all starting built-ins. Thus, this special normalization function would simply need to pull out the hidden sentinel if it exists, and use that as the map key.

I'm not sure I fully understand this question either :p. But, let me try and lay out a more concrete algorithm, and hopefully, that can help with the confusion.

/*
Property lookup algorithm sketch (when you're within a "with stdLib" block)

Definitions:
* stdLib: The value received by "with stdLib".
* obj: The object in which we're performing the property lookup on
* key: The string we're trying to lookup

For example: Math.max will first do a lookup, with globalThis as the obj and the string "Math" as the key, then it will do another lookup on the resulting object, with that resulting object as the obj, and "max" as the key.
*/

/* -- algorithm -- */

function specialPropertyLookup(stdLib, obj, key) {
  let ancestor = obj
  while (true) {
    if (stdLib.has(ancestor)) {
      break
    }
    ancestor = Object.getPrototypeOf(ancestor)
    if (ancestor == null) return obj[key] // Fall back to normal property lookup
  }

  const stdLibEntry = stdLib.get(ancestor)
  if (stdLibEntry.has(key)) {
    return stdLibEntry.get(key)
  } else {
    return obj[key] // Fall back to normal property lookup
  }
}
// If we're applying the last snippet I gave from the previous post, that
// talked about not using a fallback, then simply change any locations
// in the above algorithm that have `return obj[key]` to `return undefined`.
// We would also need to add an exception for anything that directly inherits
// from Object (and not from another built-in as well), and let those fall back
// to normal behavior.
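As a sanity check, here is a self-contained, runnable model of that lookup, with ordinary Maps standing in for FixedMap:

```javascript
// Walk the prototype chain for an entry in stdLib; if found, prefer its
// "safe" value for the key, otherwise fall back to a normal lookup.
function specialPropertyLookup(stdLib, obj, key) {
  let ancestor = obj;
  while (!stdLib.has(ancestor)) {
    ancestor = Object.getPrototypeOf(ancestor);
    if (ancestor === null) return obj[key]; // fall back to normal lookup
  }
  const entry = stdLib.get(ancestor);
  return entry.has(key) ? entry.get(key) : obj[key];
}

// stdLib maps known prototypes to their "safe" methods, captured at load time.
const stdLib = new Map([
  [Array.prototype, new Map([['map', Array.prototype.map]])],
]);

Array.prototype.map = () => 'patched';

const safeMap = specialPropertyLookup(stdLib, [1, 2, 3], 'map');
safeMap.call([1, 2, 3], x => x * 2); // [2, 4, 6] — the original map
specialPropertyLookup(stdLib, [1, 2, 3], 'join'); // not in stdLib: normal lookup
```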

The FixedMaps of stdLib will by default use the following normalization function:

/*
Definitions:
* @@sentinel: Refers to the private field that holds a unique
    sentinel for each starting global, such that
    realm1GlobalThis.Array[@@sentinel] === realm2GlobalThis.Array[@@sentinel]

By default, this coerceKey function will be applied to the first layer of FixedMaps
on the default-provided apiStructure object. (the inner maps
don't need coercion, because they use string keys). Some form of this function
could be publicly exposed, so that others may polyfill the apiStructure object.
*/

function coerceKey(key) {
  if (key[@@sentinel]) {
    // This is a starting built-in. Use its hidden sentinel.
    return key[@@sentinel]
  }
  // This is a user-defined object. It does not need to be safe across realms.
  // No coersion necessary.
  // No coercion necessary.
  return key
}
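And a runnable model of that sketch, with an ordinary registered symbol standing in for the hypothetical engine-provided @@sentinel field (all names here are illustrative):

```javascript
// A real engine would stamp every realm's starting built-ins with a truly
// hidden field; Symbol.for merely approximates a shared, well-known key.
const SENTINEL = Symbol.for('starting-builtin-sentinel');

// Imagine the engine tagging each starting global; the tag for Array would
// be identical across realms.
Object.defineProperty(Array, SENTINEL, { value: 'sentinel:Array' });

function coerceKey(key) {
  // Starting built-in: normalize to its realm-independent sentinel.
  if (key != null && key[SENTINEL] !== undefined) return key[SENTINEL];
  // User-defined value: no cross-realm normalization needed.
  return key;
}

coerceKey(Array);   // 'sentinel:Array'
const userObj = {};
coerceKey(userObj); // userObj, unchanged
```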

I'd be happy to continue to clarify details of how I see that this could work - like you said, it's an interesting thought experiment that shows us what it would take to make this sort of thing possible - turns out, a whole lot. But, as I mentioned at the start, I'm thinking it might be possible to come up with a simpler formulation if we're willing to sacrifice just a little, which I'll probably post in a new thread if I'm able to work out the details.

I personally think the path to that is getIntrinsic plus the bind-this proposal (or pipeline), both of which are useful beyond this use case. Adding a ton of new syntax solely for this use case doesn't seem well-motivated.


Agreed and I said as much earlier. A "contextual dispatch override", which is basically this proposed mechanism, would be very complex and narrowly motivated, all to minimize the amount of code changes. An "explicit dispatch" proposal would probably require new syntax at the call site but would cover more use cases.


Ok, well, I think it was good that we explored this topic. Thanks to everyone who helped contribute. And, @rdking, you were right that this was just overly complicated. I agree that this idea should not be pursued. But it was still interesting to explore, and it sparked some other ideas.

I made a new topic over here to explore something similar, but with some of the restrictions removed that I was placing on "use initial". It's very possible that this idea does not go far either, but I think it's still worth the exploration.