'use initial' directive, for globalThis mutation protection

This is a proposal to add a new directive, 'use initial', to JavaScript. Its goal is to help people write code that's robust against globalThis mutation, by means of a bit of magic. It works like this.

Say at some point, you've replaced Math.max with the function () => 2. As soon as you enter a scope with a 'use initial' directive, any modifications to globalThis will seem to temporarily disappear. While you're in that block, Math.max will get "shadowed" by the engine with the real max function. As soon as you leave the scope, Math.max will go back to how it was before.

function getMax(x, y) {
  'use initial'
  // The modified max function is being shadowed by
  // the real max function.
  return Math.max(x, y)
}

Math.max = () => 2
console.log(getMax(2, 3)) // 3
Math.max(2, 3) // 2

If you replace the entire Math object on globalThis with something else, then the original Math object would likewise reappear within a 'use initial' scope.

function getMax(x, y) {
  'use initial'
  return Math.max(x, y)
}

globalThis.Math = {}
console.log(getMax(2, 3)) // 3

You can even mutate prototypes, and the initial functions would be accessible from within a 'use initial' scope.

function addOne(value) {
  'use initial'
  if (!Array.isArray(value)) throw new Error('Give me an array please.')
  return value.map(x => x + 1)
}

Array.prototype.map = () => []
addOne([3, 4]) // [4, 5]
addOne([3, 4]).map(x => x + 1) // []

// It'll even protect you if someone's using an own property to shadow a built-in
const data = [3, 4]
data.map = () => []
addOne(data) // [4, 5]

class MyArray extends Array {
  // This function is on the prototype
  map() {
    return []
  }
  // This function is on the instance
  map = () => []
}
// (named differently from `data` above to avoid redeclaring a const)
const myArray = new MyArray(3, 4)
// Still works
addOne(myArray) // [4, 5]

All of this demonstrates that replacing a built-in property with a custom one will cause the custom property to be "shadowed" by the built-in one for the duration of this special scope. However, if your customization does not replace any built-in values, that customization will persist.

// In this case, the "x" property is still accessible
// (it's not shadowing a built-in on Object.prototype)
// but the toString() function will become shadowed by the built-in.
doStuff({
  x: 2,
  toString: () => 'a string'
})

function doStuff(data) {
  'use initial'
  console.log(data.toString()) // '[object Object]'
  // The String() constructor doesn't have 'use initial' applied to it,
  // so it's able to see the modified toString() function.
  console.log(String(data)) // 'a string'
  return data.x
}

The ability to access properties that aren't shadowing built-ins, as demonstrated above, is important. It allows you to, for example, add new features via polyfills, use anything that inherits from Object (or any built-in class) in a normal way, and even add custom properties to functions like people will sometimes do.

Note that 'use initial' only really applies to direct property access via "." and maybe brackets "[]". Lots of other stuff will remain unaffected. This means it's still possible to get access to the functions being shadowed. For example:

function doStuff(data) {
  'use initial'
  Object(data).map() // 2
  Object.entries(data) // [['map', <function>]]
}

const array = []
array.map = () => 2
doStuff(array)

This idea also gives you free brand checking. For example, I believe this would allow you to robustly check if something is a Map.

function isMap(value) {
  'use initial'
  return value instanceof Object && value.toString() === '[object Map]'
}

Anything that's an instance of Object cannot shadow its own toString() function within a 'use initial' scope, so when you call .toString() you know you're using the native toString function, and thus you can trust its output not to be forged.
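
For comparison, here is roughly what a robust brand check looks like today without the directive. This is only a sketch: the names uncurry, mapHas and isMapToday are made up for illustration, and the trick relies on Map.prototype.has throwing a TypeError for any receiver that lacks Map's internal slots.

```javascript
// Capture the built-ins at load time, before later code can tamper with them.
const uncurry = Function.prototype.call
const mapHas = uncurry.bind(Map.prototype.has)

function isMapToday(value) {
  try {
    // Equivalent to Map.prototype.has.call(value, undefined), but without
    // looking anything up on (possibly modified) prototypes at call time.
    mapHas(value, undefined)
    return true
  } catch {
    return false
  }
}
```

An object that merely fakes toString() or Symbol.toStringTag does not pass this check, but the amount of ceremony is exactly what the directive is trying to remove.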

I realize there's a high bar for adding new directives, and I realize there's a fair amount of magic going on in this proposal, but I'm sure all of those people out there who use JavaScript in an extreme fashion, and who write horrendously ugly code in order to prevent prototype mutations from affecting their code would appreciate a directive like this. A directive that would let them write JavaScript in a much more normal fashion, while getting the protection they want. It will also make it much easier for other people to start protecting their libraries in a similar fashion.

Yeah, I can see this getting shot down for both being directive-based when TC39 has an embargo on directives, and for being heavier than a ton of bricks since the most effective way of doing this would be to duplicate the original prototype objects and let those duplicates inherit the existing, potentially modified prototypes and replace the prototype entries of all of the built-ins. Just calling such a function would be expensive. On top of this, calls to other functions from such a function may be expecting to receive the modified tree. If this directive removes the modification for every non-directive call and restores the modification on return, it's even more expensive.

While interesting, how about we give them something the engineers won't "die on a hill" over not implementing? This is why I mentioned an API for this. Not a directive or an alternate set of classes, but a few functions:

//Return the unmodified Map class
let OMap = globalThis.getOriginal("Map");

I was thinking more along these lines, something that can look up the original definition of a given class whether native or userland (as long as the userland class was registered).

This is exactly what the ShadowRealm proposal is trying to solve. It is at stage 3; the viability of an alternative, inline solution such as this remains to be seen.

@rdking

I don't think the performance has to be that bad.

Here's a performant way in which this proposal can be accomplished.

First, tag all built-in objects with a hidden, unique id.

Then, using the original standard library, prepare a lookup table. You should be able to plug a built-in object's id and the name of the property being accessed into the lookup table, and get back the id of another built-in object/property if that property existed. This lookup table can be a static artifact that the engine reads from a file or something every time it starts up, since it never needs to change between executions.

Finally, we'll need to change the property-access algorithm while you're within a 'use initial' region to work as follows:

  1. Walk up the prototype chain until you hit a built-in object (caching this information on each object could speed this step up, if that's really wanted).
  2. Plug that object's unique id and the name of the property being looked up into the lookup table.
  3. If the table returned the id of another built-in value, then go ahead and find that built-in and return it.
  4. If not, perform a normal property lookup.
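
As a rough illustration of the four steps above, here is a userland simulation. Everything in it is an assumption made for the sketch: builtinIds, builtinsById, lookupTable, register and initialGet are hypothetical names, a real engine would keep the ids and the table internally, and only string-keyed properties of Array.prototype and Object.prototype are registered. It also keeps walking past the first built-in on the chain, so that inherited built-ins such as Object.prototype.toString are found for arrays too.

```javascript
// Hypothetical engine-side state, simulated with ordinary collections.
const builtinIds = new WeakMap() // built-in object or function -> hidden id
const builtinsById = new Map()   // hidden id -> built-in object or function
const lookupTable = new Map()    // `${ownerId}:${propertyName}` -> id of the initial value

let nextId = 0
function tag(value) {
  if (!builtinIds.has(value)) {
    builtinIds.set(value, nextId)
    builtinsById.set(nextId, value)
    nextId++
  }
  return builtinIds.get(value)
}

// Record the pristine data properties of a built-in object.
// This must run before any other code has had a chance to modify it.
function register(owner) {
  const ownerId = tag(owner)
  for (const name of Object.getOwnPropertyNames(owner)) {
    const { value } = Object.getOwnPropertyDescriptor(owner, name)
    const isObjectLike =
      (typeof value === 'function' || typeof value === 'object') && value !== null
    if (isObjectLike) lookupTable.set(`${ownerId}:${name}`, tag(value))
  }
}

register(Array.prototype)
register(Object.prototype)

// Property access as it might behave inside a 'use initial' region.
function initialGet(target, name) {
  for (let obj = Object(target); obj !== null; obj = Object.getPrototypeOf(obj)) {
    const ownerId = builtinIds.get(obj)
    if (ownerId === undefined) continue // Step 1: skip user-defined objects
    const valueId = lookupTable.get(`${ownerId}:${name}`) // Step 2
    if (valueId !== undefined) return builtinsById.get(valueId) // Step 3
  }
  return target[name] // Step 4: normal lookup
}
```

With this in place, initialGet(data, 'map') keeps returning the original map function even after data.map or Array.prototype.map has been replaced, while a custom property such as x on a plain object is still found through the normal lookup in step 4.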

Such a system shouldn't be much slower than the vtable system that C++ uses (I think, I actually know very little about their vtable).

@jithujoshyjy

It's true that the ShadowRealm proposal would help with this as well, but it feels like that proposal is more geared towards sandboxing someone else's code, not your own. Though perhaps it could be used to shield yourself as well. You do get the unfortunate problem that any array made within a ShadowRealm is a completely different array type from one made outside. And I believe you still wouldn't be able to trust that an array someone passed into the realm would have well-behaved properties like array.map(): those outside the realm could have tampered with their own intrinsics and then passed them into your realm. I think that's how it works anyway; the specific details of ShadowRealm are still a bit fuzzy to me.

You might be interested in @ljharb's getIntrinsic proposal: ljharb/proposal-get-intrinsic on GitHub, an ECMAScript language proposal for a way to get intrinsics.

Seems to me that what you're proposing doesn't meet the ideal you described before. If you're injecting another lookup between steps 2 & 3 of [[Get]](P, Receiver), and also between steps 4.a & 4.b, to check for the presence of this new unique id, returning the corresponding value if present as long as 'use initial' is in play, then you haven't shadowed the existing property value. You've replaced it, as there wouldn't be a means to retrieve the existing value.

By comparison, vtables in C++ are much faster as they do not have as many conditions and lookups required to retrieve the final value. The real problem here is trying to maintain shadowing. The approach you're taking is a slower version of temporarily replacing the value returned by a given property. Since you only get 1 property value per property per object, the only way to shadow is to have the original value accessible through a prototype.

Your algorithm could be accomplished using a Proxy over the original class. Now imagine the scenario when you don't have access to the original and are relying on the Proxy to return you either the current or the original value based on a flag that once set is irrevocable per function. That's the scenario you're presenting.

Yeah, I was using the word "shadow" a little looser than what you're describing. I didn't mean for a literal new prototype link to suddenly appear in the front of each object in existence every time you entered in one of these functions. However, the concept still fits a looser form of the word "shadow". When you're in a 'use initial' scope and you perform a property access, it's as if the built-in prototypes jump first in line when the prototype chain is walked up. The built-in prototypes will "shadow" whatever user-defined overrides you made. Those user-defined overrides are still there, they didn't get kicked off of the object and replaced by something else. Instead, they're just being "shadowed" during the property lookup process, which becomes apparent because you can indeed get behind that shadow, it's just done differently than how you normally would in a normal prototype setup.

const obj = {
  toString() {
    return '[my object]'
  },

  callToString() {
    return this.toString()
  }
}

;(function() {
  'use initial'
  obj.toString() // '[object Object]'
  obj.callToString() // '[my object]'
  new Map(Object.entries(obj)).get('toString')() // '[my object]'
})()

So yes, this isn't how "shadowing" normally works, and perhaps there was a better word I could use, but that's the best word I could think of. And you're right, proxies could also work.

I do want to address a concern @ljharb brought up in an unrelated thread that applies to this thread as well.

That would break a ton of security-conscious environments that depend on being able to DENY access to builtins to code that runs after it.

So how can this be dealt with if a secure environment wants to prohibit access to certain built-in functions? I'm not 100% sure on the use cases of this, but here are some options.

  • Have them use a library, like acorn, that's able to parse the code they're about to execute and remove any 'use initial' directives. It's not a great solution, but it technically works.
  • Make the directive dynamic and replaceable, e.g. add .useInitial() to a function's prototype. Now, first-run code can replace .useInitial() with a function that silently does nothing, or throws. This unfortunately makes it so 'use initial' is not statically analyzable, but it sort of already isn't if we're introducing some system to turn it off. This solution would also be a little awkward to use.
  • An import assertion or something that can literally turn off 'use initial' on the imported module and all of its dependencies. This creates awkward problems if a module with 'use initial' permissions and a module without them both try to import the same module. I don't think there's a clean way to solve that issue.
  • Perhaps the cleanest solution would be to provide a shadow realm option that lets you turn off this directive within the realm. Then, you can just run this less-trusted code within a shadow realm. "Turn off" can either mean "throw an error when it gets used", or "silently pretend the directive is being used", I assume the throw-an-error option would be better, but I'm not certain.

Here's an even cleaner solution: provide an ES API function that allows a caller to disable other builtin API functions. By disable, I mean replace them with undefined or a function that throws.

// Sketch only: Symbol.disabled is a hypothetical well-known symbol.
globalThis.disableAPI = function (owner, fnName) {
   let retval = owner.prototype[fnName];
   function disabledFn() {
      throw new ReferenceError(`This function ('${fnName}') has been disabled.`);
   }
   function isOriginal(fn) {
      return !fn.name.startsWith('bound ') && fn.toString().endsWith('{ [native code] }');
   }
   Object.defineProperty(disabledFn, Symbol.disabled, { value: true });
   if (isOriginal(owner) && isOriginal(retval)) {
      // Replace only the named method, not the whole prototype
      owner.prototype[fnName] = disabledFn;
   }
   else {
      retval = undefined;
   }

   return retval;
};

Something like this implemented in the engine would disable the call for all successive callers. The catch is that it would have to work in concert with something like 'use initial', such that 'use initial' would not be allowed to return the original of such disabled functions. It would do so by checking if Symbol.disabled has been set on the method. This not only gives the engine a simple, shim-able way of disabling methods, but also gives developers a way of doing the same for registered APIs under my alternative approach.

Regardless, the approach for denying access to ES API built-ins should be something generic. It should probably have a proposal all its own and be applicable regardless of any new feature added. BTW, I set it so the disableAPI function returns the original of the function that was disabled. This way the caller still has access to the function if desired.

Hmm, that's an interesting idea. Instead of providing a way to turn off the entire 'use initial' directive, we provide ways to turn off individual parts of the whole API, and let that affect 'use initial'. I think something like this would work better, it would mean people are always allowed to use the 'use initial' directive, there's just certain parts of the global API they're never allowed to use, even with 'use initial'.

The only way to deal with it is the same way that currently exists - first-run code must modify the environment so that later-run code is denied access (to whichever deniable things the first-run code wants).

In other words, later-run code does not, and can never, have any guarantees about the state of deniable things, including the constructor for an iframe or a shadow realm.

This is discussed in the readme of https://github.com/tc39-transfer/proposal-get-intrinsic

Hmm, OK. So, I'll also try to rope some of your related comments from another thread into here to find a solution that could work.

I think the most straightforward solution would be to have a special, exotic object exist on globalThis, say, globalThis.useInitialDirectiveAllowed. When a module is first loaded, if 'use initial' is present, it'll check for the existence of this exotic object on globalThis. If it exists, then useInitial will work as expected. If it does not exist, then either an error will be thrown during load time, or the 'use initial' directives will be silently ignored.

First-run code that whitelists certain globals and auto-deletes anything it does not recognize will continue to work as expected: this exotic object would automatically be deleted, and no later-run code would have access to non-whitelisted built-ins.
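
To make that concrete, here is a minimal sketch of the kind of first-run cleanup being described. The lockDown function is a made-up name, useInitialDirectiveAllowed is the hypothetical exotic object from above (modelled here as a plain object), and a stand-in object is used instead of the real globalThis so the sketch can run without wrecking its own environment.

```javascript
// Delete every own property of `globalObject` that is not on the allow-list.
function lockDown(globalObject, allowed) {
  const removed = []
  for (const key of Reflect.ownKeys(globalObject)) {
    if (allowed.has(key)) continue
    // Reflect.deleteProperty returns false (rather than throwing)
    // when a property is non-configurable.
    if (Reflect.deleteProperty(globalObject, key)) removed.push(key)
  }
  return removed
}

// Stand-in for globalThis.
const fakeGlobal = { Array, Math, Function, useInitialDirectiveAllowed: {} }
const removed = lockDown(fakeGlobal, new Set(['Array', 'Math']))
console.log(removed) // ['Function', 'useInitialDirectiveAllowed']
```

Because the cleanup deletes anything it doesn't recognize, code written before the directive existed would remove the new object without needing to know about it.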

This is a blunt solution, but hey, it works. It also frees us to brainstorm an additional, "finer-comb" solution if wanted, one that lets people disable individual APIs on globalThis (that will be disabled even when 'use initial' is used), such as @rdking proposed, without fear of breaking backwards compatibility.

Isn't that an awful lot of implementation for something that essentially should be single use? The only reason to have a module use a directive is if you want the whole module to be subjected to the limitations of that directive. In the case of 'use initial', you're only going to want to do that if you want to ensure that the ES API you're using is the cleanest copy of it available. Most of the modules that do this are ones that are being proactively defensive about what later modules might do to the public API interfaces (monkey patching and the like). Most of the ones that remain that do this are intentionally trying to control what later modules can do.

For both those categories, a directive is overkill. Further, having to add some new exotic object for this one-off case is also excessive and can easily be mitigated by the presence and/or absence of a useInitial() function. If it's there, you can call it. If it's not, someone has restricted access. Same effect, without the engine developers complaining about not wanting any new directives. This is why I suggested this be an API and not a directive.

Well, the goal is to try and make this sort of defensive programming more ergonomic. An API wouldn't help at all with this.

Let's play around with a concrete example for a bit, and pretend we fall in the category of "being proactively defensive against global mutations". Here's a bit of code I wrote a little while ago that simply generates a random string of a specific shape. (And, excuse me sparse-array haters for my use of the array constructor :p)

const CHOICES = (
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
  .replace(/[l1oO0]/g, '') // Some characters removed for readability
)

const arrayOfLength = length => new Array(length).fill()

export function generate() {
  const genChar = () => CHOICES[Math.floor(Math.random() * CHOICES.length)]
  return '§' + arrayOfLength(5).map(genChar).join('')
}

Now, let's see how I would rewrite this today, if I was trying to make it robust against global mutations.

const callBind = fn => fn.call.bind(fn)
const $Array = Array
const arrayFill = callBind(Array.prototype.fill)
const arrayMap = callBind(Array.prototype.map)
const arrayJoin = callBind(Array.prototype.join)
const random = Math.random
const floor = Math.floor

const CHOICES = (
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
  .replace(/[l1oO0]/g, '') // Some characters removed for readability
)

const arrayOfLength = length => arrayFill(new $Array(length))

export function generate() {
  const genChar = () => CHOICES[floor(random() * CHOICES.length)]
  return '§' + arrayJoin(arrayMap(arrayOfLength(5), genChar), '')
}

Eww...

Ok, let's try adding an API that lets you explicitly get a fresh function, like what @ljharb is proposing.

const callBind = fn => fn.call.bind(fn)
const $Array = getIntrinsic('%Array%')
const arrayFill = callBind(getIntrinsic('Array.prototype.fill'))
const arrayMap = callBind(getIntrinsic('Array.prototype.map'))
const arrayJoin = callBind(getIntrinsic('Array.prototype.join'))
const random = getIntrinsic('Math.random')
const floor = getIntrinsic('Math.floor')

const CHOICES = (
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
  .replace(/[l1oO0]/g, '') // Some characters removed for readability
)

const arrayOfLength = length => arrayFill(new $Array(length))

export function generate() {
  const genChar = () => CHOICES[floor(random() * CHOICES.length)]
  return '§' + arrayJoin(arrayMap(arrayOfLength(5), genChar), '')
}

That seems to have actually made things worse. @ljharb's API is designed to solve a very specific problem, and that problem isn't related to the ergonomics of writing robust code, which is why this code hasn't improved at all.

Perhaps when you were imagining a getGlobalCopy function, you were imagining that it would construct a shiny new replica of a built-in that could be independently mutated without affecting anyone else, thus you would be allowed to safely get an entire object or class and use methods from it, instead of fetching each individual function. This could certainly be discussed (though, the constant use of such a duplicating function at the top of all of your modules could very well be a bit memory intensive unless we can find a way around that). Let's see how this looks.

const callBind = fn => fn.call.bind(fn)
const $Array = getGlobalCopy('%Array%')
const arrayFill = callBind(Array.prototype.fill)
const arrayMap = callBind(Array.prototype.map)
const arrayJoin = callBind(Array.prototype.join)
const $Math = getGlobalCopy('%Math%')

const CHOICES = (
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
  .replace(/[l1oO0]/g, '') // Some characters removed for readability
)

const arrayOfLength = length => arrayFill(new $Array(length))

export function generate() {
  const genChar = () => CHOICES[$Math.floor($Math.random() * CHOICES.length)]
  return '§' + arrayJoin(arrayMap(arrayOfLength(5), genChar), '')
}

Ah, that didn't help much either, because of the pesky call bindings.

We can certainly try brainstorming other ideas on how to make a user-friendly API that's better than what we have today, I'd be happy to hear if you have any better ideas than what I showed.

Now, let's see what happens if we drop in the 'use initial' directive.

'use initial'

const CHOICES = (
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
  .replace(/[l1oO0]/g, '') // Some characters removed for readability
)

const arrayOfLength = length => new Array(length).fill()

export function generate() {
  const genChar = () => CHOICES[Math.floor(Math.random() * CHOICES.length)]
  return '§' + arrayOfLength(5).map(genChar).join('')
}

That piece of code is exactly the same as our original example, except for the directive at the top. That's all we needed to do to make this piece of code robust against global mutations. Certainly, it's not always this easy, there may be times when you need to add additional brand-checking and what-not (to make sure people aren't giving you fake parameters), and it's certainly important to understand how this directive works when you use it, but the overall effect is pretty nice. You get to write JavaScript like you normally would, but with the protection you want.

The extra, exotic object on globalThis and whatnot are unfortunate extra details, but it's also not a major part of this directive. Very few people will actually need to use those pieces, or even know much about them, but it's there if anyone needs them.

And, I also get that there's a high bar for directives. But, I can't think of any good API-only solution that really helps a lot with the ergonomics of global mutation protection.

I wonder if there are tools that, given enough static type information (jsdoc/TypeScript), could re-write normal looking code into the robust style as a build step.

...Not that I know of, but that's a pretty good idea.

If you declare a variable as type Array, then try to call a map function on it, it could transform that code to pick a map function off of the array's prototype at the top of your module, then use the picked-off map function instead.
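
As a sketch of what such a build step might emit (no such tool is known to exist in this thread, and the $call and $arrayMap names are made up), a typed call to values.map(...) could be rewritten like this:

```javascript
// What the author writes (input to the hypothetical build step):
//
//   /** @param {number[]} values */
//   function addOne(values) {
//     return values.map(x => x + 1)
//   }

// What the build step might emit: the method is picked off the prototype
// once, at module load, and invoked through a pre-bound Function.prototype.call.
const $call = Function.prototype.call
const $arrayMap = $call.bind(Array.prototype.map)

function addOne(values) {
  // Same as Array.prototype.map.call(values, ...), but with no property
  // lookups on `values` or on Array.prototype at call time.
  return $arrayMap(values, x => x + 1)
}
```

This only protects the method lookup itself; the argument would still need its own validation, as discussed earlier.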

I am realizing that there is one special case that the 'use initial' directive is able to handle cleanly, that neither a transpiler, nor hand-written code could handle very well, and that's the ability to have polymorphism.

For example, if you wanted to write a function that accepts any TypedArray and calls .fill() on it, with 'use initial' you can just accept it as a parameter, ensure it really is a TypedArray, then call .fill(). A transpiler or hand-written code would have to pre-stash the fill function from every TypedArray variant beforehand, check which TypedArray you provided, then call the appropriate .fill().

Still though, a transpiler would get us quite far. Interesting...

The different TypedArray variants inherit from the same base, so there is only one %TypedArray%.prototype.fill. So that particular example doesn’t hold. But the general polymorphism one does.

I’ve come across this when writing code that could accept either a Map or a WeakMap. Needed to create a special set and get that would check the receiver type and then use the correct function.
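
A sketch of what that kind of helper can look like (not the actual code being referred to; uncurryThis, isRealMap, safeGet and safeSet are illustrative names):

```javascript
// Stash everything at load time.
const uncurryThis = fn => Function.prototype.call.bind(fn)
const $mapHas = uncurryThis(Map.prototype.has)
const $mapGet = uncurryThis(Map.prototype.get)
const $mapSet = uncurryThis(Map.prototype.set)
const $weakMapGet = uncurryThis(WeakMap.prototype.get)
const $weakMapSet = uncurryThis(WeakMap.prototype.set)

// Map.prototype.has throws for receivers that lack Map's internal slots.
function isRealMap(store) {
  try {
    $mapHas(store, undefined)
    return true
  } catch {
    return false
  }
}

// Check the receiver type, then dispatch to the matching stashed method.
// Anything that is neither a Map nor a WeakMap makes the WeakMap method throw.
function safeGet(store, key) {
  return isRealMap(store) ? $mapGet(store, key) : $weakMapGet(store, key)
}

function safeSet(store, key, value) {
  return isRealMap(store) ? $mapSet(store, key, value) : $weakMapSet(store, key, value)
}
```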

Then try it my way:

// _Array is a new instance of the original function with a new instance of
// the original prototype object, holding new instances of each of the
// original prototype member functions. A bit different from what ljharb
// wants. Likewise with _Math.
const _Array = getIntrinsic("Array");
const _Math = getIntrinsic("Math");
// Could also do
// const {Array: _Array, Math: _Math} = getIntrinsic("Array", "Math");

const CHOICES = (
   'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
      .replace(/[l1oO0]/g, '') // Some characters removed for readability
);

const arrayOfLength = length => new _Array(length).fill();

export function generate() {
   const genChar = () => CHOICES[_Math.floor(_Math.random() * CHOICES.length)];
   return '§' + arrayOfLength(5).map(genChar).join('');
}

Doesn't get much more clear and ergonomic than this. Except for the '_'s and 2 lines (plus the comment) it's identical to your original code. The difference is in what getIntrinsic() does. If you request a function, it returns to you a clean-room copy of the original. The function itself is a new function, so Array !== _Array, but the behavior is identical and calls the same native functionality. If the function normally has a prototype object, it will also be a clean-room copy of the original, and Array.prototype !== _Array.prototype. However, on that new prototype object exists all of the original functions (that haven't been restricted by another API). Each of those functions is likewise a clean-room copy.

The point is that the API tree is recreated starting at the function or object that is requested, if it exists. As a result, it can be used exactly like the pre-existing ES API objects, the only issue being that the copies will not pass strict identity comparisons with the pre-existing objects. I don't see such comparisons as useful. Plus, there are easy workarounds for any resulting instance objects that may be required to pass such identity tests.

It would be exceedingly easy to accidentally leak your copied constructors, making them no longer robust; it would also be very easy to use array literal syntax and wrongly assume it was giving you the copied Array prototype; and this would almost certainly have memory and performance overhead that engines would be unwilling to consider.

You're right, that would be a simpler rendering of my example if you're receiving a deep copy of the requested object. Don't know why I didn't think of that.

But, @ljharb does bring up some good points. Things get a lot messier when the situation differs a little bit.

For example, if the arrayOfLength function was found in a general-purpose utility file, then you would want to use the normal Array constructor instead of the clean-room copy, which also means you would need to do all of the callBind stuff. If you used a clean-room copy, and a public function in your library returned one of those clean-room copy arrays, then the end-user would have access to your clean-room copy's prototype to mutate it and what-not.

Likewise, if this generate() function accepted a parameter, say, an array of characters to choose from, you would also need to work with callBind, because they're giving you a potentially corrupt array.

Basically, you would only get improved ergonomics if you know for sure that values you're operating on are intermediate-only values - they don't get inputted, and they don't get outputted.

Things also get weird for operations like Array.prototype.join() - I assume this would return a native string, not a cleanroom copy, even if you were using a copied Array. Which means you would have to coerce it back into a copy if you wanted to then use String.prototype methods safely.