Proxy drilling once again ...

Speaking as a developer, it makes a lot of sense to me that they put work into making sure language invariants are upheld even when you're using a proxy. Yes, there's a lot of extra flexibility that could be unlocked if proxies were allowed to break these invariants, but more flexibility isn't always a good thing (think of macros: tons of flexibility, and tons of danger and abuse as well; code that says it does one thing but actually does something completely different because of a hidden macro somewhere). The fewer invariants the language has, the more pitfalls I have to be wary of, and the more likely it is for programs I write to misbehave if someone passes in a funky proxy.

I'm also not sure why so many of your arguments are founded on the typeof operator. I hate the typeof operator; it's broken in so many ways. I wish they would introduce a newer operator/function/something that would replace the need for typeof, and perhaps this new operator could actually make a distinction between arrays and other objects.

2 Likes

I wasn't there when that decision was made, but my understanding is that this is because the language already had Array.isArray before Proxy was added; Array is the only class that, back then and still today, has such a predicate.

FWIW, going back to 2010 it's possible to see the earlier designs of Proxy: https://youtu.be/sClk6aB_CPk. There is no target/witness object to enforce the invariants there. Towards the end, though, the talk does hint at the discussions to introduce invariants, in the context of limiting what host objects (i.e. browser C++ based objects) can do. One case mentioned is host objects that are callable but not functions.

1 Like

To whom it might concern: I've summarized all the shenanigans (or intended behavior, to some) here, and suggested how to work around them, in yet another library I had to create to circumvent all the findings and limitations. It's used in production already and 4 related projects are based on it: Proxy Traps Cheat Sheet (github.com)

I hope this helps developers at least, but if you read through, you'll see how many inconsistencies there are in the Proxy as it stands today.

It's a sealed, shipped thing, I understand that, but I still hope that in the future there could be something better: something that can proxy even primitives, if desired, or can proxy any held reference that comes from foreign PLs (WASM) or foreign realms (Worker/main). I am dealing with these things daily, and the dance I need to do instead of just trusting traps is hardly bearable, although tamed so far (but only recently, after new "break all things at runtime" discoveries).

I still don't get why we're splitting hairs over arrays. Arrays are not that different from other branded objects, other than having a predicate that drills through proxies. You can't mimic any object containing internal slots/private properties with proxies. A proxy of a Map can't survive a custom isMap check either:

function isMap(x) {
  try {
    Map.prototype.get.call(x, undefined);
    return true;
  } catch {
    return false;
  }
}

Are you surprised that Array.isArray does work with proxies of arrays? That's the only thing "special" about them and I think some others feel the same, too.

1 Like

I don't get what that means. What is a proxy over a primitive, what would it do?

I'll try to explain better what I am doing and why all these caveats matter to me.

Let's take the most basic code example but please pay attention to the explanation:

// this must be an array
const cpus = require('os').cpus();
// it needs to survive Array.isArray(cpus)
// it must be iterable too

// each CPU must be an object where each property
// might be an object or a primitive
for (const cpu of cpus) console.log({...cpu});

There is no surprise in this tiny snippet, which one can copy into a node repl and see results, but what you might not expect is that this code runs in a Worker that communicates directly via Web Socket to either NodeJS or Bun. None of those references exist in the Worker; each one is simply a Proxy of the reference created in the NodeJS / Bun interpreter, garbage collected once it's no longer needed/used/reachable within the Worker.

The cpus value looks like an array and acts like an array, but in the Worker it is actually just a thin Proxy around a unique identifier: ["array", 123]

Whenever any operation happens in the Worker code, Atomics ask the main thread to ask, via sockets, the foreign interpreter to execute that Proxy trap operation on the real reference (held until GC'd). The same goes for anything this remote array returns per interaction, but in this case we have objects: {type: "object", value: {...}}.

 ┏━━━━━━━━━━━━━━┓
 ┃ ◂━  Worker ◂━╋┓
 ┗┳━━━━━━━━━━━━━┛┣ parse
  ┣ Proxy trap   ┃
  ┣ Atomics wait ┣ notify
 ┏┻━━━━━━━━━━━━━┳┛
 ┃     Main     ┣◂┓
 ┗┳━━━━━━━━━━━━━┛ ┃
  ┣ Web Socket    ┃
 ┏┻━━━━━━━━━━━━━┓ ┃
 ┃ NodeJS / Bun ┃ ┃
 ┗┳━━━━━━━━━━━━━┛ ┃
  ┣ Apply trap    ┃
  ┣ stringify     ┃
  ┗ WS Result ━━━━┛

Please note I've used node and bun as examples because that's easier to understand from a JS developer point of view, but you can replace the Web Sockets and server with pyodide or MicroPython interpreters and their own FFI and the dance is basically still the same ... or with how fully driving the real DOM from a Worker happens; chopping off the 3rd indirection, the dance is still the same.

Also note this is not hypothetical, this is how polyscript works and polyscript fuels PyScript.

Constraints

  • all objects must behave exactly like objects
  • all arrays must behave exactly like arrays
  • all methods and functions and classes must behave the same too
  • all primitives must be able to travel but also buffers

The last point means that, in an ideal world where engines were kind enough to expose their structuredClone serializer/deserializer utilities, I could use just those primitives and be done. As reality kicks in, I need to use the @ungap/structured-clone/json parse and stringify utilities to survive types not compatible with JSON. On top of that, I want objects to fully reflect their source nature, meaning that if {notYet: undefined} is referenced, the notYet key must be present in the Worker with the undefined value carried along.

Add bigint and well-known symbol survival (Symbol.iterator, to name one) and you see that this architecture is screaming for a common way to define type / value pairs that survive all sorts of indirections and the common issues with JSON or structuredClone.

Previously ...

The initial implementation of coincident (which makes polyscript possible, hence PyScript too) used the [type, value] convention (for non-function cases) to describe the desired type and behave accordingly with traps and/or arguments or returned-values deserialization.

This revealed the Array.isArray issue first, which I monkey-patched within the Worker (hence not nearly as bad as a global main-thread polyfill, yet still ugly). But then we hit a main thread library, one that we don't control and couldn't fix for workers, that was failing hard with ["object", {...}] references: traps such as ownKeys and the descriptor-related ones wanted a non-configurable length property, so any object out there with an actual length property, for whatever reason, would have failed the roundtrip dance at the Proxy level.

The TL;DR is that patching Array.isArray in each Worker wasn't good enough, so we had to disambiguate between the Array, the Object, and the Function case, which are the main 3 types the Proxy handles and drills arbitrarily, with extra caveats for the apply VS construct dance; those errors can be lazy though, so it's less of an issue.

Current state

I wrote yet another "shouldn't be necessary but here we are" library called proxy-target, which resolves everything I've encountered to date by providing utilities that create [value] when the foreign reference is an Array, {t, v} pairs to describe every other type including null and undefined, and Ctx.bind(value), where Ctx is a function like this:

function Ctx() {
  'use strict';
  return this;
}

We can now bind remote function references with integers or strings (see use strict: not accidental) and intercept all function traps without ever breaking for objects or arrays. We removed the need for an Array.isArray patch, objects now act 100% as objects, and no extra checks or operations are ever needed on our side.

In summary

I am confident almost nobody will read this lengthy explanation of why I've been filing issues and why all these things are just unnecessary friction, most of the time undesired, that we're forced to solve in user-land. While I am sure nothing will ever change in the Proxy space, at least anyone interested in the story of why proxy-target, coincident and polyscript needed a better specification, based solely on trap intents and not target drilling surprises, can find an answer here.

Regards

P.S. if anyone is interested in "what could you do with such a complex stack?", this video should easily answer that: https://twitter.com/WebReflection/status/1678762388538155013 ... it's a worker using synchronous NodeJS APIs to orchestrate, via DOM, client/Raspberry Pi Zero 2 W results.

Thanks! I think you should have led with that instead of just complaining that proxies are broken or not sufficiently documented.

I don't get this. What exactly is "any operation in the worker"? Let's look at the code

const count = cpus.length;
console.log(count + 1);

From my understanding, only the .length access on the proxied array needs to be trapped, travel to the foreign interpreter, and come back to return a primitive number. But the second line executes exclusively in the worker; the arithmetic and the logging do not involve any proxies, atomics, sockets, interpreters or ffi. Am I missing something?

Assuming I understood this correctly, why would count need to be a primitive wrapped in a proxy?

count is resolved as the cpus.length value. The primitives story overlaps a bit with the rest: these travel in a well recognizable format and are, in fact, not really proxied in my use case. But I think having the ability to eventually proxy primitives too would be cool, so that one can extend literally anything without ever needing to pollute any native prototype.

The typeof operator already drills the function VS object case; if it could drill string and number too it would be awesome ... but that's less needed.

The reason I am normalizing everything as {t, v} pairs is that a well-known Symbol, for example, will travel as {t: 'symbol', v: 'iterator'}, so I can disambiguate a symbol from just any "iterator"-like string, which would travel as {t: 'string', v: 'iterator'}.

The same goes for bigints, which travel serialized as strings but with type bigint. Maybe I should've been clearer that I am not actually using primitives as proxies; these are all resolved out of the box as expected. However, because I am trapping by reference anything I want to, a huge string could also be trapped as a proxy target, forwarding trap operations to the main thread, so that the memory needed to represent that huge string lives on main only and is not repeated in every worker that deals with that main.

Now, this is hypothetical, but as I've unlocked these kinds of possibilities, I think it wouldn't hurt to have a way to trap typeof checks and be able to forward even primitive kinds elsewhere, especially when these can be super heavy ... I hope this makes sense.

For clarification's sake, I hear you ... but to me the specs are full of badly documented caveats and arbitrary drilling of typeof or Array.isArray, all behind the "it should not be possible to disambiguate a proxy from its target" claim, where instanceof fails, property access fails, tons of stuff easily fails, revealing the proxy nature. On top of that, I needed this thread, one new library, and 3 follow-up library updates and refactorings to finally be able to formulate and write down my previous post / explainer; I couldn't have started with it, I had to fully understand all the details first.

So yes, the post came out of frustration ... and here the TL;DR version of my effort:

  • create an extremely complex architecture that never existed to date (yes, we tried comlink, it failed our purpose and expectations)
  • find the best way to represent all possible data without breaking it while traveling
    • idea: to use the least amount of RAM and bytes after serialization the [type, value] array proxy and the bound context for methods and classes should be perfect!
    • amend: the Array.isArray operation drills the proxy and reaches the target and there's no way to avoid returning true ...
      • idea: how about we patch Array.isArray in the worker as that's a new env anyway so not too intrusive?
      • amend: what the heck is happening with ownKeys and property descriptors? Why do I need to provide a non configurable length for stuff that could literally be anything out there?
      • amend: after tests and investigations and discussions, I came to the conclusion that having a generic [type, value] is doomed for my purposes so I need a thin {t, v} object for everything else and use [v] only for arrays ... at least the bound function trap still works as expected!

This tiny list has been my nightmare to fix over the last week, without breaking or changing anything at all on the surface of all projects involved. So once again, I am not trying to justify my initial "attack", but gosh, this API really gets in the way of new ideas and projects ... that's it.

I forgot to answer to this:

I don't proxy maps, I proxy foreign references. The references deal with the map in the map's domain / realm / context, so your example works very well with the coincident library. Please read my follow-ups to understand how I use Proxy. Most known use cases to date are about wrapping (for not-so-clear reasons) arbitrary objects, where your case, DOM nodes, and whatnot fail already; but I am using Proxy to actually proxy anything remotely, so that references exist elsewhere, not in the current realm, and operations work through the reference operations happening in their own realm.

If anything, I think I am one of the few out there using Proxy semantically: to proxy any reference operation elsewhere, not in the current scope/realm/context. This is probably also why Proxy never really gained much momentum among developers and libraries: it solves almost nothing for code in the same realm, but it's a wonder of possibilities when paired with FFI from WASM and/or Atomics-synced iterations with Web Workers.

I understand none of this was common or even available at the time the spec landed in ECMAScript, but today it's the only primitive available to anyone dealing with more complex stuff than usual.

Yesterday it took forever to land cross-browser and it rarely got adopted as a primitive; today it's crucial to enable WASM related use cases and cross thread/realm/env scenarios. This is why I would love to have a new discussion about a modern version of the Proxy standard: one that might even bail out of all engine optimizations, but one that would never fail developers' expectations and that normalizes everything out of the box, instead of having a long list of invariants. That's my complaint about the current specs in a nutshell.

Formalizing with tons of "open to discussion" thoughts around my proposal, this is what I'd love to see in JS next:

Delegate extends Proxy

A Delegate (the name is just made up for its semantics) requires new Delegate and it cannot be subclassed, exactly like Proxy.

A Delegate signature is like this one:

new Delegate(any, DelegateHandler)

DelegateHandler

A delegate handler is an extension of the current Reflect namespace / utilities.

const DelegateHandler = {
  ...Reflect,
  // must return one of the known typeof values, or throw:
  // 'bigint', 'boolean', 'function', 'number',
  // 'object', 'string', 'symbol', 'undefined'
  typeof(target) {
    return 'object';
  },
  // invoked only if the typeof trap returned 'object'
  // (also invoked only when Array operations are expected?)
  isArray(target) {
    return true; // or false
  }
};

That's it: disambiguation for any delegated value is done.

Now ...

Delegate Traps

This might be the most controversial part of this proposal, but basically what I am thinking is that any typeof should get a chance to apply or construct too, so that any possible string, namespace (as object or array), and any possible type gets a chance to be invokable, for whatever reason anyone out there wants or needs.

  • "why would anyone invoke a boolean?"
    • I don't know, but I don't have enough crystal-ball knowledge to tell you that operation should be forbidden ... they can throw on delegated invokes over a boolean type though, we should be all good!
  • "what engines should do around this monstrosity?"
  • bail out, for the time being, of all possible optimizations ... let's see how much this pattern is used, or how, and then optimize, if even possible, for that use case. When I heard V8 would optimize around React's state choice, so that destructuring [value, update] in any JS code would be faster, I got the feeling these optimization patterns are really demand based, aren't they? So here it'd be the same: do nothing, optimize nothing, and see if this Delegate primitive gains momentum and usage; think about it then, not before.

I doubt that is possible with the current design of the JavaScript language. A primitive is a value that the engine can directly access to do computations, and it is immutable so it can be copied and compared easily. Evaluating 2 ** 7 === 13, !null or 'A' < 'B' never involves any user code.
Proxying primitives would mean the ability/need to intercept all these native operations. There would need to be a proxy trap for every operator, and the ability to introduce new operators without breaking existing proxy-based code would be limited. It would probably mean the addition of new invariants, such as x == y and x != y never being true at the same time (same for x > y and x < y). It would require a solution to operator overloading.
At that point it's probably easier to instead get rid of primitives and go full OOP where everything is an object, and operator evaluations desugar to method calls. We could remove typeof entirely and rely only on the instanceof protocol or duck typing (including things like 'call' in f ? f() : Null). We would need to add a toBoolean() method.

But then that would be a new language, no longer JavaScript (with all its quirks).
It would, I think, however be a subset of the JavaScript language we have today: you can already write JS code in that style if you want/need to interact with proxied values from another environment.

I think that would already be possible by implementing a StringView class that behaves like String objects. Most code would continue to work with it, as strings, with their .length, indexed properties, and many methods, are already the primitive type closest in usage to objects. Code would only need adjustment for typeof and comparison operators.

The only true limitation I can see around writing primitive-less code is about symbols: objects (and proxies) cannot be used as property keys.

I think you're treating the authors of the proxy spec unfairly here. They actually had your use case (and similar ones) in mind; try reading up on some of the old discussions.
It's true that you rarely see proxies in everyday code, but that's not because of bad API design. There is some adoption in libraries that use proxies to intercept property accesses, but when you know the property names up front, you'd use basic getters and setters instead.
The proxy API was not designed for ease of use, but for power, completeness, security and forward compatibility. It needed total feature coverage of the object meta protocol, precisely to enable use cases such as yours. But sandboxing, layering, FFIs, etc are niche - they are implemented by platform developers, not by application developers.

2 Likes

I am a platform developer these days, and that's where the current API is not ideal. I am even OK with no primitives, but the missing Array.isArray trap on the proxy and the fact that getPrototypeOf is completely arbitrary both bother me and make the dance more awkward than it should be, so I need to use a user-land library. I just keep piling up user-land libraries for various APIs and that's not super helpful, that's it.