Proposal: ESX as core JS feature

Nothing is related to the DOM and nothing is invoked out of the box, that's what ESX representation is about. All examples in the gist already answer this question though.

Update

There is finally an ESX Babel transformer that follows exactly the Gist specifications and allows me to showcase how ESX can be used by any environment or library to output or do something so that we have less theory and more practical cases to talk about.

This is a basicHTML helper that, in a few lines of code, enables Node.js or any other runtime, even Web Workers, to produce an output given an ESX input.

function Paragraph({content}) {
  return <p>{content}</p>;
}

const div = (
  <div class={'some class'} onclick={() => ignored}>
    <Paragraph content="some content" />
  </div>
);

console.log(basicHTML(div));
// <div class="some class"><p>some content</p></div>
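
The helper itself is not reproduced in this thread. As a rough sketch of the idea only, assuming a hypothetical token shape of {type, value, properties, children} and hand-built tokens standing in for what a transformer would emit, such a helper might look like the following. Every name except basicHTML is an illustrative assumption, not the gist's API.

```javascript
// Hypothetical token kinds; the real ESX constants may differ.
const TEXT = "text", ELEMENT = "element", COMPONENT = "component";

// Hand-built stand-ins for what a transformer might emit.
const text = value => ({ type: TEXT, value });
const element = (value, properties, ...children) =>
  ({ type: ELEMENT, value, properties, children });
const component = (value, properties, ...children) =>
  ({ type: COMPONENT, value, properties, children });

const escapeHTML = s =>
  String(s).replace(/[&<>"]/g, c => `&#${c.charCodeAt(0)};`);

function basicHTML(token) {
  switch (token.type) {
    case TEXT:
      return escapeHTML(token.value);
    case COMPONENT:
      // Nothing is invoked until a consumer decides to: this one calls it.
      return basicHTML(
        token.value({ ...token.properties, children: token.children })
      );
    case ELEMENT: {
      const attrs = Object.entries(token.properties || {})
        // Listeners make no sense in a string output, so functions are skipped.
        .filter(([, v]) => typeof v !== "function")
        .map(([k, v]) => ` ${k}="${escapeHTML(v)}"`)
        .join("");
      const body = token.children.map(basicHTML).join("");
      return `<${token.value}${attrs}>${body}</${token.value}>`;
    }
    default:
      throw new TypeError(`unknown token type: ${token.type}`);
  }
}

// Equivalent of the Paragraph / div example above, without ESX syntax.
const Paragraph = ({ content }) => element("p", null, text(content));
const div = element(
  "div",
  { class: "some class", onclick: () => {} },
  component(Paragraph, { content: "some content" })
);

console.log(basicHTML(div));
// <div class="some class"><p>some content</p></div>
```

The point of the sketch is that the tree is plain data until a consumer walks it, so the same tree could be handed to a different consumer that produces something other than a string.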

There will be more examples soon, but first I would like to take the time to migrate the udomsay library to ESX, to hopefully improve performance further and demonstrate that ESX is superior to JSX for library authors too.

The ESX transformer exists thanks to @nicolo-ribaudo, who remains a maintainer, but I'll take over from now on as I don't want to keep bothering him with my ideas :-)

To make it even more explicit what template literal tags are missing, I've created a comparison chart and added it to the F.A.Q. section.

Some thoughts and observations:

  • What exactly is the desired runtime representation of the ESX expressions?

    • Would you really have this class hierarchy of ESXToken, ESXProperty, ESXChevrons, ESXElement, ESXFragment, ESXComponent, ESXTemplate as laid out in the Gist? Or would there be only ESXToken, as in the "Simplified implementation"?
    • Would these be available as globals?
    • Would they be callable/constructable directly?
  • Are the ESXToken instances mutable? Frozen? Deeply frozen?

  • Why does an ESXElement have a .value? Wouldn't .name, .tagName or .nodeName be more appropriate?

  • Why are the .properties of an ESXChevrons instance another ESXToken with an array as its .value, instead of simply the array itself? Can these be nested arbitrarily?

  • Is it allowed that properties contain non-ESXProperty tokens?

  • I'm missing the interpolation story for children.

    • Does it work like in JSX?
    • Can I use <…>{ stringExpression }</…>, what's the runtime representation?
    • What about <…>{ arrayExpression }</…>? What can the array contain?
    • What about null values?
    • Can they be arbitrarily nested ESXTokens without chevrons? That's quite complicated to handle (but on the other hand might help with reactive rendering).
  • I don't get the point of STATIC_TYPE and RUNTIME_TYPE, and certainly not of MIXED_TYPE.

    • it sounds like they are somehow supposed to be used for performance optimisation, but it's not quite clear in what manner. Shouldn't this rather be solved through the identity of the token objects, i.e. <>static</> evaluating to the same immutable instance every time?
    • putting them in the .type of the token makes it impossible to distinguish (by .type) whether a token is a text token (CData/Text in DOM) or a property token (Attr in DOM) or just a list of other tokens. The .type of elements/fragments/components makes more sense.
    • Are properties like bool={false} or num={42} really dynamic? Passing non-string constant values to components is quite common in React/JSX in my experience, and it seems like these would be classified as "runtime values" in your proposal, making a vDOM necessary again.
    • Can a tree that contains components ever be static? The variable values for the component constructor might change between evaluations.

    I would propose adding a .static = true property on the tokens that represent a static tree, nothing more. If absolutely necessary, mix it into the type as a bitwise flag, but that'll confuse most developers. Tokens that represent runtime values, or contain other tokens with runtime values, would just not have this property (or have it set to false), I see no need to distinguish "mixed".

    Maybe introduce some sort of <?static><Component value={false} count={3}>{"\t"}</Component> helper to memoise ESX trees based on syntactical location, i.e. keeping the value of the first evaluation and making it static. Or, since we're in ECMAScript land, a cover grammar to determine whether the interpolated expression is constant (consists only of literal expressions)?

  • Naming: I'd like to carry over the terminology from JSX. The "chevrons" should be ESXElements, the "elements" should be ESXTags

  • Does one really need to distinguish between tags, fragments, and components? What would happen if I were to construct an ESXComponent with a string or null as the value? Notice that React/JSX does allow this, return <div /> is equivalent to const Tag = 'div'; return <Tag />. From the renderer's point of view, branching based on the typeof the .value is no different than branching on the token .type.

  • Something where I see JSX lacking in DX would be commenting. { /* <NotRendered /> */ } is possible, but weird. Can we have nice <!-- --> comments in ESX, please?

  • I don't get the point of the ESXTemplate wrapper.

    • Why does it have an id, but none of the other tokens do?
    • Does the .id need to be a separate property? (I think I know an answer, but you might still want to spell this out somewhere)
    • If multiple tokens can have the same .id value, does it really denote an identity at all?
    • Introducing ESXTemplate as an extra level of nesting seems to make the syntax less composable. I would expect <p><span /></p> to lead to the same ESX tree as <p>{<span />}</p>, but in the latter expression, there's an extra template wrapped around the inner element.
  • on FAQ:

    • "no callbacks needed" - I totally understand what's wrong with directives to set the JSX interpreter, however does this not mean a performance advantage? It seems ESX trees need to be constructed and later be interpreted by passing it to a renderer function. Template tags and JSX could be evaluated directly to the rendered representation, with no overhead of allocating temporary objects.
    • "difference to JSX: it's not HTML" - JSX is not HTML either. Only React-DOM is tying JSX to the DOM. Please remove that point.
    • "out of the box static analysis + correctness" - if you refer to syntactical validity, yes. But how can ESX components be checked for semantics and types? In particular, what should TypeScript do with them, and what would be the type of an ESXComponent instance that uses a particular component? Without knowing what renderer the ESX is passed into, it seems hard to describe and typecheck that the props and children match the declared interface of the component.
    • serialisation - you argue that only ESX can be serialised, but I'd say it's not that different from JSX or tagged template literals. Sure, it depends on the target representation of those, but it's generally possible as well, and faces the same issue when containing references to functions.
    • "it's scope aware" - so are template literal and JSX. Yes, interpolating component constructors into templates is weird, but that's covered on the previous point already. Once you're there, all the interpolated expressions are evaluated in the local scope as expected.

Please fix the .type issue. I'd suggest having only TEXT_TYPE, ATTR_TYPE and ELEMENT_TYPE, nothing more.
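
To make the suggestion above concrete, here is a minimal sketch of tokens that carry one of three node kinds plus a boolean .static flag derived bottom-up. The constant values, the factory helpers and the `dynamic` argument are all assumptions made for illustration. They are not part of the gist.

```javascript
// Hypothetical type constants following the suggestion above.
const TEXT_TYPE = 1, ATTR_TYPE = 2, ELEMENT_TYPE = 3;

// `dynamic` marks a value that came from an interpolation (sketch assumption).
const text = (value, dynamic = false) =>
  ({ type: TEXT_TYPE, value, static: !dynamic });
const attr = (name, value, dynamic = false) =>
  ({ type: ATTR_TYPE, name, value, static: !dynamic });

// An element is static only when all of its attributes and children are.
const element = (name, attrs = [], children = []) => ({
  type: ELEMENT_TYPE,
  name,
  attrs,
  children,
  static: [...attrs, ...children].every(token => token.static),
});

const fixed = element("p", [attr("class", "note")], [text("hello")]);
const live = element("p", [attr("class", "note")], [
  text("hello "),
  text("name", true),
]);
const outer = element("div", [], [fixed, live]);

console.log(fixed.static, live.static, outer.static); // true false false
```

With this shape, .type answers "what kind of node is this" and .static answers "can it ever change", so there is no need for a separate "mixed" case.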

This could cause other issues. If I recall correctly, <!-- ... --> is already optionally allowed as comment syntax within JavaScript, for backwards-compatible browser support, but in general shouldn't be used. Having it be an allowed form of comment within ESX would cause interesting issues when someone tries to write code like the example below and notices it works, but only because they're running JavaScript in the browser: it turns out this comment is outside of the ESX and will not be parsed as a comment in other environments.

return (
  <!-- This is doing some stuff -->
  <div>
    ...
  </div>
);
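
The existing behaviour can be observed without any ESX involved. In engines that implement the Annex B HTML-like comment syntax, `<!--` starts a single-line comment when the source is parsed as a script, while the specification does not allow it when the source is parsed as a module. The sketch below only exercises the script case, using indirect eval as a stand-in for a classic script.

```javascript
// Indirect eval parses its argument as a sloppy-mode script, not as a module.
const source = "<!-- This is doing some stuff -->\n40 + 2";

// The whole first line is treated as a comment, so only `40 + 2` is evaluated.
const result = (0, eval)(source);
console.log(result);
```

The same text loaded as a module would be rejected at parse time, which is the environment-dependent behaviour described above.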

I started a new thread to focus on improving template tags, so we don't need to muddle this thread with that discussion: Improve sub-language support for template literals

That being said, I would still like to briefly discuss one thing related to template tags, mostly because it pertains to your FAQ section.

I would like to add one option that hasn't been discussed much yet. That table is basically comparing userland syntax support (JSX), built-in syntax support (ESX), userland tagged-template support (the tagged template column), but we haven't yet discussed the implications of using a built-in template tag. I'm not necessarily arguing for or against it here, I'm mostly just wanting to bring it up as an option as well.

If we were to add "built-in template tags" to that table, we'd end up with the same checks that "Tagged Templates" have, as well as:

  • allows object spread for props/attrs - Why can't template tags support this? In JSX, you spread attributes with {...attrs}, and in template tags, you'd spread with ...${attrs}. In JSX, you spread children by just providing a list where a child is expected. The same can be done with template tags.
  • out of the box static analysis + correctness - this can happen to a degree. It's true that syntax errors won't be thrown until you're actually running that piece of code, but linters and editors can also help you catch syntax errors inside the template.
  • syntax highlight in most popular IDEs - if you're tagging with a built-in tag, an editor can know how to highlight it.
  • it's standardized as syntax - It doesn't get more standardized than having a standards committee do so :)
  • it can be serialized - As Bergus noted, I don't see why any of these options couldn't be serialized when you're not dealing with user-defined components.
  • "it's highly interoperable" - I don't fully understand this point to begin with. You're claiming that ESX is interoperable with JSX, but I don't see how that is. Yes, they use the same syntax, but they build into two different things, so you can't just go using ESX anywhere where JSX is expected.

This seems to cover most of the checks in this box. However, there's still an argument to be made for the fact that it's more clunky to use template tags in regards to interpolation, which you did mention in the FAQ. I believe that would be the main advantage we gain if we move from a built-in template tag to built-in syntax support. Which is fine - the purpose of syntax is to make things less clunky to use.
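
As an illustration of the spread point above, a tag function can recognise a `...${attrs}` convention by looking at the string chunk that precedes each interpolation. The `attrs` tag below is a hypothetical toy, not any library's API, and the convention it recognises is exactly the kind of detail a built-in tag would have to standardize.

```javascript
// Hypothetical tag that only collects attributes from a template.
function attrs(strings, ...values) {
  const out = {};
  values.forEach((value, i) => {
    const before = strings[i];
    if (before.endsWith("...")) {
      // `...${object}` spreads every own enumerable key of the object.
      Object.assign(out, value);
    } else {
      // `name=${value}` sets a single attribute.
      const match = /([\w-]+)=$/.exec(before);
      if (match) out[match[1]] = value;
    }
  });
  return out;
}

const extra = { id: "x", title: "t" };
const collected = attrs`class=${"a"} ...${extra}`;

console.log(JSON.stringify(collected));
// {"class":"a","id":"x","title":"t"}
```

Nothing in the language enforces this convention today, so two libraries are free to interpret the same chunks differently.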

I hadn't had time to get back to this thread, but the point made by @theScottyJam above is exactly the one I would have made.

IMO, a built-in template tag would get you most of the benefit you're looking for, including the standardization aspect, and the possibility for dev tooling to "recognize" this built-in tag.

As stated above, I understand the DX may not be as good as direct ESX syntax. I see the following 3 drawbacks:

  • need to wrap ESX in template string literals
  • no early syntax error
  • requires wrapping scoped Element tag names in ${Tag}.

I'd argue the latter is actually a benefit as the magic lowercase vs upper case distinction between div and Tag is really not obvious to me. From what I understand, it's also possible to avoid the interpolation here by relying on scoped registries (which does bring the question of the best way to "configure" the parser, and what kind of dynamism should be possible in changing said configuration).

Altogether, I am not convinced that the cost of new syntax is worth the potential improved DX over a built-in tag. That said it's probably a stage 2 concern anyway, and a delegate could probably make the case that exploring a way to generate tree like structures using standardized XML-like templates or syntax is a worthy problem to explore.

I've already answered in the other thread about improving template literals and I won't discuss in here template literals for the following reasons:

  • they are inferior DX, it's proven by the industry, usage, and all surveys
  • if anyone proposes tooling to fix template literal issues, reality has already answered: if you need tooling around fragmented template literal syntax (due to the lack of a de-facto standard syntax across libraries), you're better off with the de-facto standard tooling around JSX, which is a proven standard in the industry already
  • self-closing tags, components (scope awareness), and correctness (syntax) are not available out of the box with template literals, and likely never will be due to their nature, hence it's pointless to even keep template literals as part of this proposal conversation, as there's no way these can compete

Now I'll take the time to answer all relevant ESX proposal questions instead, but once again, please don't answer "template literals can do that", because that's not the reality, nor is it what developers using JSX today want to hear, and neither is the idea that new tools might fix their problem ... writing template literal tags is advocated as a tooling-free standard approach, and ESX has exactly that very same purpose: no tooling should be needed at all, or it's a dead-end already.

Thanks for not pushing around template literals but, if you'd like to, please chime in in the other thread.

Appreciated.

For clarification's sake, I'll answer these first:

  • Why can't template tags support this?
    • because there's no standard syntax around string chunks in a template literal. ESX (and JSX) is syntax. So spreading that way in library-x's tag will produce something unexpected in library-y's tag. In ESX there won't be any issue.
  • It's true that syntax errors won't be thrown until you're actually running that piece of code, but
    • that but is a show-stopper for developers
  • if you're tagging with a built-in tag, an editor can know how to highlight it.
    • that assumes libraries exporting html as tag all work the same and offer same syntax features: they don't, because there's no standard around what you can write in a string (rightly so)
  • It doesn't get more standardized than having a standards committee do so :)
    • the committee doesn't care at all about the differences between uhtml, htm, lighterhtml, hyperHTML, and lit-html (or any other library to date) within the string chunks, so this is not an answer; it is actually the problem, underlined
  • I don't see why any of these options couldn't be serialized when you're not dealing with user-defined components.
    • I will answer Bergus in detail, but template tags are unique, not just an array, and they are dependent on the tag consuming them, hence not serializable across libraries in a meaningful way, as those arrays can contain syntax not allowed, or ignored, in other libraries ... interoperability there depends on the template literal tag, not on syntax and its standard representation. The basicHTML example, outside of the DOM world, can render the same ESX anywhere, even in current JSX-based libraries. A template as an array with non-specified syntax? I strongly doubt so.
  • You're claiming that ESX is interoperable with JSX, but I don't see how that is.
    • anything using a transpiler/transformer/tool around JSX will work out of the box with ESX, because they transform the code, so ESX doesn't exist in their final output. Any library compatible with ESX will work with previously written JSX components or templates, because it will understand ESX. ESX is a migration path from old to new library parsers, and zero trouble for anyone already using JSX out there through their transformer. It's 100% backward compatible due to tooling removing it from the equation, and 100% forward compatible due to new libraries understanding JSX too, as their parsers will recognize ESX instead and work with it.
  • it's more clunky to use template tags in regards to interpolation
    • that's the tiniest detail behind this ESX proposal, as much more than that is solved out of the box.
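
The uniqueness of template tags mentioned above can be shown with a tag that does nothing but record what the engine hands over. The `tag` function and its numeric ids are hypothetical and only serve the demonstration. The per-call-site identity and the frozen strings array are standard tagged template behaviour.

```javascript
// Hypothetical tag: it parses nothing, it only labels each call site.
let nextId = 0;
const ids = new WeakMap();

function tag(strings, ...values) {
  // `strings` is the same frozen array every time this call site is evaluated.
  if (!ids.has(strings)) ids.set(strings, nextId++);
  return { id: ids.get(strings), strings, values };
}

const item = x => tag`<li>${x}</li>`;
const [a, b] = ["one", "two"].map(item);

// Same text, but a different place in the source: a different strings array.
const other = tag`<li>${"three"}</li>`;

console.log(a.id === b.id);               // true
console.log(a.id === other.id);           // false
console.log(Object.isFrozen(a.strings));  // true
```

Libraries built on tagged templates typically key their one-off parsing on that array, which is the role the template .id is meant to play in ESX.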

Please let's move this conversation in the other thread (and thanks for opening it) if any extra clarification is needed.

Sorry, but the question of whether template literals may be a valid approach for ESX has not been answered IMO. The other thread is about how developer tooling for template literals could be improved in general.

Again, the suggestion is that ESX be standardized, and implemented as a built-in template literal tag. The string parsing part of the template literal would thus be standard.

Why are early errors such a strong requirement?

Once again, the suggestion is to standardize a built-in esx tag. I don't see where different libraries implementing their own parsing comes in the discussion here (besides being the current state of affairs).

Maybe I'm really missing something, but the suggestion is to standardize the "syntax" of the template string parsed by a built-in esx tag. Can't there be a tree-like representation outputted by that parsing that could be used as standardized input to the consuming libraries? Does the construction of that representation need to be parametrized (like through delegation to the various ESXToken classes in your gist)?

@bergus first of all, thank you for going through the proposal and asking valid questions. Your post is long, so this answer will be long too. I hope you'll have the patience, and the time, to read it through.

  • What exactly is the desired runtime representation of the ESX expressions?
    • the proposal is rather abstract, but it all boils down to the simplified implementation, which already provides everything that is needed to deal with ESX: the ESXToken class. Every token is defined by its shape rather than by a specialized class, and the Babel transformer reflects that by using this one class to rule them all. Accordingly, ESXToken should be globally available as a new primitive, but it shouldn't be constructable or callable at all. (edit revisited in my latest proposal)
  • Are the ESXToken instances mutable? Frozen? Deeply frozen?
    • I don't have a strong opinion here, but no would be my answer. ESX must be simple and fast by all means, including its definition in the specs. As these tokens are also serializable (elements and fragments out of the box), I'd spare the burden of de-serializing them as frozen, or deeply frozen, too. Besides template arrays and static arrays, I don't recall any other frozen or deeply frozen primitive in the specs, and if even the crypto namespace is configurable, I don't think ESX should have any special treatment in that regard. The transformer currently doesn't care at all about freezing, and that's also good for performance (in user-land).
  • Why does an ESXElement have a .value? Wouldn't .name, .tagName or .nodeName be more appropriate?
    • for shape consistency, every ESXToken instance has at least a type and a value. Because there's no intention to couple ESX with the DOM, even if it's arguably the most common use case, no DOM field names have been used to define it. On the other hand, if an element should have a tagName, a Component should have a callback property. These are all parsing and branching frictions to me, as {type, value} would cover them all and make it always safe to destructure these from a token. (edit revisited in my latest proposal)
  • Why are the .properties of an ESXChevrons instance another ESXToken with an array as its .value, instead of simply the array itself? Can these be nested arbitrarily?
    • there are 4 kinds of properties in JSX (edit revisited in my latest proposal):
      • no property, hence properties is null: <div />
      • static properties, hence properties is a STATIC_TYPE token
      • mixed properties, with MIXED_TYPE token: <div a="1" b={2} />
      • runtime, hence RUNTIME_TYPE: <div a="1" {...any} /> where any could have an "a" field that overwrites the previous static one
    • in short, the idea behind properties is that the ESX consumer can decide what strategy to use to address any of these cases and, because this is statically analyzable, I thought it's a check worth having at the syntax level rather than on the library side. Truth be told, I wouldn't mind if properties were a possibly empty array right away where, if not empty, libraries can crawl it and decide what strategy to take. This would be more aligned with the children field, which is always an array. The thing with children, though, is that no spread cases are allowed (a spread can just be an array instead, and it will work via differs when needed), while properties are extremely common in JSX, so having an extra hint/helper to understand how to handle them felt like the right choice. If this is a performance stopper for the parser, once again, I wouldn't mind having properties always be a direct array of what they contain, but it would be left to libraries to understand when the type is RUNTIME_TYPE and the value is also an object ... in short, "we" need a way to tell spread operations apart from runtime non-spread operations, as they can require a completely different pattern from the ESX consumer
  • Is it allowed that properties contain non-ESXProperty tokens?
    • no. properties is either null or a list of tokens because tokens carry the property kind (static or runtime) and its value. The value won't be a token as the syntax doesn't allow it: <div a={<b />} /> breaks and also makes no sense to me
  • I'm missing the interpolation story for children.
    • it works like in JSX, and every example of yours will be a token with either STATIC or RUNTIME type and its value. The value can be anything, including an ESXToken, but it's up to the ESX consumer to understand all sorts of interpolations (signals, components, callbacks, and so on). <div>{<b />}</div> is perfectly OK, and so is <div>{[...items]}</div>. Each child will specify the type, and what its value means is up to library "debate". <div>a{'thing'}</div> creates two children, one static with a string value and one runtime whose current value is also a string; the fact that it's runtime means that if in the future, due to conditions, it returns something else, the library/framework should work around it. These situations have been solved already by pretty much every JSX-based solution, so it's nothing new, and nothing to worry about: an interpolation value as a child can be anything ... understanding whether it's static or runtime, though, is where ESX shines.
  • I don't get the point of STATIC_TYPE and RUNTIME_TYPE, and certainly not of MIXED_TYPE. (edit revisited in my latest proposal, no MIXED_TYPE in the mix)
    • <>static</>, as a stand-alone template, is a unique template that contains a fragment with a static child whose value is the string "static". The library can use any of these details to make it already resolved at any time in the program. As a chunk within an outer template, it is a fragment with a static child. A library can optimize over that too. So yes, that's theoretically immutable by definition, but a library can decide to ignore that immutability (one-off SSR, like lambdas)
    • you are bringing in the DOM, and ESX has nothing strictly to do with the DOM. A child type is about its nature: either static, as in immutable, or runtime, where conditions within the interpolation might change the outcome in the future. ESX tells you the template type, not the value type, because the value itself can be anything: not an attribute, not a CData/Text, just a signal, an object with a special toString, a string, a number, who cares ... that's a library / user-land issue, not an ESX one, as it can't represent all user-land made types in the world, and it shouldn't.
    • vDOM is not necessary at all, and udomsay already showcased that, being based on a half-baked *ESX proposal (all hints are in place, just in a different way). bool={false} is not a use case, as a static JSX template either has <div bool /> or it doesn't. If there is an interpolation around it, it means it can be either true or false, so bool={condition} is reflected in ESX. The same goes for num={42}, unless you are talking about props forwarded to components, where their specific value matters. In that case, the udomsay library has a cache for when such props are used to define attributes, still forwarding the value to the component, so basically your concern is not about ESX but about how libraries would handle those edge cases. They can, but it has not much to do with the proposal here ... I mean, all these ESX details can be just ignored by any library too, like I did with the SSR example, right?
    • a component can be a static pin-point in the outer template, yes. The fact it is a component is already a hint for libraries to treat it as such, and the fact it could return different results each time is up to the library too. udomsay has stricter rules around conditional returns, but nothing in ESX stops implementations from working around these cases too.
  • I would propose adding a .static = true property on the tokens that represent a static tree
    • that brings nothing to a well known type comparison ... and it would make the token shape awkward, as type is there by default, and you want to move it into a property instead, do I get this right?
  • Maybe introduce some sort of <?static> ...
    • that would break migration and it would make ESX more awkward than template literals so ... no?
  • Naming: I'd like to carry over the terminology from JSX. The "chevrons" should be ESXElements, the "elements" should be ESXTags
    • I am sorry you focused on an abstract intermediate class that doesn't exist in reality (current ESX transformer) so please let me fix that by changing the gist to show only the simplified proposal
  • Does one really need to distinguish between tags, fragments, and components?
    • absolutely, and your example should happily throw an error somewhere, as that's not how anyone has been using JSX for the last X years. With ESX though, the type will be component, and the moment a library tries to invoke its value it will throw, and that would be expected.
  • Can we have nice <!-- --> comments in ESX, please?
    • comments are for developers, not for end-users. I wouldn't mind having those comments fully ignored by ESX, and as JS already understands those, that should be a no-brainer: developers write comments for their own documentation's sake, and ESX ignores them, like every good practice around minifying JS out there has done to date. I'm in, unless you want comments to be represented in ESX, to which I'd ask why, for whom, and for which use case? But I wouldn't mind ignoring these while parsing ... anything more is just unnecessary overhead to me.
  • I don't get the point of the ESXTemplate wrapper. (edit revisited in my latest proposal)
    • the id is the whole point ... a template is unique same as template literal tags are unique. The id is there to help optimizing everything every library based on template literal tags optimized to date, which is a lot of one-off parsing ... it serves the exact same purpose for libraries that want to enable it.
    • the template type is an artifact, and so is its .id. This is the funny part: you said developers don't know or care about bitwise operations, and I raise your concern with this: how would you decide if a token is both a Component, a fragment, or an element, and a template? The answer is that in an ideal world where programmers know how to token.type & ESXToken.TEMPLATE_TYPE, the whole template story should go away. Because the current state of the art is that nobody understands bitwise operators, the template is there to signal that there is an outer, static, well-defined template to optimize for, if anyone would like to. This allows token.type === ESXToken.TEMPLATE_TYPE as opposed to using bitwise checks all over ... but I am with you! As a library author, I wouldn't mind having the template detail hidden in the binary representation of its type, so that the template wrapper would never be needed and its .id could be an optional field to populate. However, with the current proposal, nobody needs to learn bitwise operations, and everybody can live happily ever after with a shape that has an identifier, a type as template, and a value that can be any of the fragment, element, or Component cases, letting library authors optimize on that.
    • no token will ever have the same id. An id is defined like in template literal tags. I don't think I need to expand on how template literal tags uniqueness work, because I am sure you know that already.
    • <p>{<span />}</p> is not really an interesting case to discuss, so I raise it to <ul>{data.map(x => <li>{x}</li>)}</ul>, where having a known <li> template there enables tons of optimizations. Again, this is all identical to template literal tags, so I am not sure what you are after with this question.
  • on FAQ
    • every JSX library is behind udomsay in terms of performance, unless they use a specialized JSX transform that makes JSX something else. ESX can be rendered right away, as demoed already via the current basicHTML too. No callbacks means the same outcome, without the callback. In JSX it is the render that matters, not the callback per se.
    • JSX maps to HTML by default, which is the reason it uses className instead of class, because it expects the DOM behind the scene to be updated after an Object.assign operation on the node with props (more or less). In that sense, JSX is related to the DOM, while ESX isn't.
    • tooling is not part of this proposal ... tooling is something that happens after proposals. How TS handles anything in JS is, AFAIK, not a TC39 concern to date ... or is it? Anyway, if JSX worked with tooling so far, so will ESX, as they don't even need to change tools because it's backward and forward compatible.
    • in JSX you have the React.createElement callback, and that can't be serialized. Neither can React.Fragment, as it won't survive any JSON by design. That's why ESX is serializable compared to JSX.
    • template literals are not scope aware at all, JSX is. You can hook components into template literals but please show me which library provides same JSX expectations out there, and I wrote myself at least 3 kinds of those.
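
The serialization point raised on both sides can be checked with plain objects. The sketch below assumes a hypothetical token shape with string type names. It shows that elements and fragments made of plain data survive a JSON round trip, while a component's function reference does not.

```javascript
// Hypothetical component and token shape; the type names are placeholders.
const Paragraph = ({ content }) =>
  ({ type: "element", value: "p", children: [content] });

const tree = {
  type: "fragment",
  children: [
    { type: "element", value: "div", children: ["static text"] },
    { type: "component", value: Paragraph, children: [] },
  ],
};

// JSON.stringify omits object properties whose value is a function.
const copy = JSON.parse(JSON.stringify(tree));

console.log(copy.children[0].value);       // div
console.log("value" in copy.children[1]);  // false
```

A consumer that wants components to cross a serialization boundary would need its own convention for naming them, which is outside what the token shape alone provides.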

I won't fix any .type issue (edit revisited in my latest proposal) hoping you understand all decisions have been made for the reasons I have explained, and the transformer is there to demonstrate all these decisions are worth it.

I am not too fond of the properties part, so on that I could change my mind, but all other decisions have a strong reason to exist, and I hope this answer convinced you of that.

edit I might be OK with having an optional .id for the known complex types (fragment, element, Components) to reduce the wrapping surface and make that the explicit hint that, if not null, it means the template is an outer one, instead of an inner one ... I'll write an over-simplification around these details (template and properties) soon, but I was kinda hoping to first demonstrate that the current transform already makes it a win around logic and performance ... although if making it simpler is a way forward, I'm more than game there!
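
As a sketch of the alternative discussed above, the template detail can live in a bit of the type, with an optional .id, instead of in a wrapper token. The flag values and the token shape are assumptions made for illustration and may not match what the revised gist ends up using.

```javascript
// Hypothetical bit flags; the proposal's real constants may differ.
const ELEMENT_TYPE = 1, FRAGMENT_TYPE = 2, COMPONENT_TYPE = 4, TEMPLATE_TYPE = 8;

// True when the template bit is set, whatever the underlying kind is.
const isTemplate = token => (token.type & TEMPLATE_TYPE) !== 0;

// Strips the template bit to recover the element/fragment/component kind.
const kindOf = token => token.type & ~TEMPLATE_TYPE;

// The outer <ul> carries the template bit and an id; its nested <li> does not.
const li = { type: ELEMENT_TYPE, id: null, value: "li", children: [] };
const ul = {
  type: ELEMENT_TYPE | TEMPLATE_TYPE,
  id: {},
  value: "ul",
  children: [li],
};

console.log(isTemplate(ul), isTemplate(li));  // true false
console.log(kindOf(ul) === ELEMENT_TYPE);     // true
```

The trade-off is the one already stated: consumers have to use a bitwise check rather than a strict equality on .type, in exchange for one less level of nesting.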

sorry @mhofman I don't think anyone wants to standardize what developers can put into strings, so to me this conversation makes no sense and has no future already. I was hoping to discuss ESX instead, thanks for your understanding.

edit also ...

the suggestion is that ESX be standardized, and implemented as a built-in template literal tag

not at all ... ESX should be syntax and it has nothing to do with template literal tags, it just carries some of the benefits listed in the table I've already shared twice.

Your unwillingness to explore suggested alternatives that may solve the core problem (a standard way to represent XML like trees within JS) makes me doubtful of the possibility to reach any kind of consensus.

The proposal process is about reaching a solution that solves a stated problem. At the end of the process, the proposal will rarely look exactly what the champions had originally in mind.

In my opinion you shouldn't be mandating specific syntax without allowing exploration of other options as part of a stage 0 proposal.

I am willing to explore alternatives as long as these are on topic. For template literal tags proposals, which is not what this is about, please let's keep conversations in the other thread, if that's not asking too much.

@bergus beside me editing my reply (see the bottom of it) there's one particular detail that bugs me even with the current transformer ... <Component props={...} /> and <tag props={...} /> are two completely different beasts: one is forwarding props to the component callback (although this is a current state-of-the-art practice, it doesn't have to be the same with ESX but it'd break otherwise) while the other is defining attributes (although even this is a state-of-the-art practice, ESX doesn't enforce that either).

The gotcha is that while a Component is a callback away, the tag is inevitably (on the DOM) a document.createElement operation, so at the render level the latter is clear while the former is "it could be anything, or even nothing", yet it's currently still flagged as template when used.

To some extent, I agree with you that components as templates are kinda useless, as there's really little to optimize there if the result is unknown (aka: conditional), so in an attempt to have it all as potential templates, I might have overlooked the component case. This is effectively the main bummer around templates or ESX in general, as Components aren't even serialized and, I believe (udomsay does that), components will have special treatment regardless of them being a template or not.

Now here I'd like to know your (or any other) opinion too ... as that might indeed be simpler by design.

This, before I update my overly-simplified proposal in the gist.

@bergus I've been thinking a lot about possible optimizations and screening relevant types, simplifying the resulting structures, and I wonder if this solution would be considered better (explained after):

class ESXToken {
  // attributes or children only
  static STATIC_TYPE      = 1 << 0; // 1
  static RUNTIME_TYPE     = 1 << 1; // 2

  // the following utilities DO NOT NEED TO BE AVAILABLE or standardized
  // these are here to simplify, via a namespace, a possible Babel transformer

  // child case
  static child = (type, value) => ({__proto__: ESXToken.prototype, type, value});

  // component cases
  static component = (value, properties, ...children) => ({__proto__: ESXToken.prototype, value, properties, children});

  // nodes cases
  static attribute = (type, name, value) => ({__proto__: ESXToken.prototype, type, name, value});
  static element = (id, name, attributes, ...children) => ({__proto__: ESXToken.prototype, id, name, attributes, children});
  static fragment = (id, ...children) => ({__proto__: ESXToken.prototype, id, name: '#fragment', attributes: null, children});
}

// component example
({
  __proto__: ESXToken.prototype,
  value:Function,               // the reference class or callback
  properties:null | {...props}, // optional properties forwarded just like in JSX 
  children: ESXToken[]          // a list of {type, value} children where
                                // type is either STATIC or RUNTIME and value
                                // can be anything, including ESXToken instances
})

// element or fragment example
({
  __proto__: ESXToken.prototype,
  id: null | {},
  name: '#fragment' | tagName,
  attributes: null | ESXToken[],// none, one, or more {type, name, value} attributes
  children: ESXToken[]          // a list of {type, value} children where
                                // type is either STATIC or RUNTIME and value
                                // can be anything, including ESXToken instances
})

The id for templates grants tagged literal possible optimizations, while the fact a child is static or runtime is defined through the list of children.

Attributes are either static or runtime and it's up to the library author to analyze the value and decide if that should be treated as a spread operation or not, when it's runtime.

The components have no id and it's up to the renderer to handle components if these are the top-most entry point.

There are fewer moving parts, less hierarchy, and more burden moved to library authors, but at least all relevant info is still there, and forwarded properties for components do not need any dance to be re-mapped as objects, as is actually the case with the current implementation/transformer.
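As a rough illustration of how little a library would need on top of this shape, here is a sketch of a string renderer. The ESXToken helpers mirror the ones listed above; render and escape are hypothetical names, and which type a transformer would assign to a nested component child is an assumption made only for this example.

```javascript
class ESXToken {
  static STATIC_TYPE = 1 << 0;
  static RUNTIME_TYPE = 1 << 1;
  static child = (type, value) => ({__proto__: ESXToken.prototype, type, value});
  static attribute = (type, name, value) => ({__proto__: ESXToken.prototype, type, name, value});
  static component = (value, properties, ...children) =>
    ({__proto__: ESXToken.prototype, value, properties, children});
  static element = (id, name, attributes, ...children) =>
    ({__proto__: ESXToken.prototype, id, name, attributes, children});
}

// hypothetical helper: escape the few characters that matter in this sketch
const escape = text => String(text).replace(/[&<>"]/g, c => `&#${c.charCodeAt(0)};`);

// hypothetical renderer: branches on which fields a token carries
const render = token => {
  if (!(token instanceof ESXToken)) return escape(token);
  if ('properties' in token)  // component: invoke it and render what it returns
    return render(token.value(token.properties ?? {}, ...token.children));
  if ('children' in token) {  // element or fragment
    const inner = token.children.map(render).join('');
    if (token.name === '#fragment') return inner;
    const attrs = (token.attributes ?? [])
      .filter(({value}) => typeof value !== 'function') // listeners are ignored
      .map(({name, value}) => ` ${name}="${escape(value)}"`)
      .join('');
    return `<${token.name}${attrs}>${inner}</${token.name}>`;
  }
  return render(token.value); // child wrapper: unwrap and render its value
};

// hand-written equivalent of what a transformer might emit
const Paragraph = ({content}) =>
  ESXToken.element({}, 'p', null, ESXToken.child(ESXToken.RUNTIME_TYPE, content));

const div = ESXToken.element(
  {}, 'div',
  [
    ESXToken.attribute(ESXToken.RUNTIME_TYPE, 'class', 'some class'),
    ESXToken.attribute(ESXToken.RUNTIME_TYPE, 'onclick', () => {})
  ],
  ESXToken.child(ESXToken.STATIC_TYPE, ESXToken.component(Paragraph, {content: 'some content'}))
);

console.log(render(div));
// <div class="some class"><p>some content</p></div>
```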

Would this over-simplification work or be something better as starting point to move this forward?

Thanks in advance for any hint, comment, or outcome.

another variant could be having all types as one shape:

({
  __proto__: ESXToken.prototype,
  id: null | {},
  type: STATIC_TYPE | RUNTIME_TYPE,
  name: '#fragment' | '#component' | tagName,
  value: null | Function,         // the reference class or callback for #component
  properties: null | {...props},  // optional properties forwarded just like in JSX for #component
  attributes: null | ESXToken[],  // none, one, or more {type, name, value} attributes not for #component
  children: null | ESXToken[]     // a list of children like in JSX. These can be anything
})

The peculiarity of this approach is that the token itself states if it's static or runtime, the kind can be retrieved by the name where:

  • a #fragment won't have properties, attributes, or a value, just children and, optionally, an id, but only if it's runtime, as static fragments within nodes don't get one
  • a #component won't have attributes (or should it?) but it could have an id for consistency sake, in case it's an outer component instead of an inner one. This simplifies the render logic as it can just check if previous id is same, or no id at all was present for a specific container.
  • any other element might have an id if this is an outer template, they won't have a value or properties, just details around attributes.
  • attributes, if present, can be just a list of {type, name, value} without needing to be specific token instances (desirable though)
  • children, if present, are still a list of tokens, otherwise it's impossible to distinguish <>A</> from <>{condition ? 'A' : 'B'}</>. Children can use exact same shape of attribute as {type, name, value} where the name could be #child or #content, making parsing children pretty easy out there.

This version nukes the ability to have <#thingy /> in the future but I don't think anyone would mind.

With this version there could be 2 globally defined classes, and maybe we can allow making these constructable.

class ESXToken {
  static STATIC   = 0;
  static RUNTIME  = 1;
  constructor(type:number, name:string, value:null|function) {
    this.type = type;
    this.name = name;
    this.value = value;
  }
}

class ESXNode extends ESXToken {
  constructor(
    type:number, name:string, value:null|function,
    id:null|object,
    children:null|ESXToken[],
    properties:null|object = null,
    attributes:null|ESXToken[] = null
  ) {
    super(type, name, value);
    this.id = id;
    this.children = children;
    this.properties = properties;
    this.attributes = attributes;
  }
}

The helpers for the transformers can then be something similar to this:

const component = (type, value, id, properties, ...children) =>
  new ESXNode(type, '#component', value, id, children, properties);
const element = (type, name, id, attributes, ...children) =>
  new ESXNode(type, name, null, id, children, null, attributes);
const fragment = (type, id, ...children) =>
  new ESXNode(type, '#fragment', null, id, children);

const attribute = (type, name, value) => new ESXToken(type, name, value);
const child = (type, value) => new ESXToken(type, '#content', value);

so all the details are still available, the separation of concerns between children and attributes is still in place, and it should be relatively straightforward for library authors to orchestrate any render, be it SSR, browser, cloud worker, and so on.
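For instance, the <>A</> versus <>{condition ? 'A' : 'B'}</> case mentioned earlier could be told apart like this. It's only a sketch: the classes are the ones above without type annotations, runtimeSlots is a hypothetical helper, and the type given to each fragment itself is my assumption.

```javascript
class ESXToken {
  static STATIC = 0;
  static RUNTIME = 1;
  constructor(type, name, value) {
    this.type = type;
    this.name = name;
    this.value = value;
  }
}

class ESXNode extends ESXToken {
  constructor(type, name, value, id, children, properties = null, attributes = null) {
    super(type, name, value);
    this.id = id;
    this.children = children;
    this.properties = properties;
    this.attributes = attributes;
  }
}

const fragment = (type, id, ...children) =>
  new ESXNode(type, '#fragment', null, id, children);
const child = (type, value) => new ESXToken(type, '#content', value);

const condition = true;

// <>A</>
const fixed = fragment(ESXToken.STATIC, {}, child(ESXToken.STATIC, 'A'));
// <>{condition ? 'A' : 'B'}</>
const conditional = fragment(
  ESXToken.RUNTIME, {}, child(ESXToken.RUNTIME, condition ? 'A' : 'B')
);

// hypothetical helper: indexes of the children a library must keep watching
const runtimeSlots = node =>
  node.children.flatMap((token, i) => token.type === ESXToken.RUNTIME ? [i] : []);

console.log(fixed.children[0].value === conditional.children[0].value); // true
console.log(runtimeSlots(fixed));       // []
console.log(runtimeSlots(conditional)); // [ 0 ]
```

Both fragments currently hold the same value 'A', yet only the second one reports a slot that can change on the next render.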

I am not sure I can simplify this further, otherwise precious details will be lost in the process.

Thanks a lot for those explanations! Another long response follows though.

I don't think that's much of an advantage, because the shape of the extra properties varies greatly between the types and you have to branch anyway; using different property names for different values with different meaning would help in clear communication.

Yes, that's what I was getting at. In particular, I was also taking an issue at the "wrapper token" around the list of attributes. It doesn't carry any runtime information about what it is syntactically, i.e. whether it came from properties or from children of a component, and if its value was an array, what kind of tokens to expect inside of it (attributes with a .name? Spread values without a .name? Child elements? Literal strings?). So you couldn't write a function that takes an arbitrary ESXToken instance and returns a string representation of it that would parse back to the same token tree (ignoring serialisation of functions and object identities in values).

Please don't disregard those use cases. Yes, I was thinking about general component trees, where arbitrary values need to be passed as attribute values and for children. Making STATIC_TYPE only work for strings does focus too much on the HTML use case. (Even in the DOM, numeric and boolean properties are common). Each library will want to support its own things, this is up for competition and shouldn't be limited.

Sorry, my terminology might be lacking there. What I meant is that unless one always wants to render the whole thing at once (like in SSR) or all components are stateless, there is a component tree holding component states and the library will need to do some sort of diffing.

Yes, sorry for being unclear there (@theScottyJam understood the same), I meant supporting <!-- … --> as standard comment syntax in ESX code (not just annex B stuff). They should not have any runtime representation, a <Comment /> component should be used if that is needed. It's just that this is one issue I identified with JSX developer experience, the JSX parser does not support this currently.

I understand how tagged template literals work, but I still don't get how identifying ESX tokens is supposed to work. Especially your saying that no two tokens will ever have the same id threw me off.

const tag = (idLiteral, ..._args) => idLiteral;
function createTaggedTemplate() { return tag`demo`; }
function createEsxToken() { return <demo />; }
console.assert(createTaggedTemplate() === createTaggedTemplate()); // true - immutable, unique identity
console.assert(createEsxToken() === createEsxToken()); // false I guess?
console.assert(createEsxToken().id === createEsxToken().id); // expected to be true?

This also informs whether the tokens should be immutable or not. Iff createEsxToken() returns the same (identical) <demo /> token on every invocation, that object must be immutable and frozen. It would not necessarily need an .id even, it could just be distinguished by its identity. (Or: its .id could just be a self-reference?)
However, a function createEsxToken(arg) { return <demo attr={arg} />; } cannot return the same token on every invocation, an .id to identify syntactical location would be required. The created token objects would not necessarily need to be immutable, although it still seems advisable if a library were to expect the same number and kind of nested tokens inside based on the id.

Why only give ids to the outermost token? I would just give ids to every token, and libraries can then decide whether to use only the id of the tree root or not.

I wouldn't focus too much on that. Sure, it's nice to know bitwise operations, and using a .type with enum values that can be used for bit masking enables optimisations. But in the exploratory stage of the proposal it's much more important to communicate and agree on the semantics, and I can see multiple concerns that should be separated:

  • is the token a tree root, the outermost tag of a template?
  • is it a tag token, a child token, or an attribute token?
  • if a tag, is it a raw element, a component, or a fragment?
  • is it a literal value or a dynamic expression?

These should all be separate properties imo, at least for discussion. We can still later implement these as getters, all backed by the same data property that stores a bitfield value.
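A minimal sketch of that idea, assuming a made-up bit layout (none of the constant names or values below come from the proposal): the concerns are exposed as separate getters while a single private field stores the bitfield.

```javascript
// Hypothetical bit layout, for discussion only.
const ROOT      = 1 << 0; // outermost tag of a template
const LITERAL   = 1 << 1; // literal value, not a dynamic expression
const ELEMENT   = 1 << 2;
const COMPONENT = 1 << 3;
const FRAGMENT  = 1 << 4;
const ATTRIBUTE = 1 << 5;
const CHILD     = 1 << 6;
const TAG = ELEMENT | COMPONENT | FRAGMENT;

class Token {
  #bits;
  constructor(bits) { this.#bits = bits; }
  get isRoot() { return (this.#bits & ROOT) !== 0; }
  get isLiteral() { return (this.#bits & LITERAL) !== 0; }
  get isTag() { return (this.#bits & TAG) !== 0; }
  get kind() {
    const bits = this.#bits;
    if (bits & ELEMENT) return 'element';
    if (bits & COMPONENT) return 'component';
    if (bits & FRAGMENT) return 'fragment';
    if (bits & ATTRIBUTE) return 'attribute';
    return 'child';
  }
}

const outerDiv = new Token(ROOT | LITERAL | ELEMENT);
const dynamicAttribute = new Token(ATTRIBUTE);
const literalChild = new Token(CHILD | LITERAL);

console.log(outerDiv.isRoot, outerDiv.isTag, outerDiv.kind);      // true true element
console.log(dynamicAttribute.isLiteral, dynamicAttribute.kind);   // false attribute
console.log(literalChild.isTag, literalChild.kind);               // false child
```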

I would have phrased that as React mapping JSX to HTML, which makes React related to the DOM, and the JSX syntax would be seen as general-purpose. But let's not get hung up on that.

Are you referring to an update of your gist, to your later comments in this thread, or to something else?

Thanks, this is greatly appreciated!

My personal mental model (of nothing but the syntax, no semantics denoted) is something like (expressed in TypeScript declarations):

interface Token {
  __proto__: Token.prototype
  type: enum // TokenType
  isLiteral: boolean
  id: Token | symbol
}
interface Element extends Token {
  type: Token.ELEMENT
  name: string | null | object // `object` could be restricted to `Function`
  attributes: (Attribute | Interpolation)[]
  children: (string | Element | Interpolation)[]
}
interface Fragment extends Element {
  name: null
  attributes: [] // empty array
}
interface Attribute extends Token {
  type: Token.ATTRIBUTE
  name: string
  value: unknown
}
interface Interpolation extends Token {
  type: Token.DYNAMIC
  isLiteral: false
  value: unknown
}

Notes:

  • the .id could be present only on Element tokens, not on all tokens - if elements are deeply frozen and attribute.length/children.length never change on elements with the same id, attributes and child tokens could easily be identified just by position in the element
  • isLiteral denotes whether the token is static and contains no moving parts, also nested. This could also be represented as a separate bit in the type enum, i.e. having enum TokenType = ELEMENT | LITERAL_ELEMENT | ATTRIBUTE | LITERAL_ATTRIBUTE | DYNAMIC.
  • the Element.type could be split up into three, i.e. the distinction between fragments, string-named elements and component elements, not just doing that based on the .name value
  • Element.attributes and Element.children could be optional properties. A Fragment would have no attributes at all. <El></El> could be distinguished from <El /> by the presence/absence of the children array (i.e. get isSelfClosing() { return this.children === undefined; }), but that seems unnecessary.
  • attr="value" would be distinguished from attr={true && "value"} only by the .isLiteral flag, not as a separate token type. This seems sufficient for all use cases?
  • attr={"value"}, attr={42}, attr={null} (containing no impure expressions) could even be treated as a literal attribute token, which is immutable and has no side effects after all. The parser could determine this statically.
  • the Interpolation interface (and its token type enum value) could be split into AttributeInterpolation (=SpreadAttributes) and ChildInterpolation. I don't think this distinction is necessary though, the meaning is derived from the usage in either .attributes or .children.
  • I wonder about spread syntax. <El {value}>{value}</El>, <El {...value}>{...value}</El> (and combinations thereof) could both be allowed, and they could be distinguished as separate types (or as a boolean isSpread flag on the Interpolation token) or be considered indistinguishable. I'd prefer the latter for simplicity.
  • Instead of having primitive strings in the .children array, there could be an interface Literal { type: Token.Literal; isLiteral: true; value: string } type to represent these children as token objects as well. This would probably be more consistent and simplify iteration.
  • not sure about standalone attributes, e.g. <Checkbox checked />. These could be either represented as literal attributes with the value undefined, or there could be an extra token type for them that has only a name. I would distinguish them from literal child tokens though, even if they're syntactically similar.

Let me start by pointing out that what I was referring to is the discussion in here, not the gist.

As a generic comment, it looks like I am trying to simplify the proposal while you are proposing even more classes and distinctions, but I think these are mostly overhead. Both udomsay and voby have explored the JSX mapping for static vs runtime parts and the conclusion is that having a token everywhere makes little sense. In the same way, template literal tags have the template as the unique, always-same identifier: none of the values and none of the chunks in the template have a unique ID, and such IDs would have no use.

const div = (
  <div>
    <p>no point in having a token for this child</p>
    <p>what matters is the ability to recognize the outer div</p>
    <p>because these are all static children that cannot be moved around</p>
  </div>
);

Following up around the unique ID:

  • it cannot be the same reference for the reason you stated
  • because symbol cannot be used as WeakMap key, a reference is preferred
  • the reference cannot be recursively the object itself because it would be different each time

In short, this is indeed the expected behavior:

console.assert(createEsxToken().id === createEsxToken().id);
// always true

If it's a raw object it can also be serialized without requiring any dance, which is a plus.
The unique identifier, in short, is really like template literal tags, nothing more, nothing less:

  • outer template are interesting as unique identifiers via their .id
  • children are never interesting as unique identifier ... unnecessary bloat and heap consumption for zero proven benefit with libraries that are already using specialized JSX transformers
  • interpolations can contain unique identifier but again, nothing new to see here:
html`<div>${html`<p></p>`}</div>`
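Here is a sketch of how that could look in practice, assuming a transformer that hoists one id object per syntactic location. The names _id1, prepare and render are hypothetical, and the token shape is reduced to the bare minimum.

```javascript
// What a transformer might hoist for:
//   function createEsxToken() { return <demo />; }
const _id1 = {}; // created once per syntactic location

function createEsxToken() {
  // a fresh token on every call, always carrying the same id reference
  return {id: _id1, name: 'demo', attributes: null, children: []};
}

// library side: once-per-template work is cached by id, just like
// tagged template libraries cache by the template strings array
const prepared = new WeakMap();
let prepareCount = 0;

function prepare(token) {
  prepareCount++; // expensive parsing or cloning setup would go here
  return {tagName: token.name};
}

function render(token) {
  let info = prepared.get(token.id);
  if (!info) prepared.set(token.id, info = prepare(token));
  return `<${info.tagName}></${info.tagName}>`;
}

const a = createEsxToken();
const b = createEsxToken();

console.log(a === b);       // false
console.log(a.id === b.id); // true
render(a);
render(b);
console.log(prepareCount);  // 1
```

Since the id is a plain object, it works as a WeakMap key and the cached work goes away together with the template.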

About immutability: it doesn't really matter, imho, but it might get in the way or make polyfills extremely slow for no real-world gain or reason.

You created a problem that doesn't exist with my proposal by deciding every node should have an id.
The parsing happens only on outer tokens or, if present, within interpolations.
From an application perspective, you always start from an outer id; there's no way your render will receive a child instead unless you explicitly pass a child, but again I don't see any real-world application for that, so I think we should not over-engineer this proposal.

React.createElement to me makes it even more coupled with the DOM, as well as React.Fragment, but fair enough, it's a React choice, although that's how every JSX based library works: through globally defined callbacks that are 100% absent and not needed in ESX.

I see a few issues there, let me try to address these:

The Token should not have an ID because the only relevant IDs are those of the outer templates, not of their content.

In my latest proposal, I argue that only elements or fragments can have IDs. IDs are practically irrelevant for Components, and surely irrelevant for Attributes: their index in the array already works as identifier because attributes are static and their position won't ever change over time. The same goes for children, which is also why spread operations with children make no sense: children are always at the same position, but an interpolation can be an Array. With attributes though, we don't spread an array, we spread only properties, and yet that spread is well defined at the related index.

<div a="a" b={...spread} c={123} />

This results in 3 attributes where the first one is static and the second and third are dynamic, but their position in the template will never ever change.
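A sketch of what a library could do with such a list, assuming plain {type, name, value} entries and a '#interpolation' name for the spread. STATIC, DYNAMIC, resolve and dynamicIndexes are hypothetical names used only here.

```javascript
const STATIC = 0, DYNAMIC = 1; // hypothetical type values

const attribute = (type, name, value) => ({type, name, value});
const interpolation = value => ({type: DYNAMIC, name: '#interpolation', value});

// the three attributes of the example above:
// a static `a`, the spread, and a dynamic `c`
const attributesFor = spread => [
  attribute(STATIC, 'a', 'a'),
  interpolation(spread),
  attribute(DYNAMIC, 'c', 123)
];

// merge in source order, so the spread applies exactly at its own index
const resolve = attributes => {
  const result = {};
  for (const {name, value} of attributes) {
    if (name === '#interpolation') Object.assign(result, value);
    else result[name] = value;
  }
  return result;
};

// positions never change, so these indexes can be computed once per template
const dynamicIndexes = attributes =>
  attributes.flatMap(({type}, i) => type === DYNAMIC ? [i] : []);

console.log(resolve(attributesFor({a: 'overridden', b: 'b'})));
// { a: 'overridden', b: 'b', c: 123 }
console.log(dynamicIndexes(attributesFor({})));
// [ 1, 2 ]
```

Note that 123 stays the number 123 in the resolved object; nothing here turns values into strings.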

<div a="a" b={...spread} c={123}>
  <p />
  {anyValueButNoSpread}
  <p />
</div>

Same goes for children: they are always static, won't ever change position, but an interpolation can contain an array and let the library handle that case as it is already done in every JSX library out there.

Adding spread in the children means the last <p /> will be at an unknown position and even more vDOM/diffing would be needed in that case, so to me it's hostile to both performance and ergonomics.

I think this happens already but the key is that those values must be preserved as they are, even if flagged as static, meaning 42 must be the number 42 as value, not the string "42", and same goes for any other value used within that dynamic-but-not-really interpolation. This might cause confusion between static literal and dynamic-but-static literal, imho.

This is awkward to work with as component because properties cannot be forwarded.

When attributes are used in a component, zero, one, or more might end up being used in the resulting element/fragment, if any is returned, but they have no practical use, as attributes, within the component.

Components are indeed just a way to maybe forward attributes to the node they produce, or to confine reactivity / effects within these. If a component has attributes, invoking it as a function will be both awkward and completely misaligned with the current de-facto standard around JSX components, where the signature is function Component({props, as, fields}, ...children) {}.

If ESX needs to map attributes back to props to simplify migration, inevitably losing all details around their nature, I think there won't be great performance, and the heap will also be bloated for no concrete gain / reason.
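To show the difference, here is a small sketch comparing the two shapes. Both token shapes, the Greeting component, and the invoke helpers are hypothetical, and children are plain strings only for brevity.

```javascript
// a component following the de-facto signature: (props, ...children)
const Greeting = ({name}, ...children) => `Hello ${name}${children.join('')}`;

// properties stored as a ready-to-forward object
const withProperties = {
  name: '#component',
  value: Greeting,
  properties: {name: 'ESX'},
  children: ['!']
};

// same component described through a list of attribute tokens
const withAttributes = {
  name: '#component',
  value: Greeting,
  attributes: [{type: 0, name: 'name', value: 'ESX'}],
  children: ['!']
};

// direct forwarding: no intermediate object is created
const invoke = token =>
  token.value(token.properties ?? {}, ...token.children);

// attributes need to be re-mapped into a fresh object on every invocation
const invokeRemapped = token => {
  const props = {};
  for (const {name, value} of token.attributes) props[name] = value;
  return token.value(props, ...token.children);
};

console.log(invoke(withProperties));         // Hello ESX!
console.log(invokeRemapped(withAttributes)); // Hello ESX!
```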

It's also weird to have a name field that points to a function which also has a name field in it and it's not a string, so I would personally not suggest it.

Putting all things together, after all these considerations, what do you think about this representation?

interface Token {
  __proto__: Token.prototype
  type: Token.DYNAMIC | Token.STATIC
}

interface Attribute extends Token {
  name: string
  value: unknown
}

interface Interpolation extends Token {
  type: Token.DYNAMIC
  name: '#interpolation'
  value: unknown
}

interface Template extends Token {
  id: null | object | symbol
  children: (string | Element | Fragment | Component | Interpolation)[]
}

interface Fragment extends Template {
  name: '#fragment'
}

interface Element extends Template {
  name: string
  attributes: (Attribute | Interpolation)[]
}

interface Component extends Template {
  name: '#component'
  value: Function
  properties: null | object
}

This would represent all cases and solve everything already solved by the current ESX transformer, but basically it is the current transformer with the following differences:

  • the template wrap becomes unnecessary
  • a child can be a string instead of always being a token that eventually carries the string
  • components get properties forwarded as these are

I am not fully sure components should also get attributes with all details per each property, but if anyone thinks that's the case, then the following would solve them all, and properties can still be manually forwarded as an object without losing details around these.

interface Component extends Element {
  name: '#component'
  value: Function
}

The attribute as non-spread interpolation is imho not a case to consider, I mean:

<div {123} />

what does that mean?

This makes sense:

<div test />

it produces a static attribute with name test and value as boolean true, but without spreading, hence ensuring there is an object, I wouldn't know if the previous <div {123} /> case should be considered as attribute ... to me that should be a syntax error instead.