Background
I recently did an analysis of every proposal that has been submitted to TC39, because I was hoping to find commonalities about what types of proposals make it into the language vs which proposals get rejected (or, simply, do not advance). I categorized each proposal as semantic or syntactic; see the linked issue for my definitions, but in short: semantic proposals change the behavior of the language, and syntactic proposals change the behavior of the parser.
The vast majority of TC39 proposals are semantic in nature. Of the 59 proposals that have been accepted into ECMA262, 53 of them (90%) were semantic proposals; if you include all proposals that have made it to stage 2 or later, that ratio increases to 93% semantic. Only 6 of the 101 proposals that made it to stage 2 were purely syntactic in nature, and all of them have been implemented as of 2023.[1]
There are 25 other pure-syntax proposals listed in the repository. This makes pure-syntax proposals by far the least-accepted class of proposals with a 19% acceptance rate, compared with 41% for purely-semantic proposals and 47% for semantic proposals that require syntax changes. The conclusion is as self-evident as it is sensible:
TC39 does not like to make syntax-only changes to ECMAScript.
This is obvious, on consideration. The design space for any given programming language is finite. Making any change in the language means taking up syntactic, semantic, and/or conceptual real estate that can't be used for something else in the future - even more so because of the need to maintain compatibility. Put another way: once ECMAScript runs out of SyntaxErrors, once every syntax has a meaning, that's it. You can't make any more syntax changes in the language, ever.
And yet, syntax-level changes are still useful, and there is certainly a demand for them. About 13% of all proposals brought before TC39 have been purely syntactic in nature; if you include all proposals that have a syntactic element, that number climbs to about 46%. From a rough estimate, on this Discourse around 120-150% of all proposals are for a purely syntactic change to ECMAScript - one that can be represented solely at the level of the AST.
What's more, ECMAScript implementations have already implemented vendor-specific syntax extensions, fracturing the ecosystem. Deno runs a preprocessor on TypeScript code before parsing it. Console implementations treated an initial hashbang as a comment, which was an extension until 2023. Many, many open-source projects written for ECMAScript are not written in ECMAScript. The ability to interoperate with these software projects depends on the existence of transpilers whose operation is outside the scope of TC39; thus, ECMAScript on its own is unable to accurately describe the relationship of two pieces of software written in different "ECMAScript-compatible" languages. This occurs even within a single project; only the build scripts know that a ".js" file should be parsed differently than a ".jsx" file.
Proposal
This proposal describes a standardized mechanism to query parser capabilities as well as to inform the parser what mode it should use to interpret a given source text. This can be done either by the requester (import a given file as TypeScript, for example), by the source text itself (declare that the current file is written in TypeScript), or by out-of-band descriptors referencing ECMAScript-defined conversions (import maps, HTML tags, HTTP headers, file extensions, etc). It also describes a syntax for concretely and affirmatively specifying a given source text transformation, which can be used to validate the functionality of such a transforming parser or to polyfill that functionality if it is missing.
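For comparison, here is a minimal sketch of how parser capabilities can be probed today, without any standardized mechanism. It is not the mechanism this proposal describes; the helper name `supportsSyntax` is made up for illustration. It relies on the fact that `new Function(body)` parses its body immediately and throws a `SyntaxError` if the engine's parser rejects it, without ever running the code.

```javascript
// Hypothetical helper: asks the running engine whether its parser accepts
// a snippet of source text. The snippet is parsed but never executed,
// because the constructed function is never called.
function supportsSyntax(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    // Anything else (for example, a host that forbids dynamic code
    // evaluation) is not an answer about syntax, so pass it along.
    throw err;
  }
}

// Example probes: one for an accepted proposal, one for a pending proposal.
console.log(supportsSyntax("const n = 1_000;"));     // numeric separators
console.log(supportsSyntax("const y = x |> f(%);")); // pipeline operator
```

This kind of probe only covers syntax that is legal inside a function body (it cannot test `import` or `export` declarations), and it tells you nothing about what mode a file should be parsed in, which is the gap a standardized query would need to fill.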
There are a number of other proposals which also deal with syntax-level changes in ECMAScript, including some designed around accepting a different type of input to the parser:
- GitHub - tc39/proposal-json-modules: Proposal to import JSON files as modules (stage 3)
- GitHub - tc39/proposal-binary-ast: Binary AST proposal for ECMAScript
- GitHub - tc39/proposal-type-annotations: ECMAScript proposal for type syntax that is erased - Stage 1
with others describing partial syntax changes that could be accommodated with an AST transformation:
- GitHub - tc39/proposal-pipeline-operator: A proposal for adding a useful pipe operator to JavaScript.
- GitHub - tc39/proposal-export-default-from: Proposal to add `export v from "mod";` to ECMAScript.
- GitHub - tc39/proposal-object-pick-or-omit: Ergonomic Dynamic Object Restructuring.
- GitHub - tc39/proposal-negated-in-instanceof: A proposal to introduce negated in and instanceof operators to JavaScript
Each of these is already implemented, somewhere, perhaps under an experimental tag, perhaps in a transpiler. Because there is no way to describe these transformations, some engines are unable to parse code that would otherwise be valid ECMAScript. Once the transformations are defined, however, they can be used to polyfill older implementations or to prototype new features.
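To make "AST transformation" concrete, here is a rough sketch of what one such transformation could look like for the negated `in` / `instanceof` proposal. It assumes an extended parser has already produced an ESTree-style tree in which `a !in b` appears as a `BinaryExpression` with operator `"!in"`; that operator value and the function name `desugar` are assumptions for illustration, not part of ESTree or of any proposal text.

```javascript
// Hypothetical operators emitted by an extended parser, mapped to the
// standard operator they negate.
const NEGATED = new Map([
  ["!in", "in"],
  ["!instanceof", "instanceof"],
]);

// Returns a new tree in which `a !in b` becomes `!(a in b)`.
// Works on plain-data nodes only (objects, arrays, primitives) and does
// not mutate its input.
function desugar(node) {
  if (Array.isArray(node)) return node.map(desugar);
  if (node === null || typeof node !== "object") return node;

  // Copy the node, desugaring every child first.
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    out[key] = desugar(value);
  }

  if (out.type === "BinaryExpression" && NEGATED.has(out.operator)) {
    return {
      type: "UnaryExpression",
      operator: "!",
      prefix: true,
      argument: { ...out, operator: NEGATED.get(out.operator) },
    };
  }
  return out;
}

// Usage: the tree for `"key" !in obj`.
const input = {
  type: "BinaryExpression",
  operator: "!in",
  left: { type: "Literal", value: "key" },
  right: { type: "Identifier", name: "obj" },
};
const output = desugar(input);
console.log(JSON.stringify(output, null, 2));
```

The output tree uses only standard node types, so any existing code generator or engine can consume it. That is the sense in which a defined transformation doubles as a polyfill.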
Note that, depending on engine support, this mechanism could be used to inject code in previously untamperable locations, as there is no detectable difference between implementing a pipe operator or changing a URL. There are two mitigations to this:
- There are only two contexts that can affect the parser: the importing module (using an Import Attribute) and the source text itself (using a syntax to be defined herein). An attacker who can access the source text to add a parser declaration can already add whatever code they need, and the importing module has the choice to import any URL they like. Furthermore:
- Engines are not required to support polyfilling parser implementations, and they are not required to respect cross-origin import attributes (they are allowed to pretend they can't perform a given kind of conversion). This notation, when used, is expected to be consumed by whatever build process is used for these files (for typical bundler/minifier implementations) or used to control an implementation-specific parser feature like language support.
Should I continue?
[1] Trailing commas, Lifting template literal restriction, Optional catch binding, JSON superset, Numeric Separators, and Hashbang Grammar.