Class method keyword vs identifier parsing

Should this parse or not? TS says yes. Babel says no. Now I'm in a pickle...

class Foo {
  static ge\u{73} foo () {}
}

Both parsers agree that this is OK:

class Foo {
  sta\u{73}ic get foo () {}
}

I presume that TS is more correct than Babel. If I were to guess at the reasoning, it's that both "words" could prove to be either keywords or identifiers in a left-to-right reading.

Here's the opposite example, where each of the words that turned out to be keywords in the last example turns out to be an identifier in this one:

class Foo {
  static get = 'foo';

  // syntax highlighter is wrong here
  static = 'oof';
}
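As a quick sanity check of that reading, here is a small sketch that should run in any engine with class field support. The class body is the one from the example above; the helper name describeFoo is made up purely for illustration.

```javascript
class Foo {
  // `static` acts as the keyword here; `get` is an ordinary field name.
  static get = 'foo';

  // No keyword at all: this declares an instance field named "static".
  static = 'oof';
}

// Hypothetical helper: reads back both fields so the roles are visible.
function describeFoo() {
  return { staticGet: Foo.get, instanceStatic: new Foo().static };
}

console.log(describeFoo());
```

So the same two words end up as property names here, with only the first `static` playing a keyword role.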

On the other hand the TS parser allows:

foo bar;

which it interprets as two expression statements on one line!?

For fun I tested the same behaviors in OXC and Rust. They're both permissive about parsing escapes in keywords, but then they throw errors that say that escapes in keywords are not permitted.

See: Playground | Biome

This is not the venue for questions about TypeScript or Babel, but the ECMA-262 lexical grammar specifies equivalence between Unicode escape sequences in |IdentifierName| tokens and the code points they represent and also requires literal terminal symbols to be free of such escapes (which specifically covers keywords).

So per |ClassElement| and |MethodDefinition| respectively, neither class Foo { sta\u{74}ic get foo () {} } nor class Foo { static ge\u{74} foo () {} } is a valid class declaration (and note the correction of \u{73} to \u{74} for escaping t), and AFAICT all ECMA-262 implementations conform with the specification in rejecting such input.
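One way to check this against whatever engine is at hand is the rough sketch below. The helper name parses and the samples list are illustrative assumptions, and it relies on the Function constructor being available (it is in Node.js, but may be blocked by a content security policy in a browser).

```javascript
// Returns true if the host engine accepts `source` as a function body and
// false if it raises a SyntaxError. Nothing in `source` is executed: the
// Function constructor only parses the body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// String.raw keeps the backslashes, so the engine sees the escape itself
// rather than an already-decoded "t".
const samples = [
  String.raw`class Foo { static get foo () {} }`,
  String.raw`class Foo { sta\u{74}ic get foo () {} }`,
  String.raw`class Foo { static ge\u{74} foo () {} }`,
];

for (const source of samples) {
  console.log(parses(source) ? "accepted" : "rejected", source);
}
```

A conforming engine should accept only the first sample and reject the two that escape a character of static or get.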

OK, that makes sense, and the relevant parts of the spec are much appreciated.

It makes parsing even harder though! You basically have to let identifiers stand in as keywords and then go back and throw away any results where they did...

I think you're being a little imprecise in your use of "parsing", and might benefit from re-reading Syntactic and Lexical Grammars. But to summarize, the lexical grammar is defined to convert a sequence of code points into a sequence of "input elements" (each of which is an |IdentifierName| if and only if it starts with $, _, a Unicode code point with property "ID_Start", or a \u… escape sequence), and the syntactic grammar is defined to convert a sequence of input elements into a parse tree. Some productions of the syntactic grammar reference keywords, which are |IdentifierName| tokens with specific literal contents that never include Unicode escape sequences.

As a result, source text like class Foo { sta\u{74}ic get foo () {} } lexes fine but must fail to parse and there's no need to revisit anything. Similarly, source text like let els\u{65}; must also lex but fail to parse, although because of an early error rather than because of input elements that cannot be fully consumed by syntactic grammar productions (and note that implementations are specifically allowed to interleave early error detection with parsing).
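To make the "no need to revisit anything" point concrete, here is a deliberately simplified sketch of one way a hand-written lexer could carry the needed information forward. It is not how any particular parser does it: the names lexIdentifierName and matchesKeyword and the hasEscape field are hypothetical, only ASCII letters, digits, $ and _ are recognized, and decoded escapes are not validated against ID_Start/ID_Continue.

```javascript
// Lexes a single IdentifierName starting at `start`. The token records the
// decoded text and whether any Unicode escape was used to spell it.
function lexIdentifierName(source, start = 0) {
  let pos = start;
  let value = "";
  let hasEscape = false;

  while (pos < source.length) {
    // Match \u{X...} or \uXXXX at the current position.
    const escape = /^\\u\{([0-9a-fA-F]+)\}|^\\u([0-9a-fA-F]{4})/.exec(source.slice(pos));
    if (escape) {
      value += String.fromCodePoint(parseInt(escape[1] ?? escape[2], 16));
      hasEscape = true;
      pos += escape[0].length;
      continue;
    }
    const ch = source[pos];
    const isStart = /[A-Za-z$_]/.test(ch);
    const isContinue = isStart || /[0-9]/.test(ch);
    // The first character must be a start character; later ones may be digits.
    if (value === "" ? !isStart : !isContinue) break;
    value += ch;
    pos += 1;
  }

  if (value === "") return null;
  return { type: "IdentifierName", value, hasEscape, end: pos };
}

// The syntactic side: a token only counts as a keyword if it has the right
// text and was written without escapes. No backtracking is involved.
function matchesKeyword(token, keyword) {
  return token !== null && token.value === keyword && !token.hasEscape;
}

const escaped = lexIdentifierName(String.raw`sta\u{74}ic get foo () {}`);
console.log(escaped.value, matchesKeyword(escaped, "static"));
```

With this shape, the escaped spelling still lexes to an IdentifierName whose text is static, but it can never satisfy a production that requires the keyword, so the parser simply fails at that point.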

There are some cases where the specification is worded to "go back" and reïnterpret an already-constructed parse tree, and those supplemental grammars are specifically identified and make use of "must cover" processing, but none of those cases relate to escape sequences in |IdentifierName| tokens.

The difficulty in parsing ECMA-262 largely stems not from simple restrictions that allow get where ge\u{74} is disallowed, but rather from interdependence between the lexical and syntactic grammars such that lexing output is sensitive to the syntactic grammar context and thus cannot be an independent process (e.g., inside the process of constructing a parse tree, conversion of not-yet-consumed source text like /2/i into input element sequence [|RegularExpressionLiteral| /2/i] vs. [|DivPunctuator| /, |NumericLiteral| 2, |DivPunctuator| /, |IdentifierName| i] depends upon which syntactic grammar production(s) can potentially match).
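The /2/i case can be seen directly in running code. In the sketch below, the function name slashDemo and the variables a and i are made up for illustration; the only point is that the same five characters are tokenized differently depending on what precedes them.

```javascript
function slashDemo() {
  const a = 8;
  const i = 2;

  // After an operand, `/` must be a DivPunctuator:
  // this is (a / 2) / i, i.e. (8 / 2) / 2.
  const asDivision = a /2/i;

  // Where an expression is expected, the same characters form a single
  // RegularExpressionLiteral with source "2" and flag "i".
  const asRegExp = /2/i;

  return { asDivision, asRegExp };
}

const { asDivision, asRegExp } = slashDemo();
console.log(asDivision, asRegExp instanceof RegExp, asRegExp.flags);
```

A lexer that runs with no knowledge of the parser's state cannot tell these two readings apart, which is why the specification defines several lexical goal symbols selected by syntactic context.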