Lexical function declaration

dmchurch · April 26, 2024, 9:07pm

Currently, all function and class declarations are made with var scope - that is to say, they can be accessed above or below their definition [1]. This, along with var itself, means that a parser can't know what an unknown identifier refers to until it has finished parsing the entire file.

Well-written modern ECMAScript code generally eschews var, for this and other reasons, but the only alternative to var-scoped function declarations is to use function expressions instead:

function varScopedFunction() {}
let letScopedFunction = function() {}
const constScopedFunction = function() {}

This is a programmer-unfriendly syntax, and (perhaps largely because of this) it isn't often used. How about, instead, a syntax that allows specifying the binding type as part of the declaration:

let function letScopedFunction(); // forward declaration
let function letScopedFunction() {}
const function constScopedFunction() {}

let class letScopedClass;
let class letScopedClass {}
const class constScopedClass {}

While classes can't be used before their declaration, they can be referenced in earlier parts of the file, as long as those references aren't immediately-evaluated. The let and const forms of the class (and function) declarations, on the other hand, cannot be referenced in earlier parts of the source file, which is why the forward-declared forms (with no function/class body or class heritage clause, and ending with a semicolon) exist. There is no forward-declared form of the const declaration, as there is no forward-declared form of const variables, either.
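For anyone who wants to see the "lexically above, temporally after" distinction in today's ECMAScript, here is a minimal sketch. The names makePoint and Point are made up for illustration and don't come from the proposal.

```javascript
// Lexically above the class, but the body is not evaluated yet: this parses fine.
function makePoint() {
  return new Point(1, 2);
}

let earlyError = null;
try {
  makePoint(); // temporally before the declaration: Point is still in its TDZ
} catch (err) {
  earlyError = err;
}

class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
}

const point = makePoint(); // temporally after the declaration: works

console.log(earlyError instanceof ReferenceError); // true
console.log(point.x, point.y); // 1 2
```

A parser reading makePoint has no way to know what Point is until it reaches the class further down, which is the situation the forward-declared forms are meant to remove.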

This syntax offers only slight utility to the developer (it allows declaring functions and classes that can't be reassigned, without assigning a function expression to a const variable), but it offers a significant advantage to a parser: these forms, like let and const variable declarations themselves, can be differentiated from classic var-scoped function declarations as early as scan time, even before parse time. This opens the door to a possible fast-path parsing mechanism that, like asm.js, utilizes a syntactically-restricted form of ECMAScript. It might be declared as follows:

// As part of directive prologue, one per statement according to spec
"use strict"; // should be first in case an engine doesn't follow spec
"use_forward_declarations";
// As a Parser Augmentation directive, multiple per statement, order-agnostic
syntax "strict", "forward_declarations";
// As a PA directive, explicitly marking "forward_declarations" as optional
syntax "strict", from ["forward_declarations", null];

An engine might see the forward_declarations directive and switch to a parse mode that makes early judgments about identifiers found in the source text, speeding up parsing considerably; if it then encounters a var declaration or a var-scoped function or class, or if it encounters an import statement anywhere after the module prologue, it could throw a SyntaxError or perhaps restart the parse in non-forward-declarations mode. This can't be done today partly because (a) there is currently no syntax for forward declarations of function or class bindings, and (b) the amount of work the parser would have to do to verify that a forward_declarations directive is accurate would negate any benefits from being able to do one-shot parsing.
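To make "early judgments about identifiers" a bit more concrete, here is a rough sketch of the one-pass idea. It is an assumption-laden toy, not how any engine works: resolveInOnePass and the declare/reference event format are hypothetical stand-ins for scanner output, and nested scopes are ignored entirely.

```javascript
// Hypothetical event stream standing in for scanner output. A real engine
// would work on tokens and track nested scopes, which this sketch ignores.
function resolveInOnePass(events) {
  const known = new Set();
  for (const { kind, name } of events) {
    if (kind === "declare") {
      known.add(name);
    } else if (!known.has(name)) {
      // In a forward-declarations mode this can be reported immediately,
      // without scanning the rest of the file for a later declaration.
      throw new SyntaxError(`'${name}' is referenced before any declaration`);
    }
  }
  return known;
}

// With a forward declaration, the reference resolves on first sight.
const withForward = [
  { kind: "declare", name: "helper" },   // let function helper();
  { kind: "reference", name: "helper" }, // helper()
  { kind: "declare", name: "helper" },   // let function helper() {}
];

// Without one, the pass has to give up (or restart as a normal parse).
const withoutForward = [
  { kind: "reference", name: "helper" },
  { kind: "declare", name: "helper" },
];

console.log(resolveInOnePass(withForward).has("helper")); // true

try {
  resolveInOnePass(withoutForward);
} catch (err) {
  console.log(err.name); // SyntaxError
}
```

The point of the sketch is only that the failure is detected at the first unknown reference, so the directive never has to be verified by a separate pass over the file.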

Even if programmers don't use these forms directly, a smart transpiler could convert bare function and class declarations to their let-scoped variants when set to a target environment with a high-enough language version to support them, inserting forward declarations wherever necessary. In this way, it would provide a middle ground between existing ECMAScript (requires out-of-order parsing) and a full-on Binary AST implementation (cannot be parsed at all except by engines with explicit BinAST support).

[1] A class cannot be accessed before its definition, temporally speaking, but it can be accessed above its definition, lexically speaking.

ljharb · April 27, 2024, 12:21am

The term for this is "hoisting" (i've never heard "forward declaration" before). I don't think we'd want any new pragmas for anything, so it'd have to be a modifier on the function declaration syntax.

That said, as much as i dislike hoisting, there's eslint rules to prevent relying on it, and function expressions assigned to a const work fine, so i'm not sure it's worth new syntax.

bergus · April 27, 2024, 12:43am

No, function and class declarations are lexical declarations; they work just like let. Only in the global scope, and in sloppy mode, do they behave like var.

That would be inconsistent with how let and const work, so I'd rather suggest a different keyword to avoid confusion.

Would this really be significant? Is this currently a bottleneck?
I'd rather avoid introducing new syntax that's hardly useful for developers. If you're really concerned about performance, why make do with a "middle ground" that requires extra effort to implement and not just go the full way to Binary AST?

dmchurch · April 27, 2024, 1:54am

Haha, hoisting is kinda the opposite - I considered calling it "no_hoisting_required" or something but I couldn't come up with anything that sounded good. In any case TC39 doesn't have to worry about that, or at least certainly not right this moment - like with "use asm", engines are free to invent their own prologue directives, and I wouldn't want to try standardizing it unless and until engines actually start using it.

Also, I'm afraid you're outing yourself as a non-C/C++ developer here: the term "forward declaration" comes from languages that do require the declaration to appear in the source text before the first usage. It refers to declaring just the name of a thing prior to defining it, so the parser knows what kind of a thing it is. If you wanted to define two structs that point to each other, you'd have to forward-declare one of them:

struct bar;

struct foo {
  struct bar* ptr_to_bar;
};

struct bar {
  struct foo* ptr_to_foo;
};

It's worth noting that this isn't a new language mode like those defined in ECMA-262 11.2.2 - none of the semantics of the language change when it's present, unlike with "use strict". It's more like the "use asm" of asm.js that way - engines can use it as an optimization hint if they want, but they aren't forced to. The only difference is that "use asm" is an optional runtime optimization, while "use_forward_declarations" would be an optional parse-time optimization. They could even be used in conjunction, for a fast-parsing file that self-compiles to optimized code.

And while it's true that this wouldn't technically require new syntax, it wouldn't be something transpilers could safely do on their own, as the semantics of "function/class declared as X" and "anonymous function/class assigned to X" aren't the same, and they can't be safely translated into each other. Making code compatible with this hypothetical "forward declarations" mode would take special effort from every developer responsible for any code in the project and in all of its dependencies - which means that, in practice, it won't happen. The only way this can be safely done to all code is if there is some sort of forward-declaration syntax that allows the actual function/class to be declared later as a declared function/class.
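One concrete difference between the two forms, offered as an illustration and not necessarily the only one that matters here: a class declaration gets its own inner binding for its name, while an anonymous class assigned to a variable does not. The names Declared and Assigned below are made up for the example.

```javascript
class Declared {
  static create() {
    return new Declared(); // resolves to the class's own inner binding
  }
}

let Assigned = class {
  static create() {
    return new Assigned(); // resolves to the outer variable
  }
};

const originalDeclared = Declared;
const originalAssigned = Assigned;

// Rebind both outer names.
Declared = null;
Assigned = null;

// The declared class still finds itself through its inner binding.
const stillWorks = originalDeclared.create() instanceof originalDeclared;

let assignedError = null;
try {
  originalAssigned.create(); // the outer variable is now null
} catch (err) {
  assignedError = err;
}

console.log(stillWorks); // true
console.log(assignedError instanceof TypeError); // true
```

A transpiler that rewrote the first form into the second would change what this program does, which is why a mechanical rewrite isn't safe.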

I'm specifically referring here to how they are visible both above and below their declaration, like a var is. In fact they have even stronger reach than a var, since a function's entire definition is hoisted to the top of its block, while a var is undefined when referenced before its assignment.

In what way? A variable declared with let or const can't be referenced until after it's out of the TDZ, i.e. below that point in the file.
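Here's a small sketch of the three behaviors I'm contrasting, runnable in any current engine. The function name probe and the binding names are just placeholders.

```javascript
function probe() {
  const seen = {};

  seen.hoistedFunction = typeof hoisted; // whole definition is hoisted
  seen.hoistedVar = typeof early;        // binding exists, value not yet assigned

  try {
    late; // let binding is in its temporal dead zone here
    seen.letAccess = "ok";
  } catch (err) {
    seen.letAccess = err.name;
  }

  function hoisted() {}
  var early = 1;
  let late = 2;

  return seen;
}

console.log(probe());
// { hoistedFunction: 'function', hoistedVar: 'undefined', letAccess: 'ReferenceError' }
```

The function is fully usable above its declaration, the var is visible but undefined, and the let binding can't be touched at all until execution passes its declaration.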

Yes and yes: it would be significant, and it is currently a bottleneck.

The Binary AST proposal (linked above) hasn't made it out of Stage 1. Standardizing a new on-the-wire format, and requiring that it be syntax-compatible not only with today's ECMAScript but also extensible to every ECMAScript syntax that will ever be added in the future, is a difficult problem that hasn't been solved yet.

That's why I'm suggesting a syntax addition that will make it possible for parsers to be able to get many of the benefits of a pre-parsed AST without having to redefine the entire syntax of the ECMAScript language in an AST form.
