Is the API surface of the spec machine readable?

Over in "Test for javascript.builtins" (Issue #1056 in foolip/mdn-bcd-collector on GitHub), I'm trying to figure out whether there's already a data source for all of the API surface defined by ECMA-262. By this I mean something that could be turned into lists of:

  • Constructors on the global object, like Array
  • What static methods are on that constructor, like Array.isArray
  • What methods are on the prototype, like Array.prototype.forEach
  • What properties are only on instances, like length

Based on this information, it would be possible to generate feature detection tests and cross-check the data in MDN's browser-compat-data project.

I think much of it could be inferred from just the spec's headings, but if the work of turning the spec into a dataset like that has already been done, that would be nice to take a look at.
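
To make the feature detection idea above concrete, here is a minimal sketch assuming the dataset can be flattened into dotted paths starting at the global object. The `candidatePaths` list and the `isSupported` helper are hypothetical names for illustration, not part of mdn-bcd-collector or browser-compat-data.

```typescript
// Hypothetical dataset shape: dotted paths rooted at the global object.
const candidatePaths: string[] = [
  "Array",
  "Array.isArray",
  "Array.prototype.forEach",
  "Array.prototype.notARealMethod", // deliberately bogus entry
];

// Walk a dotted path from `root`; true only if every segment exists.
function isSupported(path: string, root: unknown = globalThis): boolean {
  let current: unknown = root;
  for (const key of path.split(".")) {
    if (current === null || current === undefined) return false;
    const holder = Object(current);
    if (!(key in holder)) return false;
    try {
      current = holder[key];
    } catch {
      // Some accessors throw when read off a prototype. The property exists,
      // but nothing below it can be inspected.
      current = undefined;
    }
  }
  return true;
}

for (const path of candidatePaths) {
  console.log(path, isSupported(path));
}
```

Properties that exist only on instances, like length, cannot be reached this way; a test for those would need to construct an instance first.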

I'm not aware of any existing efforts, I'm afraid. Perhaps someone else might know of one though.

I would hope it would be relatively straightforward to parse out of the specification (at least ignoring Annex B). For example, I think you can get all the non-Annex B properties of the global object by searching for "is the initial value of the \S+ property of the global object"; you can get the static properties of Array by looking at the subclauses of 23.1.2 Properties of the Array Constructor; you can get the prototype properties of Array by looking at the subclauses of 23.1.3 Properties of the Array Prototype Object; and so on.
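
As a rough sketch of the search suggested above, assuming the rendered spec text is available as a plain string: the `globalPropertyNames` helper and the sample sentences are made up for illustration (they paraphrase the spec's wording rather than quote it), and the stripping of quotes and asterisks is a guess at how the property name may be decorated.

```typescript
// Collect property names from sentences matching the suggested pattern.
function globalPropertyNames(specText: string): string[] {
  // A fresh regex per call so the /g lastIndex state is never shared.
  const pattern = /is the initial value of the (\S+) property of the global object/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(specText)) !== null) {
    // Strip decoration such as "Array" or *"Array"* around the name.
    const name = match[1].replace(/^[*"]+|[*"]+$/g, "");
    if (names.indexOf(name) === -1) names.push(name);
  }
  return names;
}

// Illustrative input only, not copied from the specification.
const sampleText = [
  'The Array constructor is the initial value of the "Array" property of the global object.',
  'The Map constructor is the initial value of the "Map" property of the global object.',
  'The Array constructor is the initial value of the "Array" property of the global object.',
].join("\n");

console.log(globalPropertyNames(sampleText));
```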

Another approach could be to use the TypeScript definitions

TypeScript provides an API for walking the AST

Thanks bakkot, I've made an initial attempt at parsing only the TOC and seeing how much can be inferred from that. Here's a snippet of what my script logged:

The Global Object
Value Properties of the Global Object
Function Properties of the Global Object
eval ( x )
PerformEval ( x, callerRealm, strictCaller, direct )
HostEnsureCanCompileStrings ( callerRealm, calleeRealm )
EvalDeclarationInstantiation ( body, varEnv, lexEnv, strict )
isFinite ( number )
isNaN ( number )
parseFloat ( string )
parseInt ( string, radix )
URI Handling Functions
URI Syntax and Semantics
Encode ( string, unescapedSet )
Decode ( string, reservedSet )
decodeURI ( encodedURI )
decodeURIComponent ( encodedURIComponent )
encodeURI ( uri )
encodeURIComponent ( uriComponent )

It looks like this will not suffice, because abstract operations and built-in functions look similar. Other than the capitalization, nothing seems to distinguish "Encode ( string, unescapedSet )" (abstract operation) from "encodeURI ( uri )" (function property of the global object).

It looks like parsing the actual prose of the spec would be necessary, as you suggest. Alternatively, heuristics could be used, with the output then corrected by hand to fix errors.
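
A minimal sketch of the capitalization heuristic mentioned above, assuming headings arrive as plain strings like those in the logged output. The `classifyHeading` function and its labels are hypothetical; the question marks signal that every result is a guess to be reviewed by hand. Constructors such as Array would be misclassified, since their headings also start with a capital letter.

```typescript
type HeadingKind = "section" | "abstract-operation?" | "function-property?";

// Guess what a TOC heading describes from its shape alone.
function classifyHeading(heading: string): HeadingKind {
  // Headings with a signature look like "name ( arg, arg )".
  const match = /^([A-Za-z][\w$.]*)\s*\(.*\)$/.exec(heading.trim());
  if (match === null) return "section";
  const first = match[1].charAt(0);
  // Capitalized names are assumed to be abstract operations.
  return first === first.toUpperCase() ? "abstract-operation?" : "function-property?";
}

const headings = [
  "URI Handling Functions",
  "Encode ( string, unescapedSet )",
  "encodeURI ( uri )",
];
for (const heading of headings) {
  console.log(classifyHeading(heading), "-", heading);
}
```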

aclaymore, do you know how the TypeScript definitions are maintained? It looks to me like not much of Array can be found in TypeScript/lib.es2019.array.d.ts (at ec77bff33226fb01f4e38b20e481f8c1fcd9e6c0, microsoft/TypeScript on GitHub) alone; as far as I can tell, nothing of Array.prototype.* is in there.

Oops, the indentation of my script output wasn't preserved. The full indented output can be seen at scraped-sections.txt · GitHub.

The TypeScript definitions are broken up so that it's possible to target only particular ECMA-262 versions. To get all the Array methods you'll need to look at all the files and combine all the Array interfaces together.

Initial starting interface for Array methods: TypeScript/lib.es5.d.ts at ec77bff33226fb01f4e38b20e481f8c1fcd9e6c0 · microsoft/TypeScript · GitHub

You can distinguish an abstract operation from a built-in by the presence of an aoid attribute in the emu-clause start tag.

Although that may change in the near future.
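
The aoid check described above could be sketched roughly as follows, assuming the spec source is available as a string and that a regex scan of start tags is good enough for a first pass (a real HTML parser would be more robust). The `scanClauses` and `readAttribute` helpers and the sample markup are illustrative assumptions, and the approach would need revisiting if the attribute changes as noted.

```typescript
interface ClauseInfo {
  id: string | null;
  aoid: string | null; // present only on abstract operations, per the post above
}

// Read one double-quoted attribute value out of a start tag's attribute text.
function readAttribute(attrs: string, name: string): string | null {
  const match = new RegExp('(?:^|\\s)' + name + '="([^"]*)"').exec(attrs);
  return match === null ? null : match[1];
}

// Find every <emu-clause ...> start tag and record its id and aoid.
function scanClauses(html: string): ClauseInfo[] {
  const startTag = /<emu-clause\b([^>]*)>/g;
  const clauses: ClauseInfo[] = [];
  let match: RegExpExecArray | null;
  while ((match = startTag.exec(html)) !== null) {
    clauses.push({
      id: readAttribute(match[1], "id"),
      aoid: readAttribute(match[1], "aoid"),
    });
  }
  return clauses;
}

// Illustrative markup only; ids and nesting are assumptions.
const sampleHtml = [
  '<emu-clause id="sec-encode" aoid="Encode"><h1>Encode ( string, unescapedSet )</h1></emu-clause>',
  '<emu-clause id="sec-encodeuri-uri"><h1>encodeURI ( uri )</h1></emu-clause>',
].join("\n");

// Clauses without an aoid are the candidates for built-ins.
const builtinCandidates = scanClauses(sampleHtml).filter((c) => c.aoid === null);
console.log(builtinCandidates);
```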