Improve sub-language support for template literals

A discussion around the shortcomings of template tags was started in this thread, which attempts to propose a native JSX syntax, and argues that template tags aren't good enough for this use case. In my mind, this is one major reason why template tags exist - to enable userland to basically embed languages within JavaScript for various reasons. The fact that we're not feeling that template tags are "good enough", and that we need to create custom syntax for JSX, tells me that there's some improvement that needs to be done in regards to template tags.

I'd like to move discussion around improving template tags to a separate thread, so that thread can just focus on native JSX syntax.

The proposal(s)

(Everything here is copy-pasted from a comment I left in the JSX thread, with only slight modifications).

Template tags allow us to effectively embed sublanguages within JavaScript, which is awesome! It helps to enable userland to build all sorts of helpful tooling, not just for XML/HTML in JavaScript, but also for CSS-in-JavaScript. I've personally used it to accept TypeScript-like syntax in a template tag and then use that for runtime type validation.

But, there are a couple of major issues with them:

  • Syntax highlighting and editor support is difficult within template tags, especially userland defined ones. This is because it can be difficult to statically analyze what tag you're using, and because there's really no way for a userland library to bundle editor tooling with their library.
  • You basically have to bundle an entire language parser with any library that supports this kind of thing, which isn't lightweight. It makes it hard to want to use these kinds of libraries when you know it's adding a fair size to your website's bundle.

This means there are three major hurdles we'll have to overcome if we really want template tags to reach their full potential (not all of which can be solved by TC39).

1. How can we associate a template tag with certain editor features, like syntax highlighting, snippets, autocomplete, etc?

One option would be to somehow register metadata with a template tag. This could be done in a special comment before the template tag function, that points to, say, a metadata JSON file, like this:

// @@TemplateTagEditorMetadata: ../editor-metadata.json
function jsx(...) { ... }

This metadata file could contain recommended plugins to install (maybe even containing different sets of plugins depending on which editor you're using). Or, it could declaratively contain instructions on how to syntax highlight content that this tag is tagging. Maybe it can also contain useful snippets, or some basic auto-completion instructions. Since the metadata json file is entirely declarative, it should be safe for an editor to automatically follow these instructions.
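To make the comment idea a bit more concrete, here is a rough sketch of how an editor or plugin might discover the link between a tag function and its metadata file. The function name `findTagMetadata` and the regex-based scan are assumptions for illustration only; a real editor would more likely walk a proper AST.

```javascript
// Hypothetical helper: scans source text for the proposed
// "// @@TemplateTagEditorMetadata: <path>" comment placed directly
// above a function declaration, and maps tag name -> metadata path.
function findTagMetadata(source) {
  const pattern =
    /\/\/\s*@@TemplateTagEditorMetadata:\s*(\S+)\s*\r?\n\s*(?:export\s+)?function\s+([A-Za-z_$][\w$]*)/g;
  const found = new Map();
  for (const match of source.matchAll(pattern)) {
    found.set(match[2], match[1]); // match[2] = function name, match[1] = path
  }
  return found;
}

const exampleSource = [
  '// @@TemplateTagEditorMetadata: ../editor-metadata.json',
  'function jsx(strings, ...values) { return strings.raw.join(""); }',
].join('\n');

const metadataByTag = findTagMetadata(exampleSource);
```

Here `metadataByTag.get('jsx')` gives back the path from the comment, which the editor would then resolve relative to the file that declares the tag.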

2. How can we make template tags more statically analyzable?

One option would be to do nothing. If a template tag is exported in one place and imported and used in another, it should be easy enough for an editor to make an attempt at following the import. Maybe by using a @@TemplateTagEditorMetadata comment, you're consenting that you're not going to try to do anything funky with the template tag that would make it hard for an editor to follow a path backwards to the declaration. (Users using the tag would likewise have to not do funky tricks with the tag, like putting it on the global scope, storing it in another object, etc).

3. Is there a way to reduce the weight of libraries that provide these sorts of template tags?

Here, I'm envisioning that a library could tell your bundling tool how it could go about optimizing template tags.

One possible example would be, perhaps, another metadata comment before your template tag definitions:

// @@TemplateTagCompiler: ../jsx-compiler.js
function jsx(templateTagParts, ...interpolatedValues) { ... }

The file the comment points to can export a function like this:

export function compile(templateTagParts) {
  return <json serializable data>;
}

When a transpiler runs, whenever it runs into a user of a jsx template tag, it'll pass all of the string pieces of that template tag into the compile() function. The compile function will parse that template tag, look for syntax or semantic errors, and then output some piece of data (e.g. an AST). The transpiler can take this final piece of data and embed it inside the original template tag function, so that the final tag definition would look something like this:

function jsx(templateTagParts, ...interpolatedValues) {
  // The transpiler auto-inserts a line that looks something like this.
  templateTagParts.dataFromCompileStep = { the json serializable data };
  // Everything else is the same. The rest of this function
  // can access and use `templateTagParts.dataFromCompileStep`
  // if it's there. If not, then the transpiler hasn't run, and it's this
  // function's job to compile the passed-in templateTagParts itself.
}
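Here's a minimal runnable sketch of the "use precompiled data if present, otherwise parse at runtime" idea. Everything in it is hypothetical: `parseTemplate` is a stand-in for a library's expensive parser, and `compile` is the imagined build-time hook. One caveat worth noting: the strings array that the engine passes to a tag function is frozen, so a transpiler couldn't literally assign a property onto it. This sketch simulates the build step by attaching the data to a plain array instead.

```javascript
// Hypothetical stand-in for an expensive sublanguage parser.
function parseTemplate(parts) {
  return { kind: 'template', holes: parts.length - 1, text: parts.join('\u0000') };
}

// Hypothetical build-time hook: returns plain JSON-serializable data.
function compile(templateTagParts) {
  return JSON.parse(JSON.stringify(parseTemplate(templateTagParts)));
}

let runtimeParses = 0;

// The tag prefers data from the compile step and only parses as a fallback.
function tag(templateTagParts, ...interpolatedValues) {
  let parsed = templateTagParts.dataFromCompileStep;
  if (parsed === undefined) {
    runtimeParses += 1;
    parsed = parseTemplate(templateTagParts);
  }
  return { parsed, values: interpolatedValues };
}

// No build step: the tag has to parse at runtime.
const atRuntime = tag`<b>${1}</b>`;

// Simulated build step: a plain (unfrozen) array carrying precompiled data.
const parts = Object.assign(['<b>', '</b>'], {
  dataFromCompileStep: compile(['<b>', '</b>']),
});
const precompiled = tag(parts, 1);
```

Both calls end up with the same parsed data, but only the first one pays for the parse at runtime.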

What I'm proposing here is two separate features. One to somehow standardize editor tooling support for third-party libraries, and one to somehow standardize tooling support to allow expensive and heavy parsing logic to be done once on a developer's machine. And, again, I acknowledge that it's probably outside of TC39's realm to handle this sort of thing. But, this would be my dream path of accomplishing the original feature request.

It is outside of TC39's realm, but it's in mine! : )

None of these improvements fix anything around the superior DX provided by ESX, the fact that every ESX (or JSX) is standard syntax and no shenanigans are allowed in between the template static chunks, plus the fact that strings are not scope aware, no first-class components and so on ... no syntax errors out of the box either, no well defined usage and correctness, strings are strings, so if template literal tags were born to fix the lack of tree definition in the language, the premises were odd to start with ... but I am sure that wasn't the case. SQL, CSS, any other string based language finds a great place in template literal tags, and I've used them for everything to date too, so I know what I am talking about ... but tree representation through a well defined syntax that's portable? Nowhere in the language, and strings can't bring that in either.

Any special comment that might be ignored by IDEs is not on par with syntax, and if this info doesn't provide validation out of the box, it's game over for DX. <div attr=${value}" /> ... see that missing quote? See that self-closing tag? Nothing template literals are good at fixing or providing as info, while ESX would throw errors right away for malformed code, and will give all details around that explicit self-closing tag.

In short, I think template literal tags shipped just fine and we should still use these for anything template literal tags are good at ... but tree syntax? They pretty much suck at that, by all means, and they have no notion about surrounding scope, so they can't provide any first-class utility/component out of the box.

The chart speaks clearly here.

It all works fine if you just turn the world on its head!

You see the current world in which tools can't help you avoid those problems because tools can't even understand the basic things that people understand when they read code. If people stop using tools that aren't even fully predicated on the idea that what they are editing is a program I think we'll see that tagged template strings can do everything that ESX can with just as much safety.

I've basically invented template literal tags based DOM solutions (hyperHTML) and explored the field since ... tools, no tools (the premises around standard VS tooling) and all I read here is that developers should use more tools to obtain what JSX does, which is a tools based DSL ... if any developer needs to install toolchains to deal with non standard template tags literal definitions, they'll say no thanks and use a well defined de-facto standard which is JSX syntax ... you mention tools for a single purpose? then go full tooling and forget the myth standards based solutions require no tools out of the box.

I really don't understand any of these conversations but it's a lost cause if anyone believes tooling is the answer, as tooling indeed has been the answer, and the answer out there is JSX, not template literal tags. Every poll and survey shows that, so I am not sure it's me turning the world on its head or "you".

I'm not creating new tools to declare things. HyperHTML looks awesome, and is exactly the kind of thing I think people might use more if tools were better at understanding template tags. But as you mention, there are no current tools that can make any sense of a pattern like that -- at least not out of the box.

Think of it this way. You can install a syntax highlighter plugin that understands hyperHTML well enough, but you have two problems.

First: someone has to install the plugin, so it won't work out of the box and it doesn't help anyone who doesn't already know that they need to install the plugin. To fix this we'd need to be able to see a template tag and understand some metadata about it, particularly the grammar embedded inside the template. To the extent that we have this kind of metadata currently it's siloed inside tools because there's just no place to put it.

Second: Syntax highlighting is nice, but it provides no real guarantees. What you really want is to demand that the code you write be parsable, even inside a template tag. Currently no tool could possibly tell you that about the contents of a template tag, because the contents of a template tag are a template -- they have pieces missing! That's OK though because the tools I'm building are template-native -- they are perfectly content to work with programs, parts of which are missing.

I've answered relevant questions to this conversation in here:

I also believe tools are not the answer for the simple reason there's no standard around how HTML or SVG or even SQL can be written in template literal tags ... anyone can bring in its own sugar, like @click in lit-html did, while ESX is about standardizing a syntax everyone can understand.

The root issue about not being able to just add your tool to the equation is this: there is no standard in how people can write strings, and the fact they can put anything in a string is actually the strength of template literal tag based solutions, but it falls short in terms of advantages compared to the de-facto JSX standard. ESX wants to bring both worlds into syntax anyone can consume and understand, same way template literal tags do already, but every library in its own special way. Fragmentation is bad for standards adoption, we all know that, which is why I don't understand this conversation, as it's focused on highlighting and tools, while the issue is the inability to have a standard way to define, and understand, tree structures in ECMAScript, and literally nothing else.

I think the point is that anything besides a plain text editor is a tool.

A plain text editor does not understand JS or C++ or whatever language, it's just a file (in these cases likely a large utf-8 encoded string).

An IDE will understand the syntax of the file, be able to parse the string, and provide highlighting, code completion etc.

The suggestion is to have these existing developer tools be better at recognizing nested languages, in this case as template literal strings inside JavaScript. In the case of ESX, if the syntax is standardized, the tools could all provide support for it in a consistent way.

The fact they're nested inside a tagged template literal does not change anything to the parsing abilities of the developer tools. It only impacts their ability to recognize the nested standardized language.

That ability is the bulk of the utility - it’s why shared coordination points in the language/platform are so critical.

@WebReflection Again I completely agree with you that all syntaxes are essentially trees of closely related grammars. I just don't think fragmentation is so bad. We live with it already, and I think there are plenty of reasons that we always will. Since I deem it impossible to eliminate fragmentation, I've decided to completely embrace it. I gain a lot of safety by using CST structures to store code -- I always know both what the concrete syntax was and what the parser understood it to mean. This lets me do things like warn the user if a parser upgrade changes the meaning of existing code.

Folks, I personally have no interest in ESX as a template tag, because it won't solve other problems I've already listed. I also have no interest in proposing more tools to developers because they are already drowning in tools. ESX could be a reality but if people are like "we have already arrays, what's the goal of proposing statically typed arrays?" I am respectfully out of the conversation, as it's clear we have different experience in the field. I've published dozens (literally) of libraries around template literal tag based solutions, and I am afraid there might be little interest for me to iterate on every place these fail compared to well defined and standardized syntax.
I'll keep an eye on this thread, but please don't expect much interaction from my side at this point, thanks.

I probably didn't communicate it clearly, but I didn't really intend for this thread to be about using template tags instead of ESX specifically (it seems that conversation ran its course here anyways). Rather, it's more intended to be about how we can improve sublanguage parsing inside of template tags, ESX/JSX-as-template-tags being just one possible use-case for this sort of thing. There's technically nothing stopping the template-tag-metadata feature request from living alongside the ESX proposal (I'm personally unsure if ESX offers enough over a standardized template tag, or userland template tags with metadata, but that's beside the point). Beefing up template tag metadata support is something I personally would like to have for other projects as well.

As I briefly mentioned, I've been putting together a library that understands TypeScript-like syntax, and can be used to validate runtime data. Like this:

function doThing(arg) {
  const validateThisThing = validator`{ x: number, y: ${Date} }`;
  // ... use validateThisThing to check `arg` ...
}

doThing({ x: 2, y: 'not a date instance' }); // Error! You gave it a bad argument!
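For anyone wondering what a tag like that might do under the hood, below is a deliberately tiny toy version. It is not the actual library and its API is made up for illustration: it only understands a flat `{ key: type, ... }` shape, where each type is either a `typeof` name (like `number`) or an interpolated class checked with `instanceof`.

```javascript
// Toy sketch only: a hypothetical, much-simplified `validator` tag.
function validator(parts, ...classes) {
  // Rejoin the static parts, marking each interpolation as $0, $1, ...
  const source = parts.reduce((acc, part, i) => acc + '$' + (i - 1) + part);
  // Strip the outer braces and split into "key: type" entries.
  const body = source.trim().replace(/^\{|\}$/g, '');
  const rules = body.split(',').map((entry) => {
    const [key, type] = entry.split(':').map((s) => s.trim());
    return { key, type };
  });

  // The returned function throws a TypeError on the first mismatch.
  return function assertMatches(value) {
    for (const { key, type } of rules) {
      const actual = value[key];
      const ok = type.startsWith('$')
        ? actual instanceof classes[Number(type.slice(1))]
        : typeof actual === type;
      if (!ok) throw new TypeError(`Property "${key}" does not match ${type}`);
    }
    return value;
  };
}

const checkPoint = validator`{ x: number, y: ${Date} }`;
const accepted = checkPoint({ x: 2, y: new Date(0) });

let rejected = false;
try {
  checkPoint({ x: 2, y: 'not a date instance' });
} catch (error) {
  rejected = error instanceof TypeError;
}
```

Even this toy has to re-parse the string pieces every time the tag runs, which hints at why a real parser for a TypeScript-like grammar gets heavy.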

There's a couple of problems with my library though.

  1. It's not a tiny library. I'm including an entire parser in there in order to parse those template strings, and that's not lightweight or fast. I've added some caching to help with the performance issues (if the validator template tag is called with the same string array multiple times, it'll lookup a cached AST-like tree instead of re-parsing it). This helps with performance, but doesn't help with the fact that this library is unsuitable for many front-end webpages due to its size. If there were just a way to pre-compile the template tags...
  2. There's no editor tooling support for anything found inside the template tag. No syntax highlighting, auto-complete, I can't use ctrl-slash to make a comment, even though my sublanguage does recognize // ... and /* ... */ comments, etc.
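The caching mentioned in the first item can be sketched roughly like this. It relies on the fact that a given tagged-template call site always passes the same strings array object to the tag, so that array can be used as a `WeakMap` key. The parser here (`parseTypeSyntax`) is a hypothetical placeholder, not the library's real one.

```javascript
const astCache = new WeakMap();
let parseCount = 0;

// Hypothetical stand-in for the expensive parser.
function parseTypeSyntax(parts) {
  parseCount += 1;
  return { source: parts.join('${}') };
}

function validator(parts, ...values) {
  // The same call site always passes the same `parts` array object,
  // so it works as a cache key.
  let ast = astCache.get(parts);
  if (ast === undefined) {
    ast = parseTypeSyntax(parts);
    astCache.set(parts, ast);
  }
  return { ast, values };
}

function makeValidator() {
  return validator`{ x: number, y: ${Date} }`;
}

const first = makeValidator();
const second = makeValidator(); // reuses the cached AST; no second parse
```

This removes the repeated parsing cost, but the parser itself still has to ship in the bundle, which is the part a pre-compile step would address.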

This is a userland library, it's not anything that should ever go on the standards track. At the same time, it would be awesome if there were some way I could provide metadata to editors and bundlers about this tag, to automatically enable editor tooling and optimizations. Right now, I really don't have any good options to do this, which severely limits my target audience to server code, or those who are ok with slightly heavier front-end pages.

I've also used CSS-in-JS tools in the past, and have loved them. But, they also suffer from similar issues where there's both performance problems with how their template tags work, and there's no easy way for them to provide good editor tooling within the sublanguage. Which is such a shame.

Perhaps I'll focus for now on the editor tooling use-case as that seems like an easier problem to tackle. I'll do it with a primary focus on syntax highlighting, while leaving room to expand to other editor tooling.

I do like the idea of leaving a // @@TemplateTagEditorMetadata comment before a function tag, as a way to link the template tag to editor-tooling metadata. Another route we could take that could perhaps be even better, would be to add a new directive (or whatever they're called) to js-doc comments, like this:

/**
 * @templateTagEditorMetadata ../editor-metadata.json
 */
function myTemplateTag(...) { ... }

I noticed that they have a github repo over here, so we could always drum up a conversation there to see what happens.

As for how we should actually codify syntax highlighting, one solution that seems to already be widely supported by many editors is the textmate grammar files (documented here). These grammars can be codified in a custom key-value grammar file they support, or in JSON (I've done it in JSON for vs-code in the past). What's nice with these grammars is that you can teach the editor how to do, for example, a line or block comment, thus teaching the editor what to do if you use a "comment-this-line" hotkey.

This seems like a good choice to use, simply because many editors already understand these kinds of grammar files, so it shouldn't be too difficult to adapt them to this new feature. That being said, I don't think this kind of grammar should necessarily be the only kind that ever gets supported. Atom, for example, supports textmate grammars, but only as a legacy feature. They prefer you to use a "tree-sitter grammar" as explained here, because it apparently provides better performance and provides extra information to help enable additional editor features. (Goodbye Atom, old pal :disappointed_relieved:).

With this in mind, the "editor metadata JSON file" could be a JSON file that simply contains key-value pairs of different types of metadata you'd like to add, textmate grammars being just one possible type of metadata. Other kinds of grammars, snippets, etc, could all be added under different keys to this metadata file, like this:

{
  "textMateGrammar": { ... },
  "treeSitterGrammar": { ... },
  "someAwesomeSnippets": [ ... ]
}

For best editor support, a programmer should at least add a "textMateGrammar" key-value pair to this file, but it's technically not required. Anyone in the community can choose to make plugins/editors that understand other keys inside this file and try to encourage the community to use their standards. It'll be a bit of a mess, since there's no one governing what can and can not go in this file, but I think an open environment like that makes for the quickest innovation in this space. (Hopefully, people try to choose unique names for these keys, like "textMateGrammar", instead of something that will easily conflict, like "snippets" or "grammar". I guess if we really wanted to, we could use ugly OID numbers or company URLs as keys, but those systems aren't very pretty).

I note that Prettier already recognizes a few common template literal names so that it can format their contents - for example. This is a little bit fragile (and I've suggested improvements, which will probably happen someday...), but basically works fine. There is no particular reason editors could not already do the same.

Realistically I don't think you're ever going to get cross-editor support - there's just too much variation in what functionality editors provide. (It's definitely not enough to just give a grammar; when editing any language with a modern editor you really expect to get functionality like renaming of variables, not mere syntax highlighting.) So I don't think there's that much point in trying to standardize syntax for "what's in this template tag". It's always going to need to be functionality provided by the editor itself, either built-in or via a plugin.

In any case, any coordination between editors would need to be provided by the editors themselves, and probably a common format for plugins would be the most straightforward solution, rather than trying to figure out a coordination point for one part of the syntax of one language. The language server protocol is already a major step in that direction.

I disagree wholeheartedly! YOU can rename a variable in a program when you know the language syntax can't you?

... No? You need to be aware of the scoping rules to do renaming, which aren't entailed by the bare syntax. (Consider Python, for example.)
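A small JavaScript illustration of the scoping point (a sketch added for clarity, with made-up names): both declarations below are spelled `count`, but they are separate bindings, so renaming one means resolving scope, not just matching text.

```javascript
const count = 10; // outer binding

function nextCount() {
  const count = 1; // a different binding that shadows the outer one
  return count + 1; // refers to the inner `count` only
}

// Renaming the outer `count` must leave the function body untouched,
// which a purely textual (syntax-only) rename would get wrong.
const result = [count, nextCount()];
```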

Though, going back to the example of the library I'm putting together, whose job will it be to create these editor plugins to provide syntax highlighting and other tooling for this library? At the moment, it would be mine. Eventually, if the library gets a big enough user-base, it's possible the community would help out in this regard, but for now, I'd have to be the one to do it if I want such tooling to exist.

Now, you're absolutely right that there really isn't any sort of standard way to create editor tooling. I can't just create a single language plugin that's portable and works with all editors (unfortunately). Which means I'd have to make a separate plugin for each editor I want to support. And then, people will have to actually install the plugin to get the extra support, which most people will likely not do (You'd only install a plugin if you're often working in a project that's often using that library. This means, most of the time you get no syntax highlighting and no other tooling support for template tag content).

I don't think now is the time to try and put together a standard for editor plugins either. (I'm hoping that something will eventually arise, just like how browser extensions used to be all-powerful, and break with every release, but eventually, a semi-standard API was put together, which some browsers now support, that made extensions a little more portable and less fragile).

So, circling back to me as a library author, what if, instead of making a slew of different plugins that people have to choose to install, I instead made a slew of different entries in the metadata file, i.e. something like this:

{
  "vscode": { ... },
  "intellij": { ... },
  ...
}

The vs-code editor can look for stuff it's capable of understanding under the "vscode" key of this config file. The "intellij" editor can look for its stuff under that entry. For me, as a library author, I'd have to provide this information in a format that each editor is capable of understanding, which is super annoying, but it's no different from what I already have to do. And, hopefully, as time goes on, some stuff can be added that can be understood by multiple editors. Like, it doesn't sound unreasonable to me to at least have a "textMateGrammar" key in there that multiple editors can be programmed to understand, since most editors are capable of understanding textMate grammars anyways. But, even this isn't necessarily standardized - an editor can exist that understands other information in this metadata config file, but doesn't understand "textMateGrammar". And, this still doesn't stop me from also providing editor plugins, if I want to add features that can't be expressed in this config file.
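As a rough sketch of that lookup, an editor could walk a list of keys it understands, in order of preference, and take the first one present in the metadata file. The function name `selectMetadata`, the preference lists, and the `source.example` scope name are all assumptions for illustration. The `textMateGrammar` value just follows the usual TextMate grammar shape (`scopeName` plus `patterns`).

```javascript
// Hypothetical: pick the first metadata entry this editor knows how to use.
function selectMetadata(metadata, understoodKeys) {
  for (const key of understoodKeys) {
    if (Object.prototype.hasOwnProperty.call(metadata, key)) {
      return { key, value: metadata[key] };
    }
  }
  return null; // nothing usable; the editor falls back to plain strings
}

// Example contents of an editor-metadata.json file, inlined as an object.
const metadata = {
  textMateGrammar: {
    scopeName: 'source.example', // hypothetical scope name
    patterns: [{ name: 'comment.line.double-slash', match: '//.*$' }],
  },
  intellij: { note: 'editor-specific settings would go here' },
};

// An editor with its own key prefers it; others fall back to the shared one.
const forIntelliJ = selectMetadata(metadata, ['intellij', 'textMateGrammar']);
const forVsCode = selectMetadata(metadata, ['vscode', 'textMateGrammar']);
const forUnknown = selectMetadata(metadata, ['someOtherEditor']);
```

Nothing here requires the keys to be standardized; each editor just ignores what it doesn't recognize.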

So, the goal isn't to try and standardize a way of providing tooling in a metadata format. I don't think that's something that can really be done at this moment, and it could hinder future innovation. Rather, the goal is to simply provide a flexible space to provide metadata that doesn't have too many restrictions, and then different editors and libraries can use that space however they see fit. And, even if, to start with, the only kind of thing that editors are able to understand from this metadata file is syntax highlighting, and you have to install a plugin to get anything else done, I'd still consider that a win.

Perhaps, I'll spell out a rough action plan for how I see that this proposal could be accomplished.

  1. We figure out how exactly we want to tie a template tag to metadata (presumably some sort of comment).
  2. We try to decide on some stuff that could be added to the config file, that most editors could be capable of supporting with minimal effort. For a minimum-viable-product, first attempt, this is probably nothing more than syntax highlighting metadata. So, something like a "textMateGrammar" key-value pair.
  3. We work with some popular editors to update them on how to understand and use these metadata files. In most cases, if not all, this probably just means we write plugins for them, and hope that the plugin will eventually be incorporated into their default JavaScript plugin.
  4. Other editors may eventually choose to jump on the band-wagon and start incorporating support for these metadata files as well. When they do so, they can choose to use and understand the "textMateGrammar" convention (assuming we settle on that one), or if they don't want to for whatever reason, they can make up their own "specialGrammar" key that they understand instead. It'll then be the burden of the library authors to additionally add a "specialGrammar" key/value according to how that editor wants it, if they wish to support that editor.
  5. Some editors may never support the metadata files. In this case, it's the burden of the library authors to provide plugins for those editors if they wish to still support them. (or, it's the burden of the users of the library if they wish to contribute).

@bakkot I don't see the distinction that you see. All tools start with syntax and build up higher level concepts from there.