Perhaps, to quickly introduce the string aspect for those interested:
According to clause 6.1.4 of the ECMAScript specification, a string is defined as the "set of all ordered sequences of zero or more 16-bit unsigned integer values".
Only if the string is in fact interpreted as text, say when logged to console, "each element in the String is treated as a UTF-16 code unit value". Most folks will have spotted the "�" character from time to time, indicating that part of a string is not printable. In fact, since strings are often constructed via APIs like fromCharCode, substring, concat and so forth, or their contents similarly inspected, it is common that the individual parts that eventually make up text are not necessarily well-formed UTF-16.
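To make the "sequence of 16-bit values" point concrete, here is a minimal sketch that inspects the code units of "𝄞" (U+1D11E) and builds a string holding only half of it. The variable names are mine, chosen for illustration.

```javascript
// "𝄞" (U+1D11E) is a single code point, but it occupies two 16-bit code units.
const clef = "\u{1D11E}";
const units = [];
for (let i = 0; i < clef.length; i++) {
  units.push(clef.charCodeAt(i).toString(16));
}
console.log(clef.length, units); // 2 [ 'd834', 'dd1e' ]

// A string may hold any 16-bit values, including a lone surrogate.
const lone = String.fromCharCode(0xd834);
console.log(lone.length); // 1

// Concatenating the two halves yields the original text again.
const rebuilt = String.fromCharCode(0xd834) + String.fromCharCode(0xdd1e);
console.log(rebuilt === clef); // true
```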
A simplified example is a string builder or buffer containing an array of strings of a fixed length, say 1024 code units, that will later be concatenated to form the final text. Here, whenever a string is sliced by a fixed length, it might split a so-called surrogate pair (see the presentation linked at the end for more info on the concepts) in half, like so:
let part1 = "𝄞".substring(0, 1);
let part2 = "𝄞".substring(1);
console.log(part1, part2); // � �
Neither of these strings is well-formed, in that the first contains the first half of a surrogate pair and the second contains the second half. This is not a problem when both strings are reassembled:
...
let text = part1 + part2;
console.log(text); // 𝄞
where now the surrogate pair has been fused and the string represents text again.
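The string builder scenario can be sketched the same way. The chunk helper below is hypothetical and only stands in for whatever fixed-length slicing a real builder would do; the chunk size of 3 is picked so that the cut lands in the middle of the surrogate pair.

```javascript
// Hypothetical helper: split a string into chunks of `size` code units.
function chunk(text, size) {
  const parts = [];
  for (let i = 0; i < text.length; i += size) {
    parts.push(text.substring(i, i + size));
  }
  return parts;
}

const original = "ab\u{1D11E}cd"; // 6 code units: a, b, D834, DD1E, c, d
const parts = chunk(original, 3); // the cut falls between D834 and DD1E

// Each chunk on its own ends or starts with a lone surrogate...
console.log(parts[0].charCodeAt(2).toString(16)); // d834
console.log(parts[1].charCodeAt(0).toString(16)); // dd1e

// ...but within a single JavaScript program, joining them is lossless.
console.log(parts.join("") === original); // true
```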
Now, what does this mean for our string builder or buffer? Since the Component Model does not allow such ill-formed strings to cross component boundaries, the string builder or buffer, say when packaged as its own module, will be broken when written in JavaScript, Java, C#, Dart, Kotlin, or any other language doing the same, as it can neither be provided with contents nor have its contents serialized as a whole without silently mutating the string data. In essence, the same thing will happen as when logging part1 and part2 above, except that it will not be limited to display: the information is already lost during processing through replacement with "�". That is, when reassembling the string beyond the boundary, the resulting string will not be "𝄞" but "��" (two individually replaced surrogate code points, since each half was transferred over a WebAssembly boundary individually).
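The loss can be simulated in plain JavaScript. The crossBoundary function below is a stand-in of my own, not an actual Component Model API: it assumes the boundary behaves like a UTF-8 round trip, which TextEncoder performs by replacing every lone surrogate with U+FFFD.

```javascript
// Assumption: the boundary behaves like a UTF-8 encode/decode round trip.
// TextEncoder replaces each lone surrogate with U+FFFD before encoding.
function crossBoundary(s) {
  return new TextDecoder().decode(new TextEncoder().encode(s));
}

const part1 = "\u{1D11E}".substring(0, 1); // lone high surrogate
const part2 = "\u{1D11E}".substring(1);    // lone low surrogate

// Transferred as a whole, the text survives.
const whole = crossBoundary(part1 + part2);
console.log(whole === "\u{1D11E}"); // true

// Transferred piece by piece, each half is replaced and cannot be fused again.
const pieces = crossBoundary(part1) + crossBoundary(part2);
console.log(pieces === "\uFFFD\uFFFD"); // true
```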
The argument to nonetheless break this is basically that while I might be technically correct, several experts "don't think this is a problem". No credible evidence for this claim has been provided in five years of discussion, and instead the discussions have degenerated into harassment and defamation of my person for insisting on proper arguments. Typically, seeing this happen over and over again should raise red flags, and one would resort to well-known and applicable precedents.
One of these places is WebIDL, featuring the following prominent "Warning!":
Here, USVString semantics is what the Component Model exclusively proposes by means of fixing its char type, whereas DOMString is a normal JavaScript string. Notably, WebIDL is very clear that "When in doubt, use DOMString." I would add: Because otherwise stuff will break.
Another such place is the Web Platform Design Principles, which are also very clear:
Here as well, the recommendation is to use DOMString, where USVString is a special case that should only be used under very specific circumstances. Given that WebAssembly wants to support many languages, and as such a "WebAssembly Component Model" is also a "Java Component Model" or an "Interact-with-JavaScript Component Model" or a "JavaScript-compiled-to-WebAssembly Component Model", it becomes clear that the special case does not apply, since applying it would make it merely an "[insert a few languages here] Component Model", whereas composing an application written in anything else becomes a hazard, like in the string builder example.
An applicable precedent is JSON, where JSON.stringify produces a DOMString. Originally, if the input to JSON.stringify contained, say, the contents of the string builder, then these contents were preserved in the resulting DOMString. This indirectly led to a problem: if the result is saved to disk, it is saved in UTF-8 encoding, breaking the contents of the string builder and again producing "�" as shown above. This issue has been addressed by the proposal Well-formed JSON.stringify, where JSON.stringify now emits escape sequences for lone surrogates so that no information can be lost in subsequent steps, and string integrity is preserved when the result is saved to disk. Practically speaking, JSON is a sophisticated manifestation of the simplified string builder example above.
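A minimal sketch of that behavior, assuming an engine that implements Well-formed JSON.stringify (part of ES2019). The UTF-8 round trip via TextEncoder and TextDecoder is my stand-in for "saving to disk and reading back".

```javascript
const loneSurrogate = "\uD834"; // first half of "𝄞"

// The lone surrogate is emitted as a six-character ASCII escape sequence.
const json = JSON.stringify(loneSurrogate);
console.log(json);        // "\ud834"
console.log(json.length); // 8 (two quotes plus \ud834)

// Stand-in for writing to disk as UTF-8 and reading back.
const utf8RoundTrip = (s) => new TextDecoder().decode(new TextEncoder().encode(s));

// The escaped form survives the round trip and parses back to the original.
console.log(JSON.parse(utf8RoundTrip(json)) === loneSurrogate); // true

// The raw string does not survive the same round trip.
console.log(utf8RoundTrip(loneSurrogate) === "\uFFFD"); // true
```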
As one can see, a lot of thought went into this over the last few decades already, with recommendations and principles being formulated, whereas the Component Model simply proposes to break all this with unsubstantiated claims, in conflict with all the evidence that has, for some reason, been systematically ignored. "I don't think this is a problem" and "this dude is impolite" summarize the foregoing discussions very well.
More such places are the language specifications or conventions of Java, C#, Dart, Kotlin, TypeScript, you name it, which all behave the same as JavaScript. All of these will be affected, in that composing an overall application from such components implies lossy strings. A presentation about the implications that I wanted to give in a Wasm CG meeting but was not (really) allowed to can be found here. Note that this link has already been linked by the ECMAScript editor above in a grossly different frame of reference.
What might be much more interesting, however, is that WASI, which the Component Model is a spin-off of, some say a trojan horse of, is eagerly establishing an entire JS-incompatible set of platform APIs. Nothing is reused. Nothing is bridged. This is what 99% of AssemblyScript's objections are about, yet the bulk has been cleverly moved out of view here with manipulative rhetoric and irrelevant accusations, which on its own violates like half of TC39's CoC I'd say, even though nobody will be penalized for this "political finesse". So I hope nobody is surprised that I am not amused by a mere continuation of the WebAssembly CG's practices.
I really wish these concerns could be adequately discussed for once (nobody has responded to AssemblyScript's objections so far), as I think this is critical so that, to phrase it in the OP's words, JavaScript is not "relegated to the back-seat, or possibly made irrelevant or inoperable".
P.S.: Some companies are already experimenting with disabling what makes JS fast, the JIT, for "super duper security". I really hope this is not somehow connected.