Perhaps, to quickly introduce the string aspect for those interested:
According to clause 6.1.4 of the ECMAScript specification, a string is defined as the "set of all ordered sequences of zero or more 16-bit unsigned integer values".
Only if the string is in fact interpreted as text, say when logged to the console, "each element in the String is treated as a UTF-16 code unit value". Most folks will have spotted the "�" character from time to time, indicating that part of a string is not printable. In fact, since strings are often constructed via APIs like concat and so forth, or their contents similarly inspected, it is common that the individual parts that eventually make up text are not necessarily well-formed UTF-16.
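To make "sequence of 16-bit values" concrete, here is a minimal sketch. The helper name isWellFormedUtf16 is hypothetical and only introduced for illustration; newer engines ship a built-in String.prototype.isWellFormed that answers the same question.

```javascript
// Hypothetical helper: returns true if every surrogate in `str` is part of a
// correctly ordered high/low pair, i.e. the string is well-formed UTF-16.
function isWellFormedUtf16(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate.
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) return false;
      i++; // skip the low surrogate that completes the pair
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate without a preceding high surrogate.
      return false;
    }
  }
  return true;
}

const clef = "\u{1D11E}"; // "𝄞", a code point outside the Basic Multilingual Plane
console.log(clef.length); // 2 (two 16-bit code units)
console.log(clef.charCodeAt(0).toString(16), clef.charCodeAt(1).toString(16)); // d834 dd1e
console.log(isWellFormedUtf16(clef)); // true
console.log(isWellFormedUtf16("\ud834")); // false (lone high surrogate)
```

Note that the lone surrogate is a perfectly valid JS string; it just is not well-formed UTF-16 on its own.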
A simplified example is a string builder or buffer containing an array of strings of a fixed length, say 1024 bytes, that will later be concatenated to form the final text. Here, whenever a string is sliced by a fixed length, it might split a so-called surrogate pair (see the presentation linked at the end for more info on the concepts) in half, like so:
let part1 = "𝄞".substring(0, 1);
let part2 = "𝄞".substring(1);
console.log(part1, part2); // � �
Neither of these strings is well-formed, in that the first contains the first half of a surrogate pair and the second contains the second half. This is not a problem when both strings are re-assembled again:
let text = part1 + part2;
console.log(text); // 𝄞
where now the surrogate pair has been fused and the string represents text again.
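The string builder scenario can be sketched in the same way. The chunk function below is a hypothetical stand-in for such a buffer, and the chunk size of 3 is chosen only so that the cut lands inside the surrogate pair:

```javascript
// Hypothetical fixed-size chunker standing in for the string builder above.
function chunk(str, size) {
  const parts = [];
  for (let i = 0; i < str.length; i += size) {
    parts.push(str.substring(i, i + size));
  }
  return parts;
}

const original = "ab\u{1D11E}cd"; // 6 code units: a, b, high, low, c, d
const parts = chunk(original, 3); // the cut falls between the two surrogates

// The first part ends with a lone high surrogate, the second starts with a
// lone low surrogate ...
console.log(parts[0].charCodeAt(2).toString(16)); // d834
console.log(parts[1].charCodeAt(0).toString(16)); // dd1e
// ... yet plain concatenation restores the original exactly.
console.log(parts.join("") === original); // true
```

As long as the parts stay sequences of code units, nothing is lost, regardless of where the cuts happen to fall.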
When the parts are instead transferred individually over a WebAssembly boundary, the same thing that happened to part1 and part2 above will happen, only now it is not limited to display: the information is already lost during processing through replacement with "�". That is, when reassembling the string beyond the boundary, the resulting string will not be "𝄞" but "��" (two individually replaced surrogate code points).
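This loss can be reproduced in plain JS by simulating such a boundary. The sketch assumes the boundary behaves like a conversion to well-formed Unicode, modeled here with TextEncoder and TextDecoder (TextEncoder replaces lone surrogates with U+FFFD when encoding to UTF-8). The function name transfer is hypothetical and not part of any real API.

```javascript
// Hypothetical stand-in for passing a string across a boundary that only
// accepts well-formed Unicode: encode to UTF-8, then decode again.
function transfer(str) {
  const bytes = new TextEncoder().encode(str); // lone surrogates become U+FFFD
  return new TextDecoder().decode(bytes);
}

const whole = "\u{1D11E}"; // "𝄞"
const part1 = whole.substring(0, 1);
const part2 = whole.substring(1);

// Transferring the complete pair is lossless.
console.log(transfer(whole) === whole); // true
console.log(transfer(part1 + part2) === whole); // true

// Transferring the halves individually and reassembling afterwards is not.
console.log(transfer(part1) + transfer(part2) === "\ufffd\ufffd"); // true
```

The order of operations is what matters: concatenating before the boundary preserves the text, concatenating after it does not.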
The argument to nonetheless break this is basically that while I might be technically correct, several experts "don't think this is a problem". No credible evidence for this claim has been provided in five years of discussion, and instead the discussions have degenerated into harassment and defamation of my person for insisting on proper arguments. Typically, seeing this happen over and over again should raise red flags, and one would resort to well-known and applicable precedents.
One of these places is WebIDL, featuring the following prominent "Warning!":
USVString semantics is what the Component Model exclusively proposes by means of fixing its char type, whereas the WebIDL guidance is: "When in doubt, use DOMString." I would add: Because otherwise stuff will break.
Another such place is the Web Platform Design Principles, which are also very clear:
Here as well, the recommendation is to use DOMString.
An applicable precedent is JSON, where JSON.stringify produces a DOMString. Originally, if the input to JSON.stringify contained, say, the contents of the string builder, then these contents were preserved in the resulting DOMString. This indirectly led to a problem: if the result is saved to disk, it is saved in UTF-8 encoding, breaking the contents of the string builder and producing mojibake as shown above again. This issue has actually been addressed by the proposal Well-formed JSON.stringify, where JSON.stringify now uses escape sequences for lone surrogates so that no information can be lost in subsequent steps, and string integrity is preserved when the result is saved to disk. Practically speaking, JSON is a sophisticated manifestation of the simplified string builder example above.
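A short sketch of that behavior, assuming an engine that implements Well-formed JSON.stringify (part of ES2019). The roundTrip helper is hypothetical and merely stands in for "save to disk as UTF-8 and read back":

```javascript
const whole = "\u{1D11E}"; // "𝄞"
const part1 = whole.substring(0, 1);
const part2 = whole.substring(1);

// Lone surrogates are emitted as ASCII escape sequences.
const json1 = JSON.stringify(part1);
const json2 = JSON.stringify(part2);
console.log(json1, json2); // "\ud834" "\udd1e"

// Hypothetical stand-in for saving to disk as UTF-8 and reading back.
const roundTrip = (s) => new TextDecoder().decode(new TextEncoder().encode(s));

// The escaped form survives UTF-8 encoding, so the halves can still be fused.
const restored = JSON.parse(roundTrip(json1)) + JSON.parse(roundTrip(json2));
console.log(restored === whole); // true
```

The escaping defers interpretation of the code units until after the parts have been put back together, which is exactly what the plain string builder relies on.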
As one can see, a lot of thought went into this over the last few decades already, with recommendations and principles being formulated, whereas the Component Model simply proposes to break all this with unsubstantiated claims, in conflict with all the evidence that has, for some reason, been systematically ignored. "I don't think this is a problem" and "this dude is impolite" summarizes foregoing discussions very well.
What might be much more interesting, however, is that WASI, that the Component Model is a spin-off of, some say a trojan horse of, is eagerly establishing an entire JS-incompatible set of platform APIs. Nothing is reused. Nothing is bridged. This is what 99% of AssemblyScript's objections is about, yet the bulk has been cleverly moved out of view here with manipulative rhetoric and irrelevant accusations, that on its own violates like half of TC39's CoC I'd say, even though nobody will be penalized for this "political finesse". So I hope nobody is surprised that I am not amused by a mere continuation of the WebAssembly CG's practices.
P.S.: Some companies are already experimenting with disabling what makes JS fast, the JIT, for "super duper security". I really hope this is not somehow connected.