JSON.isWellFormed(str: string): boolean

Introduction

This proposal introduces a new method to the JSON namespace:

JSON.isWellFormed(str: string): boolean

The method returns true if and only if the input string does not contain characters that would require escaping when serialized via JSON.stringify.

Motivation

In many applications, developers want to know in advance whether a string will produce a "clean" JSON representation, without escape sequences.

Examples:

  • Optimizing serialization performance.
  • Ensuring human-readable JSON outputs.
  • Validating user input before sending data over the network or storing in a database.

A dedicated JSON.isWellFormed method would provide a fast, standardized way to check this.

Definition

A string is considered well-formed for JSON if it does not contain:

  1. ASCII control characters: U+0000U+001F
  2. Quotation mark (", U+0022)
  3. Reverse solidus (backslash, \, U+005C)
  4. Unpaired UTF-16 surrogates: U+D800U+DFFF

Examples

JSON.isWellFormed("hello world");
// → true

JSON.isWellFormed("hello \"world\"");
// → false (contains quotation mark)

JSON.isWellFormed("line \n break");
// → false (contains control char)

JSON.isWellFormed("emoji 😀");
// → true

JSON.isWellFormed("\uD800"); 
// → false (lone surrogate)

Possible Specification (Informal)

function isWellFormed(str) {
  for (let i = 0; i < str.length; i++) {
    const code = str.codePointAt(i);

    // ASCII control characters
    if (code <= 0x1F) return false;

    // " or \
    if (code === 0x22 || code === 0x5C) return false;

    // Unpaired surrogate
    if (code >= 0xD800 && code <= 0xDFFF) return false;

    if (code > 0xFFFF) i++; // skip surrogate pair
  }
  return true;
}

Prior Art

  • String.prototype.isWellFormed and String.prototype.toWellFormed (added in ES2024) ensure valid Unicode strings but are unrelated to JSON escaping.
  • This proposal specifically targets JSON serialization safety.
1 Like

would you be able to share links to code that is checking this?

In my experience most code is using JSON as a serialisation format, so they only care that you’ll get the same value back when parsing the JSON. Not if it needs to include escape characters to do so.

1 Like

see fast-json-stringify/lib/serializer.js at cc14cc8abe958274cacca05e8ab8e3a8fd0410b8 · fastify/fast-json-stringify · GitHub

"fastify" check those chars for each response because it use a built in json-stringify (GitHub - fastify/fast-json-stringify: 2x faster than JSON.stringify())

1 Like

Thanks! So it’s common for custom JSON encoder libraries, rather than the top level application itself?

It’s not very common in everyday usage, but it can be extremely useful in critical high-traffic scenarios. JSON.stringify is relatively slower, and custom stringifiers can provide a real performance boost.

see:

custom stringifiers need something like JSON.isWellFormed()

Actually it uses regex

but regex are slow!

1 Like

I think you’re reversing cause and effect here. The restrictive JSON stringifiers are faster because they model output on a schema (either official JSON Schema(tm) or something proprietary) and do not support the entire JSON grammar.

"Well-formed" is thus a misnomer and even the authors of those packages wouldn’t call their happy path "well-formed," it’s just simple enough for them to use all-userland code.

Like you wouldn’t call a CSV "well-formed" if it doesn’t have escaped delimiters and dquotes, would you? You might call it "simple" or "naive" though.

1 Like

What makes you think that your proposed method would be any faster? And why do you think regexps are slow?

2 Likes

About performance, similar to String.prototype.isWellFormed() (docs):

isWellFormed() is more efficient, as engines can directly access the internal representation of strings.

In the same way, JSON.isWellFormed could access the internal representation of strings and thus be more efficient than using regular expressions.

eg. in V8 src/json/json-stringifier.cc - v8/v8 - Git at Google the efficiency comes from vectorized word-level scanning, with a fallback to per-character checks when needed. This is what should be exposed as JSON.isWellFormed