Update JSON Spec

Hi,

Some years ago I thought, "Sooner or later Protocol Buffers will win. It
is better, it supports more data types, is faster, ..."

Now I think it won't happen.

Nevertheless, JSON lacks support for some basic data types:
Binary, Datetime, Timedelta.

Is there already work in progress to update the JSON spec?

The JSON APIs have an extension mechanism: replacers and revivers. Does that not suffice for these types?
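For readers who have not used the mechanism, here is a minimal sketch of a replacer/reviver pair that round-trips a Date. The `$date` tag and the function names are assumptions made for this illustration; nothing in the JSON spec or the ECMAScript API prescribes them. One detail worth noting: `Date.prototype.toJSON` runs before the replacer is called, so the replacer has to read the original value from the holder object (`this`).

```javascript
// Replacer: wraps Date values in a tagged object. "$date" is a hypothetical tag.
// By the time the replacer runs, `value` is already the toJSON() string,
// so the original Date is read from the holder object (`this`).
function dateReplacer(key, value) {
  const raw = this[key];
  if (raw instanceof Date) {
    return { $date: raw.toISOString() };
  }
  return value;
}

// Reviver: turns tagged objects back into Date instances.
function dateReviver(key, value) {
  if (value !== null && typeof value === "object" && typeof value.$date === "string") {
    return new Date(value.$date);
  }
  return value;
}

const message = { id: 7, created: new Date("1985-04-12T23:20:50.520Z") };
const wire = JSON.stringify(message, dateReplacer);
const restored = JSON.parse(wire, dateReviver);

console.log(wire);
console.log(restored.created instanceof Date);
```

Both sides must agree on the tag out of band, and any incoming object that happens to carry a `$date` key will be revived as a Date, which is the kind of trust issue raised in this thread.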

Changing JSON to, by default, deserialize more types of values might be problematic.
There're already problems where programs naively trust objects constructed via JSON.parse from attacker-controlled strings.

Maybe some standard way to chain replacers/revivers would help, if there isn't such a thing already.
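As far as I know there is no built-in way to chain them, but composition is easy to sketch by hand. The helper name `chainRevivers` and the `$date` / `$bytes` tags below are hypothetical, chosen only for this example.

```javascript
// Compose revivers left to right; each one receives the previous one's result.
// A regular function (not an arrow) is returned so that `this`, the holder
// object supplied by JSON.parse, is forwarded to every step.
function chainRevivers(...revivers) {
  return function (key, value) {
    return revivers.reduce((current, revive) => revive.call(this, key, current), value);
  };
}

// Two independent, hypothetical revivers.
const reviveDate = (key, value) =>
  value !== null && typeof value === "object" && typeof value.$date === "string"
    ? new Date(value.$date)
    : value;

const reviveBytes = (key, value) =>
  value !== null && typeof value === "object" && Array.isArray(value.$bytes)
    ? Uint8Array.from(value.$bytes)
    : value;

const revive = chainRevivers(reviveDate, reviveBytes);
const parsed = JSON.parse(
  '{"at":{"$date":"2020-01-01T00:00:00.000Z"},"blob":{"$bytes":[1,2,3]}}',
  revive
);

console.log(parsed.at instanceof Date, parsed.blob instanceof Uint8Array);
```

Order matters in such a chain: a later reviver sees whatever an earlier one returned, so two revivers that claim the same shape can interfere with each other.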

You mentioned replacers and revivers. This is a valid workaround. There are a thousand ways to work around the root cause. In other words: it is not a solution.

You say "There're already problems where programs naively trust objects constructed via JSON.parse"

You can't blame the spec if some implementations are insecure.

Support for datetime, timedelta and binary in JSON would help a lot of people.

I know this will take years. But where and how to start?

As the current editor of the JSON spec, I think I can speak to this. Unfortunately, I suspect you are not likely to be happy with the answer. The short version is: the design of JSON is frozen and will never change. The phrasing of the specification itself may change if somebody discovers ambiguities or some interpretation of the existing spec language that is inconsistent with the design of the JSON format, but that format itself is fixed. This unquestionably has its plusses and minuses. One of the big minuses, as you observe, is that some commonly used data types have no defined representation, which is a genuine source of frustration and implementation difficulty. Based on what we have learned in the past 20 years, if we had it to do over again it's likely we'd have done a few things differently (goodness knows there are things I'd like to change!). However, at this stage changing JSON would break the world, so we just can't.

JSON's bedrock immutability is part and parcel of its value as a data interchange format, because it helps protect us from one of the Internet's nastiest sources of instability, version shear: what happens when software written to one version of a data spec tries to read data written by software written to a different version of that spec. Higher level data representations can have rules for managing backwards compatibility and higher level protocols can have means of negotiating a common understanding between communicants, but being able to do those things rests on having some primitive, underlying layer of stability that can be reliably assumed by all parties. JSON is intended to be such a layer. JSON's fundamental simplicity and stability is a large part of why it displaced XML for a huge swath of applications, despite the fact that XML actually had support for all those things you cited even before JSON came on the scene.

Arguably it might be time to consider creating a new generation fundamental data representation standard, but this would be a new and different standard from JSON. I have some opinions about what that might look like, but I'll save them for a different discussion in the interest of maintaining the focus of this thread. Certainly if you have ideas for such a representation (and the time and energy to devote, no trivial thing!) it could be a worthy cause.

Thank you very much for your answer. I feel honored that you, the current editor of the JSON spec, replied to me.

Yes, JSON is fixed. I accept this now.

If I understood you correctly, you are open to starting a successor. Great news!

I think a successor should be like JSON if you don't use additional data types. But it should give you the ability to extend it. If sender and receiver agree on a particular extension, then it should be easy to implement it.

Since you are the current editor, you have far more detailed knowledge than I have. I am curious: what are you missing from the current spec? Could you please tell us what you would like to add? At this moment, I don't care about the syntax. Explaining your goal (the semantics) would be helpful.

The biggest thing that annoys me about JSON as it exists now is that property names must be quoted strings. In the beginning this was driven by the fact that we were parsing with eval() (preflighted with a regex for security -- remember, all this was prior to the JSON object and JSON.parse() existing), and in ES3 JavaScript reserved words couldn't be used as property names in object literals even though that's since been fixed. The alternatives were to require quoted strings always be used or to put the list of JS reserved words into the JSON spec; to help the design go down more smoothly with potential adopters we opted for the former, despite the annoyance of having to quote things that didn't really need it. I'd argue this was the right decision for the spec at the time, even though every JSON parser I've written before or since still accepts unquoted identifiers as property names (very convenient if you are using JSON for a wire protocol and for debugging purposes want to simulate one end of a connection using Telnet and fingers on a keyboard).
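To make the quoting rule concrete, here is a small check against the built-in parser. The helper name `isValidJson` is made up for this sketch; it only reports whether `JSON.parse` accepts the text.

```javascript
// Returns true when `text` is accepted by the built-in, spec-conforming parser.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const quotedKey = isValidJson('{"foo": 1}');      // accepted
const bareKey = isValidJson('{foo: 1}');          // rejected: the key is not a quoted string
const reservedWord = isValidJson('{"class": 1}'); // accepted: quoting sidesteps reserved words

console.log(quotedKey, bareKey, reservedWord);
```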

I think I'd also have included comments (again, my parsers always accept them), since a big JSON use case is config files and the like. The main reasons for omitting comments were grammar parsimony and avoiding something unnecessary which implementors might mess up, but we also wanted to thwart the horrible anti-pattern of putting metadata in comments.
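For the config-file use case, one common approach is to strip comments before handing the text to `JSON.parse`. The sketch below is an assumption about how that could look, not the parser described in this post; `stripJsonComments` is a hypothetical helper that removes `//` and `/* */` comments while leaving string contents alone.

```javascript
// Remove // line comments and /* block comments */ that appear outside of
// string literals, so the result can be passed to the standard JSON.parse.
function stripJsonComments(text) {
  let out = "";
  let inString = false;
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    const next = text[i + 1];
    if (inString) {
      out += ch;
      if (ch === "\\" && next !== undefined) {
        out += next; // copy the escaped character verbatim
        i += 2;
        continue;
      }
      if (ch === '"') inString = false;
      i += 1;
    } else if (ch === '"') {
      inString = true;
      out += ch;
      i += 1;
    } else if (ch === "/" && next === "/") {
      while (i < text.length && text[i] !== "\n") i += 1; // skip to end of line
    } else if (ch === "/" && next === "*") {
      const end = text.indexOf("*/", i + 2);
      if (end === -1) throw new SyntaxError("Unterminated block comment");
      out += " "; // keep the tokens on either side separated
      i = end + 2;
    } else {
      out += ch;
      i += 1;
    }
  }
  return out;
}

const sample = '{\n  // listen port\n  "port": 8080,\n  "url": "http://example.test/*not-a-comment*/", /* trailing */\n  "quote": "a \\" // b"\n}';
const config = JSON.parse(stripJsonComments(sample));

console.log(config);
```

Anything stored in a comment is discarded before parsing, so metadata cannot be smuggled through this way.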

In retrospect I think I might want to have included some kind of syntax for binary data, though this is a bit more speculative since JS itself has no such syntax. Even more speculative would be some provision for internal object pointers, to allow non-tree data structures to be encoded, but that's really playing with fire.
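The tree-only limitation is easy to observe with the standard API. In this short illustration (variable names are arbitrary), a shared reference is duplicated on the way through JSON, and a cyclic structure cannot be serialized at all.

```javascript
// Shared reference: both properties point at the same object before encoding.
const shared = { name: "leaf" };
const dag = { left: shared, right: shared };
const copy = JSON.parse(JSON.stringify(dag));
const sharingSurvives = copy.left === copy.right; // false: two separate copies

// Cycle: JSON.stringify throws a TypeError on circular structures.
const cyclic = { name: "root" };
cyclic.self = cyclic;
let cycleError = null;
try {
  JSON.stringify(cyclic);
} catch (err) {
  cycleError = err;
}

console.log(sharingSurvives, cycleError instanceof TypeError);
```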

Primitive support for things like dates and times seems quite a bit more dubious to me, since the syntax for these isn't as basic, as timeless (pardon), or as universally agreed upon as it is for, say, numbers and strings. Perhaps more generally useful would be some way to annotate values with arbitrary type tags that could be registered with the parser (i.e., something like the replacer/reviver mechanism, only usable), which could be used for things like dates but also for other stuff. Conceivably one might take ES6 template strings as a point of inspiration, but now we're definitely speculating.
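One way to picture "type tags registered with the parser" using only today's API is a small registry layered on top of replacers and revivers. Everything here is hypothetical: the `TypeRegistry` class, the `$type`/`value` envelope, and the tag names are assumptions for the sake of the sketch, not a proposal from this thread.

```javascript
// Hypothetical registry of type tags. The {"$type": tag, "value": ...}
// envelope is an assumption of this sketch, not part of JSON or any standard.
class TypeRegistry {
  constructor() {
    this.handlers = new Map();
  }

  // handler: { test(raw) -> boolean, encode(raw) -> JSON value, decode(json) -> value }
  register(tag, handler) {
    this.handlers.set(tag, handler);
    return this;
  }

  stringify(data) {
    const handlers = this.handlers;
    return JSON.stringify(data, function (key, value) {
      const raw = this[key]; // the value before any toJSON() conversion
      for (const [tag, handler] of handlers) {
        if (handler.test(raw)) return { $type: tag, value: handler.encode(raw) };
      }
      return value;
    });
  }

  parse(text) {
    const handlers = this.handlers;
    return JSON.parse(text, (key, value) => {
      if (value !== null && typeof value === "object" && typeof value.$type === "string") {
        const handler = handlers.get(value.$type);
        if (handler) return handler.decode(value.value);
      }
      return value;
    });
  }
}

const registry = new TypeRegistry()
  .register("datetime", {
    test: (raw) => raw instanceof Date,
    encode: (raw) => raw.toISOString(),
    decode: (json) => new Date(json),
  })
  .register("binary", {
    test: (raw) => raw instanceof Uint8Array,
    // Hex keeps the sketch free of platform-specific base64 helpers.
    encode: (raw) => Array.from(raw, (b) => b.toString(16).padStart(2, "0")).join(""),
    decode: (json) => Uint8Array.from(json.match(/../g) || [], (pair) => parseInt(pair, 16)),
  });

const encoded = registry.stringify({
  when: new Date("1985-04-12T23:20:50.520Z"),
  blob: new Uint8Array([0, 255, 16]),
});
const decoded = registry.parse(encoded);

console.log(encoded);
```

Unknown tags are left as plain objects rather than rejected, which is a design choice of this sketch; a stricter variant could throw instead.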

I've been contemplating championing another pass on the ES JSON object API, since we've added types to JavaScript (like BigInt) and are contemplating adding more (like BigDecimal, records, or tuples) that are entirely compatible with and consistent with the existing JSON syntax but which we now have no way of distinguishing when encoding or decoding. It's possible such an evolution might be able to accommodate some of the above (notably comments). But I think that's a separate discussion.
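The gap is visible with the current API. The following illustration (variable names are arbitrary) shows that large integer literals lose precision when parsed, that BigInt values cannot be stringified by default, and that the usual string workaround leaves the decoder unable to tell which strings were numbers.

```javascript
// 2^53 + 1 cannot be represented exactly as a double, so the parsed Number
// silently differs from the digits in the source text.
const parsedNumber = JSON.parse("9007199254740993");
const parsedDigits = String(parsedNumber); // "9007199254740992"

// BigInt values are rejected by JSON.stringify unless a replacer intervenes.
let stringifyError = null;
try {
  JSON.stringify({ total: 10n });
} catch (err) {
  stringifyError = err;
}

// Workaround: emit BigInt as a string. The type information is lost on the wire.
const asString = JSON.stringify({ total: 10n }, (key, value) =>
  typeof value === "bigint" ? value.toString() : value
);

console.log(parsedDigits, stringifyError instanceof TypeError, asString);
```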

Thank you very much for your reply.

"property names must be quoted strings": You want {foo: 1} to be the same as {"foo": 1}?

About comments: this is already on my list: https://github.com/guettli/lets-fix-json/blob/master/README.md#add-comments

Thank you for the "pointers" idea. That was new to me, but it really makes sense. This would be very cool. I added it to the "let's fix json" page.

Binary, datetime, and timedelta are on the above list, too.

About the syntax: I think a string prefix like in Python (b'...' means binary) would be a feasible solution.
For example, Datetime could use the prefix "dt". Example: dt"1985-04-12T23:20:50.52Z".

What do you think?

Hi Chip,

I added your hint to the list:

Unfortunately, the Discourse forum does not allow me to post a link to the GitHub page.

I have a question: Do you want to support this?

{foo: "bar"}

Or even not-quoted strings in the values:

{foo: bar}

Sorry about the false positives with linking, @guettli. This has been resolved for future posts.

Hi aki, thank you for resolving the "false positive spam detection" issue.

Nevertheless, this question is still open:

Hi Chip,

I have a question: Do you want to support this?

{foo: "bar"}

Or even not-quoted strings in the values:

{foo: bar}

My intent was the first (not quoting the key). Not quoting the string in the value seems like a bad idea to me.

If you are considering coming up with a new language, you might want to have a look at YAML (YAML Ain't Markup Language) or JSON5, both of which are supersets of JSON and contain some of the features discussed here.

See https://github.com/tc39/proposal-json-parse-with-source/issues/6 for discussion of idiot-proof ways to JSON.parse/JSON.stringify BigInt values in arbitrary data. Feedback on security is welcome as well.

@chip, re. "breaking the world", a new JSON version could be made trivially incompatible. Say, by mandating an equal sign as the first character, or an initial comment (although that may not fly since as you mentioned some parsers already tolerate comments).

The rest of the language could be specified to be backwards compatible, easing the transition.

IMHO, JSON should have native support for C‑style comments, just like the initial drafts did.