Sunday, April 12, 2009

JSON vs XML: confessions of a JSON advocate

First of all: before getting into the issue here, let me just say that this hurts me more than it hurts you (not to worry, there's no spanking for anyone). This is because I am about to confess some misgivings I have about my favorite data format, JSON.

So, here I am: a fan of JSON as a data format. The problem is not that I don't like what JSON is and has. The problem is with the things it does not have. And I am almost ashamed to admit it, but many of them are found in -- GASP! -- XML. Yes, it is a bit sacrilegious to admit this. But it is true: while nothing that is in JSON is bad per se, there are things omitted that should not have been.

So here's my brief post-Festivus Airing of Grievances regarding JSON.

1. Comments are not really optional for a textual data format!

Enough said. Every textual data format should have a way to embed human-readable unstructured notes, injectable by humans as well as by systems transforming or generating content. I often use XML comments to record the time when a document was generated, or to hold simple debug information. This is very handy, and harmless for automated processing, as it can (and should) just ignore such comments. And for actual processing hints there are also XML processing instructions, likewise ignorable by processors that don't care about them.
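To make that concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The document, the comment text and the "render-hint" processing instruction are all made up for illustration; the point is only that a default parse hands the application the data and nothing else.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the comment and the processing instruction
# are invented for this example.
doc = """<report>
  <!-- generated 2009-04-12 by a nightly export job -->
  <?render-hint compact?>
  <total>42</total>
</report>"""

root = ET.fromstring(doc)

# The default parser drops comments and processing instructions,
# so only the real child element is visible to the application.
child_tags = [child.tag for child in root]
total = int(root.findtext("total"))

print(child_tags, total)
```

A processor that does care about such notes can opt in to seeing them; the one above simply never has to know they were there.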

JSON almost had comments, too: an earlier (pre-RFC) version of the format actually did include them, in C and C++ styles I think (i.e. the ones that JavaScript uses).

Why they were removed is beyond me, and it is in my opinion the biggest mistake made in the specification. Comments simply should be available.
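For what it's worth, here is roughly what this looks like with a strict parser. The sketch uses Python's standard json module; the payload and the "_comment" key are assumptions made up for the example, not any kind of convention the specification blesses.

```python
import json

# The same (hypothetical) payload, once plain and once with a
# JavaScript-style comment added by hand.
plain = '{"total": 42}'
commented = """{
  // generated by a nightly export job
  "total": 42
}"""

parsed = json.loads(plain)

try:
    json.loads(commented)
    comment_accepted = True
except json.JSONDecodeError:
    # A strict parser treats the comment as a syntax error.
    comment_accepted = False

# A common workaround: smuggle the note in as ordinary data.
# "_comment" is just an illustrative key name; every consumer
# now has to know to skip it.
workaround = json.loads(
    '{"_comment": "generated by a nightly export job", "total": 42}'
)

print(parsed, comment_accepted, sorted(workaround))
```

The workaround does function, but the note has stopped being a note: it is now part of the data, which is exactly what a comment is supposed to avoid.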

2. Sometimes redundancy is useful: the case of Elements vs Attributes

(or, "Data is Lonely without Metadata")

It may be confusing to have two somewhat overlapping dimensions in XML: that of structured (nested) child elements, and that of unstructured element attributes. But there is one practical and useful way to separate the two: think of elements and their textual content as the actual data, and attributes as metadata (for the element data). This simple separation works surprisingly well, and is a useful distinction for use cases like data binding.

For example: the type of an object can be stored in a type attribute (like, say, "xsi:type"), with field values commonly stored as child elements. Or all identifiers can be stored as id attributes (like the generic "xml:id", as per the xml:id specification), separate from the data contained in elements and their textual values, yet still useful for adding references to element sub-trees.

JSON has no such facility, so any metadata has to be either in-line mixed with data, or structured as siblings. Initially this may not seem like a big deal, but it gets confusing pretty quickly in practice.
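A small sketch of the two options, using Python's standard json module. The "@" prefix, the "value" wrapper and the Point payload are hypothetical conventions picked for the example; JSON itself knows nothing about any of them, which is rather the point.

```python
import json

# Option 1: metadata in-line, mixed with the data fields.
# The "@" prefix is only a naming convention assumed here.
inline = json.loads('{"@type": "Point", "@id": "p1", "x": 1, "y": 2}')

# Option 2: metadata as siblings of a wrapper that holds the data.
# The "value" key is likewise just an assumption.
wrapped = json.loads(
    '{"type": "Point", "id": "p1", "value": {"x": 1, "y": 2}}'
)


def split_inline(obj):
    """Separate '@'-prefixed metadata from ordinary data members."""
    meta = {k[1:]: v for k, v in obj.items() if k.startswith("@")}
    data = {k: v for k, v in obj.items() if not k.startswith("@")}
    return meta, data


meta, data = split_inline(inline)
print(meta, data)
```

Either way, producer and consumer need an out-of-band agreement on which members are metadata, and a real data field that happens to be named "type" or to start with "@" will collide with it.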

So why doesn't this matter with actual (Java) Objects? Isn't JSON more "object oriented", being an object notation, not a markup language? Well, Java Objects DO have metadata that is orthogonal to data (object state, i.e. its member fields)! What else is class information than metadata, separate from the actual data? All that typing -- both class declarations and runtime Object types -- is metadata, not data; similarly for all method information. And most obviously the latest addition to class metadata, Java annotations, is pure orthogonal metadata. It is not a perfect analogy (class info is per-class, like static members and methods, whereas actual data is per-instance), but it indicates the need for a place for both data and metadata.

3. As Simple as Possible, but No Simpler

Although both of the points above could be repeated here -- as in JSON being simplified beyond what is reasonable, by omitting comments and a place for metadata -- there is more.

For example: unescaped linefeeds are not allowed within JSON String values; linefeeds must be escaped just like other control characters. This is Bad. Why are they not allowed to be included as is, given how common they are in text? I suspect it was done in an effort to make it easier to "parse" JSON, by allowing single-line regexps to work. But I don't care -- if I parse something, I do it properly. Regexps alone do not a parser make (they make a lexer, which is useful to and used by parsers, but is not a parser). Linefeeds are displayable characters just like anything else. It's quite ok to let them be used within String values: after all, they are often needed there. So why force escaping them, even though they are not used as separators?
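Here is a quick illustration with Python's standard json module. The sample text is made up; note that strict=False is an option of this particular library for accepting control characters inside strings, not something the JSON specification itself allows.

```python
import json

# JSON text with a literal linefeed inside the string value
# (the \n below becomes a real newline character in the text).
raw = '{"text": "line one\nline two"}'

# The same content with the linefeed escaped as backslash + n.
escaped = '{"text": "line one\\nline two"}'

try:
    json.loads(raw)
    raw_accepted = True
except json.JSONDecodeError:
    # Strict parsing rejects the unescaped control character.
    raw_accepted = False

from_escaped = json.loads(escaped)

# Library-specific leniency: strict=False tolerates raw control
# characters inside strings.
from_lenient = json.loads(raw, strict=False)

# Generators always emit the escaped form.
written = json.dumps({"text": "line one\nline two"})

print(raw_accepted, from_escaped == from_lenient, written)
```

So the information survives the round trip, but anyone editing the file by hand gets one long line full of \n sequences instead of readable text.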

There are also things that I think are good or at least acceptable riddances: for example, while it is often useful to have a choice of quotes in XML (single or double), I'm not crying over the loss of apostrophes. I could write a parser that handles multiple kinds of String value markers; but I can also generate content using just one kind. It does complicate hand-writing and modifying content, though.

4. Is Ordering really irrelevant?

In XML, content order is mostly significant; the only exception is attributes, which are unordered. This makes some parts of data binding more challenging, because objects usually have no concept of ordering for their properties. Because of this there are many legal, easily definable XML structures that cannot easily be mapped to (Java) objects.

But while sometimes problematic, ordering can also be valuable. For example, it is great that it is possible to guarantee that certain elements (like, say, "header") come before others (like, say, "footer"). The only conceptually correct way to do this in JSON is to use Lists (aka Arrays). But their values are anonymous, unlike those of Maps. Alternatively, it is possible for JSON processors to preserve the actual physical ordering of Map entries; the problem is that not all processors will do this, not least because the specification discourages it.
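Both approaches can be sketched with Python's standard json module. The header/body/footer payload is invented for the example, and the second half relies on this particular parser reporting members in document order, which is exactly the kind of processor-specific behavior you cannot count on elsewhere.

```python
import json

# Conceptually correct: order is carried by an array. Since array
# values are anonymous, each name is wrapped in a one-member object.
as_list = json.loads(
    '[{"header": "Title"}, {"body": "Text"}, {"footer": "End"}]'
)
names_from_list = [next(iter(item)) for item in as_list]

# Relying on physical order instead: object_pairs_hook receives the
# members of each object as (name, value) pairs in document order.
as_pairs = json.loads(
    '{"header": "Title", "body": "Text", "footer": "End"}',
    object_pairs_hook=list,
)
names_from_object = [name for name, _ in as_pairs]

print(names_from_list, names_from_object)
```

The first form is portable but clunky to produce and consume; the second reads naturally but quietly breaks as soon as the document passes through a processor that reorders Map entries.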

And the most obviously useful ordering is that metadata (attributes) always precedes data (elements). That is something you can count on; and for common types of metadata (those class types and identifiers, see above), this is a pretty optimal arrangement.

5. Other problems?

One thing of interest regarding the list above is that none of its items is a reason commonly stated by those who advocate using XML over JSON.

Conversely, I think that the most commonly used reasons are very poor excuses for arguments, usually based on a fundamental misunderstanding of the actual benefits of XML, or of good use cases for either XML or JSON. Perhaps I should collect a list of such claims to shoot them down next. :-)

About me

  • I am known as Cowtowncoder