First of all: before getting into the issue here, let me just say that
this hurts me more than it hurts you (not to worry, there's no spanking
for anyone). This is because I am about to confess some misgivings I
have about my favorite data format, JSON.
So, here I am: a fan of JSON as a data format. The problem is not that I
dislike what JSON is and has. The problem is with the things it does
not have. And I am almost ashamed to admit it, but many of them are
found in -- GASP! -- XML. Yes, it is a bit sacrilegious to admit this.
But it is true: while nothing that is in JSON is bad per se, some things
were omitted that should not have been.
So here's my brief post-Festivus Airing of Grievances regarding JSON.
1. Comments are not really optional for a textual data format!
Enough said. Every textual data format should have a way to embed
human-readable, unstructured notes, injectable by humans as well as by
systems transforming or generating content. I often use XML comments to
include information about the time when a document was generated, or to
contain simple debug information. This is very handy, and harmless for
automated processing, which can (and should) just ignore such comments.
And for actual processing hints, XML also offers processing
instructions, which are likewise ignorable by processors that don't care
about them.
The JSON format almost had comments, too: an earlier (pre-RFC) version
actually did include them, in C and C++ styles I think (i.e. the ones
that JavaScript uses).
Why they were removed is beyond me; in my opinion it is the biggest
mistake made in the specification. Comments just should be available.
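To make this concrete, here is a small sketch in Python, chosen purely for illustration. It uses only the standard-library `xml.etree.ElementTree` and `json` modules. The document contents, the `build-info` processing-instruction target and the `json_accepts` helper are all made up for the example. It shows an XML parser quietly skipping a comment and a processing instruction, while a strict JSON parser rejects the equivalent comment outright.

```python
import json
import xml.etree.ElementTree as ET

# XML: a comment and a processing instruction ride along with the data.
# ("build-info" is a made-up PI target used only for illustration.)
xml_doc = """<order>
  <!-- generated by nightly export; debug: batch=7 -->
  <?build-info version="1.2"?>
  <item>widget</item>
</order>"""

# ElementTree's default parser discards comments and PIs, so a consumer
# that does not care about them never sees them.
root = ET.fromstring(xml_doc)
print([child.tag for child in root])  # ['item']


def json_accepts(text: str) -> bool:
    """Return True if the strict stdlib JSON parser accepts `text`."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# JSON: the same note written as a JavaScript-style comment is a syntax error.
commented = """{
  /* generated by nightly export */
  "item": "widget"
}"""
print(json_accepts(commented))             # False
print(json_accepts('{"item": "widget"}'))  # True
```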
2. Sometimes Redundancy is Useful: the Case of Elements vs Attributes
(or, "Data is Lonely without Metadata")
It may be confusing to have two somewhat overlapping dimensions in XML:
that of structured (nested) child elements, and that of unstructured
element attributes. But there is one practical and useful way to
separate the two: think of elements and their textual content as actual
data, and of attributes as metadata (about the element data). This
simple separation works surprisingly well, and is a useful distinction
for use cases like data binding.
For example, the type of an object can be stored in a type attribute
(say, "xsi:type"), and field values as child elements. Or all
identifiers can be stored as id attributes (like the generic "xml:id"
defined by the xml:id specification). These are kept separate from the
data contained in child elements and their textual values, yet are
useful for adding references to element sub-trees.
JSON has no such facility, so any metadata has to be either inlined,
mixed in with the data, or structured as siblings. Initially this may
not seem like a big deal, but it gets confusing pretty quickly in
practice.
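Here is a rough Python sketch of the difference. The element and field names, the "@" key prefix and the `split_metadata` helper are all hypothetical, not an established standard. In XML the attribute/element split sorts metadata from data for free. In JSON the same separation exists only by convention.

```python
import json
import xml.etree.ElementTree as ET

# XML: metadata (type, id) lives in attributes, data in child elements.
xml_doc = '<person type="Employee" id="p1"><name>Bob</name><title>Engineer</title></person>'
elem = ET.fromstring(xml_doc)
xml_meta = dict(elem.attrib)
xml_data = {child.tag: child.text for child in elem}

# JSON: no separate channel, so metadata is mixed in with the data and told
# apart only by a naming convention. The "@" prefix is a hypothetical choice;
# JSON itself defines no such convention.
json_doc = '{"@type": "Employee", "@id": "p1", "name": "Bob", "title": "Engineer"}'


def split_metadata(obj, prefix="@"):
    """Hypothetical helper: separate prefix-marked metadata keys from data keys."""
    meta = {k[len(prefix):]: v for k, v in obj.items() if k.startswith(prefix)}
    data = {k: v for k, v in obj.items() if not k.startswith(prefix)}
    return meta, data


json_meta, json_data = split_metadata(json.loads(json_doc))
print(xml_meta == json_meta, xml_data == json_data)  # True True
```

The XML side needs no agreement beyond the format itself. The JSON side works only if every producer and consumer uses the same prefix. A genuine data field whose name happens to start with "@" would also be misclassified.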
So why isn't this an issue with actual (Java) objects? Isn't JSON more
"object oriented", being an object notation rather than a markup
language? Well, Java objects DO have metadata that is orthogonal to data
(object state, i.e. its member fields)! What else is class information
but metadata, separate from actual data? All that typing -- both class
declarations and runtime object types -- is metadata, not data; the same
goes for all method information. And most obviously, the latest addition
to class metadata, Java annotations, is pure orthogonal metadata. It is
not a perfect analogy (class information is per-class, like static
members and methods, whereas actual data is per-instance), but it does
indicate the need for a place for both data and metadata.
3. As Simple as Possible, but No Simpler
Although both of the points above could be repeated here -- as in JSON
being simplified beyond what is reasonable, by omitting comments --
there is more.
For example: unescaped linefeeds are not allowed within JSON String
values; linefeeds must be escaped (as "\n") just like other control
characters. This is Bad. Why can't they be included as is, given how
common they are in text? I suspect it was done in an effort to make it
easier to "parse" JSON by allowing single-line regexps to work. But I
don't care -- if I parse something, I do it properly. Regexps alone do
not make a parser (they make a lexer, which is useful and used by
parsers, but is not a parser). Linefeeds are displayable characters just
like anything else. It's quite OK to let them be used within String
values: after all, they are often needed there. So why force escaping
them, even though they are not used as separators?
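A quick Python sketch of the rule. Python's `json` module is just a convenient strict parser here, and `parses` is a made-up helper. The `strict=False` escape hatch is a Python-specific extension, not part of JSON itself.

```python
import json

# A JSON text whose string value contains a literal (unescaped) linefeed.
raw_lf = '{"note": "line one\nline two"}'


def parses(text: str, strict: bool = True) -> bool:
    """Return True if json.loads accepts `text` in the given mode."""
    try:
        json.loads(text, strict=strict)
        return True
    except json.JSONDecodeError:
        return False


print(parses(raw_lf))                # False: control characters must be escaped
print(parses(raw_lf, strict=False))  # True: Python-specific lenient mode

# A conforming generator therefore writes the linefeed as backslash + 'n'.
encoded = json.dumps({"note": "line one\nline two"})
print(encoded)          # {"note": "line one\nline two"}
print("\n" in encoded)  # False
```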
There are also omissions that I think are good, or at least acceptable,
riddances: for example, while it is often useful to have a choice of
quotes in XML (single or double), I'm not crying over the loss of
apostrophes. I could write a parser that handles multiple kinds of
String value markers, but I can just as well generate content using only
one kind. It does, however, complicate hand-writing and modifying
content.
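To illustrate the trade-off, here is a small Python sketch with invented sample values. XML lets the writer pick whichever quote character avoids escaping. JSON accepts only double quotes, so embedded ones must be escaped by hand.

```python
import json
import xml.etree.ElementTree as ET

# XML: each attribute can use whichever quote character avoids escaping.
xml_elem = ET.fromstring("""<q text='say "hi"' note="it's fine"/>""")
print(xml_elem.get("text"))  # say "hi"
print(xml_elem.get("note"))  # it's fine

# JSON: double quotes only, so embedded double quotes need backslashes.
json_obj = json.loads(r'''{"text": "say \"hi\"", "note": "it's fine"}''')
print(json_obj["text"])  # say "hi"

# Single-quoted strings are simply not JSON.
try:
    json.loads("{'text': 'single quotes'}")
    single_quoted_ok = True
except json.JSONDecodeError:
    single_quoted_ok = False
print(single_quoted_ok)  # False
```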
4. Is Ordering really irrelevant?
In XML content order is mostly significant; the only exception being
attributes that are unordered. This makes some parts of data binding
more challenging, because objects usually have no concept of ordering
for properties. Because of this there are many legal easily definable
XML structures that can not be easily be mapped to (Java) objects.
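Here is a tiny illustration of why ordered XML content can resist binding. It is sketched in Python, with a plain dict standing in for an object's properties, and the element names are invented.

```python
import xml.etree.ElementTree as ET

# Legal, easily defined XML: repeated and interleaved child elements.
root = ET.fromstring("<row><a>1</a><b>2</b><a>3</a></row>")

# The document itself carries an ordered sequence...
sequence = [(child.tag, child.text) for child in root]
print(sequence)  # [('a', '1'), ('b', '2'), ('a', '3')]

# ...but a naive mapping onto an unordered name -> value structure (like an
# object's fields) keeps only the last 'a' and loses the interleaving.
naive = {child.tag: child.text for child in root}
print(naive)  # {'a': '3', 'b': '2'}
```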
But while sometimes problematic, ordering can also be valuable. For
example, it is great to be able to guarantee that certain elements
(say, "header") come before others (say, "footer"). The only
conceptually correct way to do this in JSON is to use Lists (aka
Arrays). But their values are anonymous, unlike those of Maps.
Alternatively, JSON processors can preserve the actual physical
ordering; the problem is that not all processors will do this, not least
because the specification defines objects as unordered.
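Both JSON options can be sketched in Python. The single-member-object wrapping is just one hypothetical convention for naming array entries. The JSON RFCs (4627, and its successor 8259) define an object as an unordered collection of name/value pairs. Python's `json` module happens to expose member order, but that is processor-specific behavior and cannot be relied on in general.

```python
import json

# Object form: names are kept, but JSON objects are defined as unordered,
# so consumers need not keep "header" before "footer".
as_object = '{"header": "Title", "body": "Text", "footer": "End"}'

# Array form: order is guaranteed, but values are anonymous, so each entry is
# wrapped in a single-member object (one hypothetical convention of many).
as_array = '[{"header": "Title"}, {"body": "Text"}, {"footer": "End"}]'
array_order = [next(iter(entry)) for entry in json.loads(as_array)]
print(array_order)  # ['header', 'body', 'footer']

# Python's parser happens to preserve physical order, and object_pairs_hook
# exposes the raw (name, value) pairs; that is a property of this
# implementation, not a guarantee other JSON processors make.
pairs = json.loads(as_object, object_pairs_hook=lambda kv: kv)
object_order = [name for name, _ in pairs]
print(object_order)  # ['header', 'body', 'footer']
```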
And in XML, the most obviously useful ordering is that metadata
(attributes) always precedes data (elements). That is something you can
count on, and for common types of metadata (the class types and
identifiers mentioned above), this is a pretty optimal arrangement.
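Streaming parsers can rely on this. Below is a minimal Python sketch using `xml.etree.ElementTree.iterparse`, with an invented sample document. Its documentation states that attributes are defined when a "start" event is reported, while text and children may not be yet.

```python
import io
import xml.etree.ElementTree as ET

xml_doc = b'<person type="Employee" id="p1"><name>Bob</name></person>'

seen = []
for event, elem in ET.iterparse(io.BytesIO(xml_doc), events=("start",)):
    # At a "start" event the attributes are already complete, even though
    # none of the element's children or text have been read yet.
    seen.append((elem.tag, dict(elem.attrib)))

print(seen)  # [('person', {'type': 'Employee', 'id': 'p1'}), ('name', {})]
```

A streaming JSON reader, by contrast, sees a type or id member before the data only if the producer happened to write it first.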
5. Other problems?
One interesting thing about the list above is that none of these points
is commonly raised by those who advocate using XML over JSON.
Conversely, I think that the most commonly cited reasons are very poor
excuses for arguments, usually based on a fundamental misunderstanding
of the actual benefits of XML, or of good use cases for either XML or
JSON. Perhaps I should collect a list of such claims to shoot down next.
:-)