What, me like YAML? (Confessions of a JSON advocate)
Ok. I have to admit that I learnt something new and gained a bit more
respect for the YAML data format recently, while working on the
proof-of-concept for YAML-on-Jackson (jackson-dataformat-yaml;
more on this in yet another Jackson 2.0 article, soon).
And since it would be intellectually dishonest not to mention that my formerly negative view on YAML has brightened up a notch, here's my write-up on this bit of enlightenment.
1. Bad First Impressions Stick
My first look at YAML via its definition basically made my stomach turn. It just looked so much like a bad American Ice Cream: "Too Much of Everything" -- hey, if it isn't enough to have chocolate, banana and walnut, let's throw in a bit of caramel, root beer essence and a touch of balsamic vinegar; along with a bit of organic arugula to spice things up! That isn't the official motto, I thought, but it might as well be. If there is an O'Reilly book on YAML, it surely must have a platypus as the cover animal.
That was my thinking up until a few weeks ago.
2. Tale of the Two Goals
I have read most of the YAML specification (which is not badly written at all) multiple times, as well as shorter descriptions. My overall conclusion has always been that there are multiple high-level design decisions that I disagree with, and that these can mostly be summarized as follows: it tries to do too many things, and to serve multiple conflicting use cases.
But recently, when working on adding YAML support as a Jackson module (based on the nice SnakeYAML library, a solid piece of code, very unlike most parsers/generators I have seen), I realized that fundamentally there are just two conflicting goals:
- Define a Wiki-style markup for data (assuming it is easier to not only write prose in, but also data)
- Create a straightforward Object serialization data format
(it is worth noting that these goals are orthogonal, functionality-wise; but they conflict at the level of syntax and visual appearance, and complicate handling significantly, mostly because there is always "more than one way to do it" (Perl motto!))
I still think that one could solve the problem better by defining two
formats, not one: the first with a Wiki dialect, and the second with a
clean data format.
But this led me to think about something: what if those weird Wiki-style aspects were removed from YAML? Would I still dislike the format?
And I came to the conclusion that no, I would not dislike it. In fact, I might like it. A lot.
Why? Let's see which things I like in YAML; things that JSON does not have, but really, really should have in an ideal world.
3. Things that YAML has and JSON should have
Here's the quick rundown:
- Comments: oh lord, what kind of textual data format does NOT have comments? JSON is the only one I know of; and even it had them before the spec was finalized. I can only imagine a brain fart of colossal proportions caused them to be removed from the spec...
- (optional) Document start and end markers ("---" header, "..." footer). This is such a nice thing to have; both for format auto-detection purposes and as framing for data feeds. It's a bit of a no-brainer; but suspiciously, JSON has nothing of the sort (XML does have the XML declaration, which _almost_ works well, but not quite; but I digress)
- Type tags for type metadata: in YAML, one can add optional type tags to further indicate the type of an Object (or any value, actually). This is such an essential thing to have; and with JSON one must use in-band constructs that can conflict with data. XML at least has attributes ("xsi:type").
- Aliases/anchors for Object Identity (aka "id / idref"): although data is data, not objects with identity, having a means to optionally pass identity information is very, very useful. And here too XML has some support (having attributes for metadata is convenient); and JSON has nada.
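To make the four features concrete, here is a minimal sketch in plain Java (no YAML library involved). The embedded stream uses comments, document markers, a type tag and an anchor/alias pair; the tag name "!person" and all field names are made up for illustration. The splitter is a deliberately naive, line-based assumption of how framing could work, not how SnakeYAML or Jackson actually parse a stream.

```java
import java.util.ArrayList;
import java.util.List;

class YamlFeaturesSketch {
    // Two YAML documents in one stream. The "!person" tag and the field
    // names are hypothetical; they only illustrate the syntax.
    static final String STREAM = String.join("\n",
        "---",
        "# A comment: ignored by parsers, visible to humans",
        "# '&alice' is an anchor (identity), '!person' is a type tag",
        "owner: &alice !person",
        "  name: Alice",
        "# '*alice' is an alias: it refers to the same node as 'owner'",
        "reviewer: *alice",
        "...",
        "---",
        "owner: !person",
        "  name: Bob",
        "...");

    // Naive framing: a line that is exactly "---" starts a document and a
    // line that is exactly "..." ends it. Real YAML allows more (markers
    // followed by content, documents without markers), so this is only a sketch.
    static List<String> splitDocuments(String stream) {
        List<String> docs = new ArrayList<>();
        StringBuilder current = null;
        for (String line : stream.split("\n")) {
            if (line.equals("---")) {
                if (current != null) docs.add(current.toString());
                current = new StringBuilder();
            } else if (line.equals("...")) {
                if (current != null) docs.add(current.toString());
                current = null;
            } else if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) docs.add(current.toString());
        return docs;
    }

    public static void main(String[] args) {
        List<String> docs = splitDocuments(STREAM);
        System.out.println(docs.size() + " documents in the stream");
        System.out.print(docs.get(0));
    }
}
```

Note that none of the metadata touches the payload: drop the comments, the tag and the anchor, and the remaining data is unchanged.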
The common theme with the above is that all extra information is optional;
but if used, it is included discreetly and can be used as appropriate by
encoders, decoders, with or without using language- or platform-specific
And I think YAML actually defines these things pretty well: it is neither over- nor under-engineered with respect to these features. This is a surprisingly delicate balance, and very well chosen. I have seen over-complicated data formats (at Amazon, for example) that didn't know where to stop; and we can see how JSON stopped too short of even the most rudimentary things (... comments). Interestingly, XML almost sort-of has these features; but they come about with extra constructs (xsi:type via XML Schema), or are side effects of otherwise quirky features (element/attribute separation).
Having had to implement equivalent functionality on top of a simplistic JSON construct ("add yet another meta-property, in-line with actual data; allow a way to configure it to reduce conflicts"), I envy having these constructs as first-level concepts, convenient little additions that allow proper separation of data and metadata (type, object id; comments).
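The in-band workaround and its conflict can be sketched with nothing but maps. This is not Jackson's API (Jackson exposes the idea through its @JsonTypeInfo annotation); the class, the method withTypeId and the property names "@type" and "@class" are hypothetical, chosen only to show why the meta-property name has to be configurable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class InBandTypeSketch {
    // Adds a type id as an ordinary property next to the actual data.
    // 'typeProperty' is configurable precisely because any fixed name
    // can collide with a real data field.
    static Map<String, Object> withTypeId(Map<String, Object> data,
                                          String typeProperty, String typeId) {
        if (data.containsKey(typeProperty)) {
            throw new IllegalArgumentException(
                "type property '" + typeProperty + "' collides with a data field");
        }
        Map<String, Object> out = new LinkedHashMap<>();
        out.put(typeProperty, typeId);   // metadata, stored in-line with data
        out.putAll(data);                // the payload itself
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> pet = new LinkedHashMap<>();
        pet.put("name", "Rex");
        System.out.println(withTypeId(pet, "@type", "dog"));

        // If the payload already uses "@type", the only way out is to
        // reconfigure the meta-property name.
        pet.put("@type", "a real data value");
        System.out.println(withTypeId(pet, "@class", "dog"));
    }
}
```

With a YAML type tag, the same information lives outside the mapping, so no such collision can occur and nothing needs configuring.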
4. Uses for YAML
Still, having solved or worked around all of the above problems -- Jackson 1.5 added full support for polymorphic types ("type tags"); 2.0 finally added Object Identity ("alias/anchor"); and use of linefeeds for framing can substitute for document boundaries -- I do not have a compelling case for using YAML for data transfer. It's almost a pity: I have come to realize that YAML could have been a great data format (it is also old enough to have challenged the popularity of JSON; both seem to have been conceived at about the same time). As is, it is almost one.
Somewhat ironically, then, maybe the Wiki features are acceptable for the other main use case: that of configuration files. This is the use case I have for YAML, and the main reason for writing the compatibility module (inspired by libs/frameworks like DropWizard, which use YAML as the main config file format).