Friday, October 19, 2007

Heineken's Documentation Principle

or, On Two Kinds of Documentation

I was thinking about software documentation the other day at work. Documentation, as you may know, is the thing Agile Software folks care a wee bit about, just not as much as, say, the really important stuff like working systems and impressive stacks of unit tests. This was in the context of actually considering writing some more of said documentation. The contemplation remained an abstract exercise (as is usually the case) without degenerating into an actual physical documentation process, but it did result in an observation: it occurred to me that most documentation I have written (or am about to write) falls neatly into two general categories:

  1. Documentation I have yet to write, that exists in my head, is fully up-to-date, and would be very useful if only it was written down
  2. Documentation I have actually written down, which is generally incomplete and out-of-date

Of these, the first category is obviously much larger than the second. Nonetheless, all other documentation combined would amount to a mere fraction of even the second category, and is hence not worth further analysis.

This observation led to the actual revelation: much as Heisenberg's whatchamacallit (or was it this one?) states that the act of observation itself interferes with the (quantum) state of matter/energy, is it not also the case that the act of writing information down as documentation renders it immediately obsolete? How else can it be that everything I write down becomes obsolete, yet everything I do not write down is (and remains) crystal clear in my mind, ready to be written down at some later point of convenience?

Having thus established an important principle in the field of software development, I feel I can also try naming it. As much as it would seem prudent to name it after the more famous uncertainty principle, my ego demands something else. So let's hear it for "Tatu's Exclusion Principle" (analogous to Pauli's exclusion principle):

"There exist only two kinds of software documentation: one that is up-to-date and useful, but not yet written down; and another that has been written down and is now utterly out-of-date and generally of little use."

... thank you, thank you, I will be here all week! Please don't forget to click the banne... I mean, tip the waitresses!

Wednesday, October 10, 2007

Even More About JSON performance in Java (now in technicolor!)

Thanks to a suggestion by the friendly Japex author, I upgraded my testing setup to the next level: it now consists of two Japex test suites, one that tests performance on small documents (up to 4 kB), and another on medium-sized documents (around 64 kB). Using the Japex micro-benchmark framework was a breeze, and its visualization capabilities make the results much more sexy. So what's not to like? Anyway, initial results can be found here:
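For the curious, a Japex suite is driven by an XML configuration that pairs drivers (one per implementation under test) with test cases (one per input document). The sketch below shows the general shape; the driver class and file names are made up for illustration, and the exact parameter set may differ between Japex versions:

```xml
<testSuite name="JsonSmallDocs"
           xmlns="http://www.sun.com/japex/testSuite">
    <!-- global settings shared by all drivers -->
    <param name="japex.warmupTime" value="5"/>
    <param name="japex.runTime" value="10"/>
    <param name="japex.resultUnit" value="tps"/>

    <!-- one driver per parser implementation being compared -->
    <driver name="Jackson-Streaming">
        <param name="japex.driverClass" value="com.example.JacksonStreamingDriver"/>
    </driver>
    <driver name="Jackson-Tree">
        <param name="japex.driverClass" value="com.example.JacksonTreeDriver"/>
    </driver>

    <!-- one test case per input document -->
    <testCase name="small-doc">
        <param name="japex.inputFile" value="data/small.json"/>
    </testCase>
</testSuite>
```

Each driver implementation then just has to parse the document named by `japex.inputFile`; Japex handles timing, iteration, and chart generation.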

In addition to Jackson (in both streaming and Java object tree modes), all previously mentioned alternative implementations are tested. Most interestingly, I also found out about one more alternative: Noggit. Noggit (from Apache Labs) seems like a worthy competitor, given its good performance and small footprint. Its quality is no surprise given its author, a well-known fellow open sourceror (who participates in projects like Lucene). I like its streamability, and its design goals, which include strict conformance to the JSON specification.
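To illustrate the two Jackson modes being benchmarked: streaming mode walks the document token by token without ever building it fully in memory, while tree mode materializes the whole document first. A minimal sketch (using today's `com.fasterxml.jackson` package names rather than the original `org.codehaus.jackson` ones; the class and method names here are my own illustration, not part of the benchmark):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonReadModes {
    // Streaming read: advance through tokens one at a time; memory use
    // stays flat regardless of document size.
    static int countWithStreaming(String json) throws Exception {
        JsonFactory f = new JsonFactory();
        try (JsonParser p = f.createParser(json)) {
            while (p.nextToken() != null) {
                if (p.currentToken() == JsonToken.FIELD_NAME
                        && "count".equals(p.currentName())) {
                    p.nextToken();
                    return p.getIntValue();
                }
            }
        }
        return -1;
    }

    // Tree read: parse the whole document into a JsonNode tree, then
    // navigate it; convenient, but memory grows with document size.
    static int countWithTree(String json) throws Exception {
        JsonNode root = new ObjectMapper().readTree(json);
        return root.get("count").asInt();
    }

    public static void main(String[] args) throws Exception {
        String json = "{\"name\":\"sample\",\"count\":3}";
        System.out.println(countWithStreaming(json));
        System.out.println(countWithTree(json));
    }
}
```

The convenience gap between the two methods above is exactly the trade-off the benchmarks quantify: the tree version is two lines, but you pay for building the tree.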

Beyond the implementations tested, I also added some artificial test documents (due to the lack of real-world samples I could find), mostly to test larger document sizes as well as the handling of numeric (integer, floating-point) data. The numeric test cases are generated using simple generator classes, and the other documents were converted from XML documents (from the xmltest test suite) using a Badgerfish converter.
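A numeric-document generator of the sort mentioned above can be very simple; this sketch (my own, not the actual generator used in the benchmarks) emits a JSON object holding an array of random integers and doubles, growing until a target size is reached. A fixed seed keeps the output reproducible across runs:

```java
import java.util.Random;

public class NumberDocGenerator {
    // Generates a JSON document of roughly targetBytes size, containing
    // an array that mixes random integers and floating-point values.
    public static String generate(int targetBytes, long seed) {
        Random rnd = new Random(seed);
        StringBuilder sb = new StringBuilder(targetBytes + 32);
        sb.append("{\"values\":[");
        boolean first = true;
        while (sb.length() < targetBytes) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            if (rnd.nextBoolean()) {
                sb.append(rnd.nextInt(1000000)); // integer value
            } else {
                sb.append(rnd.nextDouble());     // floating-point value
            }
        }
        sb.append("]}");
        return sb.toString();
    }

    public static void main(String[] args) {
        // e.g. a ~4 kB document for the small-document suite
        System.out.println(generate(4096, 123L).length());
    }
}
```

Mixing integers and doubles in one document is what surfaces parser differences in number handling, such as the floating-point problems mentioned below.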

The results are interesting, although many of the findings are in line with my earlier tests. Beyond these similarities (regarding the general performance ranking of the implementations), there is the obvious correlation between streaming handling and performance on large documents: while tree models perform adequately for small documents, performance starts to degrade seriously as documents grow. This is analogous to XML processing performance (DOM vs. SAX/StAX). Also of interest: one of the implementations apparently has problems parsing floating-point numbers (which explains those NaN entries).

For anyone interested in reproducing the results, the Japex source bundle (test cases, libraries used) can be found here.

About me

  • I am known as Cowtowncoder
  • Contact me
Check my profile to learn more.