Wednesday, September 26, 2007

Fresh out of oven: Woodstox 3.2.2 released

In a surprising turn of events, one more 3.x release of Woodstox occurred today. Although the plan was to move all development to trunk (for the eventual 4.0 release), it turned out that two more minor feature enhancements could be developed in trunk and easily backported to the 3.2 branch:

  • DOM compatibility was incomplete before 3.2.2, as reported in [WSTX-104]. This was an oversight: although the decision not to complete support was deliberate, the code was left accessible. However, it turned out to be relatively easy to support DOMResult (i.e. the ability to construct stream/event writers with javax.xml.transform.dom.DOMResult as the destination). So now it is possible to use the Stax API to build DOM trees. Should anyone care... :-)
  • EBCDIC encodings were detected ("hey; there's some garbage in here... smells like EBCDIC?"), but no support was included to handle such content (see [WSTX-122]). But after support was requested (yes, Big Iron talks xml too, it seems!), it turned out to be simple enough to add (not trivial, since EBCDIC is not 7-bit ASCII compatible, but doable). Big thanks to the DataDirect folks for their help on resolving this issue.
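
To give an idea of what "smells like EBCDIC" can mean in practice, here is a minimal sketch that is not Woodstox's actual code. It checks for the byte pattern that "<?xm" produces in EBCDIC (4C 6F A7 94, as listed in the encoding autodetection appendix of the XML 1.0 specification) and then decodes with a JDK charset. The class name EbcdicSniffer and the choice of the IBM037 code page are assumptions made for illustration; a real parser would pick the code page from the encoding declaration.

```java
import java.nio.charset.Charset;

public class EbcdicSniffer {
    // "<?xm" as it appears in EBCDIC (XML 1.0 spec, encoding autodetection appendix)
    private static final int[] EBCDIC_XML_DECL = {0x4C, 0x6F, 0xA7, 0x94};

    // Assumption: IBM037 (US/Canada EBCDIC); real content may use another code page
    static final Charset ASSUMED_EBCDIC = Charset.forName("IBM037");

    // Returns true if the input starts with an EBCDIC-encoded "<?xm"
    static boolean looksLikeEbcdic(byte[] input) {
        if (input.length < EBCDIC_XML_DECL.length) {
            return false;
        }
        for (int i = 0; i < EBCDIC_XML_DECL.length; i++) {
            if ((input[i] & 0xFF) != EBCDIC_XML_DECL[i]) {
                return false;
            }
        }
        return true;
    }

    // Decodes the whole buffer using the assumed code page
    static String decode(byte[] input) {
        return new String(input, ASSUMED_EBCDIC);
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"IBM037\"?><root/>".getBytes(ASSUMED_EBCDIC);
        if (looksLikeEbcdic(doc)) {
            System.out.println(decode(doc));
        }
    }
}
```

None of the leading bytes match their ASCII counterparts, which is why an ASCII-oriented bootstrapper cannot simply read the declaration first and switch decoders afterwards.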

Anyway; while it would seem that the number of developers/teams that desperately need either one of the improved features can be counted on the fingers of one hand, when you need it, you need it. And if you do need to read EBCDIC content and turn fast Stax output into bulky DOM trees, now Woodstox can help you do that, too!
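
For the curious, building a DOM tree through the Stax API might look roughly like the sketch below. It uses only standard javax.xml.stream and javax.xml.transform.dom calls, and runs against whichever Stax implementation XMLOutputFactory.newInstance() happens to find; the class name StaxToDom and the sample element names are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.transform.dom.DOMResult;
import org.w3c.dom.Document;

public class StaxToDom {
    // Writes a tiny document through a Stax stream writer into an empty DOM Document
    static Document buildDom() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            // The DOMResult wraps the node that should receive the output
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(new DOMResult(doc));
            writer.writeStartDocument();
            writer.writeStartElement("greeting");
            writer.writeAttribute("lang", "en");
            writer.writeCharacters("hello");
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
            return doc;
        } catch (ParserConfigurationException | XMLStreamException e) {
            throw new IllegalStateException("Could not build DOM via Stax", e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildDom();
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```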

Thank you, thank you, I'll be here all week, don't forget to tip the waitresses...

Thursday, September 20, 2007

More on JSON performance in Java (or, lack thereof!)

(UPDATE, 06-May-2009: here are more recent results)

Other JSON parser implementations

After the previous blog entry, I decided to have a look at alternative JSON parser implementations available for Java. I figured that the json.org reference JSON parser implementation is probably mostly aimed at show-casing the concept, not at being the ultimate solution; perhaps some of the other choices might be optimized for performance.

For some reason this appears not to be the case. Some implementations obviously have other goals, like StringTree JSON, which takes pride in having a minuscule bytecode footprint. Small can be beautiful. But others could conceivably perform quite nicely: for example, BerliOS' JSONTools would seem like a good candidate, given that it is built on top of a lexer-generated scanner, an approach that could yield some mean, lean tokenizers. But not in this case, it seems.

So let's have a look at some numbers. I will use a document similar to the earlier test (it would be nice to have a wider collection of test data, but it'll have to do for now). Here are numbers from "TestJsonPerf" (after running for a while to stabilize timings):

  • Test 'Jackson, stream' -> 190 msecs
  • Test 'Jackson, Java types' -> 304 msecs
  • Test 'Json.org' -> 867 msecs
  • Test 'StringTree' -> 733 msecs
  • Test 'JSONTools (berlios.de)' -> 2727 msecs

(and as before, time taken is for 2500 repetitions of parsing a given json document from in-memory buffer)

So... the StringTree implementation is on par with the reference implementation, and actually even a little bit faster (although nowhere near Jackson speedwise). But what is rather surprising is exactly how slow JSONTools appears to be, given that one would expect a lexer-generated scanner to do well. With the amount of code the package has, perhaps it has some other particularly interesting features to make up for its rather more relaxed pace?

Although benchmarks can be misleading, it does seem like Jackson has a suitable raison d'etre even if it was only due to its efficiency. To me it is still a bit puzzling as to why no one had so far considered performance to be something to look for. Did everyone just assume that JSON would be super-fast purely by virtue of being a simple format to handle? Surely it should be well-known that xml parsers are ridiculously extensively optimized, and that naive approaches yield less than stellar speeds.

Anyway, with this brief detour, I'll be off working on the second core data mapper for Jackson, "Dynamic JSON mapper" (or, perhaps, "JsonTypeMapper"?). It'll be something closer to how people work with XML trees, but without keeping simple things from being simple.

Tuesday, September 18, 2007

Jackson is Fast... but how fast?

Faster Than The Speeding Bullet?

Nope: that would be Superman. But perhaps Jackson can at least sting like a bee? Anyway, to try to answer the question, I decided to repurpose code from StaxTest (loose set of performance test components used for Woodstox development) and see how Jackson compares to the venerable Json.org reference implementation. Test classes in question will be available as part of the next Jackson source code bundle (under src/perf), and others can check out their experiences. But here are some choice tidbits until then.

First of all, I decided to use sample documents from http://www.json.org/example.html. The documents are quite short (from less than 1 kB to about 4 kB), but since there do not seem to be sample document repositories similar to the ones that exist for xml, these will have to do. The test consists of repeated parsing of the specified document. The document is first read into a byte array before running tests (to minimize I/O overhead), and then fed to the parser using an implementation-dependent mechanism.
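
The shape of such a test loop can be sketched as follows. This is not the actual StaxTest or Jackson test code: the class name ParseTimer, the warm-up round count, and the Consumer-based "parser" stand-in are all assumptions made for illustration. A real test would plug in a call that hands the byte array to the parser under test.

```java
import java.util.function.Consumer;

public class ParseTimer {
    // Runs the parser over the same in-memory document 'reps' times;
    // returns elapsed wall-clock time in milliseconds
    static long timeRepeated(byte[] doc, int reps, Consumer<byte[]> parser) {
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            parser.accept(doc);
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    // Runs a few untimed warm-up rounds first so that JIT compilation
    // does not dominate the reported number, then returns one timed round
    static long measure(byte[] doc, int warmupRounds, int reps, Consumer<byte[]> parser) {
        for (int i = 0; i < warmupRounds; i++) {
            timeRepeated(doc, reps, parser);
        }
        return timeRepeated(doc, reps, parser);
    }

    public static void main(String[] args) {
        byte[] doc = "[ 1, 15, true ]".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        // Hypothetical stand-in "parser": just touches every byte of the input
        Consumer<byte[]> fakeParser = bytes -> {
            int sum = 0;
            for (byte b : bytes) {
                sum += b;
            }
            if (sum == Integer.MIN_VALUE) {
                System.out.println("unreachable; keeps the loop from being optimized away");
            }
        };
        System.out.println(measure(doc, 5, 2500, fakeParser) + " msecs");
    }
}
```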

For a repetition count of 2500 over the largest (4 kB) of the sample JSON documents, on my (t)rusty old single-CPU Athlon box, I got the following numbers:

  • Jackson, fully streaming: 224 milliseconds
  • Jackson using simple Java type mapper: 333 milliseconds
  • Json.org reference implementation: 883 milliseconds

(I also tested the other documents; the numbers I saw were similar)

The fully streaming case will just iterate over all tokens of the input, without further processing. The Java mapper, on the other hand, will actually construct an in-memory representation (Lists, Maps, Numbers, Strings, Booleans). So for this particular case, Jackson would be about 4 times as fast as the reference implementation, when using the fastest mode. This comparison is not completely fair, of course, since the reference implementation does actually build an in-memory representation. Then again, one does not always need such a "tree", so your mileage may vary.

At any rate, a simplified and somewhat naive answer would be that Jackson may be 3 - 4 times as fast as the reference implementation if you use the fastest access mode (streaming), and 2 - 3 times as fast if you need an Object representation of JSON data. The usual disclaimers apply, of course: it is not always easy to give a fair comparison, different kinds of input might give different results, and so forth. But hopefully this gives some perspective on the kinds of improvements one could get. And I would love to see others doing similar measurements.

But how about the absolute speed?

So it seems like Jackson might be a wee bit faster than the most commonly used alternative. But beyond this, how would JSON compare to, say, equivalent XML parsing? Well, given the input document size and repetition counts, streaming parsing with JSON appears to proceed at a respectable rate of about 50 MBps on this particular system. The usual XML processing rate using Woodstox, on the same machine, is anywhere between 10 and 30 MBps, depending on the complexity of the document (plain text and elements are fastest to process, attributes slower and so forth). So assuming similar information density (some people claim JSON has less fluff, but this seems debatable -- however, I haven't heard anyone claim that XML would have a more compact representation in its textual serialization) it would appear that processing JSON is indeed somewhat faster, which is to be expected given the simplicity of JSON as a data (transfer) format.
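
The throughput figure follows from simple arithmetic, sketched below. The exact document size is an assumption (4096 bytes for the "about 4 kB" document); with 2500 repetitions in 224 milliseconds that works out to a bit under 46 million bytes per second, which is where "about 50 MBps" comes from. The class and method names are made up for illustration.

```java
public class Throughput {
    // Megabytes (10^6 bytes) processed per second for 'reps' passes over a document
    static double megabytesPerSecond(long docBytes, int reps, long elapsedMillis) {
        double totalBytes = (double) docBytes * reps;
        double seconds = elapsedMillis / 1000.0;
        return totalBytes / seconds / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Assumed size: 4096 bytes; repetitions and timing are from the streaming test
        System.out.printf("%.1f MB/s%n", megabytesPerSecond(4096, 2500, 224));
    }
}
```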

The real question is whether this advantage can be converted into an even more significant speed boost at a higher level, like when doing full Java data binding (a la JAXB). We should find out in the near future, once people get more serious about building toolkits on top of efficient JSON parsers...

Monday, September 17, 2007

Jackson JSON-processor, v0.7

After some bugfixes, added unit testing, and one significant new feature, it is a good time to release the next pre-1.0 version of Jackson JSON-processor.

So what's new? In addition to the basic stream parser (reader) and generator (writer) implementations, there is now support for simple data binding, implemented by the mapper class org.codehaus.jackson.map.JavaTypeMapper. This mapper allows for mapping from JSON content into corresponding basic JDK data types (Lists, Maps, Strings, Numbers, Booleans and null) and back. When mapping from Java objects to JSON, a few more types are recognized (like primitive arrays, various basic Collections and so on), but no attempt is made to handle Java beans. Such support may be added via other mappers, but for now it is more important to cover the simplest cases.

Simple examples should show-case how easy it is to use this mapper. Let's start by mapping JSON to Java objects:

  JsonFactory jf = new JsonFactory();
  Object result = new JavaTypeMapper().read(jf.createJsonParser(new StringReader("[ 1, 15, true ]")));

So what would 'result' look like? It would be equivalent to:

  List result = new ArrayList();
  result.add(Integer.valueOf(1));
  result.add(Integer.valueOf(15));
  result.add(Boolean.TRUE);
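
Since the result is built from plain JDK types, client code can walk it with ordinary instanceof checks. The sketch below uses no Jackson classes at all; the class name UntypedWalker and the output format are made up for illustration, and the input list is constructed by hand to mirror the 'result' above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class UntypedWalker {
    // Describes a value made of the basic types listed above:
    // Lists, Maps, Strings, Numbers, Booleans and null
    static String describe(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Map) {
            return "object with " + ((Map<?, ?>) value).size() + " field(s)";
        }
        if (value instanceof List) {
            List<String> parts = new ArrayList<>();
            for (Object item : (List<?>) value) {
                parts.add(describe(item));
            }
            return "array" + parts;
        }
        if (value instanceof Number) {
            return "number " + value;
        }
        if (value instanceof Boolean) {
            return "boolean " + value;
        }
        return "string \"" + value + "\"";
    }

    public static void main(String[] args) {
        List<Object> result = new ArrayList<>();
        result.add(Integer.valueOf(1));
        result.add(Integer.valueOf(15));
        result.add(Boolean.TRUE);
        // prints: array[number 1, number 15, boolean true]
        System.out.println(describe(result));
    }
}
```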

And the other direction (outputting JSON given basic Java wrapper or collection instances) is about as simple:

  JsonFactory jf = new JsonFactory();
  StringWriter sw = new StringWriter();
  JsonGenerator gen = jf.createJsonGenerator(sw); // reuse the factory created above
  Map m = new LinkedHashMap(); // LinkedHashMap keeps insertion order in the output
  m.put("key", Integer.valueOf(29));
  m.put("value", "something");
  m.put("enabled", Boolean.TRUE);
  new JavaTypeMapper().writeAny(gen, m);
  gen.close(); // flush buffered output so that sw.toString() has the JSON

And the output would look something like:

{"key":29, "value":"something", "enabled":true}

Rather simple? I think so. No extra metadata needed, no messing with annotations or (heaven forbid!) xml configuration files. Nor any new Java classes whose only purpose is to act as C struct equivalents. A simple tool for simple (but common!) use cases.

So what's next? Jackson 0.8, obviously, but what will that contain? There is another kind of straightforward mapper that will probably be added: one that uses a basic node structure, somewhat similar to XML tree model (XOM, JDOM, DOM, DOM4J) nodes: ones that can be both conveniently traversed, and accessed in a dynamic type-safe manner (a la "duck typing"). Stay tuned!



About me

  • I am known as Cowtowncoder