Thursday, July 10, 2008

Jackson with some Objectivity: TIMTOWTDI

(that is: "There Is More Than One Way To Do It", aka "Tim Toady")

Now that Jackson is creeping closer to its 1.0 release, the package also contains simple, robust functionality for mapping JSON data to and from Java objects. To be precise, Jackson actually has not just one way to do it, but two.

The two approaches are called "Java type mapper" and "JSON type mapper", although other nicknames could readily be coined. For example: "Poor Man's Objects" (aka "Everything's a Map with numbers, Strings and booleans") and "DOM wanna-be" (or "You Know Tree By Nodes It Has").

Java Type Mapper

This mapper is most similar to the other Json mappers that everyone (and their monkey [and the monkey's fleas' buddies' cousins]) has written. Given Json content, you get Maps or Lists that contain other Maps, Lists, Strings, wrappers (Integer, Long, Boolean) and nulls. And given Maps, Lists et al., you can generate Json content. Here's how to use this functionality:

  String jsonContent = "{ \"name\" : \"Jackson\", \"data\" : [ 0, 15 ] }";

  JsonFactory jf = new JsonFactory(); // need factory for creating parser to use
  Map hash = (Map) new JavaTypeMapper().read(jf.createJsonParser(new StringReader(jsonContent)));
  String name = (String) hash.get("name");
  List l = (List) hash.get("data");
  int maxValue = (Integer) l.get(1);

And to write content back as Json:

  
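  StringWriter sw = new StringWriter();
  JsonGenerator gen = jf.createJsonGenerator(sw); // factory method call, no 'new'
  new JavaTypeMapper().writeAny(gen, hash);
  gen.close(); // flush buffered output so that sw.toString() holds the complete Json text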

Simple enough: values are basic Java data structures and wrapper values, and you modify them using the normal List.add() and Map.put() methods, iterate over elements and so on.
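Purely to illustrate what "Json in, Maps and Lists out" involves, here is a minimal recursive-descent reader written against the plain JDK. It is a sketch, not JavaTypeMapper's implementation (which sits on top of Jackson's streaming parser). The class name MiniJsonReader is hypothetical. Returning Integer when a number fits in 32 bits, Long otherwise, and Double for fractions or exponents is an assumption chosen to match the wrapper types listed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal JSON reader producing Maps/Lists (recursive descent); not Jackson's code.
public class MiniJsonReader {
    private final String s;
    private int pos;
    private MiniJsonReader(String s) { this.s = s; }

    public static Object read(String json) {
        MiniJsonReader r = new MiniJsonReader(json);
        Object value = r.value();
        r.skipWs();
        if (r.pos != json.length()) throw new IllegalArgumentException("Trailing content at " + r.pos);
        return value;
    }

    private void skipWs() { while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++; }
    // Skips whitespace, then returns (without consuming) the next character.
    private char peek() {
        skipWs();
        if (pos >= s.length()) throw new IllegalArgumentException("Unexpected end of input");
        return s.charAt(pos);
    }
    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
        pos++;
    }
    private boolean consumeIf(char c) { if (peek() != c) return false; pos++; return true; }

    private Object value() {
        switch (peek()) {
            case '{': return object();
            case '[': return array();
            case '"': return string();
            case 't': return literal("true", Boolean.TRUE);
            case 'f': return literal("false", Boolean.FALSE);
            case 'n': return literal("null", null);
            default:  return number();
        }
    }
    private Map<String, Object> object() {
        Map<String, Object> map = new LinkedHashMap<>(); // keeps fields in document order
        expect('{');
        if (consumeIf('}')) return map;
        do {
            String key = string();
            expect(':');
            map.put(key, value());
        } while (consumeIf(','));
        expect('}');
        return map;
    }
    private List<Object> array() {
        List<Object> list = new ArrayList<>();
        expect('[');
        if (consumeIf(']')) return list;
        do { list.add(value()); } while (consumeIf(','));
        expect(']');
        return list;
    }
    private String string() {
        expect('"');
        StringBuilder sb = new StringBuilder();
        while (pos < s.length()) {
            char c = s.charAt(pos++);
            if (c == '"') return sb.toString();
            if (c != '\\') { sb.append(c); continue; }
            char e = s.charAt(pos++);                  // character after the backslash
            int simple = "nrtbf".indexOf(e);
            if (simple >= 0) sb.append("\n\r\t\b\f".charAt(simple));
            else if (e == 'u') { sb.append((char) Integer.parseInt(s.substring(pos, pos + 4), 16)); pos += 4; }
            else sb.append(e);                         // quote, backslash or slash
        }
        throw new IllegalArgumentException("Unterminated string");
    }
    private Object literal(String word, Object result) {
        if (!s.startsWith(word, pos)) throw new IllegalArgumentException("Bad literal at " + pos);
        pos += word.length();
        return result;
    }
    // Assumption: Integer if it fits in 32 bits, else Long; Double if there is a fraction or exponent.
    private Number number() {
        int start = pos;
        while (pos < s.length() && "+-0123456789.eE".indexOf(s.charAt(pos)) >= 0) pos++;
        String text = s.substring(start, pos);
        if (text.contains(".") || text.contains("e") || text.contains("E")) return Double.valueOf(text);
        long v = Long.parseLong(text); // throws NumberFormatException on junk (or empty) input
        if (v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE) return Integer.valueOf((int) v);
        return Long.valueOf(v);
    }
}
```

For the sample document used above, read() returns a LinkedHashMap whose "data" entry is a List holding Integer 0 and Integer 15. The same casts as in the JavaTypeMapper snippet then apply.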

So why do I call it "Poor Man's Objects"? Because most of the time, Maps, Lists (etc.) are used to emulate "real" objects (like beans), with more dynamic access. In some cases this works well (when accessing data in a dynamic context, say, from a JSP page); in others it just means losing type-safety without gaining anything. And that is probably the biggest missing piece: the ability to map data to and from beans. That is something I hope to address in the near future (but probably not within the core Jackson project itself).

Json Type Mapper

Whereas the first mapper should be familiar to anyone used to other Java json processing packages, the other alternative should look familiar to those who work with XML using tree models such as XOM, JDOM, DOM4j, or (unlucky bastards) DOM. This mapper constructs an in-memory tree, represented as a set of Nodes traversable using convenient path accessors. One obvious benefit is that you can eliminate casts: as long as you know the type of List entries and Map entry values, you just use the appropriate accessor and get correctly typed results (or a type cast exception if the type didn't match). So, you will do something like:

  JsonNode rootNode = new JsonTypeMapper().read(jf.createJsonParser(new StringReader(jsonContent)));

  // For node traversal, you can use either getElementValue(int)/getFieldValue(String), or getPath().
  // The difference is in dealing with missing elements: getPath() allows for safe de-referencing of
  // missing values (essentially creating a dummy "missing" node that resolves to an N/A value)
  String name = rootNode.getPath("name").getTextValue(); // or '.getValueAsText()' for extra safety
  int maxValue = rootNode.getPath("data").getPath(1).getIntValue();
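The "missing node" behaviour described in the comments above is essentially the Null Object pattern. Instead of returning null for an absent child, path access returns a shared placeholder whose own path methods return the placeholder again, so chained lookups never throw a NullPointerException. The plain-Java sketch below shows the idea. All class and method names (Node, MissingNode, path(), textValue() and so on) are hypothetical stand-ins, not Jackson's actual node classes, and returning 0 from intValue() for non-numeric nodes is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node hierarchy illustrating null-safe path access (not Jackson's classes).
abstract class Node {
    // Default: no such child, so hand back the shared "missing" node instead of null.
    Node path(String fieldName) { return MissingNode.INSTANCE; }
    Node path(int index) { return MissingNode.INSTANCE; }
    String textValue() { return null; }    // only text nodes have text
    String valueAsText() { return null; }  // lenient textual rendering of scalar values
    int intValue() { return 0; }           // assumption: non-numeric nodes yield 0
    boolean isMissing() { return false; }
}

final class MissingNode extends Node {
    static final MissingNode INSTANCE = new MissingNode();
    private MissingNode() {}
    @Override boolean isMissing() { return true; }
}

final class TextNode extends Node {
    private final String text;
    TextNode(String text) { this.text = text; }
    @Override String textValue() { return text; }
    @Override String valueAsText() { return text; }
}

final class IntNode extends Node {
    private final int value;
    IntNode(int value) { this.value = value; }
    @Override int intValue() { return value; }
    @Override String valueAsText() { return Integer.toString(value); }
}

final class ArrayNode extends Node {
    final List<Node> elements = new ArrayList<>();
    @Override Node path(int index) {
        return (index >= 0 && index < elements.size()) ? elements.get(index) : MissingNode.INSTANCE;
    }
}

final class ObjectNode extends Node {
    final Map<String, Node> fields = new LinkedHashMap<>();
    @Override Node path(String fieldName) {
        Node n = fields.get(fieldName);
        return (n != null) ? n : MissingNode.INSTANCE;
    }
}

public class PathDemo {
    public static void main(String[] args) {
        // Build the tree for { "name" : "Jackson", "data" : [ 0, 15 ] } by hand.
        ObjectNode root = new ObjectNode();
        root.fields.put("name", new TextNode("Jackson"));
        ArrayNode data = new ArrayNode();
        data.elements.add(new IntNode(0));
        data.elements.add(new IntNode(15));
        root.fields.put("data", data);

        System.out.println(root.path("name").textValue());           // Jackson
        System.out.println(root.path("data").path(1).intValue());    // 15
        // Dereferencing a chain of absent values is safe: no NullPointerException.
        System.out.println(root.path("nosuch").path(3).isMissing()); // true
    }
}
```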

Looks better? I tend to think so -- while this could be called Poor Man's XPath, I think it is reasonably convenient as is. In addition to data extraction as shown, the tree can be modified, children iterated and so forth. Beyond explicit traversal using node methods, there are also plans to implement JsonPath, to allow for more concise notation. Such an implementation could be built on top of the Json Type Mapper and should be easy to do.

Which one should I use?

Whichever fits your use case, of course. But what would that be? One way to think about it is whether you prefer path access (json type mapper) or Java container traversal (java type mapper). Or more generally: if you have plenty of Java structs waiting to be thrown around, the Java type mapper may be a more convenient way to join things together, especially when you are just generating Json content and not parsing it. So perhaps it makes most sense to generate Json from Java stuff using the Java type mapper, but to extract data from Json content using the Json type mapper.

Anyway: I am using the Java type mapper at work for freezing/thawing the state of a batch processing system (over restarts), but I hope others are using one or both in more creative ways. Please let me know if you do: add a comment here or email me (at yahoo.com, cowtowncoder).

Monday, February 18, 2008

Release Early, Release Often: Jackson 0.9

After fixing a couple of bugs in the mapper, I decided to release the next pre-1.0 version of Jackson Json Processor, version 0.9.0. There are no drastic changes, just a couple of bug fixes.

As usual, the release notes list the changes, and the CREDITS file lists the kind developers who helped weed out the bugs that were fixed.

Wednesday, February 06, 2008

Maintenance releases: Woodstox 3.2.4, Jackson 0.8.2

Public service announcements for 2008

A quick update on the state of the projects: both Woodstox and Jackson released minor bug-fix versions to kick off the new year.

Releases are available from the respective home pages, and here are quick links to the change lists:

Wednesday, October 10, 2007

Even More About JSON performance in Java (now in technicolor!)

Thanks to a suggestion by the friendly Japex author, I upgraded my testing setup to the next level, which now consists of 2 Japex test suites: one that tests performance on small (up to 4 kB) documents, and another that tests performance on medium-sized documents (around 64 kB). Using the Japex micro-benchmark framework was a breeze, and its visualization capabilities make the results much more sexy. So what's not to like? Anyway, the initial results can be found here:

In addition to Jackson (in streaming and java object tree modes), all previously mentioned alternate implementations are tested. But most interestingly I also found out about one more alternative, Noggit. Noggit (from Apache Labs) seems like a worthy competitor, given its good performance and small footprint. Its good quality is not a surprise given its author, who is a well-known fellow open sourceror (participating in projects like Lucene). I like its streamability, and design goals that include strict conformance to JSON specification.

Beyond the implementations tested, I also added some artificial test documents (due to the lack of real-world samples I could find), mostly to test larger document sizes, as well as handling of numeric (integer, floating point) data. Numeric test cases are generated using simple generator classes, and the other documents were converted from xml documents (from the xmltest test suite) using a Badgerfish converter.
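Generating numeric test documents takes very little code. The following is a sketch of what such a generator might look like. The class name, parameters, fixed seed and value ranges are all assumptions; these are not the generator classes actually used for these tests.

```java
import java.util.Random;

// Hypothetical generator for artificial numeric JSON test documents.
public class NumericDocGenerator {
    // Emits a JSON array of random numbers until the text reaches at least targetChars characters.
    public static String generate(int targetChars, boolean floatingPoint, long seed) {
        Random rnd = new Random(seed); // fixed seed keeps documents reproducible across runs
        StringBuilder sb = new StringBuilder(targetChars + 32);
        sb.append('[');
        boolean first = true;
        while (sb.length() < targetChars) {
            if (!first) sb.append(',');
            first = false;
            if (floatingPoint) {
                sb.append(rnd.nextDouble() * 1000.0); // Double.toString output is valid JSON for finite values
            } else {
                sb.append(rnd.nextInt());
            }
        }
        sb.append(']');
        return sb.toString();
    }

    public static void main(String[] args) {
        String ints = generate(64 * 1024, false, 42L);  // roughly a "medium" (64 kB) document
        String doubles = generate(4 * 1024, true, 42L); // a "small" (up to 4 kB) document
        System.out.println(ints.length() + " chars, " + doubles.length() + " chars");
    }
}
```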

Results are interesting, although many of the findings are in line with my earlier tests. Beyond these similarities (regarding general ranking of implementations with respect to performance), there is the obvious correlation between streaming handling and performance for large documents: while tree models perform adequately for small documents, performance starts to seriously degrade with larger documents. This is analogous to XML processing performance (DOM vs. SAX/StAX). Also of interest was that one of the implementations apparently has problems parsing floating point numbers (which explains those NaN entries).

For anyone interested in reproducing the results, Japex source bundle (test cases, libs used) can be found here.

Thursday, September 20, 2007

More on JSON performance in Java (or, lack thereof!)

Other JSON parser implementations

After my previous blog entry, I decided to have a look at the other JSON parser implementations available for Java. I figured that json.org's reference JSON parser implementation is probably mostly aimed at showcasing the concept rather than acting as the ultimate solution, and that perhaps some of the other choices might be optimized for performance.

For some reason this appears not to be the case. Some implementations obviously have other goals, like StringTree JSON, which takes pride in its minuscule bytecode footprint. Small can be beautiful. But others could conceivably perform quite nicely: for example, BerliOS' JSONTools would seem like a good candidate, given that it is built on top of a lexer-generated scanner; this approach could yield some mean, lean tokenizers. But not in this case, it seems.

So let's have a look at some numbers. I will use a document similar to the earlier test (it would be nice to have a wider collection of test data, but it'll have to do for now). Here are numbers from "TestJsonPerf" (after running for a while to stabilize timings):

  • Test 'Jackson, stream' -> 190 msecs
  • Test 'Jackson, Java types' -> 304 msecs
  • Test 'Json.org' -> 867 msecs
  • Test 'StringTree' -> 733 msecs
  • Test 'JSONTools (berlios.de)' -> 2727 msecs

(and as before, time taken is for 2500 repetitions of parsing a given json document from in-memory buffer)

So... the StringTree implementation is on par with the reference implementation, and actually even a little bit faster (although nowhere near Jackson speed-wise). But what is rather surprising is just how slow JSONTools appears to be. This was a big surprise, since one would expect a different outcome. Given the amount of code the package has, perhaps it has some other particularly interesting features to make up for its rather more relaxed pace?

Although benchmarks can be misleading, it does seem like Jackson has a suitable raison d'etre, even if only due to its efficiency. To me it is still a bit puzzling why no one had so far considered performance to be something to look for. Did everyone just assume that JSON would be super-fast purely by virtue of being a simple format to handle? Surely it should be well known that xml parsers are extensively (almost ridiculously) optimized, and that naive approaches yield less than stellar speeds.

Anyway, with this brief detour, I'll be off working on the second core data mapper for Jackson, "Dynamic JSON mapper" (or, perhaps, "JsonTypeMapper"?). It'll be something closer to how people work with XML trees, but without keeping simple things from being simple.

Tuesday, September 18, 2007

Jackson is Fast... but how fast?

Faster Than The Speeding Bullet?

Nope: that would be Superman. But perhaps Jackson can at least sting like a bee? Anyway, to try to answer the question, I decided to repurpose code from StaxTest (a loose set of performance test components used for Woodstox development) and see how Jackson compares to the venerable Json.org reference implementation. The test classes in question will be available as part of the next Jackson source code bundle (under src/perf), so others can try them out and share their experiences. But here are some choice tidbits until then.

First of all, I decided to use the sample documents from http://www.json.org/example.html. The documents are quite short (from less than 1 kB to about 4 kB), but since there do not seem to be sample document repositories for JSON like those that exist for xml, these would have to do. The test consists of repeated parsing of a specified document. The document is first read into a byte array before running the tests (to minimize I/O overhead), and then fed to each parser using an implementation-dependent mechanism.
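The methodology above (read the document into a byte array once, then parse it repeatedly from memory) can be sketched as a small timing harness. The ParseTask interface and the byte-counting stand-in are hypothetical placeholders for each parser's real API. The volatile field keeps the JIT from discarding the loop, and an untimed warm-up pass is included because early JVM timings are rarely stable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical harness: time N repetitions of parsing a document that is already in memory.
public class RepeatedParseBenchmark {
    // Placeholder for "parse this document with parser X"; returns a value so the work is observable.
    interface ParseTask {
        int parse(byte[] doc);
    }

    // Results are written here so the JIT cannot discard the timed loop as dead code.
    static volatile int blackhole;

    static long timeMillis(ParseTask task, byte[] doc, int repetitions) {
        int sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            sink += task.parse(doc);
        }
        long elapsedNanos = System.nanoTime() - start;
        blackhole = sink;
        return elapsedNanos / 1_000_000L;
    }

    public static void main(String[] args) throws IOException {
        // Read the whole document into memory once, so file I/O is excluded from the timing.
        byte[] doc = (args.length > 0)
                ? Files.readAllBytes(Paths.get(args[0]))
                : "{ \"name\" : \"Jackson\", \"data\" : [ 0, 15 ] }".getBytes(StandardCharsets.UTF_8);

        // Trivial stand-in "parser" that just counts structural bytes; a real run plugs in each library.
        ParseTask countStructural = d -> {
            int n = 0;
            for (byte b : d) {
                if (b == '{' || b == '}' || b == '[' || b == ']') n++;
            }
            return n;
        };

        int repetitions = 2500;
        timeMillis(countStructural, doc, repetitions); // warm-up pass, result ignored
        long ms = timeMillis(countStructural, doc, repetitions);
        System.out.println(repetitions + " repetitions: " + ms + " ms");
    }
}
```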

For a repetition count of 2500 over the largest (4 kB) of the sample JSON documents, on my (t)rusty old single-CPU Athlon box, I got the following numbers:

  • Jackson, fully streaming: 224 milliseconds
  • Jackson using simple Java type mapper: 333 milliseconds
  • Json.org reference implementation: 883 milliseconds

(I also tested the other documents; the numbers I saw were similar)

The fully streaming case just iterates over all tokens of the input, without further processing. The Java mapper, on the other hand, actually constructs an in-memory representation (Lists, Maps, Numbers, Strings, Booleans). So for this particular case, Jackson would be about 4 times as fast as the reference implementation when using the fastest mode. This comparison is not completely fair, of course, since the reference implementation does actually build an in-memory representation. Then again, one does not always need such a "tree", so your mileage may vary.
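To make the "fully streaming" mode concrete: the parser hands out one token at a time and retains nothing, so there is no tree to build or garbage-collect. Below is a tiny plain-Java tokenizer in that spirit. The class and its Token enum are hypothetical (the token names only loosely echo streaming-parser vocabulary). It skips ':' and ',' without checking token order, so it illustrates the access pattern rather than being a conforming parser.

```java
// Hypothetical minimal streaming tokenizer: yields tokens one by one and builds no tree.
public class MiniTokenizer {
    enum Token { START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY, STRING, NUMBER, TRUE, FALSE, NULL }

    private final String s;
    private int pos;

    MiniTokenizer(String s) { this.s = s; }

    // Returns the next token, or null at end of input. Separators ':' and ',' are skipped.
    Token next() {
        while (pos < s.length()) {
            char c = s.charAt(pos);
            if (Character.isWhitespace(c) || c == ':' || c == ',') { pos++; continue; }
            switch (c) {
                case '{': pos++; return Token.START_OBJECT;
                case '}': pos++; return Token.END_OBJECT;
                case '[': pos++; return Token.START_ARRAY;
                case ']': pos++; return Token.END_ARRAY;
                case '"': skipString(); return Token.STRING;
                case 't': pos += 4; return Token.TRUE;   // assumes well-formed "true"
                case 'f': pos += 5; return Token.FALSE;  // assumes well-formed "false"
                case 'n': pos += 4; return Token.NULL;   // assumes well-formed "null"
                default:
                    if (c == '-' || Character.isDigit(c)) { skipNumber(); return Token.NUMBER; }
                    throw new IllegalArgumentException("Unexpected '" + c + "' at " + pos);
            }
        }
        return null;
    }

    private void skipString() {
        pos++; // opening quote
        while (pos < s.length()) {
            char c = s.charAt(pos++);
            if (c == '\\') pos++;        // skip the escaped character
            else if (c == '"') return;   // closing quote
        }
        throw new IllegalArgumentException("Unterminated string");
    }

    private void skipNumber() {
        while (pos < s.length() && "+-0123456789.eE".indexOf(s.charAt(pos)) >= 0) pos++;
    }

    public static void main(String[] args) {
        MiniTokenizer tokens = new MiniTokenizer("{ \"name\" : \"Jackson\", \"data\" : [ 0, 15 ] }");
        int count = 0;
        for (Token t = tokens.next(); t != null; t = tokens.next()) {
            System.out.println(t); // each token is handled as it arrives; nothing is kept
            count++;
        }
        System.out.println(count + " tokens"); // 9 tokens
    }
}
```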

At any rate, a simplified and somewhat naive answer would be that Jackson may be 3 - 4 times as fast as the reference implementation if you use the fastest access mode (streaming), and 2 - 3 times as fast if you need an Object representation of JSON data. The usual disclaimers apply, of course: it is not always easy to give a fair comparison, different kinds of input might give different results and so forth. But hopefully this gives some perspective on the kinds of improvements one could get. And I would love to see others doing similar measurements.
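The "3 - 4 times" and "2 - 3 times" estimates follow directly from the timings listed above (883, 224 and 333 ms for 2500 parses). This small snippet makes the arithmetic explicit by dividing the reference implementation's time by Jackson's. The class name is only illustrative.

```java
import java.util.Locale;

// Speed-up factors from the 2500-repetition timings listed above (milliseconds).
public class SpeedupCheck {
    // How many times faster the candidate is than the baseline.
    static double speedup(double baselineMs, double candidateMs) {
        return baselineMs / candidateMs;
    }

    public static void main(String[] args) {
        double jsonOrg = 883, jacksonStreaming = 224, jacksonJavaTypes = 333;
        System.out.printf(Locale.ROOT, "streaming:  %.2fx%n", speedup(jsonOrg, jacksonStreaming)); // 3.94x
        System.out.printf(Locale.ROOT, "java types: %.2fx%n", speedup(jsonOrg, jacksonJavaTypes)); // 2.65x
        // Per-parse cost of the streaming run: 224 ms / 2500 parses.
        System.out.printf(Locale.ROOT, "per parse:  %.1f us%n", jacksonStreaming * 1000.0 / 2500); // 89.6 us
    }
}
```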

But how about the absolute speed?

So it seems like Jackson might be a wee bit faster than the most commonly used alternative. But beyond this, how would JSON compare to, say, equivalent XML parsing? Well, given the input document size and repetition counts, streaming parsing of JSON appears to proceed at a respectable rate of about 50 MBps on this particular system. The usual XML processing rates using Woodstox, on the same machine, are anywhere between 10 and 30 MBps, depending on the complexity of the document (plain text and elements are fastest to process, attributes slower and so forth). So assuming similar information density (some people claim JSON has less fluff, but this seems debatable -- however, I haven't heard anyone claim that XML would have a more compact representation in its textual serialization), it would appear that processing JSON is indeed somewhat faster, which is to be expected given the simplicity of JSON as a data (transfer) format.
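The "about 50 MBps" figure can be reproduced from the numbers earlier in this post: 2500 repetitions over the largest sample document in 224 milliseconds. The sketch below does that arithmetic. It assumes the document is exactly 4 kB (4096 bytes; the text only says "about 4") and counts 1 MB as 1,000,000 bytes. It lands at roughly 46 MB/s, which is in the same ballpark as the rounded figure.

```java
import java.util.Locale;

// Throughput from document size, repetition count and elapsed time.
public class ThroughputCheck {
    static double megabytesPerSecond(long docBytes, int repetitions, double elapsedMs) {
        double totalBytes = (double) docBytes * repetitions;
        return totalBytes / (elapsedMs / 1000.0) / 1_000_000.0; // MB taken as 10^6 bytes here
    }

    public static void main(String[] args) {
        double mbps = megabytesPerSecond(4096, 2500, 224); // assumed 4096-byte document
        System.out.printf(Locale.ROOT, "%.1f MB/s%n", mbps); // 45.7 MB/s
    }
}
```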

The real question is whether this advantage can be converted into an even more significant speed boost at a higher level, like when doing full Java data binding (a la JAXB). We should find out in the near future, once people get more serious about building toolkits on top of efficient JSON parsers...

Monday, September 17, 2007

Jackson JSON-processor, v0.7

After some bugfixes, added unit testing, and one significant new feature, it is good time to release the next pre-1.0 version of Jackson JSON-processor.

So what's new? In addition to the basic stream parser (reader) and generator (writer) implementations, there is now support for simple data binding, implemented by the mapper class org.codehaus.jackson.map.JavaTypeMapper. This mapper allows for mapping from JSON content into corresponding basic JDK data types (Lists, Maps, Strings, Numbers, Booleans and null) and back. When mapping from Java objects to JSON, a few more types are recognized (like primitive arrays, various basic Collections and so on), but no attempt is made to handle Java beans. Such support may be added via other mappers, but for now it is more important to cover the simplest cases.

Simple examples should showcase how easy it is to use this mapper. Let's start by mapping JSON to Java objects:

  JsonFactory jf = new JsonFactory();
  Object result = new JavaTypeMapper().read(jf.createJsonParser(new StringReader("[ 1, 15, true ]")));

So what would 'result' look like? It would be equivalent to:

  List result = new ArrayList();
  result.add(Integer.valueOf(1));
  result.add(Integer.valueOf(15));
  result.add(Boolean.TRUE);

And the other direction (outputting JSON given basic Java wrapper or collection instances) is about as simple:

  JsonFactory jf = new JsonFactory();
  StringWriter sw = new StringWriter();
  JsonGenerator gen = jf.createJsonGenerator(sw); // reuse the factory created above
  Map m = new LinkedHashMap(); // LinkedHashMap keeps insertion order in the output
  m.put("key", Integer.valueOf(29));
  m.put("value", "something");
  m.put("enabled", Boolean.TRUE);
  new JavaTypeMapper().writeAny(gen, m);
  gen.close(); // flush buffered output; sw.toString() now holds the JSON

And the output would look something like:

  {"key":29, "value":"something", "enabled":true}
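For comparison, the Java-to-JSON direction for these basic types can be sketched in a few dozen lines of plain JDK code. The class and method names (MiniJsonWriter, write) are hypothetical, and this is not how JavaTypeMapper is implemented. It only covers Maps, Collections, Strings, Numbers, Booleans and null, and it omits the spaces after commas shown in the output above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal serializer for basic JDK types; not Jackson's implementation.
public class MiniJsonWriter {
    public static String write(Object value) {
        StringBuilder sb = new StringBuilder();
        writeValue(sb, value);
        return sb.toString();
    }

    private static void writeValue(StringBuilder sb, Object v) {
        if (v == null) {
            sb.append("null");
        } else if (v instanceof String) {
            writeString(sb, (String) v);
        } else if (v instanceof Number || v instanceof Boolean) {
            sb.append(v); // assumes finite numbers; NaN/Infinity are not valid JSON
        } else if (v instanceof Map) {
            sb.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                writeString(sb, String.valueOf(e.getKey())); // JSON keys are always strings
                sb.append(':');
                writeValue(sb, e.getValue());
            }
            sb.append('}');
        } else if (v instanceof Collection) {
            sb.append('[');
            boolean first = true;
            for (Object item : (Collection<?>) v) {
                if (!first) sb.append(',');
                first = false;
                writeValue(sb, item);
            }
            sb.append(']');
        } else {
            throw new IllegalArgumentException("Unsupported type: " + v.getClass().getName());
        }
    }

    // Quotes a string, escaping characters that JSON does not allow raw.
    private static void writeString(StringBuilder sb, String s) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c)); // other control chars
                    else sb.append(c);
            }
        }
        sb.append('"');
    }

    public static void main(String[] args) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("key", 29);
        m.put("value", "something");
        m.put("enabled", Boolean.TRUE);
        System.out.println(write(m));                          // {"key":29,"value":"something","enabled":true}
        System.out.println(write(Arrays.asList(1, 15, true))); // [1,15,true]
    }
}
```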

Rather simple? I think so. No extra metadata needed, no messing with annotations or (heaven forbid!) xml configuration files. Nor new Java classes whose only purpose is to act as C struct equivalents. A simple tool for simple (but common!) use cases.

So what's next? Jackson 0.8, obviously, but what will that contain? Another straightforward kind of mapper will probably be added: one that uses a basic node structure, somewhat similar to XML tree model (XOM, JDOM, DOM, DOM4J) nodes, which can be both conveniently traversed and accessed in a dynamic, type-safe manner (a la "duck typing"). Stay tuned!

Thursday, August 30, 2007

New Fast JSON-processor: Jackson!

Developers, meet Mr. Jackson: possibly the world's fastest JSON parser! (and at the very least, the fastest one written in Java).

You heard it here first, folks: a new streaming, light-weight and VERY fast JSON-processor (parser+generator) package, written from scratch in Java, is now officially out in the wild. Check out the Jackson Hatchery Page for details.

So what's in it for me? This new package will eventually develop into a high-quality, easy-to-use and widely used building block, similar to Woodstox. And that will happen fastest when there are fearless Early Adopters who check out new things.

Stay tuned: I will try to cover more ground with my next entry. For now, I just wanted to get the word out.

Oh, and please do send me feedback if you end up using it. While basic unit test coverage exists, there are bound to be some rough edges.

