Tuesday, January 20, 2009

Json processing with Jackson: Method #2/3: Data Binding

(for background, refer to the earlier "Three Ways to Process Json" entry)

After reviewing the first "canonical" Json processing method (reading/writing a Stream of Events), let's move up a level of abstraction and consider the second approach: binding (aka mapping) data between Java Objects and Json content. That is, given Json content in some form (like, say, a Stream of Events...), the library can construct equivalent Java objects; and conversely, it can write Json content given Java objects. And it does this without explicit instructions for low-level read/write operations (such as the code from the preceding blog entry). This is often the most "natural" approach for Java programmers, since it is Object-centric. The approach is often referred to as "code-first" in related contexts, such as when discussing methods to process xml content.

Jackson's Data Binding support comes through a single mapper object, org.codehaus.jackson.map.ObjectMapper. It can be used to read Json content and construct Java object(s); or conversely, to write Json content that describes a given Object. The design is quite similar to what XStream or JAXB do with xml. The main differences (beyond the data format used) are conceptual -- XStream focuses on Object serialization, Jackson on data binding; and JAXB2 supports both "schema-first" and "code-first" (and maybe emphasizes the former more) whereas Jackson does not use schemas of any kind. But the similarities are still more striking than the differences.

So much for the background: let's have a look at how things work, by using the Data Binding interface to do the same work as was done in the first entry using the Stream-of-Events abstraction.

1. Reading Objects from Json

Ok. Given that our first example needed about two dozen lines of code, how much code might we need here? It should be less, to support the claim of being more convenient. How about:

  ObjectMapper mapper = new ObjectMapper();
  TwitterEntry entry = mapper.readValue(new File("input.json"), TwitterEntry.class);

... two? I guess you could make a one-liner too; or, if you want to separate pieces out more, half a dozen. But definitely much less than the manual approach. And the difference only grows when considering more complex objects and object graphs: whereas manual serialization needs more and more code, data binding code may not grow at all. Sometimes you may need to configure mapper more to deal with edge cases, or add annotations to support non-standard naming; but even then it is just a fraction of code to write.
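To see why the binding code stays the same size no matter how big the bean gets, here is a minimal sketch of the general idea, using nothing but plain reflection. To be clear, this is not how Jackson is implemented: TinyBinder, bind() and the Entry bean are all made up for illustration, and a plain Map stands in for already-parsed Json content.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

final class TinyBinder {
    /** Hypothetical bean; a stand-in for something like TwitterEntry. */
    public static class Entry {
        private long id;
        private String text;

        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public String getText() { return text; }
        public void setText(String text) { this.text = text; }
    }

    /** Creates an instance of 'type' and calls setXxx(value) for each matching key in 'values'. */
    static <T> T bind(Map<String, ?> values, Class<T> type) {
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (Method m : type.getMethods()) {
                String name = m.getName();
                if (name.startsWith("set") && name.length() > 3 && m.getParameterCount() == 1) {
                    // derive property name from setter name: "setText" -> "text"
                    String property = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                    if (values.containsKey(property)) {
                        m.invoke(bean, values.get(property));
                    }
                }
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not bind to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> values = new HashMap<>();
        values.put("id", 1234L);
        values.put("text", "hello");
        Entry entry = bind(values, Entry.class);
        System.out.println(entry.getId() + ": " + entry.getText()); // prints "1234: hello"
    }
}
```

Adding a tenth property to Entry would not change bind() at all, which is the whole point. With that aside out of the way, back to the real ObjectMapper.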

Here are some more examples, just to show how to do simple things:

  Boolean yesOrNo = mapper.readValue("true", Boolean.class); // returns Boolean.TRUE
  int[] ids = mapper.readValue("[ 1, 3, 98 ]", int[].class); // new int[] { 1, 3, 98 }
  // trickier, due to Type Erasure
  Map<String, List<String>> dictionary = mapper.readValue("{ \"word\" : [ \"synonym1\", \"synonym2\" ] }",
      new TypeReference<Map<String, List<String>>>() { });
  // will return a List with Integer(1), Boolean.TRUE and null as its elements
  Object misc = mapper.readValue("[ 1, true, null ]", Object.class);
  // and here's something different: instead of TwitterEntry, let's claim content is a Map!
  Map<String,Object> entryAsMap = mapper.readValue(new File("input.json"), new TypeReference<Map<String,Object>>() {} ); // works!
  Map<String,Object> entryAsMap2 = (Map<String,Object>) mapper.readValue(new File("input.json"), Object.class); // as does this
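A quick aside on the "trickier, due to Type Erasure" comment: at runtime a List<String> and a List<Integer> share the same class, so there is no Class object that could say "Map of String to List of String". Creating an anonymous subclass of a generic type gets around this, because the type parameter is recorded in the subclass's generic superclass. The sketch below shows that general trick with a made-up TypeToken class; it is an illustration only, not Jackson's actual TypeReference code.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TypeTokenDemo {
    /** Shows erasure: both lists have the very same runtime class. */
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    /** Captures the full generic type via an anonymous subclass of TypeToken. */
    static Type dictionaryType() {
        return new TypeToken<Map<String, List<String>>>() { }.getType();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // prints "true"
        System.out.println(dictionaryType());   // full generic type, type parameters included
    }
}

/** Hypothetical helper (name made up here): remembers the type argument it was subclassed with. */
abstract class TypeToken<T> {
    private final Type type;

    protected TypeToken() {
        Type superclass = getClass().getGenericSuperclass();
        if (!(superclass instanceof ParameterizedType)) {
            throw new IllegalArgumentException("TypeToken must be created with a type parameter");
        }
        this.type = ((ParameterizedType) superclass).getActualTypeArguments()[0];
    }

    Type getType() { return type; }
}
```

This also explains the odd-looking empty braces after the constructor call: without them there is no subclass, and nothing for the type information to hang on.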

Of the readValue examples above, only the last two may seem surprising: didn't we actually serialize a bean... so how can it become a Map? Because there is no such thing as a type (Java class) in Json content: ObjectMapper does its best to map Json content to the specific Java type requested, and in general, Objects can be viewed as a sort of "static Maps". Hence it is perfectly fine to "Map to Map" here. And finally, ObjectMapper also has special handling for the base type "Object.class": it signals that the mapper is to use whatever Objects are the best matches for the Json content in question. For Strings this means Strings, for booleans java.lang.Boolean, for Json arrays java.util.List and for Json object structures java.util.Map. In this case it works similarly to explicitly specifying the result to be of type Map.
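The "static Map" view of Objects is easy to demonstrate without any Json library. The sketch below walks the getters of a bean and produces the equivalent Map; BeanAsMap, toMap() and the Entry bean are hypothetical names used for illustration, and this is not meant to describe Jackson's internals.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

final class BeanAsMap {
    /** Hypothetical read-only bean with two properties. */
    public static class Entry {
        private final long id;
        private final String text;

        public Entry(long id, String text) {
            this.id = id;
            this.text = text;
        }

        public long getId() { return id; }
        public String getText() { return text; }
    }

    /** Returns property name -> value for every public getXxx() method (except getClass). */
    static Map<String, Object> toMap(Object bean) {
        // TreeMap keeps keys sorted, since reflection does not guarantee method order
        Map<String, Object> result = new TreeMap<>();
        try {
            for (Method m : bean.getClass().getMethods()) {
                String name = m.getName();
                boolean isGetter = name.startsWith("get") && name.length() > 3
                        && m.getParameterCount() == 0
                        && m.getReturnType() != void.class
                        && !name.equals("getClass");
                if (isGetter) {
                    // "getText" -> "text"
                    String property = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                    result.put(property, m.invoke(bean));
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not read " + bean.getClass().getName(), e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toMap(new Entry(1234L, "hello"))); // prints "{id=1234, text=hello}"
    }
}
```

The set of keys is fixed by the class, which is what makes it "static"; other than that, the bean and the Map carry exactly the same information.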

2. Writing Objects as Json

Given how simple it was to read Java objects from Json, how hard can it be to write them?
Not very:

  mapper.writeValue(new File("result.json"), entry);

In fact, I claim it is pathetically easy. So much for job security!

3. Where's the Catch?

Given how much simpler data binding appears compared to writing equivalent code by hand, why should anyone ever again write code to read or write Json (or xml) by hand? There are some legitimate reasons:

  • The primary problem is that data binding introduces tight coupling between the data format and Java objects: if one changes, the other must change too. Sometimes this is ok: both can be modified. In other cases it is problematic: you may not be in a position to control such changes. And while there are ways to configure binding, override functionality and add handlers, there are diminishing returns: at some point it might be better to just bite the bullet and handle it all programmatically.
  • Efficiency may be problematic too: some data binding packages introduce significant overhead (speed, memory usage). Fortunately Jackson is not "one of those packages": additional overhead is modest, often in the 15-20% range.
  • Data binding is fundamentally non-streaming: so this approach does not work for huge data streams, without some modifications.

Of these, the second and third can usually be resolved: performance may not be a problem to begin with, and partial streaming (chunking) can be achieved by binding sub-sections of content one at a time instead of the whole document, as long as there are suitable sections that can be processed independently.
For example:

 String doc = "[ 1, 2, 3, 4 ]";
 JsonParser jp = new JsonFactory().createJsonParser(doc);
 ObjectMapper mapper = new ObjectMapper();

 jp.nextToken(); // START_ARRAY
 while (jp.nextToken() != JsonToken.END_ARRAY) {
   Integer value = mapper.readValue(jp, Integer.class);
   // after the call, parser points to the "last event used for the Object", i.e. VALUE_NUMBER_INT itself
 }
 // and would work equally well with beans

would map each integer value one by one, separately. Same approach would obviously work with individual beans, Lists and Maps as well.

So this leaves the main problem: that of highly dynamic, non-structured or dynamically typed content. This is where the last processing approach may come in handy.... and that will be the subject of my next sermon. Drive safely!

About me

  • I am known as Cowtowncoder