Json processing with Jackson: Method #3/3: Tree Traversal

Update, 06-Mar-2009: Alas, code example will not work with Jackson 0.9.9 or above due to API changes; check out javadocs for replacements until I get a chance to rewrite the example

(for background, refer to the earlier "Three Ways to Process Json" entry)

Now that we have both the low-level (event streams) and high-level (data binding) approaches covered, let's consider the third and last alternative: that of using a tree model for traversing over Json content.

So what is the Tree Model that is traversed? It is a tree built from Json content. The tree consists of parent-child linked nodes that represent Json constructs such as Arrays ("[ ... ]"), Objects ("{ ... }") and values (true, false, Strings, numbers, nulls). This is similar to the xml DOM: DOM is the "standard" tree model for xml, although many alternative tree models (such as JDOM, dom4j and XOM) are available as well.
This tree can then be traversed, data within accessed, possibly modified and written back out as Json.

Before discussing the approach in more detail, let's have a look at some sample code.

1. Sample Usage

Since it is difficult to demonstrate the actual benefits of the approach with simple structures (like the Twitter search entry shown earlier), let's consider something more complicated. The following made-up example of a collection of customer records will have to do:

[
 {
  "name" : {
    "first" : "Mortimer",
    "middleInitial" : "m",
    "last" : "Moneybags"
  },
  },
  "address" : {
    "street" : "1729 Opulent Street",
    "zipcode" : 98040,
    "state" : "WA"
  },
  "contactMethods" : [
    { "type" : "phone/home", "ref" : "206-232-1234" },
    { "type" : "phone/work", "ref" : "303-123-4567" }
  ]
 }
// (rest of entries omitted to save space)
]

Let us consider a case where we want to go through all customer entries and extract some data out of each. Additionally, we will add an "email" contact method to each entry, assuming that none exists beforehand (to simplify the code).

TreeMapper mapper = new TreeMapper();
JsonNode root = mapper.readTree(new File("customers.json"));
// we'll get a "org.codehaus.jackson.map.node.ArrayNode" instance for json array, but no need for casts
for (JsonNode customerNode : root) {
  // we know "first" always exists if "name" exists, and is a TextNode (if not, could use 'getValueAsText')
  // (note: could use 'getFieldValue' instead of 'getPath', but it's good practice to use getPath())
  String firstName = customerNode.getPath("name").getFieldValue("first").getTextValue();
  // has an address? (could also just use 'getPath()' which returns 'missing' node)
  int zip = -1;
  if (customerNode.getFieldValue("address") != null) {
    zip = customerNode.getFieldValue("address").getFieldValue("zipcode").getIntValue();
  }
  // either way, let's add email contact (that is assumed to be missing)
  ObjectNode email = mapper.objectNode();
  email.setElement("type", mapper.textNode("email"));
  email.setElement("ref", mapper.textNode(firstName+"_"+zip+"@foobar.com"));
  // append to the "contactMethods" array (not to the "address" object)
  customerNode.getPath("contactMethods").appendElement(email);
}
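
The comments above mention that getPath() returns a "missing" node rather than null. To show why that is handy for chained access, here is a small self-contained sketch that does not use Jackson at all: the Node class, its get(), path() and textOr() methods and the MISSING constant are all made up for illustration, and only mimic the idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MissingNodeDemo {
    // Hypothetical stand-in for a tree node; NOT Jackson's JsonNode.
    static class Node {
        // Shared placeholder returned by path() when a field is absent.
        static final Node MISSING = new Node(null);

        final Map<String, Node> fields = new LinkedHashMap<>();
        final String text; // non-null only for text values

        Node(String text) { this.text = text; }

        Node put(String name, Node child) { fields.put(name, child); return this; }

        // Strict lookup: null when the field does not exist.
        Node get(String name) { return fields.get(name); }

        // Lenient lookup: never null, so calls can be chained safely.
        Node path(String name) {
            Node child = fields.get(name);
            return (child == null) ? MISSING : child;
        }

        boolean isMissing() { return this == MISSING; }

        String textOr(String fallback) { return (text == null) ? fallback : text; }
    }

    // A customer with a "name" but no "address".
    static Node sampleCustomer() {
        Node name = new Node(null).put("first", new Node("Mortimer"));
        return new Node(null).put("name", name);
    }

    public static void main(String[] args) {
        Node customer = sampleCustomer();
        // Field exists: both styles work.
        System.out.println(customer.path("name").path("first").textOr("?"));
        // Field absent: path() keeps chaining and ends in the fallback,
        // whereas get("address").get("state") would throw a NullPointerException.
        System.out.println(customer.path("address").path("state").textOr("?"));
        System.out.println(customer.get("address") == null);
    }
}
```

The trade-off is that a lenient lookup hides the difference between "absent" and "present but not text" unless the caller checks isMissing() explicitly.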

So here we have something to give an idea of what tree traversal code may look like. Let's go back to conceptual musing for a while, before returning to practical concerns.

2. Differences between Tree Model and Data Binding

At first this approach may appear quite similar to data binding: after all, a bunch of interconnected objects is created from Json to be traversed, accessed, modified and possibly written out as Json again. But whereas data binding converts Json into Java objects (and vice versa), the tree model represents the Json content itself. Tree models are a true native representation of Json and are somewhat removed from "real" Java objects: their only purpose is to allow more convenient access to Json than event streams do. There is no business functionality involved with the generic node objects. Also, the types available are limited to the ones that Json natively supports. One important benefit is that there is a simple, efficient and reliable one-to-one mapping between the tree model and Json: no information is lost when reading Json into the tree model or writing the tree model out as Json, and such a transformation is always possible. This is different from data binding, where some conversions may not be possible, or may need extra configuration and coding.

Rather than regular Java objects (that data binding operates on), the tree model here is quite similar to the "Poor Man's Object", the plain old HashMap. HashMaps are often used by developers when they don't think they need a "real" object (or don't want to define Yet Another Class etc). The same benefits and challenges apply to tree models as to using HashMaps as flexible and sometimes convenient alternatives to specific Java classes.
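
To make the HashMap analogy concrete, here is a minimal sketch that holds one customer record in plain Maps and Lists (standard JDK collections only, no Jackson). The class and method names are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PoorMansObject {
    // Builds one customer record out of generic maps and lists only.
    static Map<String, Object> customer() {
        Map<String, Object> name = new LinkedHashMap<>();
        name.put("first", "Mortimer");
        name.put("last", "Moneybags");

        Map<String, Object> phone = new LinkedHashMap<>();
        phone.put("type", "phone/home");
        phone.put("ref", "206-232-1234");
        List<Object> contacts = new ArrayList<>();
        contacts.add(phone);

        Map<String, Object> customer = new LinkedHashMap<>();
        customer.put("name", name);
        customer.put("contactMethods", contacts);
        return customer;
    }

    // Untyped access: every step needs a cast the compiler cannot check.
    @SuppressWarnings("unchecked")
    static String firstName(Map<String, Object> customer) {
        Map<String, Object> name = (Map<String, Object>) customer.get("name");
        return (String) name.get("first");
    }

    public static void main(String[] args) {
        Map<String, Object> c = customer();
        System.out.println(firstName(c));
        // A misspelled key compiles fine and simply yields null at runtime.
        System.out.println(c.get("nmae"));
    }
}
```

No class had to be defined for the record, which is the convenience; the casts and the silently misspelled key are the price.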

3. Benefits

Given the above description of what a tree model is, what could be reasons to use one over data binding? Here are some common reasons:

Since we do not need specific Java objects to bind to, there may be less code to write. Although access may not be as convenient, for simple tasks (especially for "throw-away" code) it is nice not to have to implement boring bean setter/getter code.
If the structure of Json content is highly irregular, it may be difficult (or impossible) to find or create an equivalent Java object structure. The tree model may then be the only practical choice.
For displaying arbitrary Json content (in, say, a Json editor), type information is generally not available, but it is quite easy to render a tree. The tree model is a natural choice for internal access and manipulation.
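
As a rough illustration of the last point, the sketch below renders arbitrary tree-shaped content as an indented outline without knowing anything about its structure in advance. It uses nested Maps and Lists from the JDK as a stand-in for a real tree model; TreeRenderer and its output format are made up for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TreeRenderer {
    // Recursively renders nested Maps (objects), Lists (arrays) and
    // plain values as an indented outline, without knowing any schema.
    static void render(String label, Object node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) out.append("  ");
        out.append(label);
        if (node instanceof Map) {
            out.append(" {}\n");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                render(String.valueOf(e.getKey()), e.getValue(), depth + 1, out);
            }
        } else if (node instanceof List) {
            out.append(" []\n");
            List<?> items = (List<?>) node;
            for (int i = 0; i < items.size(); i++) {
                render("[" + i + "]", items.get(i), depth + 1, out);
            }
        } else {
            // Scalars (strings, numbers, booleans) and null.
            out.append(" = ").append(node).append('\n');
        }
    }

    static String render(Object root) {
        StringBuilder out = new StringBuilder();
        render("root", root, 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("state", "WA");
        address.put("zipcode", 98040);
        List<Object> tags = Arrays.asList("vip", true, null);
        Map<String, Object> customer = new LinkedHashMap<>();
        customer.put("address", address);
        customer.put("tags", tags);
        System.out.print(render(customer));
    }
}
```

The same recursive walk would be much harder to write against bound Java objects, since each class would need its own rendering code or reflection.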

One analogy is that of contrasting dynamic scripting languages (like Ruby, Python or Javascript) and statically typed languages such as Java: Tree Model would be similar to scripting languages, whereas data binding would be similar to Java.

4. Drawbacks

There are also drawbacks, including:

Since access is mostly untyped, many problems that would be found with the typed alternative (data binding) may go unnoticed during development.
Memory usage is proportional to the content mapped (similar to data binding), so tree models cannot be used with huge Json content, unless mapping is done a chunk at a time. This is the same problem that data binding encounters; and sometimes the solution is to use Stream-of-Events instead.
For some uses, the additional memory usage and processing overhead are unnecessary: specifically, when only generating (writing) Json, there is often no need to build an in-memory tree (or objects, with data binding). Instead, the Stream-of-Events approach is the best choice.
Using the Tree Model often leads to either procedural (non-object-oriented) code, or having to wrap pieces of the Tree Model in specific Java classes; at which point more code gets written for little gain (compared to the regular objects used with data binding).
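
To illustrate the write-only case from the list above, here is a hand-rolled sketch that emits Json straight to a writer, one record at a time, without building any tree first. It deliberately avoids Jackson's own streaming API; StreamingWriteSketch, quote() and writeContacts() are hypothetical helpers, and the escaping is only as complete as this example needs.

```java
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

class StreamingWriteSketch {
    // Minimal escaping for the sketch: quotes, backslashes and control chars.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Writes each {type, ref} pair as soon as the iterator yields it.
    // No tree or object graph is built; output goes straight to the writer.
    static void writeContacts(Iterable<String[]> typeAndRefPairs, PrintWriter out) {
        out.print('[');
        boolean first = true;
        for (String[] pair : typeAndRefPairs) {
            if (!first) out.print(',');
            first = false;
            out.print("{\"type\":" + quote(pair[0]) + ",\"ref\":" + quote(pair[1]) + "}");
        }
        out.print(']');
    }

    public static void main(String[] args) {
        List<String[]> contacts = new ArrayList<>();
        contacts.add(new String[] { "phone/home", "206-232-1234" });
        contacts.add(new String[] { "email", "mortimer@foobar.com" });
        PrintWriter out = new PrintWriter(System.out);
        writeContacts(contacts, out);
        out.println();
        out.flush();
    }
}
```

Because the input is an Iterable, the records could just as well come from a database cursor or a file, in which case memory use would stay constant regardless of how many entries are written.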

In general it is good to be clear on why tree model is used over other alternatives: experience with xml processing often leads developers to be too eager to use tree-based processing for all tasks, even when it is not the best choice.

5. Future Plans

Since Jackson API is still evolving, there are many things within TreeMapper and JsonNode APIs that could and will be improved. More convenience methods will be added to simplify construction of new nodes, and to support common access patterns.

One particularly promising unimplemented idea is that of defining a Path or Query language (think of XPath and XQuery). There is a good chance that something like this will get implemented. There have been proposals (such as JsonPath); these may form the basis of the access language.
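
To give a feel for what such a path language could buy, here is a toy sketch of path-based lookup over nested Maps and Lists. The slash-separated syntax is invented for this example and is not JsonPath or any Jackson feature; PathLookupSketch, find() and sample() are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PathLookupSketch {
    // Resolves a path such as "contactMethods/1/ref" against nested
    // Maps (objects) and Lists (arrays). Returns null if any step fails.
    static Object find(Object root, String path) {
        Object current = root;
        for (String step : path.split("/")) {
            if (current instanceof Map) {
                current = ((Map<?, ?>) current).get(step);
            } else if (current instanceof List) {
                List<?> list = (List<?>) current;
                int index;
                try {
                    index = Integer.parseInt(step);
                } catch (NumberFormatException e) {
                    return null; // non-numeric step on an array
                }
                if (index < 0 || index >= list.size()) return null;
                current = list.get(index);
            } else {
                return null; // cannot descend into a scalar or null
            }
        }
        return current;
    }

    static Map<String, Object> contact(String type, String ref) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("type", type);
        m.put("ref", ref);
        return m;
    }

    // Trimmed-down version of the customer record used earlier.
    static Map<String, Object> sample() {
        Map<String, Object> name = new LinkedHashMap<>();
        name.put("first", "Mortimer");
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("zipcode", 98040);
        Map<String, Object> customer = new LinkedHashMap<>();
        customer.put("name", name);
        customer.put("address", address);
        customer.put("contactMethods", Arrays.asList(
            contact("phone/home", "206-232-1234"),
            contact("phone/work", "303-123-4567")));
        return customer;
    }

    public static void main(String[] args) {
        Object customer = sample();
        System.out.println(find(customer, "name/first"));
        System.out.println(find(customer, "contactMethods/1/ref"));
        System.out.println(find(customer, "address/street")); // absent field
    }
}
```

A single path string replaces a chain of lookups and null checks, which is the main attraction; a real path or query language would add things this toy lacks, such as wildcards, filters and escaping of "/" in field names.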

6. Next?

After reviewing the 3 canonical approaches, it is time to suggest guidelines for choosing between them.
Stay tuned!

Posted by Tatu Saloranta at Sunday, January 25, 2009 11:30 PM
Categories: Java, JSON

CowTalk

Moo-able Type for Cowtowncoder.com
