Friday, August 12, 2011

Traversing JSON trees with Jackson

1. Three models to rule the...

Of the three canonical JSON processing models, the tree model may look a bit like a red-headed stepchild. The effort spent so far on developing and documenting Jackson's data-binding functionality is an order of magnitude greater than all the work on tree model functionality. And considering how much more effort stream-based processing takes, surprisingly many developers still choose streaming over tree handling.

2. Why I never really liked tree model that much

I confess to having a slight aversion to using JSON trees as well, but I have a reasonable excuse: I grew to hate tree-based models with XML. Having survived bad experiences of XML DOM processing (which is both cumbersome and inefficient at the same time) tends to inoculate one against further infections. I know this is a bit of an unjustified bias, considering that most problems with DOM had nothing to do with the basic idea of an in-memory tree model (and not all were even due to it being XML...)

3. ... even though I perhaps should have

But Jackson actually does provide reasonable support for JSON trees with its JsonNode-based model, and many brave developers have put it to good use. And thanks to Jackson's extensive support for efficient conversions between models (that is, the ability both to combine approaches and to convert data as needed), you don't have to pick just one model but can combine the strengths of each. The tree model's expressive power is actually very useful for pre- or post-processing around data binding, or when building quick prototype systems.

4. Basics

A "JSON tree" is defined by one simple thing: an org.codehaus.jackson.JsonNode object that acts as the root of the logical tree. The root node is usually of type 'ObjectNode' (and represents a JSON Object), but most operations (all read operations, specifically) are exposed through the base JsonNode type.

There are three basic options for creating a JSON tree instance, all accessible via ObjectMapper:

  1. Parse from a JSON source: JsonNode root = mapper.readTree(json);
  2. Convert from a POJO: JsonNode root = mapper.valueToTree(pojo); // special case of 'ObjectMapper.convertValue()'
  3. Construct from scratch: ObjectNode root = mapper.createObjectNode();

The choice largely depends on the use case, that is, on what you have to work with: whether you are generating a new tree from scratch or modifying an existing JSON structure.

After you have the root node you can traverse it, modify its structure, and convert it to other representations (serialize as JSON, convert to a POJO).

5. Back & Forth

Aside from the ability to convert a POJO to a tree, you can easily do the reverse using "ObjectMapper.treeToValue()". Or, if you happen to need a JsonParser, use "ObjectMapper.treeAsTokens()". And to create actual textual JSON, the regular "ObjectMapper.writeValue()" works as expected.

In fact, from ObjectMapper's perspective, JsonNode is just another Java type, handled using serializers and deserializers that can be overridden if you want to customize handling. You can even replace the JsonNodeFactory that ObjectMapper uses, if you want to provide custom JsonNode implementation classes!

6. More convenient traversal

One of the things that has quietly improved over time is traversal. The earliest Jackson versions just supported basic traversal like so:

  ObjectMapper mapper = new ObjectMapper();
  JsonNode root = mapper.readTree("{\"address\":{\"zip\":98040, \"city\":\"Mercer Island\"}}");
  JsonNode address = root.get("address"); // null if there is no such field
  if (address != null && address.has("zip")) {
    int zip = address.get("zip").getIntValue();
  }

but it soon became apparent that null checks are a worthless hassle, so an alternative accessor, "path()", was quickly added. It allows traversing virtual paths without worrying whether a node exists: if one does not exist, it will just be evaluated as a "missing node" when trying to access the actual leaf value:

  JsonNode root = ...;
  int zip = root.path("address").path("zip").getValueAsInt(); // if no such path, returns 0
  // could also do:
  JsonNode zipNode = root.path("address").path("zip");
  if (zipNode.isMissingNode()) { // true if no such path exists
    // handle the absent value here
  }
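
To see why chained path() calls can never hit a null, here is a minimal sketch of the underlying idea in plain Java. It is a toy and not Jackson's actual code: ToyNode, ToyInt, ToyObject and MISSING are made-up names for illustration, and only int leaves are modeled. The trick is that path() returns a shared placeholder node instead of null, and that placeholder answers path() with itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy tree node; NOT Jackson's JsonNode, just the same idea in miniature. */
abstract class ToyNode {
    /** Shared placeholder returned for any path that does not exist. */
    static final ToyNode MISSING = new ToyNode() {
        @Override ToyNode path(String name) { return this; } // missing stays missing
        @Override boolean isMissing() { return true; }
    };

    /** Never returns null: unknown fields resolve to MISSING. */
    abstract ToyNode path(String name);

    boolean isMissing() { return false; }

    /** Default used when the node has no int value of its own. */
    int intValue() { return 0; }
}

/** Leaf node holding an int. */
class ToyInt extends ToyNode {
    private final int value;
    ToyInt(int value) { this.value = value; }
    @Override ToyNode path(String name) { return MISSING; } // scalars have no fields
    @Override int intValue() { return value; }
}

/** Object node holding named children. */
class ToyObject extends ToyNode {
    private final Map<String, ToyNode> fields = new LinkedHashMap<>();

    ToyObject put(String name, ToyNode child) {
        fields.put(name, child);
        return this;
    }

    @Override ToyNode path(String name) {
        ToyNode child = fields.get(name);
        return child == null ? MISSING : child;
    }
}

class PathSketch {
    public static void main(String[] args) {
        ToyObject root = new ToyObject()
                .put("address", new ToyObject().put("zip", new ToyInt(98040)));
        System.out.println(root.path("address").path("zip").intValue());  // 98040
        System.out.println(root.path("nope").path("zip").intValue());     // 0
        System.out.println(root.path("nope").path("zip").isMissing());    // true
    }
}
```

Because the placeholder is an ordinary node, callers only need to check for it at the very end of a chain, if at all.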

This is fine and dandy for read-only use cases, but it does not help when trying to add things -- while you can traverse a path that does not really exist, you cannot add anything to it. To address this shortcoming, Jackson 1.8 comes equipped with a "with()" method, which will actually create the path if it does not exist. So you can finally write something like this:

  ObjectNode root = mapper.createObjectNode();
  // note: JsonNode.with() returns 'JsonNode'; but ObjectNode.with() 'ObjectNode' -- go covariant return types!
  root.with("address").put("zip", 98040);

which actually makes Jackson Tree usage almost as convenient as I would like it to be. It is especially useful when materializing full trees from scratch: you can implicitly build the tree structure just by traversing it!
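
The create-on-demand behaviour is easy to picture with a small stand-in written in plain Java. This is only a sketch of the idea, not Jackson's implementation: ToyObjectNode and its methods are hypothetical names, and throwing when the existing value is not an object is an assumption made for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy object node; a stand-in for illustration, not Jackson's ObjectNode. */
class ToyObjectNode {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    /** Returns the child object under 'name', creating and attaching it if absent. */
    ToyObjectNode with(String name) {
        Object existing = fields.get(name);
        if (existing == null) {
            ToyObjectNode created = new ToyObjectNode();
            fields.put(name, created);
            return created;
        }
        if (existing instanceof ToyObjectNode) {
            return (ToyObjectNode) existing;
        }
        // Assumption for this sketch: refuse to replace a scalar value.
        throw new IllegalStateException("'" + name + "' is not an object");
    }

    /** Sets an int field and returns this node, so calls can be chained. */
    ToyObjectNode put(String name, int value) {
        fields.put(name, value);
        return this;
    }

    Object get(String name) { return fields.get(name); }

    @Override public String toString() { return fields.toString(); }
}

class WithSketch {
    public static void main(String[] args) {
        ToyObjectNode root = new ToyObjectNode();
        root.with("address").put("zip", 98040);  // creates "address" on first use
        root.with("address").put("floor", 2);    // reuses the same child
        System.out.println(root);  // {address={zip=98040, floor=2}}
    }
}
```

Returning the concrete object type from with() is what lets put() be chained directly onto it.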

7. More?

The Jackson tree model is still somewhat spartan, especially compared to the features galore of data binding. Going forward it would be nice to add support for things like:

  • Simple path language (JsonPath, JsonQuery?) support, to be able to evaluate expressions to locate nodes.
  • Filtering during construction, to create trimmed/pruned trees, sub-trees
  • More advanced find methods? There already exist a few "findXxx()" methods in JsonNode, but more would make sense, esp. with configurable matchers or filters
  • Shorter method names: the current ones are a bit too verbose (mostly due to historical reasons -- I didn't realize early enough that long method names can hurt when chained calls are used)

But as usual, much of Jackson development work is feedback-driven -- features that get used are also more likely to be improved further. So if you do find the Tree Model useful, let the development team know!

About me

  • I am known as Cowtowncoder