Friday, August 12, 2011

Traversing JSON trees with Jackson

1. Three models to rule the...

One of the three canonical JSON processing models, the tree model, may look a bit like a red-headed stepchild. The effort spent so far on developing and documenting Jackson data-binding functionality is an order of magnitude higher than all the work on tree model functionality. And considering how much more effort stream-based processing takes, surprisingly many developers choose it over tree handling.

2. Why I never really liked tree model that much

I confess to having a slight aversion to using JSON trees as well; but I have a reasonable excuse: I grew to hate tree-based models with XML. Having survived bad experiences with XML DOM processing (which is both cumbersome and inefficient at the same time) tends to inoculate one against further infections. I know this is a bit of an unjustified bias, considering that most problems with DOM had nothing to do with the basic idea of an in-memory tree model (and not all were even due to it being XML...)

3. ... even though I perhaps should have

But Jackson actually does provide reasonable support for JSON trees with its JsonNode based model, and many brave developers have put it to good use. And due to Jackson's extensive support for efficient conversions between models (that is, ability to both combine approaches and to convert data as needed), you don't have to pick and choose just one model but can combine strengths of each model. Tree model's expressive power is actually very useful when doing pre- or post-processing of data binding; or when building quick prototype systems.

4. Basics

A "JSON tree" is defined by one simple thing: an org.codehaus.jackson.JsonNode object that acts as the root of the logical tree. The root node is usually of type 'ObjectNode' (and represents a JSON Object), but most operations (all read operations, specifically) are exposed through the basic JsonNode interface.

There are three basic options for creating a JSON tree instance, all accessible via ObjectMapper:

  1. Parse from a JSON source: JsonNode root = mapper.readTree(json);
  2. Convert from a POJO: JsonNode root = mapper.valueToTree(pojo); // special case of 'ObjectMapper.convertValue()'
  3. Construct from scratch: ObjectNode root = mapper.createObjectNode();

The choice largely depends on the use case, that is, what you have to work with: whether you are generating a new tree from scratch, or modifying an existing JSON structure.

After you have the root node you can traverse it, modify its structure, and convert it to other representations (serialize as JSON, convert to a POJO).

5. Back & Forth

Aside from the ability to convert a POJO to a tree, you can easily do the reverse using "ObjectMapper.treeToValue()". Or, if you happen to need a JsonParser, use "ObjectMapper.treeAsTokens()". And to create actual textual JSON, the regular "ObjectMapper.writeValue()" works as expected.

In fact, from ObjectMapper's perspective, JsonNode is just another Java type, handled using serializers and deserializers that can be overridden if you want to customize handling. You can even replace the JsonNodeFactory that ObjectMapper uses, if you want to provide custom JsonNode implementation classes!

6. More convenient traversal

One of the things that has quietly improved over time is traversal. The earliest Jackson versions just supported basic traversal like so:

  JsonNode root = mapper.readTree("{\"address\":{\"zip\":98040, \"city\":\"Mercer Island\"}}");
  JsonNode address = root.get("address");
  if (address != null && address.has("zip")) {
    int zip = address.get("zip").getIntValue();
    // ... use zip
  }

but it soon became apparent that null checks are a worthless hassle, so an alternative accessor, "path()", was quickly added. It allows traversing virtual paths without worrying whether a node exists: if one does not exist, the result simply evaluates as a "missing node" when you try to access the actual leaf value:

  JsonNode root = ...;
  int zip = root.path("address").path("zip").getValueAsInt(); // if no such path, returns 0
  // could also do:
  JsonNode zipNode = root.path("address").path("zip");
  if (zipNode.isMissingNode()) { // true if no such path exists
    // ... handle the missing value
  }

This is fine and dandy for read-only use cases, but it does not help when trying to add things -- while you can traverse a path that does not really exist, you cannot add anything to it. To address this shortcoming, Jackson 1.8 comes equipped with the "with()" method, which will actually create the path if it does not exist. So you can finally write something like this:

  ObjectNode root = mapper.createObjectNode(); // declared as ObjectNode so that with() returns ObjectNode
  // note: JsonNode.with() returns 'JsonNode', but ObjectNode.with() returns 'ObjectNode' -- go covariant return types!
  root.with("address").put("zip", 98040);

which actually makes Jackson Tree usage almost as convenient as I would like it to be. It is especially useful when materializing full trees from scratch: you can implicitly build the tree structure just by traversing it!
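
The difference between get(), path() and with() can be sketched without Jackson at all. What follows is a simplified, hypothetical stand-in: MiniNode is not a Jackson class, its method names only mirror the ones discussed above, and it assumes a tree made of nothing but object nodes and int leaves.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a tree node; NOT Jackson's JsonNode.
class MiniNode {
    // Shared "missing" marker that path() returns for absent children.
    static final MiniNode MISSING = new MiniNode(null, null);

    private final Map<String, MiniNode> children; // non-null only for object nodes
    private final Integer intValue;               // non-null only for int leaves

    private MiniNode(Map<String, MiniNode> children, Integer intValue) {
        this.children = children;
        this.intValue = intValue;
    }

    static MiniNode object() {
        return new MiniNode(new LinkedHashMap<>(), null);
    }

    boolean isMissing() {
        return this == MISSING;
    }

    // Strict lookup: returns null when there is no such child, so callers must check.
    MiniNode get(String name) {
        return children == null ? null : children.get(name);
    }

    // Lenient lookup: never returns null, so calls can be chained safely.
    MiniNode path(String name) {
        MiniNode child = get(name);
        return child == null ? MISSING : child;
    }

    // Creating lookup: returns the existing child, or adds a new object node.
    MiniNode with(String name) {
        if (children == null) {
            throw new UnsupportedOperationException("not an object node");
        }
        return children.computeIfAbsent(name, key -> object());
    }

    MiniNode put(String name, int value) {
        if (children == null) {
            throw new UnsupportedOperationException("not an object node");
        }
        children.put(name, new MiniNode(null, value));
        return this;
    }

    // Leaf value, or 0 for missing nodes and object nodes.
    int asInt() {
        return intValue == null ? 0 : intValue;
    }
}

class MiniNodeDemo {
    public static void main(String[] args) {
        MiniNode root = MiniNode.object();
        System.out.println(root.path("address").path("zip").isMissing()); // prints true
        root.with("address").put("zip", 98040);                           // creates "address" on the way
        System.out.println(root.path("address").path("zip").asInt());     // prints 98040
    }
}
```

The single shared MISSING instance is what makes chained path() calls safe: every lookup on it yields the same marker again, and only the final value accessor has to decide on a default.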

7. More?

The Jackson tree model is still somewhat spartan, especially compared to the features galore of data binding. Going forward it would be nice to add support for things like:

  • Simple path language (JsonPath, JsonQuery?) support, to be able to evaluate expressions to locate nodes.
  • Filtering during construction, to create trimmed/pruned trees, sub-trees
  • More advanced find methods? There already exist a few "findXxx()" methods in JsonNode, but more would make sense, esp. with configurable matchers or filters
  • Shorter method names: the current ones are a bit too verbose (mostly due to historical reasons -- I didn't realize early enough that long method names can hurt when chained calls are used)

But as usual, much of Jackson development work is feedback-driven -- features that get used are also more likely to get improved further. So if you do find the Tree Model useful, let the development team know!

Thursday, August 11, 2011

One of coolest, least well-known Jackson features: Mr Bean, aka "abstract type materialization"

1. Quest for simplest JSON processing, eliminating monkey code: "struct classes"

I have found myself using "Java structs" quite often, when accessing JSON services from Java. By this I mean simple public-field-only classes like:

public class RequestDTO {
 public long requestId;
 public String callerId;
}

While many Java newbies think there is something wrong in using public fields, there is actually very little harm in using such classes for simple data transfer, if no actual business logic is needed for classes themselves.

2. But sometimes "real" classes would be nice

Then again, sometimes it would be nice to use more full-featured Bean(-like) POJOs. Perhaps we want to add some input validation for setters; or add convenience accessors, or even just occasional 'toString()' implementation.

For above example, we might want to get something like:

public class RequestImpl
{
  private long requestId;
  private String callerId;

  public RequestImpl() { }

  public long getRequestId() { return requestId; }
  public String getCallerId() { return callerId; }

  public void setRequestId(long l) { requestId = l; }
  public void setCallerId(String s) { callerId = s; }

  @Override public String toString() {
    return String.format("[request: id %d, caller %s]", requestId, callerId);
  }
}

But ideally we would usually just define something like

public interface Request {
  public long getRequestId();
  public String getCallerId();

  public void setRequestId(long l);
  public void setCallerId(String s);
}

and somehow get an implementation; alas, that usually means writing boiler-plate implementation for that interface (and if we are masochists, sometimes even intermediate abstract classes...)

So what's the problem here? I don't particularly like writing monkey code to declare basic setters, getters, and fields; especially when there is nothing interesting going on there, just mechanical typing. And while one can use IDEs to generate sources, this only helps with bootstrapping: you still get more source code to maintain, which translates to more places where bugs may hide when definitions are edited. Similarly, various annotation-based post-processors seem alien to me if they just produce more source code to compile.

3. So why not just like... get implementations "materialize"?

But while I don't like the idea of getting yet more source code generated to be compiled and maintained, I do like the idea of getting actual implementation classes dynamically.

And this is where entry #6 of "7 Jackson killer features" comes in: enter Mr Bean! When enabled, it can actually materialize concrete implementations as needed.

4. Mr Bean: basics

(from FasterXML Mr Bean Wiki page)

Basic usage is simple: you need jackson mrbean jar (included in Jackson distribution), and need to enable functionality with:

  ObjectMapper mapper = new ObjectMapper();
  mapper.registerModule(new MrBeanModule());

and then just watch interfaces appear: for example, with above example:

  Request request = mapper.readValue(jsonInput, Request.class); // where Request is an interface

What happens here is that the Mr Bean extension hooks into ObjectMapper, and whenever an abstract type is encountered and there is no concrete class available (no abstract type mapping; no annotation to indicate a concrete type; no @JsonTypeInfo to provide subtype information), it is asked to "materialize" a concrete type.

Materialization simply means generating bytecode using ASM, based on getters and/or setters; adding the necessary internal fields; loading the class and returning it to the caller. After this, the core Jackson mapper can introspect all the information it needs, and what you get is an instance of this implementation. Implementations are cached for later use, and performance-wise they behave much like manually implemented ones would.
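
As a rough illustration of what "materializing" an interface means, here is a sketch that uses the JDK's java.lang.reflect.Proxy instead of ASM bytecode generation, so it is not how Mr Bean itself is implemented. ProxyMaterializer and materialize() are hypothetical names, and a per-instance map stands in for the internal fields a generated class would have.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Same shape as the Request interface shown earlier.
interface Request {
    long getRequestId();
    String getCallerId();
    void setRequestId(long l);
    void setCallerId(String s);
}

// Hypothetical helper; not part of Jackson or Mr Bean.
final class ProxyMaterializer {
    private ProxyMaterializer() { }

    // Returns an object implementing 'type' whose getXxx()/setXxx() methods
    // read and write a map that plays the role of generated fields.
    static <T> T materialize(Class<T> type) {
        final Map<String, Object> fields = new HashMap<>();
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            int argCount = method.getParameterCount();
            if (name.startsWith("get") && name.length() > 3 && argCount == 0) {
                Object value = fields.get(name.substring(3));
                return value != null ? value : defaultValue(method.getReturnType());
            }
            if (name.startsWith("set") && name.length() > 3 && argCount == 1) {
                fields.put(name.substring(3), args[0]);
                return null;
            }
            // toString(), hashCode() and equals() are routed through the handler too.
            switch (name) {
                case "toString": return type.getSimpleName() + fields;
                case "hashCode": return System.identityHashCode(proxy);
                case "equals":   return proxy == args[0];
                default: throw new UnsupportedOperationException(name);
            }
        };
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] { type }, handler));
    }

    // A getter with a primitive return type must not return null.
    private static Object defaultValue(Class<?> returnType) {
        if (!returnType.isPrimitive()) return null;
        if (returnType == boolean.class) return false;
        if (returnType == long.class) return 0L;
        if (returnType == int.class) return 0;
        if (returnType == double.class) return 0.0;
        throw new UnsupportedOperationException("primitive not covered by this sketch: " + returnType);
    }
}

class ProxyMaterializerDemo {
    public static void main(String[] args) {
        Request request = ProxyMaterializer.materialize(Request.class);
        request.setRequestId(42L);
        request.setCallerId("cowtown");
        System.out.println(request.getRequestId()); // prints 42
        System.out.println(request.getCallerId());  // prints cowtown
    }
}
```

A dynamic proxy pays for reflective dispatch on every call and cannot extend abstract classes, which is one plausible reason to generate real classes instead, as the text describes.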

5. Mr Bean: but wait! There's more!

Ok: so we can get monkey code materialized: getters and setters are implemented, and internal fields added to store values. But this is just the beginning.

First: if you do not need to use setters yourself, you can freely omit them from the interface definition. Mr Bean is smart enough to figure out that setters (or public fields) are typically needed to set values whenever getters are materialized. So you can simplify your interfaces/abstract classes to look something like:

  public interface RequestWithoutSetters {
    public long getRequestId();
    public String getCallerId();
  }

and things will still work just fine; you can't access setters (which actually may be a good thing), but Jackson data binder can populate values just fine (internally either setters get generated; or public fields added to implementation, this is an implementation detail).
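
To see why missing setters need not be a problem for a binder, consider this sketch. It relies on java.lang.reflect.Proxy rather than generated classes, so it only approximates the idea; ReadOnlyBinder and bind() are hypothetical names, and a plain Map stands in for already-parsed JSON. Values go straight into backing storage, so the interface only has to expose getters.

```java
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

interface RequestWithoutSetters {
    long getRequestId();
    String getCallerId();
}

// Hypothetical helper; not part of Jackson or Mr Bean.
final class ReadOnlyBinder {
    private ReadOnlyBinder() { }

    // 'properties' is keyed by bean property name, e.g. "requestId" for getRequestId().
    // Value types must match the getter return types (a Long for a long getter).
    static <T> T bind(Class<T> type, Map<String, ?> properties) {
        // Copy, so later changes to the input map do not leak into the bound object.
        final Map<String, Object> fields = new HashMap<>(properties);
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] { type },
                (self, method, args) -> {
                    String name = method.getName();
                    if (name.startsWith("get") && name.length() > 3
                            && method.getParameterCount() == 0) {
                        // getRequestId -> requestId
                        String property = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                        if (!fields.containsKey(property)) {
                            throw new IllegalStateException("no value bound for '" + property + "'");
                        }
                        return fields.get(property);
                    }
                    if (name.equals("toString") && method.getParameterCount() == 0) {
                        return type.getSimpleName() + fields;
                    }
                    throw new UnsupportedOperationException(name);
                });
        return type.cast(proxy);
    }
}

class ReadOnlyBinderDemo {
    public static void main(String[] args) {
        Map<String, Object> parsed = new HashMap<>();
        parsed.put("requestId", 7L);
        parsed.put("callerId", "cowtown");
        RequestWithoutSetters request = ReadOnlyBinder.bind(RequestWithoutSetters.class, parsed);
        System.out.println(request.getRequestId()); // prints 7
        System.out.println(request.getCallerId());  // prints cowtown
    }
}
```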

Aside from simplistic get/set Beans, it is more common to want a partial implementation: an abstract class where you provide some methods and/or fields, but leave the implementation of trivial properties to Mr Bean. This it can do just fine: Mr Bean can materialize abstract classes, just "filling in the blanks".

So you can ask for a class like:

  public abstract class RequestBase {
    public abstract long getRequestId();
    public abstract String getCallerId();
    @Override public String toString() {
      return String.format("[request: id %d, caller %s]", getRequestId(), getCallerId());
    }
  }

and things work, well, as expected. Note, too, that you can implement setters and getters, not just "other" methods.

And finally: you can use annotations normally as well, adding them to your interface/abstract class definition. Thanks to Jackson's powerful and versatile annotation handling (including annotation "inheritance" for methods), you can do something like:

  // JSON we get has weird names; need to annotate
  // (e.g. a @JsonProperty annotation on each getter, naming the JSON field)
  public abstract class RequestBase {
    public abstract long getRequestId();
    public abstract String getCallerId();
  }

and get things configured as per annotations.

6. Known issues?

Mr Bean seems to work to the degree I need it to work. But there are some potential concerns you may need to be aware of:

  • Jackson has multiple ways of dealing with abstract types: do you want a bean materialized or not? As mentioned above, Mr Bean does not try to materialize abstract types that seem to expect a different kind of handling; for example, if an interface has the @JsonTypeInfo annotation, the assumption is that polymorphic handling can figure out the actual type. But it is possible that in some corner cases (esp. when using "default typing") there might be conflicts. So polymorphic types may not mix well with Mr Bean materialization.
  • Generic signatures may not be added as expected. Although you can declare generic types for abstract methods just fine, and the Jackson mapper should find the declarations, there are some issues due to complexities in getting generic declarations to work with ASM. You may need to use additional annotations (@JsonDeserialize(contentAs=...)) in some cases.

Above is just a list of potential concerns -- as far as I know, they haven't been found to be much of a problem in actual use so far.

7. What Next?

Usage, usage, usage! It would be great to get more Jackson users to use this potentially hugely work-saving feature. And if you find the feature useful, make sure to let your friends know! (if you hate it, just let me know :-) ).
