Saturday, August 18, 2012

Replacing standard JDK serialization using Jackson (JSON/Smile), java.io.Externalizable

1. Background

The default Java serialization provided by JDK is a two-edged sword: on one hand, it is a simple, convenient way to "freeze and thaw" Objects you have, handling about any kind of Java object graphs. It is possibly the most powerful serialization mechanism on Java platform, bar none.

But on the other hand, its shortcomings are well-documented (and I hope, well-known) at this point. Problems include:

  • Poor space-efficiency (especially for small data), due to inclusion of all class metadata: that is, size of output can be huge, larger than about any alternative, including XML
  • Poor performance (especially for small data), partly due to size inefficiency
  • Brittleness: smallest changes to class definitions may break compatibility, preventing deserialization. This makes it a poor choice for both data exchange between (Java) systems as well as long-term storage

Still, the convenience factor has led many systems to use JDK serialization as their default serialization method.

Is there anything we could do to address downsides listed above? Plenty, actually. Although there is no way to do much more for the default implementation (JDK serialization implementation is in fact ridiculously well optimized for what it tries to achieve -- it's just that the goal is very ambitious), one can customize what gets used by making objects implement java.io.Externalizable interface. If so, JDK will happily use alternate implementation under the hood.

Now: although writing custom serializers may be fun sometimes -- and for specific case, you can actually write very efficient solution as well, given enough time -- it would be nice if you could use an existing component to address listed short-comings.

And that's what we'll do! Here's one possible way to improve on all problems listed above:

  1. Use an efficient Jackson serializer (to produce either JSON, or perhaps more interestingly, Smile binary data)
  2. Wrap it in nice java.io.Externalizable, to make it transparent to code using JDK serialization (albeit not transparent for maintainers of the class -- but we will try minimizing amount of intrusive code)

2. Challenges with java.io.Externalizable

First things first: while conceptually simple, there are a couple of rather odd design decisions that make use of java.io.Externalizable a bit tricky:

  1. Instead of instances of java.io.InputStream and java.io.OutputStream, java.io.ObjectInput and java.io.ObjectOutput are passed; and they do NOT extend the stream versions (even though they define mostly the same methods!). This means additional wrapping is needed
  2. Externalizable.readExternal() requires updating of the object itself, not that of constructing new instances: most serialization frameworks do not support such operation
  3. How to access external serialization library, as no context is passed to either of methods?
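To make the second point concrete, here is a minimal JDK-only sketch (no Jackson involved) of what the Externalizable contract looks like when written by hand. The names PointData and ExternalizableContractDemo are made up for illustration; the point is only that the JDK first calls the public no-argument constructor and then asks the new instance to fill itself in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical class, used only to illustrate the Externalizable contract
class PointData implements Externalizable {
    int x;
    int y;

    public PointData() { } // the JDK calls this first, then readExternal()

    PointData(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // nothing is returned: fields of "this" must be updated in place
        x = in.readInt();
        y = in.readInt();
    }
}

class ExternalizableContractDemo {
    // Serializes and deserializes using plain JDK serialization
    static PointData roundTrip(PointData p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PointData) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PointData copy = roundTrip(new PointData(3, 4));
        System.out.println(copy.x + "," + copy.y);
    }
}
```

Since readExternal() has no return value, any library plugged in here must be able to update an existing object rather than construct a new one.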

These are not fundamental problems for Jackson: the first one requires use of adapter classes (see below); the second means that we need to use the "updating reader" approach that Jackson has supported for a while (yay!). And to solve the third part, we have at least two choices: use of ThreadLocal for passing an ObjectMapper; or, use of a static helper class (approach shown below).

So here are the helper classes we need:

final static class ExternalizableInput extends InputStream
{
  private final ObjectInput in;

  public ExternalizableInput(ObjectInput in) {
   this.in = in;
  }

  @Override
  public int available() throws IOException {
    return in.available();
  }

  @Override
  public void close() throws IOException {
    in.close();
  }

  @Override
  public boolean  markSupported() {
    return false;
  }

  @Override
  public int read() throws IOException {
   return in.read();
  }

  @Override
  public int read(byte[] buffer) throws IOException {
    return in.read(buffer);
  }

  @Override
  public int read(byte[] buffer, int offset, int len) throws IOException {
    return in.read(buffer, offset, len);
  }

  @Override
  public long skip(long n) throws IOException {
   return in.skip(n);
  }
}

final static class ExternalizableOutput extends OutputStream
{
  private final ObjectOutput out;

  public ExternalizableOutput(ObjectOutput out) {
    this.out = out;
  }

  @Override
  public void flush() throws IOException {
    out.flush();
  }

  @Override
  public void close() throws IOException {
    out.close();
  }

  @Override
  public void write(int ch) throws IOException {
    out.write(ch);
  }

  @Override
  public void write(byte[] data) throws IOException {
    out.write(data);
  }

  @Override
  public void write(byte[] data, int offset, int len) throws IOException {
    out.write(data, offset, len);
  }
}

/* Use of helper class here is unfortunate, but necessary; alternative would
* be to use ThreadLocal, and set instance before calling serialization.
* Benefit of that approach would be dynamic configuration; however, this
* approach is easier to demonstrate.
*/
class MapperHolder
{
  private final ObjectMapper mapper = new ObjectMapper();

  private final static MapperHolder instance = new MapperHolder();

  public static ObjectMapper mapper() { return instance.mapper; }
}

and given these classes, we can implement Jackson-for-default-serialization solution.
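For comparison, the ThreadLocal alternative mentioned in the comment above could look roughly like the following. This is a generic, JDK-only sketch: ContextHolder is a hypothetical helper (not part of Jackson), and in the real setup the type parameter would be ObjectMapper; a String is used in the demo only to keep the example free of dependencies.

```java
import java.util.function.Supplier;

// Hypothetical helper; T would be ObjectMapper in the setup described above
final class ContextHolder<T> {
    private final ThreadLocal<T> current = new ThreadLocal<>();
    private final Supplier<T> fallback;

    ContextHolder(Supplier<T> fallback) {
        this.fallback = fallback;
    }

    // Value set for the calling thread, or the shared default if none was set
    T get() {
        T value = current.get();
        return (value != null) ? value : fallback.get();
    }

    // Runs an action with a thread-specific value, restoring previous state afterwards
    void runWith(T value, Runnable action) {
        T previous = current.get();
        current.set(value);
        try {
            action.run();
        } finally {
            if (previous == null) {
                current.remove();
            } else {
                current.set(previous);
            }
        }
    }
}

class ContextHolderDemo {
    public static void main(String[] args) {
        ContextHolder<String> holder = new ContextHolder<>(() -> "default");
        System.out.println(holder.get());
        holder.runWith("custom", () -> System.out.println(holder.get()));
        System.out.println(holder.get());
    }
}
```

The caller would wrap its writeObject() / readObject() calls in runWith(...), and readExternal() / writeExternal() would call get(). This buys dynamic configuration at the cost of one more thing to remember at every call site.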

3. Let's Do a Serialization!

So with that, here's a class that is serializable using Jackson JSON serializer:


  static class MyPojo implements Externalizable
  {
        public int id;
        public String name;
        public int[] values;

        public MyPojo() { } // for deserialization
        public MyPojo(int id, String name, int[] values)
        {
            this.id = id;
            this.name = name;
            this.values = values;
        }

        public void readExternal(ObjectInput in) throws IOException {
            MapperHolder.mapper().readerForUpdating(this).readValue(new ExternalizableInput(in));
        }

        public void writeExternal(ObjectOutput oo) throws IOException {
            MapperHolder.mapper().writeValue(new ExternalizableOutput(oo), this);
        }
  }

to use that class, use JDK serialization normally:


// serialize as bytes (to demonstrate):
MyPojo input = new MyPojo(13, "Foobar", new int[] { 1, 2, 3 } );
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
ObjectOutputStream obs = new ObjectOutputStream(bytes);
obs.writeObject(input);
obs.close();
byte[] ser = bytes.toByteArray();

// and to get it back:
ObjectInputStream ins = new ObjectInputStream(new ByteArrayInputStream(ser));
MyPojo output = (MyPojo) ins.readObject();
ins.close();

And that's it.

4. So what's the benefit?

At this point, you may be wondering if and how this would actually help you. Since JDK serialization is using binary format; and since (allegedly!) textual formats are generally more verbose than binary formats, how could this possibly help with size or performance?

Turns out that if you test out code above and compare it with the case where class does NOT implement Externalizable, sizes are:

  • Default JDK serialization: 186 bytes
  • Serialization as embedded JSON: 130 bytes

Whoa! Quite unexpected result? JSON-based alternative 30% SMALLER than JDK serialization!

Actually, not really. The problem with JDK serialization is not the way data is stored, but rather the fact that in addition to (compact) data, much of Class definition metadata is included. This metadata is needed to guard against Class incompatibilities (which it can do pretty well), but it comes with a cost. And that cost is particularly high for small data.
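If you want to see the metadata cost for yourself, here is a small JDK-only sketch. It assumes a hypothetical PlainPojo class with the same fields as MyPojo but using default serialization, and compares the size of a stream holding one instance with a stream holding two. Class metadata is written only once per stream, so the second instance should cost far less than the first.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical POJO: same shape as MyPojo, but relies on default JDK serialization
class PlainPojo implements Serializable {
    private static final long serialVersionUID = 1L;

    int id;
    String name;
    int[] values;

    PlainPojo(int id, String name, int[] values) {
        this.id = id;
        this.name = name;
        this.values = values;
    }
}

class MetadataOverheadDemo {
    // Size in bytes of one stream containing `count` distinct instances
    static int serializedSize(int count) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                out.writeObject(new PlainPojo(13 + i, "Foobar" + i, new int[] { 1, 2, 3 }));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int one = serializedSize(1);
        int two = serializedSize(2);
        System.out.println("first object:  " + one + " bytes");
        System.out.println("second object: " + (two - one) + " bytes");
    }
}
```

The difference between the two numbers is roughly what the class metadata costs; with single-object streams (like cache entries) you pay it every time.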

Similarly, performance typically follows data size: while I don't have publishable results (I may do that for a future post), I expect embedded-JSON to also perform significantly better for single-object serialization use cases.

5. Further ideas: Smile!

But perhaps you think we should be able to do better, size-wise (and perhaps performance) than using JSON?

Absolutely. Since the results are not exactly readable (to use Externalizable, bit of binary data will be used to indicate class name, and little bit of stream metadata), we probably do not greatly care what the actual underlying format is.
With this, an obvious choice would be to use Smile data format, binary counterpart to JSON, a format that Jackson supports 100% with Smile Module.

The only change that is needed is to replace the first line from "MapperHolder" to read:

private final ObjectMapper mapper = new ObjectMapper(new SmileFactory());

and we will see even reduced size, as well as faster reading and writing -- Smile is typically 30-40% smaller in size, and 30-50% faster to process than JSON.

6. Even More compact? Consider Jackson 2.1, "POJO as array!"

But wait! In very near future, we may be able to do EVEN BETTER! Jackson 2.1 (see the Sneak Peek) will introduce one interesting feature that will further reduce size of JSON/Smile Object serialization. By using following annotation:

@JsonFormat(shape=JsonFormat.Shape.ARRAY)

you can further reduce the size: this occurs as the property names are excluded from serialization (think of output similar to CSV, just using JSON Arrays).

For our toy use case, size is reduced from 130 bytes to 109: a further reduction of about 16%. But wait! It gets better -- same will be true for Smile as well, since while it can reduce space in general, it still has to retain some amount of name information normally; but with POJO-as-Arrays it will use same exclusion!
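To see where the savings come from, here is a tiny JDK-only sketch that compares the two shapes for the MyPojo instance used earlier. The JSON strings are written by hand (not produced by Jackson), assuming properties are written in declaration order; the class name PojoShapeSizeDemo is made up.

```java
import java.nio.charset.StandardCharsets;

class PojoShapeSizeDemo {
    // Hand-written JSON for the MyPojo example; not generated by Jackson
    static final String AS_OBJECT = "{\"id\":13,\"name\":\"Foobar\",\"values\":[1,2,3]}";
    static final String AS_ARRAY = "[13,\"Foobar\",[1,2,3]]";

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes saved by leaving out the property names
    static int savedBytes() {
        return utf8Length(AS_OBJECT) - utf8Length(AS_ARRAY);
    }

    public static void main(String[] args) {
        System.out.println("as object: " + utf8Length(AS_OBJECT) + " bytes");
        System.out.println("as array:  " + utf8Length(AS_ARRAY) + " bytes");
        System.out.println("saved:     " + savedBytes() + " bytes");
    }
}
```

The three property names with their quotes and colons add up to 21 bytes, which is exactly the difference between 130 and 109 above.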

7. But how about actual real-life results?

At this point I am actually planning on doing something based on code I showed above. But planning is in early stages so I do not yet have results from "real data"; meaning objects of more realistic sizes. But I hope to get that soon: the use case is that of storing entities (data for which is read from DB) in memcache. Existing system is getting CPU-bound both from basic serialization/deserialization activity, but especially from higher number of GCs. I fully expect the new approach to help with this; and most importantly, be quite easy to deploy: this because I do not have to change any of code that actually serializes/deserializes Beans -- I just have to modify Beans themselves a bit.

Forcing escaping of HTML characters (less-than, ampersand) in JSON using Jackson

1. The problem

Jackson handles escaping of JSON String values in minimal way using escaping where absolutely necessary: it escapes two characters by default -- double quotes and backslash -- as well as non-visible control characters. But it does not escape other characters, since this is not required for producing valid JSON documents.

There are systems, however, that may run into problems with some characters that are valid in JSON documents. There are also use cases where you might prefer to add more escaping. For example, if you are to enclose a JSON fragment in XML attribute (or Javascript code), you might want to use apostrophe (') as quote character in XML, and force escaping of all apostrophes in JSON content; this allows you to simply embed encoded JSON value without other transformations.

Another specific use case is that of escaping "HTML funny characters", like less-than, greater-than, ampersand and apostrophe characters (double quotes are escaped by default).

Let's see how you can do that with Jackson.

2. Not as easy to change as you might think

Your first thought may be that of "I'll just do it myself". The problem is two-fold:

  1. When using API via data-binding, or regular Streaming generator, you must pass unescaped String, and it will get escaped using Jackson's escaping mechanism -- you can not pre-process it (*)
  2. If you decide to post-process content after JSON gets written, you need to be careful with replacements, and this will have negative impact on performance (i.e. it is likely to double time serialization takes)

(*) actually, there is method 'JsonGenerator.writeRaw(...)' which you can use to force exact details, but its use is cumbersome and you can easily break things if you are not careful. Plus it is only applicable via Streaming API

3. Jackson (1.8) has you covered

Luckily, there is no need for you to write custom post-processing code to change details of content escaping.

Version 1.8 of Jackson added a feature to let users customize details of escaping of characters in JSON String values.
This is done by defining a CharacterEscapes object to be used by JsonGenerator; it is registered on JsonFactory. If you use data-binding, you can set this by using ObjectMapper.getJsonFactory() first, then define CharacterEscapes to use.

Functionality is handled at low-level, during writing of JSON String values; and CharacterEscapes abstract class is designed in a way to minimize performance overhead.
While there is some performance overhead (little bit of additional processing is required), it should not have significant impact unless significant portion of content requires escaping.
As usual, if you care a lot about performance, you may want to measure impact of the change with test data.
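To give a feel for why a per-character table keeps the overhead small, here is a JDK-only sketch of the general idea: one array slot per ASCII code, checked once per character. This is NOT Jackson's implementation, and HtmlEscapeSketch is a made-up name; it also skips the standard JSON escapes (quotes, backslash, control characters) to stay short, and the exact escape form Jackson writes may differ.

```java
class HtmlEscapeSketch {
    // One slot per ASCII code; true means "write as a 4-digit hex escape"
    private static final boolean[] ESCAPE = new boolean[128];

    static {
        for (char c : new char[] { '<', '>', '&', '\'' }) {
            ESCAPE[c] = true;
        }
    }

    // Escapes only the HTML "funny characters"; everything else is copied as-is
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128 && ESCAPE[c]) {
                // backslash, letter u, then four hex digits
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<b>Tom & Jerry</b>"));
    }
}
```

Characters that need no escaping cost just one array lookup, which is why the impact stays low unless much of the content needs escaping.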

4. The Code

Here is a way to force escaping of HTML "funny characters", using functionality Jackson 1.8 (and above) have.


import org.codehaus.jackson.SerializableString;
import org.codehaus.jackson.io.CharacterEscapes;

// First, definition of what to escape
public class HTMLCharacterEscapes extends CharacterEscapes
{
    private final int[] asciiEscapes;

    public HTMLCharacterEscapes()
    {
        // start with set of characters known to require escaping (double-quote, backslash etc)
        int[] esc = CharacterEscapes.standardAsciiEscapesForJSON();
        // and force escaping of a few others:
        esc['<'] = CharacterEscapes.ESCAPE_STANDARD;
        esc['>'] = CharacterEscapes.ESCAPE_STANDARD;
        esc['&'] = CharacterEscapes.ESCAPE_STANDARD;
        esc['\''] = CharacterEscapes.ESCAPE_STANDARD;
        asciiEscapes = esc;
    }

    // this method gets called for character codes 0 - 127
    @Override public int[] getEscapeCodesForAscii() {
        return asciiEscapes;
    }

    // and this for others; we don't need anything special here
    @Override public SerializableString getEscapeSequence(int ch) {
        // no further escaping (beyond ASCII chars) needed:
        return null;
    }
}

// and then an example of how to apply it
// (these methods also need org.codehaus.jackson.map.ObjectMapper and java.io.IOException)
public ObjectMapper getEscapingMapper() {
    ObjectMapper mapper = new ObjectMapper();
    mapper.getJsonFactory().setCharacterEscapes(new HTMLCharacterEscapes());
    return mapper;
}

// so we could do:
public byte[] serializeWithEscapes(Object ob) throws IOException
{
    return getEscapingMapper().writeValueAsBytes(ob);
}


And that's it.

Thursday, May 03, 2012

Jackson Data-binding: Did I mention it can do YAML as well?

Note: for useful background, consider reading the earlier articles "Jackson 2.0: CSV-compatible as well" and "Jackson 2.0: now with XML, too!"

1. Inspiration

Before jumping into the actual beef -- the new module -- I want to mention my inspiration for this extension: the Greatest New Thing to hit Java World Since JAX-RS called DropWizard.

For those who have not yet tried it out and are unaware of its Kung-Fu Panda like Awesomeness, please go and check it out. You won't be disappointed.

DropWizard is a sort of mini-framework that combines great Java libraries (I may be biased, as it does use Jackson), starting with trusty JAX-RS/Jetty8 combination, building with Jackson for JSON, jDBI for DB/JDBC/SQL, Java Validation API (impl from Hibernate project) for data validation, and logback for logging; adding bit of Jersey-client for client-building and optional FreeMarker plug-in for UI, all bundled up in a nice, modular and easily understandable packet.
Most importantly, it "Just Works" and comes with intuitive configuration and bootstrapping system. It also builds easily into a single deployable jar file that contains all the code you need, with just a bit of Maven setup; all of which is well documented. Oh, and the documentation is very accessible, accurate and up-to-date. All in all, a very rare combination of things -- and something that would give RoR and other "easier than Java" frameworks good run for their money, if hipsters ever decided to check out the best that Java has to offer.

The most relevant part here is the configuration system. Configuration can use either basic JSON or full YAML. And as I mentioned earlier, I am beginning to appreciate YAML for configuring things.

1.1. The Specific inspirational nugget: YAML converter

The way DropWizard uses YAML is to parse it using SnakeYAML library, then convert resulting document into JSON tree and then using Jackson for data binding. This is useful since it allows one to use full power of Jackson configuration including annotations and polymorphic type handling.

But this got me thinking -- given that the whole converter implementation is about a dozen lines or so (to work to the degree needed for configs), wouldn't it make sense to add "full support" for YAML into Jackson family of plug-ins?

I thought it would.

2. And Then There Was One More Backend for Jackson

Turns out that implementation was, indeed, quite easy. I was able to improve certain things -- for example, module can use lower level API to keep performance bit better; and output side also works, not just reader -- but in a way, there isn't all that much to do since all module has to do is to convert YAML events into JSON events, and maybe help with some conversions.

Some of more advanced things include:

  • Format auto-detection works, thanks to "---" document prefix (that generator also produces by default)
  • Although YAML itself exposes all scalars as text (unless type hints are enabled, which adds more noise in content), module uses heuristics to make parser implementation bit more natural; so although data-binding can also coerce types, this should usually not be needed
  • Configuration includes settings to change output style, to allow use of more aesthetically pleasing output (for those who prefer "wiki look", for example)
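As an illustration of what "heuristics" for scalars could mean, here is a JDK-only sketch that guesses a type from plain scalar text. The rules below are made up for this example and are NOT the module's actual rules; ScalarTypeGuess and its Kind enum are hypothetical names.

```java
class ScalarTypeGuess {
    enum Kind { NULL, BOOLEAN, INTEGER, FLOAT, STRING }

    // Guesses the most natural type for an untagged scalar; falls back to STRING
    static Kind guess(String text) {
        if (text.isEmpty() || text.equals("~") || text.equals("null")) {
            return Kind.NULL;
        }
        if (text.equals("true") || text.equals("false")) {
            return Kind.BOOLEAN;
        }
        if (text.matches("[-+]?[0-9]+")) {
            return Kind.INTEGER;
        }
        if (text.matches("[-+]?[0-9]*\\.[0-9]+([eE][-+]?[0-9]+)?")) {
            return Kind.FLOAT;
        }
        return Kind.STRING;
    }

    public static void main(String[] args) {
        for (String s : new String[] { "42", "3.14", "true", "Billy", "~" }) {
            System.out.println(s + " -> " + guess(s));
        }
    }
}
```

With something like this in the parser, a value such as "42" is exposed as a number token, so data-binding rarely needs to coerce from text.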

At this point, functionality has been tested with a broad if shallow set of unit tests; but because data-binding used is 100% same as with JSON, testing is actually sufficient to use module for some work.

3. Usage? So boring I tell you

Oh. And you might be interested in knowing how to use the module. This is the boring part, since.... there isn't really much to it.

You just use "YAMLFactory" wherever you would normally use "JsonFactory"; and then under the hood you get "YAMLParser" and "YAMLGenerator" instances, instead of JSON equivalents. And then you either use parser/generator directly, or, more commonly, construct an "ObjectMapper" with "YAMLFactory" like so (code snippet itself is from test "SimpleParseTest.java")


ObjectMapper mapper = new ObjectMapper(new YAMLFactory());
User user = mapper.readValue("firstName: Billy\n"
    +"lastName: Baggins\n"
    +"gender: MALE\n"
    +"userImage: AQIDBAY=",
    User.class);


and to get the functionality itself, Maven dependency is:

<dependency>
  <groupId>com.fasterxml.jackson.dataformat</groupId>
  <artifactId>jackson-dataformat-yaml</artifactId>
  <version>2.0.0</version>
</dependency>

4. That's all Folks -- until you give us some Feedback!

That's it for now. I hope some of you will try out this new backend, and help us further make Jackson 2.0 the "Universal Java Data Processor"

Tuesday, April 10, 2012

What me like YAML? (Confessions of a JSON advocate)

Ok. I have to admit that I learnt something new and gained bit more respect for YAML data format recently, when working on the proof-of-concept for YAML-on-Jackson (jackson-dataformat-yaml; more on this on yet another Jackson 2.0 article, soon).
And since it would be intellectually dishonest not to mention that my formerly negative view on YAML has brightened up a notch, here's my write-up on this bit of enlightenment.

1. Bad First Impressions Stick

My first look at YAML via its definition basically made my stomach turn. It just looked so much like a bad American Ice Cream: "Too Much of Everything -- hey, if it isn't enough to have chocolate, banana and walnut, let's throw in bit of caramel, root beer essence and touch of balsamic vinegar; along with bit of organic arugula to spice things up!" That isn't the official motto, I thought, but might as well be. If there is an O'Reilly book on YAML it surely must have platypus as the cover animal.

That was my thinking up until few weeks ago.

2. Tale of the Two Goals

I have read most of YAML specification (which is not badly written at all) multiple times, as well as shorter descriptions. My overall conclusion has always been that there are multiple high-level design decisions that I disagree with, and that these can mostly be summarized as follows: it tries to do too many things, and tries to solve multiple conflicting use cases.

But recently when working on adding YAML support as Jackson module (based on nice SnakeYAML library, solid piece of code, very unlike most parsers/generators I have seen), I realized that fundamentally there are just two conflicting goals:

  1. Define a Wiki-style markup for data (assuming it is easier to not only write prose in, but also data)
  2. Create a straight-forward Object serialization data format

(it is worth noting that these goals are orthogonal, functionality-wise; but they conflict at level of syntax, visual appearance and complicate handling significantly, mostly because there is always "more than one way to do it" (Perl motto!))

I still think that one could solve the problem better by defining two, not one, format: first one with a Wiki dialect; and second one with a clean data format.
But this led me to think about something: what if those weird Wiki-style aspects were removed from YAML? Would I still dislike the format?

And I came to conclusion that no, I would not dislike it. In fact, I might like it. A lot.

Why? Let's see which things I like in YAML; things that JSON does not have, but really really should have in the ideal world.

3. Things that YAML has and JSON should have

Here's the quick rundown:

  1. Comments: oh lord, what kind of textual data format does NOT have comments? JSON is the only one I know of; and even it had them before spec was finalized. I can only imagine a brain fart of colossal proportions caused it to be removed from the spec...
  2. (optional) Document start and end markers ("---" header, "..." footer). This is such a nice thing to have; both for format auto-detection purpose as well as for framing for data feeds. It's bit of a no-brainer; but suspiciously, JSON has nothing of sort (XML does have XML declaration which _almost_ works well, but not quite; but I digress)
  3. Type tags for type metadata: in YAML, one can add optional type tags, to further indicate type of an Object (or any value actually). This is such an essential thing to have; and with JSON one must use in-band constructs that can conflict with data. XML at least has attributes ("xsi:type").
  4. Aliases/anchors for Object Identity (aka "id / idref"): although data is data, not objects with identity, having means to optionally pass identity information is very, very useful. And here too XML has some support (having attributes for metadata is convenient); and JSON has nada.

The common theme with above is that all extra information is optional; but if used, it is included discreetly and can be used as appropriate by encoders, decoders, with or without using language- or platform-specific resolution mechanisms.
And I think YAML actually declares these things pretty well: it is neither over nor under engineered with respect to these features. This is surprisingly delicate balance, and very well chosen. I have seen over-complicated data formats (at Amazon, for example) that didn't know where to stop; and we can see how JSON stopped too short of even most rudimentary things (... comments). Interestingly, XML almost sort-of has these features; but they come about with extra constructs (xsi:type via XML Schema), or are side effects of otherwise quirky features (element/attribute separation).

Having had to implement equivalent functionality on top of simplistic JSON construct ("add yet another meta-property, in-line with actual data; allow a way to configure it to reduce conflicts"), I envy having these constructs as first-level concepts, convenient little additions that allow proper separation of data and metadata (type, object id; comments).

4. Uses for YAML

Still, having solved/worked around all of above problems -- Jackson 1.5 added full support for polymorphic types ("type tags"); 2.0 finally added Object Identity ("alias/anchor"), use of linefeeds for framing can substitute for document boundaries -- I do not have compelling case for using YAML for data transfer. It's almost a pity -- I have come to realize that YAML could have been a great data format (it is also old enough to have challenged popularity of JSON, both seem to have been conceived at about same time). As is, it is almost one.

Somewhat ironically, then, maybe the Wiki features are acceptable for the other main use case: that of configuration files. This is the use case I have for YAML; and the main reason for writing compatibility module (inspired by libs/frameworks like DropWizard which use YAML as the main config file format).

Friday, April 06, 2012

Take your JSON processing to Mach 3 with Jackson 2.0, Afterburner

(this is part of the on-going "Jackson 2.0" series, starting with "Jackson 2.0 released")

1. Performance overhead of databinding

When using automatic data-binding Jackson offers, there is some amount of overhead compared to manually writing equivalent code that would use Jackson streaming/incremental parser and generator. But how much overhead is there? The answer depends on multiple factors, including exactly how good your hand-written code is (there are a few non-obvious ways to optimize things, compared to data-binding where there is little configurability wrt performance).

But looking at benchmarks such as jvm-serializers, one could estimate that it may take anywhere between 35% and 50% more time to serialize and deserialize POJOs, compared to highly tuned hand-written alternative. This is usually not enough to matter a lot, considering that JSON processing overhead is typically only a small portion of all processing done.

2. Where does overhead come from?

There are multiple things that automatic data-binding has to do that hand-written alternatives do not. But at high level, there are really two main areas:

  1. Configurability to produce/consume alternative representations; code that has to support multiple ways of doing things can not be as aggressively optimized by JVM and may need to keep more state around.
  2. Data access to POJOs is done dynamically using Reflection, instead of directly accessing field values or calling setters/getters

While there isn't much that can be done for former, in general sense (especially since configurability and convenience are major reasons for popularity of data-binding), latter overhead is something that could be theoretically eliminated.

How? By generating bytecode that does direct access to fields and calls to getters/setters (as well as for constructing new instances).

3. Project Afterburner

And this is where Project Afterburner comes in. What it does really is as simple as generating byte code, dynamically, to mostly eliminate Reflection overhead. Implementation uses well-known lightweight bytecode library called ASM.

Byte code is generated to:

  1. Replace "Class.newInstance()" calls with equivalent call to zero-argument constructor (currently same is not done for multi-argument Creator methods)
  2. Replace Reflection-based field access (Field.set() / Field.get()) with equivalent field dereferencing
  3. Replace Reflection-based method calls (Method.invoke(...)) with equivalent direct calls
  4. For small subset of simple types (int, long, String, boolean), further streamline handling of serializers/deserializers to avoid auto-boxing

It is worth noting that there are certain limitations to access: for example, unlike with Reflection, it is not possible to avoid visibility checks; which means that access to private fields and methods must still be done using Reflection.
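The difference between the two access styles is easy to show with plain JDK code. The sketch below is not Afterburner itself (no bytecode is generated): it just puts the Reflection-based path next to the direct calls that generated code would amount to, and shows the private-member case where only Reflection works. SampleBean and AccessStylesDemo are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Hypothetical bean used only for this comparison
class SampleBean {
    public int count;
    private String secret = "hidden";
    private String name;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

class AccessStylesDemo {
    // Roughly what plain data-binding does: everything goes through Reflection
    static String viaReflection(String name) {
        try {
            SampleBean bean = SampleBean.class.getDeclaredConstructor().newInstance();
            Method setter = SampleBean.class.getMethod("setName", String.class);
            setter.invoke(bean, name);
            Field count = SampleBean.class.getField("count");
            count.setInt(bean, 1);
            Method getter = SampleBean.class.getMethod("getName");
            return (String) getter.invoke(bean) + ":" + count.getInt(bean);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // What generated bytecode amounts to: ordinary constructor, method and field access
    static String direct(String name) {
        SampleBean bean = new SampleBean();
        bean.setName(name);
        bean.count = 1;
        return bean.getName() + ":" + bean.count;
    }

    // Private members are out of reach for ordinary code, but not for Reflection
    static String readPrivate(SampleBean bean) {
        try {
            Field f = SampleBean.class.getDeclaredField("secret");
            f.setAccessible(true); // suppresses the visibility check
            return (String) f.get(bean);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaReflection("test"));
        System.out.println(direct("test"));
        System.out.println(readPrivate(new SampleBean()));
    }
}
```

Both paths give the same result; the direct one simply has no lookup or invocation machinery in between, which is the part that generated bytecode removes.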

4. Engage the Afterburner!

Using Afterburner is about as easy as it can be: you just create and register a module, and then use databinding as usual:


// 'value' is an existing instance of some POJO type Value
ObjectMapper mapper = new ObjectMapper();
mapper.registerModule(new AfterburnerModule());
String json = mapper.writeValueAsString(value);
Value result = mapper.readValue(json, Value.class);

absolutely nothing special there (note: for Maven dependency, downloads, go see the project page).

5. How much faster?

Earlier I mentioned that Reflection is just one of overhead areas. In addition to general complexity from configurability, there are cases where general data-binding has to be done using simple loops, whereas manual code could use linear constructs. Given this, how much overhead remains after enabling Afterburner?

As per jvm-serializers, more than 50% of the speed difference between data-binding and the manual variant is eliminated. That is, data-binding with Afterburner is closer to manual variant than "vanilla" data-binding. There is still something like 20-25% additional time spent, compared to highest optimized cases; but results are definitely closer to optimal.

Given that all you really have to do is to just add the module, register it, and see what happens, it just might make sense to take Afterburner for a test ride.

6. Disclaimer

While Afterburner has been used by a few Jackson users, it is still not very widely used -- after all, while it has been available since 1.8, in some form, it has not been advertised to users. This article can be considered an announcement of sort.

Because of this, there may be rough edges; and if you are unlucky you might find one of two possible problems:

  • Get no performance improvement (which is likely due to Afterburner not covering some specific code path(s)), or
  • Get a bytecode verification problem when a serializer/deserializer is being loaded

latter case obviously being nastier. But on plus side, this should be obvious right away (and NOT after running for an hour); nor should there be a way for it to cause data losses or corruption; JVMs are rather good at verifying bytecode upon trying to load it.

Notes on upgrading Jackson from 1.9 to 2.0

If you have existing code that uses Jackson version 1.x, and you would like to see how to upgrade to 2.0, there isn't much documentation around yet; although Jackson 2.0 release page does outline all the major changes that were made.

So let's try to see what kind of steps are typically needed (note: this is based on Jackson 2.0 upgrade experiences by @pamonrails -- thanks Pierre!)

0. Pre-requisite: start with 1.9

At this point, I assume code to upgrade works with Jackson 1.9, and does not use any deprecated interfaces (many methods and some classes were deprecated during course of 1.x; all deprecated things went away with 2.0). So if your code is using an older 1.x version, the first step is usually to upgrade to 1.9, as this simplifies later steps.

1. Update Maven / JAR dependencies

The first thing to do is to upgrade jars. Depending on your build system, you can either get jars from Jackson Download page, or update Maven dependencies. New Maven dependencies are:

<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-annotations</artifactId>
  <version>2.0.0</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-core</artifactId>
  <version>2.0.0</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.0.0</version>
</dependency>

The main thing to note is that instead of 2 jars ("core", "mapper"), there are now 3: former core has been split into separate "annotations" package and remaining "core"; latter contains streaming/incremental parser/generator components. And "databind" is a direct replacement of "mapper" jar.

Similarly, you will need to update dependencies to supporting jars like:

  • Mr Bean: com.fasterxml.jackson.module / jackson-module-mrbean
  • Smile binary JSON format: com.fasterxml.jackson.dataformat / jackson-dataformat-smile
  • JAX-RS JSON provider: com.fasterxml.jackson.jaxrs / jackson-jaxrs-json-provider
  • JAXB annotation support ("xc"): com.fasterxml.jackson.module / jackson-module-jaxb-annotations

these, and many many more extension modules have their own project pages under FasterXML Git repo.

2. Import statements

Since Jackson 2.0 code lives in new Java packages, you will need to change import statements. Although most changes are mechanical, there isn't a strict set of mappings.

The way I have done this is to simply use an IDE like Eclipse, and remove all invalid import statements; and then use Eclipse functionality to find new packages. Typical import changes include:

  • Core types: org.codehaus.jackson.JsonFactory/JsonParser/JsonGenerator -> com.fasterxml.jackson.core.JsonFactory/JsonParser/JsonGenerator
  • Databind types: org.codehaus.jackson.map.ObjectMapper -> com.fasterxml.jackson.databind.ObjectMapper
  • Standard annotations: org.codehaus.jackson.annotate.JsonProperty -> com.fasterxml.jackson.annotation.JsonProperty

It is often convenient to just use wildcard imports for the main categories (com.fasterxml.jackson.core.*, com.fasterxml.jackson.databind.*, com.fasterxml.jackson.annotation.*).
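If you would rather script the mechanical part, the mappings listed above can be sketched as a tiny rewriting helper. This is only an illustration: the class name JacksonImportRewriter and its rules are my own assumptions, not a tool that ships with Jackson. It only touches imports of direct members of the three 1.x packages mentioned above (plus JsonNode, one type that moved from the old core package to databind), and leaves everything else unchanged for manual review.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: rewrites simple Jackson 1.x import lines to 2.0 package names. */
final class JacksonImportRewriter {
    // Regex -> replacement; checked in insertion order, most specific rule first.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        // JsonNode lived in the 1.x core package but is a databind type in 2.0
        RULES.put("org\\.codehaus\\.jackson\\.JsonNode;",
                "com.fasterxml.jackson.databind.JsonNode;");
        RULES.put("org\\.codehaus\\.jackson\\.map\\.(\\w+|\\*);",
                "com.fasterxml.jackson.databind.$1;");
        RULES.put("org\\.codehaus\\.jackson\\.annotate\\.(\\w+|\\*);",
                "com.fasterxml.jackson.annotation.$1;");
        RULES.put("org\\.codehaus\\.jackson\\.(\\w+|\\*);",
                "com.fasterxml.jackson.core.$1;");
    }

    /** Returns the rewritten line, or the original line if no rule applies. */
    static String rewriteLine(String line) {
        if (!line.trim().startsWith("import ")) {
            return line; // only import statements are touched
        }
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            String rewritten = line.replaceFirst(rule.getKey(), rule.getValue());
            if (!rewritten.equals(line)) {
                return rewritten;
            }
        }
        return line; // nested types, sub-packages etc. need a human (or an IDE)
    }
}
```

Anything the helper leaves alone (for example an import of SerializationConfig.Feature) will still fail to compile against 2.0, so the IDE-based cleanup described above remains the safety net.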

3. SerializationConfig.Feature, DeserializationConfig.Feature

The next biggest change was that of refactoring on/off Features, formerly defined as inner Enums of SerializationConfig and DeserializationConfig classes. For 2.0, enums were moved to separate stand-alone enums:

  1. DeserializationFeature contains most of entries from former DeserializationConfig.Feature
  2. SerializationFeature contains most of entries from former SerializationConfig.Feature

Entries that were NOT moved along are ones that were shared by both, and instead were added into new MapperFeature enumeration, for example:

  • SerializationConfig.Feature.DEFAULT_VIEW_INCLUSION became MapperFeature.DEFAULT_VIEW_INCLUSION

4. Tree model method name changes (JsonNode)

Although many methods (and some classes) were renamed here and there, mostly these were one-offs. But one area where major naming changes were done was with Tree Model -- this because 1.x names were found to be rather unwieldy and unnecessarily verbose. So we decided that it would make sense to try to do a "big bang" name change with 2.0, to get to a clean(er) baseline.

Changes made were mostly of following types:

  • getXxxValue() changes to xxxValue(): getTextValue() -> textValue(); similarly, getFieldNames() -> fieldNames() and so on.
  • getXxxAsYyy() changes to asYyy(): getValueAsText() -> asText()

5. Miscellaneous

Some classes were removed:

  • CustomSerializerFactory, CustomDeserializerFactory: should instead use Module (like SimpleModule) for adding custom serializers, deserializers

6. What else?

This is definitely an incomplete list. Please let me know what I missed, when you try upgrading!

Tuesday, March 27, 2012

Jackson 2.0: now with XML, too!

(note: for general information on Jackson 2.0.0, see the previous article, "Jackson 2.0.0 released")

While Jackson is most well-known as a JSON processor, its data-binding functionality is not tied to JSON format.
Because of this, there have been developments to extend support for XML and related things with Jackson; and in fact support for using JAXB (Java Api for Xml Binding) annotations has been included as an optional add-on since earliest official Jackson versions.

But Jackson 2.0.0 significantly increases the scope of XML-related functionality.

1. Improvements to JAXB annotation support

Optional support for using JAXB annotations (package 'javax.xml.bind' in JDK) became its own Github project with 2.0.

Functionality is provided by com.fasterxml.jackson.databind.AnnotationIntrospector implementation 'com.fasterxml.jackson.module.jaxb.JaxbAnnotationIntrospector', which can be used in addition to (or instead of) the standard 'com.fasterxml.jackson.databind.introspect.JacksonAnnotationIntrospector'.

But beyond becoming main-level project of its own, 2.0 adds to already extensive support for JAXB annotations by:

  • Making @XmlJavaTypeAdapter work for Lists and Maps
  • Adding support for @XmlID and @XmlIDREF -- this was possible due to addition of Object Identity feature in core Jackson databind -- which basically means that Object Graphs (even cyclic ones) can be supported even if only using JAXB annotations.

the second feature (@XmlID, @XmlIDREF) has been the number one request for JAXB annotation support, and we are happy that it now works.
Canonical example of using this feature would be:


    @XmlAccessorType(XmlAccessType.FIELD)
    public class Employee
    {
      @XmlAttribute
      @XmlID
      protected String id;

      @XmlAttribute
      protected String name;

      @XmlIDREF
      protected Employee manager;

      @XmlElement(name="report")
      @XmlIDREF
      protected List<Employee> reports;

      public Employee() {
        reports = new ArrayList<Employee>();
      }
    }

where entries would be serialized such that the first reference to an Employee is serialized fully, and later references use value of 'id' field; conversely, when reading XML back, references get re-created using id values.

2. XML databinding

Support for JAXB annotations may be useful when there is need to provide both JSON and XML representations of data. But to actually produce XML, you need to use something like JAXB or XStream.

Or do you?

One of experimental new projects that Jackson project started a while ago was something called "jackson-xml-databind".
After being developed for a while along with Jackson 1.8 and 1.9, it eventually morphed into project "jackson-dataformat-xml", hosted at Github.

With 2.0.0 we have further improved functionality, added tests; and also worked with developers who have actually used this for production systems.
This means that the module is now considered fully supported and no longer an experimental add-on.

So let's have a look at how to use XML databinding.

The very first thing is to create the mapper object. Here we must use a specific sub-class, XmlMapper

  XmlMapper xmlMapper = new XmlMapper();
// internally will use an XmlFactory for parsers, generators

(note: this step differs from some other data formats, like Smile, which only require use of custom JsonFactory sub-class, and can work with default ObjectMapper -- XML is bit trickier to support and thus we need to override some aspects of ObjectMapper)

With a mapper at hand, we can do serialization like so:


  public enum Gender { MALE, FEMALE };
  public class User {
    public Gender gender;
    public String name;
    public boolean verified;
    public byte[] image;
  }

  User user = new User(); // and configure
  String xml = xmlMapper.writeValueAsString(user);

and get XML like:
  <User>
<gender>MALE</gender>
<name>Bob</name>
<verified>true</verified>
<image>BARWJRRWRIWRKF01FK=</image>
</User>

which we could read back as a POJO:

  User userResult = xmlMapper.readValue(xml, User.class);

But beyond basics, we can obviously use annotations for customizing some aspects, like element/attribute distinction, use of namespaces:


  @JacksonXmlRootElement(localName="custUser")
  public class CustomUser {
    @JacksonXmlProperty(namespace="http://test")
    public Gender gender;

    @JacksonXmlProperty(localName="myName")
    public String name;

    @JacksonXmlProperty(isAttribute=true)
    public boolean verified;

    public byte[] image;
  }

gives XML like:

  <custUser verified="true">
    <ns:gender xmlns:ns="http://test">MALE</ns:gender>
    <myName>Bob</myName>
    <image>BARWJRRWRIWRKF01FK=</image>
  </custUser>

Apart from this, all standard Jackson databinding features should work: polymorphic type handling, object identity for full object graphs (new with 2.0); even value conversions and base64 encoding!

3. Jackson-based XML serialization for JAX-RS ("move over JAXB!")

So far so good: we can produce and consume XML using powerful Jackson databinding. But the latest platform-level improvement in Java land is the use of JAX-RS implementations like Jersey. Wouldn't it be nice to make Jersey use Jackson for both JSON and XML? That would remove one previously necessary add-on library (like JAXB).

We think so too, which is why we created "jackson-jaxrs-xml-provider" project, which is the sibling of existing "jackson-jaxrs-json-provider" project.
As with the older JSON provider, by registering this provider you will get automatic data-binding to and from XML, using Jackson XML data handler explained in the previous section.

It is of course worth noting that Jersey (and RESTeasy, CXF) already provide XML databinding using other libraries (usually JAXB), so use of this provider is optional.
So why advocate use of the Jackson-based variant? One benefit is good performance -- a bit better than JAXB, and much faster than XStream, as per the jvm-serializer benchmark (performance is limited by the underlying XML Stax processor -- but Aalto is wicked fast, not much slower than Jackson).
But more important is simplification of configuration and code: it is all Jackson, so annotations can be shared, and all data-binding power can be used for both representations.

It is most likely that you find this provider useful if the focus has been on producing/consuming JSON, and XML is being added as a secondary addition. If so, this extension is a natural fit.

4. Caveat Emptor

4.1 Asymmetric: "POJO first"

It is worth noting that the main supported use case is that of starting with Java Objects, serializing them as XML, and reading such serialization back as Objects.
And the explicit goal is that ideally all POJOs that can be serialized as JSON should also be serializable (and deserializable back into same Objects) as XML.

But there is no guarantee that any given XML can be mapped to a Java Object: some can be, but not all.

This is mostly due to complexity of XML, and its inherent incompatibility with Object models ("Object/XML impedance mismatch"): for example, there is no counterpart to XML mixed content in Object world. Arbitrary sequences of XML elements are not necessarily supported; and in some cases explicit nesting must be used (as is the case with Lists, arrays).

This means that if you do start with XML, you need to be prepared for possibility that some changes are needed to format, or you need additional steps for deserialization to clean up or transform structures.

4.2 No XML Schema support, mixed content

Jackson XML functionality specifically has zero support for XML Schema. Although we may work in this area, and perhaps help in using XML Schemas for some tasks, your best bet currently is to use tools like XJC from JAXB project: it can generate POJOs from XML Schema.

Mixed content is also out of scope, explicitly. There is no natural representation for it; and it seems pointless to try to fall back to XML-specific representations (like DOM trees). If you need support for "XMLisms", you need to look for XML-centric tools.

4.3 Some root values problematic: Map, List

Although we try to support all Java Object types, there are some unresolved issues with "root values", values that are not referenced via POJO properties but are the starting point of serialization/deserialization. Maps are especially tricky, and we recommend that when using Maps and Lists, you use a wrapper root object, which then references Map(s) and/or List(s).

(it is worth noting that JAXB, too, has issues with Map handling in general: XML and Maps do not mesh particularly well, unlike JSON and Maps).

4.4 JsonNode not as useful as with JSON

Finally, Jackson Tree Model, as expressed by JsonNodes, does not necessarily work well with XML either. Problem here is partially general challenges of dealing with Maps (see above); but there is the additional problem that whereas POJO-based data binder can hide some of work-arounds, this is not the case with JsonNode.

So: you can deserialize all kinds of XML as JsonNodes; and you can serialize all kinds of JsonNodes as XML, but round-tripping might not work. If tree model is your thing, you may be better off using XML-specific tree models such as XOM, DOM4J, JDOM or plain old DOM.

5. Come and help us make it Even Better!

At this point we believe that Jackson provides a nice alternative for existing XML producing/consuming toolkits. But what will really make it the first-class package is Your Help -- with increased usage we can improve quality and further extend usability, ergonomics and design.

So if you are at all interested in dealing with XML, consider trying out Jackson XML functionality!

Monday, March 26, 2012

Jackson 2.0.0 released: going GitHub, handling cyclic graphs, builder style...

After furious weeks of coding and testing, the first major version upgrade of Jackson is here: 2.0 was just released, and is available (from the Download page, for example).

1. Major Upgrade?

So how does this upgrade differ from "minor" upgrades (like, from 1.8 to 1.9)? Difference is not based on amount of new functionality introduced -- most Jackson 'minor' releases have contained as much new stuff as major releases of other projects -- although 2.0 does indeed pack up lots of goodies.

Rather, major version bump indicates that code that uses Jackson 1.x is neither backwards nor forwards compatible with Jackson 2.0.
That is, you can not just replace 1.9 jars with 2.0 and hope that things work. They will not.

Why not? 2.0 code differs from 1.x with respect to packaging, such that:

  1. Java package used is "com.fasterxml.jackson" (instead of "org.codehaus.jackson")
  2. Maven group ids begin with "com.fasterxml.jackson" (instead of "org.codehaus.jackson")
  3. Maven artifact ids have changed a bit too (core has been split into "core" and "annotations", for example)

These are actually not big changes in and of themselves: you just need to change Maven dependencies, and for Java packages, change import statements. While this is some amount of work, these are mechanical changes. But it does mean that the upgrade is not a basic plug-n-play operation.

In addition, some classes have moved within package hierarchy, to better align functional areas. Some have been refactored or carved (most notably, SerializationConfig.Feature is now simply SerializationFeature, and DeserializationConfig.Feature is now DeserializationFeature). Most cases of types moving should be easy to solve with IDEs, but we will also try to collect some sort of upgrade guide.

For more details on packaging changes, check out "Jackson 2.0 release notes" page.

1.1 Why changes to package names?

The reason for choosing to move to new packages is to allow both Jackson 1.x and Jackson 2.x versions to be used concurrently. While smaller projects will find it easier to just convert wholesale, many bigger systems and (especially) frameworks will find ability to do incremental upgrades useful. Without repackaging one would have to upgrade in "all-or-nothing" way. But with repackaging this can be avoided, and existing functionality converted gradually (within some limits; transitive dependencies may still be problematic).

2. But wait! It is totally worth it!

I started with the "bad news" first, to get that out of the way, since there is lots to like about the new version.
I will write more detailed articles on specific features later on, but let's start with a brief overview.

2.1 Community improvements: Better collaboration with GitHub

First big change is that Jackson project as a whole has moved to Github. While many extension projects (modules) had already started there, now all core components have moved as well:

as well as standard extension components such as:

and many, many more (total project count is 17!)

This should help make it much easier to contribute to projects; as well as make it easier for packages to evolve at appropriate pace: there is less need to synchronize "big" releases outside of 3 core packages, and it is much easier to give scoped access to new contributors.

2.2 Feature: Handle Any Object Graphs, even Cyclic ones!

One of the biggest use cases unsupported so far has been the ability to handle serialization and deserialization of cyclic graphs, and elimination of duplicates due to shared references. Although the existing @JsonManagedReference annotation works for some cases (esp. many ORM-induced parent/child cases), there has been no general solution.

But now there is. Jackson 2.0 adds support for a concept called "Object Identity": the ability to serialize an Object Id for values and use this id for secondary references, and the ability to resolve these references when deserializing. This feature has many similarities to "Polymorphic Type information" handling which was introduced in Jackson 1.5.

Although full explanation of how things work deserves its own article, the basic idea is simple: you will need to annotate classes with new annotation @JsonIdentityInfo (or, use it for properties that reference type for which to add support), similar to how @JsonTypeInfo is used for including type id:


  @JsonIdentityInfo(generator=ObjectIdGenerators.IntSequenceGenerator.class, property="@id")
  public class Identifiable {
    public int value;

    public Identifiable next;
  }

and with such definition, you could serialize following cyclic two-node graph:


  Identifiable ob1 = new Identifiable();
  ob1.value = 13;
  Identifiable ob2 = new Identifiable();
  ob2.value = 42;
  // link as a cycle:
  ob1.next = ob2;
  ob2.next = ob1;
  // and serialize!
  String json = objectMapper.writeValueAsString(ob1);

to get JSON like:

  {
   "@id" : 1,
   "value" : 13,
   "next" : {
    "@id" : 2,
    "value" : 42,
    "next" : 1
   }
  }

and obviously deserialize it back with:

  Identifiable result = objectMapper.readValue(json, Identifiable.class);

  // the cycle is restored: the second node points back at the first
  assertSame(result.next.next, result);

Most details (such as the id generation algorithm used, the property used for inclusion etc.) are configurable; more on this in a later article.
Until then, Javadocs should help.
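To give a feel for what happens on the serialization side, here is a minimal sketch in plain Java of the "first occurrence written in full, later occurrences written as id" idea. This is not how Jackson implements the feature; the class ObjectIdSketch and its hand-rolled JSON output are assumptions made purely for illustration, and only the writing side is shown.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical sketch: sequence-based object ids for a linked, possibly cyclic, structure. */
final class ObjectIdSketch {
    static final class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    static String write(Node root) {
        StringBuilder sb = new StringBuilder();
        // Identity-based map: two distinct but equal objects must get distinct ids
        write(root, new IdentityHashMap<Node, Integer>(), sb);
        return sb.toString();
    }

    private static void write(Node node, Map<Node, Integer> ids, StringBuilder sb) {
        if (node == null) {
            sb.append("null");
            return;
        }
        Integer seen = ids.get(node);
        if (seen != null) {
            sb.append(seen); // secondary reference: write just the id
            return;
        }
        int id = ids.size() + 1; // simple int sequence, starting from 1
        ids.put(node, id);
        sb.append("{\"@id\":").append(id)
          .append(",\"value\":").append(node.value)
          .append(",\"next\":");
        write(node.next, ids, sb);
        sb.append('}');
    }
}
```

For the two-node cycle used above (values 13 and 42), write() produces the same structure as the JSON shown earlier, without the whitespace. Deserialization would need the reverse bookkeeping: a map from id to the already-constructed instance.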

2.3 Feature: Support "Builder" style of POJO construction

Another highly-requested feature has been ability to support POJOs created using "Builder" style. This means that POJOs are created using a separate Builder object which has methods for changing property values; and a "build" method that will create actual immutable POJO instance. For example, considering following hypothetical Builder class:


 public class ValueBuilder {
  private int x, y;

  // can use @JsonCreator to use non-default ctor, inject values etc
  public ValueBuilder() { }

  // if name is "withXxx", works as is: otherwise use @JsonProperty("x") or @JsonSetter("x")!
  public ValueBuilder withX(int x) {
    this.x = x;
    return this; // or, construct new instance, return that
  }
  public ValueBuilder withY(int y) {
    this.y = y;
    return this;
  }
  public Value build() {
    return new Value(x, y);
  }
}

and value class it creates:

@JsonDeserialize(builder=ValueBuilder.class) // important!
public class Value {
  private final int x, y;
  protected Value(int x, int y) {
    this.x = x;
    this.y = y;
  }
}

we would just use it as expected, as long as annotations have been used as shown above:

  Value v = objectMapper.readValue(json, Value.class);

and it "just works"
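If you are curious how a data binder can drive a builder at all, the "withXxx" naming convention can be sketched with plain reflection. The following is an illustration only, under my own assumptions: BuilderBindingSketch and buildFrom() are hypothetical names, property values arrive as a pre-parsed Map instead of JSON, and Jackson's real implementation is considerably more involved.

```java
import java.lang.reflect.Method;
import java.util.Map;

/** Hypothetical sketch: populate a builder via "withXxx" methods, then call build(). */
final class BuilderBindingSketch {
    static final class Value {
        final int x, y;
        Value(int x, int y) { this.x = x; this.y = y; }
    }

    public static final class ValueBuilder {
        private int x, y;
        public ValueBuilder withX(int x) { this.x = x; return this; }
        public ValueBuilder withY(int y) { this.y = y; return this; }
        public Value build() { return new Value(x, y); }
    }

    static Object buildFrom(Class<?> builderType, Map<String, Object> props) {
        try {
            Object builder = builderType.getDeclaredConstructor().newInstance();
            for (Method m : builderType.getMethods()) {
                String name = m.getName();
                if (name.startsWith("with") && name.length() > 4 && m.getParameterCount() == 1) {
                    // "withX" -> property "x"
                    String prop = Character.toLowerCase(name.charAt(4)) + name.substring(5);
                    if (props.containsKey(prop)) {
                        Object result = m.invoke(builder, props.get(prop));
                        if (result != null) {
                            builder = result; // builder may return a new instance
                        }
                    }
                }
            }
            return builderType.getMethod("build").invoke(builder);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Failed to use builder " + builderType.getName(), e);
        }
    }
}
```

Note how the return value of each "with" method is kept: this is what allows builders that construct a new instance per call, as hinted at in the comment of the ValueBuilder above.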

2.4 Ergonomics: Simpler, more powerful configuration

Although ObjectMapper's immutable friends -- ObjectReader and ObjectWriter -- were introduced much earlier, 2.0 will give more firepower for both, making them in many ways superior to use of ObjectMapper. In fact, while you can still pass ObjectMappers and create ObjectReaders, ObjectWriters on the fly, it is recommended that you use the latter if possible.

So what was the problem solved? Basically, ObjectMapper is thread-safe if (and only if!) it is fully configured before its first use. This means that you can not (or at least are not supposed to) change its configuration once you have used it. To further complicate things, some configuration options would work even if used after first read or write, whereas others would not, or would only work in seemingly arbitrary cases (depending on what was cached).

On the other hand, ObjectReader and ObjectWriter are fully immutable and thus thread-safe, but would also allow creation of newly configured instances. But while this allowed handling of some cases -- such as that of using different JSON View for deserialization -- number of methods available for reconfiguration was limited.

Jackson 2.0 adds significant number of new fluent methods for ObjectReader and ObjectWriter to reconfigure things; and most notably, it is now possible to change serialization and deserialization features (SerializationFeature, DeserializationFeature, as noted earlier). So, to, say, serialize a value using "pretty printer" you could use:

  ObjectWriter writer = objectMapper.writer(); // there are also many other convenience versions...
  writer.withDefaultPrettyPrinter().writeValue(resultFile, value);

or to enable "root element" wrapping AND specifying alternative wrapper property name:

  String json = writer
    .with(SerializationFeature.WRAP_ROOT_VALUE)
    .withRootName("wrapper")
    .writeValueAsString(value);

basically, anything that can work on per-call basis will now work through either ObjectReader (for deserialization) or ObjectWriter (for serialization).
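The design idea behind this ("immutable, but cheap to reconfigure") is easy to sketch without Jackson. The class below is a hypothetical stand-in, not ObjectWriter itself: every "with" method returns a new instance and leaves the original untouched, which is what makes sharing instances between threads safe.

```java
/** Hypothetical sketch of an immutable, fluently reconfigurable writer. */
final class WriterSketch {
    private final boolean pretty;
    private final String rootName; // null means "no root wrapping"

    WriterSketch() { this(false, null); }

    private WriterSketch(boolean pretty, String rootName) {
        this.pretty = pretty;
        this.rootName = rootName;
    }

    // Each "with" method creates a new instance; 'this' is never modified
    WriterSketch withPretty() { return new WriterSketch(true, rootName); }
    WriterSketch withRootName(String name) { return new WriterSketch(pretty, name); }

    /** Writes a single int property, just enough to show the settings in action. */
    String write(String key, int value) {
        String sep = pretty ? " : " : ":";
        String body = "{\"" + key + "\"" + sep + value + "}";
        return (rootName == null) ? body : "{\"" + rootName + "\"" + sep + body + "}";
    }
}
```

A shared base instance can then be specialized per call, for example base.withRootName("wrapper").write("value", 13), without any risk of one caller's settings leaking into another's.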

2.5 Feature parity: JSON Views for deserialization

One of frustrations with Jackson 1.x has been that all filtering functionality has been limited to serialization side. Not any more: it is now possible to use JSON Views for deserialization as well:

  Value v = mapper
   .reader(Value.class)
   .withView(MyView.class) 
   .readValue(json);

and if input happened to contain properties not included in the view, values would be ignored without setting matching POJO properties.

2.6 Custom annotations using Annotation Bundles

Another ergonomic feature is so-called "annotation bundles". Basically, by addition of the meta-annotation @JacksonAnnotationsInside, it is now possible to specify that annotations from a given (custom) annotation should be introspected and used the same way as if they were directly included. So, for example, you could define the following annotation:

  @Retention(RetentionPolicy.RUNTIME)
  @JacksonAnnotationsInside
  @JsonInclude(Include.NON_NULL) // only include non-null properties
  @JsonPropertyOrder({ "id", "name" }) // ensure that 'id' and 'name' are always serialized before other properties
  private @interface StdAnnotations { }

and use it for POJO types as a short-hand:

  @StdAnnotations
  public class Pojo { ... }

instead of separately adding multiple annotations.
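The mechanism itself is simple enough to sketch with plain reflection. In the illustration below all annotation names (Bundle, SkipNulls, Order) are hypothetical stand-ins I made up so that the example runs without Jackson; Bundle plays the role of @JacksonAnnotationsInside. The point is only to show how an introspector can "look inside" a marked annotation.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: resolving annotations contained in a "bundle" annotation. */
final class AnnotationBundleSketch {
    @Retention(RetentionPolicy.RUNTIME) @interface Bundle { }     // marker meta-annotation
    @Retention(RetentionPolicy.RUNTIME) @interface SkipNulls { }  // made-up annotation
    @Retention(RetentionPolicy.RUNTIME) @interface Order { String[] value(); } // made-up

    @Retention(RetentionPolicy.RUNTIME)
    @Bundle
    @SkipNulls
    @Order({ "id", "name" })
    @interface StdAnnotations { }

    @StdAnnotations
    static class Pojo { }

    /** Returns simple names of annotations that effectively apply to the type. */
    static List<String> effectiveAnnotations(Class<?> type) {
        List<String> names = new ArrayList<>();
        collect(type.getAnnotations(), names);
        return names;
    }

    private static void collect(Annotation[] annotations, List<String> names) {
        for (Annotation a : annotations) {
            Class<? extends Annotation> annType = a.annotationType();
            if (annType.isAnnotationPresent(Bundle.class)) {
                // a bundle: use the annotations it carries instead of the bundle itself
                collect(annType.getAnnotations(), names);
            } else if (!annType.equals(Bundle.class) && !annType.equals(Retention.class)) {
                names.add(annType.getSimpleName());
            }
        }
    }
}
```

With these definitions, effectiveAnnotations(Pojo.class) reports SkipNulls and Order, just as if they had been placed directly on Pojo.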

2.7 @JsonUnwrapped.prefix / suffix

One more cool new addition is for the @JsonUnwrapped annotation (introduced in 1.9). It is now possible to define a prefix and/or suffix to use for "unwrapped" properties, like so:

  public class Box {
    @JsonUnwrapped(prefix="topLeft.") public Point tl;
    @JsonUnwrapped(prefix="bottomRight.") public Point br;
  }
  public class Point {
    public int x, y;
  }

which would result in JSON like:

  {
   "topLeft.x" : 0,
   "topLeft.y" : 0,
   "bottomRight.x" : 100,
   "bottomRight.y" : 80  
  }

This feature basically allows for scoping things to avoid naming collisions. It can also be used for fancier stuff, such as binding of 'flat' properties into hierarchic POJOs... but more on this in a follow-up article.
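The renaming itself can be sketched in a few lines of plain Java. This is an illustration, not Jackson code: UnwrapPrefixSketch is a hypothetical class, and it assumes that the prefix is prepended verbatim to each property name, so that any separator (here a dot) has to be part of the prefix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: flattening nested points into prefixed properties. */
final class UnwrapPrefixSketch {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    /** Adds properties of the point directly into 'out', each name prefixed as-is. */
    static void unwrapInto(Map<String, Object> out, String prefix, Point p) {
        out.put(prefix + "x", p.x);
        out.put(prefix + "y", p.y);
    }

    static Map<String, Object> flattenBox(Point topLeft, Point bottomRight) {
        Map<String, Object> out = new LinkedHashMap<>(); // keeps insertion order
        unwrapInto(out, "topLeft.", topLeft);
        unwrapInto(out, "bottomRight.", bottomRight);
        return out;
    }
}
```

Without the prefixes both points would try to write "x" and "y", and the second pair would silently overwrite the first; that is the naming collision that the prefix avoids.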

3.0 And that's most of it, Folks!

At least for now. Stay tuned!

EDIT:

Links to the continuing "Jackson 2.0 saga":

Tuesday, October 11, 2011

Jackson 1.9 new feature overview

Jackson 1.9 was just released. As usual, it can be downloaded from the Download page, and detailed release information can be found on the 1.9 release page.

Let's have a look into contents of this release.

1. Overview

One of focus areas on this release was once again to tackle oldest significant issues and improvement ideas; and two of major new features are long-standing issues (ability to inline/unwrap JSON values; unify annotation handling for getters/setters/fields). Another big goal was to improve ergonomics: to simplify configuration, shorten commonly used usage patterns and so on. And finally there was also intent to try to "2.0 proof" things, by trying to figure out things that need to be deprecated to allow removal of obsolete methods as well as indicate cases where improved functionality is available.

2. Major features

(note: classification of features into major, medium and minor categories is not exact science, and different users might consider different things more important than others -- here we simply use categorization that the release page uses)

Major features included in 1.9 are:

  • Allow inlining/unwrapping of child objects using @JsonUnwrapped
  • Rewrite property introspection part of framework to combine getter/setter/field annotations
  • Allow injection of values during deserialization
  • Support for 'external type id' by adding @JsonTypeInfo.As.EXTERNAL_PROPERTY
  • Allow registering instantiators (ValueInstantiator) for types

2.1 @JsonUnwrapped

Ability to map JSON like

  {
    "name" : "home",
    "latitude" : 127,
    "longitude" : 345
  }

to classes defined as:

  class Place {
    public String name;

    @JsonUnwrapped
    public Location location;
  }

  class Location {
    public int latitude, longitude;
  }

has been on many users' wish list for a while now; and with addition of @JsonUnwrapped (used as shown above) this simple structural transformation can now be achieved without custom handling

2.2 "Unified" properties, merging ("sharing") of annotations of getters/setters/fields

Another long-standing issue has been that of isolation between annotations used by getters, setters and fields. Basically, an annotation added to a getter was only ever used for serialization, and would never have any effect on deserialization; similarly, an annotation on a setter never affected serialization. While this is not a problem for many annotation use cases, it would make the following use case work quite differently from what users intuitively expect:

  class Point {
    private int w;

    @JsonProperty("width")
    public int getW() { return w; }
    public void setW(int w) { this.w = w; } // must be separately renamed
  }

which would actually lead to there being two separate properties: "width" that is written out during serialization; and "w" that is expected to be received when deserializing. Many users would intuitively expect annotation to be "shared" between two parts of logically related accessors. Same issue also affects annotations like @JsonIgnore and @JsonTypeInfo, requiring use of seemingly redundant annotations.

Jackson 1.9 solves this by adding new internal representation of logical property, and merging resulting annotations using expected priorities (meaning that annotations on a getter have precedence over setter when serializing, and vice versa).

There are also other more subtle changes, related to these changes. For example, class like:

  class ValueBean {
    private int value;

    public int getValue() { return value; }
  }

can now be deserialized successfully, even without field "value" being visible or annotated: since it is joined with getter ("getValue()"), and getter is explicitly annotated, field is included as the accessor to use for assigning value for the property.

The last important benefit of this feature is that now handling of Jackson and JAXB annotations is much more similar, which should make JAXB annotations works better as a result (code was simplified significantly) -- this because JAXB had always considered annotations to be shared in this way.

2.3 Value Injection for Deserialization

Value injection here means ability to insert ("inject") values into POJOs outside of general data binding: that is, values that do not come from JSON input. Instead, values to inject are specified during configuration of ObjectMapper or ObjectReader used for data binding.

Why is this needed? Some Java types require additional context information to be able to construct POJO instances, for example. And in other cases, you may want to pre-populate values of some fields; and while there are other mechanisms (for example, you can pass an existing POJO instance for the "updateValue()" method), they are quite limited.

Only two things are needed for value injection:

  1. Means to indicate properties for which values are to be injected, and
  2. Definition of values to inject

Default mechanism is to handle first part by using new annotation, @JacksonInject, so that we could have:

  public class InjectableBean
  {
    @JacksonInject("seq") private int sequenceNumber;
    public String name;
  }

and the second part is handled by allowing configuration of an ObjectMapper or ObjectReader instance with InjectableValues, an object that can find values to inject given a value id. Value ids can be specified as either Strings, or as Classes; if a Class is used, Class.getName() is used to get the actual String id to use. For the above POJO, we could handle deserialization as follows:

  ObjectMapper mapper = new ObjectMapper();
  Integer sequenceNumber = SequenceGenerator.next(); // or whatever
  InjectableValues inject = new InjectableValues.Std()
   .addValue("seq", sequenceNumber);
  final String json = "{\"name\":\"Lucifer\"}";
  InjectableBean value = mapper.reader(InjectableBean.class).withInjectableValues(inject).readValue(json);

For more on this feature, check out FasterXML Wiki's entry on Value Injection.
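To make the two parts (marking properties, and looking up values by id) more tangible, here is a stripped-down sketch using plain reflection. Everything in it is an assumption made for illustration: the @Inject annotation stands in for @JacksonInject, a plain Map stands in for InjectableValues, and the real implementation handles creators, setters and much more.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

/** Hypothetical sketch: injecting configured values into annotated fields. */
final class InjectionSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Inject { String value(); } // value is the id to look up

    static final class InjectableBean {
        @Inject("seq") private int sequenceNumber;
        public String name; // would come from JSON in the real thing

        int sequenceNumber() { return sequenceNumber; }
    }

    /** Sets every @Inject field for which 'values' has a matching id. */
    static <T> T injectInto(T target, Map<String, Object> values) {
        try {
            for (Field f : target.getClass().getDeclaredFields()) {
                Inject marker = f.getAnnotation(Inject.class);
                if (marker != null && values.containsKey(marker.value())) {
                    f.setAccessible(true); // field may be private
                    f.set(target, values.get(marker.value()));
                }
            }
            return target;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Could not inject into " + target.getClass().getName(), e);
        }
    }
}
```

The "name" property is left alone by injectInto(): it is regular data, and only fields that carry the marker annotation are ever filled from the configured values.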

2.4 External Type Id

Jackson has had support for full polymorphic type handling since 1.5, allowing configuration of both type identifier in use (usually either a class name, or logical type name) and type inclusion mechanism (as property, as wrapper array, as single-element wrapper object).
This covers wide range of usage scenarios, but there is one inclusion mechanism that is sometimes used but could not be supported by Jackson: that of using "external type identifier". This style of type inclusion is used by some data formats, most notably geoJSON.

By external type identifier we mean case such as this:

 {
  "type" : "rectangle",
  "shape" :  {
   "width": 20.0,
   "height" : 40.0
  }
 }

where type is included as a property ("type") that is outside of JSON Object being typed.

With 1.9 we can support such use case by using @JsonTypeInfo with a new inclusion value:

  public class ShapeContainer
  {
    @JsonTypeInfo(use=Id.NAME, include=As.EXTERNAL_PROPERTY, property="type")
    public Shape shape;
  }

  static class Shape { }

  @JsonTypeName("rectangle") // or rely on class name, Rectangle
  static class Rectangle extends Shape {
    public double width, height;
  }

One thing to note here is that this inclusion mechanism should only be used with properties; annotating classes with @JsonTypeInfo that indicates external type identifiers can cause conflicts.
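What makes this inclusion style different is that the type id has to be read from the container before the typed value itself can be bound. A minimal sketch of that lookup, operating on already-parsed Maps rather than JSON, could look like the following; ExternalTypeIdSketch and readShape() are hypothetical names, and this is not how Jackson's deserializer is actually structured.

```java
import java.util.Map;

/** Hypothetical sketch: choosing a subtype from a type id stored next to the value. */
final class ExternalTypeIdSketch {
    static class Shape { }

    static final class Rectangle extends Shape {
        double width, height;
    }

    static Shape readShape(Map<String, Object> container) {
        // the type id is a sibling of "shape", not a property inside it
        String typeId = (String) container.get("type");
        @SuppressWarnings("unchecked")
        Map<String, Object> body = (Map<String, Object>) container.get("shape");
        if ("rectangle".equals(typeId)) {
            Rectangle r = new Rectangle();
            r.width = ((Number) body.get("width")).doubleValue();
            r.height = ((Number) body.get("height")).doubleValue();
            return r;
        }
        throw new IllegalArgumentException("Unknown type id: " + typeId);
    }
}
```

With streaming input the "type" property may also arrive after "shape", in which case the value has to be buffered until the type id is known; the Map-based sketch conveniently sidesteps that.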

2.5 Value instantiators

And last but not least, 1.9 also allows much more control over mechanism used to create actual POJO value instances. While Jackson 1.2 added support for @JsonCreator annotation, there has not been a way to add custom creator objects.

With 1.9, we get following pieces:

  • ValueInstantiator (abstract class), extended by objects used to create value instances
  • ValueInstantiators (interface), provider for per-type ValueInstantiator instances (as well as ValueInstantiators.Base abstract class for actual implementations)
  • Module.SetupContext method addValueInstantiators(); as well as SimpleModule method addValueInstantiator(), for adding provider(s), so modules can easily provide instantiators for types they support
  • @JsonValueInstantiator annotation that can be used as an alternative to specify instantiator used for annotated type.

Above pieces are basically enough to support all three modes of construction @JsonCreator allows (so basically @JsonCreator could be implemented as module, if we wanted!):

  1. "Default" construction that takes no arguments and uses no-argument constructor or factory method
  2. "Delegate-based" construction, in which JSON value is first bound to an intermediate type (such as java.util.Map or Jackson JsonNode), and this instance is passed to single-argument creator method
  3. "Property-based" construction, in which one or more named values (JSON properties) are bound to specified types that match creator arguments, and these are passed to creator method.

Mapping of above construction methods to ValueInstantiator methods is fairly straight-forward:

  1. Simple no-arguments construction (ValueInstantiator.createUsingDefault()): used if the other construction mechanisms are not available: consumes no JSON properties.
  2. Delegate-based construction (ValueInstantiator.createUsingDelegate(Object)): similar to annotating a single-argument constructor or factory method with @JsonCreator, but NOT specifying argument name with @JsonProperty. If specified (i.e. value instantiator indicates it supports this), JSON value for property is first bound into intermediate (delegate) type, and then this value is passed to delegate creator method. Jackson mapper will handle all the details of initial binding, passing delegate object as the argument.
  3. Property-based construction (ValueInstantiator.createFromObjectWith(Object[] args)): similar to using @JsonCreator with arguments that all have @JsonProperty annotation to specify JSON property name to bind.

It is worth noting that order in which availability of different modes is checked is reverse of above: first a check is made to see if property-based method is available; if not, then delegate-based, and finally default construction.
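
To make the ordering concrete, here is a minimal sketch in plain Java. The CreationModes interface and the chooseMode() helper are made up for illustration (they only borrow the canCreate... naming); this is not how Jackson itself is implemented, just the check order described above:

```java
// Hypothetical stand-in for the three availability checks described above;
// this is NOT Jackson's ValueInstantiator class.
interface CreationModes {
    boolean canCreateFromObjectWith(); // property-based
    boolean canCreateUsingDelegate();  // delegate-based
    boolean canCreateUsingDefault();   // no-argument
}

class CreationOrderSketch {
    // Checks modes in the order the text describes: property-based first,
    // then delegate-based, and default construction last.
    static String chooseMode(CreationModes modes) {
        if (modes.canCreateFromObjectWith()) return "property-based";
        if (modes.canCreateUsingDelegate()) return "delegate-based";
        if (modes.canCreateUsingDefault()) return "default";
        throw new IllegalStateException("no creation mode available");
    }

    // Small helper that builds a CreationModes with fixed answers.
    static CreationModes modes(final boolean props, final boolean delegate, final boolean dflt) {
        return new CreationModes() {
            @Override public boolean canCreateFromObjectWith() { return props; }
            @Override public boolean canCreateUsingDelegate() { return delegate; }
            @Override public boolean canCreateUsingDefault() { return dflt; }
        };
    }
}
```

So an instantiator that reports both delegate-based and default construction as available would end up being used in delegate-based mode.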

Since this is possibly the most complicated new feature, I will need to defer a full example to another blog post. But let's consider a very simple ValueInstantiator implementation that just supports the default (no-argument) instantiation:

  class SimpleInstantiator extends ValueInstantiator
  {
    @Override public String getValueTypeDesc() { // only needed for error messages
      return MyType.class.getName();
    }

    @Override // yes, this creation method is available
    public boolean canCreateUsingDefault() { return true; }

    @Override
    public MyType createUsingDefault() {
      return new MyType(true);
    }
  }

and similarly you can add support for delegate- or property-based methods.

3. Other notable features

Aside from above-mentioned major features, there are many other useful improvements:

  • "mini-core" jar (jackson-mini-1.9.0.jar)
  • DeserializationConfig.Feature.UNWRAP_ROOT_VALUE
  • @JsonView for JAX-RS methods to return a specific JsonView
  • Terse(r) Visibility: ObjectMapper.setVisibility(), VisibilityChecker.with(Visibility)
  • Add standard naming-strategy implementation(s)
  • Add JsonTypeInfo.defaultImpl property to indicate type to use if class id/name missing
  • Add SimpleFilterProvider.setFailOnUnknownId() to disable throwing exception on missing filter id

"Mini core": as name suggests, there is now a new jar (jackson-mini-1.9.0.jar) that is about 40% smaller than the default one -- about 136kB or so. Size reduction is achieved by leaving out text files (LICENSE), as well as annotations, but otherwise functionality is equivalent to standard core package, i.e. supports streaming API (JsonParser/JsonGenerator, JsonFactory).

DeserializationConfig.Feature.UNWRAP_ROOT_VALUE is counterpart to SerializationConfig.Feature.WRAP_ROOT_VALUE; and there is also now a new annotation -- @JsonRootName -- that can be used to use custom wrapper name instead of the simple class name. This is useful with interoperability, as some frameworks insist on adding such wrappers.

One of the few improvements to the JAX-RS provider is that you can now add the @JsonView annotation to JAX-RS resource methods; if one is found, it will be set as the active Serialization View during serialization of the result value.

One nice ergonomic improvement is the ability to use much more compact configuration methods for changing default introspection visibility levels.
For example, you can use:

  objectMapper.setVisibility(JsonMethod.FIELD, JsonAutoDetect.Visibility.ANY);

to make all fields auto-detectable, regardless of their visibility. Or, to prevent all auto-detection, you could use:

  objectMapper.setVisibilityChecker(objectMapper.getVisibilityChecker()
    .with(JsonAutoDetect.Visibility.NONE));

An improvement to naming strategy support is the inclusion of one "standard" naming strategy -- CAMEL_CASE_TO_LOWER_CASE_WITH_UNDERSCORES -- which converts between standard Java Bean names (that setters and getters use) and C-style names (like those used by Twitter). You can enable this converter by:

  mapper.setPropertyNamingStrategy(PropertyNamingStrategy.CAMEL_CASE_TO_LOWER_CASE_WITH_UNDERSCORES);

and from there on, can consume JSON like:

 { "first_name" : "Joe" }

to bind to class like:

public class Name { public String firstName; }

without having to use @JsonProperty to fix name mismatch.
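
To give a rough idea of what the name conversion does, here is a minimal stand-alone sketch. The class and method names are my own for illustration, and this is a simplified conversion, not Jackson's actual implementation (for one, it makes no attempt to treat runs of upper-case letters such as acronyms specially):

```java
class SnakeCaseSketch {
    // Converts a Java Bean style name (firstName) into a lower-case name
    // with underscores (first_name). Simplified: every upper-case letter
    // other than the first character starts a new "word".
    static String toLowerCaseWithUnderscores(String name) {
        StringBuilder sb = new StringBuilder(name.length() + 4);
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isUpperCase(c)) {
                if (i > 0) {
                    sb.append('_');
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With this, the Java property name "firstName" maps to the JSON property name "first_name", which is the match needed for the example above.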

As to sub-typing, you can now use the new @JsonTypeInfo property defaultImpl to indicate, as the name suggests, the default sub-type to use when the type name is missing or cannot be resolved. Use it like:

  @JsonTypeInfo(use=Id.NAME, include=As.PROPERTY, defaultImpl=GenericImpl.class)
  public abstract class BaseType { }

And finally, one improvement to Json Filter functionality is ability to specify that it is ok to use a filter id that does not refer to an actual filter (i.e. can not be resolved by the currently configured filter provider) -- use 'SimpleFilterProvider.setFailOnUnknownId(false)' to make this the default behavior. Missing filter is then assumed to mean "no filtering", that is, serialization is handled as if no filter was specified.
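
The fallback behavior can be sketched in plain Java as follows. This is an illustrative stand-in only: the class name is made up, and filters are modeled as simple predicates on property names rather than as Jackson's BeanPropertyFilter:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical, simplified model of an id-to-filter provider with a
// "fail on unknown id" switch; not Jackson's SimpleFilterProvider.
class LenientFilterLookupSketch {
    private final Map<String, Predicate<String>> filters = new HashMap<>();
    private boolean failOnUnknownId = true;

    LenientFilterLookupSketch addFilter(String id, Predicate<String> includeProperty) {
        filters.put(id, includeProperty);
        return this;
    }

    LenientFilterLookupSketch setFailOnUnknownId(boolean state) {
        failOnUnknownId = state;
        return this;
    }

    // Returns the filter for the id; for an unknown id, either fails or
    // falls back to "no filtering" (a predicate that includes everything).
    Predicate<String> findFilter(String id) {
        Predicate<String> filter = filters.get(id);
        if (filter != null) {
            return filter;
        }
        if (failOnUnknownId) {
            throw new IllegalArgumentException("No filter configured with id '" + id + "'");
        }
        return propertyName -> true;
    }
}
```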

Wednesday, September 28, 2011

Advanced filtering with Jackson, Json Filters

I wrote a bit earlier on "filtering properties with Jackson". While it was comprehensive in that all main methods of filtering were covered, there wasn't much depth. Specifically, only very basic usage of Json Filters (@JsonFilter annotation, SimpleFilterProvider as provider) was considered. This approach does allow more dynamic filtering than, say, @JsonView, but it is still somewhat limited. So let's consider more advanced customizability.

1. Refresher on Json Filters

Ok, so the basic idea with Json Filters is that:

  1. Classes can have an associated Filter Id, which defines logical filter to use.
  2. A provider is needed to get the actual filter instance to use, given id: this will be configured by assigning a FilterProvider (such as 'SimpleFilterProvider') to ObjectMapper or ObjectWriter.
  3. Jackson will dynamically (and efficiently) resolve the filter a given class uses, allowing per-call reconfiguration of filtering.

From this it is clear that there are 2 main things you can configure: mechanism that is used to find Filter id of a given class, and mechanism used for mapping this id to actual filter used (implementation of which can be as complicated as you want).

So let's have a look at both parts.

2. Configuring mapping from id to filter instance

Of mechanisms, latter one may be easier to understand and use: one just has to implement 'FilterProvider', which has but one method to implement:

  public abstract class FilterProvider {
    public abstract BeanPropertyFilter findFilter(Object filterId);
  }

given this, 'SimpleFilterProvider' is little more than a Map<String,BeanPropertyFilter>, except for adding couple of convenience factory methods that build 'SimpleBeanPropertyFilter' instances given property names, so you typically just instantiate one with calls like:

  SimpleBeanPropertyFilter filter = SimpleBeanPropertyFilter.filterOutAllExcept("a");

which would filter out all properties except for one named "a". This filter is then configured with ObjectMapper like so:

  FilterProvider fp = new SimpleFilterProvider().addFilter("onlyAFilter", filter);
  objectMapper.writer(fp).writeValueAsString(pojo);

which would, then, apply to any Java type configured to use filter with id "onlyAFilter".
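
If the effect of such a filter is not obvious, the following plain-Java sketch mimics it on a Map of property names to values. It is only an analogy (the class and its method are made up here and operate on a Map, whereas the real filter is applied by Jackson while a POJO is being serialized):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class OnlyAFilterSketch {
    // Keeps only the named properties and drops everything else,
    // mirroring the idea behind filterOutAllExcept(...).
    static Map<String, Object> filterOutAllExcept(Map<String, Object> props, String... keep) {
        Set<String> allowed = new HashSet<>(Arrays.asList(keep));
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : props.entrySet()) {
            if (allowed.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```

So a value with properties "a" and "b" would be written out with just "a".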

3. Configuring discovery of filter id

From above example we know we need to indicate classes that are to use our "onlyAFilter". The default mechanism is to use:

  @JsonFilter("onlyAFilter")
  public class FilteredPOJO {
    //...
  }

But this is just the default. How so? The way Jackson figures out its annotation-based configuration is actually indirect, and fully customizable: all interaction is through configured 'AnnotationIntrospector' object, which amongst other things defines this method:

  public Object findFilterId(AnnotatedClass ac);

which is called when serializer needs to determine id of the filter to apply (if any) for given class. Since the default implementation (org.codehaus.jackson.map.introspect.JacksonAnnotationIntrospector) has everything else working fine, what we can do is to sub-class it and override this method.
For example:

  public class MyFilteringIntrospector extends JacksonAnnotationIntrospector
  {
    @Override
    public Object findFilterId(AnnotatedClass ac) {
      // First, let's consider @JsonFilter by calling superclass
      Object id = super.findFilterId(ac);
      // but if not found, use our own heuristic; say, just use class name as filter id, if there's "Filter" in name:
      if (id == null) {
        String name = ac.getName();
        if (name.indexOf("Filter") >= 0) {
          id = name;
        }
      }
      return id;
    }
  }

Above functionality is just to show what is possible, not that it makes sense. Alternatively you could of course define your own annotations to check; or keep a List of known class names; or inspect the class definition or the interfaces the type implements. The main point is just that you are not limited to using @JsonFilter annotation, but can use pretty much any logic you want, within limits of your coding skills.

The only caveat is that the resolution from Class to matching id is only guaranteed to be called once per ObjectMapper; so any variation in filtering of specific class needs to happen at either mapping of id to filter, or within filter itself.
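
To see why this matters, consider the following sketch, which assumes (purely for illustration) that the class-to-id result is cached after the first lookup. All names here are made up, and the caching strategy is my simplification rather than Jackson's actual code:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical model: the id finder runs at most once per class, so any
// per-call variation has to come from the id-to-filter mapping instead.
class CachedFilterIdSketch {
    private final Map<Class<?>, Optional<String>> idCache = new ConcurrentHashMap<>();
    private final Function<Class<?>, String> idFinder;
    int finderCalls = 0; // exposed only to make the caching visible

    CachedFilterIdSketch(Function<Class<?>, String> idFinder) {
        this.idFinder = idFinder;
    }

    // Same heuristic as in the introspector example: use the class name
    // as filter id if it contains "Filter", otherwise no id (null).
    static String nameHeuristic(Class<?> type) {
        String name = type.getName();
        return name.contains("Filter") ? name : null;
    }

    // Resolves the id once per class; "no id" results are cached too.
    Optional<String> filterIdFor(Class<?> type) {
        return idCache.computeIfAbsent(type, t -> {
            finderCalls++;
            return Optional.ofNullable(idFinder.apply(t));
        });
    }
}
```

Asking for the id of the same class twice invokes the heuristic only once, so a finder that tried to return different ids on different calls would simply be ignored after the first call.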

4. Don't be afraid of sub-classing (Jackson)AnnotationIntrospector

Actually, the key takeaway might as well be the fact that AnnotationIntrospector is designed to be customizable. It was initially created to allow easy reuse of JAXB annotations (via JAXBAnnotationIntrospector; combining things with AnnotationIntrospector.Pair); but it is also a very powerful general-purpose customization mechanism, although at this point a rather underused one.

5. Addendum

Some additional notes based on feedback I received:

  • Custom BeanPropertyFilter implementations are obviously powerful too: not only can they completely change what (if anything) gets written for property, they can base this on all configuration accessible via SerializerProvider which is passed to serializeAsField(): for example, it can check to see what serialization view is available by calling 'provider.getSerializationView()'.

About me

  • I am known as Cowtowncoder
  • Contact me at@yahoo.com
Check my profile to learn more.