Take your JSON processing to Mach 3 with Jackson 2.0, Afterburner
(This is part of the on-going "Jackson 2.0" series, which started with "Jackson 2.0 released")
1. Performance overhead of databinding
When using the automatic data-binding Jackson offers, there is some amount of overhead compared to manually writing equivalent code that uses the Jackson streaming/incremental parser and generator. But how much overhead is there? The answer depends on multiple factors, including exactly how good your hand-written code is (there are a few non-obvious ways to optimize things, compared to data-binding, which offers little configurability with respect to performance).
But looking at benchmarks such as jvm-serializers, one can estimate that it may take anywhere between 35% and 50% more time to serialize and deserialize POJOs, compared to a highly tuned hand-written alternative. This is usually not enough to matter much, considering that JSON processing is typically only a small portion of all processing done.
2. Where does the overhead come from?
There are multiple things that automatic data-binding has to do that hand-written alternatives do not. But at a high level, there are really two main areas:
- Configurability to produce/consume alternative representations: code that has to support multiple ways of doing things cannot be optimized as aggressively by the JVM, and may need to keep more state around.
- Data access to POJOs is done dynamically using Reflection, instead of directly accessing field values or calling setters/getters.
While there isn't much that can be done about the former, in a general sense (especially since configurability and convenience are major reasons for the popularity of data-binding), the latter overhead could, in theory, be eliminated.
How? By generating bytecode that directly accesses fields and calls getters/setters (as well as constructs new instances).
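To make the difference concrete, here is a minimal sketch (the Point class and its field are hypothetical, used only for illustration) contrasting Reflection-based access with the kind of direct access that generated bytecode can perform:

```java
import java.lang.reflect.Field;

public class AccessDemo {
    // Hypothetical POJO, used only for illustration
    public static class Point {
        public int x;
    }

    public static void main(String[] args) throws Exception {
        Point p = new Point();
        p.x = 42;

        // Reflection-based access: field lookup, access checks, boxing of the int
        Field f = Point.class.getField("x");
        int viaReflection = (Integer) f.get(p);

        // Direct access: a single field dereference, easily inlined by the JIT
        int direct = p.x;

        System.out.println(viaReflection == direct); // prints "true"
    }
}
```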
3. Project Afterburner
And this is where Project Afterburner comes in. What it does is really quite simple: it dynamically generates bytecode to mostly eliminate Reflection overhead. The implementation uses the well-known lightweight bytecode library called ASM.
Bytecode is generated to:
- Replace "Class.newInstance()" calls with an equivalent call to the zero-argument constructor (currently the same is not done for multi-argument Creator methods)
- Replace Reflection-based field access (Field.set() / Field.get()) with equivalent direct field dereferencing
- Replace Reflection-based method calls (Method.invoke(...)) with equivalent direct calls
- For a small subset of simple types (int, long, String, boolean), further streamline handling of serializers/deserializers to avoid auto-boxing
It is worth noting that there are certain limitations to this access: for example, unlike Reflection, generated bytecode cannot bypass visibility checks, which means that access to private fields and methods must still be done using Reflection.
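A small sketch of that limitation (the Secret class is hypothetical): direct access to a private field would not even compile from unrelated code, while Reflection can bypass the visibility check:

```java
import java.lang.reflect.Field;

public class VisibilityDemo {
    // Hypothetical class with a private field
    static class Secret {
        private int value = 7;
    }

    public static void main(String[] args) throws Exception {
        Secret s = new Secret();
        // Generated bytecode in another class could not compile "s.value";
        // Reflection can still get at it by disabling the access check:
        Field f = Secret.class.getDeclaredField("value");
        f.setAccessible(true);
        System.out.println(f.get(s)); // prints "7"
    }
}
```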
4. Engage the Afterburner!
Using Afterburner is about as easy as it can be: you just create and register the module, and then use data-binding as usual:
ObjectMapper mapper = new ObjectMapper();
mapper.registerModule(new AfterburnerModule());
String json = mapper.writeValueAsString(value);
Value value = mapper.readValue(json, Value.class);
Absolutely nothing special there (note: for the Maven dependency and downloads, see the project page).
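For completeness, here is a self-contained sketch of the same usage (the Value POJO is hypothetical, and jackson-databind plus jackson-module-afterburner need to be on the classpath):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.module.afterburner.AfterburnerModule;

public class AfterburnerExample {
    // Hypothetical POJO; any regular bean works the same way
    public static class Value {
        public int id;
        public String name;

        public Value() { }
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        mapper.registerModule(new AfterburnerModule());

        Value value = new Value();
        value.id = 1;
        value.name = "test";

        // Serialization and deserialization look exactly the same as without the module
        String json = mapper.writeValueAsString(value);
        Value result = mapper.readValue(json, Value.class);
        System.out.println(result.name); // prints "test"
    }
}
```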
5. How much faster?
Earlier I mentioned that Reflection is just one of the overhead areas. In addition to the general complexity that configurability brings, there are cases where general data-binding has to iterate with loops, whereas manual code can use straight-line, unrolled constructs. Given this, how much overhead remains after enabling Afterburner?
As per jvm-serializers, more than 50% of the speed difference between data-binding and the manual variant is eliminated. That is, data-binding with Afterburner is closer to the manual variant than to "vanilla" data-binding. There is still something like 20-25% additional time spent, compared to the most highly optimized cases; but results are definitely closer to optimal.
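The loop-versus-linear difference mentioned above can be sketched roughly like this (the Person class and both toy serializers are hypothetical, standing in for what generic data-binding and hand-written code do):

```java
public class LoopVsLinear {
    // Hypothetical POJO with two public fields
    public static class Person {
        public String name = "Ada";
        public int age = 36;
    }

    // Generic approach: iterate over a property table, accessing values via Reflection
    static String writeGeneric(Person p) throws Exception {
        StringBuilder sb = new StringBuilder("{");
        String[] props = { "name", "age" };
        for (int i = 0; i < props.length; i++) {
            if (i > 0) sb.append(',');
            Object v = Person.class.getField(props[i]).get(p);
            sb.append('"').append(props[i]).append("\":")
              .append(v instanceof String ? "\"" + v + "\"" : v);
        }
        return sb.append('}').toString();
    }

    // Hand-written approach: straight-line code, no loop, no Reflection
    static String writeLinear(Person p) {
        return "{\"name\":\"" + p.name + "\",\"age\":" + p.age + "}";
    }

    public static void main(String[] args) throws Exception {
        // Both produce the same JSON; the linear version just has less machinery
        System.out.println(writeGeneric(new Person()).equals(writeLinear(new Person()))); // prints "true"
    }
}
```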
Given that all you really have to do is add the module, register it, and see what happens, it just might make sense to take Afterburner for a test ride.
While Afterburner has been used by a few Jackson users, it is still not very widely used -- after all, although it has been available in some form since 1.8, it has not been advertised to users. This article can be considered an announcement of sorts.
Because of this, there may be rough edges; and if you are unlucky, you might run into one of two possible problems:
- Get no performance improvement (which is likely due to Afterburner not covering some specific code path(s)), or
- Get a bytecode verification problem when a serializer/deserializer is being loaded
The latter case is obviously nastier. But on the plus side, such a problem should be obvious right away (and NOT only after running for an hour); nor should there be any way for it to cause data loss or corruption, since JVMs are rather good at verifying bytecode when trying to load it.