Saturday, August 18, 2012

Replacing standard JDK serialization using Jackson (JSON/Smile)

1. Background

The default Java serialization provided by the JDK is a double-edged sword: on one hand, it is a simple, convenient way to "freeze and thaw" Objects you have, handling just about any kind of Java object graph. It is possibly the most powerful serialization mechanism on the Java platform, bar none.

But on the other hand, its shortcomings are well-documented (and I hope, well-known) at this point. Problems include:

  • Poor space-efficiency (especially for small data), due to inclusion of all class metadata: that is, the size of the output can be huge, larger than just about any alternative, including XML
  • Poor performance (especially for small data), partly due to size inefficiency
  • Brittleness: even the smallest changes to class definitions may break compatibility, preventing deserialization. This makes it a poor choice both for data exchange between (Java) systems and for long-term storage

Still, the convenience factor has led many systems to use JDK serialization as their default serialization method.

Is there anything we could do to address the downsides listed above? Plenty, actually. Although there is no way to do much more for the default implementation (the JDK serialization implementation is in fact ridiculously well optimized for what it tries to achieve -- it's just that the goal is very ambitious), one can customize what gets used by making objects implement the Externalizable interface. If so, the JDK will happily use the alternate implementation under the hood.
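To make the mechanism concrete, here is a minimal hand-written sketch that uses only the JDK. The class GridPoint and the helper ExternalizableDemo are hypothetical names invented for this illustration; the point is only that the object streams call writeExternal() and readExternal() instead of the default field-by-field logic.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical example class (not from the post) that writes its own format.
class GridPoint implements Externalizable {
    int x;
    int y;

    // Externalizable requires a public no-argument constructor.
    public GridPoint() { }

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Only the two int values are written, no field names or types.
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // The JDK has already constructed this instance; we only fill it in.
        x = in.readInt();
        y = in.readInt();
    }
}

class ExternalizableDemo {
    // Round-trips a GridPoint through the ordinary JDK object streams.
    static GridPoint roundTrip(GridPoint p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (GridPoint) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

Note that the calling code still uses writeObject() and readObject() as usual; only the class itself knows that the format has changed.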

Now: although writing custom serializers may be fun sometimes -- and for a specific case, you can actually write a very efficient solution as well, given enough time -- it would be nice if you could use an existing component to address the listed shortcomings.

And that's what we'll do! Here's one possible way to improve on all problems listed above:

  1. Use an efficient Jackson serializer (to produce either JSON, or perhaps more interestingly, Smile binary data)
  2. Wrap it in Externalizable, to make it transparent to code using JDK serialization (albeit not transparent for maintainers of the class -- but we will try to minimize the amount of intrusive code)

2. Challenges with Externalizable

First things first: while conceptually simple, there are a couple of rather odd design decisions that make use of Externalizable a bit tricky:

  1. Instead of passing instances of InputStream and OutputStream, ObjectInput and ObjectOutput are used; and they do NOT extend the stream versions (even though they define mostly the same methods!). This means additional wrapping is needed
  2. Externalizable.readExternal() requires updating the object itself, rather than constructing a new instance: most serialization frameworks do not support such an operation
  3. How to access an external serialization library, as no context is passed to either of the methods?

These are not fundamental problems for Jackson: the first one requires use of adapter classes (see below), the second that we use the "updating reader" approach that Jackson has supported for a while (yay!). And to solve the third, we have at least two choices: use of ThreadLocal for passing an ObjectMapper; or use of a static helper class (the approach shown below).
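For comparison, the ThreadLocal alternative might look roughly like the sketch below. To keep it dependency-free, a hypothetical Codec interface stands in for ObjectMapper, and CodecContext is likewise an invented name; treat this as one possible shape under those assumptions, not as the approach used in the rest of this post.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for Jackson's ObjectMapper, so the sketch needs no libraries.
interface Codec {
    String name();
}

// Hypothetical holder: the caller binds a Codec for the duration of one operation.
final class CodecContext {
    private static final ThreadLocal<Codec> CURRENT = new ThreadLocal<>();

    private CodecContext() { }

    // Runs 'action' with 'codec' visible to readExternal()/writeExternal() on this thread.
    static <T> T with(Codec codec, Supplier<T> action) {
        Codec previous = CURRENT.get();
        CURRENT.set(codec);
        try {
            return action.get();
        } finally {
            // Restore the earlier binding so nested calls and pooled threads stay clean.
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }

    // Intended to be called from inside readExternal()/writeExternal().
    static Codec current() {
        Codec codec = CURRENT.get();
        if (codec == null) {
            throw new IllegalStateException("No Codec bound to this thread");
        }
        return codec;
    }
}
```

The benefit is per-call configuration; the cost is that every caller of writeObject()/readObject() has to remember to bind a value first, which is exactly the kind of intrusion the static holder avoids.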

So here are the helper classes we need:

final static class ExternalizableInput extends InputStream
{
  private final ObjectInput in;

  public ExternalizableInput(ObjectInput in) {
    this.in = in;
  }

  @Override
  public int available() throws IOException {
    return in.available();
  }

  @Override
  public void close() throws IOException {
    in.close();
  }

  @Override
  public boolean markSupported() {
    return false;
  }

  @Override
  public int read() throws IOException {
    return in.read();
  }

  @Override
  public int read(byte[] buffer) throws IOException {
    return in.read(buffer);
  }

  @Override
  public int read(byte[] buffer, int offset, int len) throws IOException {
    return in.read(buffer, offset, len);
  }

  @Override
  public long skip(long n) throws IOException {
    return in.skip(n);
  }
}

final static class ExternalizableOutput extends OutputStream
{
  private final ObjectOutput out;

  public ExternalizableOutput(ObjectOutput out) {
    this.out = out;
  }

  @Override
  public void flush() throws IOException {
    out.flush();
  }

  @Override
  public void close() throws IOException {
    out.close();
  }

  @Override
  public void write(int ch) throws IOException {
    out.write(ch);
  }

  @Override
  public void write(byte[] data) throws IOException {
    out.write(data);
  }

  @Override
  public void write(byte[] data, int offset, int len) throws IOException {
    out.write(data, offset, len);
  }
}

/* Use of helper class here is unfortunate, but necessary; alternative would
 * be to use ThreadLocal, and set instance before calling serialization.
 * Benefit of that approach would be dynamic configuration; however, this
 * approach is easier to demonstrate.
 */
class MapperHolder {
  private final ObjectMapper mapper = new ObjectMapper();

  private final static MapperHolder instance = new MapperHolder();

  public static ObjectMapper mapper() { return instance.mapper; }
}

And given these classes, we can implement the Jackson-for-default-serialization solution.

3. Let's Do a Serialization!

So with that, here's a class that is serializable using the Jackson JSON serializer:

  static class MyPojo implements Externalizable
  {
        public int id;
        public String name;
        public int[] values;

        public MyPojo() { } // for deserialization
        public MyPojo(int id, String name, int[] values)
        {
            this.id = id;
            this.name = name;
            this.values = values;
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            MapperHolder.mapper().readerForUpdating(this).readValue(new ExternalizableInput(in));
        }

        @Override
        public void writeExternal(ObjectOutput oo) throws IOException {
            MapperHolder.mapper().writeValue(new ExternalizableOutput(oo), this);
        }
  }

To use that class, use JDK serialization normally:

  // serialize as bytes (to demonstrate):
  MyPojo input = new MyPojo(13, "Foobar", new int[] { 1, 2, 3 } );
  ByteArrayOutputStream bytes = new ByteArrayOutputStream();
  ObjectOutputStream obs = new ObjectOutputStream(bytes);
  obs.writeObject(input);
  obs.close();
  byte[] ser = bytes.toByteArray();

  // and to get it back:
  ObjectInputStream ins = new ObjectInputStream(new ByteArrayInputStream(ser));
  MyPojo output = (MyPojo) ins.readObject();

And that's it.

4. So what's the benefit?

At this point, you may be wondering if and how this would actually help you. Since JDK serialization uses a binary format, and since (allegedly!) textual formats are generally more verbose than binary formats, how could this possibly help with size or performance?

It turns out that if you test the code above and compare it with the case where the class does NOT implement Externalizable, the sizes are:

  • Default JDK serialization: 186 bytes
  • Serialization as embedded JSON: 130 bytes

Whoa! Quite an unexpected result? The JSON-based alternative is 30% SMALLER than JDK serialization!

Actually, not really. The problem with JDK serialization is not the way data is stored, but rather the fact that in addition to (compact) data, much of Class definition metadata is included. This metadata is needed to guard against Class incompatibilities (which it can do pretty well), but it comes with a cost. And that cost is particularly high for small data.
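One way to see this metadata for yourself is to serialize a small object and look for the class and field names inside the resulting bytes. The sketch below uses only the JDK; PlainPojo and MetadataDemo are hypothetical names that mirror the fields of the example above, and the exact byte count will depend on things like the class name, so none is quoted here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical POJO with the same fields as the post's example, default-serialized.
class PlainPojo implements Serializable {
    private static final long serialVersionUID = 1L;
    int id = 13;
    String name = "Foobar";
    int[] values = { 1, 2, 3 };
}

class MetadataDemo {
    // Serializes 'value' with default JDK serialization and returns the bytes.
    static byte[] serialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // True if the ASCII form of 'text' occurs somewhere in 'data'.
    static boolean contains(byte[] data, String text) {
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
        String haystack = new String(data, StandardCharsets.ISO_8859_1);
        return haystack.contains(text);
    }

    public static void main(String[] args) {
        byte[] ser = serialize(new PlainPojo());
        // Raw payload: one 4-byte int + 6 characters + three 4-byte ints = 22 bytes.
        System.out.println("serialized size: " + ser.length);
        System.out.println("has class name:  " + contains(ser, "PlainPojo"));
        System.out.println("has field name:  " + contains(ser, "values"));
    }
}
```

Everything beyond the 22 bytes of actual data is stream header, class descriptor and type information, which is why the overhead dominates for small objects.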

Similarly, performance typically follows data size: while I don't have publishable results (I may do that for a future post), I expect embedded-JSON to also perform significantly better for single-object serialization use cases.

5. Further ideas: Smile!

But perhaps you think we should be able to do better, size-wise (and perhaps performance) than using JSON?

Absolutely. Since the results are not exactly human-readable anyway (with Externalizable, a bit of binary data is still written to indicate the class name, along with a little stream metadata), we probably do not greatly care what the actual underlying format is.
With this, an obvious choice would be to use the Smile data format, the binary counterpart to JSON, which Jackson fully supports through the Smile module.

The only change needed is to modify the first line of "MapperHolder" to read:

private final ObjectMapper mapper = new ObjectMapper(new SmileFactory());

and we will see an even smaller size, as well as faster reading and writing -- Smile is typically 30-40% smaller in size, and 30-50% faster to process than JSON.

6. Even more compact? Consider Jackson 2.1, "POJO as array"!

But wait! In the very near future, we may be able to do EVEN BETTER! Jackson 2.1 (see the Sneak Peek) will introduce one interesting feature that will further reduce the size of JSON/Smile Object serialization. By using the following annotation:


you can further reduce the size: this occurs as the property names are excluded from serialization (think of output similar to CSV, just using JSON Arrays).

For our toy use case, size is reduced further from 130 bytes to 109; a further reduction of about 16%. But wait! It gets better -- the same will be true for Smile as well: while Smile reduces space in general, it normally still has to retain some amount of name information; but with POJO-as-Arrays it will benefit from the same exclusion!
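The 21-byte difference is what you would expect from dropping the property names alone. The sketch below simply compares the two JSON shapes as plain strings, so it needs no libraries. Both strings are my assumption of what compact output looks like for the toy object, and the annotation named in the comment, @JsonFormat(shape = JsonFormat.Shape.ARRAY), is likewise an assumption about how the feature is exposed; PojoAsArraySizes is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

class PojoAsArraySizes {
    // Assumed default output for MyPojo(13, "Foobar", {1, 2, 3}): a JSON Object.
    static final String AS_OBJECT = "{\"id\":13,\"name\":\"Foobar\",\"values\":[1,2,3]}";

    // Assumed output with the POJO-as-array shape, for example via
    // @JsonFormat(shape = JsonFormat.Shape.ARRAY) on the class (an assumption).
    static final String AS_ARRAY = "[13,\"Foobar\",[1,2,3]]";

    // Number of bytes the JSON text occupies when encoded as UTF-8.
    static int utf8Length(String json) {
        return json.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int saved = utf8Length(AS_OBJECT) - utf8Length(AS_ARRAY);
        System.out.println("object: " + utf8Length(AS_OBJECT) + " bytes"); // 42
        System.out.println("array:  " + utf8Length(AS_ARRAY) + " bytes");  // 21
        System.out.println("saved:  " + saved + " bytes");                 // 21
    }
}
```

The 21 bytes saved here match the drop from 130 to 109 exactly, which suggests the remaining 88 bytes are the Externalizable wrapping (stream header and class name) rather than JSON.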

7. But how about actual real-life results?

At this point I am actually planning on doing something based on the code I showed above. But planning is in its early stages, so I do not yet have results from "real data", meaning objects of more realistic sizes. I hope to get that soon: the use case is storing entities (data for which is read from a DB) in memcache. The existing system is getting CPU-bound, partly from basic serialization/deserialization activity but especially from the high number of GCs. I fully expect the new approach to help with this and, most importantly, to be quite easy to deploy: I do not have to change any of the code that actually serializes/deserializes Beans -- I just have to modify the Beans themselves a bit.


About me

  • I am known as Cowtowncoder