Friday, February 26, 2010

Jackson, compliments du jour, en francaise

I wish my French skills were a little bit more refined (6 months of suggestopedic teaching, 2 hours per week, is apparently not enough to get more than a general idea of a text :) ). But from what I gather, "Android pour l’entreprise – 6 – Oubliez Gson, Jackson rocks my world!" seems to have a generally positive outlook on Jackson for JSON processing on the Android platform (oubliez meaning "forget"... it's good that cognac ratings help my language skills here!). And apparently much of this is due to the Android VM (Dalvik) being somewhat sensitive to GC-induced stress; so Jackson's focus on efficiency (not just speed, but also simplicity of code, trying to avoid extraneous intermediate storage and code) really pays off.

It is great that a library can be versatile enough to perform well on a wide set of platforms; and it is absolutely marvellous that there are users who put Jackson to good use, and let others know what works and what doesn't.

Anyway, I thought I'd share this; Android developers in particular might find it interesting. Also: the author of the article has suggested a couple of good improvements to Jackson, too, to make things work even better in the future.

Tuesday, December 22, 2009

Another Jackson adopter: Spring-json

As has been reported earlier, Spring 3.0 (and the Spring-json module, specifically) now has a Jackson-based JSON view variant: see MappingJacksonJsonView and the spring-json view comparison. The comparison seems reasonable, mentioning annotation-based configurability and performance as strong points of the Jackson-based view.

UPDATE, 28-Dec-2009: One more for the road: Restlet is also including Jackson-based extension.

Sunday, December 20, 2009

Jackson 1.4: more control over writing JSON, improved interoperability

First things first: if you haven't noticed yet, Jackson 1.4.0 was just released.

This release focuses mainly on writer (serialization) side, but there are also continuing improvements to interoperability. I will review main improvements below.

1. JSON generation improvements

1.1 Ignoring properties

A new annotation, @JsonIgnoreProperties, allows:

  • omitting serialization of listed properties: @JsonIgnoreProperties({ "secretField", "internalProperty" }); listed properties will not be included in JSON output
  • omitting listed properties from being deserialized; if encountered they are just ignored even if there is a setter for them (regardless of whether setter is marked to be ignored or not)
  • ignoring all unknown properties for annotated class during deserialization (similar to disabling DeserializationConfig.Feature.FAIL_ON_UNKNOWN_PROPERTIES), but only affects instances of annotated class. This is done with property "ignoreUnknown": @JsonIgnoreProperties(ignoreUnknown=true) (note: has no effect on serialization)

1.2 JsonView

New @JsonView annotation allows defining logical views for serialization: sets of properties to be written out for given view.

Let's consider a simple example, where we want to control amount of information written out, based on, say, user's credentials. To define 3 classes of properties, we can define views (more about identification below):

  class Views { // container for View classes
    static class Public { }
    static class ExtendedPublic extends Public { }
    static class Internal extends ExtendedPublic { }
  }
  
And to define access levels for our info class, we would do:

  public class PersonalInformation { // Bean that uses Views to define subsets of properties to include
    // Name is public
    @JsonView(Views.Public.class) String name;
    // Address semi-public
    @JsonView(Views.ExtendedPublic.class) Address address;
    // SSN only for internal usage
    @JsonView(Views.Internal.class) SocialSecNumber ssn;
  }

Given this set up, we would define View to use for serialization by:

  objectMapper.writeValueUsingView(out, infoInstance, Views.Public.class); // short-cut
  // or full version:
  objectMapper.getSerializationConfig().setSerializationView(Views.Public.class);
  objectMapper.writeValue(out, infoInstance); // will use active view set via Config
  // (note: can also pre-construct config object with 'mapper.copySerializationConfig'; reuse configuration)

In this particular case the output would only contain the "name" property. If we had used the view Views.ExtendedPublic.class, we would have gotten 2 properties; and with Views.Internal.class, all 3.

Views are identified by classes: you can either create specific marker classes, or use existing classes. Views use inheritance indicated by class structure: such that a view is considered a sub-view of another view if it extends that view. Child views include properties that parent views include.
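To make the inheritance rule concrete, here is a minimal sketch using nothing but JDK reflection. It does not call into Jackson; the class name ViewRuleSketch and the method isIncluded are made up for illustration, and the marker classes simply mirror the Views example above. The assumption is that a property tagged with a view gets written whenever the active view is that same class or a subclass of it.

```java
// Sketch only: mimics the view inclusion rule with plain JDK reflection.
// ViewRuleSketch and isIncluded are hypothetical names, not Jackson API.
final class ViewRuleSketch {
    // Marker classes mirroring the Views container from the example.
    static class Public { }
    static class ExtendedPublic extends Public { }
    static class Internal extends ExtendedPublic { }

    /**
     * A property tagged with 'propertyView' is included when the active view
     * is that same class or one of its subclasses (a child view includes
     * everything its parent views include).
     */
    static boolean isIncluded(Class<?> propertyView, Class<?> activeView) {
        return propertyView.isAssignableFrom(activeView);
    }

    public static void main(String[] args) {
        // "name" is tagged Public; the Internal view is a child of Public.
        System.out.println(isIncluded(Public.class, Internal.class)); // true
        // "ssn" is tagged Internal; the Public view is not a child of Internal.
        System.out.println(isIncluded(Internal.class, Public.class)); // false
    }
}
```

So a more specific (child) view always sees at least as much as its parents do, which is why Views.Internal gives all 3 properties.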

For more description see JsonView wiki page.

1.3 Ordering properties output

Another new annotation, @JsonPropertyOrder, allows defining complete and partial field orderings:

  • You can define explicit ordering by listing properties as annotation value: @JsonPropertyOrder({ "id", "name" }) would ensure that "id" and "name" are output before any other properties during serialization
  • You can specify that anything not explicitly ordered will be output in alphabetic order: @JsonPropertyOrder(alphabetic=true)
  • Without these settings order is undefined because JVM does not expose order of underlying fields or methods (but see note below)
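The combined effect of the first two settings (explicit names first, everything else alphabetically) can be sketched with plain collections. This is only an illustration of the ordering rule as described above, not Jackson's implementation; PropertyOrderSketch and order are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: explicit names first, remaining names in alphabetic order.
final class PropertyOrderSketch {
    static List<String> order(Collection<String> discovered, List<String> explicit) {
        // LinkedHashSet keeps insertion order and silently skips duplicates.
        Set<String> result = new LinkedHashSet<String>();
        for (String name : explicit) {
            if (discovered.contains(name)) { // ignore names the bean does not have
                result.add(name);
            }
        }
        // TreeSet iterates in natural (alphabetic) order; names already added stay put.
        result.addAll(new TreeSet<String>(discovered));
        return new ArrayList<String>(result);
    }

    public static void main(String[] args) {
        List<String> discovered = Arrays.asList("name", "zip", "id", "age");
        System.out.println(order(discovered, Arrays.asList("id", "name")));
        // prints [id, name, age, zip]
    }
}
```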

Beyond these definitions, 1.4 also guarantees that properties used with @JsonCreator annotations (constructors, factories) are serialized before other properties, unless there are explicitly ordered properties (which will have priority). This change was to optimize @JsonCreator property usage: ideally these properties should be readable before other properties -- although JSON logical model does not provide for such guarantee, Jackson will try to do its best to make ordering optimal.

2. Interoperability improvements

Goal of interoperability improvement is to make Jackson a "universal" JSON data binding tool on JVM. That is: we hope to make Jackson usable from other JVM languages, not just Java -- already one can use it from quite a few (reported to work from Groovy, Clojure), and hopefully supporting others like Scala in near future (Scala lists are not well handled, yet) -- as well as interoperate nicely with most common data libraries.

To this end, there is now default support (i.e. no need for custom converters or mix-in annotations) for following data types:

  • DOM (xml) trees: properties declared as DOM Document, Element and Node will now be properly serialized to, and deserialized from JSON Strings. Useful if you want to embed XML as JSON properties (sometimes good for interoperability)
  • Joda DateTime type, and mechanism to easily add more types as needed (file a Jira request if you need more!)
  • Handling of javax.xml types that some platforms lack (Android and GAE have had some issues) much improved so that they are dynamically and reliably added, if underlying types are present.

Last point should also make it yet easier to make Jackson run on new "subset platforms"; containers that support subset of JDK 1.5 (or have issues with some parts).

3. Plans for 1.5

So what next? The 1.4 release can be thought of as a "minor" minor release, similar to 1.3; compared to the fundamentally new functionality of 1.2 (mix-in annotations, JsonCreators), 1.3 and 1.4 have consisted of smaller (but more numerous) evolutionary incremental improvements. In many ways, Jackson releases have mirrored JDK releases, come to think of it.

Anyway: the next Big Thing will be "Polymorphic Deserialization", which is by far the most requested feature. That is, the ability to deserialize instances of correct types, even in the absence of static type information (declared type of, say, List<Object> could still be deserialized to contain whatever the actual type of the serialized instance was). Getting this done is important in itself, but the most important aspect in my mind is to Do It Right. This should not be a stop-gap solution, or something to rewrite in the near future. It should be a comprehensive, flexible and robust solution to a non-trivial problem. And the plan is to do just that, now that other queued blocking issues (like that of finally getting the much-requested JsonView done) have been dealt with.

Wednesday, December 16, 2009

More Jackson adoption: Mule/iBeans

As per this announcement, Mule is another major framework that has officially adopted Jackson for its JSON processing needs ("first we take JAX-RS... then we take ESB!"). That should be good for everyone involved. And more work for the development, polishing and fixing things that new flow of users brings in; as well as plenty more exposure for the project & processor.
That should keep the project honest, relevant, and hopefully producing diamond(s) -- without high pressure, all you'd have would be lump of coal. :-)

On a sort of related note: the latest tally of contributors (individuals named in 'release-notes/CREDITS') is 50. Not too shabby -- Jackson will pretty soon bypass Woodstox in most regards (see an earlier blog entry for context); probably not by LOC, but in most other measures.

Tuesday, December 08, 2009

JSON data binding performance (again!): Jackson / Google-gson / JSON Tools... and FlexJSON too

(note: this is a follow-up on earlier measurements)

1. A New Contestant: FlexJson

After realizing that FlexJson is actually capable of both serialization and deserialization (somehow I thought it would only serialize things), I decided to add it as the fourth contestant in the "full service Java/JSON data binding" category of tests.

Initially I was a bit discouraged to find that it makes one rookie mistake: it assumes that JSON somehow comes in (and goes out) as Java Strings. But aside from this glitch, the package actually looks quite solid -- and its exclusion/inclusion mechanism looks interesting. Maybe not exactly my cup of joe (if it was, after all, Jackson API would look more like it does), but a viable alternative. And I can see how the ability to prevent deep copy would come in handy sometimes. And finally, some of the features actually exceed what Jackson can currently do, regarding polymorphic deserialization (since FJ includes class name by default, I assume it can do it) and some level of cyclic-dependency handling (ignoring serialization of cyclic references at least).

So let's see how "rookie" (yes, I know, it's not exactly a new package, just new addition to the test) fares...

2. Test setup

Tests are run using the nice Japex performance test framework, running on my somewhat old AMD work station (~1700 MHz Athlon -- someone needs to click on those right-hand-side ads to get me a new performance-testing work station! :-) ).

Input data used consists of serialization of tabular data (database dump, good old "db100.xml" used by countless xml tests), converted to Java POJOs, and then to individual data formats (here as JSON, but can be tested as XML and whatnot). Document size is 20k in XML, and slightly less in JSON (about 16k). It would be easy to run using other data sets, but in the past, performance ratios for 2k, 20k and 200k documents have not had radical differences, so 20k one seems like a reasonable choice (but note that the earlier benchmark did in fact use 2k documents, so actual numbers do differ).

Test project itself, "StaxBind" is still in Woodstox SVN repository, accessible via Codehaus SVN page. (one of these days I should just create a Github project -- but not today).

Versions of JSON processing packages are as follows:

  • Jackson 1.2.0
  • Google-gson 1.4
  • Json-tools-core 1.7
  • Flexjson-1.9.1

Code for each library is using default settings, and using what appears as the most efficient interface, for cases where transformations are from byte streams on server side (byte streams in, byte streams out).

3. Results

First things first: here's the money shot:

Data Binding Performance Graph

(or check out the full results for details)

Another way to represent results is by showing performance ratios, using the slowest implementation as base line (TPS == transactions per second; number of times a 20k document is read, written, or both):

(note: Jackson/manual is omitted since it is hand-written (if simple) serializer/deserializer, and there are no direct counterparts for other packages -- while it would give even bigger faster-than-thou ratio, it wouldn't be a fair comparison)

Impl                 Read (TPS)  Write (TPS)  Read+Write (TPS)  R+W, times baseline
Jackson (automatic)    1599.272     2463.097          1033.809                 25.6
FlexJson                125.277      125.277            94.904                 2.35
Json-tools               94.051      126.954            49.008                  1.2
GSON                      56.58      112.455             40.38                    1

So looks like our "new kid on the block" manages to outperform the other two non-Jackson JSON processors here. And at least get within an order-of-magnitude with Jackson... :-)
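For the record, the "times baseline" column is just each Read+Write figure divided by the slowest one. A small sketch of that arithmetic, using the Read+Write numbers from the table (the class and method names are made up for this example):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: derives "times baseline" ratios from throughput figures.
final class BaselineRatioSketch {
    /** Divides every TPS value by the smallest one (the slowest implementation). */
    static Map<String, Double> ratios(Map<String, Double> tps) {
        double slowest = Double.MAX_VALUE;
        for (double value : tps.values()) {
            slowest = Math.min(slowest, value);
        }
        Map<String, Double> result = new LinkedHashMap<String, Double>();
        for (Map.Entry<String, Double> entry : tps.entrySet()) {
            result.put(entry.getKey(), entry.getValue() / slowest);
        }
        return result;
    }

    public static void main(String[] args) {
        // Read+Write TPS values copied from the results table.
        Map<String, Double> readWrite = new LinkedHashMap<String, Double>();
        readWrite.put("Jackson (automatic)", 1033.809);
        readWrite.put("FlexJson", 94.904);
        readWrite.put("Json-tools", 49.008);
        readWrite.put("GSON", 40.38);
        for (Map.Entry<String, Double> entry : ratios(readWrite).entrySet()) {
            System.out.printf("%-20s %.2f%n", entry.getKey(), entry.getValue());
        }
    }
}
```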

4. Musings

So it turns out that despite its interfacing (those String/byte conversions), Flexjson package manages to work more efficiently than some other packages that claim "simplicity and performance". And this without actually claiming to be particularly performant, but rather focusing on design of API and ease-of-use aspects. Pretty neat, I respect that.

5. Next?

My current main interest (with respect to performance issues) lies in the area of compressing data for transfer: after all, most of the time there is a relative abundance of CPU power compared to available network and I/O bandwidth. This means that trading some CPU (needed for compression and decompression) seems like a bargain for many use cases.

But on the other hand, as we saw earlier, the question is "how much is too much". And that's where my new favorite simple-and-fast algorithm, LZF, comes in. But that's a different story.
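To give a rough feel for the CPU-for-bandwidth trade, here is a small sketch that compresses a repetitive JSON payload. Note the assumptions: it uses the JDK's GZIP streams as a stand-in (not LZF, which the JDK does not ship), the payload is invented, and the class name is hypothetical. Actual savings depend entirely on the data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch only: GZIP from the JDK stands in for any compression codec.
final class CompressionTradeoffSketch {
    /** Compresses the given bytes; spends CPU to save bytes on the wire. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Restores the original bytes on the receiving side. */
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int count;
            while ((count = in.read(buffer)) != -1) {
                bytes.write(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up tabular payload: 100 similar rows, much like a database dump.
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < 100; i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append("{\"id\":").append(i).append(",\"name\":\"row\"}");
        }
        json.append(']');
        byte[] raw = json.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.println(raw.length + " bytes raw, " + packed.length + " bytes gzipped");
    }
}
```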

Friday, December 04, 2009

Not your type? Jackson as the match-maker

By now, Jackson is becoming widely-known for its lightning-fast streaming JSON parser, as well as for its powerful, intuitive and efficient data binding functionality. But wait! There is more!

(and when you are convinced you need Jackson, head straight to Download page)

1. Background

Has this ever happened to you? You need an array of ints, and all you got is this dingy little List of Strings (representing numbers)! You would think JVM, as smart as it is, could quickly whip up a conversion to "do the needful"... but no. In fact, even simpler conversions like number to/from String, String to/from boolean, Sets to/from Lists and so on are irritating: easy to solve, sure, but with too much monkey code.

And don't even get me started on other conversions between encodings like base64 to byte arrays; dumping Object fields as Maps; or building Objects from Maps (possibly read from properties files). There are tons of simple tasks that should be made even simpler.

2. What if...

Ok, now: let's round up some facts related to POJO serialization and Jackson:

  1. Jackson is great at serializing all kinds of Java objects as JSON
  2. Jackson is awesome at deserializing JSON into Java objects

So: what if... say... you serialized, List<String> into JSON and... lessee... deserialized it back to.... int[]... what would happen? Conversion! (as long as Strings indeed contain numbers -- if not, what would happen is an exception of some kind)

Ah! I see, so, you are saying that serialize+deserialize == conversion! Like:

  //ObjectMapper mapper = new ObjectMapper();
  String json = mapper.writeValueAsString(myStringList);
  int[] intArray = mapper.readValue(json, int[].class);

Presto!

3. Jackson 1.3 gets it

With version 1.3, there is something that simplifies above procedure by 50%:

  int[] intArray = mapper.convertValue(myStringList, int[].class);

(yes, you should have seen that one coming)
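For comparison, this is roughly the kind of hand-written "monkey code" that the one-liner replaces for this one specific pair of types. It is a JDK-only sketch with made-up names; every other source/target combination would need its own variant.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: the manual List<String> -> int[] conversion, JDK only.
final class ManualConversionSketch {
    /** Throws NumberFormatException if an element is not a valid int. */
    static int[] toIntArray(List<String> numbers) {
        int[] result = new int[numbers.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = Integer.parseInt(numbers.get(i).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        int[] ints = toIntArray(Arrays.asList("1", "2", "3"));
        System.out.println(Arrays.toString(ints)); // prints [1, 2, 3]
    }
}
```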

3.1. Conversions: Basic types

Primitive type conversions obviously work:

  Boolean b = mapper.convertValue("true", Boolean.class);
  Date d = mapper.convertValue("2009-10-10T12:00:00.00+0800", Date.class);

although they are not exactly any shorter than the equivalent idioms you would otherwise use. But they serve as the basis for other conversions, for Lists, Maps and arrays of primitives and wrappers.

3.2. Conversions: Containers

Similarly, containers with various content types work as expected:

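  // generic target types need a type token (org.codehaus.jackson.type.TypeReference)
  ArrayList<Integer> ints = mapper.convertValue(new Object[] { 13, "1", Integer.valueOf(3) }, new TypeReference<ArrayList<Integer>>() { });
  Set<String> uniq = mapper.convertValue(new String[] { "a", "b", "a" }, new TypeReference<Set<String>>() { }); // would produce set with 2 entries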

3.3. Conversions Base64<->binary

And if you want to encode binary data, you can do:

  String encoded = mapper.convertValue(new byte[] { 1, 2, 3 }, String.class); // (binary data usually read from a File or such)
  byte[] decoded = mapper.convertValue(encoded, byte[].class);

3.4. Beans to Maps and back

Finally, you can also convert simple Java beans (or more generally POJOs) into Maps or JSON trees:

  Map fieldMap = mapper.convertValue(myBean, Map.class);
  MyBean bean = mapper.convertValue(fieldMap, MyBean.class);
  // or read from properties file
  Properties props = new Properties();
  props.load(new FileInputStream(file));
  MyBean bean2 = mapper.convertValue(props, MyBean.class);
  // and can convert to a JSON tree as well:
  JsonNode rootNode = mapper.convertValue(bean2, JsonNode.class);

And you can obviously configure bean types with annotations (regular and mix-in annotations) as you like, as necessary for conversion you want to do.

Neat stuff, eh?

4. What does this have to do with JSON?

Good question. Nothing, really. :-)

That is: there is no requirement for the intermediate JSON generation; and in fact, for future Jackson versions, there will be improvements to allow use of a more efficient intermediate data structure for these conversions.

And conceivably one could even refactor the functionality into a separate bean conversion package, if conversions are widely used without actual JSON processing.

5. Got better use cases?

I hope someone out there can come up with even better examples of this power. If so, let me know!

One area that I hope to improve upon is that of converting java.util.Properties into POJOs. Although the sample above works, it does not deal with the naming convention of "refField.anotherField.field = 3", which is one natural way to represent nested structures. It should be made to work; it just needs a little bit of name mangling, especially when trying to handle arrays and lists ("object.listField.4 = abx").
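Until something like that exists, the name mangling for the simple (non-list) case could be done by hand before conversion. The following is a JDK-only sketch under stated assumptions: NestedPropertiesSketch and nest are hypothetical names, numeric segments are treated as plain map keys rather than list indexes, and a scalar value is replaced if a longer key needs a nested map at the same position. The resulting nested Map could then presumably be handed to convertValue() in place of the flat Properties.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeSet;

// Sketch only: turns dotted property names into nested Maps.
final class NestedPropertiesSketch {
    @SuppressWarnings("unchecked")
    static Map<String, Object> nest(Properties props) {
        Map<String, Object> root = new LinkedHashMap<String, Object>();
        // Sorted iteration makes the result independent of Properties' hash order.
        for (String key : new TreeSet<String>(props.stringPropertyNames())) {
            String[] path = key.split("\\.");
            Map<String, Object> current = root;
            // Walk (or create) one nested map per leading path segment.
            for (int i = 0; i < path.length - 1; i++) {
                Object child = current.get(path[i]);
                if (!(child instanceof Map)) {
                    // Missing, or a scalar in the way: replace with a fresh map.
                    child = new LinkedHashMap<String, Object>();
                    current.put(path[i], child);
                }
                current = (Map<String, Object>) child;
            }
            current.put(path[path.length - 1], props.getProperty(key));
        }
        return root;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("refField.anotherField.field", "3");
        System.out.println(nest(props));
        // prints {refField={anotherField={field=3}}}
    }
}
```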

Wednesday, November 04, 2009

Jackson on Google AppEngine?

Looks like some brave folks are using Jackson on Google's App Engine (for JSON processing). Neat. Some minor problems exist (need to add a patch similar to the one suggested), and chances are this is not the only thing one can encounter. But there shouldn't be anything fundamentally difficult, based on successful usage.

Oh, and on related note: Jackson 1.3 was just released. Check out feature list for details; list is long -- lots of fixes, incremental improvements -- but this time around there weren't many big features, just plenty of bug mop-up, added convenience methods, extended support for partially supported things and so on.

UPDATE, 06-Dec-2009: There have been some important fixes to issues with running on GAE and Android platforms -- make sure you use version 1.3.2 instead of 1.3.0 if running on either one (and please add comments if you do; whether it works, or you find other issues: the sooner problems are reported, the sooner they get fixed)

Wednesday, October 28, 2009

Data Format anti-patterns: converting between secondary artifacts (like xml to json)

One commonly asked but fundamentally flawed question is "how do I convert xml to json" (or vice versa).
Given frequency at which I have encountered it, it probably ranks high on list of data format anti-patterns.

And just to be clear: I don't mean that there is any problem in having (or wanting to have) systems that produce data using multiple alternative data formats (views, representations). Quite the contrary: the ability to do so is at the core of REST(-like) web services, which are one useful form of web services. Rather, I think it is wrong to convert between such representations.

1. Why is it Anti-pattern?

Simply put: you should never convert from secondary (non-authoritative) representation into another such representation. Rather, you should render your source data (which is usually in relational model, or objects) into such secondary formats. So: if you need xml, map your objects to xml (using JAXB or XStream or what you have); if you need JSON, map it using Jackson. And ditto for the reverse direction.
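As a toy illustration of "render from the source, don't convert between renditions": in the sketch below both outputs are produced from the same object, and neither is derived from the other. The hand-rolled toJson() and toXml() methods are only stand-ins for what Jackson and JAXB (or XStream) would do; the Person class is invented, and values are assumed not to need escaping.

```java
// Sketch only: one authoritative object, two independent renditions.
final class RenderFromSourceSketch {
    /** The authoritative (processing format) representation. */
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    /** Stand-in for object-to-JSON mapping; assumes 'name' needs no escaping. */
    static String toJson(Person p) {
        return "{\"name\":\"" + p.name + "\",\"age\":" + p.age + "}";
    }

    /** Stand-in for object-to-XML mapping; assumes 'name' needs no escaping. */
    static String toXml(Person p) {
        return "<person><name>" + p.name + "</name><age>" + p.age + "</age></person>";
    }

    public static void main(String[] args) {
        Person source = new Person("Bob", 42);
        // Both renditions start from the object, never from each other.
        System.out.println(toJson(source));
        System.out.println(toXml(source));
    }
}
```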

This of course implies that there are cases where such transformation might make sense: namely, when your data storage format is XML (Native Xml DBs) or Json (CouchDB). In those cases you just have to worry about the practical problem of model/format impedance, similar to what happens when doing Object-Relational Mapping (ORM).

2. Ok: simple case is simple, but how about multiple mappings?

Sometimes you do need multi-step processing; for example, if your data lives in the database. Following my earlier suggestion, it would seem like you should convert directly from relational model (storage format) into resulting transfer format (json or xml). Ideally, yes: if there are such conversions. But in practice it is more likely that a two-phase mapping (ORM from database to objects; and then from objects to xml or json) works better: mostly because there are good tools for separate phases, but fewer that would do the end-to-end rendition.

Is this wrong? No. To understand why, it is necessary to understand the 3 classes of formats that we are talking about:

  • Persistence (storage) format, used for storing your data: usually relational model but can be something else as well (objects for object DBs; XML for native XML databases)
  • Processing format: Objects or structs of your processing language (POJOs for Java) that you use for actual processing. Occasionally this can also be something more exotic; like XML when using XSLT (or relational data for complicated reporting queries)
  • Transfer format: Serialization format used to transfer data between end points (or sometimes time-shifting, saving state over restart); may be closely bound to processing format (as is the case for Java serialization)

So what I am really saying is that you should not transform within a class of formats; in this case between 2 alternate transfer formats. It is acceptable (and often sensible) to do conversions between classes of formats; and sometimes doing 2 transforms is simpler than trying to do one bigger one. Just not within a class.

3. Three Formats may be simpler than Just One

One more thing about above-mentioned three formats: there is also a related fallacy of thinking that there is a problem if you are using multiple formats/models (like relational model for storage, objects for processing and xml or json for transfer). Assumption is that additional transformations needed to convert between representations is wasteful enough to be a problem in and of itself. But it should be rather obvious why there are often distinct models and formats in use: because each is optimal for specific use case. Storage format is good for, gee, storing data; processing model good for efficiently massaging data, and transfer format good for piping it through the wire. As long as you don't add gratuitous conversions in-between, transforming on boundary is completely sensible; especially considering alternative of trying to find a single model that works for all cases. One only needs to consider case of "XML for everything" cluster (esp. XML for processing, aka XSLT) to see why this is an approach that should be avoided (or, Java serialization as transfer format -- that is another anti-pattern in and of itself).

Sunday, October 11, 2009

Fresh new hope for JSON Schema: "Orderly" improvements afoot!

Here's something interesting related to on-going (if slowly moving) JSON Schema effort: Orderly micro-language. Orderly just might be that something that makes JSON Schema usable. There are other things that could do that, too, like good tool support; but more convenient syntax seems like the shortest route to improved usability: a custom-built DSL that does NOT (have to) use the target syntax as its own syntax. What a great idea! (not a novel one; RelaxNG compact syntax has been around for a while, and that wasn't new -- no matter, good ideas are good ideas)

As the web site says: "Orderly... is an ergonomic micro-language that can round-trip to JSONSchema ... ... optional. syntactic sugar, fluff. Tools should speak JSONSchema, but for areas where humans have to read or write the schema there should be an option to expose orderly in addition to JSON". Sounds good, I like that.

We shall see if and how this works out. My personal interest is more in the area of type definition language -- for me validation is actually not all that interesting; mostly because I believe it can be done quite well at (Java) object level (see Bean Validation API). So much so that even XML Schema is used much more as a type definition language for data binding (as THE Object/XML type system for things like Soap, JAXB) than for actual XML validation, although the original focus was squarely on handling validation aspects. Further indirect proof is that its main competitor, RelaxNG, which is a superior alternative for validation, isn't nearly as popular overall -- it would totally squash Schema if validation was the dominant use case for XML schema languages; but RelaxNG is not very useful for data binding, alas (because it allows ambiguity in grammar, acceptable for validation, but problematic when trying to do type inference and matching).

But I digress. I think that a prettified DSL that translates to/from JSON Schema could handle type system aspects just as well as JSON Schema would; which is to say, "possibly well enough to be useful". Although JSON Schema as is has some nasty flaws in this area (only single type per schema? you kidding me, right? All references via static URLs? Really?), maybe it can all work out in the end with some spit and polish. Jury is still out.

Thursday, October 08, 2009

Handling Base64-encoded binary data with Jackson

Hopefully by now you know that Woodstox can handle base64-encoded binary data for your XML use cases. You may even know that Jackson can do the same for JSON (notice that "g.writeBinary()" call in Jackson Tutorial?)

But there is actually a bit more to know about base64 functionality here. Let's first review core Base64 handling with the 3 main processing models Jackson supports.

1. Handling base64-encoded binary with Streaming API

Assuming you get JSON content like:

{
  "binary" : "c3VyZS4="
}

you can get binary data out by, say:

  JsonParser jp = new JsonFactory().createJsonParser(jsonStr);
  jp.nextToken(); // START_OBJECT
  jp.nextValue(); // VALUE_STRING that has base64 (skips field name as that's not a value)
  byte[] data = jp.getBinaryValue();

And if you want to produce similar data, you can do:

  byte[] data = ...;
  StringWriter sw = new StringWriter();
  JsonGenerator jg = new JsonFactory().createJsonGenerator(sw);
  jg.writeStartObject();
  jg.writeFieldName("binary"); // 1.3 will have "writeBinaryField()" method
  jg.writeBinary(data, 0, data.length);
  jg.writeEndObject();

2. Handling base64-encoded binary with Data Binding

But where JsonParser and JsonGenerator make access quite easy, ObjectMapper makes it ridiculously easy. You just use 'byte[]' as the data type, and ObjectMapper binds data as expected.

  static class Bean {
    public byte[] binary;
  }

  ObjectMapper mapper = new ObjectMapper();
  Bean bean = mapper.readValue(jsonStr, Bean.class); 
  byte[] data = bean.binary; // Want to serialize it? Sure:
  String outputStr = mapper.writeValueAsString(bean); // note: Jackson 1.3 only; otherwise use StringWriter

3. Handling base64-encoded binary with Tree Model

Handling binary data is almost as easy with Tree Model as well:

  JsonNode object = mapper.readTree(jsonStr);
  JsonNode binaryNode = object.get("binary");
  byte[] data = binaryNode.getBinaryValue();

  // or construct from scratch, write?
  ObjectNode rootOb = mapper.createObjectNode();
  rootOb.put("binary", rootOb.binaryNode(data));
  outputStr = mapper.writeValueAsString(rootOb);

4. Additional Tricks

(DISCLAIMER: following features have been tested with Jackson 1.3, not yet released)

But what if you actually just want to encode or decode binary data to/from Base64-encoded Strings, outside context of JSON processing?

Turns out that you can do simple encoding and decoding quite easily. And as an additional bonus, Jackson's strong focus on performance means that the underlying codec is very efficient, even for "extra-curricular" use (where output buffering is not utilized as it is for incremental JSON processing).
In fact, it may just be faster than alternative commonly used processing toolkits.

Anyway: to encode arbitrary binary data as Base64 text, you can do:

  import org.codehaus.jackson.node.BinaryNode;

  BinaryNode n = new BinaryNode(byteArray);
  String encodedText = n.getValueAsText();

// or as one-liner: encodedText = new BinaryNode(data).getValueAsText();

and to decode given Base64-encoded String, you can retrieve contained binary data by:

  import org.codehaus.jackson.node.TextNode;

  TextNode n = new TextNode(encodedString);
  byte[] data = n.getBinaryValue();

// or as one-liner: data = new TextNode(encodedString).getBinaryValue();
// or, if encoded using non-standard Base64 variant, try: data = n.getBinaryValue(Base64Variants.MODIFIED_FOR_URL);

Useful? Possibly -- no need to include Jakarta Commons Codec just for Base64 handling, if you happen to use Jackson already.
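A side note for readers on newer JVMs: Java 8 and later ship a codec of their own in java.util.Base64, so the same round trip can also be done with no extra library at all. The sketch below uses only that JDK class (the wrapper class and method names are made up); I am not making any claims here about how its speed compares to Jackson's codec.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch only: Base64 round trip with the codec bundled in Java 8+.
final class JdkBase64Sketch {
    /** Encodes bytes as standard (padded) Base64 text. */
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Decodes standard Base64 text; throws IllegalArgumentException if invalid. */
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        String encoded = encode("sure.".getBytes(StandardCharsets.US_ASCII));
        System.out.println(encoded); // prints c3VyZS4=
        System.out.println(new String(decode(encoded), StandardCharsets.US_ASCII)); // prints sure.
    }
}
```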

About me

  • I am known as Cowtowncoder