Wednesday, March 10, 2010

Users of Jackson, Unite!

I know this is something that users of Jackson JSON library must have been craving for longest time: their Very Own Social Network (no? you'd rather have handling of cyclic types? d'oh!)
Wait no longer: enter jackson-users.ning.com. Place for developers who know the Right Tool for slicing and dicing JSON on Java platform (or possible, any platform).

So what's the point? Besides my own curiosity (and, shall we say, desire to "eat your own dog food"; check out author's profile for more info) -- which was a big factor -- I thought it might be one more way to get more of community feel. Mailing lists are good, blogs and wikis too, but a gathering place with bit of all of them might be even better.
We shall see how it all works out -- and thanks to lower threshold for participating (since it's easier to give access, let members edit and add things), you can be important part of making it Work Right.

See you there: I'll try sending a nice Virtual Gift for member number 100... :-)

Groovy/Gaelyk + Google App Engine + Jackson == Nice Restful Framework

Here's another interesting article: "RESTing with JSON in Gaelyk". Looks like Jackson usage on "Google platforms" (Android, GAE) is becoming more of a maintsream thing. This is great for multiple reasons, and multi-platform support is one of (many) goals of Jackson.
But just having goals matters little if they are not met. Fortunately, thanks to active user/adopter base, goals are being verified over time; and where there are problems, they usually get resolved in timely matter.

Friday, February 26, 2010

Jackson, compliments du jour, en francaise

I wish my french skills were little bit more refined (6 months of suggestopedic teaching, 2 hours per week apparently is not enough to get more than general idea of text :) ). But from what I gather, Android pour l’entreprise – 6 – Oubliez Gson, Jackson rocks my world! seems to have generally positive outlook on Jackson for JSON processing on Android platform (oubliez meaning "forget"... it's good that cognac ratings help my language skills here!). And apparently much of this is due to Android VM (Dalvik) being somewhat sensitive to GC-induced stress; so Jackson's focus on efficiency (not just speed, but focus on simplicity of code, trying to avoid extraneous intermediate storage and code) really pays off.

It is great that a library can be versatile enough to perform well on wide set of platforms; and it is absolutely marvellous that there are users who put Jackson to good use, and let others know what works and what doesn't.

Anyway, thought I'll share this; Android developers in practicular might find this interesting. Also: author of the article has suggested couple of good improvements to Jackson, too, to make things work even better in future.

Thursday, December 31, 2009

Upgrading from Woodstox 3.x to 4.0

It has now been almost one year since Woodstox 4.0 was released.
Given this, it would be interesting to know how many Woodstox users continue using older versions, and how many have upgraded.

My guess (somewhat educated, too, based on bug reports and some statistcs on Maven dependencies) is that adoption has been quite slow. I think this is primarily due to 3 things:

  1. Older versions work well, and fulfill all current needs of the user
  2. New functionality that 4.0 offers is not widely known, and/or is not (currently!) needed
  3. There are concerns that because this is a major version upgrade, upgrade might not go smoothly.

I can not argue against (1): Woodstox has been a rather solid product since first official releases; and 3.2 in particular is a well-rounded rock solid XML processor (if you are using an earlier version, however, at least upgrade to latest 3.2 patch version, 3.2.9!).
And with respect to (2), I have covered most important pieces of new functionality, Typed Access API and Schema Validation.

But so far I have not written anything about incompatible changes between 3.2 and 4.0 versions. So let's rectify that omission.

1. Why Upgrade?

But first: maybe it is worth iterating couple of reasons why you might want to upgrade at all:

  1. You might want to validate XML documents you read or write against W3C Schema (aka XML Schema). Earlier versions only allowed validating against DTDs and Relax NG schemas
  2. If you want to access typed content -- that is, numbers, XML qualified names, even binary content, contained as XML text -- new Typed Access API simplifies code a lot, and also makes it more efficient.
  3. Latest versions of useful helper libraries like StaxMate require Woodstox 4.0 (StaxMate 2.0 needs 4.x, for example)
  4. No new development will be done for 3.2 branch; and eventually not even bug fixes.

Assuming you might want to upgrade, what possible issues could you face?

2. Backwards incompatible changes since 3.2

Based on my own experiences, there are few issues with upgrade. Although the official list of incompatibilities has a few entries, I have only really noticed one class of things that tend to fail: Unit tests!

Sounds bad? Actually, yes and no: no, because these are not real failures (ones I have seen). And yes, since it means that you end up fixing broken test code (extra overhead without tangible benefits). But this is one of challenges with unit tests: fragility is often desireable, but not always so.

Specific problem that I have seen multiple times is related to one cosmetic aspect of XML: inclusion of white space with elements.

Woodstox 3.2 used to output empty elements with "extra" white space, like so:

<empty />

but 4.0 will not add this white space:

<empty/>

(this is a new feature as per WSTX-125 Jira entry)

and so some existing unit tests for systems I have worked on compare literal XML for output tests. This is not optimal, but it is bit less work than writing tests in more robust way, to check for logical (not physical) equality. So whereas they formerly assume existence of such white space, tests need to be modified not to expect it (or allow either way).

3. Other challenges?

Actually, I have not seen any actual problems, or other cosmetic problems. But here are other changes that are most likely to cause compatibility problems (refer to the full list mentioned earlier for couple of changes that are much less likely to do so):

  • "Default namespace" and "no prefix" are now consistently reported as empty Strings, not nulls (unless explicitly specified otherwise in relevant Stax/Stax2 Javadocs). Usually this does not cause problems, because Stax-dependant code has had to deal with inconsistencies with other Stax implementations; but could cause problems if code is expecting null.
  • "IS_COALESCING" was (accidentally) enabled for Woodstox versions prior to 4.0. This was fixed for 4.0 (as per Stax specification), but it is possible that some code was assuming on never getting partial text segments (if developer was not aware of Stax allowing such splitting of segment, similar to how SAX API does it.

4. Upgrade or not?

I would recommend investigating upgrade; if for nothing else, because of maintenance aspect. Pre-4.0 versions will not be actively maintained in future. But it is good to be aware of what has changed, and of course having good set of unit tests should guard against unexpected problems.

And hey, it's soon 2010 -- Woodstox 3.2 is soooo 2008. :-)

Tuesday, December 22, 2009

Another Jackson adopter: Spring-json

As has been reported earlier, Spring 3.0 (and Spring-json module, specifically) now has a Jackson-based JSON view variant: see MappingJacksonJsonView and spring-json view comparison. Comparison seems reasonable, mentioning annotation-based configurability and performance as strong points of jackson-based view.

UPDATE, 28-Dec-2009: One more for the road: Restlet is also including Jackson-based extension.

Sunday, December 20, 2009

Jackson 1.4: more control over writing JSON, improved interoperability

First things first: if you haven't noticed yet, Jackson 1.4.0 was just released.

This release focuses mainly on writer (serialization) side, but there are also continuing improvements to interoperability. I will review main improvements below.

1. JSON generation improvements

1.1 Ignoring properties

A new annotation, @JsonIgnoreProperties, allows:

  • omitting serialization of listed properties: @JsonIgnoreProperties({ "secretField", "internalProperty" }); listed properties will not be included in JSON output
  • omitting listed properties from being deserialized; if encountered they are just ignored even if there is a setter for them (regardless of whether setter is marked to be ignored or not)
  • ignoring all unknown properties for annotated class during deserialization (similar to disabling DeserializationConfig.Feature.FAIL_ON_UNKNOWN_PROPERTIES), but only affects instances of annotated class. This is done with property "ignoreUnknown": @JsonIgnoreProperties(ignoreUnknown=true) (note: has no effect on serialization)

1.2 JsonView

New @JsonView annotation allows defining logical views for serialization: sets of properties to be written out for given view.

Let's consider a simple example, where we want to control amount of information written out, based on, say, user's credentials. To define 3 classes of properties, we can define views (more about identification below):

  class Views { // container for View classes
static class Public { }
static class ExtendedPublic extends PublicView { }
static class Internal extends ExtendedPublicView { }
  }
  
And to define access levels for our info class, we would do
public class PersonalInformation { // Bean that uses Views to define subsets of properties to include // Name is public @JsonView(Views.Public.class) String name; // Address semi-public @JsonView(Views.ExtendPublic.class) Address address; // SSN only for internal usage @JsonView(Views.Internal.class) SocialSecNumber ssn; }

Given this set up, we would define View to use for serialization by:

  objectMapper.writeValueUsingView(out, infoInstance, Views.Public.class); // short-cut
  // or full version:
  objectMapper.getSerializationConfig().setSerializationView(Views.Public.class);
  objectMapper.writeValue(out, beanInstance); // will use active view set via Config
  // (note: can also pre-construct config object with 'mapper.copySerializationConfig'; reuse configuration)

Which in this particular case would only contain "name" property. If we had used view Views.ExtendedPublic.class, we would have gotten 2 fields; and with Views.Internal.class, all 3.

Views are identified by classes: you can either create specific marker classes, or use existing classes. Views use inheritance indicated by class structure: such that a view is considered a sub-view of another view if it extends that view. Child views include properties that parent views include.

For more description see JsonView wiki page.

1.3 Ordering properties output

Another new annotation, @JsonPropertyOrder, allows defining complete and partial field orderings:

  • You can define explicit ordering by listing properties as annotation value: @JsonPropertyOrder({ "id", "name" }) would ensure that "id" and "name" are output before any other properties during serialization
  • You can specify that anything not explicitly ordered will be output in alphabetic order: @JsonPropertyOrder(alphabetic=true)
  • Without these settings order is undefined because JVM does not expose order of underlying fields or methods (but see note below)

Beyond these definitions, 1.4 also guarantees that properties used with @JsonCreator annotations (constructors, factories) are serialized before other properties, unless there are explicitly ordered properties (which will have priority). This change was to optimize @JsonCreator property usage: ideally these properties should be readable before other properties -- although JSON logical model does not provide for such guarantee, Jackson will try to do its best to make ordering optimal.

2. Interoperability improvements

Goal of interoperability improvement is to make Jackson a "universal" JSON data binding tool on JVM. That is: we hope to make Jackson usable from other JVM languages, not just Java -- already one can use it from quite a few (reported to work from Groovy, Clojure), and hopefully supporting others like Scala in near future (Scala lists are not well handled, yet) -- as well as interoperate nicely with most common data libraries.

To this end, there is now default support (i.e. no need for custom converters or mix-in annotations) for following data types:

  • DOM (xml) trees: properties declared as DOM Document, Element and Node will now be properly serialized to, and deserialized from JSON Strings. Useful if you want to embed XML as JSON properties (sometimes good for interoperability)
  • Joda DateTime type, and mechanism to easily add more types as needed (file a Jira request if you need more!)
  • Handling of javax.xml types that some platforms lack (Android and GAE have had some issues) much improved so that they are dynamically and reliably added, if underlying types are present.

Last point should also make it yet easier to make Jackson run on new "subset platforms"; containers that support subset of JDK 1.5 (or have issues with some parts).

3. Plans for 1.5

So what next? 1.4 release can be thought of a "minor" minor release, similar to 1.3; compared to fundamentally new functionality of 1.2 (mix-in annotations, JsonCreators), 1.3 and 1.4 have consisted of smaller (but more numerous) evolutionary incremental improvements. In many ways, Jackson releases have mirror JDK releases, come to think of that.

Anyway: the next Big Thing will be "Polymorphic Deserialization", which is by far the most requested feature. That is, ability to deserialize instances of correct types, even in absence of static type information (declared type of, say, List<Object> could still be deserialized to contain whatever actual type of serialized instance was). Getting this done is important in itself, but the most important aspect in my mind is to Do It Right. This should not be a stop-gap solution, or something to rewrite in near future. It should be comprehensive, flexible and robust solution to the non-trivial problem. And plan is to do just that; now that other queued blocking issues (like that of finall getting much-requested JsonView done) have been dealt with.

Thursday, December 17, 2009

On good, efficient data formats

There are 2 fairly recent additions to category of "good binary data formats that work nice with Java" category: Avro and Kryo. I have meant to write something about both for a while.

1. Avro

Avro is a simple and efficient general-purpose data format, developed as part of Hadoop project. Due to its background, it should work very well with Hadoop (and map/reduce systems in general). It is also quite similar to what I had been thinking of implementing for my own (well, my employer's, rather) large-scale data processing needs, when there was no Avro. From my shallow understanding of Avro, it seems to nicely fit the bill for data format for huge sequence of records; but with self-describing property that is sadly lacking from other contestants like Google's protobuf.

I will hopefully have some more tangible notes to add in future: for now it's enough to note that Avro's performance seems to be pretty good, at least in the "thrift-protobuf" benchmark.

2. Kryo

Strictly speaking Kryo is not a data format, but rather Java object serialization framework that happens to define a data format to handle its main task. But since sending POJOs back and forth over the wire is a very common (perhaps the most common) task for data formats, in Java world, this is not a big difference.

First thing I noticed was its good performance (on above-mentioned benchmark). This is nice, since there is already JDK default serialization that performs adequately for most tasks; so anything else that does binary serialization should be able to meet and beat that performance baseline, to be of interest. But as importantly, API seems straight-forward, simple, and adequately customizable.
So I have reasonably high expectations for this library -- it could be nice complement to something like, say, Dirmi (RMI alternative for JDK default one -- should be coupled with alternative, similarly improved, serialization mechanism, n'est pas?).

3. Disclaimer

Alas, I have not made up compelling use case for using these two projects, yet. But given promise they hold, I should be able to test them out come next year.

Wednesday, December 16, 2009

More Jackson adoption: Mule/iBeans

As per this announcement, Mule is another major framework that has officially adopted Jackson for its JSON processing needs ("first we take JAX-RS... then we take ESB!"). That should be good for everyone involved. And more work for the development, polishing and fixing things that new flow of users brings in; as well as plenty more exposure for the project & processor.
That should keep the project honest, relevant, and hopefully producing diamond(s) -- without high pressure, all you'd have would be lump of coal. :-)

On a sort of related note; the latest tally of number of contributors (individuals named on 'release-notes/CREDITS') is 50. Not too shabby -- Jackson will pretty soon bypass Woodstox in most regards (see anearlier blog entry for context); probably not by LOC, but in most other measures.

Tuesday, December 08, 2009

JSON data binding performance (again!): Jackson / Google-gson / JSON Tools... and FlexJSON too

(note: this is a follow-up on an earlier measurements)

1. A New Contestant: FlexJson

After realizing that FlexJson is actually capable of both serialization and deserialization (somehow I thought it would only serialize things), I decided to add it as the fourth contestant in the "full service Java/JSON data binding" category of tests.

Initially I was bit discouraged to find that it makes one rookie mistake: assumes that somehow JSON comes in (and goes out) as Java Strings. But aside from this glitch, package actually looks quite solid -- and its exclusion/inclusion mechanism looks interesting. Maybe not exactly my cup of joe (if it was, after all, Jackson API would look more like it does), but a viable alternative. And I can see how ability to prevent deep copy would come in handy sometimes. And finally, some of the features actually exceed what Jackson can currently do, regarding polymorphic deserialization (since FJ includes class name by default, I assume it can do it) and some level of cyclic-dependency handling (ignoring serialization of cyclic references at least).

So let's see how "rookie" (yes, I know, it's not exactly a new package, just new addition to the test) fares...

2. Test setup

Tests are run using nice Japex performance test framework, running on my somewhat old AMD work station (~1700 Ghz Athlon -- someone needs to click on those right-hand-side ads to get me a new performance-testing work station! :-) ).

Input data used consists of serialization of tabular data (database dump, good old "db100.xml" used by countless xml tests), converted to Java POJOs, and then to individual data formats (here as JSON, but can be tested as XML and whatnot). Document size is 20k in XML, and slightly less in JSON (about 16k). It would be easy to run using other data sets, but in the past, performance ratios for 2k, 20k and 200k documents have not had radical differences, so 20k one seems like a reasonable choice (but note that the earlier benchmark did in fact use 2k documents, so actual numbers do differ).

Test project itself, "StaxBind" is still in Woodstox SVN repository, accessible via Codehaus SVN page. (one of these days I should just create a Github project -- but not today).

Versions of JSON processing packages are as follows:

  • Jackson 1.2.0
  • Google-gson 1.4
  • Json-tools-core 1.7
  • Flexjson-1.9.1

Code for each library is using default settings, and using what appears as the most efficient interface, for cases where transformations are from byte streams on server side (byte streams in, byte streams out).

3. Results

First things first: here's the money shot:

Data Binding Performance Graph

(or check out the for details)

Another way to represent results is by showing performance ratios, using the slowest implementation as base line (TPS == transactions per second; number of times a 20k document is read, written, or both):

(note: Jackson/manual is omitted since it is hand-written (if simple) serializer/deserializer, and there are no direct counterparts for other packages -- while it would give even bigger faster-than-thou ratio, it wouldn't be a fair comparison)

Impl Read (TPS) Write (TPS) Read+Write (TPS) R+W, times baseline
Jackson (automatic) 1599.272 2463.097 1033.809 25.6
FlexJson 125.277 125.277 94.904 2.35
Json-tools 94.051 126.954 49.008 1.2
GSON 56.58 112.455 40.38 1

So looks like our "new kid on the block" manages to outperform the other two non-Jackson JSON processors here. And at least get within an order-of-magnitude with Jackson... :-)

4. Musings

So it turns out that despite its interfacing (those String/byte conversions), Flexjson package manages to work more efficiently than some other packages that claim "simplicity and performance". And this without actually claiming to be particularly performant, but rather focusing on design of API and ease-of-use aspects. Pretty neat, I respect that.

5. Next?

My current main interest (with respect to performance issues) lie in the area of compressing data for transfer: after all, most of the time there is relative abundance of CPU power compared to available network and I/O bandwidth. This means that trading some CPU (needed for compression and decompression) seems like a bargain for many use cases.

But on the other hand,as we saw earlier, the question is "how much is too much". And that's where my new favorite simple-and-fast algorithm, LZF, comes in. But that's a different story.

Friday, December 04, 2009

Not your type? Jackson as the match-maker

By now, Jackson is becoming widely-known for its lightning-fast streaming JSON parser, as well as for its powerful, intuitive and efficient data binding functionality. But wait! There is more!

(and when you are convinced you need Jackson, head straight to Download page)

1. Background

Has this ever happened to you? You need an array of ints, and all you got is this dingy little List of Strings (representing numbers)! You would think JVM, as smart as it is, could quickly whip up a conversion to "do the needful"... but no. In fact, even simpler conversions like number to/from String, String to/from boolean, Sets to/from Lists and so on are irritating: easy to solve, sure, but with too much monkey code.

And don't even get me started on other conversions between encodings like base64 to byte arrays; dumping Object fields as Maps; or building Objects from Maps (possibly read from properties files). There are tons of simple tasks that should be made even simpler.

2. What if...

Ok, now: let's round up some facts related to POJO serialization and Jackson:

  1. Jackson is great at serializing all kinds of Java objects as JSON
  2. Jackson is awesome at deserializing JSON into Java objects

So: what if... say... you serialized, List<String> into JSON and... lessee... deserialized it back to.... int[]... what would happen? Conversion! (as long as Strings indeed contain numbers -- if not, what would happen is an exception of some kind)

Ah! I see, so, you are saying that serialize+deserialize == conversion! Like:

  //ObjectMapper mapper = new ObjectMapper();
  String json = mapper.writeValueAsString(myStringList);
  int[] intArray = mapper.readValue(json, int[].class);

Presto!

3. Jackson 1.3 gets it

With version 1.3, there is something that simplifies above procedure by 50%:

  int[] intArray = mapper.convertValue(myStringList, int[].class);

(yes, you should have seen that one coming)

3.1. Conversions: Basic types

Primitive type conversions obviously work:

  Boolean b = mapper.convertValue("true", Boolean.class);
Date d = mapper.convertValue("2009-10-10T12:00:00.00+0800");

although are not exactly any shorter than equivalent idioms you would use. But they serve as basis for other conversions, for Lists, Maps and arrays of primitives and wrappers.

3.2. Conversions: Containers

Similarly, containers with various content types work as expected:

  ArrayList<Integer> ints = mapper.convertValue(new Object[] { 13, "1", Integer.valueOf(3) });
Set<String> uniq = mapper.convertValue(new String[] { "a", "b", "a" }); // would produce set with 2 entries

3.3. Conversions Base64<->binary

And if you want to encode binary data, you can do:

  String encoded = mapper.convertValue(new byte[] { 1, 2, 3 });
byte[] decoded = mapper.convertValue(encoded, byte[].class);
(usually reading from a File or such)

3.4. Beans to Maps and back

Finally, you can also convert simple Java beans (or more generally POJOs) into Maps or JSON trees:

  Map fieldMap = mapper.convertValue(myBean, Map.class);
MyBean bean = mapper.convertValue(fieldMap, MyBean.class);
// or read from properties file
Properties props = new Properties();
props.load(new FileInputStream(file));
MyBean bean2 = mapper.convertValue(props, MyBean.class);
// and can convert to a JSON tree as well:
JsonNode rootNode = mapper.convertValue(bean2);

And you can obviously configure bean types with annotations (regular and mix-in annotations) as you like, as necessary for conversion you want to do.

Neat stuff, eh?

4. What does this have to do with JSON?

Good question. Nothing, really. :-)

That is: there is no requirement for the intermediate JSON generation; and in fact, for future Jackson versions, there will be improvements to allow use efficient intermediate data structure for these conversions.

And conveivably one could even refactor functionality into separate bean conversion package, if conversions are widely used without actual JSON processing.

5. Got better use cases?

I hope someone out there can come up with even better examples of this power. If so, let me know!

One area that I hope to improve upon is that of converting java.util.Properties into POJOs. Although sample above works, it does not deal with naming convention of "refField.anotherField.field = 3", which is one natural way to represented nested structures. It should be made to work; just needs little bit of name mangling; especially when trying to handle arrays and lists ("object.listField.4 = abx").

About me

  • I am known as Cowtowncoder
  • Contact me at @yahoo.com
Check my profile to learn more.

Powered By