December 2009

Upgrading from Woodstox 3.x to 4.0

It has now been almost one year since Woodstox 4.0 was released.
Given this, it would be interesting to know how many Woodstox users continue using older versions, and how many have upgraded.

My guess (somewhat educated, too, based on bug reports and some statistics on Maven dependencies) is that adoption has been quite slow. I think this is primarily due to 3 things:

(1) Older versions work well, and fulfill all current needs of the user
(2) New functionality that 4.0 offers is not widely known, and/or is not (currently!) needed
(3) There are concerns that because this is a major version upgrade, the upgrade might not go smoothly.

I cannot argue against (1): Woodstox has been a rather solid product since the first official releases; and 3.2 in particular is a well-rounded, rock-solid XML processor (if you are using an earlier version, however, at least upgrade to the latest 3.2 patch version, 3.2.9!).
And with respect to (2), I have already covered the most important pieces of new functionality: Typed Access API and Schema Validation.

But so far I have not written anything about incompatible changes between 3.2 and 4.0 versions. So let's rectify that omission.

1. Why Upgrade?

But first: maybe it is worth reiterating a couple of reasons why you might want to upgrade at all:

You might want to validate XML documents you read or write against W3C Schema (aka XML Schema). Earlier versions only allowed validating against DTDs and Relax NG schemas
If you want to access typed content -- that is, numbers, XML qualified names, even binary content, contained as XML text -- new Typed Access API simplifies code a lot, and also makes it more efficient.
Latest versions of useful helper libraries like StaxMate require Woodstox 4.0 (StaxMate 2.0 needs 4.x, for example)
No new development will be done for 3.2 branch; and eventually not even bug fixes.

Assuming you might want to upgrade, what possible issues could you face?

2. Backwards incompatible changes since 3.2

Based on my own experiences, there are few issues with upgrade. Although the official list of incompatibilities has a few entries, I have only really noticed one class of things that tend to fail: Unit tests!

Sounds bad? Actually, yes and no: no, because these are not real failures (the ones I have seen, at least). And yes, since it means that you end up fixing broken test code (extra overhead without tangible benefits). But this is one of the challenges with unit tests: fragility is often desirable, but not always so.

The specific problem that I have seen multiple times is related to one cosmetic aspect of XML: inclusion of white space within elements.

Woodstox 3.2 used to output empty elements with "extra" white space, like so:

but 4.0 will not add this white space:

(this is a new feature as per WSTX-125 Jira entry)

Some existing unit tests for systems I have worked on compare literal XML for output tests. This is not optimal, but it is a bit less work than writing tests in a more robust way, to check for logical (not physical) equality. So whereas they formerly assumed the existence of such white space, tests need to be modified not to expect it (or to allow either way).
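For what it is worth, here is a minimal sketch of what such a "logical equality" check could look like, using only the DOM parser that ships with the JDK. The class and method names (XmlLogicalEquality, logicallyEqual) and the element names in the usage note are made up for illustration; this is not code from Woodstox or from any test suite mentioned above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: compares two XML documents as parsed trees, not as Strings
final class XmlLogicalEquality {

    /** Parses an XML String into a namespace-aware DOM tree. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setCoalescing(true); // merge CDATA sections into adjacent text
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML: " + xml, e);
        }
    }

    /** True if both documents have the same elements, attributes and text. */
    static boolean logicallyEqual(String expectedXml, String actualXml) {
        return parse(expectedXml).isEqualNode(parse(actualXml));
    }
}
```

With something like this, a test can call logicallyEqual("<root><empty /></root>", "<root><empty/></root>") and pass against either version, since white space inside the tag never reaches the tree. Note that white space between elements does produce text nodes, so indentation differences would still count as differences here.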

3. Other challenges?

Actually, I have not seen any real problems, or other cosmetic ones. But here are the other changes that are most likely to cause compatibility problems (refer to the full list mentioned earlier for a couple of changes that are much less likely to do so):

"Default namespace" and "no prefix" are now consistently reported as empty Strings, not nulls (unless explicitly specified otherwise in relevant Stax/Stax2 Javadocs). Usually this does not cause problems, because Stax-dependent code has had to deal with inconsistencies with other Stax implementations; but it could cause problems if code is expecting null.
"IS_COALESCING" was (accidentally) enabled for Woodstox versions prior to 4.0. This was fixed for 4.0 (as per Stax specification), but it is possible that some code was relying on never getting partial text segments (if the developer was not aware of Stax allowing such splitting of segments, similar to how SAX API does it).
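Both of these can be guarded against in calling code. Below is a rough sketch using only the standard javax.xml.stream API (so it should behave the same with Woodstox or any other Stax implementation); the class and method names are invented for this example. It treats null and empty prefixes the same way, and it collects text without assuming that it arrives as a single segment.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, for illustration only
final class DefensiveStaxReading {

    /** Maps null to "", so callers do not care which convention is in use. */
    static String normalizePrefix(String prefix) {
        return (prefix == null) ? "" : prefix;
    }

    /**
     * Returns the text of the first element that ends, appending every text
     * segment reported, whether or not the parser coalesces them.
     */
    static String firstElementText(String xml, boolean coalesce) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // State explicitly what you want instead of relying on defaults
            factory.setProperty(XMLInputFactory.IS_COALESCING, coalesce);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                int type = reader.next();
                if (type == XMLStreamConstants.CHARACTERS
                        || type == XMLStreamConstants.CDATA
                        || type == XMLStreamConstants.SPACE) {
                    text.append(reader.getText());
                } else if (type == XMLStreamConstants.END_ELEMENT) {
                    break;
                }
            }
            reader.close();
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Failed to read XML", e);
        }
    }
}
```

If code really does need text in one piece, the simplest route is to enable IS_COALESCING explicitly as above (or to use getElementText()), rather than depending on what a particular version happened to do by default.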

4. Upgrade or not?

I would recommend investigating an upgrade; if for nothing else, because of the maintenance aspect. Pre-4.0 versions will not be actively maintained in future. But it is good to be aware of what has changed, and of course having a good set of unit tests should guard against unexpected problems.

And hey, it's soon 2010 -- Woodstox 3.2 is soooo 2008. :-)

Posted by Tatu Saloranta at Thursday, December 31, 2009 10:33 PM
Categories: Java, XML/Stax

How much does Free cost again?

One of the supposedly hard questions is what the actual cost of freedom of speech is. It can be debated endlessly, and is often considered priceless (Mastercard trademarks notwithstanding). Hard question, sure, and something I generally don't spend much time thinking about.

And then again, sometimes it is rather easy to know the exact cost. Like in case of my little blog: it is US$ 9.95.

That ultra fancy commenting interface that you see next to entries (and the elaborate, insanely complicated machinery that powers it in the backend), provided by the kind folks of Haloscan, has been free (of cost) so far. But it was very recently brought to my attention that there is now a simple choice for me, a happy user of said system, to make: I am free to either (a) pay up (annual fee), or (b) "vacate the premises" (but with the graceful allowance of letting me keep stuff I brought along, i.e. export comments!). What's not to like? What a delightful yule-time offer!

The choice is a bit tricky: on one hand, philosophically I do not object to paying for services I use (or offer for others to use). But on the other hand, I really REALLY dislike the impression of a bait-and-switch scheme in operation. Worse, the short transition period -- unless I totally missed earlier notices, just a couple of weeks -- makes this feel rather rushed, "an offer you can't refuse". So in many ways I do feel like just taking my crap (no offense to respected authors of high-quality commentary here) to someplace else. Perhaps even spending lotsa time (which is money) to build a simple replacement for the system. Heck, I have been paid to write such systems. Might as well do some pro bono work for my own blog. But then again... I am lazy and time-constrained.

So the question is: is this dinky little commentary system worth 10 bucks a year? Especially when advertising income from all the sidebar ads is within that same order of magnitude, possibly less.

ps. Anyone know an actual free plug-in comment interface? Send me a note if you do! (or if you must... leave a comment... but if so, before Jan 02, 2010 :-) )

Posted by Tatu Saloranta at Tuesday, December 22, 2009 9:53 PM

Another Jackson adopter: Spring-json

As has been reported earlier, Spring 3.0 (and the Spring-json module, specifically) now has a Jackson-based JSON view variant: see MappingJacksonJsonView and the spring-json view comparison. The comparison seems reasonable, mentioning annotation-based configurability and performance as strong points of the Jackson-based view.

UPDATE, 28-Dec-2009: One more for the road: Restlet also includes a Jackson-based extension.

Posted by Tatu Saloranta at Tuesday, December 22, 2009 7:58 PM
Categories: Java, JSON

Jackson 1.4: more control over writing JSON, improved interoperability

First things first: if you haven't noticed yet, Jackson 1.4.0 was just released.

This release focuses mainly on writer (serialization) side, but there are also continuing improvements to interoperability. I will review main improvements below.

1. JSON generation improvements

1.1 Ignoring properties

A new annotation, @JsonIgnoreProperties, allows:

omitting serialization of listed properties: @JsonIgnoreProperties({ "secretField", "internalProperty" }); listed properties will not be included in JSON output
omitting listed properties from being deserialized; if encountered they are just ignored even if there is a setter for them (regardless of whether setter is marked to be ignored or not)
ignoring all unknown properties for annotated class during deserialization (similar to disabling DeserializationConfig.Feature.FAIL_ON_UNKNOWN_PROPERTIES), but only affects instances of annotated class. This is done with property "ignoreUnknown": @JsonIgnoreProperties(ignoreUnknown=true) (note: has no effect on serialization)

1.2 JsonView

New @JsonView annotation allows defining logical views for serialization: sets of properties to be written out for given view.

Let's consider a simple example, where we want to control amount of information written out, based on, say, user's credentials. To define 3 classes of properties, we can define views (more about identification below):

  class Views { // container for View classes
    static class Public { }
    static class ExtendedPublic extends Public { }
    static class Internal extends ExtendedPublic { }
  }
  
And to define access levels for our info class, we would do

  public class PersonalInformation { // Bean that uses Views to define subsets of properties to include
    // Name is public
    @JsonView(Views.Public.class) String name;
    // Address semi-public
    @JsonView(Views.ExtendedPublic.class) Address address;
    // SSN only for internal usage
    @JsonView(Views.Internal.class) SocialSecNumber ssn;
  }

Given this set up, we would define View to use for serialization by:

  objectMapper.writeValueUsingView(out, infoInstance, Views.Public.class); // short-cut
  // or full version:
  objectMapper.getSerializationConfig().setSerializationView(Views.Public.class);
  objectMapper.writeValue(out, infoInstance); // will use active view set via Config
  // (note: can also pre-construct config object with 'mapper.copySerializationConfig'; reuse configuration)

In this particular case the output would only contain the "name" property. If we had used view Views.ExtendedPublic.class, we would have gotten 2 fields; and with Views.Internal.class, all 3.

Views are identified by classes: you can either create specific marker classes, or use existing classes. Views use inheritance indicated by class structure: such that a view is considered a sub-view of another view if it extends that view. Child views include properties that parent views include.
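The inclusion rule described above can be sketched in plain Java with nothing but class inheritance. To be clear, this is only an illustration of the rule as stated here, not Jackson's actual implementation; the class ViewRuleSketch and its isIncluded method are names made up for the example.

```java
// Hypothetical sketch of the "sub-view" rule; not Jackson code
final class ViewRuleSketch {
    // Same marker class hierarchy as in the Views example
    static class Public { }
    static class ExtendedPublic extends Public { }
    static class Internal extends ExtendedPublic { }

    /**
     * A property tagged with propertyView is written out when the active view
     * is that same view, or a sub-view (subclass) of it.
     */
    static boolean isIncluded(Class<?> propertyView, Class<?> activeView) {
        return propertyView.isAssignableFrom(activeView);
    }

    public static void main(String[] args) {
        // "name" is tagged Public, "ssn" is tagged Internal
        System.out.println(isIncluded(Public.class, Internal.class));
        System.out.println(isIncluded(Internal.class, Public.class));
    }
}
```

So a property tagged with Public is included when serializing with Internal as the active view, but not the other way around, which is why widening access is just a matter of extending a view class.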

For more description see JsonView wiki page.

1.3 Ordering properties output

Another new annotation, @JsonPropertyOrder, allows defining complete and partial field orderings:

You can define explicit ordering by listing properties as annotation value: @JsonPropertyOrder({ "id", "name" }) would ensure that "id" and "name" are output before any other properties during serialization
You can specify that anything not explicitly ordered will be output in alphabetic order: @JsonPropertyOrder(alphabetic=true)
Without these settings order is undefined because JVM does not expose order of underlying fields or methods (but see note below)
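To make the combined effect of the two settings concrete, here is a small plain-Java sketch of the ordering rule as described in the list above: explicitly listed names first, then the rest, alphabetically if so requested. It is an assumption-laden illustration (class and method names are invented), not how Jackson implements it; in particular, when alphabetic ordering is off the sketch simply keeps the input order, whereas the real order is undefined.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of the ordering rule; not Jackson code
final class PropertyOrderSketch {

    /**
     * @param properties all property names of the bean
     * @param explicit   names listed as the annotation value
     * @param alphabetic whether remaining names are sorted alphabetically
     */
    static List<String> order(Collection<String> properties,
                              List<String> explicit, boolean alphabetic) {
        List<String> result = new ArrayList<String>();
        // 1. explicitly ordered properties come first, in the listed order
        for (String name : explicit) {
            if (properties.contains(name) && !result.contains(name)) {
                result.add(name);
            }
        }
        // 2. then everything else, sorted only if requested
        Collection<String> rest = alphabetic
                ? new TreeSet<String>(properties)
                : new ArrayList<String>(properties);
        for (String name : rest) {
            if (!result.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

For a bean with properties "name", "zip", "id" and "age", explicit ordering of "id" and "name" plus alphabetic=true would give id, name, age, zip under this reading of the rules.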

Beyond these definitions, 1.4 also guarantees that properties used with @JsonCreator annotations (constructors, factories) are serialized before other properties, unless there are explicitly ordered properties (which will have priority). This change was to optimize @JsonCreator property usage: ideally these properties should be readable before other properties -- although JSON logical model does not provide for such guarantee, Jackson will try to do its best to make ordering optimal.

2. Interoperability improvements

Goal of interoperability improvement is to make Jackson a "universal" JSON data binding tool on JVM. That is: we hope to make Jackson usable from other JVM languages, not just Java -- already one can use it from quite a few (reported to work from Groovy, Clojure), and hopefully supporting others like Scala in near future (Scala lists are not well handled, yet) -- as well as interoperate nicely with most common data libraries.

To this end, there is now default support (i.e. no need for custom converters or mix-in annotations) for following data types:

DOM (xml) trees: properties declared as DOM Document, Element and Node will now be properly serialized to, and deserialized from JSON Strings. Useful if you want to embed XML as JSON properties (sometimes good for interoperability)
Joda DateTime type, and mechanism to easily add more types as needed (file a Jira request if you need more!)
Handling of javax.xml types that some platforms lack (Android and GAE have had some issues) much improved so that they are dynamically and reliably added, if underlying types are present.

Last point should also make it yet easier to make Jackson run on new "subset platforms"; containers that support subset of JDK 1.5 (or have issues with some parts).

3. Plans for 1.5

So what next? The 1.4 release can be thought of as a "minor" minor release, similar to 1.3; compared to the fundamentally new functionality of 1.2 (mix-in annotations, JsonCreators), 1.3 and 1.4 have consisted of smaller (but more numerous) evolutionary incremental improvements. In many ways, Jackson releases have mirrored JDK releases, come to think of that.

Anyway: the next Big Thing will be "Polymorphic Deserialization", which is by far the most requested feature. That is, the ability to deserialize instances of correct types, even in absence of static type information (declared type of, say, List<Object> could still be deserialized to contain whatever actual type of serialized instance was). Getting this done is important in itself, but the most important aspect in my mind is to Do It Right. This should not be a stop-gap solution, or something to rewrite in near future. It should be a comprehensive, flexible and robust solution to the non-trivial problem. And the plan is to do just that; now that other queued blocking issues (like that of finally getting much-requested JsonView done) have been dealt with.

Posted by Tatu Saloranta at Sunday, December 20, 2009 5:33 PM
Categories: Java, JSON

Could you please tell me some more about athletes' marital problems, CNN?

It is an unfortunate fact of life that "news" services in the US are in a sorry, tepid state; and to get decent news coverage one has to use better international sources (BBC, or any European agency), or turn to non-daily/non-TV alternatives (magazines, which still offer reasonable in-depth coverage). But this on-going idiotic episode with a celebrity golf player's domestic issues takes the cake as the low point for this decade (maybe competing with the media's uncritical bashing of UN Iraq nuclear inspectors back in 2002 -- but I digress).

1. What could POSSIBLY be more important issue?

But hey, there have been recent orgies of less relevant news (did Michael Jackson's or Anna Nicole Smith's deaths really warrant being top news entries?). Why is this any different? Aside from being even less relevant than anything comparable in recent history -- honestly, gossip pages, or perhaps the sports section (... which is a ridiculously inflated part of local newspapers and TV programmes, anyway...) would have been better placements; and for respectable publications, possibly not even those -- there is the fact that there have actually been lots of newsworthy things to write about.

Like, say, that gathering of world leaders in Copenhagen; discussing urgent (and eventually life-and-death) matters of saving the world. And in domestic section, well, there's plenty of economic stuff to write about, or the thing about medical industry and insurance. Oh, and hey, wasn't there a war of sizable portions also going on (actually, two, but who's counting).

In fact, I can't think of a reason for this even ranking on page 7 of the Thursday edition of the local newspaper. There are tabloids, after all, that could cover this stuff. Well, except that in the US, it's not "newspapers vs tabloids"; it's mainstream (tabloid level) and fringes ("news of the world"). Even mainstream sells manufactured controversies (trademark of tabloids in other countries) and social porn.

And yet, somehow what irritates me most is that I noticed that CNN followed up on this stupid episode like a hawk; as if it really was a major story.

2. What did that "N" originally mean?

So why pick on CNN? After all, CNN is to News what MTV is to Music -- a sad, irrelevant misnomer. Ted Turner would be rolling in his grave were he not alive. I guess it has more to do with the fact that CNN is ostensibly in the news business. Newspapers and most other networks are in the general "media" business; they are also News dilettantes, spewing some amateur-level newsy stuff. But clearly TV networks are more into general entertainment; and newspapers into advertising with some commentary columns (well, actually, they also do do some local news stuff -- useful and sometimes noteworthy -- maybe I am being too harsh -- but only local, seldom even reaching to regional level).

So it's that when even entities that claim to do News fail to do that, well, that's pathetic.

3. Message to mr. Woods

Ok; enough ranting about sad state of US media. But here's a personal message for the nominal cause of this red herring of a news: Tiger, go stuff that golf club up your ass. Sideways. I don't care about your business (personal or otherwise) -- but it appears that your messy business has suddenly become my business. Stop it. Go, disappear. And for crying out loud, don't cry out loud in public. It is so pathetically unmanly that I feel nauseous. So, grow a spine (a pair you apparently already have). Whatever else you do, do NOT cause more media events. You are rich enough to afford to do whatever that other stupid athlete did after murdering his wife (of hey, yeah, come to think of that, do not do what that guy did in the end -- just the initial part of trying to keep low profile).

Posted by Tatu Saloranta at Saturday, December 19, 2009 10:57 PM
Categories: Philosophic, Rant

On good, efficient data formats

There are 2 fairly recent additions to the category of "good binary data formats that work nicely with Java": Avro and Kryo. I have meant to write something about both for a while.

1. Avro

Avro is a simple and efficient general-purpose data format, developed as part of the Hadoop project. Due to its background, it should work very well with Hadoop (and map/reduce systems in general). It is also quite similar to what I had been thinking of implementing for my own (well, my employer's, rather) large-scale data processing needs, when there was no Avro. From my shallow understanding of Avro, it seems to nicely fit the bill as a data format for huge sequences of records; but with the self-describing property that is sadly lacking from other contestants like Google's protobuf.

I will hopefully have some more tangible notes to add in future: for now it's enough to note that Avro's performance seems to be pretty good, at least in the "thrift-protobuf" benchmark.

2. Kryo

Strictly speaking Kryo is not a data format, but rather Java object serialization framework that happens to define a data format to handle its main task. But since sending POJOs back and forth over the wire is a very common (perhaps the most common) task for data formats, in Java world, this is not a big difference.

First thing I noticed was its good performance (on above-mentioned benchmark). This is nice, since there is already JDK default serialization that performs adequately for most tasks; so anything else that does binary serialization should be able to meet and beat that performance baseline, to be of interest. But as importantly, API seems straight-forward, simple, and adequately customizable.
So I have reasonably high expectations for this library -- it could be a nice complement to something like, say, Dirmi (RMI alternative for JDK default one -- should be coupled with alternative, similarly improved, serialization mechanism, n'est-ce pas?).

3. Disclaimer

Alas, I have not come up with a compelling use case for these two projects yet. But given the promise they hold, I should be able to test them out come next year.

Posted by Tatu Saloranta at Thursday, December 17, 2009 11:12 PM
Categories: Java, Performance

More Jackson adoption: Mule/iBeans

As per this announcement, Mule is another major framework that has officially adopted Jackson for its JSON processing needs ("first we take JAX-RS... then we take ESB!"). That should be good for everyone involved. And more work for the development, polishing and fixing things that new flow of users brings in; as well as plenty more exposure for the project & processor.
That should keep the project honest, relevant, and hopefully producing diamond(s) -- without high pressure, all you'd have would be lump of coal. :-)

On a sort of related note: the latest tally of number of contributors (individuals named on 'release-notes/CREDITS') is 50. Not too shabby -- Jackson will pretty soon bypass Woodstox in most regards (see an earlier blog entry for context); probably not by LOC, but in most other measures.

Posted by Tatu Saloranta at Wednesday, December 16, 2009 11:01 PM
Categories: Java, JSON

Simple data analysis: 2x swiss pocket knife is sometimes ok

Although I am bit of a "nosql" believer (Cassandra, Voldemort and Amazon S3 kick a55!); and get easily frustrated with non-intuitive behavior of Excel, I admit there are real uses for spreadsheets and relational databases. And not just ones that developers often associate with them (plus there are lots of anti-patterns: do NOT stuff those config files in DB, please). But tasks like, say, Ad hoc chicken-wire-n-bubble-gum low-budget quick-n-dirty data analysis are something where combination of the two makes whole lots of sense.

So I was glad to see this blog post showing how to do that with H2 and a spreadsheet (note: you don't really need Excel -- OpenOffice (OOo) Calc is almost as good in many cases). H2 project itself seems interesting -- quite healthy amount of discussion on dev mailing lists, seemingly reasonable progress. So I may just write something about that in future as well.

Posted by Tatu Saloranta at Thursday, December 10, 2009 7:26 PM

Classic Android, with Electric Sheep

Yes, best pop and rock music, as well as sci-fi books seem to have been written in (late) 60s.

Case in point: Philip K Dick's classic, "Do Androids Dream of Electric Sheep?". What an absolute masterpiece. Very prescient, relevant, and fundamentally deep; all without seemingly trying too hard to be anything more than a decent story. None of wanna-be-intellectual babble -- smart with dead-on style; or trying to predict future -- it is enough to just reflect on reality, observations of human mind, and there you have it.

I originally read this book about 10 years ago, but as a translation (in Finnish). It made a big impression even then -- even more so than the movie that was based on it (the movie is pretty good too, but the book is just so much much better... this even though I read the book only after seeing the movie, and usually "first one wins").

So: a few months back I noticed a paperback English copy at the local book store (no, I don't order all my books online...) and decided to re-read it. Was it still good? Not just good, absolutely positively great. So if you like sci-fi but have somehow managed not to read it, go read it. Same applies to most (or perhaps all) books by PKD of course; like The Man in The High Castle (written in -62), The Three Stigmata of Palmer Eldritch (-65) or Ubik (-69)... the list goes on and on. And even though his later works are very good too (like, say, VALIS and The Transmigration of Timothy Archer, his last book), there is something about those books from the 60s that has the absolute brilliant genius. Or, shall we say, "shine of crazy diamonds" (I know, I know, that's a reference to a great 70s song... but EARLY 70s, not that far removed). I just wish he had had a chance to write just some more books. Yes, they are that good.

Oh, and just in case you are wondering: his 50s books are strikingly good too. :-)
I love "The World Jones Made" for example. Totally cool book, which somehow manages to mix in strong environmentalist themes (about 20-30 years before anyone else did, it seems), extension of pan-Gaia, and of course plenty of dark humor and bit of political commentary.

Given the above, I concede that I am a bit of a fan boy. Now, maybe I should get a Kindle for Christmas, to be able to read more...

Posted by Tatu Saloranta at Wednesday, December 09, 2009 7:31 PM
Categories: Philosophic

JSON data binding performance (again!): Jackson / Google-gson / JSON Tools... and FlexJSON too

(note: this is a follow-up on earlier measurements)

1. A New Contestant: FlexJson

After realizing that FlexJson is actually capable of both serialization and deserialization (somehow I thought it would only serialize things), I decided to add it as the fourth contestant in the "full service Java/JSON data binding" category of tests.

Initially I was bit discouraged to find that it makes one rookie mistake: assumes that somehow JSON comes in (and goes out) as Java Strings. But aside from this glitch, package actually looks quite solid -- and its exclusion/inclusion mechanism looks interesting. Maybe not exactly my cup of joe (if it was, after all, Jackson API would look more like it does), but a viable alternative. And I can see how ability to prevent deep copy would come in handy sometimes. And finally, some of the features actually exceed what Jackson can currently do, regarding polymorphic deserialization (since FJ includes class name by default, I assume it can do it) and some level of cyclic-dependency handling (ignoring serialization of cyclic references at least).

So let's see how "rookie" (yes, I know, it's not exactly a new package, just new addition to the test) fares...

2. Test setup

Tests are run using the nice Japex performance test framework, running on my somewhat old AMD work station (~1700 MHz Athlon -- someone needs to click on those right-hand-side ads to get me a new performance-testing work station! :-) ).

Input data used consists of serialization of tabular data (database dump, good old "db100.xml" used by countless xml tests), converted to Java POJOs, and then to individual data formats (here as JSON, but can be tested as XML and whatnot). Document size is 20k in XML, and slightly less in JSON (about 16k). It would be easy to run using other data sets, but in the past, performance ratios for 2k, 20k and 200k documents have not had radical differences, so 20k one seems like a reasonable choice (but note that the earlier benchmark did in fact use 2k documents, so actual numbers do differ).

Test project itself, "StaxBind" is still in Woodstox SVN repository, accessible via Codehaus SVN page. (one of these days I should just create a Github project -- but not today).

Versions of JSON processing packages are as follows:

Jackson 1.2.0
Google-gson 1.4
Json-tools-core 1.7
Flexjson-1.9.1

Code for each library is using default settings, and using what appears as the most efficient interface, for cases where transformations are from byte streams on server side (byte streams in, byte streams out).

3. Results

First things first: here's the money shot:

Data Binding Performance Graph

(or check out the full results for details)

Another way to represent results is by showing performance ratios, using the slowest implementation as base line (TPS == transactions per second; number of times a 20k document is read, written, or both):

(note: Jackson/manual is omitted since it is hand-written (if simple) serializer/deserializer, and there are no direct counterparts for other packages -- while it would give even bigger faster-than-thou ratio, it wouldn't be a fair comparison)

Impl	Read (TPS)	Write (TPS)	Read+Write (TPS)	R+W, times baseline
Jackson (automatic)	1599.272	2463.097	1033.809	25.6
FlexJson	125.277	125.277	94.904	2.35
Json-tools	94.051	126.954	49.008	1.2
GSON	56.58	112.455	40.38	1
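For anyone wanting to double-check the last column, the "times baseline" figure is just the Read+Write TPS of each implementation divided by that of the slowest one (GSON, 40.38). A tiny sketch of the arithmetic, with the numbers copied from the table above and a class name made up for the example:

```java
// Hypothetical helper for recomputing the ratio column of the table
final class BaselineRatio {

    /** How many times faster than the baseline a given throughput is. */
    static double timesBaseline(double tps, double baselineTps) {
        return tps / baselineTps;
    }

    public static void main(String[] args) {
        String[] impls = { "Jackson (automatic)", "FlexJson", "Json-tools", "GSON" };
        // Read+Write TPS values, as listed in the table
        double[] readWriteTps = { 1033.809, 94.904, 49.008, 40.38 };
        double baseline = readWriteTps[readWriteTps.length - 1]; // slowest: GSON
        for (int i = 0; i < impls.length; i++) {
            System.out.printf("%-20s %.2f%n", impls[i],
                    timesBaseline(readWriteTps[i], baseline));
        }
    }
}
```

The table rounds these ratios a bit more coarsely than the sketch prints them.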

So looks like our "new kid on the block" manages to outperform the other two non-Jackson JSON processors here. And at least get within an order-of-magnitude with Jackson... :-)

4. Musings

So it turns out that despite its interfacing (those String/byte conversions), Flexjson package manages to work more efficiently than some other packages that claim "simplicity and performance". And this without actually claiming to be particularly performant, but rather focusing on design of API and ease-of-use aspects. Pretty neat, I respect that.

5. Next?

My current main interest (with respect to performance issues) lies in the area of compressing data for transfer: after all, most of the time there is a relative abundance of CPU power compared to available network and I/O bandwidth. This means that trading some CPU (needed for compression and decompression) seems like a bargain for many use cases.

But on the other hand, as we saw earlier, the question is "how much is too much". And that's where my new favorite simple-and-fast algorithm, LZF, comes in. But that's a different story.

Posted by Tatu Saloranta at Tuesday, December 08, 2009 10:34 PM
Categories: Java, JSON, Performance

Soft ball: entry #101 for 2009, mission kinda complete

Ok, looks like I managed to hit the target I set up recently, to write "one hundred and one" blog entries this year. Granted, this was not a giant stretch of literary muscle, given that at the time I had already gone most of the way. Nonetheless it is good to hit the target.

Since there's still some time to go until New Year, I think I might as well target the next even number, 111 entries.

Stay tuned...

Posted by Tatu Saloranta at Monday, December 07, 2009 7:35 PM
Categories: Silly

Not your type? Jackson as the match-maker

By now, Jackson is becoming widely-known for its lightning-fast streaming JSON parser, as well as for its powerful, intuitive and efficient data binding functionality. But wait! There is more!

(and when you are convinced you need Jackson, head straight to Download page)

1. Background

Has this ever happened to you? You need an array of ints, and all you got is this dingy little List of Strings (representing numbers)! You would think JVM, as smart as it is, could quickly whip up a conversion to "do the needful"... but no. In fact, even simpler conversions like number to/from String, String to/from boolean, Sets to/from Lists and so on are irritating: easy to solve, sure, but with too much monkey code.

And don't even get me started on other conversions between encodings like base64 to byte arrays; dumping Object fields as Maps; or building Objects from Maps (possibly read from properties files). There are tons of simple tasks that should be made even simpler.

2. What if...

Ok, now: let's round up some facts related to POJO serialization and Jackson:

Jackson is great at serializing all kinds of Java objects as JSON
Jackson is awesome at deserializing JSON into Java objects

So: what if... say... you serialized, List<String> into JSON and... lessee... deserialized it back to.... int[]... what would happen? Conversion! (as long as Strings indeed contain numbers -- if not, what would happen is an exception of some kind)

Ah! I see, so, you are saying that serialize+deserialize == conversion! Like:

  //ObjectMapper mapper = new ObjectMapper();
  String json = mapper.writeValueAsString(myStringList);
  int[] intArray = mapper.readValue(json, int[].class);

Presto!

3. Jackson 1.3 gets it

With version 1.3, there is something that simplifies above procedure by 50%:

  int[] intArray = mapper.convertValue(myStringList, int[].class);

(yes, you should have seen that one coming)

3.1. Conversions: Basic types

Basic type conversions obviously work:

  Boolean b = mapper.convertValue("true", Boolean.class);
  Date d = mapper.convertValue("2009-10-10T12:00:00.00+0800", Date.class);

although they are not exactly any shorter than the equivalent idioms you would otherwise use. But they serve as the basis for other conversions: for Lists, Maps and arrays of primitives and wrappers.

3.2. Conversions: Containers

Similarly, containers with various content types work as expected:

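  // TypeReference (org.codehaus.jackson.type.TypeReference) carries the generic element type
  ArrayList<Integer> ints = mapper.convertValue(new Object[] { 13, "1", Integer.valueOf(3) },
      new TypeReference<ArrayList<Integer>>() { });
  Set<String> uniq = mapper.convertValue(new String[] { "a", "b", "a" },
      new TypeReference<Set<String>>() { }); // would produce set with 2 entries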

3.3. Conversions Base64<->binary

And if you want to encode binary data, you can do:

  String encoded = mapper.convertValue(new byte[] { 1, 2, 3 }, String.class);
  byte[] decoded = mapper.convertValue(encoded, byte[].class);
(usually the bytes would be read from a File or such)
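To get a feel for what the encoded String looks like, here is the same round trip done by hand. This sketch assumes Java 8 or later, since it uses java.util.Base64 (which is newer than this post); the class name Base64RoundTrip is made up. Jackson's Base64 handling is configurable, so take it as an assumption that its default output matches the standard alphabet used here.

```java
import java.util.Arrays;
import java.util.Base64;

final class Base64RoundTrip {
    // Encode bytes using the standard Base64 alphabet (with padding).
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Decode standard Base64 text back into the original bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        String encoded = encode(new byte[] { 1, 2, 3 });
        System.out.println(encoded);                          // prints AQID
        System.out.println(Arrays.toString(decode(encoded))); // prints [1, 2, 3]
    }
}
```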

3.4. Beans to Maps and back

Finally, you can also convert simple Java beans (or more generally POJOs) into Maps or JSON trees:

  Map fieldMap = mapper.convertValue(myBean, Map.class);
  MyBean bean = mapper.convertValue(fieldMap, MyBean.class);
  // or read from properties file
  Properties props = new Properties();
  props.load(new FileInputStream(file));
  MyBean bean2 = mapper.convertValue(props, MyBean.class);
  // and can convert to a JSON tree as well:
  JsonNode rootNode = mapper.convertValue(bean2, JsonNode.class);

And you can obviously configure bean types with annotations (regular and mix-in annotations) as you like, as needed for the conversion you want to do.
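For comparison, this is approximately what "bean to Map" takes when done by hand with the JDK's own java.beans introspection. It is only a sketch of the idea, not how Jackson does it internally: it goes just one level deep (nested beans are left as-is), and both BeanToMap and the Point bean are hypothetical names used for illustration.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

final class BeanToMap {
    // Hypothetical bean, used only for this illustration
    public static final class Point {
        private final int x;
        private final int y;
        public Point(int x, int y) { this.x = x; this.y = y; }
        public int getX() { return x; }
        public int getY() { return y; }
    }

    // Copies every readable bean property into a Map keyed by property name.
    static Map<String, Object> toMap(Object bean) {
        Map<String, Object> fields = new LinkedHashMap<>();
        try {
            // Stop at Object.class so that getClass() is not reported as a property
            BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
            for (PropertyDescriptor property : info.getPropertyDescriptors()) {
                Method getter = property.getReadMethod();
                if (getter != null) {
                    fields.put(property.getName(), getter.invoke(bean));
                }
            }
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("Could not read bean properties", e);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(toMap(new Point(1, 2)));
    }
}
```

Going the other way (Map to bean) needs setters or constructors plus type coercion, which is where a data binder starts to earn its keep.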

Neat stuff, eh?

4. What does this have to do with JSON?

Good question. Nothing, really. :-)

That is: there is no requirement for the intermediate JSON generation; and in fact, future Jackson versions will bring improvements that allow the use of a more efficient intermediate data structure for these conversions.

And conceivably one could even refactor the functionality into a separate bean conversion package, if conversions are widely used without actual JSON processing.

5. Got better use cases?

I hope someone out there can come up with even better examples of this power. If so, let me know!

One area that I hope to improve upon is that of converting java.util.Properties into POJOs. Although the sample above works, it does not deal with the naming convention of "refField.anotherField.field = 3", which is one natural way to represent nested structures. It should be made to work; it just needs a little bit of name mangling, especially when trying to handle arrays and lists ("object.listField.4 = abx").
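A minimal sketch of the kind of name mangling meant here: split dotted property names and build nested Maps, which could then presumably be handed to convertValue() like any other Map. This is not existing Jackson functionality; NestedProperties and toNestedMap are hypothetical names, values stay Strings, and list indexes such as "listField.4" are simply kept as Map keys rather than turned into Lists.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeSet;

final class NestedProperties {
    // Turns flat "a.b.c = value" entries into nested Maps: {a={b={c=value}}}
    @SuppressWarnings("unchecked")
    static Map<String, Object> toNestedMap(Properties props) {
        Map<String, Object> root = new LinkedHashMap<>();
        // Sorted names give deterministic output, and put "a" before "a.b"
        for (String name : new TreeSet<>(props.stringPropertyNames())) {
            String[] path = name.split("\\.");
            Map<String, Object> current = root;
            for (int i = 0; i < path.length - 1; i++) {
                Object child = current.get(path[i]);
                if (!(child instanceof Map)) {
                    // New intermediate level; a conflicting plain value is replaced
                    child = new LinkedHashMap<String, Object>();
                    current.put(path[i], child);
                }
                current = (Map<String, Object>) child;
            }
            current.put(path[path.length - 1], props.getProperty(name));
        }
        return root;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("refField.anotherField.field", "3");
        props.setProperty("name", "x");
        // prints {name=x, refField={anotherField={field=3}}}
        System.out.println(toNestedMap(props));
    }
}
```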

Posted by Tatu Saloranta at Friday, December 04, 2009 8:17 PM
Categories: Java, JSON

Milk of Human Madness, Jule-tide edition

Ok, in between the technical posts, it's time to review some goofy stuff while we wait for Santa. Here goes...

1. Can't manage to find time to do something useful...

yet have plenty of time for "time management"?

Sound silly? Have a look at Pomodoro Technique. Great for giggles, as a case study for human insanity.
But if it starts to make some sense at any point, do not hesitate to get some professional help. Immediately.

But then again, there are always some co-workers who might benefit others by such techniques: by not having time to do anything, they could not make mistakes. And that's worth something too (brakes for loose cannons).

Update: the above comments only concern the application of said technique(s) to software development -- maybe other domains could benefit from intrusive regularly-scheduled interruptions (perhaps augmented by electrical shocks).

2. IRC? Yes, that thing hackers use when they don't want to be overheard!

Oh yes, you can always trust Numb3rs to get technical things FUBAR. Funny stuff.

Now, if you will excuse me, I will have to disconnect from my blog server before connection can be traced by FBI (it's that 30 second rule you may know from movies -- must triangulate fast -- gotta go!)

Posted by Tatu Saloranta at Thursday, December 03, 2009 11:08 PM
Categories: Philosophic, Rant, Silly
