Thursday, December 31, 2009

Upgrading from Woodstox 3.x to 4.0

It has now been almost one year since Woodstox 4.0 was released.
Given this, it would be interesting to know how many Woodstox users continue using older versions, and how many have upgraded.

My guess (somewhat educated, too, based on bug reports and some statistics on Maven dependencies) is that adoption has been quite slow. I think this is primarily due to three things:

  1. Older versions work well, and fulfill all current needs of the user
  2. New functionality that 4.0 offers is not widely known, and/or is not (currently!) needed
  3. There are concerns that, because this is a major version upgrade, the upgrade might not go smoothly.

I cannot argue against (1): Woodstox has been a rather solid product since its first official releases, and 3.2 in particular is a well-rounded, rock-solid XML processor (if you are using an earlier version, however, at least upgrade to the latest 3.2 patch version, 3.2.9!).
And with respect to (2), I have already covered the most important pieces of new functionality: the Typed Access API and Schema Validation.

But so far I have not written anything about incompatible changes between 3.2 and 4.0 versions. So let's rectify that omission.

1. Why Upgrade?

But first: it may be worth reiterating a couple of reasons why you might want to upgrade at all:

  1. You might want to validate XML documents you read or write against W3C Schema (aka XML Schema). Earlier versions only allowed validating against DTDs and Relax NG schemas.
  2. If you want to access typed content (that is, numbers, XML qualified names, even binary content, contained as XML text), the new Typed Access API simplifies code a lot, and also makes it more efficient.
  3. The latest versions of useful helper libraries like StaxMate require Woodstox 4.0 (StaxMate 2.0 needs 4.x, for example).
  4. No new development will be done for the 3.2 branch, and eventually not even bug fixes.

Assuming you might want to upgrade, what possible issues could you face?

2. Backwards incompatible changes since 3.2

Based on my own experience, there are few issues with upgrading. Although the official list of incompatibilities has a few entries, I have only really noticed one class of things that tend to fail: unit tests!

Sounds bad? Actually, yes and no. No, because these are not real failures (at least none of the ones I have seen). And yes, because it means you end up fixing broken test code (extra overhead without tangible benefits). But this is one of the challenges with unit tests: fragility is often desirable, but not always.

The specific problem I have seen multiple times relates to one cosmetic aspect of XML: the inclusion of white space within empty element tags.

Woodstox 3.2 used to output empty elements with "extra" white space, like so:

<empty />

but 4.0 will not add this white space:

<empty/>

(this is a new feature, as per Jira entry WSTX-125)

Some existing unit tests for systems I have worked on compare output against literal XML. This is not optimal, but it is a bit less work than writing tests in a more robust way that checks for logical (rather than physical) equality. Since such tests formerly assumed the existence of this white space, they need to be modified not to expect it (or to accept either form).
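A more robust alternative to literal comparison is to compare documents logically. Below is a minimal sketch of that idea, using only the standard Stax API (javax.xml.stream) that ships with the JDK rather than anything Woodstox-specific; the class and method names (XmlLogicalEquality, logicallyEqual) are hypothetical. It reduces each document to element names (namespace URI plus local name), sorted attributes and text, so `<empty />`, `<empty/>` and `<empty></empty>` all compare equal. Comments and processing instructions are ignored, while white space inside text content still counts.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical test helper: compares XML by logical content, not bytes.
public class XmlLogicalEquality {

    // Reduces a document to "logical" tokens: element starts (name plus
    // sorted attributes), element ends and text. The parser reports
    // "<empty />", "<empty/>" and "<empty></empty>" identically, so such
    // cosmetic differences disappear.
    static List<String> logicalTokens(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Request coalesced text so each text node becomes a single token.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> tokens = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        // Attribute order is not significant in XML, so sort.
                        List<String> attrs = new ArrayList<>();
                        for (int i = 0; i < reader.getAttributeCount(); i++) {
                            attrs.add(reader.getAttributeName(i) + "=" + reader.getAttributeValue(i));
                        }
                        Collections.sort(attrs);
                        // QName.toString() includes the namespace URI but not the prefix.
                        tokens.add("start:" + reader.getName() + " " + attrs);
                        break;
                    }
                    case XMLStreamConstants.END_ELEMENT:
                        tokens.add("end:" + reader.getName());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        tokens.add("text:" + reader.getText());
                        break;
                    default:
                        // Comments, processing instructions, document end: ignored.
                        break;
                }
            }
        } finally {
            reader.close();
        }
        return tokens;
    }

    // True if both documents have the same logical content.
    public static boolean logicallyEqual(String expected, String actual) {
        try {
            return logicalTokens(expected).equals(logicalTokens(actual));
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // 3.2-style vs 4.0-style output for the same logical content.
        System.out.println(logicallyEqual("<root><empty /></root>", "<root><empty/></root>"));
    }
}
```

An output test could then assert `logicallyEqual(expectedXml, actualXml)` instead of comparing Strings, and would pass with either Woodstox version.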

3. Other challenges?

Beyond this, I have not seen any actual problems, or even other cosmetic ones. But here are the other changes that are most likely to cause compatibility problems (refer to the full list mentioned earlier for a couple of changes that are much less likely to do so):

  • "Default namespace" and "no prefix" are now consistently reported as empty Strings, not nulls (unless explicitly specified otherwise in the relevant Stax/Stax2 Javadocs). Usually this does not cause problems, because Stax-dependent code has already had to deal with inconsistencies between Stax implementations; but it could cause problems if code expects null.
  • "IS_COALESCING" was (accidentally) enabled by default in Woodstox versions prior to 4.0. This was fixed for 4.0 (the Stax specification defines its default as false), but it is possible that some code assumed it would never get partial text segments (if the developer was not aware that Stax allows such splitting of text segments, similar to how the SAX API does it).
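Code that should behave the same on 3.2, 4.0 and other Stax implementations can guard against both of these changes. The sketch below (hypothetical class and method names, standard javax.xml.stream API only) normalizes null namespace URIs and prefixes to empty Strings, and concatenates consecutive text events instead of assuming one event per text node. Setting IS_COALESCING explicitly, shown commented out, is an alternative; and for text-only elements, XMLStreamReader.getElementText() returns coalesced content regardless of that setting.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example of Stax reading code that does not depend on
// null-vs-"" reporting or on text being delivered in one piece.
public class PortableStaxReading {

    // "No namespace" / "no prefix" may be null or "" depending on the
    // implementation and version; treat both the same way.
    static String emptyIfNull(String s) {
        return (s == null) ? "" : s;
    }

    // One entry per element ("{uri}prefix:local") and one per text node
    // ("text:..."), where a text node is the concatenation of all
    // consecutive CHARACTERS/CDATA events.
    static List<String> describe(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Alternative: request unsplit text explicitly instead of relying
        // on the (pre-4.0) default:
        // factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                int type = reader.next();
                if (type == XMLStreamConstants.CHARACTERS || type == XMLStreamConstants.CDATA) {
                    text.append(reader.getText()); // may be only part of the text node
                    continue;
                }
                if (text.length() > 0) { // any other event ends the current text node
                    out.add("text:" + text);
                    text.setLength(0);
                }
                if (type == XMLStreamConstants.START_ELEMENT) {
                    String uri = emptyIfNull(reader.getNamespaceURI());
                    String prefix = emptyIfNull(reader.getPrefix());
                    String local = reader.getLocalName();
                    out.add("{" + uri + "}" + (prefix.isEmpty() ? local : prefix + ":" + local));
                }
            }
        } finally {
            reader.close();
        }
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<doc xmlns:p='urn:example'><p:item>a &amp; b</p:item><plain>hi</plain></doc>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```

Because all checks go through emptyIfNull and the text buffer, the same code gives the same result whether the parser returns null or "" for missing namespaces, and whether it delivers "a &amp; b" as one segment or several.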

4. Upgrade or not?

I would recommend investigating the upgrade, if for no other reason than the maintenance aspect: pre-4.0 versions will not be actively maintained in the future. But it is good to be aware of what has changed, and of course a good set of unit tests should guard against unexpected problems.

And hey, it's soon 2010 -- Woodstox 3.2 is soooo 2008. :-)

About me

  • I am known as Cowtowncoder