Sunday, June 11, 2006

Let's talk about Stax!

Although Stax, the third standard Java API for XML processing, was specified a while ago (the JSR-173 final release happened on 25 March 2004), not much has been written about it or its usage. There are some articles and tutorials, but in general they only scratch the surface. Most were also written around the time the specification was finalized (or sometimes before), and as such do not cover the current state of the implementations. Much has changed since early 2004, and for the better.

This is unfortunate, since this API (or, more generally, the type of XML processing it defines, often called "pull parsing") offers significant benefits for many types of XML processing tasks.

The lack of articles, and of documentation in general, has many reasons. Among these are:

  • After the initial interest in Stax, and the first articles, problems were found with the reference implementation. Development of the RI also seemed stagnant. This may have left early adopters disillusioned, and perhaps also suggested that the potential of Stax implementations in general is limited.
  • Developers who have an interest in and a use for Stax are generally more experienced, and either solved their immediate problems reasonably quickly or abandoned the approach. Either way, there is little need for a tutorial once one has had to dig deep into the code, or has lost interest in the API as a whole.
  • Low-level XML processing in general is often not needed: as long as there are higher-level processing systems (such as XSL for transformations, XMLBeans and JAXB for data binding, various SOAP libraries for SOAP processing), developers can build fully functioning systems without ever directly manipulating XML content. In this regard Stax is similar to SAX: both offer access to XML at the lowest possible abstraction level. The reason so little has been written, then, could be that there is no perceived need.

But I believe it would be very useful to have more, and more accessible, content regarding the Stax API itself, as well as the current state of and plans for the actively developed implementations. Regarding the issues listed above:

  • Since the reference implementation was released and open-sourced, two new actively maintained implementations have been released, both of which surpass the reference implementation in functionality and quality.
  • Even experienced developers would benefit a lot from learning about some of the more subtle issues regarding Stax implementations. For example, even though the Stax API defines how things should work functionally, there are often multiple functionally equivalent ways of doing things, with varying efficiency. Without further documentation, it is not necessarily clear which of these are the "best practices". Since Stax processing is potentially the fastest way to process XML from Java, performance differences often matter more with Stax than with higher-level tools.
  • Although it is often not necessary to process XML at the lowest possible level, it is still useful to know how to do it when it is necessary. For example, there are currently few higher-level libraries or tools that can operate on documents whose size exceeds available memory. Since streaming processing (which both SAX and Stax support) can do just this, it is very useful to know how to approach it. And since SAX and Stax each have their own pros and cons (both at the API and the implementation level), general knowledge of Stax should prove useful for anyone dealing with XML on the Java platform.
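To make the streaming point a bit more concrete, here is a minimal sketch of pull parsing using only the standard javax.xml.stream API. The class name StaxElementCounter and the sample document are made up for illustration; the point is just that the loop asks for one event at a time, so memory use should not grow with document size (assuming the underlying implementation streams, which the common ones do).

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of Stax or Woodstox.
public class StaxElementCounter {

    // Counts start tags with the given local name, pulling one event at a time.
    public static int countElements(Reader in, String localName)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        int count = 0;
        try {
            // The application drives the parser: nothing is read until next() is called.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            // Releases parser resources; does not close the underlying Reader.
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up sample input; a real use would pass a FileReader or similar.
        String xml = "<orders><order id=\"1\"/><order id=\"2\"/><note>hi</note></orders>";
        System.out.println(countElements(new StringReader(xml), "order")); // prints 2
    }
}
```

The same loop works unchanged on a multi-gigabyte file, since nothing but the current event is held by the application code.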

So... It is all very nice to talk about writing about Stax. But wouldn't it be better to actually write about the dang thing? As the author of Woodstox, I am in a good position to do my share by writing about a thing or two I know about Stax, Woodstox, performance tips and tricks, and all related things. So how about we "Talk About Stax"? ("... Let's talk about all the good things, And the bad things that may be...")

Stay tuned: I will start writing things I have meant to document for a long time, including but not limited to:

  • How to REALLY use the XMLStreamWriter ("Repairing WHAT mode?")
  • What matters with respect to speed: aka "How to make my XML processing code fly"
  • How do I validate documents?
  • What's new with the Woodstox "experimental" Stax2 extension to the basic Stax API?
  • How can I customize quoting of characters with XMLStreamWriter?
  • Is Stax really fast? How about binary encodings, like Fast Infoset: are they really fast?

About me

  • I am known as Cowtowncoder