Using Woodstox via SAX API
One of the less well-known features of Woodstox is that it implements the SAX API as well as the Stax API (and has done so since version 3.2). This is probably not widely known because Woodstox started out as just a Stax implementation. But by now Woodstox has had enough time to mature as a SAX implementation, and it should be ready for serious use.
Why use SAX (instead of Stax)?
In my opinion, there are two main reasons to choose the SAX API over Stax:
- Interoperability: maybe the tool you need (say, an XSLT processor or a Schema validator) only accepts XML content via the SAX API (often because it predates the Stax API).
- Chainability/pipelining: the event/callback-based approach of SAX is a natural fit for processing pipelines, more so than Stax pull parsing. So although Stax is arguably more convenient if your application is in full control of XML processing, it is less convenient if processing has to be done in a pipelined fashion.
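To make the pipelining point concrete, here is a minimal sketch of a two-stage SAX pipeline built from the standard org.xml.sax.helpers.XMLFilterImpl class. The class names (SaxPipelineSketch, DropDebugFilter, UpperCaseFilter) and the sample document are made up for illustration, and the sketch uses whatever parser JAXP finds by default so that it runs without extra jars; a Woodstox factory could be swapped in at the marked line.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class SaxPipelineSketch {

    /** Hypothetical stage 1: drops <debug> elements together with their content. */
    static final class DropDebugFilter extends XMLFilterImpl {
        private int skipDepth = 0; // greater than 0 while inside a <debug> subtree

        DropDebugFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (skipDepth > 0 || "debug".equals(localName)) {
                skipDepth++;
                return;
            }
            super.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (skipDepth > 0) {
                skipDepth--;
                return;
            }
            super.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (skipDepth == 0) {
                super.characters(ch, start, length);
            }
        }
    }

    /** Hypothetical stage 2: upper-cases element names before passing them on. */
    static final class UpperCaseFilter extends XMLFilterImpl {
        UpperCaseFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            super.startElement(uri, localName.toUpperCase(Locale.ROOT),
                    qName.toUpperCase(Locale.ROOT), atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            super.endElement(uri, localName.toUpperCase(Locale.ROOT),
                    qName.toUpperCase(Locale.ROOT));
        }
    }

    /** Runs parser -> DropDebugFilter -> UpperCaseFilter -> collecting handler. */
    static List<String> elementNames(String xml) {
        try {
            // Default JAXP lookup; a Woodstox factory could be used here instead.
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader parser = spf.newSAXParser().getXMLReader();
            XMLReader pipeline = new UpperCaseFilter(new DropDebugFilter(parser));

            List<String> names = new ArrayList<>();
            pipeline.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName,
                        Attributes atts) {
                    names.add(localName);
                }
            });
            pipeline.parse(new InputSource(new StringReader(xml)));
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints [ROOT, ITEM, ITEM]
        System.out.println(elementNames("<root><item/><debug><item/></debug><item/></root>"));
    }
}
```

Each stage only sees the events the previous stage chose to forward, and none of them needs to know what sits before or after it. With a pull parser, the consuming code would have to drive all the stages itself.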
I have used SAX for XSLT processing: while some processors like Saxon do offer Stax support, it is often still labelled as experimental, and not considered quite on par with SAX as an input source.
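As a rough illustration of SAX as the input side of XSLT, the sketch below hands an XMLReader to a JAXP Transformer through javax.xml.transform.sax.SAXSource. It assumes the XSLT processor and SAX parser that JAXP finds by default (so it runs on a plain JDK); the class name SaxXsltSketch, the stylesheet and the input document are invented for the example. With Saxon or Xalan on the classpath, and a Woodstox-created XMLReader, the shape of the code should stay the same.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

final class SaxXsltSketch {

    // Made-up stylesheet: outputs the number of <item> elements as plain text.
    static final String COUNT_ITEMS_XSLT =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'><xsl:value-of select='count(//item)'/></xsl:template>"
          + "</xsl:stylesheet>";

    static String transform(String xslt, String xml) {
        try {
            // The SAX parser that feeds the XSLT processor; this is where a
            // Woodstox-backed SAXParserFactory would be plugged in.
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // XSLT needs namespace-aware input
            XMLReader reader = spf.newSAXParser().getXMLReader();

            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));

            StringWriter out = new StringWriter();
            transformer.transform(
                    new SAXSource(reader, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints 2
        System.out.println(transform(COUNT_ITEMS_XSLT, "<root><item/><item/></root>"));
    }
}
```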
Why use Woodstox as the SAX parser?
But even if you choose to use SAX API, isn't there already another mature full-featured Java XML parser available? So why use Woodstox over the incumbent SAX candidate?
As the author of Woodstox I may be biased here, but I think there are a couple of things that favor "Woodstox the SAX Parser":
- Features: Woodstox offers some configurability that other processors do not, such as the ability to process XML content in "fragment" or "multi-document" mode.
- Good error reporting: as unpleasant as it is to get errors reported to you, at least Woodstox tries to do that discreetly, promptly and accurately. In fact, a lot of effort has gone into keeping all information (like location and exact cause of the problem) accurate and available. One area where this is obvious is the handling of DTD problems.
- Performance: Woodstox has consistently benchmarked as the fastest Open Source Java XML parser, regardless of the API used for parsing (Stax or SAX).
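On the error reporting point: whichever SAX parser is in use, location information reaches the application through the standard org.xml.sax.SAXParseException. The sketch below only shows where that information surfaces; the exact message text and column depend on the parser, so none is shown here. The class name SaxErrorLocationSketch and the broken sample document are made up, and the default JAXP parser is used so the code runs without Woodstox on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class SaxErrorLocationSketch {

    /** Returns "ok" for well-formed input, otherwise a "line L, column C: message" string. */
    static String check(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            // DefaultHandler rethrows fatal errors, so they end up in the catch below.
            spf.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return "ok";
        } catch (SAXParseException e) {
            // Location and cause as reported by the underlying parser.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(check("<a/>"));
        System.out.println(check("<a><b></a>")); // mismatched end tag
    }
}
```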
How to use Woodstox as a SAX parser?
As with other SAX implementations, there are multiple ways to construct a Woodstox-flavored SAX parser instance, but probably the most commonly used way nowadays is the Java API for XML Processing, JAXP (javax.xml.parsers.*), which has been bundled with the JDK since 1.4. The Woodstox class com.ctc.wstx.sax.WstxSAXParserFactory extends the abstract class javax.xml.parsers.SAXParserFactory. To construct an instance, there are two main possibilities:
- construct it directly, or
- set the appropriate system property (named, not so surprisingly, "javax.xml.parsers.SAXParserFactory") to point to the implementation class, and then call SAXParserFactory.newInstance()
(For more details, check out the Javadocs for SAXParserFactory.)
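The system property route could look roughly like the sketch below. Normally the property would be given on the command line (-Djavax.xml.parsers.SAXParserFactory=com.ctc.wstx.sax.WstxSAXParserFactory) rather than set in code; setting it programmatically, and falling back to the JDK default when the Woodstox jar is missing, are choices made here only so that the example runs anywhere. The class name FactoryLookupSketch is made up.

```java
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;

final class FactoryLookupSketch {

    static final String JAXP_KEY = "javax.xml.parsers.SAXParserFactory";
    static final String WOODSTOX_FACTORY = "com.ctc.wstx.sax.WstxSAXParserFactory";

    static SAXParserFactory lookup() {
        String previous = System.getProperty(JAXP_KEY);
        System.setProperty(JAXP_KEY, WOODSTOX_FACTORY);
        try {
            // JAXP consults the system property first and instantiates that class.
            return SAXParserFactory.newInstance();
        } catch (FactoryConfigurationError e) {
            // Assumption for this sketch: Woodstox is not on the classpath, so
            // restore the earlier setting and use whatever JAXP finds by default.
            if (previous == null) {
                System.clearProperty(JAXP_KEY);
            } else {
                System.setProperty(JAXP_KEY, previous);
            }
            return SAXParserFactory.newInstance();
        }
    }

    public static void main(String[] args) {
        // Shows which implementation was actually picked up.
        System.out.println(lookup().getClass().getName());
    }
}
```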
So, for example:
SAXParserFactory spf = new WstxSAXParserFactory();
after which you construct a parser and a content handler as usual, doing something like:
spf.setNamespaceAware(true); // yes, better enable namespaces
SAXParser sp = spf.newSAXParser();
MyHandler h = new MyHandler();
sp.parse(new File("data.xml"), h);
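Put together as a complete program, the steps above might look like the following sketch. MyHandler is filled in here as a hypothetical handler that just counts elements, and the input is a string rather than data.xml so that nothing external is needed. To keep the code compilable without the Woodstox jar, the factory is looked up by class name via SAXParserFactory.newInstance(String, ClassLoader), with a fallback to the JDK default; with Woodstox on the classpath, new WstxSAXParserFactory() can replace the newFactory() helper.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class WoodstoxSaxUsageSketch {

    /** Hypothetical stand-in for MyHandler: counts start tags. */
    static final class MyHandler extends DefaultHandler {
        int elementCount = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elementCount++;
        }
    }

    /** Tries Woodstox by class name; falls back to the default JAXP parser if it is absent. */
    static SAXParserFactory newFactory() {
        try {
            return SAXParserFactory.newInstance("com.ctc.wstx.sax.WstxSAXParserFactory", null);
        } catch (FactoryConfigurationError e) {
            return SAXParserFactory.newInstance();
        }
    }

    static int countElements(String xml) {
        try {
            SAXParserFactory spf = newFactory();
            spf.setNamespaceAware(true); // yes, better enable namespaces
            SAXParser sp = spf.newSAXParser();
            MyHandler h = new MyHandler();
            sp.parse(new InputSource(new StringReader(xml)), h);
            return h.elementCount;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints 3
        System.out.println(countElements("<root><a>x</a><b/></root>"));
    }
}
```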
And that's about it. If you know how to use Xerces/SAX, you know how to use Woodstox/SAX. And if you do try it, please give feedback on the Woodstox user list.
More to Come!
I should have more to write regarding the performance aspects of using Woodstox: as part of the "StaxBind" performance benchmark test suite, I have written and run a few performance tests to measure XSLT performance (both with Xalan and Saxon, the two main Java XSLT processors, both of which work just fine with Woodstox, Xerces and Aalto).
Results look promising (if not earth-shattering) for Woodstox. But I still need to spend a bit more time in ensuring that tests are fair, and results readable, so they will need to be part of a later entry. Stay tuned!