Wednesday, June 17, 2009

Reading DOM documents using Stax XML parser, StaxMate

One of the new features of StaxMate 2.0 is the ability to build DOM documents from a plain old Stax XMLStreamReader, and to write DOM documents out using a Stax XMLStreamWriter. This is something no Stax parser (no, not even Woodstox!) provides, since it is the "reverse" direction of what a Stax implementation could support (reading DOM documents as Stax streams, or directing the output of a stream writer into a DOM document).

Functionality for converting to/from DOM is contained in class org.codehaus.staxmate.dom.DOMConverter.

To read DOM documents, you do:

  FileInputStream in = new FileInputStream("input.xml");
  XMLStreamReader sr = XMLInputFactory.newInstance().createXMLStreamReader(in);
  // ... then do whatever processing (if any), and point to START_ELEMENT
  // (or leave at START_DOCUMENT: that'll work too)
  Document doc = new DOMConverter().buildDocument(sr);
  sr.close();
  in.close();

and to write a DOM document:

  FileOutputStream out = new FileOutputStream("output.xml");
  // note: stream writers come from XMLOutputFactory, not XMLInputFactory
  XMLStreamWriter sw = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
  // and output stuff, if need be...
  new DOMConverter().writeDocument(doc, sw);
  sw.close();
  out.close();
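
For readers curious what the write direction involves, here is a minimal JDK-only sketch that walks a DOM tree and emits it through an XMLStreamWriter. It is not StaxMate's implementation: the class DomToStaxSketch and its methods are made up for illustration, and namespaces are not handled (the parser is left non-namespace-aware, so prefixed names are copied verbatim).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only (not part of StaxMate).
public class DomToStaxSketch {

    // Parses a String into a DOM Document with the default (non-namespace-aware) parser.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Recursively copies a DOM node to the stream writer.
    // Handles elements, attributes, text, CDATA and comments; everything else is skipped.
    public static void write(Node node, XMLStreamWriter sw) throws XMLStreamException {
        switch (node.getNodeType()) {
        case Node.ELEMENT_NODE: {
            sw.writeStartElement(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                sw.writeAttribute(attr.getNodeName(), attr.getNodeValue());
            }
            for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
                write(child, sw);
            }
            sw.writeEndElement();
            break;
        }
        case Node.TEXT_NODE:
        case Node.CDATA_SECTION_NODE:
            sw.writeCharacters(node.getNodeValue());
            break;
        case Node.COMMENT_NODE:
            sw.writeComment(node.getNodeValue());
            break;
        default:
            // processing instructions, doctype etc. are ignored in this sketch
        }
    }

    // Serializes the root element of a Document to a String via a stream writer.
    public static String serialize(Document doc) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter sw = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        write(doc.getDocumentElement(), sw);
        sw.close();
        return out.toString();
    }
}
```

Calling DomToStaxSketch.serialize(DomToStaxSketch.parse(someXml)) round-trips a simple document; a real converter also has to deal with namespace declarations and the remaining node types.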

Ok, so you can do it, but why would you? Most commonly this is useful when there is a need to use tree-based processing tools like XSL transformers, or to access content using XPath. The ability to build smaller documents from sub-trees is crucial for limiting memory usage and thereby improving performance (or making such usage possible at all).
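
As an illustration of the sub-tree idea, here is a rough sketch using only JDK classes (not DOMConverter itself): it streams through a document, builds one small DOM per item element, and runs an XPath expression against each. The class name, the element names item and title, and the namespace-free simplification are all assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class, for illustration only (not part of StaxMate).
public class SubtreeXPathSketch {

    // Builds a small DOM Document from the element the reader currently points at.
    // On return, the reader points at that element's matching END_ELEMENT.
    // Namespaces, comments and processing instructions are dropped for brevity.
    public static Document buildSubtree(XMLStreamReader sr) throws Exception {
        if (sr.getEventType() != XMLStreamConstants.START_ELEMENT) {
            throw new IllegalStateException("reader must point to START_ELEMENT");
        }
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Node current = doc;
        int depth = 0;
        int event = sr.getEventType();
        while (true) {
            switch (event) {
            case XMLStreamConstants.START_ELEMENT: {
                Element elem = doc.createElementNS(null, sr.getLocalName());
                for (int i = 0; i < sr.getAttributeCount(); i++) {
                    elem.setAttributeNS(null, sr.getAttributeLocalName(i), sr.getAttributeValue(i));
                }
                current.appendChild(elem);
                current = elem;
                depth++;
                break;
            }
            case XMLStreamConstants.END_ELEMENT:
                current = current.getParentNode();
                depth--;
                break;
            case XMLStreamConstants.CHARACTERS:
            case XMLStreamConstants.CDATA:
                current.appendChild(doc.createTextNode(sr.getText()));
                break;
            default:
                // other event types are skipped in this sketch
            }
            if (depth == 0) {
                return doc; // sub-tree complete
            }
            event = sr.next();
        }
    }

    // Streams through the input and evaluates an XPath expression on one small DOM per <item>,
    // so only a single sub-tree is held in memory at a time.
    public static List<String> titles(String xml) throws Exception {
        XMLStreamReader sr = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> result = new ArrayList<>();
        while (sr.hasNext()) {
            if (sr.next() == XMLStreamConstants.START_ELEMENT && "item".equals(sr.getLocalName())) {
                Document small = buildSubtree(sr);
                result.add(xpath.evaluate("/item/title", small));
            }
        }
        sr.close();
        return result;
    }
}
```

With DOMConverter, the hand-written buildSubtree step would presumably be replaced by a buildDocument(sr) call on a reader positioned at START_ELEMENT, as in the snippet earlier in this post.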

So far this interoperability support is still quite limited, but with a little bit of encouragement, the following features could be implemented:

  • Similar functionality for building JDOM trees (code actually exists in the old Woodstox "stax-utils" package and just needs cleaning up), and perhaps XOM and DOM4j (for XOM there is already NUX, however, which covers the use case).
  • Ability to bind things directly via StaxMate input cursors and output objects. This is an obvious improvement -- the main reason the current functionality operates on "raw" Stax objects is just that code to do so existed; to use StaxMate objects, a little bit more work is needed to ensure proper synchronization. One nicety from doing this would be the ability to filter out non-text/non-element nodes (such as comments).

As usual, feel free to comment on this functionality, or join the StaxMate mailing lists. I will also incorporate these code samples in the StaxMate documentation page(s).

Thursday, March 12, 2009

StaxMate 2.0: another Big Leap towards convenient, efficient xml processing

StaxMate 2.0.0 is now out, to augment your favorite Stax XML parser (like Woodstox or Aalto).
(For an introduction to StaxMate, check out this tutorial.)

Improvements in this release focus on 3 main areas:

  • Convenient AND efficient access to typed content. With a little bit of help from a new version of the Stax2 extension API (version 3.0), it is now possible to efficiently read and write values of numeric (int, long, double), boolean and enumerated (Java enums) types.
    In the future, more methods will be added to allow similar access to numeric arrays and base64-encoded binary content.
  • More convenience methods:
    • SMInputCursor.advance() to allow chaining, for example: int value = cursor.childElementCursor().advance().getElemIntValue()
    • SMInputCursor.asEvent() to construct XMLEvents for the current event.
    • Ability to pre-declare namespaces on output (using SMOutputElement.predeclareNamespace()) to minimize the number of namespace declarations (not usually needed, but sometimes it is)
    • SMOutputElement.addElementWithCharacters() for a convenient short-cut.
  • Interoperability improvements:
    • DOMConverter for building DOM documents out of XMLStreamReaders, and serializing DOM documents and elements using XMLStreamWriter.
    • StaxMate jar is now a fully functioning OSGi bundle

One reason for the major version bump is that this version requires an implementation of Stax2 version 3.0, which is natively implemented by Woodstox and Aalto, and emulated for others (like Sjsxp) using the Stax2 reference implementation. This upgrade opens up a wider range of functionality for the future, and a similar upgrade should not be needed any time soon.

Friday, February 13, 2009

Tutorials for Two: StaxMate, Jackson

This took way longer than it should have, but after a wait spanning a couple of years, let's hear it for:

I hope that these can attract even more developers to have a good look at these 2 awesome libraries.

Also: any help in writing more tutorials -- especially in the form of How-to articles -- would be greatly beneficial and appreciated.

About me

  • I am known as Cowtowncoder