Wednesday, June 17, 2009

Reading DOM documents using Stax XML parser, StaxMate

One of the new features of StaxMate 2.0 is the ability to build DOM Documents from a plain old Stax XMLStreamReader, and to write DOM documents out using a Stax XMLStreamWriter. This is something no Stax parser (no, not even Woodstox!) provides, since it is in the "reverse" direction of what a Stax implementation could support (reading DOM documents as Stax streams, or directing output of a stream writer into a DOM document).

Functionality for converting to/from DOM is contained in class org.codehaus.staxmate.dom.DOMConverter.

To read DOM documents, you do:

  FileInputStream in = new FileInputStream("input.xml");
  XMLStreamReader sr = XMLInputFactory.newInstance().createXMLStreamReader(in);
  // ... then do whatever processing (if any), and point to START_ELEMENT
  // (or leave at START_DOCUMENT: that'll work too)
  Document doc = new DOMConverter().buildDocument(sr);
  in.close();
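
To give a rough idea of what such a conversion involves, here is a simplified sketch that uses only JDK classes. It is not StaxMate's actual implementation: the class StaxToDomSketch and its methods are hypothetical names made up for illustration, and the sketch ignores namespaces, comments and processing instructions. It accepts a reader positioned at either START_DOCUMENT or START_ELEMENT, and stops once the element it started from has been closed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper (not part of StaxMate): simplified Stax-to-DOM builder.
class StaxToDomSketch {

    // Reader may point to START_DOCUMENT or START_ELEMENT.
    static Document build(XMLStreamReader sr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Node current = doc; // node that receives the next child
        int depth = 0;      // number of currently open elements
        for (int event = sr.getEventType(); ; event = sr.next()) {
            switch (event) {
            case XMLStreamConstants.START_ELEMENT: {
                // Assumption: namespaces are ignored, only local names are kept
                Element el = doc.createElement(sr.getLocalName());
                for (int i = 0; i < sr.getAttributeCount(); i++) {
                    el.setAttribute(sr.getAttributeLocalName(i), sr.getAttributeValue(i));
                }
                current.appendChild(el);
                current = el;
                depth++;
                break;
            }
            case XMLStreamConstants.CHARACTERS:
            case XMLStreamConstants.CDATA:
                // Text outside the root element is skipped
                if (depth > 0) {
                    current.appendChild(doc.createTextNode(sr.getText()));
                }
                break;
            case XMLStreamConstants.END_ELEMENT:
                current = current.getParentNode();
                if (--depth == 0) {
                    return doc; // the (sub-)tree we started from is complete
                }
                break;
            case XMLStreamConstants.END_DOCUMENT:
                return doc;
            default:
                break; // comments, processing instructions etc. are dropped
            }
        }
    }

    // Convenience wrapper for trying the sketch out on an XML string.
    static Document buildFromString(String xml) {
        try {
            XMLStreamReader sr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                return build(sr);
            } finally {
                sr.close();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling StaxToDomSketch.buildFromString("<root id=\"1\"><a>x</a></root>") should give a Document whose root element is named "root". For real use, DOMConverter is the better choice, since a sketch like this one cuts many corners.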

and to write a DOM document:

  FileOutputStream out = new FileOutputStream("output.xml");
  XMLStreamWriter sw = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
  // and output stuff, if need be...
  new DOMConverter().writeDocument(doc, sw);
  sw.close();
  out.close();
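
The write direction can be sketched the same way: walk the DOM tree recursively and emit matching calls on the stream writer. Again, this is only a JDK-only illustration under simplifying assumptions, not what DOMConverter actually does internally. DomToStaxSketch and its methods are hypothetical names, namespaces are not handled, and comments and processing instructions are skipped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of StaxMate): simplified DOM-to-Stax writer.
class DomToStaxSketch {

    // Recursively copies a DOM node and its descendants to the stream writer.
    static void write(Node node, XMLStreamWriter sw) throws XMLStreamException {
        switch (node.getNodeType()) {
        case Node.DOCUMENT_NODE:
            sw.writeStartDocument();
            writeChildren(node, sw);
            sw.writeEndDocument();
            break;
        case Node.ELEMENT_NODE: {
            // Assumption: no namespace handling, names are written as-is
            sw.writeStartElement(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                sw.writeAttribute(attr.getNodeName(), attr.getNodeValue());
            }
            writeChildren(node, sw);
            sw.writeEndElement();
            break;
        }
        case Node.TEXT_NODE:
        case Node.CDATA_SECTION_NODE:
            sw.writeCharacters(node.getNodeValue());
            break;
        default:
            break; // comments, processing instructions etc. are skipped
        }
    }

    private static void writeChildren(Node parent, XMLStreamWriter sw) throws XMLStreamException {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            write(child, sw);
        }
    }

    // Convenience wrapper: parses a string into DOM, then writes it back via Stax.
    static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringWriter out = new StringWriter();
            XMLStreamWriter sw = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            write(doc, sw);
            sw.close();
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Since write() accepts any Node, the same method can also be used to output just a sub-tree instead of a whole document.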

Ok, so you can do it, but why would you? Most commonly this is useful when you need tree-based processing tools like XSL transformers, or access via XPath. The ability to build smaller documents from sub-trees is crucial for limiting memory usage and thereby improving performance (or for making such usage possible at all).
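
As an illustration of the sub-tree use case, the following sketch streams through a document, builds a DOM tree for just the first element with a given name, and evaluates an XPath expression against that small tree. So that it runs without StaxMate, it uses the JDK's identity Transformer with a StAXSource as a stand-in for DOMConverter.buildDocument(). The class name SubTreeXPathSketch, the method firstMatch() and the sample element names are all made up for this example. The sketch stops after the first match, so it does not depend on where the transformer leaves the stream reader afterwards.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import javax.xml.xpath.XPathFactory;

// Hypothetical helper (not part of StaxMate): XPath over one streamed sub-tree.
class SubTreeXPathSketch {

    // Returns the XPath string value for the first element named elementName,
    // or null if no such element exists in the input.
    static String firstMatch(String xml, String elementName, String xpath) {
        try {
            XMLStreamReader sr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (sr.hasNext()) {
                    if (sr.next() == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(sr.getLocalName())) {
                        // Stand-in for: new DOMConverter().buildDocument(sr)
                        DOMResult result = new DOMResult();
                        TransformerFactory.newInstance().newTransformer()
                                .transform(new StAXSource(sr), result);
                        // Only this sub-tree is held in memory as a DOM tree
                        return XPathFactory.newInstance().newXPath()
                                .evaluate(xpath, result.getNode());
                    }
                }
                return null;
            } finally {
                sr.close();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Note that the XPath expression is evaluated against the small document, so the matched element is its root: for an "order" element, the path starts with /order.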

So far this interoperability support is still quite limited, but with a little bit of encouragement, the following features could be implemented in the future:

  • Similar functionality for building JDOM trees (the code actually exists, in the old Woodstox "stax-utils" package, and just needs to be cleaned up), and perhaps XOM and DOM4j (for XOM, however, there is already NUX, which covers the use case).
  • Ability to bind things directly via StaxMate input cursors and output objects. This is an obvious improvement -- the main reason the current functionality operates on "raw" Stax objects is just that the code to do so already existed; to use StaxMate objects, a little bit more work is needed to ensure proper synchronization. One nicety from doing this would be the ability to filter out non-text/non-element nodes (such as comments).

As usual, feel free to comment on this functionality, or join the StaxMate mailing lists. I will also incorporate these code samples in the StaxMate documentation page(s).

About me

  • I am known as Cowtowncoder