September 2006

After taking almost a full year to get the 3.0 release out, it took less than two months to get the first release candidate of Woodstox 3.1 out. Since this is the first "minor" release (there have been three major releases and multiple patches, but only "pre-release minors" like 2.9), it may be interesting to know what is going on with Woodstox development.

The short answer is that the features of this release (described below) are ones that ideally would have made it into the 3.0 release, but that needed more internal changes than the dev team was prepared for at the end of the 3.0 cycle. On the other hand, these changes did not need major API changes. This made them good candidates for being implemented right after 3.0, but before a full (possibly backwards-incompatible) 4.0 release.

So what are the new and improved features? There is only one completely new feature:

xml:id support was added. This allows for properly identifying the unique identifier of an element, without needing a DTD to specify attribute types. This is something that just makes sense, so I am happy it finally got added. It is enabled by default (check out XMLStreamProperties for the property to disable support, if for some weird reason that is needed).

In addition, 3 existing features were improved:

In repairing mode, XMLStreamWriter was ignoring namespace declaration calls, as well as prefixes for namespace URIs that were already bound. After the changes, the repairing writer tries to honor the suggested prefix, and writes out namespace bindings suggested by calls to XMLStreamWriter.writeNamespace(). This change was made to make it even more tempting to use the repairing mode: it now does almost the same as non-repairing mode in most cases, and only uses different prefixes and adds automatic namespace bindings when it has to. A bit like having a well-functioning automatic transmission.
XMLStreamReader was only reporting SPACE events in validating mode, but not in regular DTD-aware mode. Not any more: SPACE is now reported whenever the DTD is handled, independent of validation.
Missing validation checks for the xml:space attribute were added: its type and enumerated values are now verified in DTD-validating mode.
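To make the repairing-mode item concrete, here is a minimal sketch that uses only the standard Stax API (javax.xml.stream). The class name RepairingWriterSketch and the "http://myns" URI are just illustrations, and the exact prefixes and placement of namespace declarations in the result depend on which Stax implementation is picked up at runtime, so treat this as a way to try the behavior out rather than a statement of what any particular version outputs.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class RepairingWriterSketch {

    /** Writes a small namespaced document with a repairing-mode writer. */
    static String write() {
        try {
            XMLOutputFactory f = XMLOutputFactory.newInstance();
            // Standard Stax property: the writer manages namespace bindings itself
            f.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES, Boolean.TRUE);

            StringWriter out = new StringWriter();
            XMLStreamWriter sw = f.createXMLStreamWriter(out);
            sw.writeStartDocument();
            // Suggest prefix "ns" for the element's namespace
            sw.writeStartElement("ns", "root", "http://myns");
            // Explicit declaration call; a repairing writer may honor or skip it
            sw.writeNamespace("ns", "http://myns");
            sw.writeEmptyElement("ns", "leaf", "http://myns");
            sw.writeEndElement();
            sw.writeEndDocument();
            sw.flush();
            sw.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(write());
    }
}
```

Whatever prefix ends up being used, the namespace URI has to be bound somewhere in the output, since that is the whole point of the repairing mode.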

Additionally, the release will obviously contain all the fixes from the 3.0.x maintenance branch.

So, what's next with Woodstox? There are two main alternatives: another minor release (perhaps containing one or both of the most often requested missing features: native XMLStreamWriter indentation [which, by the way, can already be done using either StaxMate or stax-utils] and W3C Schema validation), or going straight to 4.0 development. The latter will mean some non-backwards-compatible changes (such as requiring JDK 1.4), but will also allow actual API changes, and a new version of the Stax2 extension API.

P.S. Regarding the stable/maintenance branch, the 3.0.2 patch was also released: it fixes a couple of user-reported problems.

Posted by Tatu Saloranta at Thursday, September 28, 2006 10:09 PM
Categories: Java, XML/Stax

Having introduced the reader (input) side, the second part of the "introduction to StaxMate", needed before showing how to actually use it in the context of an actual web application, is obviously introducing the writer (output) side.

In some ways the writer side is simpler than the reader side: there is only one general abstraction, that of "output entities", as defined by the SMOutputtable abstract base class. These output entities (sub-classes of SMOutputtable) fall into two main categories: output containers that can contain other entities (elements, fragments, and the document entity belong to this class), and leaf entities (text, comments, processing instructions, entity references). Although you conceptually create both types, you will only directly manipulate the former. StaxMate will construct entities of the latter type if and as they are needed: most of the time they can be directly output, and there is no need to even instantiate them.

In addition to the output entities, namespaces are handled as first-class objects (but very simple ones, from the application's point of view): this is done both for design reasons (cleaner to use, cleaner to implement) and for performance reasons (StaxMate can handle namespace bindings very efficiently by using canonical namespace objects).

About the only advanced thing beyond plain output is the possibility to temporarily "freeze" (buffer) output, and this is only needed if output entities have to be added out of order. More on this later on.

Finally, StaxMate output functionality also adds one often-requested feature that is missing from the core Stax API: the ability to indent ("pretty-print") XML output using simple heuristics. It can be enabled on a per-container basis, and is inherited as expected. Thus you can control indentation as much or as little as you want, but most of the time you will just want to enable it and forget about it.
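To give a feel for what "simple heuristics" can mean here, below is a rough sketch of an indenting helper on top of a plain XMLStreamWriter. It is not StaxMate's implementation: the class IndentSketch, its start/text/end methods and the two rules it applies are assumptions made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class IndentSketch {
    private enum Last { NONE, START, END, TEXT }

    private final XMLStreamWriter sw;
    private final String unit;   // indentation added per nesting level
    private int depth = 0;
    private Last last = Last.NONE;

    IndentSketch(XMLStreamWriter sw, String unit) {
        this.sw = sw;
        this.unit = unit;
    }

    void start(String name) throws XMLStreamException {
        // Heuristic 1: indent before a start tag, unless it is the very first
        // thing written or it follows text (mixed content is left alone)
        if (last == Last.START || last == Last.END) {
            newline(depth);
        }
        sw.writeStartElement(name);
        depth++;
        last = Last.START;
    }

    void text(String s) throws XMLStreamException {
        sw.writeCharacters(s);
        last = Last.TEXT;
    }

    void end() throws XMLStreamException {
        depth--;
        // Heuristic 2: put the end tag on its own line only if the previous
        // thing written was a child element's end tag
        if (last == Last.END) {
            newline(depth);
        }
        sw.writeEndElement();
        last = Last.END;
    }

    private void newline(int level) throws XMLStreamException {
        StringBuilder sb = new StringBuilder("\n");
        for (int i = 0; i < level; i++) {
            sb.append(unit);
        }
        sw.writeCharacters(sb.toString());
    }

    /** Writes a small document with two-space indentation and returns it. */
    static String demo() {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter sw = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            IndentSketch w = new IndentSketch(sw, "  ");
            w.start("root");
            w.start("leaf"); w.text("leaf text"); w.end();
            w.start("group");
            w.start("item"); w.text("x"); w.end();
            w.end();
            w.end();
            sw.flush();
            sw.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With these two rules each child element lands on its own line, indented by one unit per nesting level, while text-only elements such as the leaf keep their content untouched.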

Having said all of that, here are typical steps one takes when outputting XML content using StaxMate.

1. Getting started

Just like with StaxMate input cursors, the first thing is to instantiate the underlying XMLStreamWriter needed, i.e. something like:

  XMLOutputFactory f = XMLOutputFactory.newInstance();
  // The encoding is an argument of createXMLStreamWriter(), not of FileOutputStream
  XMLStreamWriter sw = f.createXMLStreamWriter(
      new FileOutputStream("mydoc.xml"), "UTF-8");

2. Create the root-level container, configure

Somewhat similar to needing to create the root-level cursor, you need to create a root-level output container. Here you have two choices: an output document and an output fragment. The former is used if you want to output a whole XML document, the latter if you only need to output part of one (for example, if you are outputting the surrounding document structure using XMLStreamWriter outside of StaxMate). You can also enable and configure indentation at this point:

  SMOutputDocument doc = SMOutputFactory.createOutputDocument(sw, "1.0", 
  "UTF-8", true);
  // Defines linefeed to use, spaces for indentation (from 1, step by 1)
  doc.setIndentation("\n ", 1, 1);

3. Start outputting things

And then we are ready to start composing the output document:

  SMOutputElement root = doc.addElement("root");
  // Root is in no namespace, but the leaves will be (suggests prefix "ns", defines URI)
  SMNamespace ns = root.getNamespace("http://myns", "ns");
  root.addElement(ns, "leaf"); // empty one
  root.addElement(ns, "leaf").addCharacters("leaf text");
  SMOutputElement leaf3 = root.addElement(ns, "leaf3");
  leaf3.addAttribute("id", "leaf3"); // no-namespace attribute
  leaf3.addComment("ad space for sale!");
  // ... and so on
  root.addComment("end of content");
  /* This is important: root-level element MUST be closed,
  * otherwise StaxMate can not determine when output is done
  */
  doc.closeRoot();

4. The Limitation

Just like with the reader side, the main limitation is that since output is done in a streaming way, it has to be done in document order: parent before children, and attributes after the element itself but before its children.
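The same ordering rule is already visible at the plain Stax level, which is where it comes from. A small sketch (DocumentOrderSketch is a made-up name, and only standard javax.xml.stream calls are used) that tries to write an attribute after the element already has content:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class DocumentOrderSketch {

    /** Returns true if the writer rejects an attribute written after child content. */
    static boolean lateAttributeRejected() {
        try {
            XMLStreamWriter sw = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(new StringWriter());
            sw.writeStartElement("root");
            sw.writeAttribute("id", "ok");    // fine: the start tag is still open
            sw.writeCharacters("text");       // content closes the start tag
            try {
                sw.writeAttribute("late", "x"); // out of document order
                return false;
            } catch (XMLStreamException | IllegalStateException e) {
                // Implementations signal the problem with one of these two types
                return true;
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("late attribute rejected: " + lateAttributeRejected());
    }
}
```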

But also just like with the reader side, there is a way to relax this requirement a little, by using a feature that trades some added memory usage for some added comfort. This is called "buffering", and is explained next...

5. Advanced Feature(s)

As mentioned above, about the only advanced feature on the output side is the ability to "freeze" output. This is done by using "buffered" output entities, and the reason to use them is to allow out-of-order output of XML content. Or, rather, to allow temporary buffering of content while all the output entities are being assembled, so that the underlying stream writer can be fed the output entities in strict document order. StaxMate can handle all this smoothly, given just two things:

Create special entities that are to be buffered (and whose contents will also be buffered, as long as the containing entity itself is)
Tell StaxMate when buffering is no longer needed ("release") for these items -- you only need to indicate the buffered entity is to be released, and all of its 'normal' contents will also be released.

The first step is done using one of two methods that all containers (documents, fragments, elements) have:

  createBufferedFragment()
  createBufferedElement()

which results in the creation of either a fragment or an element, to which other entities can be added using the normal output container add methods. The resulting entity can then be added (possibly later, after doing other non-buffered output), either by first adding it as buffered, calling:

  container.addBuffered(bufferedEntity);

and then releasing it (when all content to be added out of order has been added):

  bufferedEntity.release();

or, if both steps can be combined, calling:

  container.addAndReleaseBuffered(bufferedEntity);

This latter method obviously assumes that you have added other things within the buffered entity -- if not, you could have just created a normal element or fragment and added it.
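For readers who want a mental model of what buffering and releasing do, here is a toy sketch in plain Java. It is not how StaxMate is implemented: BufferingSketch, Output and Buffered are hypothetical names, and markup is handled as plain strings to keep the example short. The idea it illustrates is that a buffered entity reserves its place in document order, and everything added after it waits until it is released.

```java
import java.util.ArrayDeque;

final class BufferingSketch {

    /** Hypothetical stand-in for a buffered entity: collects content until released. */
    static final class Buffered {
        private final StringBuilder content = new StringBuilder();
        private boolean released = false;
        private Output owner;

        void add(String xml) {
            content.append(xml);
        }

        void release() {
            released = true;
            if (owner != null) {
                owner.flushPending();
            }
        }
    }

    /** Hypothetical stand-in for a streaming container writing in document order. */
    static final class Output {
        private final StringBuilder out = new StringBuilder();
        // Entries are either String (plain output) or Buffered
        private final ArrayDeque<Object> pending = new ArrayDeque<>();

        void add(String xml) {
            if (pending.isEmpty()) {
                out.append(xml);      // nothing frozen: stream straight through
            } else {
                pending.add(xml);     // queued behind a frozen entity
            }
        }

        void addBuffered(Buffered b) {
            b.owner = this;
            pending.add(b);
            flushPending();           // no-op while b is still frozen
        }

        void flushPending() {
            while (!pending.isEmpty()) {
                Object head = pending.peek();
                if (head instanceof Buffered) {
                    Buffered b = (Buffered) head;
                    if (!b.released) {
                        return;       // still frozen: everything behind it waits
                    }
                    out.append(b.content);
                } else {
                    out.append((String) head);
                }
                pending.poll();
            }
        }

        @Override
        public String toString() {
            return out.toString();
        }
    }

    /** A summary element is placed first, but filled in after the items. */
    static String demo() {
        Output out = new Output();
        out.add("<root>");
        Buffered summary = new Buffered();
        out.addBuffered(summary);          // reserves the position
        out.add("<item>a</item>");         // queued behind the summary
        out.add("<item>b</item>");
        summary.add("<count>2</count>");   // only known after the items
        summary.release();                 // everything is now written in order
        out.add("</root>");
        return out.toString();
    }
}
```

In this model, releasing the entity before attaching it behaves like the combined add-and-release call: there is nothing left to wait for, so the content goes straight out.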

6. Next Steps

OK: after this crash course in the basics of using StaxMate, we are about ready to start writing the UUID Generator Web Application!

Posted by Tatu Saloranta at Friday, September 01, 2006 10:46 PM
Categories: XML/Stax

CowTalk

Moo-able Type for Cowtowncoder.com

Thursday, September 28, 2006

Woodstox 3.1 (release candidate 1) just released

Friday, September 01, 2006

StaxMate basics, writer side
