Thursday, June 15, 2006

New features of upcoming release 3.0 of Woodstox

Although the final 3.0 release of Woodstox is not out yet, the API and feature set were frozen with the first release candidate. This makes now a good time to look at what the 3.0 release will bring compared to the trusty old 2.0.x version.

At a high level, the main changes are:

  • Rewritten validation sub-system, along with a new validator implementation for Relax NG.
  • Significant performance improvements to both the stream reader and writer, especially when processing small documents.
  • Further improvements to XML conformance: XML 1.0 and 1.1 conformance is now over 99%, as measured by the industry-standard XMLTest conformance suite (tested using SAXTest and the SAX wrappers from stax-utils). Conformance of DTD handling in particular is significantly improved.
  • Significantly improved test coverage, both in terms of features tested and actual code coverage.
  • Improved interoperability: behavior unified with the Stax reference implementation (with changes to one or both, as dictated by accepted interpretations of the Stax specification); support for DOMSource, which allows creating an XMLStreamReader from a DOM tree; and support for UTF-32 encoding.
  • Additional operating modes: a parsing mode setting (tree [default], forest, fragment), and the ability to handle undeclared entities gracefully (in non-entity-expanding mode).
  • Convergence of writer- and reader-side functionality, by adding missing features to writers (optional line number reporting, XML warning handler, disabling of namespace handling), done in the context of the Stax2 extended API.

For a more complete picture of the individual changes, you can check out the Jira bug-tracking system used by the Woodstox project; there are almost 40 resolved entries for the 3.0 release.

Of the changes mentioned above, the first may be the most significant. It also led to a complete rewrite of the existing (DTD) validation system. The redesigned system is now:

  • Fully pluggable: org.codehaus.stax2.validation.XMLValidationSchemaFactory implementations can be included and discovered dynamically, in the same way as basic Stax implementations of XMLInputFactory and XMLOutputFactory. In theory, this could allow cross-implementation validators in the future. The 3.0 release includes the rewritten native DTD validator, as well as a Relax NG validator based on Sun's Multi-Schema Validator (MSV). There are also plans to include an MSV-based W3C Schema validator in the near future.
  • Bi-directional: the same validators can be used both when parsing (with XMLStreamReader) and when serializing (with XMLStreamWriter).
  • Validators are chainable, so that a single XML event stream can be validated against multiple validation schemas.
  • Customizable error handling: fail-fast (exception on validation error), error-collecting, or a combination (collect up to the first 50 errors).
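To make the chaining and error-handling ideas a bit more concrete, here is a minimal, self-contained Java sketch. It is a hypothetical illustration only: `ElementValidator`, `ErrorPolicy`, `ValidatorChain` and `runDemo` are invented names, not the Stax2 `org.codehaus.stax2.validation` API, and real validators see the whole event stream rather than just element names. A single error limit covers all three policies: 0 means fail-fast, 50 means "collect up to the first 50 errors", and `Integer.MAX_VALUE` effectively collects everything.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, NOT the Stax2 validation API: chained validators
// that share a single, configurable error-handling policy.
public class ValidatorChainSketch {

    /** Hypothetical validator: checks one element name, returns a problem or null. */
    public interface ElementValidator {
        String validateElement(String localName);
    }

    /** Thrown when the policy decides processing must stop. */
    public static class ValidationException extends RuntimeException {
        public ValidationException(String message) { super(message); }
    }

    /**
     * maxErrors == 0: fail-fast (throw on the first problem).
     * maxErrors == N: collect the first N problems, throw on problem N+1.
     * maxErrors == Integer.MAX_VALUE: in practice, collect everything.
     */
    public static class ErrorPolicy {
        private final int maxErrors;
        private final List<String> collected = new ArrayList<>();

        public ErrorPolicy(int maxErrors) { this.maxErrors = maxErrors; }

        public void report(String problem) {
            if (collected.size() >= maxErrors) {
                throw new ValidationException(problem);
            }
            collected.add(problem);
        }

        public List<String> errors() { return Collections.unmodifiableList(collected); }
    }

    /** Feeds each event to every validator in the chain, in order. */
    public static class ValidatorChain {
        private final List<ElementValidator> validators = new ArrayList<>();
        private final ErrorPolicy policy;

        public ValidatorChain(ErrorPolicy policy) { this.policy = policy; }

        public ValidatorChain add(ElementValidator v) {
            validators.add(v);
            return this;
        }

        public void startElement(String localName) {
            for (ElementValidator v : validators) {
                String problem = v.validateElement(localName);
                if (problem != null) {
                    policy.report(problem);
                }
            }
        }
    }

    /** Runs a tiny element sequence through two chained "schemas". */
    public static List<String> runDemo(int maxErrors) {
        // Schema 1: only a fixed set of element names is allowed.
        ElementValidator knownNames = name ->
            (name.equals("root") || name.equals("item") || name.equals("old"))
                ? null : "unknown element <" + name + ">";
        // Schema 2: the <old> element is deprecated.
        ElementValidator noDeprecated = name ->
            name.equals("old") ? "deprecated element <old>" : null;

        ErrorPolicy policy = new ErrorPolicy(maxErrors);
        ValidatorChain chain = new ValidatorChain(policy).add(knownNames).add(noDeprecated);
        for (String name : new String[] {"root", "item", "old", "bogus"}) {
            chain.startElement(name);
        }
        return policy.errors();
    }

    public static void main(String[] args) {
        // Collect up to the first 50 errors.
        System.out.println(runDemo(50));
        // prints [deprecated element <old>, unknown element <bogus>]
    }
}
```

With a limit of 50 both problems are collected and reported together; calling `runDemo(0)` instead stops with an exception at the first problem, which is the fail-fast behavior.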

As part of the validation system redesign, the native DTD validator implementation was also rewritten. The result is a fully conformant DTD validator, with all well-formedness checks reliably implemented. Default attribute values and attribute types are now also handled in DTD-aware but non-validating mode (unlike in 2.0).

As for testing, one important step was that code coverage measurement started during 3.0 development. Coverage reached 60% of code lines and 80% of methods, which should be a good baseline going forward. Work with the StaxTest Stax conformance test suite (which is included with the reference implementation, and also used for testing it) improved compatibility between Woodstox and the reference implementation. And finally, the unit test suite specific to Woodstox itself was significantly improved. Together, these changes suggest that 3.0 will be the best-tested release so far, and hopefully have even fewer bugs than the earlier 1.0.x and 2.0.x releases.

Another feature that many developers will hopefully find tempting is performance. It may seem unusual for improvements in standards compliance to go together with performance improvements, but that is the case for 3.0. The changes are most noticeable when dealing with small documents, since setup overhead (which is significantly reduced in 3.0) matters most for such documents. The serialization side has also been optimized for the first time: whereas 1.0 and 2.0 focused on making handling correct, the 3.0 development cycle included time for performance improvements. And the results are well worth the time spent.

All in all, the 3.0 release will hopefully be another big step for Woodstox: quality and reliability should be even better than in 2.0, and configurability and the feature set are improved. And all this while increasing performance!

About me

  • I am known as Cowtowncoder