Although the final 3.0 release of Woodstox
is not yet out, the API and feature set have been frozen since the
first release candidate. That makes this a good time to look at what
the 3.0 release will bring, compared to the trusty old 2.0.x version.
At a high level, the main changes are:
-
Rewritten validation sub-system, along with a new validator
implementation for Relax NG.
-
Significant performance improvements to both the stream reader and the
stream writer, especially when processing small documents.
-
Further improvements to XML conformance: XML 1.0 and 1.1
conformance is now over 99%, as measured by the industry-standard
XMLTest conformance suite (tested using SAXTest
and the SAX wrappers from stax-utils).
Conformance of DTD handling in particular is significantly improved.
-
Significantly improved test coverage, both regarding features tested
and actual code coverage.
-
Improved interoperability: behavior unified with the Stax
reference implementation (including changes to one or both, as
dictated by accepted Stax specification interpretations); support for
DOMSource, which allows creating an XMLStreamReader from a DOM tree;
and added UTF-32 encoding support.
-
Additional operating modes: parsing mode (tree [default], forest,
fragment); ability to handle undeclared entities gracefully (in
non-entity-expanding mode).
-
Convergence of reader-side and writer-side functionality: features
that were previously missing from writers (optional line number
reporting, XML warning handler, disabling of namespace handling) were
added as part of the Stax2 extended API.
For an even more complete picture of the individual changes, you
can check out the Jira
bug-tracking system used by the Woodstox project; there are almost 40
resolved entries for the 3.0 release.
Of the changes mentioned above, the first one may be the most
significant new feature. It also resulted in a complete rewrite of
the existing (DTD) validation system. The redesigned system is now:
-
Fully pluggable:
org.codehaus.stax2.validation.XMLValidationSchemaFactory
implementations can be plugged in much like the basic Stax
implementations of XMLInputFactory and XMLOutputFactory are
included and discovered dynamically. In theory this allows
validators that work across Stax implementations to be written in
the future. The 3.0 release includes the rewritten native DTD
validator, as well as a Relax NG validator based on Sun's Multi-Schema
Validator. There are also plans to include an MSV-based W3C Schema
validator in the near future.
-
Bi-directional: the same validators can be used both when parsing
(with XMLStreamReader) and when serializing (with XMLStreamWriter).
-
Validators are chainable, so that a single XML event stream can be
validated against multiple validation schemas.
-
Customizable error handling: fail-fast (exception on the first
validation error), error-collecting, or a combination (collect up to
the first 50 errors).
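The three error-handling policies listed above differ mainly in how
many problems are tolerated before processing is aborted. The
following is a minimal sketch of that idea in plain Java. Note that
BoundedProblemCollector and its methods are hypothetical names made up
for illustration; they are not part of the Woodstox or Stax2 API. A
limit of 0 behaves as fail-fast, a very large limit as pure error
collection, and a limit of 50 as the combined mode.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical stand-in (NOT a Woodstox or Stax2 class): collects
 * validation problems and fails once the configured limit is exceeded.
 */
final class BoundedProblemCollector {
    private final int maxProblems;
    private final List<String> problems = new ArrayList<String>();

    /** @param maxProblems number of problems to tolerate; 0 means fail-fast */
    BoundedProblemCollector(int maxProblems) {
        if (maxProblems < 0) {
            throw new IllegalArgumentException("maxProblems must be >= 0");
        }
        this.maxProblems = maxProblems;
    }

    /** Records one problem, or throws if the limit has already been reached. */
    void report(String message) {
        if (problems.size() >= maxProblems) {
            throw new IllegalStateException(
                "Too many validation problems (limit " + maxProblems
                + "); latest: " + message);
        }
        problems.add(message);
    }

    /** Read-only view of the problems collected so far. */
    List<String> problems() {
        return Collections.unmodifiableList(problems);
    }
}
```

How such a policy is wired into an actual validator depends on the
handler interface of the validation API in use.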
Related to the validation system redesign, the native DTD validator
implementation was also rewritten. The result is a fully conformant DTD
validator, with all well-formedness checks reliably implemented.
Unlike in 2.0, default attribute values and attribute types are now
handled in DTD-aware but non-DTD-validating mode.
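To make "DTD-aware but non-validating" more concrete, here is a small
sketch that uses only standard Stax (javax.xml.stream) calls: DTD
support is enabled, validation is disabled, and the attributes seen on
the root element are listed. The sample document and the class name
DtdAwareDemo are made up for illustration. With an implementation that
applies attribute defaults in this mode, the returned map should
contain the defaulted "lang" attribute in addition to the explicit
"id"; the exact result may differ between Stax implementations.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class DtdAwareDemo {
    // Illustrative document: "lang" has a DTD default, "id" is given explicitly.
    static final String DOC =
          "<!DOCTYPE root [\n"
        + "  <!ELEMENT root EMPTY>\n"
        + "  <!ATTLIST root lang CDATA \"en\" id CDATA #IMPLIED>\n"
        + "]>\n"
        + "<root id=\"1\"/>";

    /** Returns the attributes (local name to value) reported for the root element. */
    static Map<String, String> rootAttributes() throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // DTD-aware: read the internal subset ...
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.TRUE);
        // ... but do not validate the document against it.
        factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE);

        Map<String, String> attrs = new LinkedHashMap<String, String>();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(DOC));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        attrs.put(reader.getAttributeLocalName(i),
                                  reader.getAttributeValue(i));
                    }
                    break; // only the root element is of interest here
                }
            }
        } finally {
            reader.close();
        }
        return attrs;
    }
}
```

Calling reader.isAttributeSpecified(i) inside the loop is one way to
tell explicit attributes apart from ones that came from a DTD default.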
As to testing, one important change is that code coverage measurement
was started during 3.0 development. Coverage reached between 60% (of
code lines) and 80% (of methods), which should be a good
start going forward. Work with the StaxTest Stax conformance test suite
(which is included with the reference implementation, and is also used
for testing it) improved compatibility between Woodstox and the
reference implementation. Finally, the unit test suite specific to
Woodstox itself was significantly improved. Together these changes
suggest that 3.0 will be the best-tested release so far, and will
hopefully have even fewer bugs than the earlier 1.0.x and 2.0.x releases.
The feature that many developers will hopefully find tempting as
well is performance. It may seem unusual that improvements in
standards compliance could go together with performance improvements,
but this is the case for 3.0. The changes are most noticeable when
dealing with small documents, since the setup overhead (which is
significantly reduced in 3.0) matters most for such documents. The
serialization side has also been optimized for the first time: whereas
1.0 and 2.0 focused on making handling correct, the 3.0 development
cycle included time for performance improvements. And the results are
well worth the time spent.
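One rough way to observe the per-document overhead yourself is to
parse the same tiny document many times and look at the average time
per document. The sketch below uses only standard Stax calls; the class
name SmallDocTiming, the sample document and the simple warm-up pass
are assumptions made for illustration. A naive timing loop like this
gives only a ballpark figure, not a rigorous benchmark.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class SmallDocTiming {
    // Illustrative small document with three elements.
    static final String SMALL_DOC = "<msg><to>a</to><body>hi</body></msg>";

    /**
     * Parses SMALL_DOC the given number of times using one shared factory
     * and returns the total number of START_ELEMENT events seen.
     */
    static long parseRepeatedly(XMLInputFactory factory, int rounds)
            throws XMLStreamException {
        long elements = 0;
        for (int i = 0; i < rounds; i++) {
            XMLStreamReader reader =
                factory.createXMLStreamReader(new StringReader(SMALL_DOC));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        elements++;
                    }
                }
            } finally {
                reader.close();
            }
        }
        return elements;
    }

    /** Crude average parse time per document, in microseconds. */
    static double averageMicrosPerDocument(int rounds) throws XMLStreamException {
        if (rounds <= 0) {
            throw new IllegalArgumentException("rounds must be > 0");
        }
        XMLInputFactory factory = XMLInputFactory.newInstance();
        parseRepeatedly(factory, rounds);          // warm-up pass, not timed
        long start = System.nanoTime();
        parseRepeatedly(factory, rounds);          // timed pass
        return (System.nanoTime() - start) / 1000.0 / rounds;
    }
}
```

Running averageMicrosPerDocument with different Stax implementations
on the classpath gives a first impression of how their setup costs
compare for small inputs.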
All in all, the 3.0 release will hopefully be another big step for
Woodstox: quality and reliability should be even better than in 2.0,
and configurability and the feature set are improved. And all this
while increasing performance!