Aalto XML Processor Project

This is the work-in-progress project home page for the Aalto XML Processor. The direction of the project itself is discussed at the Yahoo Group; this page exists to host the intermediate artifacts that are available, such as the prototype version of the processor.

News!

  • 13-Mar-2008: Release 0.9.2: bug fixes, implemented Stax 2 (v3.0) Typed Access API (at level comparable to Woodstox 3.9.2)
  • 13-Mar-2008: Release 0.9.1: minor fixes, improvements, non-blocking (async) parser getting more complete.
  • 21-Feb-2008: Fixed a small but important typo in Usage section below -- property names were wrong (thanks Lowell!)
  • 05-Feb-2008: Hacked together this home page, including downloadable implementation jar.

What?

Aalto XML processor is a next-generation Stax XML processor implementation. It is not directly related to other existing mature implementations (such as Woodstox or Sun Java Streaming XML Parser), although it did come about as a prototype for evaluating implementation strategies that differ from those traditionally used for Java-based parsers.

Two main goals (above and beyond stock Stax/SAX API implementation) are:

  • Ultra-high performance parsing by making the "common case fast". This may mean limiting functionality, but not correctness: XML 1.0 compliance is not sacrificed for speed.
  • Allowing non-blocking, asynchronous parsing: it should be possible to "feed" more input and incrementally get more XML events out, without forcing the current thread to block on an I/O read operation.
The current implementation goes a long way towards both goals, proving that they are both achievable using a single implementation.
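
To make the "feed more input, get more events out" idea concrete, here is a deliberately tiny sketch of the calling pattern. It is not Aalto's API and not a real XML parser: the class name, the feed() and next() methods, and the INCOMPLETE marker are all hypothetical, and the tokenizer only splits input into tags and text runs (it ignores comments, CDATA, and '>' characters inside attribute values). The point is only that the caller pushes whatever input it has and is told "need more input" instead of blocking on a read.

```java
/**
 * Hypothetical toy, NOT Aalto's API: illustrates the feed-then-poll calling
 * pattern of a non-blocking parser using a trivial tag/text tokenizer.
 */
final class FeedingTokenizerSketch {
    /** Returned by next() when the buffered input holds no complete token. */
    static final String INCOMPLETE = "<<need-more-input>>";

    private final StringBuilder buffer = new StringBuilder();

    /** Appends a chunk of input. Never blocks and never reads from a stream. */
    void feed(String chunk) {
        buffer.append(chunk);
    }

    /** Returns the next complete token (a tag or a text run), or INCOMPLETE. */
    String next() {
        if (buffer.length() == 0) {
            return INCOMPLETE;
        }
        if (buffer.charAt(0) == '<') {
            int end = buffer.indexOf(">");
            if (end < 0) {
                return INCOMPLETE; // tag is split across chunks; wait for the rest
            }
            return take(end + 1);
        }
        int nextTag = buffer.indexOf("<");
        if (nextTag < 0) {
            return INCOMPLETE; // text run may continue in the next chunk
        }
        return take(nextTag);
    }

    /** Removes and returns the first len characters of the buffer. */
    private String take(int len) {
        String token = buffer.substring(0, len);
        buffer.delete(0, len);
        return token;
    }
}
```

A caller would typically loop on next() until it returns INCOMPLETE, then go do other work until more bytes arrive from the network or disk, and call feed() again.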

Features, dependencies

The current version supports a non-validating XML 1.0 subset (minus handling of DTD subsets: entity expansion and attribute defaulting have not been implemented). Support for the Stax2 validation interface is incomplete.

The Stax 1.0 API is implemented for the most part, with the following main exceptions:

  • Coalescing mode not implemented (similar to Stax reference implementation)
  • Repairing mode of stream writer not implemented (regular non-repairing mode is fully implemented).
  • Non-namespace-aware mode (optional feature) is not implemented for stream writer
Completing the Stax 1.0 API is a high-priority goal for immediate development.
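
Since only the non-repairing writer mode is available, the caller is responsible for writing namespace declarations explicitly. The following is a minimal sketch of that mode using only the standard Stax 1.0 API; it runs against whichever Stax implementation the platform resolves, and the class name, method name, prefix and namespace URI are illustrative assumptions rather than anything from Aalto itself.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Illustrative sketch: writing a namespaced element in non-repairing mode. */
final class NonRepairingWriterSketch {
    static String writeSample() {
        try {
            XMLOutputFactory factory = XMLOutputFactory.newInstance();
            // Non-repairing mode: the writer will not add namespace declarations for us.
            factory.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES, Boolean.FALSE);

            StringWriter out = new StringWriter();
            XMLStreamWriter writer = factory.createXMLStreamWriter(out);
            writer.writeStartElement("p", "root", "urn:example");
            // The declaration must be written explicitly in non-repairing mode.
            writer.writeNamespace("p", "urn:example");
            writer.writeCharacters("text");
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML writing failed", e);
        }
    }
}
```

Leaving out the writeNamespace call in this mode would produce output that uses the prefix without declaring it, which is why the repairing mode exists as a convenience.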

There is only one additional dependency, beyond the requirement to have the APIs (Stax, SAX) available either as part of the JDK (6.0) or separately: since Aalto implements the Stax2 API (developed as part of the Woodstox project), the Stax2 API jar is needed along with the Aalto jar. Version 3.0 (which is part of Woodstox 4.0, and preliminarily included with 3.9) should be used.

Usage

The implementation jar contains the necessary service definition files (under the META-INF/services directory), but it may be necessary to specify the factory classes explicitly, using the normal JAXP/Stax settings. The system properties to use are:
  • -Djavax.xml.stream.XMLInputFactory=org.codehaus.wool.stax.InputFactoryImpl
  • -Djavax.xml.stream.XMLOutputFactory=org.codehaus.wool.stax.OutputFactoryImpl
  • -Djavax.xml.stream.XMLEventFactory=org.codehaus.wool.stax.EventFactoryImpl
  • -Djavax.xml.parsers.SAXParserFactory=org.codehaus.wool.sax.SAXParserFactoryImpl
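
The same selection can also be made programmatically before the first factory lookup. The sketch below uses only the standard Stax API plus the input factory property and class name listed above; the class and method names are illustrative, and the fallback to the platform default when the Aalto jar is not on the classpath is an assumption added so that the example runs anywhere.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Illustrative sketch: selecting the input factory via a system property. */
final class AaltoUsageSketch {
    // Property name and implementation class name as listed above.
    static final String FACTORY_PROPERTY = "javax.xml.stream.XMLInputFactory";
    static final String AALTO_INPUT_FACTORY = "org.codehaus.wool.stax.InputFactoryImpl";

    /** Requests the Aalto factory if its class is loadable, else the default. */
    static XMLInputFactory newInputFactory() {
        try {
            Class.forName(AALTO_INPUT_FACTORY);
            System.setProperty(FACTORY_PROPERTY, AALTO_INPUT_FACTORY);
        } catch (ClassNotFoundException e) {
            // Assumption for this sketch: jar not on the classpath, so keep the default.
        }
        return XMLInputFactory.newInstance();
    }

    /** Counts START_ELEMENT events using the plain Stax cursor API. */
    static int countStartElements(String xml) {
        try {
            XMLStreamReader reader =
                    newInputFactory().createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }
}
```

Because the rest of the code talks only to javax.xml.stream interfaces, switching between implementations should not require changes beyond the property value.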

Download

Until the details of distribution (such as licensing) have been decided on, only binary jars are available. Implementation jars can be freely evaluated and used without restrictions; distribution to third parties is not allowed without explicit permission (it may become necessary to mirror these jars, but for now this is the download page to use).

