Saturday, December 06, 2008

Typed Access API tutorial, part I

So far I have mentioned the "Typed Access API" a few times in past blog entries; probably often enough to irritate, given that it has mostly been just name-dropping. But this is about to change: I will try to give a simple overview of common usage of the new API.

But first things first: the API itself consists of little more than two new interfaces:

  • org.codehaus.stax2.typed.TypedXMLStreamReader
  • org.codehaus.stax2.typed.TypedXMLStreamWriter

(both of which are extended by the matching main-level Stax2 interfaces, XMLStreamReader2 and XMLStreamWriter2)

And while there are plenty of methods in them, this is just a combinatorial explosion caused by the different data types, structured types (int vs int array), and XML oddities (element vs attribute).

Given this conceptual simplicity (if not brevity), tutorials need not be lengthy. Still, there's nothing quite as nice as a bit of cut'n'pastable code to get one started, so let's get coding.

This first tutorial focuses on so-called "simple" types, defined here as all supported types other than array and binary types. The latter will be covered in follow-up entries.

1. Writing simple values

Let's first try outputting the following simple data:

<entries>
  <entry id="1234">
    <active>true</active>
    <value>10.00</value>
  </entry>
</entries>

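This could be done with:

  import java.io.StringWriter;
  import java.math.BigDecimal;
  import javax.xml.stream.XMLOutputFactory;
  import org.codehaus.stax2.typed.TypedXMLStreamWriter;

  StringWriter sw = new StringWriter();
  // Stax2 implementations (like Woodstox) return writers that implement TypedXMLStreamWriter
  TypedXMLStreamWriter tw = (TypedXMLStreamWriter)
      XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
  tw.writeStartDocument();
  tw.writeStartElement("entries");
  tw.writeStartElement("entry");
  tw.writeIntAttribute(null, null, "id", 1234);
  tw.writeStartElement("active");
  tw.writeBoolean(true);
  tw.writeEndElement(); // /active
  tw.writeStartElement("value");
  // BigDecimal keeps the exact decimal value (no rounding problems)
  BigDecimal value = new BigDecimal("10.00");
  tw.writeDecimal(value);
  tw.writeEndElement(); // /value
  tw.writeEndElement(); // /entry (closed only after its child elements)
  tw.writeEndElement(); // /entries
  tw.writeEndDocument();
  tw.close(); // flushes any buffered output into the StringWriter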

(For a more convenient way, I always recommend the StaxMate helper library; but that would lead to another blog entry, so for now we'll just use the "raw" Stax2 API.)

There are also a couple more types available. About the only 'advanced' simple type included is QName, which can be used to write properly namespaced qualified names, at least if the stream writer is in namespace-repairing mode, which allows the writer to add the needed namespace declarations automatically.
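To see what namespace repairing saves you from, here is a rough sketch of writing a QName as element content by hand with the plain JDK `javax.xml.stream` API (not Stax2), in non-repairing mode. The class and method names are made up for illustration, and the sketch assumes the QName carries a non-empty prefix; the point is simply that the namespace binding has to be declared explicitly, or the prefixed text value would refer to an unbound prefix.

```java
import java.io.StringWriter;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class ManualQNameSketch {

    /** Hypothetical helper: writes <root> whose text content is the given QName. */
    public static String writeQNameContent(QName qname) throws XMLStreamException {
        if (qname.getPrefix().isEmpty()) {
            // This sketch only handles prefixed names
            throw new IllegalArgumentException("sketch assumes a non-empty prefix");
        }
        StringWriter sw = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
        w.writeStartElement("root");
        // Without namespace repairing, we must declare the binding ourselves
        w.writeNamespace(qname.getPrefix(), qname.getNamespaceURI());
        // ... and compose the "prefix:localName" text by hand
        w.writeCharacters(qname.getPrefix() + ":" + qname.getLocalPart());
        w.writeEndElement();
        w.flush();
        w.close();
        return sw.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeQNameContent(new QName("urn:example", "item", "x")));
    }
}
```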

2. Reading simple values

Typed writing seems like a minor incremental improvement, nothing too drastic. Without type support you would just convert values to Strings; for example:

  tw.writeCharacters(String.valueOf(intValue));

would be functionally equivalent, although a less efficient way to achieve the same result.
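For comparison, here is a sketch of producing the same sample document with only the standard JDK `javax.xml.stream` API, converting every value to a String by hand. The class and method names are just for illustration; exact output details (such as the XML declaration) depend on which StAX implementation the factory picks up.

```java
import java.io.StringWriter;
import java.math.BigDecimal;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class UntypedWriteSketch {

    /** Writes the sample <entries> document using only String-based StAX calls. */
    public static String writeEntries(int id, boolean active, BigDecimal value)
            throws XMLStreamException {
        StringWriter sw = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
        w.writeStartDocument();
        w.writeStartElement("entries");
        w.writeStartElement("entry");
        // Every typed value has to be turned into a String first
        w.writeAttribute("id", String.valueOf(id));
        w.writeStartElement("active");
        w.writeCharacters(String.valueOf(active));
        w.writeEndElement(); // </active>
        w.writeStartElement("value");
        // toPlainString() avoids scientific notation such as "1E+3"
        w.writeCharacters(value.toPlainString());
        w.writeEndElement(); // </value>
        w.writeEndElement(); // </entry>
        w.writeEndElement(); // </entries>
        w.writeEndDocument();
        w.flush();
        w.close();
        return sw.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeEntries(1234, true, new BigDecimal("10.00")));
    }
}
```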

The reader side is where most of the action is, since code becomes more compact as well as more readable.
So, to read the content written by the code above, we could use something like:

  import java.io.StringReader;
  import java.math.BigDecimal;
  import javax.xml.stream.XMLInputFactory;
  import javax.xml.stream.XMLStreamConstants;
  import org.codehaus.stax2.typed.TypedXMLStreamReader;

  String docContent = sw.toString();
  TypedXMLStreamReader tr = (TypedXMLStreamReader)
      XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(docContent));
  tr.nextTag(); // to point to <entries>
  tr.require(XMLStreamConstants.START_ELEMENT, "", "entries"); // optional check
  tr.nextTag(); // to point to <entry>
  int id = tr.getAttributeAsInt(0); // or: tr.getAttributeAsInt(tr.getAttributeIndex(null, "id"))
  tr.nextTag(); // to point to <active>
  boolean isActive = tr.getElementAsBoolean(); // leaves reader at </active>
  tr.nextTag(); // to point to <value>
  BigDecimal value = tr.getElementAsDecimal(); // leaves reader at </value>
  tr.nextTag(); // closing </entry>
  tr.require(XMLStreamConstants.END_ELEMENT, "", "entry"); // optional check
  tr.nextTag(); // closing </entries>
  tr.close();
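For readers without Stax2 on the classpath, the same traversal can be sketched with the plain JDK `XMLStreamReader`, doing the String-to-value conversions manually. Names are illustrative only. Note that a malformed value here surfaces as a bare `NumberFormatException` without any location information, and that `Boolean.parseBoolean` silently maps anything other than "true" (in any case) to false.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class UntypedReadSketch {

    /** Simple holder for the three values in the sample document. */
    public static final class Entry {
        public final int id;
        public final boolean active;
        public final BigDecimal value;

        public Entry(int id, boolean active, BigDecimal value) {
            this.id = id;
            this.active = active;
            this.value = value;
        }
    }

    public static Entry readEntry(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            r.nextTag(); // <entries>
            r.require(XMLStreamConstants.START_ELEMENT, null, "entries");
            r.nextTag(); // <entry>
            // Look the attribute up by name, then convert the String by hand
            int id = Integer.parseInt(r.getAttributeValue(null, "id").trim());
            r.nextTag(); // <active>
            // getElementText() leaves the reader on </active>
            boolean active = Boolean.parseBoolean(r.getElementText().trim());
            r.nextTag(); // <value>
            BigDecimal value = new BigDecimal(r.getElementText().trim());
            r.nextTag(); // </entry>
            r.require(XMLStreamConstants.END_ELEMENT, null, "entry");
            return new Entry(id, active, value);
        } finally {
            r.close();
        }
    }
}
```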

3. So what's the Big Deal?

Ok, so the code above is slightly simpler than the alternative: for example, instead of:

  int id = tr.getAttributeAsInt(0);

we could have used:

  int id;
  String value = tr.getAttributeValue(0);
  try {
    id = Integer.parseInt(value);
  } catch (IllegalArgumentException iae) {
    throw new XMLStreamException("value '"+value+"' not an int", tr.getLocation());
  }

(We could leave out the try-catch block if we were happy with a bare IllegalArgumentException being thrown; but that also loses contextual information about where in the content the problem occurred, and with what input. Usually that is not acceptable.)
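Wrapped into a reusable helper, the try-catch pattern above might look roughly like the following. `StaxUtil` and `getIntAttribute` are hypothetical names, and only the JDK `javax.xml.stream` API is used; the helper reports both the offending value and the reader location, keeping the original parse exception as the nested cause.

```java
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public final class StaxUtil {

    private StaxUtil() { }

    /**
     * Hypothetical helper: parses the attribute at the given index as an int,
     * reporting the bad value and the reader location on failure.
     */
    public static int getIntAttribute(XMLStreamReader r, int index)
            throws XMLStreamException {
        String raw = r.getAttributeValue(index);
        try {
            // trim() because Integer.parseInt rejects surrounding whitespace
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException nfe) {
            throw new XMLStreamException(
                    "attribute '" + r.getAttributeLocalName(index)
                            + "': value '" + raw + "' not an int",
                    r.getLocation(), nfe);
        }
    }
}
```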

But maybe the added convenience is not that huge: most developers have written their own utility methods by now. Still, there are other benefits even for these simple types (benefits for non-simple types, which are more plentiful, will be covered later on):

  • As implied above, proper exception handling is a plus: a typed parser can provide more information about the actual problem (location, underlying data being converted)
  • The Typed Access API is based on XML Schema Datatypes: a typing system very similar to the Java type system, but not identical. Thus, the Typed Access API works better with other systems based on XML Schema Datatypes than basic JDK parsing/decoding methods do, which improves interoperability.
  • Typed Access API methods can be (and in the case of Woodstox, are) more efficient than DIY alternatives. Based on initial testing, processing throughput can increase significantly even for the simplest of types (like booleans and ints): currently by up to 20 - 30
  • Code is a bit more readable, since methods explicitly state what is expected
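As one concrete illustration of the XML Schema vs Java mismatch, consider booleans. The xs:boolean lexical space is "true", "false", "1" and "0" (case-sensitive, with surrounding whitespace collapsed), whereas `Boolean.parseBoolean` accepts "TRUE" but quietly returns false for "1". The decoder below is only a sketch of the schema rules, not the Woodstox implementation, and `String.trim()` is used as an approximation of whitespace collapsing.

```java
public final class XsBooleanSketch {

    private XsBooleanSketch() { }

    /** Decodes a value following xs:boolean lexical rules (sketch). */
    public static boolean parseXsBoolean(String lexical) {
        String s = lexical.trim(); // approximates whiteSpace="collapse"
        switch (s) {
            case "true":
            case "1":
                return true;
            case "false":
            case "0":
                return false;
            default:
                throw new IllegalArgumentException(
                        "'" + lexical + "' is not a valid xs:boolean");
        }
    }

    public static void main(String[] args) {
        System.out.println(Boolean.parseBoolean("1"));    // false: only "true" counts
        System.out.println(Boolean.parseBoolean("TRUE")); // true: case-insensitive
        System.out.println(parseXsBoolean(" 1 "));        // true
    }
}
```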

4. Next Steps

Ok, so far so good. But let's consider this a warm-up act before moving on to "advanced" types (arrays and binary content) as well as custom decoding.

About me

  • I am known as Cowtowncoder