Friday, November 23, 2007

W3C Schema Validation with Woodstox

To All You Schema Lovers

(... yes, both of you)

OK, so maybe not many software developers truly love W3C Schema, deep down in their cold, cold hearts. It may be ugly and bulky (and unlikely to grow into a swan, too!), but it has its uses. It serves as the data typing language for things like SOAP, and occasionally it is even useful for its original raison d'etre: validating XML documents. So if the earlier validation support in Woodstox (DTD, RelaxNG) was not enough, you can now also validate documents you read (and write!) against W3C Schemas. This is possible starting with Woodstox 4.0, including the first pre-4.0 preview release, 3.9.0 (fresh out of the oven).

So how do I use it?

If you have been following this blog for a while, you may recall that this has already been covered. The same Stax2 API is used for all validation, whether reader- or writer-side and whichever supported schema language you pick, so all you have to do is indicate the correct schema type and then validate exactly the same way as you would against a RelaxNG schema. For others, here are some helpful pointers:

Essentially it all boils down to these simple steps:

  1. Get a schema factory that knows how to parse W3C Schema instances.
  2. Ask the factory nicely to parse a schema document and return the resulting Stax2 validation schema object (be sure to ask very nicely, otherwise it'll insist you provide something less daft, like a RelaxNG schema instead! Sending a small bottle of Grand Marnier or a PayPal donation to the Woodstox author might help as well).
  3. Construct the stream reader/writer, and tell it to use the schema object for validation.
  4. Read/write the XML content; this is needed because the validator only gets called as content is read or written.

Which might look something like:

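  import java.io.FileInputStream;
  import java.net.URL;
  import javax.xml.stream.XMLInputFactory;
  import org.codehaus.stax2.XMLInputFactory2;
  import org.codehaus.stax2.XMLStreamReader2;
  import org.codehaus.stax2.validation.*;

  // 1. Get a schema factory that can parse W3C Schema
  XMLValidationSchemaFactory sf = XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA);
  // 2. Parse the schema document into a reusable schema object
  XMLValidationSchema vs = sf.createSchema(new URL("http://www.w3.org/schema/sample.xsd"));
  // 3. Construct the reader (cast the factory first, then the reader) and attach the schema
  XMLInputFactory2 ifact = (XMLInputFactory2) XMLInputFactory.newInstance();
  XMLStreamReader2 sr = (XMLStreamReader2) ifact.createXMLStreamReader(new FileInputStream("mydoc.xml"));
  sr.validateAgainst(vs);
  // 4. Read through the content; validation happens as events are read
  try {
    while (sr.hasNext()) {
      sr.next();
    }
    System.out.println("Validated ok!");
  } catch (XMLValidationException ve) {
    System.err.println("Validation problem: "+ve);
  }
  sr.close();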

Or something.

And for something quite cool, try the same when you are writing XML content. Instead of just catching the crap some other system sends you (by diligently validating incoming content), how about doing some more due diligence: validate your own output and avoid sending garbage to others!
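
Writer-side validation might look roughly like the sketch below. It assumes Woodstox (with its MSV-based validation support) is the StAX implementation found on the classpath, so that the writer can be cast to XMLStreamWriter2. The class name, the `validates` helper and the tiny inline schema (a single `note` element holding text) are made up for illustration only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import org.codehaus.stax2.XMLStreamWriter2;
import org.codehaus.stax2.validation.XMLValidationException;
import org.codehaus.stax2.validation.XMLValidationSchema;
import org.codehaus.stax2.validation.XMLValidationSchemaFactory;

public class WriterValidationSketch {
    // Hypothetical schema for this example: a <note> root element containing a string
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='note' type='xs:string'/>"
      + "</xs:schema>";

    // Writes a one-element document with the given root name while validating it.
    // Returns true if the output passed validation, false if the validator objected.
    static boolean validates(String rootName) {
        try {
            XMLValidationSchemaFactory sf = XMLValidationSchemaFactory.newInstance(
                XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA);
            XMLValidationSchema schema = sf.createSchema(new StringReader(XSD));

            StringWriter out = new StringWriter();
            // Assumption: the factory found here is Woodstox, so the cast succeeds
            XMLStreamWriter2 sw = (XMLStreamWriter2)
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            sw.validateAgainst(schema);

            // The validator gets called as each piece of content is written
            sw.writeStartDocument();
            sw.writeStartElement(rootName);
            sw.writeCharacters("hello");
            sw.writeEndElement();
            sw.writeEndDocument();
            sw.close();
            return true;
        } catch (XMLValidationException ve) {
            // Output did not conform to the schema
            return false;
        } catch (XMLStreamException e) {
            // Anything else (bad schema, I/O trouble) is unexpected in this sketch
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("note:  " + validates("note"));
        System.out.println("bogus: " + validates("bogus"));
    }
}
```

Just as on the reader side, the schema object is only attached once; after that, every write call is checked, so a problem surfaces at the call that produced the offending content rather than at the end of the document.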

Thursday, November 22, 2007

Woodstox 3.2.3 released

One more for the road...

Yet another maintenance release was done from the Woodstox 3.2 branch, mostly to include one critical bug fix (the other 3 changes were minor bug fixes or enhancements). For most users there is no urgent need to upgrade; the exception is users who use stream writers in namespace repairing mode. Check out the release notes for the full list of changes.
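
In case "namespace repairing mode" does not ring a bell: it is the standard StAX output setting in which the writer adds any missing namespace declarations by itself. The sketch below only shows how that mode is switched on using the plain javax.xml.stream API; it does not try to reproduce the bug fixed in this release, and the class name, prefix and namespace URI are made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class RepairingModeSketch {
    // Writes a namespaced element without ever declaring the namespace explicitly
    static String writeRepaired() {
        try {
            XMLOutputFactory of = XMLOutputFactory.newInstance();
            // Ask the writer to generate namespace declarations as needed
            of.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES, Boolean.TRUE);

            StringWriter out = new StringWriter();
            XMLStreamWriter sw = of.createXMLStreamWriter(out);
            // Note: no writeNamespace() call; the writer is expected to add the declaration
            sw.writeStartElement("ex", "root", "urn:example");
            sw.writeCharacters("text");
            sw.writeEndElement();
            sw.writeEndDocument();
            sw.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeRepaired());
    }
}
```

If you never set this property on your output factory, you are most likely not using repairing mode.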

The next release should be the first pre-4.0 version, which will finally feature a W3C Schema validation option, based on MSV, which was already used for RelaxNG validation.

About me

  • I am known as Cowtowncoder