Friday, September 18, 2009

Typed Access API tutorial, part III/b: binary data, server-side

(note: this is part B of "Typed Access API tutorial: binary data"; first part can be found here)

1. Server-side

After implementing the client, let's next implement matching sample service that simply reads all files from a directory and creates download message that contains all files along with checksums for verifying their correctness (in real use case, those would probably be pre-computed). Simplest way to deploy service is as a Servlet-based web application; a single class and matching web.xml will do the trick.

Resulting code is meant to just show how (relatively) simple handling of binary data is -- obviously a real client and service would have much more checking for error cases, as well as for authentication, authorization, namespacing to avoid collision and so on.

Full source code can be found from Woodstox source code repository (see 'src/samples/') but here is the beef:

    public void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws IOException
        try {
        } catch (XMLStreamException e) {
            throw new IOException(e);

    final static String DIGEST_TYPE = "SHA"; 

private void writeFileContentsAsXML(OutputStream out) throws IOException, XMLStreamException { XMLStreamWriter2 sw = (XMLStreamWriter2) _xmlOutputFactory.createXMLStreamWriter(out); sw.writeStartDocument(); sw.writeStartElement("files"); byte[] buffer = new byte[4000]; MessageDigest md; try { md = MessageDigest.getInstance(DIGEST_TYPE); } catch (Exception e) { // no such hash type? throw new IOException(e); } for (File f : _downloadableFiles.listFiles()) { sw.writeStartElement("file"); sw.writeAttribute("name", f.getName()); sw.writeAttribute("checksumType", DIGEST_TYPE); FileInputStream fis = new FileInputStream(f); int count; while ((count = != -1) { md.update(buffer, 0, count);
// note: can write separate chunks without problems sw.writeBinary(buffer, 0, count); } fis.close(); sw.writeEndElement(); // file sw.writeStartElement("checksum"); sw.writeBinaryAttribute("", "", "value", md.digest()); sw.writeEndElement(); // checksum } sw.writeEndElement(); // files sw.writeEndDocument(); sw.close(); }

As with the client, there really isn't anything too special here. Just the usual service, with bit of Stax2 Typed Access API usage.

I briefly tested this by bundling it up as a web app (if you want to do the same, run Ant target "war.samples" in Woodstox trunk), running web app under Jetty 6.1, and accessing from both web browser and via BinaryClient class. Worked as expected right away (which, granted, was somewhat unexpected... usually there are minor tweaks needed, but not today).

2. Output

Just to give an idea of what results should look like, here's what I can see when download a single file (

<?xml version='1.0' encoding='UTF-8'?>
<files><file name="" checksumType="SHA">IyEvYmluL3NoCgojIExldCdzIGxpbWl0IG1lbW9yeSwgZm9yIHBlcmZvcm1hbmNlIHRlc3RzIHRv IGFjY3VyYXRlbHkgY2FwdHVyZSBHQyBvdmVyaGVhZAoKIyAtRGphdmEuY29tcGlsZXI9IC1jbGll bnQgXApqYXZhIC1YWDpDb21waWxlVGhyZXNob2xkPTEwMDAgLVhteDQ4bSAtWG1zMTZtIC1zZXJ2 ZXJcCiAtY3AgbGliL3N0YXgtYXBpLTEuMC4xLmphcjpsaWIvc3RheF9yaS5qYXJcCjpsaWIvbXN2 L1wqXAo6bGliL2p1bml0L2p1bml0LTMuOC4xLmphclwKOmJ1aWxkL2NsYXNzZXMvd29vZHN0b3g6 YnVpbGQvY2xhc3Nlcy9zdGF4MlwKOnRlc3QvY2xhc3NlczpidWlsZC9jbGFzc2VzL3Rvb2w6YnVp bGQvY2xhc3Nlcy9zYW1wbGVzXAogJCoK</file><checksum value="qAZIQ6GDUJYRgiubW/H+5GZaWg0="/></files>

3. More to known about Base64 variants

One more thing to note is the existence of multiple slightly incompatible Base64 variants (see "URL Applications" section). So which one does Typed Access API use?

The one you define it to use, of course! Stax2 API actually allows caller to specify the variant to use -- sample code just happens to use the default variant (i.e. uses methods that just call alternatives that do take a Base64Variant argument). Stax2-defined Base64 variants (from class 'org.codehaus.stax2.typed.Base64Variants') are:

  • MIME: this is what is usually considered "the base64" variant: uses default alphabet, requires padding, and uses 76-character lines with linefeed for content. This is the default variant used for element content.
  • MIME_NO_LINEFEEDS is similar to MIME, but does not split output in lines -- this is the default variant used for attribute values (due to verbosiveness caused by encoding linefeeds in XML attribute values)
  • PEM is similar to MIME, but mandates shorter (60 character) line length
  • MODIFIED_FOR_URL: uses alternate alphabet (hyphen and underscore instead of plus and slash), does not use padding or line splitting.

And these are all implemented by Woodstox. In addition, one can use custom encodings by implementing custom Base64Variant object and passing that explicitly to base64-binary read- and write-methods.

4. Performance?

Beyond simple usage shown so far, what more is there to know about handling binary data?

One open question is performance: how much faster is Typed Access API, compared to using alternatives like XMLStreamReader.getElementText() followed by decode using, say, JakartaCommons' base64 codec. There are no numbers yet, but producing some will be one of high priority items on my "things to research for Blog" list.

blog comments powered by Disqus

Sponsored By

Related Blogs

(by Author (topics))

Powered By

About me

  • I am known as Cowtowncoder
  • Contact me
Check my profile to learn more.