Tuesday, September 08, 2009

Typed Access API tutorial, part III/a: binary data, client-side

(author's note: oh boy, this last piece of the "Typed Access API series" has been long coming -- apologies, and "better late than never")

Now that we have tackled most of the Stax2 Typed Access API (reading and writing simple values, arrays), let's consider the last remaining part: that of reading and writing base64-encoded binary data. For this installment, let's implement a simple web service that can be used for downloading files, as well as client to use that service.

Use of XML for such purpose may seem bit contrived, but there are other valid use cases for binary-in-xml (even if the example wasn't): for example, it may well make sense to embed small images (like icons), digital signatures, encryption keys and other non-textual data within documents. Sometimes convenience of inlining binary content within message is worth the modest overhead (base64 imposes +33% storage overhead, and similar processing overhead).
For example, in our example, we can embed multiple files with associated metadata quite easily without having to split the logical document. But both client and server can still handle files one-by-one with streaming interfaces, meaning that memory usage need not grow without bounds.

Finally, unlike many other xml processing packages, Woodstox does not cut corners when it comes to processing efficiency: base64 processing implementation is a significant improvement over using existing third-party base64 codes on other processing APIs (regular SAX, Stax or DOM).

So much for the philosophic part of why to use (or not to use) xml. Let's have look at a simple implementation to show binary content handling pieces that we need, along with a bit of glue to make example code work.
(note: source code is also accessible)

1. Message format

Here is the simple xml message format we will be using:

  <files>
<file name="test.jpg" checksumType="SHA">... base64 encoded content ...</file>
<checksum value="...base64 encoded hash of content..." />
<!-- ... and more files, if need be... -->
</files>

That is, a single message contains one or more files, each with associated checksum. Checksym is used to verify that contents were passed unmodified (as opposed to being corrupted by transfer). Simple but functional.

2. Client-side

So let's start with sample client code; code downloads bunch of files from the service (for now assuming URL determines set of files we'll get with some criteria).

For this example we will just use the regular http client that JDK comes equipped with (which actually works pretty well for many use cases -- for others, Jakarta httpclient is the cat's meow).
Full source code can be found at Woodstox SVN repository (under 'src/samples') but here's the interesting Client method:


public List<File> fetchFiles(URL serviceURL) throws Exception
{
  List<File> files = new ArrayList<File>();
URLConnection conn = serviceURL.openConnection(); conn.setDoOutput(false); // only true when POSTing conn.connect(); // note, should check 'if (conn.getResponseCode() != 200) ...' // Ok, let's read it then... (note: StaxMate could simplify a lot!) InputStream in = conn.getInputStream(); XMLStreamReader2 sr = (XMLStreamReader2) XMLInputFactory.newInstance().createXMLStreamReader(in); sr.nextTag(); // to "files" File dir = new File("/tmp"); // for linux... byte[] buffer = new byte[4000]; while (sr.nextTag() != XMLStreamConstants.END_ELEMENT) { // one more 'file' String filename = sr.getAttributeValue("", "name"); String csumType = sr.getAttributeValue("", "checksumType"); File outputFile = new File(dir, filename); FileOutputStream out = new FileOutputStream(outputFile); files.add(outputFile); MessageDigest md = MessageDigest.getInstance(csumType); int count; // Read binary contents of the file, calc checksum and write while ((count = sr.readElementAsBinary(buffer, 0, buffer.length)) != -1) { md.update(buffer, 0, count); out.write(buffer, 0, count); } out.close(); // Then verify checksum sr.nextTag(); byte[] expectedCsum = sr.getAttributeAsBinary(sr.getAttributeIndex("", "value")); byte[] actualCsum = md.digest(); if (!Arrays.equals(expectedCsum, actualCsum)) { throw new IllegalArgumentException("File '"+filename+"' corrupt: content checksum does not match expected"); } sr.nextTag(); // to match closing "checksum" } return files; }

Much of the code deals with connecting to the service; actual access is rather simple; only complexity comes from streamability of API (i.e. you read chunks of binary data, instead of reading the whole thing).

What is left, then, is the server side... which will follow shortly (I swear, won't take months this time)

blog comments powered by Disqus

Sponsored By


Related Blogs

(by Author (topics))

Powered By

About me

  • I am known as Cowtowncoder
  • Contact me at@yahoo.com
Check my profile to learn more.