Wednesday, September 23, 2009

JSON data binding performance: Jackson vs Google-gson vs BerliOS JSON Tools

UPDATED: see a more up-to-date version here

Earlier I published some results on the performance of "simple" JSON parsing -- simple meaning that processing is manual, which allows testing a wide variety of the Java+JSON tools available. This included everything from ultra-fast streaming processors (like Jackson) all the way to the "good old JSON.org" parser. But it also excluded at least one potentially good tool (google-gson), since that test requires "untyped" access: the ability to traverse an arbitrary JSON structure.

Also: more and more access is nowadays done using a more convenient class of tools, called data binding (or mapping; or sometimes serialization) tools (libraries, packages). In such cases application just asks library to convert JSON to a Java Object (or vice versa), and that's about it. Very convenient; especially for strongly typed web services.

So, with that background, let's see what the performance characteristics of the available tools are.

1. JSON Data Binding: Contestants

Now, the list of tools that allow doing this is somewhat limited. I am aware of the following: Jackson, google-gson, and BerliOS JSON Tools (Json-tools).

Given that all of them can do conversions with similar ease (at least for simple Java types), is there much difference in performance? To figure this out, I will be using the somewhat incorrectly named StaxBind (really, it should be renamed PojoBind or something) sub-project of Woodstox. The data to bind is a simple rendition of tabular data: a List of beans that contain personal information (name, address and so on), with document size (for this test) being about 2 kilobytes.

2. Results!

And yes, indeed, results look vaguely familiar (see here, for example). Considering the "bigger is better" aspect -- value measured, "tps", is number of documents read, written, or read-modify-written per second -- difference from slowest (google-gson) to fastest (Jackson) is a solid order of magnitude.

Data Binding Performance Graph

Looks like Jackson is still the King of JSON, regarding processing speed -- and by a ridiculously high margin too... If you are already a Jackson user, you may want to congratulate yourself on choosing a very efficient (even green! save those cycles!) tool. A pat on your back might be warranted as well. To put performance in perspective: being able to read ten thousand 2k documents per second (throughput of about 20 megabytes per second) on an almost obsolete AMD Athlon based PC (my home PC) is not too shabby; and all this with little if any glue code.

Actually, as you can see, there is one (and only one!) thing faster than Jackson Data Mapper: "raw" hand-written data mapper. And even that is just a bit faster; probably only worth the extra hand-written code for high-volume use cases, or where number of POJO types is very limited.
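For readers who want a feel for what the "tps" number means, here is a minimal sketch of how such a figure could be measured and converted to megabytes per second. This is not the StaxBind harness: the class name, method names and the fake workload are all made up for illustration, and a serious benchmark would also need warm-up rounds and repeated runs.

```java
public class TpsSketch {
    /** Runs the task repeatedly for roughly the given time; returns operations per second. */
    static double measureTps(Runnable task, long durationMillis) {
        long start = System.nanoTime();
        long deadline = start + durationMillis * 1_000_000L;
        long ops = 0;
        do {
            task.run();
            ++ops;
        } while (System.nanoTime() < deadline);
        double elapsedSeconds = (System.nanoTime() - start) / 1e9;
        return ops / elapsedSeconds;
    }

    /** Converts documents/second to megabytes/second (1 MB taken as 1,000,000 bytes). */
    static double toMegabytesPerSecond(double tps, int documentSizeBytes) {
        return tps * documentSizeBytes / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Stand-in for a ~2 kB document; a real test would bind actual JSON here
        final byte[] doc = new byte[2000];
        final long[] sink = new long[1]; // keeps the fake work from being optimized away
        Runnable fakeRead = () -> {
            long sum = 0;
            for (byte b : doc) {
                sum += b;
            }
            sink[0] += sum;
        };
        double tps = measureTps(fakeRead, 500);
        System.out.printf("%.0f tps, about %.1f MB/s%n",
                tps, toMegabytesPerSecond(tps, doc.length));
    }
}
```

With the numbers quoted above, `toMegabytesPerSecond(10000, 2000)` gives the 20 megabytes per second mentioned in the text.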

3. Some details

Given the big difference in perceived performance, avid readers might be interested in reproducing results, or at least perusing source code. All code is within the "staxbind" module in the primary Codehaus Woodstox SVN repository, and the author (me!) can be contacted for more details (for some reason the Codehaus interface sometimes makes access a bit harder than it needs to be), questions and suggestions.

But there is nothing particularly complicated about the code; here's what the core methods for the tested packages actually look like (interfaces are defined by the StaxBind package itself; template T translates to "DbData" (POJO type)).

3.1 Jackson test code

Jackson code is the simplest of the alternatives, as it supports direct streaming access:

    public class StdJacksonConverter<T> extends StdConverter<T>
    {
        final ObjectMapper _mapper = new ObjectMapper();
        //...
        public T readData(InputStream in) throws IOException {
            return _mapper.readValue(in, _itemClass);
        }
        public int writeData(OutputStream out, T data) throws Exception {
            JsonGenerator jg = _jsonFactory.createJsonGenerator(out, JsonEncoding.UTF8);
            _mapper.writeValue(jg, data);
            jg.close();
            return -1;
        }
    }

3.2 Json-tools test code

Test code here needs a couple more lines, since there is no way to directly go from POJOs to stream/String and back. But nothing excessive.

    public class StdJsonToolsConverter<T> extends StdConverter<T>
    {
        final JSONMapper _mapper = new JSONMapper();
        //...
        public T readData(InputStream in) throws Exception {
            // two-step process: parse to JSON value, bind to POJO
            JSONParser jp = new JSONParser(in);
            JSONValue v = jp.nextValue();
            return (T) _mapper.toJava(v, _itemClass);
        }
        public int writeData(OutputStream out, T data) throws Exception {
            JSONValue v = _mapper.toJSON(data);
            String jsonStr = v.render(false);
            OutputStreamWriter w = new OutputStreamWriter(out, "UTF-8");
            w.write(jsonStr);
            w.flush();
            return -1;
        }
    }

3.3 Google-gson test code

This test code is a bit shorter than the Json-tools one, since the package does not use an intermediate tree form. Surprisingly this does not seem to translate to better performance, as the package ends up taking its time doing conversions. On a positive note, there should be plenty of room for improvement in this area...

    public class StdGsonConverter<T> extends StdConverter<T>
    {
        final Gson _gson = new Gson();

        public T readData(InputStream in) throws IOException {
            return _gson.fromJson(new InputStreamReader(in, "UTF-8"), _itemClass);
        }

        public int writeData(OutputStream out, T data) throws Exception {
            OutputStreamWriter w = new OutputStreamWriter(out, "UTF-8");
            this._gson.toJson(data, w);
            w.flush();
            return -1;
        }
    }

Saturday, September 19, 2009

BAM Sandwich (Bacon, Avocado, Mayonnaise)

Many people are familiar with the "BLT Sandwich" -- an acceptable if mediocre bread-based culinary construct that mixes good stuff (Bacon) with acceptable (Tomato) and trivial (Lettuce).

But why settle for such a mediocre concoction? Shouldn't there be a sandwich that focuses on essentials and provides more (ful)filling eating experience? And complements a good drink of beer exquisitely?

1. Wham! BAM!

Yes. There should. And -- more importantly -- there is!

That is why I feel duty-bound to declare the invention of a new sandwich: Bacon-Avocado-Mayonnaise Sandwich, henceforth known as the "BAM Sandwich". It not only tastes awesomer-er than a can of Brawn-do, but also sounds kick-ass.
(... and it's got the electrolytes your muscles crave!)

2. How?

Here's how you can construct a tasty instance of (Tatu's World-Famous) BAM Sandwich:

  1. Fry the bacon on a frying pan or skillet; move to side once crisp (can rest on a paper towel). NOTE: Do NOT throw away the melted tasty fat! (see step 3).
  2. Make the avocado spread; use ripe avocados, mix with bit of lemon juice, salt, pepper (white or black) or tabasco, and optionally some sour cream. Result is essentially something between mashed avocado and guacamole, depending on your preference.
  3. Fry 2 slices of toast (regular cheap-o sliced bread; or, for bonus points, olive-oil-rosemary or potato bread) using the bacon fat from step 1; preferably use the same frying pan or skillet as you used for frying bacon.
  4. Construct the sandwich:
    1. Start with one fried slice of bread
    2. Spread some Avocado spread on the slice
    3. Stack (as much) bacon (as you want) on top of avocado spread
    4. Spread some mayonnaise on the other slice
    5. Add the other slice on top of bacon, mayonnaise side facing bacon

Once properly constructed, enjoy with a good glass of your favorite beverage; like a frothy pint of Hefeweizen such as Blue Moon (lighter beers seem to go better with somewhat dense food like BAMwich!).

3. Random helpful preparation notes

Here are some additional notes on preparing a Good Solid BAM:

  • Bacon must absolutely be fried fully crisp. Floppy bacon does not a proper BAM sandwich make!
  • You can use as many bacon strips per sandwich as you want, but minimum is 3 strips for an adult male. For ladies, the legal minimum limit is 2. Small kids are not allowed to touch this tasty treat (ours do not even like it! Those ungrateful little...) -- in fact, a rule of thumb is that if you can't drink the accompanying beer, you are not to eat the sandwich.
  • If you absolutely must (by direct doctor's order, for example) reduce your saturated fat intake, you can be given exemption from having to fry the toast: regular toasting can be accepted as a low-fat alternative. But note: if you do this for anything but life-threatening medical reasons, you will totally lose your man-food street cred and run the risk of growing a pair of bunny ears.

Feel free to share your additional tips, in the form of comments, trackbacks or emails (heck, even clicking on an ad you can see near this entry counts as a useful additional tip! Har har, I'll be here all week, thank you very many!)

4. Musings on Food Terminology and Coining of new Phrases

Due to high degree of compatibility between BAM and optimal male diet & taste, I think this soon-to-be classic sandwich can be called a manwich, if there is such a word. And if there is no such word yet, there will be.

5. Unhelpful side note

It has been brought to my attention that someone has previously tried to tie acronym "BAM" with sequence of words "Bacon, Arugula and Mango". Yuck. If there is any justice in the universe, person responsible for such disgrace is forced to eat his or her own dog food, in substantial quantities.
And with no Blue Moon to wash it down with.


Friday, September 18, 2009

Typed Access API tutorial, part III/b: binary data, server-side

(note: this is part B of "Typed Access API tutorial: binary data"; first part can be found here)

1. Server-side

After implementing the client, let's next implement matching sample service that simply reads all files from a directory and creates download message that contains all files along with checksums for verifying their correctness (in real use case, those would probably be pre-computed). Simplest way to deploy service is as a Servlet-based web application; a single class and matching web.xml will do the trick.

Resulting code is meant to just show how (relatively) simple handling of binary data is -- obviously a real client and service would have much more checking for error cases, as well as for authentication, authorization, namespacing to avoid collision and so on.

Full source code can be found in the Woodstox source code repository (see 'src/samples/BinaryService.java') but here is the beef:


    public void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws IOException
    {
        resp.setContentType("text/xml");
        try {
            writeFileContentsAsXML(resp.getOutputStream());
        } catch (XMLStreamException e) {
            throw new IOException(e);
        }
    }

    final static String DIGEST_TYPE = "SHA"; 

    private void writeFileContentsAsXML(OutputStream out)
        throws IOException, XMLStreamException
    {
        XMLStreamWriter2 sw = (XMLStreamWriter2) _xmlOutputFactory.createXMLStreamWriter(out);
        sw.writeStartDocument();
        sw.writeStartElement("files");
        byte[] buffer = new byte[4000];
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(DIGEST_TYPE);
        } catch (Exception e) { // no such hash type?
            throw new IOException(e);
        }
        for (File f : _downloadableFiles.listFiles()) {
            sw.writeStartElement("file");
            sw.writeAttribute("name", f.getName());
            sw.writeAttribute("checksumType", DIGEST_TYPE);
            FileInputStream fis = new FileInputStream(f);
            int count;
            while ((count = fis.read(buffer)) != -1) {
                md.update(buffer, 0, count);
                // note: can write separate chunks without problems
                sw.writeBinary(buffer, 0, count);
            }
            fis.close();
            sw.writeEndElement(); // file
            sw.writeStartElement("checksum");
            sw.writeBinaryAttribute("", "", "value", md.digest());
            sw.writeEndElement(); // checksum
        }
        sw.writeEndElement(); // files
        sw.writeEndDocument();
        sw.close();
    }

As with the client, there really isn't anything too special here. Just the usual service, with a bit of Stax2 Typed Access API usage.

I briefly tested this by bundling it up as a web app (if you want to do the same, run Ant target "war.samples" in Woodstox trunk), running web app under Jetty 6.1, and accessing from both web browser and via BinaryClient class. Worked as expected right away (which, granted, was somewhat unexpected... usually there are minor tweaks needed, but not today).

2. Output

Just to give an idea of what results should look like, here's what I see when I download a single file (run.sh):


<?xml version='1.0' encoding='UTF-8'?>
<files><file name="run.sh" checksumType="SHA">IyEvYmluL3NoCgojIExldCdzIGxpbWl0IG1lbW9yeSwgZm9yIHBlcmZvcm1hbmNlIHRlc3RzIHRv IGFjY3VyYXRlbHkgY2FwdHVyZSBHQyBvdmVyaGVhZAoKIyAtRGphdmEuY29tcGlsZXI9IC1jbGll bnQgXApqYXZhIC1YWDpDb21waWxlVGhyZXNob2xkPTEwMDAgLVhteDQ4bSAtWG1zMTZtIC1zZXJ2 ZXJcCiAtY3AgbGliL3N0YXgtYXBpLTEuMC4xLmphcjpsaWIvc3RheF9yaS5qYXJcCjpsaWIvbXN2 L1wqXAo6bGliL2p1bml0L2p1bml0LTMuOC4xLmphclwKOmJ1aWxkL2NsYXNzZXMvd29vZHN0b3g6 YnVpbGQvY2xhc3Nlcy9zdGF4MlwKOnRlc3QvY2xhc3NlczpidWlsZC9jbGFzc2VzL3Rvb2w6YnVp bGQvY2xhc3Nlcy9zYW1wbGVzXAogJCoK</file><checksum value="qAZIQ6GDUJYRgiubW/H+5GZaWg0="/></files>

3. More to know about Base64 variants

One more thing to note is the existence of multiple slightly incompatible Base64 variants (see "URL Applications" section). So which one does Typed Access API use?

The one you define it to use, of course! Stax2 API actually allows caller to specify the variant to use -- sample code just happens to use the default variant (i.e. uses methods that just call alternatives that do take a Base64Variant argument). Stax2-defined Base64 variants (from class 'org.codehaus.stax2.typed.Base64Variants') are:

  • MIME: this is what is usually considered "the base64" variant: uses default alphabet, requires padding, and uses 76-character lines with linefeed for content. This is the default variant used for element content.
  • MIME_NO_LINEFEEDS is similar to MIME, but does not split output in lines -- this is the default variant used for attribute values (due to the verbosity caused by encoding linefeeds in XML attribute values)
  • PEM is similar to MIME, but mandates shorter (64 character) line length
  • MODIFIED_FOR_URL: uses alternate alphabet (hyphen and underscore instead of plus and slash), does not use padding or line splitting.

And these are all implemented by Woodstox. In addition, one can use custom encodings by implementing custom Base64Variant object and passing that explicitly to base64-binary read- and write-methods.
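To make the differences between such variants concrete, here is a small sketch that uses the JDK's own `java.util.Base64` class (Java 8 and later), not the Stax2 `Base64Variant` API. The JDK encoders are only rough analogues of the variants listed above (for instance, the JDK MIME encoder separates lines with CR+LF), and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64VariantSketch {
    /** Standard alphabet, padding, 76-character lines (JDK uses CR+LF between lines). */
    static String mime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    /** Standard alphabet, padding, no line splitting. */
    static String mimeNoLinefeeds(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Like MIME, but with caller-chosen line length and a plain linefeed as separator. */
    static String wrapped(byte[] data, int lineLength) {
        byte[] linefeed = "\n".getBytes(StandardCharsets.US_ASCII);
        return Base64.getMimeEncoder(lineLength, linefeed).encodeToString(data);
    }

    /** URL-safe alphabet (hyphen and underscore), no padding, no line splitting. */
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; ++i) {
            data[i] = (byte) (i * 7);
        }
        System.out.println("MIME:\n" + mime(data));
        System.out.println("No linefeeds:\n" + mimeNoLinefeeds(data));
        // 16 is an arbitrary short length, chosen only to make wrapping visible
        System.out.println("Wrapped at 16:\n" + wrapped(data, 16));
        System.out.println("URL-safe:\n" + urlSafe(data));
    }
}
```

The point is the same as with Stax2: the bytes are identical, only alphabet, padding and line splitting differ, so reader and writer need to agree on the variant.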

4. Performance?

Beyond simple usage shown so far, what more is there to know about handling binary data?

One open question is performance: how much faster is the Typed Access API, compared to using alternatives like XMLStreamReader.getElementText() followed by decoding using, say, the Jakarta Commons base64 codec? There are no numbers yet, but producing some will be one of the high priority items on my "things to research for Blog" list.
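For reference, the "alternative" approach mentioned above could look roughly like the sketch below. It uses only the JDK: plain Stax `XMLStreamReader.getElementText()` plus `java.util.Base64` (Java 8 and later), which is swapped in here for the Jakarta Commons codec purely to keep the example free of external dependencies. The class and method names are made up for illustration, and this is not a benchmark, just the shape of the code that would be measured.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ElementTextBaselineSketch {
    /** Reads the first 'file' element under 'files' and decodes its base64 content. */
    static byte[] readFirstFileContent(String xml) {
        try {
            XMLStreamReader sr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            sr.nextTag(); // to "files"
            sr.nextTag(); // to first "file"
            // Whole element content is buffered as a single String here;
            // there is no chunk-by-chunk access as with readElementAsBinary()
            String base64Text = sr.getElementText();
            sr.close();
            // MIME decoder tolerates line breaks within the content
            return Base64.getMimeDecoder().decode(base64Text);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<files><file name=\"hello.txt\" checksumType=\"SHA\">"
                + "aGVsbG8=</file></files>";
        byte[] content = readFirstFileContent(xml);
        System.out.println(new String(content, StandardCharsets.US_ASCII));
    }
}
```

Note the structural difference: this version has to hold the full encoded text and the full decoded array in memory at once, whereas the typed access calls can work through a small fixed buffer.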

Tuesday, September 08, 2009

Typed Access API tutorial, part III/a: binary data, client-side

(author's note: oh boy, this last piece of the "Typed Access API series" has been long coming -- apologies, and "better late than never")

Now that we have tackled most of the Stax2 Typed Access API (reading and writing simple values, arrays), let's consider the last remaining part: that of reading and writing base64-encoded binary data. For this installment, let's implement a simple web service that can be used for downloading files, as well as client to use that service.

Use of XML for such a purpose may seem a bit contrived, but there are other valid use cases for binary-in-xml (even if the example wasn't): for example, it may well make sense to embed small images (like icons), digital signatures, encryption keys and other non-textual data within documents. Sometimes the convenience of inlining binary content within a message is worth the modest overhead (base64 imposes +33% storage overhead, and similar processing overhead).
In our example, for instance, we can embed multiple files with associated metadata quite easily without having to split the logical document. But both client and server can still handle files one-by-one with streaming interfaces, meaning that memory usage need not grow without bounds.
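The +33% figure follows directly from how base64 works: every 3 input bytes become 4 output characters. A quick sketch to check the arithmetic with the JDK's `java.util.Base64` (Java 8 and later; class and method names here are made up for illustration, and line breaks, where a variant adds them, come on top of this):

```java
import java.util.Base64;

public class Base64OverheadSketch {
    /** Length of padded base64 output without line breaks: 4 * ceil(n / 3). */
    static long encodedLength(long inputBytes) {
        return 4 * ((inputBytes + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] raw = new byte[3000];
        String encoded = Base64.getEncoder().encodeToString(raw);
        // 3000 bytes in, 4000 characters out: one third more
        System.out.println(raw.length + " bytes -> " + encoded.length() + " chars"
                + " (formula says " + encodedLength(raw.length) + ")");
    }
}
```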

Finally, unlike many other xml processing packages, Woodstox does not cut corners when it comes to processing efficiency: its base64 processing implementation is a significant improvement over using existing third-party base64 codecs on other processing APIs (regular SAX, Stax or DOM).

So much for the philosophic part of why to use (or not to use) xml. Let's have a look at a simple implementation to show the binary content handling pieces that we need, along with a bit of glue to make the example code work.
(note: source code is also accessible)

1. Message format

Here is the simple xml message format we will be using:

    <files>
      <file name="test.jpg" checksumType="SHA">... base64 encoded content ...</file>
      <checksum value="...base64 encoded hash of content..." />
      <!-- ... and more files, if need be... -->
    </files>

That is, a single message contains one or more files, each with an associated checksum. The checksum is used to verify that contents were passed unmodified (as opposed to being corrupted by transfer). Simple but functional.
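The checksum part is independent of XML, so it can be sketched with the JDK alone. Below, content is read in chunks (so memory use stays flat), digested, and the digest is base64-encoded the way it would appear in the 'value' attribute. Assumptions to note: class and method names are made up for illustration, `java.util.Base64` needs Java 8 or later, and "SHA-1" is used as the algorithm name on the assumption that it is what the "SHA" checksum type resolves to.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class ChecksumSketch {
    /** Digests the stream chunk by chunk; returns the digest as base64 text. */
    static String base64Digest(InputStream in, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] buffer = new byte[4000];
            int count;
            while ((count = in.read(buffer)) != -1) {
                md.update(buffer, 0, count);
            }
            return Base64.getEncoder().encodeToString(md.digest());
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if content hashes to the checksum given as base64 text. */
    static boolean matches(byte[] content, String algorithm, String expectedBase64) {
        byte[] expected = Base64.getDecoder().decode(expectedBase64);
        String actualBase64 = base64Digest(new ByteArrayInputStream(content), algorithm);
        return MessageDigest.isEqual(expected, Base64.getDecoder().decode(actualBase64));
    }

    public static void main(String[] args) {
        byte[] content = "some file content".getBytes(StandardCharsets.UTF_8);
        String checksum = base64Digest(new ByteArrayInputStream(content), "SHA-1");
        System.out.println("checksum value: " + checksum);
        System.out.println("intact: " + matches(content, "SHA-1", checksum));
        content[0] ^= 1; // simulate corruption during transfer
        System.out.println("after corruption: " + matches(content, "SHA-1", checksum));
    }
}
```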

2. Client-side

So let's start with sample client code; the code downloads a bunch of files from the service (for now assuming the URL determines the set of files we'll get with some criteria).

For this example we will just use the regular http client that JDK comes equipped with (which actually works pretty well for many use cases -- for others, Jakarta httpclient is the cat's meow).
Full source code can be found at Woodstox SVN repository (under 'src/samples') but here's the interesting Client method:


    public List<File> fetchFiles(URL serviceURL) throws Exception
    {
        List<File> files = new ArrayList<File>();
        URLConnection conn = serviceURL.openConnection();
        conn.setDoOutput(false); // only true when POSTing
        conn.connect();
        // note, should check 'if (conn.getResponseCode() != 200) ...'

        // Ok, let's read it then... (note: StaxMate could simplify a lot!)
        InputStream in = conn.getInputStream();
        XMLStreamReader2 sr = (XMLStreamReader2) XMLInputFactory.newInstance().createXMLStreamReader(in);
        sr.nextTag(); // to "files"
        File dir = new File("/tmp"); // for linux...
        byte[] buffer = new byte[4000];

        while (sr.nextTag() != XMLStreamConstants.END_ELEMENT) { // one more 'file'
            String filename = sr.getAttributeValue("", "name");
            String csumType = sr.getAttributeValue("", "checksumType");
            File outputFile = new File(dir, filename);
            FileOutputStream out = new FileOutputStream(outputFile);
            files.add(outputFile);
            MessageDigest md = MessageDigest.getInstance(csumType);

            int count;
            // Read binary contents of the file, calc checksum and write
            while ((count = sr.readElementAsBinary(buffer, 0, buffer.length)) != -1) {
                md.update(buffer, 0, count);
                out.write(buffer, 0, count);
            }
            out.close();
            // Then verify checksum
            sr.nextTag();
            byte[] expectedCsum = sr.getAttributeAsBinary(sr.getAttributeIndex("", "value"));
            byte[] actualCsum = md.digest();
            if (!Arrays.equals(expectedCsum, actualCsum)) {
                throw new IllegalArgumentException("File '"+filename+"' corrupt: content checksum does not match expected");
            }
            sr.nextTag(); // to match closing "checksum"
        }
        return files;
    }

Much of the code deals with connecting to the service; actual access is rather simple; only complexity comes from streamability of API (i.e. you read chunks of binary data, instead of reading the whole thing).

What is left, then, is the server side... which will follow shortly (I swear, won't take months this time)

Thursday, September 03, 2009

Mo' On Cowtalk: 100 Entries per year; Even/Odd rule; rate me!

First things first: as you may have noticed, it is now possible to rate Blog entries here (thanks Haloscan for this update). Feel free to rate articles -- this will help me get a feel as to what works and what doesn't. Besides, it should be much quicker to click the star icon than write even a short comment.

1. Goal: 101 articles per year!

After tallying up number of entries I have written so far this year (about 80), I realized that I might be able to get total annual count to 3 digits. So just to improve odds that I actually will reach that goal, here's my public pledge: I will write more than 100 entries this year. As any good software engineer would document this, I'll add "write 101 blogs" on my TODO list. :-)

2. Limiting Fillers: Even/Odd rule

Sort of related to above: I will also do my best to follow what I call the "even/odd rule". From now on, I will try to write at least half of the articles on "hard" (as opposed to soft) topics: technical subjects; coding, design, architecture, or things directly related to such topics. This is the "even" part. The rest ("softies") can be about fluffier stuff; be that related to food, music or human insanity -- that's the "odd" part. Hopefully this will balance the competing goals of writing many entries (more fluff) and striving for technical relevance (approximation of quality); technical articles take longer to write, since they often entail having to write sample code or do measurements. Plus it's often more fun to write light tongue-in-cheek (or, foot in the mouth...) material; and fundamentally it just keeps me out of trouble.

With respect to technical entries: what makes this a little bit more challenging is that I have started to write more and more on the FasterXML documentation Wiki (esp. Jackson documentation -- check it out!). Nonetheless, until FasterXML gets an "official" Wiki, this will be my main recreational (and, possibly, technical) writing medium (only rivalled by my drivels on mailing lists).

Anyway: I am always looking for more feedback on my writing; so beyond starting to rate entries, please continue adding comments. They have been very useful, and hopefully will continue to be in future.

Tuesday, September 01, 2009

Project B3 (BlackBerry Beer) 2009

Some of you may have heard of my last beer-related project, BlackBerryBeer 2008. Due to unfortunate logistical problems (aka my laziness), the project failed. Year changing to 2009, making the project name obsolete, did not help either. In fact, the project failed badly enough that it is one of the very very few projects I don't even list on my Monster resume.

Project B3 2008 Post-Mortem

As any good followers of the Cult of Process, we (the "project team") decided to have a retrospective on what went wrong with the project.

Post-mortem findings include the following insights:

  • It is necessary to pick the blackberries before snow falls: failure to do so will make successful completion impractical, perhaps even impossible. (sidenote: this was the fatal blow to B3 2008! Triple Boo for snow!)
  • Figuring out details like who to use for actual manufacturing (and at what cost) is important. Turns out these things don't just magically sort themselves out (Double Boo for things that don't take care of themselves)
  • A concrete plan is needed for storing the resulting half a cask of custom-crafted beer -- our fridge cannot contain more than maybe a dozen sixpacks, which is less than half a cask. It was pointed out that it is possible to alleviate this problem a little by drinking more beers faster; and that fortunately our home has multiple bathrooms to help with the resulting logistics problem of excreting excess urinary liquids
  • Blackberry bushes have nasty thorns; and resulting bruises heal slowly (Boo for thorny plants!)

B3 2008 is Dead, Long Live BlackBerryBeer 2009 Project!

On a positive note, it was also determined that these problems will be overcome with this next-generation project. After all, the stakeholders are now exceedingly thirsty; the blackberry season is not yet over in the grand state of Washington; and all the itchy blackberry wounds have been fully healed by now (in fact, some new ones have been gained to further the goal of picking the dang berries; further strengthening our resolve for a successful outcome!)

Project Goals

One good thing about having a failed project is that usually you can reuse much of planning material; generally goals are reusable, sometimes even secondary artifacts like resourcing and scheduling.
This is the case here: goals have not changed a bit. We still want to:

  • Produce a batch of unique beer using some local ingredients (this is where Blackberries come into picture: after all, the only other plentiful local resource -- rain -- is not a particularly recognizable ingredient in the end)
  • Without having to handle the brewing part (as students we did this part -- it's fun, but only first couple of times; and we are well beyond that!)

We are confident that these goals will be met by the project; similar to how we were confident last year (turns out that optimism is, too, recyclable! Hooray for optimism!)

The Plan

(note: plan hand-translated from our PM's MS Project Diagram)

  1. Pick the berries (use of child labor approved, maybe even encouraged -- kids don't fear thorns that blackberry bushes use for their protection; and are slightly easier to control than the other commonly encountered creature [Ursus Americanus] with known good blackberry picking skills)
  2. Find a micro-brewery that can brew small batches (half a cask?) for reasonable prices
  3. Procure other ingredients if need be
  4. Bring the stuff to the brewmeister
  5. Wait for craftsmen to brew the magic
  6. Bring The Beer Home!
  7. Drink! Smile! Have Fun!

Apologies for not having a flashy Flash version of the plan. If you want to see a flashier plan, try drinking enough vodka to make the list above spin and bounce on your computer screen (hint: wear 3D glasses for extra fun!)

Current State of the Project: Green (with Slight Chance of Yellow?)

So: although the total collected blackberry harvest is still somewhat below the required level (dang -- we also need to figure out what that level is!), we are confident that the end result will be enjoyable to drink, and that it will be so enjoyed during the year 2009.

One more positive lack of development: we still haven't run across a single bottle of blackberry beer (although there are some Wild Internet Rumors that hint at possibility of future sightings). This is different from many other flavors of fruit beer: our project team has already field-tested multiple brands of blueberry beer, at least one tasty brand of strawberry beer (hi there Strawberry Blonde! Call me!); and of course the always-good Pyramid Apricot Ale.

What Could Possibly Go Wrong?

Ok ok ok. That's a so-called rhetorical question. You can stop listing suggestions now ("I find your lack of faith disturbing!").

Post Scriptum

Once a decent batch of B3 is ready (in 2 months? Just in time for Thanksgiving!), volunteer beer drinkers may be needed. Our project team is thirsty, but even our bladders have limits. More info will be forthcoming if and when reinforcements are needed.

Stay Thirsty! And download responsibly!



About me

  • I am known as Cowtowncoder