Thursday, May 24, 2007

New xml performance comparison: Woodstox fastest java xml parser

I bumped into another interesting new article at xml.com, this one regarding xml parser performance: XML Parser Benchmarks: Part 1. One particularly interesting feature is that this time the comparison included both native and Java xml parsers, using what seemed to be a fair methodology (although it'd be nice to independently verify the results, which doesn't seem possible).

Amongst the findings were:

  • Woodstox was the fastest xml-conformant Java parser of the ones they measured
  • Although libxml2 (written in C) was faster than the fastest Java parser, the difference was not particularly large (in my opinion)
  • Throughput for all parsers was rather high: for Woodstox, sustained throughput looked something like 25 MBps, which is in line with my own measurements. So for one's typical short (in xml terms at least...) soap messages, it's possible to parse (and write) thousands of messages per second. Parsing and writing really should not be the most expensive step any more.
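To put a rough number on "thousands of messages per second": it is simple division of throughput by message size. The sketch below assumes a 5 KB message, which is an assumption made purely for illustration and not a figure from the article; the class and method names are made up as well.

```java
// Back-of-the-envelope estimate only. The message size is an assumed value,
// not something reported in the benchmark article.
final class ThroughputEstimate {
    /** Messages per second a parser could sustain at the given byte throughput. */
    static long messagesPerSecond(long bytesPerSecond, long bytesPerMessage) {
        return bytesPerSecond / bytesPerMessage;
    }

    public static void main(String[] args) {
        long throughput = 25L * 1000 * 1000; // roughly 25 MBps, as quoted for Woodstox
        long messageSize = 5L * 1000;        // ASSUMPTION: a 5 KB soap message
        System.out.println(messagesPerSecond(throughput, messageSize) + " messages/sec");
    }
}
```

With these inputs the estimate comes out at 5000 messages per second on a single thread; larger messages scale the figure down proportionally.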

According to their measurements, one would pay something like 30% extra overhead for Java. To me that seems like a bargain. On the other hand, the fact that there is some difference also suggests it is probably a fair comparison (as opposed to some of the more suspicious "my language is faster than your language" comparisons): there are parts of xml processing where native code still has an advantage (low-level byte manipulation for character decoding, for example, or memory mapping of content), so it seems reasonable that there is some speed benefit. It may also be a win-win situation: those who favour using low-level languages to squeeze out the last cpu cycle can take comfort in knowing that all that tweaking with memory management and buffer handling pays some dividends. And others can feel ok with the comfort of a managed runtime environment, with modest overhead.

Tuesday, May 22, 2007

Nice new Stax 2 tutorial: "StAX the odds with Woodstox"

I was delighted to find a brand new tutorial for the Stax 2 API at VSJ: StAX the odds with Woodstox, written by Sing Li. Given the limited amount of documentation I have been able to produce, I think it's great to have others cover different areas, including introductory texts. Here's hoping that more developers find the features of Woodstox that extend beyond the basic vanilla Stax 1.0 interface.

Friday, May 11, 2007

Java 6: finally able to access interface MAC address!

Better Late Than Never

I recently learnt about one neat addition to Java 6 (thanks Taras!): the java.net.NetworkInterface class now has a method getHardwareAddress() (see Java 6 Javadocs), which returns the MAC address of Ethernet interfaces. It seems that this addition was made with little fanfare, so I wasn't even aware of it. However, it addresses a long-standing small but significant issue: previously, accessing this information required use of native code via JNI, or executing platform-dependent scripts (like Unix ifconfig).
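As a minimal sketch of what using the new method can look like, the following lists every interface that reports a hardware address. The class name and the hex formatting helper are made up for illustration; only NetworkInterface.getNetworkInterfaces() and getHardwareAddress() come from the JDK.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Enumeration;

// Illustrative helper class; the name is hypothetical.
final class MacAddresses {
    /** Formats raw address bytes as colon-separated lowercase hex. */
    static String toHex(byte[] address) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < address.length; i++) {
            if (i > 0) sb.append(':');
            sb.append(String.format("%02x", address[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws SocketException {
        Enumeration<NetworkInterface> interfaces = NetworkInterface.getNetworkInterfaces();
        while (interfaces != null && interfaces.hasMoreElements()) {
            NetworkInterface nif = interfaces.nextElement();
            // May be null, e.g. for loopback or when the address is not accessible
            byte[] mac = nif.getHardwareAddress();
            if (mac != null && mac.length > 0) {
                System.out.println(nif.getName() + " -> " + toHex(mac));
            }
        }
    }
}
```

Note that getHardwareAddress() can return null, so callers should not assume every interface has a usable address.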

So what? What is the low-level hardware address for my Ethernet card useful for? There are at least two important use cases (web server implementors probably have others, as well as people writing tools for dealing with network interfaces and so on):

  1. To be able to generate time/location based UUIDs one needs the MAC address (see Java Uuid Generator project for more information). The Jug package actually has small native C libraries (compiled on multiple platforms) to support this generation method. Other UUID generation packages (like JDK's castrated java.util.UUID) either only produce other kinds of UUIDs (JDK), or require the calling application to figure the address out.
  2. For per-instance/host software licenses, some sort of permanent (but not necessarily public) identity is very useful.
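To make the first use case concrete, here is a sketch of how a time/location based (version 1) UUID is laid out according to RFC 4122, with the MAC address filling the 48-bit node field. This is not Jug's code; the class and method names are hypothetical, and a real generator must also guarantee that timestamps never repeat and must manage the clock sequence properly, both of which are left out here.

```java
import java.util.UUID;

// Illustrative sketch of the RFC 4122 version 1 bit layout; names are hypothetical.
final class TimeBasedUuidSketch {
    // Number of 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    /** Converts Unix epoch milliseconds into a UUID timestamp (100-ns units). */
    static long uuidTimestamp(long epochMillis) {
        return epochMillis * 10000L + UUID_EPOCH_OFFSET;
    }

    /** Assembles a version 1 UUID from timestamp, 14-bit clock sequence and 6-byte node. */
    static UUID fromParts(long timestamp, int clockSeq, byte[] node) {
        if (node.length != 6) {
            throw new IllegalArgumentException("node (MAC address) must be 6 bytes");
        }
        long msb = (timestamp & 0xFFFFFFFFL) << 32;       // time_low
        msb |= ((timestamp >>> 32) & 0xFFFFL) << 16;      // time_mid
        msb |= 0x1000L | ((timestamp >>> 48) & 0x0FFFL);  // version 1 + time_hi

        long lsb = 0x8000000000000000L;                   // RFC 4122 variant bits
        lsb |= ((long) (clockSeq & 0x3FFF)) << 48;        // 14-bit clock sequence
        long nodeBits = 0L;
        for (byte b : node) {
            nodeBits = (nodeBits << 8) | (b & 0xFF);      // MAC address, big-endian
        }
        return new UUID(msb, lsb | nodeBits);
    }
}
```

A caller would pass something like uuidTimestamp(System.currentTimeMillis()), a randomly initialized clock sequence, and the 6 bytes of the interface's MAC address.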

Although the first case is the one I personally care for most, it turns out there are probably many more developers that care about the second use case. This is based on feedback I have received regarding Jug: it seems there are actually more users who only use the native code for accessing the MAC address than there are users who actually generate UUIDs (or is it that they just have more problems? I know I get more emails from them, at any rate). So for software packages that can count on only running on Java 6 JVMs, the problem is now more easily solved.

This is good news for the UUID use case too, since now the whole UUID generation can be done using just Java: Jug, for example, has JNI accessible libraries only for a couple of the most popular platforms. I should probably spend some time writing a new version of Jug that can take advantage of Java 6, if running on it (and downgrade gracefully to JNI if not).
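One way such graceful downgrading could work is to look the method up reflectively, so the same class file still loads on older JVMs. The sketch below is only an illustration of that idea and not how Jug does (or will do) it; the class and method names are hypothetical, and the JNI fallback itself is left to the caller.

```java
import java.lang.reflect.Method;
import java.net.NetworkInterface;
import java.util.Enumeration;

// Illustrative sketch; class and method names are hypothetical.
final class HardwareAddressLookup {
    /** Returns true if the given class has a public no-argument method with this name. */
    static boolean hasMethod(Class<?> type, String name) {
        try {
            type.getMethod(name);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /**
     * Returns the first non-empty hardware address found through the Java 6
     * method, or null if the method is missing or no address is available.
     * A null result is the point where a caller would fall back to JNI.
     */
    static byte[] findViaJava6() {
        if (!hasMethod(NetworkInterface.class, "getHardwareAddress")) {
            return null; // pre-Java 6 JVM
        }
        try {
            Method m = NetworkInterface.class.getMethod("getHardwareAddress");
            Enumeration<NetworkInterface> en = NetworkInterface.getNetworkInterfaces();
            while (en != null && en.hasMoreElements()) {
                byte[] mac = (byte[]) m.invoke(en.nextElement());
                if (mac != null && mac.length > 0) {
                    return mac;
                }
            }
        } catch (Exception e) {
            // Lookup failed (for example a SocketException); treat as "not available"
        }
        return null;
    }
}
```

Because the Java 6 method is never referenced directly, the class has no link-time dependency on it.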

Finally, it is interesting to note that for some reason java.util.UUID has not been upgraded to take advantage of this new functionality. It is a shame, since it would be very little work to implement the missing generation functionality. But for Jug I guess it's good news, given that the JDK UUID variant will remain sub-optimal: it cannot generate time/location based UUIDs, and beyond name-based UUIDs it offers only random number based ones (which theoretically speaking are just fine, but which most developers are sceptical of). So Jug will still remain relevant even for users that do run on Java 6.

Monday, May 07, 2007

StaxMate 1.0 released

After a rather lengthy incubation period, I am happy to announce the first official release of StaxMate, "the perfect companion to your Stax processor". I hope that this release and the related setup of the new Codehaus home for the project will let more developers find this nifty library. As those who have read earlier entries on this blog know, I think StaxMate makes developing efficient streaming xml processing easy and much less painstaking than using "raw" Stax (or SAX) APIs.

For an introduction to what StaxMate is and what it does, please refer to the StaxMate project home page. Also, I will try to write some more entries on this blog. And finally, the Uuid Web Service example (an earlier "blog mini-series" here at CowTownCoder), available from the Subversion repository as well as from the home page, should show a simple but realistic example of how to implement basic "Plain Old Xml" style web services.

Beyond the basics, here are some other miscellaneous tidbits about StaxMate:

  • Licensing is about as simple as possible: BSD (the new one, without the advertising clause). Share and Enjoy.
  • Version 1.x of StaxMate is compatible with version 2.0 of the Stax2 extension API, which is implemented natively by the Woodstox 3.x series but can also be emulated for other Stax implementations.
  • StaxMate is already in production use at at least one Fortune 500 company, for multiple web services.
  • Outside the sphere of influence of its author, the author so far knows of 2 other (groups of) users. But boy do I expect the adoption curve to "do the hockey stick"! :-)

As for the future, here are the basic plans for StaxMate (in roughly this order):

  • Rewrite the current convoluted implementation of cursor synchronization, by making cursors aware of the nesting level they started at, and keeping track of the current nesting level within the context object.
  • Verify that StaxMate works correctly on top of other Stax processors, such as Sun's SJSXP.
  • Add "typed" functionality: read and write primitive datatype values, such as integers and floating point numbers, efficiently. This will probably be coordinated with adding low-level support for this to the Stax2 extension set, as implemented by Woodstox.
  • Add support for XPath based navigation, for a proper streaming subset of XPath. This would basically mean "traverse to the first matching node, in document order". XPath implementation could be based on Jaxen.
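The nesting-level idea from the first plan item can be sketched on top of the plain Stax 1.0 API. This is not StaxMate's implementation, current or planned; the Context and ChildCursor classes below are hypothetical names invented for illustration. The shared context counts open elements, and each cursor remembers the depth at which it was created, so an outer cursor can carry on correctly even if a nested cursor was abandoned half way through its subtree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative sketch only; all class names here are hypothetical.
final class DepthTrackingSketch {
    /** Shared context: wraps the reader and tracks the current nesting level. */
    static final class Context {
        final XMLStreamReader reader;
        int depth = 0; // number of currently open elements

        Context(XMLStreamReader reader) { this.reader = reader; }

        int next() throws XMLStreamException {
            int type = reader.next();
            if (type == XMLStreamConstants.START_ELEMENT) depth++;
            else if (type == XMLStreamConstants.END_ELEMENT) depth--;
            return type;
        }
    }

    /** Cursor over direct children of the element that was open at creation time. */
    static final class ChildCursor {
        final Context ctx;
        final int parentDepth;

        ChildCursor(Context ctx) { this.ctx = ctx; this.parentDepth = ctx.depth; }

        /** Returns the local name of the next direct child element, or null when done. */
        String nextChild() throws XMLStreamException {
            while (ctx.reader.hasNext()) {
                int type = ctx.next();
                if (type == XMLStreamConstants.START_ELEMENT && ctx.depth == parentDepth + 1) {
                    return ctx.reader.getLocalName();
                }
                if (type == XMLStreamConstants.END_ELEMENT && ctx.depth < parentDepth) {
                    return null; // the parent element itself was closed
                }
            }
            return null;
        }
    }

    /** Lists root's children; for the first child, also peeks at one grandchild. */
    static List<String> demo(String xml) {
        List<String> seen = new ArrayList<String>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            Context ctx = new Context(reader);
            while (ctx.next() != XMLStreamConstants.START_ELEMENT) { } // move to root
            ChildCursor outer = new ChildCursor(ctx);
            boolean first = true;
            String name;
            while ((name = outer.nextChild()) != null) {
                seen.add(name);
                if (first) {
                    first = false;
                    // Nested cursor reads one grandchild, then is simply abandoned
                    String grandChild = new ChildCursor(ctx).nextChild();
                    if (grandChild != null) seen.add(name + "/" + grandChild);
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

In demo(), the nested cursor stops after the first grandchild, yet the outer cursor still finds the remaining children, because it only reacts to elements that start exactly one level below its own starting depth.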

Finally, I would very much like to get feedback from anyone who uses, or even just plays with, StaxMate. Drop me a line at "cowtowncoder at yahoo dot com", or leave a note here!

About me

  • I am known as Cowtowncoder