Thursday, April 24, 2008

Progress of Servlet 3.0 specs, pluggability

I have not really been following progress on the Servlet 3.0 specification (beyond observing some discussion regarding "asynchronous"/non-blocking servlets). But this blog entry by Greg, the Jetty author (a very smart and talented individual and, thankfully, a leader within the Open Source Java community), caught my eye. It is good to know that there will be more pluggability -- the monolithic nature of web.xml has actually been one of my pet peeves lately. But perhaps the most interesting "new" thing was the reference to things that already exist in the Servlet 2.5 specification. I guess this should teach me to pay closer attention to new developments... while the 2.5 annotations seem limited, they are existing extension points nonetheless.

Anyway, who'd have thunk: after years of trusty service by a mostly basic, vanilla Servlet API specification, there seem to be actually useful improvements to be had from the latest revision.

Quick Introduction: project Aalto, from Cowtown Skunkworks

As some of you already know, Yet Another high-performance Java xml processor project was recently launched. The Aalto xml processor is a work in progress, and is approaching its 1.0 release. I will try to write a bit more on the reasons behind starting this project in another entry, but for now it is enough to know that there are 2 main technical goals:

  1. Be Wicked Fast (check this out for some suggestions as to what is achievable)
  2. Implement Non-Blocking XML parsing mode (reads from underlying content do not block, but rather return EVENT_INCOMPLETE or such)

Both of these goals have already been achieved to some degree: Aalto is almost twice as fast as Woodstox on many common documents (and hence matches or exceeds the speed of native code parsers like libxml2 -- I kid you not; likewise, binary xml parsers will get a good run for their money when compared to Aalto); and it does have an experimental non-blocking (aka asynchronous) parser implementation. Challenges still remain, such as how to define standard extensions to support the non-blocking mode.
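To make the non-blocking idea a bit more concrete, here is a toy sketch. It is emphatically not Aalto's API: the class `ToyAsyncScanner`, its `feed()`/`next()` methods and the `Event` values are all made up for illustration, and it only splits input into tags and text rather than doing real xml parsing. The one thing it does show is the calling convention: the caller pushes in whatever input is available, and the scanner answers with an "incomplete" event instead of blocking when it runs out.

```java
// Toy illustration only: hypothetical names, not Aalto's actual API.
final class ToyAsyncScanner {
    enum Event { TAG, TEXT, INCOMPLETE }

    private final StringBuilder buffer = new StringBuilder();
    private String lastToken = "";

    // Caller hands over whatever input has arrived so far.
    void feed(String chunk) {
        buffer.append(chunk);
    }

    // Never blocks: returns INCOMPLETE if no full token is buffered yet.
    Event next() {
        if (buffer.length() == 0) {
            return Event.INCOMPLETE;
        }
        if (buffer.charAt(0) == '<') {
            int end = buffer.indexOf(">");
            if (end < 0) {
                return Event.INCOMPLETE; // tag not finished yet
            }
            lastToken = buffer.substring(0, end + 1);
            buffer.delete(0, end + 1);
            return Event.TAG;
        }
        int lt = buffer.indexOf("<");
        if (lt < 0) {
            return Event.INCOMPLETE; // text may continue in the next chunk
        }
        lastToken = buffer.substring(0, lt);
        buffer.delete(0, lt);
        return Event.TEXT;
    }

    // Text of the token reported by the latest TAG or TEXT event.
    String token() {
        return lastToken;
    }

    public static void main(String[] args) {
        ToyAsyncScanner scanner = new ToyAsyncScanner();
        // Simulate input arriving in arbitrary pieces, e.g. from a socket.
        for (String chunk : new String[] { "<ro", "ot>hi", "</root>" }) {
            scanner.feed(chunk);
            Event event;
            while ((event = scanner.next()) != Event.INCOMPLETE) {
                System.out.println(event + " " + scanner.token());
            }
            System.out.println("(need more input)");
        }
    }
}
```

A real non-blocking parser obviously has to keep much more state (partial names, entities, character decoding across buffer boundaries), which is part of why standardizing the extension is not trivial.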

For those interested in learning more, the important links are:

And how about the immediate roadmap? The plan is to have the Stax 1.0 API fully implemented for the 1.0 release (to be released within the next few months), and the missing pieces are:

  • Implementation of coalescing mode (which, however, is missing from the Stax Reference Implementation, so hardly a must-have feature even if supposedly non-optional as far as Stax specs are concerned)
  • Implementation of repairing XMLStreamWriter

Other than these main features, the only significant missing piece is DTD handling: Aalto does not parse DTDs (it does know how to skip internal subsets), and although nothing fundamentally prevents adding support, the amount of work is big enough that it will not be done before 2.0 (if even then).

Anyway, I hope to write a little bit more about this exciting new (or, "new old"... project history is not all that short) project shortly. Don't switch the channel!

Saturday, April 19, 2008

How does one parse "XML" documents with multiple roots?

Ok, sure, the title is a bit of a trick question: after all, no xml document is allowed to have more (or less) than one root element. So the correct answer would appear to be "one does not". But there are ways to phrase this question more properly, for example by considering there to be implicit (and/or incomplete, insufficient, missing) framing -- failure to handle which would lead to what looks like a "forest of xml documents". Or perhaps one just wants to parse an "xml fragment", which can consist of multiple top-level elements. And sometimes business reasons dictate that one just has to deal with broken stuff. Money talks and bullshit gets worked with.

With this background, it is nice to know that the Woodstox xml parser can indeed deal with such non-standard xml constructs. To do this, one has to venture into Woodstox-specific input properties: specifically, use com.ctc.wstx.api.WstxInputProperties#P_INPUT_PARSING_MODE, and set it (inputFactoryInstance.setProperty(...)) to one of the non-default values (PARSING_MODE_DOCUMENTS or PARSING_MODE_FRAGMENT). Best of all, you can just read this nice article for actual code samples and more musing on why this sometimes needs to be done. The article is, I think, yet another example of how the user community is really what makes good things great in the Open Source ecosystem. Maybe I should figure out a way to more systematically link to such stories from the Woodstox project page?
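For comparison, here is a minimal sketch of the "supply the missing framing yourself" workaround, which needs nothing beyond the Stax API bundled with the JDK. To be clear, this is a different technique from the Woodstox parsing modes above, not an illustration of them: the content is simply wrapped in a synthetic root element before parsing. The class name `FragmentReader` and the wrapper element name are made up for this example, and the trick only works if the content carries no xml declaration or DOCTYPE of its own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: parses a multi-root "fragment" by adding a fake root.
final class FragmentReader {
    // Arbitrary name for the synthetic wrapper element (an assumption).
    private static final String WRAPPER = "synthetic-root";

    // Returns local names of the fragment's top-level elements, in order.
    static List<String> topLevelElementNames(String fragment) {
        String wrapped = "<" + WRAPPER + ">" + fragment + "</" + WRAPPER + ">";
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(wrapped));
            int depth = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    // Depth 1 is the wrapper; depth 2 is a "root" of the fragment.
                    if (depth == 2) {
                        names.add(reader.getLocalName());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException(
                    "Content is not well-formed even when wrapped", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(
                topLevelElementNames("<order id='1'/><order id='2'><item/></order>"));
    }
}
```

The obvious downside is that the whole input has to be buffered or stream-concatenated and must be "clean" enough to wrap, which is exactly where a parser-level fragment mode is more convenient.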

Saturday, April 12, 2008

For those about to grok... we salute you!

Ok ok. I know it's a bit too geeky to blog about what others are blogging (as well as to have this meta-discussion of the fact). But I really must point out some Grade A Alpha Geek investigation by Kohsuke (little wonder he is considered a "rock star programmer"). I mean, color me strange, but I actually do find details of the low-level assembly code produced by the JVM's JIT compiler fascinating. I hope that doesn't quite classify me as an Asperger.

Anyway, it is darn cool to see all the optimizations that are being made. It also confirms my suspicion: current JVMs are not yesteryear's simple stack machines any more. They are not only more complicated than you think, they may well be more complicated and clever than you can imagine.

Friday, April 11, 2008

Jetty Keeps On Rocking!

For a while now, I have been a happy Jetty user. Life has been good for me: Jetty is a "Container with Good Attitude": it is a transparent component that is easy to integrate, and integrates well. The last part -- Jetty being, and viewing itself as, a component -- is the most important part for me. While I have used Tomcat with success, I just can't help but prefer Jetty due to its pragmatic egalitarian attitude. It's not "the platform" if you don't want it to be; if you want it to, it can do that too. But there is no "Hollywood Rule" here ("don't call us, we call you"). Little wonder that platform builders (Eclipse, most J2EE Application servers), too, are major Jetty users. When you need a servlet container for your stack, or for testing purposes, it is a very good choice.

Anyway, as a content user I was happy to see the Jetty author's blog entry, which confirms my suspicion: more and more developers have found Jetty, and apparently like what they see. While the metric in question is not the most relevant one out there (after all, I suspect most Jetty instances really are behind-the-firewall web service containers), it is telling that Jetty is doing well even within this market segment.

Congratulations to Greg and the Jetty team. And please keep up the good work.

Let's hear it for the "51 Unsung Heroes of Woodstox"

One thing that has amazed me throughout the development cycle of Woodstox has been the amount of invaluable, free and extensive (and sporadic, unexpected and chaotic as well) support from the user community. Most communication has been very helpful, especially when a user has both reported a problem and pointed out a reasonable solution for it, and quite often even contributed a unit test to verify the fix. That's the kind of support you just can't buy.

The extent of this support is evident from the simple fact that there have been no fewer than 51 individual contributors to Woodstox. And while some have provided more than their share of useful feedback, each and every one of them has been an integral part of the success of Woodstox. Implementation quality would be nowhere near as good as it is now without these individuals, and the scope of functionality would likewise be smaller, and not directed as well to reflect actual user needs. Their help should also be appreciated by the huge group of developers who use Woodstox without even realizing they do, given that Woodstox is the de facto standard Stax implementation within the J2EE world (being the default that ships with JBoss, Geronimo, possibly with Glassfish; being recommended by CXF, Axis 2 etc. etc.). I guess it could be called a virtuous cycle, "take a penny, leave a penny". Whatever it is, it works and does wonders.

Anyway, the full CREDIT file is the simplest way to browse the Woodstox List of Fame. Let's just hope that the next 4 years will bring as much good help as the first 4 years of the project!

About me

  • I am known as Cowtowncoder