Thursday, April 30, 2009

The Hybrid Rocket: hold on to your funky hats

The Toyota Prius was THE car I wanted back in early 2001, when I was buying my first car in the US. It was so cool; and back in the day it would have been practical enough as well. With at most 3 people at a time to move around there was plenty of room, and the cargo space was functional for grocery runs. But most importantly, I cared about environmental aspects back when it wasn't yet fashionable (... in the US, that is), and most people around seemed to want something idiotic like a tacky pickup truck or fugly SUV. Whatever.

Unfortunately, a 3+ month waiting list for purchases made it hard to even find one to test drive, much less drive off the lot. In fact, the only thing I achieved when trying to do so was two hours of high-pressure sales spiel aimed at getting me to buy a used Toyota Corolla. Nothing against that dependable vehicle, but it just didn't cut it (eventually I ended up buying a cute fiery red turbo Beetle -- but that's a whole other story).

But maybe I should consider myself lucky: as detailed in The Flip Side of the Perfect Prius, Prius owners may be in for more "unexpected adventures" than they bargained for. Specifically, the car seems prone to some kind of sudden acceleration syndrome (or, just as embarrassing if somewhat less dangerous, "total lock-up" episodes). The case of the "Prius in a creek" from Colorado looks particularly creepy. And while I could understand Toyota dealers' attitude toward individual incidents ("you floored it instead of braking" / "you just ran out of gas"), the number of incidents, as well as the type of people reporting problems, suggests there is something more at work here.

That's very unfortunate for something that is in many ways a harbinger of a cleaner tomorrow. I hope the problem, whatever it is, gets resolved.

Tuesday, April 21, 2009

Educational, and Good Fun Geeky Waste of Time: StackOverflow!

From the not-so-news-anymore department: I have grown addicted to another geeky game of sorts: StackOverflow.com. It's just a simple question/answer site for programmers, but thanks to its game-like scoring system and merit badges it is insidiously addictive. Perhaps I shouldn't be surprised, really, having been addicted to on-line games before: but at this point I should know better. :-)

On the plus side (well, an additional plus?), using it can actually be useful and educational too. Beyond trying to increase your karma by answering (and commenting, and voting), many answers (and questions too) are genuinely useful and interesting. After all, there are only so many knowledgeable co-located co-workers you can learn from. But the number of colleagues you can collaborate with in a virtual environment is far less bounded; and in many ways the ranking is more merit-based than the pecking order at one's place of employment. This makes it easier to judge answers on their own merits, rather than on the popularity of the person who wrote them (of course you can be a fanboy too and just go by ranking -- there's Jon Skeet who knows everything -- it's up to you).

At any rate: if you happen to roam about there, I decided to use another alter ego (StaxMan) -- but you can still spot me by the usual cowboy logo.

And now, if you'll excuse me, I am off to earn my first thousand points and a silver badge!

Monday, April 20, 2009

Rock Star Programming: the Amazin' (even "brillant"!) Paula Bean

Hey, what d'ya know: somehow I had so far totally managed to miss this pe(a)rl of programming: the Brillant (sic!) Paula Bean.

That is totally awesome. I wish all the best for the author of that magnificent snippet of code: may she enjoy her job at Oracle as a senior software architect! (ok, I admit -- that's a totally wild guess; I have no idea where P might be nowadays, perhaps as likely as not in the ranks of program management).

Tuesday, April 14, 2009

Towards Jackson 1.0 release, update 0.9.9-4

On the seemingly endless road to the official Jackson 1.0 release, another minor milestone has been reached. 0.9.9-4 is now out, and contains an assortment of bug fixes, most related to generics-handling problems uncovered by actual heavy production use. Also included are improvements to configuration, especially related to date handling; plus basic work on deserializing exceptions (Throwable and its sub-classes).
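For instance, the Throwable support should make reconstruction like the following possible -- a minimal sketch against the 1.0-era API (method names from memory, and the JSON property name shown is just my assumption about the default mapping):

    import java.io.IOException;
    import java.io.StringReader;

    import org.codehaus.jackson.map.ObjectMapper;

    public class ExceptionDemo
    {
        public static void main(String[] args) throws Exception
        {
            ObjectMapper mapper = new ObjectMapper();
            // Reconstruct a live exception object from JSON:
            IOException e = mapper.readValue(
                new StringReader("{ \"message\" : \"disk on fire\" }"),
                IOException.class);
            System.out.println(e.getMessage()); // prints: disk on fire
        }
    }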

As usual, go get it from the Download page, and report any oddities found (if any) via one of the Mailing Lists.

Sunday, April 12, 2009

JSON vs XML: confessions of a JSON advocate

First of all: before getting into the issue here, let me just say that this hurts me more than it hurts you (not to worry, there's no spanking for anyone). That is because I am about to confess some misgivings I have about my favorite data format, JSON.

So, here I am: a fan of JSON as a data format. The problem is not with what JSON is and has. The problem is with the things it has not. And I am almost ashamed to admit that many of them are found in -- GASP! -- XML. Yes, it is a bit sacrilegious to admit this. But it is true: while nothing that is in JSON is bad per se, there are things omitted that should not have been.

So here's my brief post-Festivus Airing of Grievances regarding JSON.

1. Comments are not really optional for a textual data format!

Enough said. Every textual data format should have a way to embed human-readable unstructured notes, injectable by humans as well as by systems transforming or generating content. I often use XML comments to record when a document was generated, or to carry simple debug information. This is very handy, and harmless for automated processing, which can (and should) just ignore such comments. And for actual processing hints there are also XML processing instructions, likewise ignorable by processors that don't care about them.

The JSON format almost had comments, too: an earlier (pre-RFC) version actually did include them: C and C++ styles, I think (i.e. the ones JavaScript uses).

Why they were removed is beyond me, and is in my opinion the biggest mistake made in the specification. Comments just should be available.
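To be fair, nothing prevents individual parsers from accepting comments as an opt-in extension. Here is a minimal sketch of doing just that with Jackson's comment-handling feature (the exact configuration call has varied a bit between versions, so take the method names as approximate):

    import java.io.StringReader;

    import org.codehaus.jackson.JsonFactory;
    import org.codehaus.jackson.JsonParser;

    public class CommentDemo
    {
        public static void main(String[] args) throws Exception
        {
            JsonFactory f = new JsonFactory();
            // Off by default, since the specification disallows comments:
            f.configure(JsonParser.Feature.ALLOW_COMMENTS, true);
            JsonParser jp = f.createJsonParser(new StringReader(
                "{ /* generated 2009-04-12 */ \"value\" : 42 }"));
            while (jp.nextToken() != null) { // comments are silently skipped
                System.out.println(jp.getCurrentToken());
            }
        }
    }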

2. Sometimes Redundancy is Useful: the Case of Elements vs Attributes

(or, "Data is Lonely without Metadata")

It may be confusing to have 2 somewhat overlapping dimensions in XML: that of structured (nested) child elements, and that of unstructured element attributes. But there is one practical and useful way to separate the two: think of elements and their textual content as the actual data, and of attributes as metadata (about the element data). This simple separation works surprisingly well, and is a useful distinction for use cases like data binding.

For example: the type of an object can be stored in a type attribute (like, say, "xsi:type"), and field values as child elements. Or store all identifiers as id attributes (like the generic "xml:id", as per the xml:id specification): separate from the data contained in elements and their textual values, but useful for adding references to element sub-trees.

JSON has no such facility, so any metadata has to be either mixed in-line with data, or structured as sibling properties. Initially this may not seem like a big deal, but it gets confusing pretty quickly in practice.
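To illustrate: in the XML version below, metadata is syntactically distinct from data; in the JSON version, the "type" and "id" keys are a convention I just made up, indistinguishable from the real data properties (and every schema designer gets to invent their own):

    <invoice xsi:type="po:Invoice" xml:id="inv1">
      <amount>100</amount>
    </invoice>

    { "type" : "Invoice", "id" : "inv1", "amount" : 100 }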

So why doesn't this matter with actual (Java) Objects? Isn't JSON more "object oriented", being an object notation, not a markup language? Well, Java Objects DO have metadata that is orthogonal to data (object state, i.e. member fields)! What is class information if not metadata, separate from actual data? All that typing -- both class declarations and runtime Object types -- is metadata, not data; similarly for all method information. And most obviously the latest addition to class metadata, Java annotations, is pure orthogonal metadata. It is not a perfect analogy (class info is per-class, like static members and methods, whereas actual data is per-instance), but it indicates the need for a place for both data and metadata.

3. As Simple as Possible, but No Simpler

Although both of the above grievances could be repeated here -- JSON having been simplified beyond reason by omitting comments -- there is more.

For example: unquoted linefeeds are not allowed within JSON String values; linefeeds must be escaped just like other control characters. This is Bad. Why are they not allowed as-is, given how common they are in text? I suspect it was done in an effort to make JSON easier to "parse", by allowing single-line regexps to work. But I don't care -- if I parse something, I do it properly. Regexps alone do not a parser make (they make a lexer: useful and used by parsers, but not a parser). Linefeeds are displayable characters just like anything else, and it would be quite ok to allow them within String values: after all, they are often needed there. So why force escaping them, even though they are not used as separators?
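That is: of the two documents below, only the latter is legal JSON, even though the former poses no problem to a real (non-regexp) parser:

    { "note" : "first line
    second line" }

    { "note" : "first line\nsecond line" }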

There are also things that I think are good, or at least acceptable, riddances: for example, while it is often useful to have a choice of quotes in XML (single or double), I'm not crying over the loss of apostrophes. I could write a parser that handles multiple kinds of String value markers, but I can also generate content using just one kind. The only real cost is that it complicates hand-writing and modifying content a bit.

4. Is Ordering really irrelevant?

In XML, content order is mostly significant; the only exception being attributes, which are unordered. This makes some parts of data binding more challenging, because objects usually have no concept of ordering for their properties. Because of this, there are many legal, easily definable XML structures that cannot easily be mapped to (Java) objects.

But while sometimes problematic, ordering can also be valuable. For example, it is great to be able to guarantee that certain elements (like, say, "header") come before others (like, say, "footer"). The only conceptually correct way to do this in JSON is to use Lists (aka Arrays). But their values are anonymous, unlike those of Maps. Alternatively, a JSON processor could preserve the actual physical ordering of Object properties; but the problem is that not all processors will do this, not least because the specification discourages it.
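A sketch of what this means: the first (Map) form below makes no ordering promise, while the second (List) form keeps "header" reliably ahead of "footer", but only at the cost of wrapping each value in an extra single-property object:

    { "header" : "...", "footer" : "..." }

    [ { "header" : "..." },
      { "footer" : "..." } ]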

And the most obviously useful ordering in XML is that metadata (attributes) always precedes data (elements). That is something you can count on; and for common types of metadata (those class types and identifiers, see above), it is a pretty optimal arrangement.

5. Other problems?

One thing of interest regarding the list above is that none of its items is a reason commonly stated by those who advocate using XML over JSON.

Conversely, I think the most commonly stated reasons are very poor excuses for arguments, usually based on a fundamental misunderstanding of the actual benefits of XML, or of the good use cases for either XML or JSON. Perhaps I should collect a list of such claims to shoot down next. :-)

Saturday, April 11, 2009

On Versioning (RESTish) Web Service APIs

One perennial thing that crops up like crocuses around March is the question of the Right Way To Version RESTish Web Services.
There are multiple good ways, an unbounded number of bad ones, and plenty of ugly ways too. But the Good ones are mostly just pragmatic guidelines; there is no science behind most of it. Some bad ones are polished, but the real snake oil comes with the ugly ones. But I digress. Since the good ones tend to be somewhat boring, there's a bit less discussion of those.

So it is great to find others whose views align nicely with yours, especially ones that document their thinking. It not only lends credence to your ideas, but also makes it easier to express them in the form of "have a look over here", as a sort of baseline for discussion. In this case there's an in-progress document called "Versioning HTTP APIs" that serves this purpose quite nicely.

Anyway: I thought the article was worth reading; and I agree with at least 90% of it. Which is pretty good: about the rate at which I agree with my own writings after a year or so. :-)

Friday, April 10, 2009

Soap Shakes, Jersey Rocks

I have grown to be a big fan of JAX-RS ("Java API for RESTful Web Services" -- but where did the "X" come from?), and especially of the flagship JAX-RS implementation, Jersey. Although I have only been an actual user for the past couple of months (and a lurker on the mailing lists for less than a year), I am fully sold on it by now. In many ways, it might be the only web services framework you need to know on the J2EE platform.

I think of it as moving from manual to automatic shifting: very convenient, letting me forget about the clutch and gears, and just focus on the road and traffic ahead. Or perhaps on Hanoi Rocks playing on the CD player.

But what exactly is it that JAX-RS and Jersey provide? Here is a small sampling of what I like.

1. Jersey the Plumber

You know the gunk between the server receiving a request and that request reaching the actual business logic layer? With Jersey, you only need to be vaguely aware that it exists: give it a hint as to how you want the endpoints to match, and it Does The Plumbing.

You create end points by annotating your methods to accept @GET or @POST (etc.) requests, and kindly request that things like

  • URL path components (@PathParam)
  • Query (@QueryParam) and Form (@FormParam) parameters
  • HTTP headers (@HeaderParam)
  • Cookie values (@CookieParam)
  • POST payload (any parameter without other annotations)
are brought to your method as the parameters so annotated. And that's what they will be when the method gets called, whenever a request matches the path definition. Pretty nifty. Similarly, the return value is converted back into an actual serialized response message, along with headers (for content type) and a status code. This means that dealing with the interface (GIGO) takes one method with perhaps 6 lines of code; or maybe 10 if you do proper exception handling to deal with breakage in your business logic.
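Here is a minimal sketch of what such a method looks like (the resource path and names are made up for illustration):

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.QueryParam;
    import javax.ws.rs.core.MediaType;

    @Path("/users/{userId}")
    public class UserResource
    {
        @GET
        @Produces(MediaType.TEXT_PLAIN)
        public String getUser(@PathParam("userId") String userId,
                              @QueryParam("verbose") boolean verbose)
        {
            // Jersey has already matched the path and bound the parameters;
            // the return value gets serialized, with headers and status code.
            return "user " + userId + (verbose ? " (verbose)" : "");
        }
    }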

So long to having to know what kind of stinky stuff clogs the drains: feel free to focus on the actual business logic within the service.

In fact, this convenience alone is worth the framework. Seriously. It makes writing services almost fun.

2. Needles and Pins

Almost as important as getting the input parameters in (and the response out) is getting hold of context objects, for access to configuration and shared contextual facilities like connection pools and caches. This too is provided by the simple act of annotation: in this case, by annotating members of your resource classes (any class with JAX-RS annotations on the class, or on at least one of its methods, is a resource).

Need some ServletContext with your service? Just declare a member variable of the desired type, add the @Context annotation, and it'll be there when you need it.
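A minimal sketch (again with made-up names):

    import javax.servlet.ServletContext;
    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.core.Context;
    import javax.ws.rs.core.UriInfo;

    @Path("/info")
    public class InfoResource
    {
        @Context ServletContext servletContext; // injected by the runtime
        @Context UriInfo uriInfo; // per-request context, also injected

        @GET
        public String describe()
        {
            return servletContext.getServerInfo() + " @ " + uriInfo.getRequestUri();
        }
    }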

3. Any Way You Want It -- a Single Format is SO passé...

Remember that Last Fashionable web service framework? What was it... something named after a common household detergent? The one that allowed you to use any data format, as long as it was based on pointy brackets? (if not, good for you)

Well: nowadays SoapSponge Squarepants may act like it was hip Json the Menace, but the only way it can pull that off is to fake JSON (aka FrankenJSON). And usually that's as far as the "cross-format" tolerance goes. And this despite "xml" not appearing anywhere in the acronym itself.

Not so with JAX-RS: any format can be used. And while XML is still sort of "first among equals" (due to the bit of preferential treatment that implementations like Jersey give to JAXB), that is barely noticeable and does not get in the way. In fact, my simple built-in-a-day-or-two SQL report service can provide results as XML, JSON, HTML or CSV; and if I needed to, as PDF and some more of the alphabet soup too.
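The way this looks in practice: a single resource method, with the representation chosen by the client's Accept header. A sketch (the bean is a stand-in; JSON output assumes a JSON provider, such as Jersey's JSON module or Jackson, is on the classpath):

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;
    import javax.xml.bind.annotation.XmlRootElement;

    @XmlRootElement
    class Report
    {
        public String title = "quarterly";
        public int rows = 17;
    }

    @Path("/report")
    public class ReportResource
    {
        @GET
        // Same method, same return value: XML or JSON, per the Accept header
        @Produces({ MediaType.APPLICATION_XML, MediaType.APPLICATION_JSON })
        public Report get()
        {
            return new Report();
        }
    }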

Addendum: Word of Warning

Is there anything I don't like about JAX-RS or Jersey?

No big regrets yet: but there is one thing I have noticed that can lead you down a rat hole: excessive abstraction and indirection. There are so many hooks for creating further factories, providers and injectors that application developers are sometimes turned into framework developers. It is good to keep simple things simple, and to add indirection only when it actually makes sense. There is always time to create frameworks later on, once you have learned good ways to build clean and working code.

But having ways to build abstract extensions is not really a fault of the framework, I guess: and Jersey itself makes good use of most of the extension points it exposes.

Sunday, April 05, 2009

Another pleasant (if woefully late) discovery: jQuery

I admit it: I am generally a late adopter of new things. This has its benefits -- I have missed countless train wrecks by waiting for others to figure out the god-awful messes hidden in well-intended APIs, libraries, patterns, styles and wherever else such dangers lurk. But the downside is also obvious -- I could have been more productive had I found the gems earlier, instead of leaving them hidden for months or years.

This is the case with jQuery. What a pleasant, awe-inspiring library: compact, powerful, intuitive and straightforward to use. I suspect there's no point in elaborating on why I like it -- most everything has already been said. Just google for testimonials.
All I can say is that I wish I had dug in earlier.

But better late than never.

And on a related note, another obvious observation ("Hey, this is Captain Obvious calling...") is that JAX-RS (Jersey) and jQuery go together very nicely, especially when using Jackson-based JSON data binding.

Friday, April 03, 2009

Improving XSLT performance with Woodstox/SAX, part 2: now with Xalan

Ok, so we now know that Woodstox can pimp Saxon-based XSLT processing by up to 30%. But how about Xalan?

Here are the results of running the same stylesheets over 3 parsers (Xerces, Woodstox, Aalto):

And here are the complete results.

As before, we can expect some incremental speed improvement (i.e. 20-30% more data processed within the same resources) with Woodstox, and a tad more with Aalto. Another cheap way to improve your CPU economy. Consider it the Green Idea of the Day or something.

Thursday, April 02, 2009

Improving XSLT performance with Woodstox/SAX (or Aalto): 20-30% more throughput!

One thing I mentioned earlier regarding potential use cases for Woodstox's SAX API implementation was speeding up XSLT processing. The 2 main Java XSLT processor contestants -- Xalan and Saxon -- both use SAX as their primary input source. Since Woodstox is a tad faster at both reading and writing XML than its leading open source competitors (including the one the JDK bundles, Xerces), one would expect some speed boost from using Woodstox as the SAX parser.

To test this hypothesis, I wrote some more StaxBind code (which currently lives within the Woodstox svn repository at Codehaus) to allow running a set of XSLT stylesheets over documents: in this case, test cases from the XSLTMark test suite.
For additional fun, I also included tests using Aalto, given that it also implements SAX and is complete enough to handle XSLT transformations just fine. For the first test run I chose Saxon; I will later do the same with Xalan.

So here's a quick look at the combined results over a couple dozen tests:

Or check out the complete results for a full overview.

The executive summary is that using Woodstox can give you 20-30% more throughput for these test cases, and consistently at least some speedup. Aalto can give another slight boost as well. And given how simple it all is (see here for a reminder), it should be a nice easy win.
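As a reminder, the "simple" part amounts to something like this sketch: instantiate Woodstox's SAX factory directly and feed the transformer a SAXSource (factory class name quoted from memory, so double-check against the Woodstox docs; which XSLT engine TransformerFactory gives you depends on your classpath):

    import java.io.File;

    import javax.xml.parsers.SAXParserFactory;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.sax.SAXSource;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    import org.xml.sax.InputSource;
    import org.xml.sax.XMLReader;

    public class TransformWithWoodstox
    {
        public static void main(String[] args) throws Exception
        {
            // Bypass JAXP discovery (which would likely give you Xerces):
            SAXParserFactory spf = new com.ctc.wstx.sax.WstxSAXParserFactory();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();

            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File("stylesheet.xsl")));
            t.transform(new SAXSource(reader, new InputSource("input.xml")),
                        new StreamResult(System.out));
        }
    }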

What next? Well, in addition to testing Xalan, it would also be nice to get a more representative set of real-world stylesheets. But for now, I'll assume these results are representative.

Wednesday, April 01, 2009

A Modest Proposal for Rewriting Woodstox, Jackson, using Protocol Buffers

(Update, April 2nd: Never mind -- I changed my mind; reverting back to the old course!)


After playing some more with Google's Protocol Buffers implementation, I have become more and more impressed by it. It is easy to love its debuggability, expressiveness and extensive tool support. But most of all, it is the performance aspects that have caught my attention. Those propeller heads at the big G have certainly gotten performance boosted to Infinity and Beyond (almost to 11, dare I say).

Given its superior performance, I figured it is pointless to continue working on direct approaches to parsing JSON and XML.

Instead, my new plan -- effective immediately! -- is to retool Woodstox (XML) and Jackson (JSON) parsers to make use of some PB goodness. Here is how I think it can be done.

1. For main parsing and generation, use Protocol Buffers

The core reading and writing of content should be done using ProtoBuf; and consequently, all content needs to be in the compact ProtoBuf binary data format.

While this is the obvious right way to go, it does pose some problems, because existing legacy applications will expect "native" APIs for processing content. And on the other hand, legacy content will still be using the textual data formats in question.
So to make things work, there is need for just a wee bit of "glue" both above and below the ProtoBuf layer.

2. Below ProtoBuf, use Simple Light-weight converters

For XML, the natural light-weight translation mechanism from textual XML into the PB format is XSLT, possibly augmented with XSLT 2.0 type information derived from dynamically generated Schema types. If necessary (for example, if performance is not as good as expected), content can be converted to binary XML at runtime (and possibly re-parsed using EXI binary XML), so as to minimize the amount of processing done on the inefficient textual format. And if nothing else works, it is always possible to add more layers to improve efficiency.

For JSON the choices are more limited, but I am confident that some combination of JsonPath and YAML should do the trick. Another possibility would be to use something like the BadgerFish mapping convention (for binary data, I am thinking of defining a straight-forward complementary mapping, code-named "StinkySock"; but more on that later).

3. Above ProtoBuf, use Some More Converters

Above PB, some limited amount of glue is also needed, to produce the kinds of events the current crop of applications expect (Stax, Jackson API). The simplest mapping for XML seems to be via the SAX API (since that is easy to expose). But as Stax sources can not consume SAX events (push vs pull), it will be necessary to use an intermediate structure: DOM seems like just the simple thing to use. And since DOM can be read via a DOMSource, it is easy to produce Stax events from there (Woodstox can actually already do that, which makes it totally trivial).

I will leave the details of converting Protocol Buffer tokens into JSON to interested readers -- suffice it to say that it should be possible to concoct a similarly simple and elegant solution as the one outlined above, without undue effort.

4. Want to know more?

Although there are still some details open, there is much more to discuss -- instead of boring you here, feel free to read more about my plans.

As usual, please let me know what you think -- I am very excited about this new approach!


