Tuesday, May 26, 2009

With a little help from my friends (Beijing re-mix)

You can't really make this stuff up: Passer-by pushes suicide jumper in south China. Nor can you improve upon the story. Priceless!

But I suppose that, excluding the fact that the would-be jumper failed his mission, this went more smoothly than it would in most other places. No one trying to sweet-talk you off the bridge or such (yeah yeah, I know, the jumper just wanted attention etc... but humour me here).

Now: I can only think of the torture one would face in places like, say, downtown Helsinki. During daytime hours, no one would pay any attention whatsoever ("better not look; he's drunk, or crazy, or both"), and during night time, well, he'd be pestered by dozens of drunks (and possibly a few crazies) asking stupid questions, telling lame jokes and repeating the same at least a dozen times. Then wandering off for a while, only to resurface later and repeat the whole thing.

And although you might think this latter crowd would be more useful wrt. the jumping part -- by the sheer act of clumsily bumping someone off the bridge -- chances are that a sober suicide-contemplator would be more likely to grasp onto something than the drunken person. So it's quite likely the depressed person might just become an accidental hero by saving a drunkard's life.

That's why suicide candidates in Helsinki never use bridges, or other public places, for this purpose. There are no kind elderly Chinese gentlemen to offer a "helping hand".

Tuesday, May 19, 2009

On importance of choosing the right tool

Tool Choice Matters: it makes the difference between "Nailed it" and "Screwed it up"...

Friday, May 15, 2009

How many classes does it take to serialize a POJO?

(or: the usefulness of class count as a metric for simplicity)

A recent JSON package comparison got me thinking about the perceived simplicity (or lack thereof) of libraries. While I do not really think the number of classes is a generally useful metric for a package (nor that it generally correlates with its fitness), I can at least see how an argument could be made that sometimes "small is beautiful" (especially if a more accurate metric like resulting jar size was used) -- it would seem strange if supporting libraries were significantly larger than the main code of a plug-in or such.

But what I think is the actual fallacy is using the number of implementation classes as some sort of proxy for the simplicity of a package; especially regarding being simple to use (intuitive, easy to use etc). As in assuming that a package with, say, 12 classes, is simpler to use than one with 250 classes. The problem is this: from the user's perspective, only those classes that the user has to directly interact with really matter: they are the public API, and contain all the complexity the user is faced with. Implementation classes seldom matter -- they are there, get used, but are not exposed to you. There is no cognitive load from such implementation details.

So back to the original question the title asked: regarding Jackson specifically, how many classes do YOU as a developer really need to know to use it?
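To make the point concrete, here is a minimal sketch of a toy serializer (all names here, including MiniJsonWriter, are hypothetical and have nothing to do with Jackson's actual classes). It consists of five types, but a caller only ever has to learn one method on one of them; the rest add to the class count without adding anything to learn.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** The only type a caller needs to know: the whole public API is one static method. */
final class MiniJsonWriter {
    private MiniJsonWriter() { }

    /** Serializes a flat map of non-null strings and numbers into a JSON object. */
    static String write(Map<String, ?> fields) {
        StringBuilder sb = new StringBuilder("{");
        String sep = "";
        for (Map.Entry<String, ?> e : fields.entrySet()) {
            sb.append(sep);
            Quoter.quote(e.getKey(), sb);
            sb.append(':');
            ValueWriter.forValue(e.getValue()).write(e.getValue(), sb);
            sep = ",";
        }
        return sb.append('}').toString();
    }

    // Everything below is implementation detail: it raises the class count,
    // but is invisible to (and never touched by) the caller.

    /** Quotes a string; escapes only quotes and backslashes (a sketch, not full JSON escaping). */
    private static final class Quoter {
        static void quote(String s, StringBuilder sb) {
            sb.append('"');
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c == '"' || c == '\\') sb.append('\\');
                sb.append(c);
            }
            sb.append('"');
        }
    }

    private interface ValueWriter {
        void write(Object value, StringBuilder sb);

        /** Picks a writer: numbers are written bare, everything else as quoted text. */
        static ValueWriter forValue(Object value) {
            return (value instanceof Number) ? new NumberWriter() : new TextWriter();
        }
    }

    private static final class NumberWriter implements ValueWriter {
        public void write(Object value, StringBuilder sb) { sb.append(value); }
    }

    private static final class TextWriter implements ValueWriter {
        public void write(Object value, StringBuilder sb) { Quoter.quote(String.valueOf(value), sb); }
    }

    public static void main(String[] args) {
        Map<String, Object> bean = new LinkedHashMap<>();
        bean.put("name", "Jackson");
        bean.put("version", 1);
        System.out.println(write(bean)); // prints {"name":"Jackson","version":1}
    }
}
```

Counting classes gives 5 here; counting what the user has to know gives 1.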

I think it can be as low as just one: the all-powerful (org.codehaus.jackson.map.)ObjectMapper.
For most users, that's the only class they need to be familiar with, from within Jackson class library.

And even power users only need to know a couple of additional classes:

  • (org.codehaus.jackson.)JsonFactory for constructing other things
  • (org.codehaus.jackson.)JsonParser if streaming parsing (or data binding, tree model) is needed
  • (org.codehaus.jackson.)JsonGenerator if streaming JSON writing (or bean, tree model serialization) is needed
  • (org.codehaus.jackson.)JsonNode if Tree Model is used for processing, instead of or in addition to streaming processing or data binding.

which would give us a grand total of 5 classes you need to familiarize yourself with. And for good developers that deal with error cases, one more (JsonProcessingException) for a bit more error handling.

From there on, additional classes (exceptions, configuration objects) are only needed when more functionality is needed; and most of the additional classes are rather simple: especially annotations, which usually are little more than markers, tags.

In fact, another allegedly "simpler" package with 7 classes probably requires you to know all of them. And chances are there is less modularity in the division of concerns, likely leaking unnecessary implementation details into the API.

Of course, this is not the only problem with "classes as measure of complexity" idea -- having a properly modular API with more classes can be much more palatable than one with just a single monster swiss pocket knife class -- but it should be enough to get you thinking seriously whether to apply such simplistic metrics for evaluating simplicity.

Assessing simplicity has lots of complexity to it. And fundamentally, like beauty, simplicity is in the eye of the beholder.

Thursday, May 14, 2009

"Darwin fish giving the Jesus fish a friendly little hug..."

I like reading prose written by authors who are better at writing than I am. Granted, this does not limit my choice of reading greatly, but it does generally guide me heavily towards printed media. This is because the quality of work within this quaint (and, sadly, dying) segment of the word industry is significantly higher than online. It is a shame that (paid) columnists may go extinct faster than most currently endangered animal species; and with perhaps greater certainty.

Anyway: what reminded me of my liking of printed media was my favorite column, "Uptight Seattleite", and its take on the relationship between the Darwin fish and its buddy, the Jesus fish.

Priceless. And eerily fitting considering the background of the declining newspaper industry, seen as a part of normal evolutionary progress in the world.

Monday, May 11, 2009

Jackson JSON-processor turns 1.0.0

Ok: it is now official: the official Jackson JSON-processor version 1.0.0 has just been released. Get it while it's Hot!

Wednesday, May 06, 2009

json+gzip nicely packed, but has it Got Speed?

One commonly occurring theme in discussions on the merits (or lack thereof) of textual data formats is the question "but does the size matter". That is: while textual formats are verbose, they can be efficiently compressed using common everyday algorithms like Deflate (the compression algorithm that gzip uses). From an information theory standpoint, equivalent information should compress to the same size -- if one had an optimal (from information theory POV) compressor -- regardless of how big the uncompressed message is. And this is quite apparent if you actually test it out in practice: even if message sizes between, say, xml, json and binary xml (such as Fast Infoset) vary a lot, gzipping each gives roughly the same compressed file size.

But what is less often measured is how much overhead compression actually incurs; especially relative to other encoding/decoding and parsing/serializing overhead. Given all the advances in parsing techniques and parser implementations, this can be significant overhead: compression is a much more heavy-weight process than regular streaming parsing; and even decompression has its costs, especially for non-byte-aligned formats.

So: I decided to check the "cost of gzipping" with Jackson-based json processing. Using the same test suite as in my earlier JSON performance benchmarks, I got the following results.

First, processing small (1.4k) messages (database dumps) gives us the following results:
(full results here)

and medium sized (16k): (full results)

(just to save time -- bigger files gave very similar results to the medium ones, regarding processing speed)

So what is the verdict?

1. Yes, redundancies are compressed away by gzip

Hardly surprising is the fact that JSON messages in this test compressed very nicely -- the result data (converted from the ubiquitous "db10.xml" etc test data) is highly redundant, and thereby highly compressible.

And even for less optimal cases, just gzipping generally reduces message sizes by at least 50%; similar to compression ratios for normal text files. This is usually slightly better than what binary formats achieve; oftentimes even including binary formats that omit some of the non-redundant data (like Google Protocol Buffers which, for example, requires the schema to contain field names and does not include this metadata in the message itself).

2. Overhead is significant, 3x-4x for reading, 4 - 6x for writing

But it all comes at a high cost: overhead is highest for the smallest messages, due to significant fixed initialization cost (buffer allocations, construction of Huffman tables etc). But even for larger files, reading takes about three times as long as without compression, if we ignore possible reading speed improvements due to reduced size. And the real killer is the writing side: compression is the bottleneck, and you'll be lucky if it takes less than five times as long as writing regular uncompressed data.
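If you want a feel for the raw gzip cost on your own machine, a rough sketch along these lines works with nothing but the JDK. Note the assumptions: this is not the test suite used for the numbers here, the payload is a made-up redundant JSON array standing in for the database-dump data, there is no JIT warm-up, and it times only the compression and decompression steps, not JSON parsing or generation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Rough sketch: measures gzip size reduction and per-message gzip/gunzip time. */
class GzipOverheadSketch {

    /** Builds a redundant JSON-like payload (hypothetical stand-in for real test data). */
    static byte[] samplePayload(int rows) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < rows; i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"id\":").append(i)
              .append(",\"firstname\":\"Al\",\"lastname\":\"Aranow\",\"zip\":\"12345\"}");
        }
        return sb.append(']').toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Compresses in memory; a new GZIPOutputStream per call, as a per-message writer would do. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(raw.length / 4 + 64);
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // stream is closed (and finished) at this point
    }

    /** Decompresses in memory. */
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(compressed.length * 4);
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                bytes.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = samplePayload(20);
        byte[] packed = gzip(raw);
        int rounds = 20000;

        long t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) gzip(raw);
        long t1 = System.nanoTime();
        for (int i = 0; i < rounds; i++) gunzip(packed);
        long t2 = System.nanoTime();

        System.out.printf("raw=%d bytes, gzipped=%d bytes%n", raw.length, packed.length);
        System.out.printf("gzip: %d ns/msg, gunzip: %d ns/msg%n",
                (t1 - t0) / rounds, (t2 - t1) / rounds);
    }
}
```

Comparing those per-message times against how long your parser takes for the same message gives a first approximation of the relative overhead; a proper benchmark would add warm-up rounds and real data.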

3. Is it worth it?

Depends: how much is your bandwidth (or storage space) worth, relative to the CPU cycles your program spends?
For optimal speed, the trade-off does not seem worth it, but for distributed systems the costs may be more on the networking/storage side, and if so compression may still pay off. Especially so for large-scale distributed data crunching, like doing big Map/Reduce (Hadoop) runs.

Or how about this: for "small" message (1.4k uncompressed), you can STILL read 22,000, write 12,000, or read+write 8,000 messages PER SECOND (per CPU). That is, what, about 7900 messages more processed per second than what your database can deal with, in all likelihood. Without compression, you could process perhaps 14,000 more messages for which no work could be done due to contention at DB server, or some other external service... speed only matters if the road is clear.
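As a rough sanity check on those figures: assuming each message is read and then written back to back on a single CPU (an assumption, not something the benchmark output states), the per-message times add up, so the combined rate is the reciprocal of the summed times:

```latex
\[
r_{\text{read+write}}
  = \frac{1}{\frac{1}{22\,000} + \frac{1}{12\,000}}
  = \frac{22\,000 \times 12\,000}{34\,000}
  \approx 7\,765 \ \text{messages/s}
\]
```

which is in line with the roughly 8,000 read+write messages per second quoted above.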

Yes, it may well make sense even if it costs quite a bit. :-)

4. How about XML?

If I have time, I would like to verify how XML+GZIP combination fares: I would expect same ratios to apply to xml as well. The only difference should be that due to somewhat higher basic overhead, relative additional overhead should be just slightly lower. But only slightly.

Tuesday, May 05, 2009

Finally: explicit support for JSON on the main Javascript platform... :)

Once again, this is not really news (as in being somehow new), but better late than never: apparently Firefox 3.1 has native JSON support functionality built in (and some version of IE 8 as well?).

So why does this matter? Article outlines main reasons: security/convenience (using "eval" is insecure; and other parsing methods need to be included as json libraries) and speed. I am actually surprised that speed difference is only 3-to-1 -- I would imagine it could be even more. Perhaps they should embed Jackson within Firefox to further speed it up. :-)
(just kidding -- I'm sure there are decent fast C/C++ parsers out there too; I just suspect one used is not yet one of those...)

This is double-plus-good, since the json parser library I tried to use ("jQuery json") kind of sucks; badly enough that I have reverted to using plain old Javascript eval for now. But maybe I can change code to auto-detect "JSON" object, and to use it if available.

Monday, May 04, 2009

A new (?) contender in "useful + elegant path language for JSON": jPath

Ok now: there have been contenders (most notably JsonPath) for the title of "the JSON path". But so far it's still up for grabs -- none has gotten things quite right.

But it looks like there's now one more interesting take, JPath. Maybe it's just my new liking for jQuery, but this one seems to get the most important aspect (syntax!) right. For me at least, it all comes down to having an expressive and elegant DSL for simple path navigation. I will need to play with it a bit to see if things click like they should; looks like it'd be a natural with jQuery. If so, this might be just what we need.

Sunday, May 03, 2009

Frustrations of a Java developer: Tomcat, Jetty both suck

Ok: now, this should not be rocket surgery: I just want to start from a clean slate, download a servlet container (like Jetty or Tomcat) in its own non-modified directory, create a trivial web app (web.xml, index.html and a jsp file), and run it with default settings.
Easy? You might think so.

But not necessarily so: with both Tomcat and Jetty, chances are you (I) will STILL bump into multiple dead ends (after what, 10 years of development for these containers!), with unhelpful error messages. Yes, I could make an IDE auto-generate much of the stuff, but that should only be automating what I would do manually anyway. And true, I should be cut'n pasting working configs (which I do, I just didn't have anything to carbon copy at hand). Even then, I should be able to get things going in a jiffy with all the experience I have using these dog-forsaken beasts.

So what happens with the default settings? With Jetty, I got this:

[tatu@www sample-struts2]$ java -jar jetty/start.jar cfg/jetty.xml
java.lang.ClassNotFoundException: org.mortbay.xml.XmlConfiguration
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at org.mortbay.start.Main.invokeMain(Main.java:179)
at org.mortbay.start.Main.start(Main.java:523)
at org.mortbay.start.Main.main(Main.java:119)

And this with proper (?) setting for system property "jetty.home" (I think). As a seasoned Jetty veteran, I should be able to figure it out (and I will). It is somehow using some default class path discovery which doesn't find what it's looking for. Disappointing.

But because of this unfortunate hiccup, I thought I'd check out Tomcat 6, since one thing it has going for it is a clean (I thought) separation between the standard Tomcat deployment (with CATALINA_HOME), and the actual custom deployment of web apps (with CATALINA_BASE). Also with version 6, its configurability has improved a lot, even though it is still incomplete (why the hell does it not let you log to places outside of "$CATALINA_BASE/logs"?!?), and thus it's not quite as frustrating to deploy.

Ok... let's see... at first I do manage to get it to deploy, but with its default web apps not mine. Ok, had BASE and HOME mixed up. Correct this and... worse:

Exception in thread "main" java.lang.NoClassDefFoundError:
Caused by: java.lang.ClassNotFoundException:
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
Could not find the main class: .  Program will exit.

Seems obvious? If you think so, you are mistaken (if understandably so): no, this is NOT due to not finding a class due to failure of class introspection.
After LOTS of trial and error, I found out that the real problem is this: all configuration files MUST BE FOUND FROM UNDER $CATALINA_BASE/conf! Never mind that $CATALINA_BASE would have very suitable defaults. Tomcat doesn't care -- it should, mind you -- and does not try checking for common failure conditions.

But the really scary part is that idiotic error message: turns out it is the result of specifically not finding the file "$CATALINA_BASE/conf/logging.properties" (of all things). So why does it complain about not finding a class? Who knows? Maybe it just quietly ignores the fact that no properties file is found, and then bumps into not having a proper setting for the logger class. And it's vague enough that I couldn't find anything relevant by googling. Maybe this entry will fix that particular challenge at least.

And the final insult is that the error message only goes to the log file -- from console, things seem to go quite ok.

I am beginning to despise Tomcat 6 as much as its predecessors (wasted too much of my time with Tomcat 4, back in the day). And I'm not entirely happy with Jetty either.

Friday, May 01, 2009

Another JSR with Potential for Goodness: JSR-303, Bean Validation API

Here's something less depressing (... than the acquisition of Sun by one of the worst possible suitors [IMO]) from the Java land: JSR 303, "The Bean Validation API", seems like a rather useful little tool to have.

So what is it? Basically, it is a pluggable annotation-based component for validating data constraints on beans, typically used for things like validating user input from web forms. My personal interest, however, is more related to another obvious use case: that of validating request messages for web services. Either way, writing validation code is brain-numbingly dull monkey coding, and it has been a necessary part of many Java developers' daily jobs. But with this new API (and more importantly the leading implementation by the Hibernate team) the programming part can be mostly eliminated; and the rest will be a simple matter of applying annotations. At least when defining simple rigid data type constraints (min/max values, lengths, non-null, matches a regexp).

Instead, constraints to validate can be declared by simple standard annotations (and/or custom ones that can be built using guidelines and components from the API), attached to Bean fields and/or access methods. For example:

  import javax.validation.Valid; // @Valid lives outside the constraints package
  import javax.validation.constraints.*;

  public class MyBean
  {
     @NotNull // can't be null (not optional)
     @Size(min=4, max=40) // length within [4, 40]
     String name;

     @Max(20) // no more than 20
     int retries;

     @NotNull
     @Valid // means that instance is recursively validated
     OtherBean childBean;
  }

And validation itself is done using something like

  import java.util.Set;
  import javax.validation.*;

  ValidatorFactory factory = Validation.buildDefaultValidatorFactory();
  Validator v = factory.getValidator();
  Set<ConstraintViolation<MyBean>> probs = v.validate(myBeanInstance);

which returns a set of ConstraintViolations, each of which details the field that had the problem (as a path to the field via references, using dot notation) and a matching localizable problem description.
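To show roughly what such a validator does on your behalf, here is a hand-rolled sketch using only reflection from the JDK. To be clear about assumptions: the @Required and @AtMost annotations, SketchBean and SketchValidator are all made up for illustration; this is not the JSR-303 API, nor how the Hibernate implementation works internally, and it skips everything that makes the real thing useful (nested validation, localization, custom constraints).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical constraint: field must not be null. NOT part of javax.validation. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Required { }

/** Hypothetical constraint: numeric field must be at most the given value. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface AtMost { long value(); }

/** Bean whose constraints are declared, not coded. */
class SketchBean {
    @Required String name;
    @AtMost(20) int retries;

    SketchBean(String name, int retries) {
        this.name = name;
        this.retries = retries;
    }
}

class SketchValidator {
    /** Checks every annotated field of the bean and returns one message per violation. */
    static List<String> validate(Object bean) {
        List<String> problems = new ArrayList<>();
        for (Field f : bean.getClass().getDeclaredFields()) {
            f.setAccessible(true);
            Object value;
            try {
                value = f.get(bean);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (f.isAnnotationPresent(Required.class) && value == null) {
                problems.add(f.getName() + ": may not be null");
            }
            AtMost max = f.getAnnotation(AtMost.class);
            if (max != null && value instanceof Number
                    && ((Number) value).longValue() > max.value()) {
                problems.add(f.getName() + ": must be at most " + max.value());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Both constraints are violated here, so two problems are reported
        for (String problem : validate(new SketchBean(null, 25))) {
            System.out.println(problem);
        }
    }
}
```

The point is the division of labor: the bean only declares its constraints, and one generic piece of code enforces them for every bean type.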

For further discussion, this article is a good follow up; and JavaDocs should be enough to get you going with the details.




About me

  • I am known as Cowtowncoder
  • Contact me at@yahoo.com
Check my profile to learn more.