Sunday, January 09, 2011

A new, interesting way to accept donations: Flattr

Ok, here is something potentially interesting regarding finding tiny streams of revenue for starving bloggers (no, not me -- I consider mine more a community service kind of deal :)... A small company from Sweden called Flattr seems like a nice way to allow readers to make micro-payments to various web sites they consider worth a tip. Idea is pretty simple, and has been around for a while: there should be a way to make small monthly allowances (paid by web users individually) be distributed to worthy sites, simply, conveniently and efficiently.

But I like the specifics of the way Flattr does it, from anonymity to reciprocity; the latter meaning that to be able to receive payments, one has to be a member and make payments. Although there is no requirement for parity (i.e. no limitation on how much one can receive -- it is possible to spend as little as 2 euros per month and receive unlimited payments, as far as I understand it), it is quite compatible with current models of how the human mind considers fairness. And of course it is a very clever and simple way to foster usage, from a business perspective.

I will need to check this out more: I already became a member, and will start using it... I just hope there are sites that I can tag as recipients. Feel free to add comments for stuff you think I might be interested in reading, and that have Flattr widget.

Saturday, January 08, 2011

On Perception of Java Verbosity

Today many software developers consider Java to be the modern-day equivalent of Cobol. This is evident from comments comparing the amount of Java code needed to do tasks that can be written as one-liners using more dynamic and expressive scripting languages such as Python or Ruby. Funny how time flies -- it wasn't all THAT long ago that Java was seen as a relatively concise language compared to C, due to its built-in support for things like garbage collection and a standard library that contained implementations for a host of things that in C were DIY (note that I did not say "due to simplicity of language itself").

1. Java verbose?

But while it is true that Java syntax can lead to code much more verbose than seems prudent (especially when traversing and modifying data structures), sometimes its reputation exceeds reality. I was reminded of this by a tweet I came across. The tweet asked "and how many lines would this be in Java", regarding a task of downloading JSON from a URL and parsing the contents to extract data; something that can be done with a single line of Python (or Ruby or Perl). The implied assumption being that it would take many more lines of Java code.

2. Ain't necessarily so

This assumption is not completely baseless: if a developer were to do this as part of a service, a typical Java developer might well end up with code that exceeded ten lines; and this even without the code itself being badly written. I will come back to the question of "why" in a minute.

But assumption is also off base, for the simple reason that it can be a one-liner even in Java; for example:

Response resp = new ObjectMapper().readValue(new URL("").openStream(),Response.class);
// or if you prefer, bind similarly as "Map<String,Object>"

(and in fact, ".openStream()" is actually unnecessary, as ObjectMapper can just take the URL directly -- but if it couldn't, one can open an InputStream directly from the URL, which sends the request, reads the response and so forth).

The code snippet just uses the standard JDK URLConnection via URL, and a JSON library (Jackson in this case, but it might as well be Gson, flex-json, whatever); and results in the request being made, contents read, parsed and bound to an object of the caller's choosing, either a Plain Old Java Object or a simple Map.

Given that it IS that simple, why was there assumption that something more was needed?

3. But often is

Above use case happens to be doable in quite concise form; but there are other tasks where Java equivalent ends up being either a call to a very specific library tailored to condense usage, or is much fluffier than equivalents in modern scripting languages. But I don't think this is the main reason for the universal appearance of Java's bloatedness, i.e. it is not just case of choosing a wrong example.

I think it is because most Java developers would actually write piece of code that spanned more than a dozen lines of code. Why? Either because:

  1. They didn't know JDK or libraries, and use much more cumbersome methods (case for less experienced developers)
  2. They actually understand complexities of the task, within context where task needs to be done.

First one is easy to understand: if you don't know your tools, you can't expect a good outcome. But second point needs more explanation.

Let's consider the same task of sending a request to a service that returns a JSON response that we need to return as an object. What possible additional things should we cover, beyond what one-liner did? Here's sampling of possible issues:

  • There is no error handling in the code snippet: if there are transient problems with the connection, it will just fail for good, regardless of what type of problem there is
  • How about problems with the service itself? Requesting an unknown customer? Do we get an HTTP error response; different JSON or what?
  • Do we really want to wait for an unspecified amount of time if the request cannot be made? (TCP will try its damnedest to connect, so if there is an outage it'll be minutes before anything fails)
  • The URL to connect to is fixed (and hard-coded), including parameters to send; should they really be hard-coded?
  • How is caching handled? What are connection details?
  • When there are failures, who is notified and how?
  • Are we happy with the default JDK URLConnection? It may not work all that well for some use cases (i.e. shouldn't we be using Apache HttpClient or something?)

To cover such concerns for production systems, one probably would want much more complicated handling: possible retries for transient errors; definitely logging to indicate hard failures; a way to handle error responses and indicate those to the caller. Due to testing, the end points being used are typically dynamically determined and passed; connection settings may need to be changed, and sometimes different parameters need to be sent. And for production systems we probably need more caching; whereas during testing we may want to disable any and all caching.

Since there are often many more aspects to cover, there is then a tendency to wrap all calls within helper objects or functionality; and if we did define something like "fetchJSONDataFromURL()", it surely would end up being more than a dozen lines of code. Yet the calling functionality might still be no longer than a single Java statement.
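To make that concrete, here is a rough sketch of what such a helper might look like using only the JDK. Everything named here (JsonFetcher, StreamParser, the timeout and attempt parameters) is made up for illustration and is not taken from any real service code; the parser hook simply stands in for whatever JSON library call one prefers. It only covers timeouts, simple retries and HTTP status checks, and leaves out logging, caching and configuration entirely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper: names and policies are illustrative assumptions only.
class JsonFetcher {
    /** Hypothetical hook that turns the response stream into a result, e.g. via a JSON library. */
    public interface StreamParser<T> {
        T parse(InputStream in) throws IOException;
    }

    private final int timeoutMillis;
    private final int maxAttempts;

    JsonFetcher(int timeoutMillis, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.timeoutMillis = timeoutMillis;
        this.maxAttempts = maxAttempts;
    }

    /** Tries the request up to maxAttempts times; any IOException counts as transient. */
    <T> T fetch(URL url, StreamParser<T> parser) {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetchOnce(url, parser);
            } catch (IOException e) {
                last = e; // real code would log here, and probably back off before retrying
            }
        }
        throw new UncheckedIOException(
                "Giving up on " + url + " after " + maxAttempts + " attempt(s)", last);
    }

    private <T> T fetchOnce(URL url, StreamParser<T> parser) throws IOException {
        URLConnection conn = url.openConnection();
        // Do not wait forever if the other end is down
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        if (conn instanceof HttpURLConnection) {
            int status = ((HttpURLConnection) conn).getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                // Error responses are reported to the caller right away, not retried
                throw new IllegalStateException("Unexpected HTTP status " + status + " from " + url);
            }
        }
        try (InputStream in = conn.getInputStream()) {
            return parser.parse(in);
        }
    }
}
```

With something like this in place, a call site could still be a single statement, for example "Response resp = fetcher.fetch(url, in -> mapper.readValue(in, Response.class));" (assuming a Jackson ObjectMapper called mapper and a JsonFetcher called fetcher).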

So which one should we focus on? Helper method that is, say 50 lines long; or call to use it, which may be a one-liner? Former is what can be used to "prove" how bloated Java code is; yet it is written just once, whereas one-liners to use it are written ideally much more often.

By the way, above is not meant to say that it is ALWAYS necessary to handle all kinds of obscure error modes, or to create perfect system that is as efficient as possible. It is clearly not, and Java developers seem especially prone to over-complicating and -engineering solutions. But in other cases, happy-go-lucky approach (that I would claim is more common with "perl scripters") won't do. This is just a long way of saying that complexity of code should be based on actual requirements; and that those requirements vary widely.

4. Concise Java by Composition

I think my insight (if any) here is this: since Java, the language, offers relatively little in the way of writing compact code, economical source code must come from proper use of libraries, as well as the design of those libraries. Furthermore, I think many Java developers have started wrongly believing that Java code must be verbose; and that this makes the perception more of a self-fulfilling prophecy. This means that to write compact Java code one absolutely MUST be familiar with libraries to use for things that the JDK does not support well (or at all).

Tuesday, January 04, 2011

Annual Update on State of Java JSON data binding performance

Yes, 'tis the season for performance measurements again: last time I covered this subject about a year ago with "JSON data binding performance (again!)".
During past 12 months many of the tested libraries have released new versions; some with high hopes for performance improvements (Gson 1.6, for example, was rumored to have faster streaming parser). So it seems prudent to have another look at how performant are Java JSON data binding libraries currently available.

1. Libraries tested

Libraries tested are the same as last time; with versions:

  1. Jackson 1.6 (was 1.2)
  2. Gson 1.6 (was 1.4)
  3. Json-tools (core, 1.7) (no change)
  4. Flex-json 2.1 (was 1.9.1)

(for what it's worth, I was also hoping to include tests for "json-marshaller", but lack of documentation coupled with seeming inability to parse directly from stream or even String suggested that it's not yet mature enough to be included)

2. Test system

I switched over to the light side at home (replaced my old Athlon/Linux box with a small spunky Mini-Mac, yay!), so the test box has a 2.53 GHz Intel Core 2 Duo CPU. This is a somewhat faster box; it seems to be about 4x as fast for this particular task. The test is single-threaded, so it would be possible to roughly double the throughput with a different Japex test setup; however, test threads are CPU-bound and independent so there seems to be little point in adding more concurrency.

Test code is available from Woodstox SVN repository (under "staxbind" folder), and runs on Japex. Converters for libraries are rather simple; data consists of medium-sized (20k) documents for anyone interested in replicating results.

3. Pretty Graphs

Ok, here is the main graph:

Data Binding Performance Graph

and for more data, check out the full results.

4. "But what does it MEAN?"

As with previous tests, the upper part of the double-graph just indicates the amount of data read and/or written (which is identical for the first three, but flex-json seems to insist on including some additional type metadata), and can be ignored; the lower graph indicates throughput (higher bar means faster processing) and is the interesting part.

There are three tests; read JSON (into data object(s)), write JSON (from data object(s)) and read-then-write which combines two operations. I use last result, since it gives reasonable approximation for common use with web services where requests are read, some processing done, and a response written.

From graph it looks like results have not changed a lot; here is the revised speed ratio, using the slowest implementation (Gson) as the baseline:

Impl                 Read (TPS)  Write (TPS)  Read+Write (TPS)  R+W, times baseline
Jackson (automatic)    7240.202     9161.873          4023.464                14.75
FlexJson                721.743     1402.848           462.594                 1.69
Json-tools              524.119     1007.068           341.123                 1.25
GSON                    714.106      462.935           272.637                 1
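
For anyone wanting to redo the arithmetic: the last column is just each library's Read+Write throughput divided by that of the baseline. A small sketch follows; the class and method names (SpeedRatios, relativeTo) are made up for illustration, and the numbers are copied from the table above. Since this version rounds to two decimals, while the table's figures look truncated, the last digit may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for recomputing the "times baseline" column.
class SpeedRatios {
    /** Returns each throughput divided by the baseline's throughput, keeping insertion order. */
    static Map<String, Double> relativeTo(Map<String, Double> tps, String baseline) {
        Double base = tps.get(baseline);
        if (base == null || base <= 0.0) {
            throw new IllegalArgumentException("No usable baseline entry: " + baseline);
        }
        Map<String, Double> ratios = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : tps.entrySet()) {
            ratios.put(e.getKey(), e.getValue() / base);
        }
        return ratios;
    }

    public static void main(String[] args) {
        // Read+Write (TPS) values from the results table
        Map<String, Double> readWrite = new LinkedHashMap<>();
        readWrite.put("Jackson", 4023.464);
        readWrite.put("FlexJson", 462.594);
        readWrite.put("Json-tools", 341.123);
        readWrite.put("GSON", 272.637);

        relativeTo(readWrite, "GSON").forEach(
                (name, ratio) -> System.out.printf("%-12s %.2f%n", name, ratio));
    }
}
```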

5. Thoughts?

Not much has changed; Jackson is still an order of magnitude faster than the rest, and relative ranking of the other libraries hasn't changed either.

About me

  • I am known as Cowtowncoder