Monday, June 28, 2010

Async HTTP Client 1.0 released

Ok, this announcement went out last week, but it is important to re-iterate: version 1.0 of Ning async HTTP Client is now out!

This is good news for multiple reasons: obviously ability to do asynchronous (read: non-blocking, i.e. no need to use one thread for each and every open connection) HTTP communication is valuable in itself. But I also hope that this spurs some friendly competition in improving all HTTP-based communication on Java platform -- there are still features that are not implemented, and due to complexity of efficient connection handling, there should be still room for improvement with performance as well. And finally this should lead to wider adoption of this relatively new library, so that it gets properly battle-tested and proven.

I am actually planning on using this client for cases where regular blocking client could work, to see how well it performs and exactly how easy it is to use non-blocking API, compared to existing blocking alternatives.

Tuesday, June 08, 2010

Introducing My Newest Open Source project: Tr13 (that is so 1337!)

I have briefly mentioned that I have been working on a new little low-level data structure package, called (Ning) Tr13, started at GitHub. Now that code is getting ready and tested, it is time to "formally" introduce it.

Tr13 is an interesting project form multiple angles: for one, it is my first "corporate-sponsored" Open Source project (i.e. related to may day-to-day work -- although not focus of it, just a useful thing for me and hopefully my colleagues at Ning). This alone is worth celebrating; the whole process has been remarkably smooth from 15-minute end-to-end approval process (take note Amazon developer community...), tosteady if somewhat slow development itself (there is nothing groundbreaking from CS perspective -- more on this below).
This is also a rare case of my already having reasonably clear idea as to where I would be using this package. It may sound surprising, but truthfully most open source libraries I have written, I have not actually absolutely needed them before writing. :-)
(there are exceptions: Java UUID Generator is something I wrote since I needed it -- but not so for Woodstox, or even Jackson, to some degree, both of which I have extensively used after I wrote them).
And finally, well, it is a change from my more common data format parsing / data binding work. Although I was thinking of "climbing up the stack" to write something higher level, I ended up going the other way, a slight detour.

Anyway, I digress. So...

What exactly is tr13?

It is a space-efficient ("compact") read-only implementation of Trie data structure, with slight modification to use Patricia/Radix structure for leaves (but not for branches, since that did not seem useful for my use cases).

Current implementation supports two kinds of Tries:

  1. Byte[] key, variable integer (~= long) value
  2. Byte[] key, byte[] value

In both cases, byte[] is typically a String, but can be any other serializable thing; for example, ints and longs could be trivially converted to 4/8 byte keys. Separate variant for variable-length integers is further optimization for my initial use case, where values are generally small ints, which can fit in a single byte, but where it is convenient not to be bound by arbitrary limits (like 256).

Internally structure is actually just a regular byte array (or, alternatively, ByteBuffer, to allow for "GC-free" tries!). Data format needs to be documented, but is a rather simple implementation of basic trie ideas, nothing to write a dissertation about.

Why does it Rock?!

Given how good plain old JDK HashMap can be -- you could just use HashMap<String,Integer> for the first case -- what is so special about tr13?

Simply put: memory usage, or lack thereof (ditto for GC activity). This because:

  1. Overall memory usage is pretty tight -- for test data I have (30 million entries of typically 13-byte key, small integer number == 600 meg input file), trie structure actually uses less memory than its file representation (by about 35% less). And,
  2. Since it is all in just TWO objects (raw byte array, wrapper Trie object), it should keep GC rather happy.

This is not coincidental, since the use case we have is that of sizable routing tabls that should ideally be kept fully in-memory. Hence compactness of data structure, with reasonable lookup times (i.e. no linear search), was design criteria. Even other somewhat frugal data structures (like b-trees) are not anywhere as space efficient.

Downsides?

First obvious limitation is that it is a read-only data structure: you must have all the data at hand when building it. Data also has to be ordered, not necessarily alphabetically (although that is the usual way), but in a stable lexicographic ordering, such that build process has invariants it needs. This is not as big a limitation as it might seem, at least when merging functionality (more on this later on) is added. For now it may be necessary to re-build data structure regularly, and keep an additional "short-term" lookup data structure at hand for additions that are needed between re-builds.

As to performancem resulting data structure is not the fastest way to find things: from complexity standpoint it is a linear algorithm, compared to theoretically constant time for hash-based alternatives. This may or may not matter; it should not matter a lot with respect to key length (HashMap has to calculate hash for key and compare key bytes, unless keys can be canonicalized). Another potential performance concern is that trie data structure does not have much locality; but this too is unlikely to be much worse than for typical alternatives. In fact, due to high memory efficiency, and use of plain byte array, it is not guaranteed that HashMap has any better locality, even if a single lookup results in a match (this due Java GC's habit of "shuffling" object data around).

All in all, Tr13 does much more work when finding values, since it traverses data structure once per each key byte (except for Patricia-optimized tail, which may be multi-byte linear match). So I would not expect it to be as fast as direct hash lookups.

But in the end, these are speculations, and the real test would be use.

... or, how about a performance test?

Performance Testing

Ok: the question of efficiency is something worth looking deeper into. And so I wrote a benchmark that compares VIntTrieLookup (byte[]-to-vint trie) against HashMap<String,Integer> (for curious, it's under 'src/java/test/manual/' in source tree). Test loads all the entries in memory, and then performs randomized lookups (one lookup for every key, but order shuffled so as to be different from 'natural' ordering of trie).

When given a subset of the whole data file (turns out HashMap hogs so much memory that I just can't test the whole data set!) -- a 32 meg, 2 million entry chunk -- here are the results:

Memory usage:

  1. Tr13: 20.6 megs (reduction of about one third)
  2. HashMap<String,Integer>: 314 megs

So in this case, HashMap<String,Integer> uses about 15 times more memory than Tr13 (note: Integer values are canonicalized using Integer.valueOf -- without this, usage would be 360 megs).

And what about performance? What is the cost of compactness?

For 3 million lookups, time taken was:

  1. Tr13: about 5000 milliseconds (about 400,000 lookups per second)
  2. HashMap<String,Integer>: about 1200 milliseconds (about 1,670,000 lookups per second)

so HashMap is about 4 times faster than Tr13 for lookups, on my system (2.33 ghz mini-mac, JDK 1.6), for given data set.

Since my first need is to be able to do lookups in memory, performance difference is not of particular importance (there might be one or two lookups per request, so anything in tens of thousands per second would work), but it is good to know rough order of magnitude difference.

What next?

At this point Tr13 should be ready for use, and I plan to release version 0.5 soon. The only thing I am waiting for is to get a Maven repository set up at Sonatype OSS repository (kudos to Sonatype for this free hosting!), so that Maven-based (and I assume, Ivy-based) projects can use it. This is actually what I need for my daytime use.

But beyond Maven deployment, there are some additional features I want to implement in near future, such as:

  • Ability to traverse through entries (Map.Entry<> style)
  • Ability to merge two tries efficiently -- could be built quite easily given iterator functionality
  • ... document the data structure (ok ok, not a development task, but "hard" (read: tedious) yet necessary work) -- it is very simple, but not so simple that reading code would make it obvious.
  • Current implementations are limited to 2 gigs, due to Java byte[] and ByteBuffer limitations. But fundamentally there is nothing particularly hard about writing segmented versions where only segments would have this limitation (well, probably values too, but I don't see any need for 2gb+ values in my tries...)

Merging functionality specifically would be useful for "semi-mutable" lookup data structures. For data that evolves slowly (which is the case for routing table use case I have), it is possible to keep track of needed modifications (in a, say, HashMap), and do primary lookup against that data structure, and only if no match is found, against immutable trie. And when this short-term data structure goes enough, or enough time passes, trie could be re-built with existing and new data. Repeat and rinse.

And perhaps it might even be possible to work some more on fundamental data structure itself, and find ways to make it mutable. Insertions (and deletion) operations would be rather costly, due to global changes needed (another downside of compactness is that changes tend to spill far). But even slow mutability would be better than no mutability; and perhaps there could be ways to trade some space (intentional unused regions within trie, to be used for insertions?) for lesser likelihood of having move all the data around.

Wednesday, April 28, 2010

Making Java Reflection field access 40 times faster

Think it can not be done? Think again!

Here is a useful nugget of information that should be common knowledge among performance-conscious Java developers:but based on code I've seen in the wild it might not yet be, hence this entry.

Basically, if you just do this:

  Field f = MyBean.class.getDeclaredField("id");
  f.set(beanInstance, valueForField);

performance stinks -- setting value is about hundreds of times slower than direct field access. But if you add this little call:

  f.setAccessible(true);

in there, overhead is drastically reduced. In a simple micro-benchmark running on my old desktop speed difference is 40x, and while YMMV, it will be an order of magnitude or more and this ratio does not seem to improve with new JVMs (has been steady with 1.4, 1.5 and 1.6). This actually takes overhead down enough so as to make reflection-based access "fast enough" for most use cases: Jackson for example just uses it and does not bother with byte-code generation to create most optimal access (but who knows? maybe it will one day).

So what is the difference? Access checks that would be done by default are heavy-weight, as can be seen from profiler stack traces and JDK sources. And all setAccessible() method does is set a boolean flag that will by-pass these checks. Works for Methods and Constructors as well, although with slightly less pronounced results.

Anyway, thought others might find this useful.

Sunday, March 21, 2010

Naively Simple, and Very Powerful: WebSockets!

I don't know how I have managed to miss the next "Simple Big Thing", but I noticed that something called WebSockets has been around for months. And yet I only now bumped into (via announcement for Jetty 8, which boasts support for Servlet 3 API as well as WebSockets). Better late than never I guess.

The idea with WS is very simple, and even somewhat elegant: start up with a regular HTTP connect with hand shake, and then convert it to what amounts to a socket. And use simple properties of UTF-8 encoding to provide simple and inexpensive framing mechanism. Start and end markers used are bytes 0x00 and 0xFF which can not be included in UTF-8 encoded textual data (along with couple of other bytes); and this mostly guarantees that you can nicely included structured textual data using XML or JSON. In fact, elegance of the solution reminds me of UTF-8 encoding which also is surprisingly well designed and simple. In fact, ability of WebSockets to do what it is doing is yet another testament for pure goodness of UTF-8 encoding.

Anyway, this is just a heads up, just in case some of you may have missed this important development. Believe me, this is going to be a big thing very soon. Comet begone; WebSockets is the way forward for efficient low-latency stateful interaction!

Thursday, December 17, 2009

On good, efficient data formats

There are 2 fairly recent additions to category of "good binary data formats that work nice with Java" category: Avro and Kryo. I have meant to write something about both for a while.

1. Avro

Avro is a simple and efficient general-purpose data format, developed as part of Hadoop project. Due to its background, it should work very well with Hadoop (and map/reduce systems in general). It is also quite similar to what I had been thinking of implementing for my own (well, my employer's, rather) large-scale data processing needs, when there was no Avro. From my shallow understanding of Avro, it seems to nicely fit the bill for data format for huge sequence of records; but with self-describing property that is sadly lacking from other contestants like Google's protobuf.

I will hopefully have some more tangible notes to add in future: for now it's enough to note that Avro's performance seems to be pretty good, at least in the "thrift-protobuf" benchmark.

2. Kryo

Strictly speaking Kryo is not a data format, but rather Java object serialization framework that happens to define a data format to handle its main task. But since sending POJOs back and forth over the wire is a very common (perhaps the most common) task for data formats, in Java world, this is not a big difference.

First thing I noticed was its good performance (on above-mentioned benchmark). This is nice, since there is already JDK default serialization that performs adequately for most tasks; so anything else that does binary serialization should be able to meet and beat that performance baseline, to be of interest. But as importantly, API seems straight-forward, simple, and adequately customizable.
So I have reasonably high expectations for this library -- it could be nice complement to something like, say, Dirmi (RMI alternative for JDK default one -- should be coupled with alternative, similarly improved, serialization mechanism, n'est pas?).

3. Disclaimer

Alas, I have not made up compelling use case for using these two projects, yet. But given promise they hold, I should be able to test them out come next year.

Tuesday, December 08, 2009

JSON data binding performance (again!): Jackson / Google-gson / JSON Tools... and FlexJSON too

(note: this is a follow-up on an earlier measurements)

1. A New Contestant: FlexJson

After realizing that FlexJson is actually capable of both serialization and deserialization (somehow I thought it would only serialize things), I decided to add it as the fourth contestant in the "full service Java/JSON data binding" category of tests.

Initially I was bit discouraged to find that it makes one rookie mistake: assumes that somehow JSON comes in (and goes out) as Java Strings. But aside from this glitch, package actually looks quite solid -- and its exclusion/inclusion mechanism looks interesting. Maybe not exactly my cup of joe (if it was, after all, Jackson API would look more like it does), but a viable alternative. And I can see how ability to prevent deep copy would come in handy sometimes. And finally, some of the features actually exceed what Jackson can currently do, regarding polymorphic deserialization (since FJ includes class name by default, I assume it can do it) and some level of cyclic-dependency handling (ignoring serialization of cyclic references at least).

So let's see how "rookie" (yes, I know, it's not exactly a new package, just new addition to the test) fares...

2. Test setup

Tests are run using nice Japex performance test framework, running on my somewhat old AMD work station (~1700 Ghz Athlon -- someone needs to click on those right-hand-side ads to get me a new performance-testing work station! :-) ).

Input data used consists of serialization of tabular data (database dump, good old "db100.xml" used by countless xml tests), converted to Java POJOs, and then to individual data formats (here as JSON, but can be tested as XML and whatnot). Document size is 20k in XML, and slightly less in JSON (about 16k). It would be easy to run using other data sets, but in the past, performance ratios for 2k, 20k and 200k documents have not had radical differences, so 20k one seems like a reasonable choice (but note that the earlier benchmark did in fact use 2k documents, so actual numbers do differ).

Test project itself, "StaxBind" is still in Woodstox SVN repository, accessible via Codehaus SVN page. (one of these days I should just create a Github project -- but not today).

Versions of JSON processing packages are as follows:

  • Jackson 1.2.0
  • Google-gson 1.4
  • Json-tools-core 1.7
  • Flexjson-1.9.1

Code for each library is using default settings, and using what appears as the most efficient interface, for cases where transformations are from byte streams on server side (byte streams in, byte streams out).

3. Results

First things first: here's the money shot:

Data Binding Performance Graph

(or check out the for details)

Another way to represent results is by showing performance ratios, using the slowest implementation as base line (TPS == transactions per second; number of times a 20k document is read, written, or both):

(note: Jackson/manual is omitted since it is hand-written (if simple) serializer/deserializer, and there are no direct counterparts for other packages -- while it would give even bigger faster-than-thou ratio, it wouldn't be a fair comparison)

Impl Read (TPS) Write (TPS) Read+Write (TPS) R+W, times baseline
Jackson (automatic) 1599.272 2463.097 1033.809 25.6
FlexJson 125.277 125.277 94.904 2.35
Json-tools 94.051 126.954 49.008 1.2
GSON 56.58 112.455 40.38 1

So looks like our "new kid on the block" manages to outperform the other two non-Jackson JSON processors here. And at least get within an order-of-magnitude with Jackson... :-)

4. Musings

So it turns out that despite its interfacing (those String/byte conversions), Flexjson package manages to work more efficiently than some other packages that claim "simplicity and performance". And this without actually claiming to be particularly performant, but rather focusing on design of API and ease-of-use aspects. Pretty neat, I respect that.

5. Next?

My current main interest (with respect to performance issues) lie in the area of compressing data for transfer: after all, most of the time there is relative abundance of CPU power compared to available network and I/O bandwidth. This means that trading some CPU (needed for compression and decompression) seems like a bargain for many use cases.

But on the other hand,as we saw earlier, the question is "how much is too much". And that's where my new favorite simple-and-fast algorithm, LZF, comes in. But that's a different story.

Wednesday, November 11, 2009

Just like Java RMI, Just Better: Dirmi

Here is another new interesting new hopefully useful project for developers of distributed Java systems: Dirmi. It is essentially a replacement and upgrade for plain old Java RMI, and both addresses most existing issues with vanilla RMI and extends set of functionality. And does all of that with a few usability improvements. Sounds pretty good to me.

So why do I think this might be a very good replacement? Beyond reading its feature set, I have not yet really used it. But I have confidence knowing its author, who has written such solid packages as Carbonado (ORM to use with BDB, amongst other backends) as Cojen (code generator). I hope Dirmi will get bit more exposure in near future (as well as Carbonado that seems to be somewhat of a well-kept secret and deserves to be more widely known). I will try to write a follow-up if and when I get to play with Dirmi a bit.

Wednesday, September 23, 2009

JSON data binding performance: Jackson vs Google-gson vs BerliOS JSON Tools

UPDATED: see a more up-to-date version here

Earlier I have published some results on performance of "simple" JSON parsing -- simple meaning that processing is manual, to allow for processing JSON using wide variety of Java+JSON tools available. This includes processors from ultra-fast streaming processors (like Jackson) all the way to "good old JSON.org" parser. But it also excluded at least one potentially good tool (google-gson), since it requires "untyped" access, ability to traverse arbitrary JSON structure for testing.

Also: more and more access is nowadays done using a more convenient class of tools, called data binding (or mapping; or sometimes serialization) tools (libraries, packages). In such cases application just asks library to convert JSON to a Java Object (or vice versa), and that's about it. Very convenient; especially for strongly typed web services.

So, with that background, let's see what are performance characteristics of available tools.

1. JSON Data Binding: Contestants

Now, list of tools that allow doing is somewhat limited: I am aware of following:

Given that all of them can do conversions with similar ease (at least for simple Java types), is there much difference in performance? To figure this out, I will be using somewhat incorrectly named StaxBind (really, it should be renamed PojoBind or something) sub-project of Woodstox. Data to bind is a simple rendition of tabular data, with List of beans that contain personal information (name, address and so on); document size (for this test) being about 2 kilobytes.

2. Results!

And yes, indeed, results look vaguely familiar (see here, for example). Considering the "bigger is better" aspect -- value measured, "tps", is number of documents read, written, or read-modify-written per second -- difference from slowest (google-gson) to fastest (Jackson) is a solid order of magnitude.

Data Binding Performance Graph

Looks like Jackson still the King of JSON, regarding processing speed -- and by ridiculously high margin too... If you are already a Jackson user, you may want to congratulate yourself on choosing a very efficient (even green! save those cycles!) tool. A pat on your back might be warranted as well. To put performance in perspective; being able to read ten thousand 2k documents per second (throughput of about 20 megabytes per second), on an almost obsolete AMD Athlon based PC (my home PC) is not too shabby; and all this without little if any glue code.

Actually, as you can see, there is one (and only one!) thing faster than Jackson Data Mapper: "raw" hand-written data mapper. And even that is just a bit faster; probably only worth the extra hand-written code for high-volume use cases, or where number of POJO types is very limited.

3. Some details

Given the big difference in perceived performance, avid readers might be interested in reproducing results, or at least perusing source code. All code is within "staxbind" module in the primary Codehaus Woodstox SVN repository., and author (me!) can be contacted for more details (for some reason Codehaus interface makes access sometimes bit harder than needs be), questions and suggestions.

But there is nothing particularly complicated about code; here's how core methods for tested packages actually look like (interfaces are defined by StaxBind package itself; template T translates to "DbData" (POJO type)).

3.1 Jackson test code

Jackson code is simplest of alternatives, as it supports direct streaming access

public class StdJacksonConverter extends StdConverter
{
ObjectMapper mapper = new ObjectMapper();
//...
public T readData(InputStream in) throws IOException {
return _mapper.readValue(in, _itemClass);
}    
public int writeData(OutputStream out, T data) throws Exception {
JsonGenerator jg = _jsonFactory.createJsonGenerator(out, JsonEncoding.UTF8);
_mapper.writeValue(jg, data);
jg.close();
return -1;
}
}  

3.2 Json-tools test code

Test code here needs a couple of more lines, since there is no way to directly go from POJOs to stream/String and back. But nothing excessive.

public class StdJsonToolsConverter extends StdConverter
{
final JSONMapper _mapper = new JSONMapper();
//...
public T readData(InputStream in) throws Exception {
// two-step process: parse to JSON value, bind to POJO
JSONParser jp = new JSONParser(in);
JSONValue v = jp.nextValue();
return (T) _mapper.toJava(v, _itemClass);
}
public int writeData(OutputStream out, T data) throws Exception {
JSONValue v = _mapper.toJSON(data);
String jsonStr = v.render(false);
OutputStreamWriter w = new OutputStreamWriter(out, "UTF-8");
w.write(jsonStr);
w.flush();
return -1;
}
}

3.3 Google-gson test code

This test code is bit shorter than Json-tools one, since package does not use intermediate tree form. Surprisingly this does not seem to translate to better performance, as the package ends up taking its time doing conversions. On positive note, there should be plenty of room for improvement in this area...

public class StdGsonConverter extends StdConverter
{
final Gson _gson = new Gson();

public T readData(InputStream in) throws IOException {
return _gson.fromJson(new InputStreamReader(in, "UTF-8"), _itemClass);
}

public int writeData(OutputStream out, T data) throws Exception {
OutputStreamWriter w = new OutputStreamWriter(out, "UTF-8");
this._gson.toJson(data, w);
w.flush();
return -1;
}
}

Monday, May 11, 2009

Jackson JSON-processor turns 1.0.0

Ok: it is now official: the official Jackson JSON-processor version 1.0.0 has just been released. Get it while it's Hot!

Wednesday, May 06, 2009

json+gzip nicely packed, but has it Got Speed?

One commonly occuring them on discussions on merits (or lack thereof) is the question "but does the size matter". That is: while textual formats are verbose, they can be efficiently compressed using common every day algorithms like Deflate (compression algorithm that gzip uses). From information theory standpoint, equivalent information should compress to same size -- if one had optimal (from information theory POV) compressor -- regardless of how big the uncompressed message is. And this is quite apparent if you actual test it out in practice: even if message sizes between, say, xml, json and binary xml (such as Fast Infoset) vary a lot, gzipping each gives rougly same compressed file size.

But what is less often measured is how much actual overhead does compression incur; especially relative to other encoding/decoding and parsing/serializing overhead. Given all advances in parsing techniques and parser implementations, this can be significant overhead: compression is much more heavy-weight process than regular streaming parsing; and even decompression has its costs, especially for non-byte-aligned formats.

So: I decided to check "cost of gzipping" with Jackson-based json processing. Using the same test suite as my earlier JSON performance benchmarks, I got following results.

First, processing small (1.4k) messages (database dumps) gives us following results:
(full results here)

and medium sized (16k): (full results)

(just to save time -- results using bigger files gave very similar results as medium ones, regading processing speed)

So what is the verdict?

1. Yes, redundancies are compressed away by gzip

Hardly surprising is the fact that JSON messages in this test compressed very nicely -- result data (converted from ubiquitous "db10.xml" etc test data) is highly redundant, and thereby highly compressible.

And even for less optimal cases, just gzipping generally reduces message sizes by at least 50%; similar to compression ratios for normal text files. This is usually slightly better than what binary formats achieves; oftentimes even including binary formats that omit some of non-redundant data (like Google Protococol Buffers which, for example, requires schema to contain field names and does not include this metadata in message itself).

2. Overhead is significant, 3x-4x for reading, 4 - 6x for writing

But it all comes at high cost: overhead is highest for smallest messages, due to significant fixed initialization overhead cost (buffer allocations, construction of huffman tables etc). But even for larger files, reading takes about three times as long as without compression, if we ignore possible reading speed improvements due to reduced size. And the real killer is writing side: compression is the bottleneck, and you'll be lucky if it takes less than five times as long as writing regular uncompressed data.

3. Is it worth it?

Depends: how much is your bandwidth (or storage space) worth, relative to CPU cycles your programe spends?
For optimal speed, trade-off does not seem worth it, but for distributed systems costs may be more in networking/storage side, and if so compression may still pay off. Especially so for large-scale distributed data crunching, like doing big Map/Reduce (Hadoop) runs.

Or how about this: for "small" message (1.4k uncompressed), you can STILL read 22,000, write 12,000, or read+write 8,000 messages PER SECOND (per CPU). That is, what, about 7900 messages more processed per second than what your database can deal with, in all likelihood. Without compression, you could process perhaps 14,000 more messages for which no work could be done due to contention at DB server, or some other external service... speed only matters if the road is clear.

Yes, it may well make sense even if it costs quite a bit. :-)

4. How about XML?

If I have time, I would like to verify how XML+GZIP combination fares: I would expect same ratios to apply to xml as well. The only difference should be that due to somewhat higher basic overhead, relative additional overhead should be just slightly lower. But only slightly.

Related Blogs

(by Author (topics))

Powered By

Powered by Thingamablog,
Blogger Templates and Discus comments.

About me

  • I am known as Cowtowncoder
  • Contact me at@yahoo.com
Check my profile to learn more.