Monday, April 04, 2011

Introducing "jvm-compressor-benchmark" project

I recently started one new open source project; this time being inspired by success of another OS project I had been involved in, project is "jvm-serializers" benchmark originally started by Eishay and built by a community of java databinder/serializer experts. What has been great with this project has been amount of energy it seemed ot feed back to development of serializers: highly visible competition for best performance seems to have improved efficiency of libraries a lot. I only wish we had historical benchmark data to compare to see exactly how far have the fastest Java serializers come.

Anyway, I figured that there are other groups of libraries where high performance matters, but where there is lack of actual solid benchmarking information. So while there are a few compression performance benchmarks, they are often non-applicable for Java developers: partly because they just compare native compressor codecs, and partly because focus is more often only on space-efficiency (how much compression is achieved) with little consideration of performance of compression. The last part is particularly frustrating as in many use cases there is significant trade-off between space and time efficiency (compression rate vs time used for compression).

So, this is where the new project -- "jvm-compressor-benchmark" -- comes from. I hope it will allow fair comparison of compression codecs available on JVM, to be used by Java and other JVM lagnuages; and also bring in some friendly competition between developers of compression codecs.

First version compares half a dozen of compression formats and codecs, from the venerable deflate/gzip (which offers pretty good compression ratio with decent speed) to higher-compression-but-slower-operation alternatives (bzip2) and lower-compression-but-very-fast alternatives like lzf, quicklz and the new kid on the block, Snappy (via JNI).

And although the best way to evaluate results is to run tests on your machine, using data sets you care about (which I strongly encourage!), Project wiki does have some pretty diagrams for tests run on "standard" data sets gathered from the web.

Anyway: please check the project out -- at the very least it should give you an idea of how many options there are above and beyond basic JDK-provided gzip.

ps. Contributions are obviously also welcome -- anyone willing to tackle Java version of 7-zip's LZMA, for example, would be most welcome!



Related Blogs

(by Author (topics))

Powered By

About me

  • I am known as Cowtowncoder
  • Contact me at@yahoo.com
Check my profile to learn more.