Saturday, October 22, 2011

On prioritizing my Open Source projects, retrospect #3

(note: continuing story, see the previous installment)

1. What was the plan again?

Ok, it has been almost 8 months since the previous prioritization overview (the plan was to check after 4, but time flies when you are having fun!)
The high-level priority list back then had these entries:

  1. Aalto 1.0 (complete async API, impl)
  2. ClassMate 1.0
  3. Java CacheMate, ideally 1.0
  4. Tr13 1.0
  5. Externalized Mr Bean (depending on interest)
  6. Jackson 1.8
  7. Jackson-xml-databinding 1.0
  8. Work on Smile format

2. And how have we done?

This time the hit rate was even a bit lower than the previous one (50%), although there was some progress. In fact, had I checked things after 4 months, only one entry would have been completed (Jackson 1.8).

Item by item, we have:

  1. Aalto: modest progress (I did at least write a blog entry on how to use async parsing); still needs an async SAX implementation, so no 1.0 (although 0.9.7 was released right after the blog entry)
  2. ClassMate: minor fixes, but no 1.0 yet
  3. CacheMate: significant progress (secondary indexes); I now have a 1.0 design (for "raw" in-memory), but it is not yet implemented -- so kind of half-done
  4. Tr13: no progress
  5. Externalized Mr Bean: deferred; no demand, no progress (only some preliminary scoping)
  6. Jackson: 1.8 released (and even more, see below)
  7. Jackson-xml-databinding: bug fixes, but no 1.0
  8. Smile format: actual progress -- Pierre from Ning implemented libsmile (C), and contributed Smile detection for the unix/linux 'file' command

So it's mostly modest progress and misses this time; the plan was not really aligned with what was needed. Only 3 entries had significant progress.

What went wrong? Partly it's just that the huge popularity of Jackson swept away many of the plans; and conversely, lack of interest in many of the entries held them back.
But additionally, many other things got implemented. So let's look at that aspect next.

3. What was done instead?

Here are things I can remember, in loose work order:

  • LZF compression ("Ning LZF") -- much progress, quite close to 1.0
  • Jackson modules, such as Afterburner, and improvements to already existing ones (Scala, Hibernate) -- although not yet for CSV or Joda modules (which exist in skeletal form)
  • JVM-compressor-benchmark for comparing space/time efficiency of various compressors on JVM, core done (can always add codecs)
  • Low-gc-membuffers, an experimental FIFO for byte[], with native memory buffers
  • Java merge sort (file-backed configurable efficient merge sort) -- mostly done, although not declared 1.0
  • Lzf4Hadoop, Hadoop integration for LZF compression -- basically done
  • New mode for JVM-serializers benchmark, data streams, for more balanced evaluations; implemented most common codecs
  • Jackson 1.9

Quite a list, eh? One completely new "branch" of development was related to the LZF compression codec. And continued huge demand for all things Jackson also meant that the majority of my time was spent on Jackson and its extensions.

4. Updated list

Given recent developments, popular demand, and on-going plans, here is my current thinking on the main priorities:

  • Jackson CSV module: I want to add proper Jackson support for CSV, since it is still a very common (and pretty functional!) input data format, and the de facto default export format for lots of data sources. And best of all, this can be done without any work on Jackson core
  • CacheMate: I really want to implement secondary caches, and have a reasonable design (in many ways similar to the persistence used by Cassandra/BigTable/HBase) for how to go about it
  • Jackson 2.0: move to github, refactor, redesign, remove deprecated things -- major renovation, to lay foundation for longer term 2.x development
  • ClassMate: getting to an official 1.0 would be good, as well as writing a blog entry or two on actual usage
  • Jackson XML data binding: fix bugs and declare 1.0, since it is easier to market that way. And of course document it
  • Ning-compress (LZF) 1.0: already functional, and feature-wise as good as 1.0, but there are a couple of optimization tricks (by Mr Dain S, who ported Snappy to Java) that I'd still like to investigate before declaring things 1.0

Other interesting things that might get included are:

  • Aalto 1.0: it would be good to sort of declare it done by implementing async SAX and announcing the first non-beta release
  • Externalized Mr Bean (BeanMate?) still looks like a potentially useful thing that others would want to use (this is above and beyond the basic refactoring that Jackson 2.0 would dictate, i.e. splitting of the jar as a first-level new module)
  • Standardization work for Smile?
  • Maybe even design a splittable variant of LZF (Splitty? Splitz?) -- with improved usage of length indicators (VInts), designed so that an implementation can be even faster than LZF (on par with Snappy for Java), yet allow splittability, which would be very valuable for Map/Reduce tasks

I of course expect the above list to have at most a 50% success rate, and other good stuff to be worked on instead. This is especially so given likely changes to my daytime job: possibly changing roles in day-to-day work will likely boost the priority of some open source efforts and reduce that of others.

About me

  • I am known as Cowtowncoder