Thursday, October 14, 2010

More on Java UUID Generator (JUG), a word on performance

Ok, so Java UUID Generator version 3 is out. Beyond question of what is new with this version (which I outlined in the previous entry), the next asked was "how is performance?".
My first thought was that it should be about the same as what version 2.0 has; but after some more thinking I realized that this is not a very useful answer to give.

So I wrote a simple micro-benchmark to find out (it's in 'src/main/java/perf/') more about performance aspects. Test uses three main generation methods (random-, time- and name-based) for JUG, and matching alternatives that java.util.UUID has (random- and limited named-based variant).

1. Performance of UUID generation, general observations

High-level summary would be:

  1. Performance between equivalent methods between JDK and JUG have similar performance (no surprise) -- JDK just does not provide time-based variant (one most commonly used by non-Java systems); nor ability to configure Random-based variant
  2. Performance between three main generation types varies a lot.

2. Actual numbers

So how big are differences? Here is actual snapshot from running the test on my MacBook: each run generates 250,000 UUIDs, and time taken (after test ran for about a minute) is reported as milliseconds:

  Test 'JDK, random' -> 974 msecs
  Test 'JDK, name' -> 230 msecs
  Test 'Jug, SecureRandom' -> 971 msecs
  Test 'Jug, java.util.Random' -> 33 msecs
  Test 'Jug, time-based' -> 39 msecs 
  Test 'Jug, name-based' -> 256 msecs

Since these values fluctuate a bit, exact values are not as important as ratios. Random-based variants are slowest for both implementations (except for case of using 'java.util.Random' -- more on this in a bit); name-hash (MD5 by default; SHA-1 available with Jug too) methods are bit faster; and time/location-based variant is rather fast, close to 50% of maximum theoretical per-node generation rate.

So generation rates for this system would be:

  • Random-based method (using SecureRandom) generates about 250k UUIDs per second
  • Name-based variant (MD5) without namespace is almost 4 times as fast, generating about 1 million UUIDs per second
  • Time-based variant (Ethernet address plus timestamp) is much faster -- almost 20 times as fast as Random-based default variant -- generating about 5 million UUIDs per second. This is also close to theoretical maximum rate for the method, which is 10 million UUIDs per second (sustained), because timestamp granularity (100 nanoseconds) prevents higher throughput (to work around this, one can generate parallel generators using different Ethernet address)

Quite a difference! It is worth noting, too, that speed of name-based method is related to both existence of namespace (this test did not use one; adding one slows things down slightly) and length of name used; latter because hash is calculated on name used. And finally, hash method used also matters (SHA-1 is slower than MD5).

But what is up with the random variant? And didn't I forget one impressive number from my analysis above?

3. Why is Random-based generation so slow?

Constructing "secure" pseudo-random numbers is not easy, and this is the main reason for slowness. Looking at stack traces, it looks like JDK actually uses random device provided by the underlying operating system; and this is part of the slowness (due to use of JNI). But generation itself is fundamentally slow, even if no system calls were needed. It is quite possible that faster machines might not even speed up performance significantly.

4. Couldn't we just use 'java.util.Random' to speed things up?

But if the "secure" (i.e. "good", meaning one providing good level of randomness) variant is slow, couldn't we use a faster one? Like "java.util.Random"? In fact, didn't numbers above show using it resulting in version as fast as using time-based variant?

Yes, this is possible: and JUG allows you to pass any java.util.Random instance you want (it just defaults to SecureRandom).
But speed is not everything: the main point of UUIDs is the uniqueness, and this is why SecureRandom is used as the default by both JUG and JDK. And while the default pseudo-random number generator that java.util.Random uses is fine for many uses, I am not sure I would trust its simple algorithm for generating UUIDs; even if initialized using system clock (System.currentTimeMillis()). The underlying algorithm just does not have any actual source of randomness beyond initial seed value and I do not trust that its random number distribution, then, is good enough to give strong guarantees of uniqueness (I would love pointers to someone discussing actual measurements or research on this!).

So to me downgrading to using simple text-book pseudo-number algorithms does not seem tempting, even if it speeds things up by an order of magnitude.

5. So what would make more sense?

Well, looking at numbers, how about consider time-based variant? Name-based variant is not quite as useful; not so much due to performance aspects, but since it actually requires you to hand source of uniqueness (name to hash).

6. Thoughts on relevance of above

I was bit surprised by how big the difference actually is. But whether it really matters depends on use case -- oftentimes you don't need even thousands of UUIDs per second (most services can not really serve even one thousands requests per second), so the rate of a quarter million per second is just fine. After all, it only takes about four microseconds (on average) to generate one UUID.

But there are use cases where this matters more; especially if UUIDs are used liberally, for example using UUIDs as identifiers for all business objects generated during processing.
So if your use case truly depends on (or at least benefits from) fast generation of UUIDs, you probably want to use time-based generation method from Jug.

7. How did I do that again?

Ok; since I haven't yet gotten Javadocs set up, here is the way to generate time-based UUIDs.

  // need to pass Ethernet address; can either use real one (shown here)
  EthernetAddress nic = EthernetAddress.fromInterface();
  // or bogus which would be gotten with: EthernetAddress.constructMulticastAddress()
  TimeBasedGenerator uuidGenerator = Generators.timeBasedGenerator(nic);
  // also: we don't specify synchronizer, getting an intra-JVM syncer; there is
  // also external file-locking-based synchronizer if multiple JVMs run JUG
  UUID uuid = uuidGenerator.generate();

And that's it. Other variants are generated similarly, by creating generator instance with Generators factory.

blog comments powered by Disqus

Sponsored By

Related Blogs

(by Author (topics))

Powered By

About me

  • I am known as Cowtowncoder
  • Contact me
Check my profile to learn more.