Monthly Archives: March 2014

Java’s CountDownLatch JavaDoc is flawed

The first example of usage provided in the CountDownLatch JavaDoc might give you the idea that the example work to synchronize the simultaneous start of multiple threads. At least I thought so until a concurrency test I was working on failed. Closer examination revealed that almost all threads actually missed the start completely. A lot of my concurrency testing involves minimizing phase shifting and do as much preparation as possible in a desperate attempt to avoid parallel execution in theory turning into serial execution in practice. When all is done I seek to maximize the parallelism by a simultaneous start of all worker threads. The CountDownLatch JavaDoc give two properties of their example:

The first is a start signal that prevents any worker from proceeding until the driver is ready for them to proceed.

The second is a completion signal that allows the driver to wait until all workers have completed.

These are the two properties you get. One third property is totally left out:

Only if the administration delay is large enough, do worker threads have time enough to be prepared for the start signal.

..meaning that you’re just lucky if not some of your worker threads, if not all of them, miss the start. Could it be knowing about this third property that made Oracle inject a call to doSomethingElse() before firing the start signal? I understand you might not have seen the problem yet so let me rephrase myself a bit and try my best to be clear.

The driver thread, the thread “administrator”, can be really really quick creating his worker threads. So quick in fact that the operating system hasn’t had the time yet to schedule the worker threads for a first run before the driver thread moves on and fires the start signal. Thus the start is not synchronized among the workers. The start signal prevent workers from starting too early, but it does not prevent workers from starting too late. For a small amount of worker threads, you might never see a problem. But it will become one as soon as the size of the thread pool grows a bit.

The threads will always start too late of course, just like the human runners in a real marathon race. But if you’re up to the task of writing a marathon game where each runner is represented by a thread, wouldn’t you want the start to be as fair as possible? Of course the game design should probably be totally reworked, but I think you get the idea.

The Oracle example never address the issue of threads starting too late, but if a synchronous start is important for you, then their example cannot be applied. As my testing has shown, if the driving thread (the coordinating thread that spawn workers) makes no delay, and the worker threads do (just 10 milliseconds in my test), then all worker threads will miss the start. For a marathon, that would be a disaster. One fix could be to use yet a third CountDownLatch. The third latch will synchronize the driver and make him wait for all workers to become prepared before firing the start signal. Another more clean solution is kind of built on the same idea: Instead of setting the count of the start latch to 1, set it to the amount of worker threads + 1 (the driver) and make all threads including the judge count down the latch. Not until all threads has cooperatively reached the starting line will the race begin. Quite simple really. See the example code here.

I’ve built a smallish test framework for this particular example that also demonstrates both solutions. You can find the source code here. It is a Maven project and you can have it run on your machine within minutes. Enjoy!