The purpose of this exercise is to explore the risks of writing racy code. As such it consists mainly of programs that you should execute on your machine, observing the outcome: in some cases the outcomes will seem to contradict the intuitive, expected semantics of the program. These anomalies are due to the absence of correct synchronisation between the memory accesses performed by different threads, that is, to data races. In general, bugs due to data races are extremely difficult to catch, debug, and fix: in your programming practice you should always guarantee correct synchronisation between threads.
The code for this lab is available here.
Our (unambitious) goal is to correctly implement mutual exclusion using the classic Dekker algorithm. Our test infrastructure is a counter that is concurrently updated by two threads:
class UnsafeCounter {
    private int count;
    void increment () { count++; }
    int get () { return count; }
}

class DekkerCounterTest {
    final static int STEPS = 5000000;
    static int i = 0;
    static UnsafeCounter c = new UnsafeCounter();
    // static Dekker m = new Dekker();

    public static void main (String[] args) throws InterruptedException {
        System.out.println("Dekker Counter Demo");

        // Spawn two threads...
        Thread[] thread = new Thread[2];
        for (i = 0; i < 2; i++) {
            thread[i] = new Thread(new Runnable() {
                int tid = i;
                public void run () {
                    System.out.println("Running: " + tid);
                    // ...each thread increments the counter STEPS times
                    for (int s = 0; s < STEPS; s++) {
                        // m.Pmutex(tid);
                        c.increment();
                        // m.Vmutex(tid);
                    }
                }
            });
            thread[i].start();
        }

        // Wait for the threads to terminate
        for (int i = 0; i < 2; i++)
            thread[i].join();

        // Print the result.
        System.out.println("The final value of the counter is " + c.get() + ".");
    }
}
Unsurprisingly, if we execute this code, the final value of the counter is different from 10000000:
Dekker Counter Demo
Running: 0
Running: 1
The final value of the counter is 5882645.
Question: why "unsurprisingly"?
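(Hint: `count++` is not atomic. It compiles to a read, an add, and a write-back, and the read-modify-write sequences of the two threads can interleave so that one increment overwrites the other. The following deterministic sketch, which simulates one such interleaving in a single thread — the class name and local variables are mine — shows how an update gets lost:)

```java
// LostUpdate.java -- a single-threaded simulation (names are mine) of one
// possible interleaving of two threads both executing 'count++', which is
// really three steps: read count, add 1, write back.
public class LostUpdate {
    public static void main(String[] args) {
        int count = 0;

        int t0 = count;   // thread 0 reads 0
        int t1 = count;   // thread 1 reads 0, before thread 0 writes back
        t0 = t0 + 1;      // thread 0 computes 1
        t1 = t1 + 1;      // thread 1 computes 1
        count = t0;       // thread 0 writes 1
        count = t1;       // thread 1 overwrites with 1: one increment is lost

        System.out.println("count = " + count);  // prints "count = 1", not 2
    }
}
```

Each thread repeats such a read-modify-write sequence millions of times, so many updates are lost and the final value falls well short of 10000000.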
We take the definition of the Dekker mutual exclusion algorithm from a reference book (where it comes with a proof of correctness):
class Dekker {
    public Dekker () {
        flag[0] = false;
        flag[1] = false;
        turn = 0;
    }

    public void Pmutex (int t) {
        int other;
        other = 1 - t;
        flag[t] = true;
        while (flag[other] == true) {
            if (turn == other) {
                flag[t] = false;
                while (turn == other)
                    ;
                flag[t] = true;
            }
        }
    }

    public void Vmutex (int t) {
        turn = 1 - t;
        flag[t] = false;
    }

    private int turn;
    private boolean[] flag = new boolean[2];
}
We uncomment the creation of an instance of the Dekker class and the calls to m.Pmutex and m.Vmutex in DekkerCounterTest: the accesses to the counter are now protected and we expect to observe the output 10000000.
Question: Do we?
However, surprisingly, on my dual-core laptop...
Dekker Counter Demo
Running: 0
Running: 1
[...code does not terminate...]
Question: Can you reproduce the above behaviour on your machine? Do you observe a different surprising behaviour? Or do you observe the correct outcome?
To understand why the code above did not terminate, we focus on the synchronisation of the turn variable, which implements busy waiting using this pattern:
if (condition) while (condition) ;
Consider the program below, which puts the pattern above to work.
public class Synch extends Thread {
    int val;
    boolean condition = true;

    public void run () {
        if (condition)
            while (condition) {
                val = val + 1;
            }
    }

    public static void main (String[] args) throws Exception {
        Synch s = new Synch();
        s.start();
        Thread.sleep(1000);
        s.condition = false;
        System.out.println(s.val);
        System.out.println(s.val);
        s.join();
    }
}
Question: What should this program print? Should it terminate? Why?
Question: What does this program print? Does it terminate?
To understand what happens we need a digression on what goes on under the hood when we run a Java program. First, the program is compiled by the Java compiler (which can be manually invoked with the javac command) into Java bytecode. Java bytecode is a platform-independent instruction set that is interpreted by the Java Virtual Machine (which can be manually invoked on a bytecode file with the java command). (These two operations are performed silently by Eclipse whenever we execute a program.) The bytecode for the run method above, as printed by the javap -c disassembler, is:
public void run();
  Code:
     0: aload_0      // put the object 'this' on the stack
     1: getfield #2  // get the field 'condition'
     4: ifeq 27      // jump to 27 if equal to 0
     7: aload_0      // ... as above (for the internal loop)...
     8: getfield #2
    11: ifeq 27
    14: aload_0
    15: aload_0
    16: getfield #3  // get the field 'val'
    19: iconst_1     // put the constant 1 on the stack
    20: iadd         // add them
    21: putfield #3  // store the result in 'val'
    24: goto 7       // goto 7 (loop)
    27: return       // return
At runtime the JVM (shorthand for the Java Virtual Machine) not only interprets the code above, but also gathers statistics about which parts of the code are executed frequently. For these so-called hot code paths the JVM can decide to launch the Just-In-Time (JIT) compiler, which attempts to generate optimised native code. Here is an extract of the JITted assembly generated by Java HotSpot 1.8.0_25 on my Intel MacBook Pro laptop:
      movzbl 0x17c(%rsi),%r11d  // read 'condition'
      test   %r11d,%r11d        // is it 0?
      je     exit               // if yes, jump to 'exit'
      mov    0x178(%rsi),%r11d  // read 'val' into register 11
loop: inc    %r11d              // increment register 11
      mov    %r11d,0x178(%rsi)  // store register 11 into 'val'
      jmp    loop               // jump to 'loop'
exit: ...
What happens here is that, since none of the variables has been declared volatile, the compiler assumes that no other thread will modify the condition variable, treats it as a constant, and optimises the method run into:
public void run () {
    if (condition)
        while (true) {
            val = val + 1;
        }
}
Observe that the two formulations of run are equivalent in a sequential context, but not in a concurrent one like the one above.
To prevent this kind of compiler behaviour, we can declare the variable condition as volatile. In this case the compiler can no longer assume that condition will not be modified by a concurrent thread, and in turn cannot perform the optimisation above.
Question: Declare the variable condition as volatile in the class Synch, and observe the outcome of the program. Is it the expected one?
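For reference, here is a sketch of what the change looks like. I renamed the class VolatileSynch so that it can coexist with the original, and shortened the sleep; otherwise the logic mirrors Synch, with the single semantic change applied:

```java
// VolatileSynch.java -- Synch with 'condition' declared volatile: the JIT may
// no longer assume the field is constant and hoist its read out of the loop.
// (The class name and the shorter sleep are my choices.)
public class VolatileSynch extends Thread {
    int val;
    volatile boolean condition = true;   // the only semantic change

    public void run () {
        if (condition)
            while (condition) {
                val = val + 1;
            }
    }

    public static void main (String[] args) throws Exception {
        VolatileSynch s = new VolatileSynch();
        s.start();
        Thread.sleep(100);
        s.condition = false;   // the spinning thread now observes this write...
        s.join();              // ...so join() returns: the program terminates
        System.out.println("terminated, val = " + s.val);
    }
}
```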
The above experiment gives us a hint for fixing our implementation of the Dekker algorithm: if we declare turn as volatile, the code should not deadlock anymore.
Question: Declare the variable turn as volatile in the class Dekker, and observe the outcome of the program. Is it the expected one?
On my laptop the code terminates, but not with the correct value:
Dekker Counter Demo
Running: 1
Running: 0
The final value of the counter is 9971979.
Our implementation of the Dekker algorithm exhibits another race condition, on the flag array, which results in a similarly unpredictable outcome. However, declaring flag as volatile does not solve the issue (you can try it). This is because volatile boolean [] declares flag to be a volatile reference to a non-volatile array, and we end up with race conditions on the entries of the array (which themselves are not volatile). The solution is to rely on Atomic objects, in this case on the AtomicIntegerArray:
import java.util.concurrent.atomic.AtomicIntegerArray;

class Dekker {
    public Dekker () {
        flag.set(0, 0);
        flag.set(1, 0);
        turn = 0;
    }

    public void Pmutex (int t) {
        int other;
        other = 1 - t;
        flag.set(t, 1);
        while (flag.get(other) == 1) {
            if (turn == other) {
                flag.set(t, 0);
                while (turn == other)
                    ;
                flag.set(t, 1);
            }
        }
    }

    public void Vmutex (int t) {
        turn = 1 - t;
        flag.set(t, 0);
    }

    private volatile int turn;
    private AtomicIntegerArray flag = new AtomicIntegerArray(2);
}
At last our Dekker implementation is race free, and the test program produces the expected value on my laptop:
Dekker Counter Demo
Running: 0
Running: 1
The final value of the counter is 10000000.
Question: Modify Dekker.java accordingly and observe the result.
Final remark: this exercise has highlighted some of the dangers encountered when writing low-level racy code: getting the synchronisation correct in this context is a hard task even for expert programmers. However, Java provides many high-level programming constructs that abstract this complexity away: we will study these in the rest of this course.
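As a small taste of such constructs, here is a sketch (class name and STEPS value are mine) of the same two-thread counter built on java.util.concurrent.atomic.AtomicInteger, whose incrementAndGet() performs the read-modify-write atomically, with no hand-written mutual exclusion protocol at all:

```java
import java.util.concurrent.atomic.AtomicInteger;

// AtomicCounterTest.java -- the counter test rewritten with a high-level
// construct: AtomicInteger makes each increment a single atomic operation,
// so no updates are lost. (Class name and STEPS value are my choices.)
class AtomicCounterTest {
    static final int STEPS = 1000000;
    static final AtomicInteger count = new AtomicInteger(0);

    public static void main (String[] args) throws InterruptedException {
        Thread[] thread = new Thread[2];
        for (int i = 0; i < 2; i++) {
            thread[i] = new Thread(() -> {
                // each thread increments the counter STEPS times, atomically
                for (int s = 0; s < STEPS; s++) {
                    count.incrementAndGet();
                }
            });
            thread[i].start();
        }
        for (int i = 0; i < 2; i++)
            thread[i].join();
        System.out.println("The final value of the counter is " + count.get() + ".");
    }
}
```

Running it always prints 2000000 as the final value, however the two threads interleave.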
Study and run the code below and (attempt to) explain the observed behaviour.
Hint: "global code motion" is an optimisation that tries to schedule instructions as late as possible, as long as dependencies allow it. In this case, in Thread 1, the write to y cannot be scheduled outside the loop, but the write to x can be moved freely inside the method.
public class Bonus {
    static int x;
    static int y;

    public static void main (String[] args) {
        try {
            for (int j = 0; j < 10; j++) {
                x = 0;
                y = 0;
                Thread t1 = new Thread() {
                    public void run () {
                        x = 1;
                        for (int i = 0; i < 10000000; i++) {
                            y = 1;
                        }
                    }
                };
                Thread t2 = new Thread() {
                    public void run () {
                        int ly = y;
                        int lx = x;
                        // Under SC, if we see y == 1 here, we should see x == 1.
                        System.out.println("Thread 2: y = " + ly + ", x = " + lx);
                    }
                };
                t1.start();
                t2.start();
                t1.join();
                t2.join();
            }
        } catch (InterruptedException e) {
        }
    }
}