Exploring the Java Memory Model
Francesco Zappa Nardelli

The purpose of this exercise is to explore the risks of writing racy code. As such it consists mainly of programs that you should execute on your machine and observe the outcome you get: in some cases the outcomes will seem to contradict the intuitive / expected semantics of the program. These are due to the absence of correct synchronisation between the memory accesses performed by different threads (that is, to data-races). In general bugs due to data-races are extremely difficult to catch, debug, and fix: in your practice of programming you should always guarantee a correct synchronisation between threads.

The code for this lab is available here.

A naive implementation of the Dekker algorithm

Our (unambitious) goal is to implement correctly mutual exclusion using the classic Dekker algorithm. Our test infrastructure is thus represented by a counter that gets concurrently updated by two threads:

    class UnsafeCounter {
        private int count;
        void increment () { count ++; }
        int get () { return count; }
    }

    class DekkerCounterTest {
        final static int STEPS = 5000000;

        static int i = 0;

        static UnsafeCounter c = new UnsafeCounter ();

        // static Dekker m = new Dekker();

        public static void main (String[] args) throws InterruptedException {
          System.out.println("Dekker Counter Demo");

          // Spawn THREADS threads...
          Thread[] thread = new Thread [2];
          for (i = 0; i < 2; i++) {
            thread[i] = new Thread (new Runnable () {
              int tid = i;
              public void run () {
                System.out.println("Running: "+tid);
                // ...each thread increments the counter STEP times
                for (int s = 0; s < STEPS; s++) {
                  // m.Pmutex(tid);
                  c.increment();
                  // m.Vmutex(tid);
                }
              }
            });
            thread[i].start();
          }

          // Wait for the threads to terminate
          for (int i = 0; i < 2; i++) thread[i].join();
            // Print the result.
            System.out.println("The final value of the counter is " + c.get() + ".");
        }
    }
    

Unsurprisingly, if we execute this code, the final state of the counter is different from 10000000:

    Dekker Counter Demo
    Running: 0
    Running: 1
    The final value of the counter is 5882645.

Question: why "unsurprisingly"?

We take the definition of the Dekker mutual exclusion algorithm from a reference book (where it comes with a proof of correctness):

class Dekker {
  public Dekker () {
    flag[0] = false; flag[1] = false; turn = 0;
  }

  public void Pmutex(int t) {
    int other;

    other = 1-t;
    flag[t] = true;
    while (flag[other] == true) {
      if (turn == other) {
        flag[t] = false;
        while (turn == other)
          ;
        flag[t] = true;
      }
    }
  }

  public void Vmutex(int t) {
    turn = 1-t;
    flag[t] = false;
  }

  private int turn;
  private boolean [] flag = new boolean [2];
}

We uncomment the creation of an instance of the Dekker class and the calls to m.Pmutex and m.Vmutex in DekkerCounterTest: the accesses to the counter are now protected and we expect to observe the output 10000000.

Question: Do we?

However, surprisingly on my dual-core laptop...

    Dekker Counter Demo
    Running: 0
    Running: 1
    [..code does not terminate...]

Question: Can you reproduce the above behaviour on your machine? Do you observe a different surprising behaviour? Or you observe the correct outcome?

Compiler optimisations and shared memory concurrency

To understand why the code above did not terminate, we focus on the synchronisation pattern of the turn variable which implement busy waiting using this pattern:

    if (condition)
      while (condition)
        ;  

Consider the program below that puts at work the pattern above.

public class Synch extends Thread {
  int val;
  boolean condition = true;

   public void run() {
     if (condition)
       while (condition) {
         val=val+1;
       }
   }

   public static void main(String[] args) throws Exception {
     Synch s=new Synch();
     s.start();
     Thread.sleep(1000);
     s.condition=false;
     System.out.println(s.val);
     System.out.println(s.val);
     s.join();
   }
}

Question: What should this program print? Should it terminate? Why?

Question: What does this program print? Does it terminate?

To understand what happens we need a digression on what happens under the hood when we run a Java program. First, the program is compiled by the Java compiler (which can be manually invoked with the javac command) into Java bytecode. The Java bytecode is a platform-independent instruction set that is interpreted by the Java Virtual Machine (which can be manually invoked on a bytecode file with the java command). (These two operations are performed silently by eclipse whenever we execute a program.) The bytecode for the run function above is:

       public void run();
    Code:
       0: aload_0                           // put the object 'this' on the stack
       1: getfield      #2                  // get the field 'condition'
       4: ifeq          27                  // jump to 27 if equal to 0
       7: aload_0                           // ... as above (for the internal loop)...
       8: getfield      #2
      11: ifeq          27
      14: aload_0
      15: aload_0
      16: getfield      #3                  // get the field 'val'
      19: iconst_1                          // put the constant 1 on the stack
      20: iadd                              // add them
      21: putfield      #3                  // store the result in 'val'
      24: goto          7                   // goto 7 (loop)
      27: return                            // return
      

At runtime the JVM (shorthand for the Java Virtual Machine) not only interprets the code above, but also gathers statistics about which parts of the code are executed frequently. For these the JVM can decide to launch the Just in Time Compiler, which attempts to generate optimised native code for the so-called hot code paths. Here is an extract of the JITted assembly generated by Java HotSpot 1.8.0_25 on my Intel MacPro laptop:

          movzbl 0x17c(%rsi),%r11d           // read 'condition'
          test   %r11d,%r11d                 // is it 0?
          je     exit                        // if yes, jump to 'exit'
          mov    0x178(%rsi),%r11d           // read 'val' into register 11
    loop: inc    %r11d                       // increment register 11
          mov    %r11d,0x178(%rsi)           // store register 11 into 'val'
          jmp    loop                        // jump to 'loop'
    exit: ...  

What happens here is that, since none of the variables has been declared as volatile, the compiler assumes that the condition variable is a constant since no other thread is expected to modify it, and it optimises the method run into:

         public void run() {
          if (condition)
            while (true) {
                val=val+1;
              }
         }

Observe that the two formulations of run are equivalent in a sequential context, but not in a concurrent one as the one above.

To prevent this kind of compiler behaviours, we can declare the variable condition as volatile. In this case the compiler cannot any longer assume that the variable condition will not be modified by a concurrent thread, and in turn cannot perform the optimisation above. Question: Declare variable condition as volatile in the class Synch, and observe the outcome of the program. Is it the expected one?

A correct implementation of Dekker algorithm

The above experience gives us an hint to fix our implementation of the Dekker algorithm: if we declare turn as volatile the code does should not deadlock anymore.

Question: Declare the variable turn as volatile in the class Dekker, and observe the outcome of the program. Is it the expected one?

On my laptop the code terminates, but not with the correct value:

    Dekker Counter Demo
    Running: 1
    Running: 0
    The final value of the counter is 9971979.

Our implementation of the Dekker algorithm exhibits another race condition on the flag array, which results in a similarly unpredictable outcome. However declaring flag as volatile does not solve the issue (you can try it). This is because volatile boolean [] declares flag to be a volatile reference to a non-volatile array, and we end up with race conditions on the on the entries of the array (which themselves are not volatile). The solution is to rely on Atomic objects, in this case on the AtomicIntegerArray:

import java.util.concurrent.atomic.AtomicIntegerArray;

class Dekker {
  public Dekker () {
    flag.set(0,0); flag.set(1,0); turn = 0;
  }

  public void Pmutex(int t) {
    int other;

    other = 1-t;
    flag.set(t,1);
    while (flag.get(other) == 1) {
      if (turn == other) {
        flag.set(t,0);
        while (turn == other)
          ;
        flag.set(t,1);
      }
    }
  }

  public void Vmutex(int t) {
    turn = 1-t;
    flag.set(t,0);
  }

  private volatile int turn;
  private AtomicIntegerArray flag = new AtomicIntegerArray(2);
}
  

At last our Dekker implementation is race free, and the test program produces the expected value on my laptop:

  Dekker Counter Demo
  Running: 0
  Running: 1
  The final value of the counter is 10000000.

Question: Modify Dekker.java accordingly and observe the result.

Final remark: this exercise has highlighted some of the dangers encountered writing low-level racy-code: getting the synchronisation correct in this context is an hard task even for expert programmers. However Java provides many high-level programming constructs that abstract this complexity away: we will study these in the rest of this course.

Bonus exercise

Study and tun the code below and (attempt to) explain the observed behaviour.

Hint: "global code motion" is an optimisation that tries to schedule instructions as late as possible, as long as dependencies allow it. In this case, in Thread 1, the write to y cannot be scheduled outside the loop, but the write to x can be moved freely inside the method.

public class Bonus {
  static int x;
  static int y;
  public static void main(String[] args) {
    try {
      for (int j = 0; j < 10; j++) {
        x = 0;
        y = 0;

        Thread t1 = new Thread() {
           public void run() {
             x = 1;
             for (int i = 0; i < 10000000; i++) {
               y = 1;
             }
           }
        };

        Thread t2 = new Thread() {
           public void run() {
              int ly = y;
              int lx = x;
              // Under SC, if we see y == 1 here, we should see x == 1.
              System.out.println("Thread 2: y = " + ly + ", x = " + lx);
           }
        };

        t1.start();
        t2.start();
        t1.join();
        t2.join();
      }
    } catch (InterruptedException e) { }
  }
}