Tuesday 16 December 2008

Shuffles, Surprises and Scala

Let's say you want to shuffle an array. This C++ code looks about right:

void exchangeShuffle(int *xs, int n) { // *bad*
  for (int i = 0; i < n; ++i) {
    swap(xs[i], xs[rand() % n]);
  }
}

It compiles, it runs and it even seems to do the right thing. It's Miller Time. Now that you can shuffle cards, you're ready to buy stupiditytax.com from the domain parking trolls, start your online gambling site and profit... or not.

It's a well-documented, but perhaps not well-known, fact that exchangeShuffle yields biased shuffles. In an unbiased shuffle, each of the n!=n(n-1)...(2)(1) distinct orderings of n elements is equally probable. Our exchangeShuffle has nn equally probable possible paths of execution, because there are n possible swaps in each iteration, and there are n iterations. It would produce an unbiased shuffle only if each distinct ordering were the result of an equal number of possible paths of execution; but this can't be, because n! is never an integer divisor of nn for n > 21

The Fisher-Yates shuffle is the preferred way to shuffle a list. In C++, this looks like:

void fisherYatesShuffle(int *xs, int n) {
  for (int i = n; i > 1; --i) {
    swap(xs[i-1], xs[rand() % i]);
  }
}

The Fisher-Yates algorithm satisfies our requirement for n! equally likely paths of execution, because it reduces the number of possible swaps by one in each iteration. It produces unbiased shuffles. (Assuming that your random number source is good enough.)

Think carefully; have you written any bad shuffles recently? I'm afraid I have. You might wonder how bad exchangeShuffle really is. There is at least one theoretical study of this, by Goldstein and Moews. They show that, for a reasonably long list, the most likely outcome is that the exchange shuffle does not change the order of the list at all! That's not good news for your gambling site. 

Before I found that paper, I decided to run some experiments. It seemed like a simple simulation + visualization experiment was in order; where did the elements usually end up? I decided to try this in Scala 2, mainly because Scala ties into Java's Swing GUI toolkit, with which visualization is sometimes less painful than usual. I hope the following will make sense, even if you aren't familiar with Scala.

First, we transcribe the shuffles into Scala and put them into our main application object, called Shuffle.

object Shuffle {
  val rand = new scala.util.Random();

  def swap[T](xs: Array[T], i: Int, j: Int) = {
    val t = xs(i)
    xs(i) = xs(j)
    xs(j) = t
  }

  def exchangeShuffle[T](xs: Array[T]) = {
    for (i <- xs.indices)
      swap(xs, i, rand.nextInt(xs.size))
  }
  
  def fisherYatesShuffle[T](xs: Array[T]) = {
    for (i <- xs.indices.reverse)
      swap(xs, i, rand.nextInt(i + 1))
  }

  // ... visualization code follows ...
}

If you're new to Scala, you might find the following tips helpful. First, Scala uses parentheses (xs(i)) rather than brackets (xs[i]) to index arrays 3. Second, rand.nextInt(k) returns a random integer in the range 0,...,k-1. Third, note that these are generic (template) functions; Array[T] denotes an array whose elements are of an arbitrary type T, like Array<T> in C++ or Java.

Next, we'll need a routine to run the shuffling simulations. A higher-order function is called for; the shuffle parameter takes a procedure as its argument.
  
/**
 * m(i)(j) = number of times the element
 *           at i was shuffled to position j
 */
def measure(n: Int, trials: Int,
            shuffle: Array[Int] => Unit):
             Array[Array[Int]] = {
  // Set up result matrix (all zeros).
  val m = new Array[Array[Int]](n, n);
  
  // Run experiments.
  for (trial <- 1 to trials) {
    val xs = (0 until n).toArray
    shuffle(xs)
    for (i <- xs.indices) {
      m(xs(i))(i) += 1;
    }
  }
   
  return m
}

Row i of the resulting matrix is a histogram recording how many times element i was permuted to each possible position j. Now we're ready to draw it; I'll just put the display code in the main method.

def main(args : Array[String]) : Unit = {
  import java.awt._
  import java.awt.event._
  import javax.swing._

  val trials = 1000000
  val n = 20 // number of elements in array
  var m = new Array[Array[Int]](n,n)
    
  val win = new JFrame("Shuffle")
  val px = 400
  val viz = new JComponent() {
    override def paintComponent(g: Graphics) = {
  val square_px = px / n
      val mAvg = trials / n;
      for (i <- 0 until n; j <- 0 until n) {
        val offset = (127.0 * (m(i)(j) - mAvg)) / mAvg
        val gray = 128 + offset.toInt;
       g.setColor(new Color(gray, gray, gray));
       g.fillRect(square_px * j, square_px * i,
          square_px, square_px)
      }
    }
  }
  viz.setPreferredSize(new Dimension(px, px))
  val controls = new JPanel() {
    val doIt = new JButton("Recompute");
    doIt.addActionListener(new ActionListener() {
      override def actionPerformed(e: ActionEvent) = {
        m = measure(n, trials, exchangeShuffle)
        viz.repaint();
      }
    })
    add(doIt, BorderLayout.WEST);
  }
  win.add(viz, BorderLayout.CENTER)
  win.add(controls, BorderLayout.SOUTH)
  win.pack()
  win.setVisible(true)
}

OK, that was still pretty painful. Scala did some type inference for us, which saved us a little bit of typing when we declared our local variables. Unfortunately, this didn't help us with Swing's action listeners. Man, I hate action listeners; if ever there were a killer application for closures, it's handling user interface events. Anyway, here is the result:



Lighter greys indicate more frequent outcomes. For an unbiased shuffle, the picture would be a uniform grey. In simple terms, it looks like an element is more likely to move a few places backward than a few places forward 4. The largest deviations are typically over +/- 25% from the uniform case. There's a good chance that's significant.

I'll finish with an intriguing aside; note that both of these routines are written in an imperative style. Take a moment to think about how you'd write a shuffle in a functional style. It probably wouldn't look much like exchangeShuffle. Here's mine, in Scala:

def shuffle(xs: List[T]): List[T] = xs match {
  case List() => List()
  case xs => { 
    val i = rand.nextInt(xs.size);
    xs(i) :: shuffle(xs.take(i) ++ xs.drop(i+1))
  }
}

That is, take a random element out of the list, shuffle the rest of the list and cons (::) them together. It's not exactly efficient, but it produces unbiased shuffles. Here, coding in a functional style pushes you toward the right answer. I won't claim that's always true, but I suspect it's not the only example.

Footnotes

1 The reason that n! is never an integer divisor of nn for n > 2 is that n-1 is a factor of n!, and n and n-1 share no prime factors (they are "coprime"), so the prime factorization of n! will always contain some prime factors not in nn. That n and n-1 are coprime follows from a property of integer divisibility: if k divides a and k divides b, then k divides a-b. Applying this property with a=n and b=n-1, we see that a-b=1, so k divides 1, so k=1. That is, 1 is the only common factor of n and n-1, which means they are coprime. Isn't it amazing how even the simplest algorithms have some serious math encoded in them? There's a special case that requires less math: when n is odd, we have that the product of odd numbers is always odd, so nn is odd, but n! is even because it contains the number 2; but, an odd number divided by an even number is never an integer. (These arguments are scattered across the various Wikipedia pages cited in this article; I've just pulled them together here.)

2 Thanks to Ondřej Lhoták for introducing me to Scala.

3 As with most things in Scala, there's a fairly deep reason for this. Scala allows you to treat arrays like functions; both are classes with a method called apply, and the expression xs(i) is defined to be equivalent to xs.apply(i). Functions are treated similarly; the measure method has a parameter shuffle that takes a procedure as its argument; this procedure is just an object with an apply method, and we can call it very cleanly as shuffle(xs). This makes a lot of sense: an array of strings is a function from the non-negative integers to the set of strings. A hash table works the same way; it's a function from the key set to the value set.

4 The theoretical analysis shows that the identity permutation is the most likely. This picture suggests that there are some non-identity permutations with significant probability mass, and that these tend to move each element a couple places backward.

Source Code

Shuffle.scala

package shuffle

object Shuffle {
val rand = new scala.util.Random();

def swap[T](xs: Array[T], i: Int, j: Int) = {
val t = xs(i)
xs(i) = xs(j)
xs(j) = t
}

def exchangeShuffle[T](xs: Array[T]) = {
for (i <- xs.indices)
swap(xs, i, rand.nextInt(xs.size))
}

def fisherYatesShuffle[T](xs: Array[T]) = {
for (i <- xs.indices.reverse)
swap(xs, i, rand.nextInt(i + 1))
}

def functionalShuffle[T](xs: Array[T]) = {
def shuf(xs: List[T]): List[T] = xs match {
case List() => List()
case xs => {
val i = rand.nextInt(xs.size);
xs(i) :: shuf(xs.take(i) ++ xs.drop(i+1))
}
}
shuf(xs.toList).copyToArray(xs, 0)
}

/**
* m(i)(j) = number of times the element at i was shuffled to position j
*/
def measure(n: Int, trials: Int,
shuffle: Array[Int] => Unit): Array[Array[Int]] = {
// Set up result matrix (all zeros).
val m = new Array[Array[Int]](n, n);

// Run experiments.
for (trial <- 1 to trials) {
val xs = (0 until n).toArray
shuffle(xs)
for (j <- xs.indices) {
m(xs(j))(j) += 1;
}
}

return m
}

def printMatrix[T](m: Array[Array[Int]], expected: Int) = {
import scala.Math._
var max_dev = -MAX_DOUBLE;
var min_dev = MAX_DOUBLE;
for (i <- m.indices) {
for (j <- m(i).indices) {
val dev = m(i)(j) / (0.0 + expected)
min_dev = min(dev, min_dev)
max_dev = max(dev, max_dev)
print(dev)
print(" ")
}
println
}
println("min deviation: "+((min_dev-1)*100)+"%")
println("max deviation: "+((max_dev-1)*100)+"%")
}

def main(args : Array[String]) : Unit = {
import java.awt._
import java.awt.event._
import javax.swing._

val trials = 1000000
val n = 20 // number of elements in array
var m = new Array[Array[Int]](n,n)

val win = new JFrame("Shuffle")
val px = 400
val viz = new JComponent() {
override def paintComponent(g: Graphics) = {
val square_px = px / n
val mAvg = trials / n;
for (i <- 0 until n; j <- 0 until n) {
val offset = (127.0 * (m(i)(j) - mAvg)) / mAvg
val gray = 128 + offset.toInt;
g.setColor(new Color(gray, gray, gray));
g.fillRect(square_px * j, square_px * i, square_px, square_px)
}
}
}
viz.setPreferredSize(new Dimension(px, px))
val controls = new JPanel() {
val doIt = new JButton("Recompute");
doIt.addActionListener(new ActionListener() {
override def actionPerformed(e: ActionEvent) = {
m = measure(n, trials, exchangeShuffle)
printMatrix(m, trials / n)
viz.repaint();
}
})
add(doIt, BorderLayout.WEST);
}
win.add(viz, BorderLayout.CENTER)
win.add(controls, BorderLayout.SOUTH)
win.pack()
win.setVisible(true)
}
}

shuffle.cpp

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <vector>

using namespace std;

void exchangeShuffle(int *xs, int n) {
for (int i = 0; i < n; ++i) {
swap(xs[i], xs[rand() % n]);
}
}

void fisherYatesShuffle(int *xs, int n) {
for (int i = n; i > 1; --i) {
swap(xs[i-1], xs[rand() % i]);
}
}

void measure(int n, int trials, vector<vector<int> > &m) {
// Set up results matrix, m (all zeros).
m.resize(n);
for (int i = 0; i < n; ++i)
m[i].resize(n, 0);

// Run experiments.
int *xs = new int[n];
for (int trial = 0; trial < trials; ++trial) {
for (int i = 0; i < n; ++i) {
xs[i] = i;
}
exchangeShuffle(xs, n);
for (int j = 0; j < n; ++j) {
m[xs[j]][j] += 1;
}
}
delete [] xs;
}

int main(int argc, char **argv) {
srand(time(NULL));
vector<vector<int> > m;
int n = 10;
int trials = 1000000;
measure(n, trials, m);
for (int i = 0; i < n; ++i) {
for (int j = 0; j < n; ++j) {
cout<<m[i][j] / (trials + 0.0) * n<<" ";
}
cout<<'\n';
}
return EXIT_SUCCESS;
}