Binomial coefficients are a surprisingly interesting subject of study, so interesting in fact that Sherlock Holmes’ arch-enemy Professor Moriarty began his career with a paper on them. Incidentally, this fictional *Treatise on the Binomial Theorem* has a longer Wikipedia page than any real work of mathematics I’ll ever write. But I digress. Today we’re going to look at a novel way to calculate binomial coefficients in Haskell, the programming language that is basically math.

The binomial coefficients \(n \choose k\) draw their name from the following “Binomial formula”: \[ (1 + x)^n = \sum_{k = 0}^n {n \choose k} x^k \]

When multiplying out \((1+x)^n\), for each of the \(n\) clauses \((1 + x)\) we have \(n \choose k\) ways of choosing \(k\) of the \(x\)’s to get \(x^k\)

Here \(n \choose k\) is the coefficient on the \(k\)th power of \(x\) in the expansion of the left-hand side. However, a “purer” definition of \(n \choose k\) is the combinatorial one: the number of ways to choose \(k\) objects from a set of \(n\). From this definition one can see why these numbers show up as the coefficients above^{1}. We can also use this point of view to understand some other binomial sum identities – for example, for a finite set \(A\) with \(n\) elements, the cardinality of the power set \(\mathcal P(A)\) (the set of all subsets of \(A\)) is \(2^n\). This gives a way to understand why \[
2^n = \sum_{k=0}^n {n \choose k}
\]

If you’re wondering how we knew the number of subsets in the first place, note that each subset \(S \subseteq A\) can be identified with a function \(\chi_S : A \to \{0,1\}\), where \(\chi_S(a) = 1\) if and only if \(a \in S\), and there are \(2^n\) such functions by a simple counting argument.

which we can of course derive from the binomial formula above but whatever. For each \(k\) with \(0 \leq k \leq n\), there are \(n \choose k\) ways to specify a distinct set of \(k\) elements, i.e. a subset of size \(k\). So the number of subsets of any size is just the sum of these numbers over all possible \(k\), and there are \(2^n\) total subsets^{2}.

Another nice identity is this one: \[ {n \choose k} = \frac{n}{k}{n-1 \choose k-1} \]

Combinatorially we can understand it like so: to count the number of ways to pick \(k\) elements from a set of \(n\) elements, pick one of the \(n\) elements and look at the number of ways to pick the other \(k-1\) elements from the remaining \(n-1\) elements. There are \(n\) ways to pick the first element, so that’s \(n{n-1 \choose k-1}\) ways of picking subsets. But we overcounted – each subset is counted \(k\) times, one time for each of its elements that we pick first. So after dividing by \(k\) to fix this, we’ve shown the identity holds.

The identity is so nice because it gives a natural recursive algorithm to compute the coefficients; just keep applying the identity to get \[ {n \choose k} = \frac{n}{k}\cdot\frac{n-1}{k-1}\cdots\frac{n-k+1}{1}{n-k\choose 0} \]

where \({n - k \choose 0} = 1\). And since this is a post about \(n \choose k\) in Haskell, I’m going to turn this into a recursive Haskell function right? Unfortunately someone beat me to it, and they even used more identities than I have, so let’s take a different approach and go back to the Binomial formula. If we can expand the polynomial \((1 + x)^n\) into a sum, we can just read off the coefficients for each power of \(x\). Easy right?

A polynomial only has finitely-many powers of \(x\), so we can represent a polynomial such as \(a_0 + a_1x + a_2x^2 + \ldots + a_nx^n\) as a list \([a_0, \ldots, a_n]\). In Haskell, we use the typeclass ```
Num
```

to describe things that we can add, multiply, negate, and so on; polynomials can do these things too, so let’s define a ```
Num
```

instance for lists of `a`

s, where `a`

is an instance of ```
Num
```

, representing polynomials with coefficients in `a`

:

```
instance Num a => Num [a] where
fromInteger n = [fromInteger n]
(x:xs) + (y:ys) = (x + y) : (xs + ys)
xs + [] = xs
[] + ys = ys
(x:xs) * (y:ys) = (x*y) : ([x] * ys + xs * (y:ys))
_ * _ = []
```

You can disable warnings for this by passing

`ghc`

the flag`-fno-warn-missing-methods`

.\(\;\;\overbrace{(a_0 + a_1x\ldots)}^A\overbrace{(b_0 + b_1x\ldots)}^B\) \(= \;a_0B + (a_1x\ldots)B\) \(= \;a_0b_0 + a_0(b_1x\ldots) + a_1xB\) \(= \;a_0b_0 + x[a_0(b_1\ldots) + a_1B]\)

This is missing a few ```
Num
```

functions, but we won’t need them^{3}. We add two polynomials by adding the respective coefficients for each power of \(x\), and the rather complicated expression for multiplication can be derived without much work^{4}.

So now \([1,1]\) is our representation for \(1 + x\). Exponentiation comes for free since Haskell implements it using repeated squaring. So if we try it out, ```
[1,1]^2 == [1,2,1]
```

, and ```
[1,1]^4 == [1,4,6,4,1]
```

. Translating these back to math world, \((1+x)^2 = 1 + 2x + x^2\) and \((1+x)^4 = 1 + 4x + 6x^2 + 4x^3 + x^4\). So now it’s only natural to define \(n \choose k\) by:

```
choose :: Int -> Int -> Int
choose n k = ([1,1]^n) !! k
```

where `xs !! k`

is the \(k\)th element of `xs`

. Now we may quickly concoct some quality coefficients:

```
3 `choose` 2 == 3
10 `choose` 6 == 210
434 `choose` 87 == 4614992264942942144
```

or just generate an infinite Pascal’s triangle with ```
map ([1,1]^) [0..]
```

.

```
[[1]
,[1,1]
,[1,2,1]
,[1,3,3,1]
,[1,4,6,4,1]
,[1,5,10,10,5,1]
,[1,6,15,20,15,6,1]
,[1,7,21,35,35,21,7,1]
,[1,8,28,56,70,56,28,8,1]
,[1,9,36,84,126,126,84,36,9,1]
, ...
```

Which is why I like it so much.

This is honestly an absurd^{5} way to calculate binomial coefficients, but the underlying concept of manipulating polynomials as lists is very cool. We can take this in more directions, such as using infinite streams to represent power series, but perhaps that is a topic for a later post.

## Acknowledgments

Thanks to Ira Kones and Rory Bokser for proofreading this post.

When multiplying out \((1+x)^n\), for each of the \(n\) clauses \((1 + x)\) we have \(n \choose k\) ways of choosing \(k\) of the \(x\)’s to get \(x^k\) ↩

If you’re wondering how we knew the number of subsets in the first place, note that each subset \(S \subseteq A\) can be identified with a function \(\chi_S : A \to \{0,1\}\), where \(\chi_S(a) = 1\) if and only if \(a \in S\), and there are \(2^n\) such functions by a simple counting argument. ↩

You can disable warnings for this by passing

`ghc`

the flag`-fno-warn-missing-methods`

. ↩\(\;\;\overbrace{(a_0 + a_1x\ldots)}^A\overbrace{(b_0 + b_1x\ldots)}^B\) \(= \;a_0B + (a_1x\ldots)B\) \(= \;a_0b_0 + a_0(b_1x\ldots) + a_1xB\) \(= \;a_0b_0 + x[a_0(b_1\ldots) + a_1B]\) ↩

Which is why I like it so much. ↩

## Comments