God doesn’t play dice! God creates dices: Dirichlet distribution (Pt.1)

rohola zandie
6 min read · Dec 20, 2019

When we think about probabilities, we ask questions like: What is the probability that two dice show the same number? Or, what is the probability that a die shows less than 3? All these questions start from a very broad and yet questionable assumption: the die is fair, so each of its faces comes up with probability 1/6. Well, what if the die is not fair? What if the manufacturer makes unfair dice (for some reason)?

When you start to question the underlying processes that create the so-called fixed probability distributions, you realize that this kind of variation is not only likely but very natural, and that it can happen with pretty much every random process we deal with. It’s as if you step into a new world where prior probabilities are not fixed!

Let’s think about the process of manufacturing dice. The companies that make dice try to make them as fair as possible. But we live in an imperfect world. The processes can be faulty, and there IS variation in the process of making fair dice. The process is just like a random variable: it has its own probability distribution. It may sound confusing at first, but we can have a probability distribution over dice, each of which is itself a random variable with its own probability distribution.

To understand this concept better, let's go back to our example of the dice manufacturer. The manufacturer makes dice with six faces, but the machines that make them can be faulty. And “faulty” can mean different things:

1- The process of creating dice favors some numbers more than others. For example, the process can create dice that land on face 1 more often than the expected proportion of 1/6. This bias can be even more complex: for example, it can favor 1 and 4 over the other faces, with a stronger bias for 4 than for 1.

2- The process is not biased towards any specific number, but it creates biased dice, with every face equally likely to be the favored one. So we get some dice that land more often on 1, others on 2, others on 3, and so on. This one looks like a bigger problem than the previous one. But it can be even more complicated: the process itself can also be biased, which means it still creates biased dice of all kinds but favors one number (or a few numbers) more.
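The two kinds of faulty process above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the text: each die’s face probabilities are drawn by normalizing independent gamma draws (one standard way to sample from the Dirichlet distribution, which the next part covers), and the function names and weight values are made up for the example.

```python
import random

def make_die(weights, rng):
    """Draw one die: a list of 6 face probabilities that sum to 1.

    Normalizing independent Gamma(w, 1) draws gives a Dirichlet(weights) sample.
    The weights are hypothetical knobs of the manufacturing process.
    """
    draws = [rng.gammavariate(w, 1.0) for w in weights]
    total = sum(draws)
    return [d / total for d in draws]

def roll(die, n, rng):
    """Roll a given die n times and return the observed face frequencies."""
    faces = rng.choices(range(1, 7), weights=die, k=n)
    return [faces.count(f) / n for f in range(1, 7)]

rng = random.Random(0)

# Case 1: the process itself favors faces 1 and 4 (4 more strongly than 1).
process_favoring_1_and_4 = [8, 4, 4, 12, 4, 4]
# Case 2: no face is favored on average, but each individual die is strongly
# biased towards some face (weights below 1 push dice away from fairness).
process_biased_everywhere = [0.2] * 6

for name, weights in [("favors 1 and 4", process_favoring_1_and_4),
                      ("biased everywhere", process_biased_everywhere)]:
    die = make_die(weights, rng)
    print(name, [round(p, 2) for p in die], roll(die, 1000, rng))
```

Each call to `make_die` plays the role of the factory producing one die, and `roll` plays the role of us throwing that particular die: two levels of randomness, one on top of the other.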

All these are good intuitions that can help us form a more formal and rigorous theory of the variability of a random variable, but we need numbers and symbols to make everything crystal clear.

Beta distribution: a special case

To form a mathematical foundation for our case, it’s always better to start with the most simplified version of our statement. Instead of thinking about unfair dice, we can think about unfair coins. Coins are simpler because they can only land on two faces: heads and tails. Everything we said about manufacturing a die applies equally to a coin.

First, let’s think about a manufacturing process that tries to minimize the bias, so we expect to see mostly fair coins. A simple way to model this process is to give it a probability distribution. But before thinking about distributions, we can start with a formula that gives the highest score to 0.5 and a score of zero to the totally biased coins at 0 (never heads) and 1 (always heads). Writing x for the probability that a coin lands on heads, this can be written like:

$$f(x) = x(1 - x)$$

if we plot this we get (with some constant factor):
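In place of a figure, the shape can be reproduced with a crude text plot. The sketch below assumes the score has the form x(1 - x), with x the probability of heads; the function name `score` is made up for this example.

```python
def score(x):
    """Score of a coin whose probability of heads is x (assumed form x * (1 - x))."""
    return x * (1 - x)

# Evaluate the score on a grid from 0 to 1 and draw one bar per grid point.
grid = [i / 10 for i in range(11)]
for x in grid:
    bar = "#" * round(100 * score(x))
    print(f"{x:.1f} {bar}")

best = max(grid, key=score)  # grid point with the highest score
print("highest score at x =", best)
```

The bars grow from nothing at x = 0, peak at x = 0.5, and shrink back to nothing at x = 1.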

This is what we expect: a random variable (the process) representing coins whose proportion of heads is close to 0.5 most of the time. Of course, the random process can be more precise, which means a sharper peak in the score function. To model this case, we can raise both factors to a higher power, for example:

$$f(x) = x^n (1 - x)^n, \quad n > 1$$

and the result:

As you can see, the new manufacturing process is more precise and produces fair coins with a higher probability.

But we just said this is not a probability, only a score. That is very easy to fix: you just need to divide the whole formula by its integral over [0, 1] to get a probability distribution. But how can we model the biased cases? For example, we may want to model a random process that favors heads over tails. It’s actually very easy with this scheme; you only need to extend the above template with two new parameters:

$$f(x) = x^{\alpha - 1} (1 - x)^{\beta - 1}$$
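The normalization step described above (dividing a score by its integral over [0, 1]) can be sketched numerically. The helper names, the midpoint rule, and the exponent used for the sharper score are all choices made for this illustration, not something taken from the text.

```python
def integrate(f, n=10_000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

def normalize(f):
    """Turn a non-negative score on [0, 1] into a density that integrates to 1."""
    z = integrate(f)
    return lambda x: f(x) / z

def score(x):
    return x * (1 - x)      # assumed score, peaked at 0.5

def sharper(x):
    return score(x) ** 4    # hypothetical sharper process; the exponent is arbitrary

density = normalize(score)
sharp_density = normalize(sharper)

print(integrate(score))      # close to 1/6
print(integrate(density))    # close to 1
print(density(0.5), sharp_density(0.5))  # the sharper process puts more density at 0.5
```

The beta function introduced next is exactly this normalizing integral, written in closed form.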

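Subtracting 1 from the parameters is an arbitrary convention that makes the final formulation simpler. The last step is to make this a real probability distribution. The normalizing integral defines a new function called the beta function, and the resulting distribution is called the beta distribution:

$$B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} \, dt, \qquad \mathrm{Beta}(x; \alpha, \beta) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}$$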
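As a small sketch, the beta density can be computed with nothing but the standard library by using the identity B(a, b) = Γ(a)Γ(b) / Γ(a + b). The function names here are made up for the example.

```python
import math

def beta_function(a, b):
    """B(a, b) via the identity B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_pdf(x, a, b):
    """Density of the beta distribution at a point x strictly between 0 and 1."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)

# With a = b = 2 the density is proportional to x * (1 - x).
print(beta_function(2, 2))    # the integral of x * (1 - x) over [0, 1], i.e. 1/6
print(beta_pdf(0.5, 2, 2))    # density of a perfectly fair coin under Beta(2, 2)
```

For small parameter values this direct formula is fine; for large ones `math.lgamma` would be the safer route, since `math.gamma` overflows quickly.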

Now the parameters alpha and beta can control all kinds of biased distributions. For example (with x as the probability of heads), for alpha > beta we get coins that are biased towards heads more than tails, and for alpha < 1 and beta < 1 we get coins that are biased, but not specifically towards heads or tails: some lean one way and some the other. It’s just like the second case of biased dice above. Here are beta distributions for different values of alpha and beta:

From Wikipedia: https://en.wikipedia.org/wiki/Beta_distribution

The purple curve is similar to our first example: a manufacturer that tries to create fair coins. The green curve favors values near 0 (coins that rarely land on heads) and the blue one favors values near 1 (coins that almost always land on heads). The orange curve shows a manufacturing process whose coins most likely land on heads a proportion 0.2 (20 percent) of the time. And finally, the red curve shows the last case: coins that are biased, but not specifically towards heads or tails.
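To see these curves as actual coin factories, we can draw coins from a beta distribution with Python’s built-in `random.betavariate`. In this sketch each coin is represented by its probability of heads. The parameter pairs are assumed to match the curves in the Wikipedia figure, and the helper names are made up for the example.

```python
import random

def beta_mode(a, b):
    """Most likely heads probability; this formula holds only for a > 1 and b > 1."""
    return (a - 1) / (a + b - 2)

def make_coins(a, b, n, rng):
    """Draw n coins; each coin is represented by its probability of heads."""
    return [rng.betavariate(a, b) for _ in range(n)]

rng = random.Random(0)
# (alpha, beta) pairs assumed to correspond to the plotted curves.
for a, b in [(2, 2), (1, 3), (5, 1), (2, 5), (0.5, 0.5)]:
    coins = make_coins(a, b, 10_000, rng)
    mean = sum(coins) / len(coins)
    print(f"alpha={a}, beta={b}: average heads probability {mean:.2f} "
          f"(theory {a / (a + b):.2f})")

print(beta_mode(2, 5))  # peak of the alpha=2, beta=5 curve
```

The averages track alpha / (alpha + beta), so a larger alpha relative to beta means coins that lean towards heads.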

Modeling a random process that creates another probability distribution is not easy to grasp. The examples of manufacturing dice and coins can help build an intuition for how it works. In the next part, we will extend this idea to the Dirichlet distribution. A six-faced die is harder to think about, but we will try to find ways to simplify and visualize it.


rohola zandie

I am a PhD student in NLP and Dialog systems, I am curious about mathematics, machine learning, philosophy and languages.