We have introduced the most basic elements of probability theory, namely random variables and their distributions. Now, I have a question for you. Suppose we have two binomially distributed random variables, \(X_1 \sim \text{Bin}(10, 0.1)\) and \(X_2 \sim \text{Bin}(3, 0.7)\): can we compare them? (You can hover your cursor over \(X_1\) and \(X_2\) to see what they mean.) Well, the two random variables have different sets of potential outcomes: for \(X_1\), you might get any integer from \(0\) to \(10\), but \(X_2\) has only four possible values, \(0, 1, 2, 3\). So the random variables themselves are not directly comparable. Can we compare their observed values instead? It doesn't seem that simple either. One is an experiment with a low success rate conducted 10 times, while the other has a high success rate but is conducted only 3 times, so it is hard to say which will produce a higher number of successes. Let's explore the answer in a straightforward way, by actually generating one random number from each distribution and comparing them.
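```r
# One draw from each distribution (the same rbinom calls as in the loop below)
X1 = rbinom(1, prob = 0.1, size = 10)
X2 = rbinom(1, prob = 0.7, size = 3)
X2 > X1  # TRUE if the realization of X2 is larger
```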
If you run the code above multiple times, you’ll get both TRUE and FALSE, but you may feel that the realization of \(X_2\) is often higher than the realization of \(X_1\). To verify our intuition, we can repeatedly run the code above and record the result each time. By doing this 1000 times, we can see the percentage of cases where the value of \(X_2\) is greater than the value of \(X_1\).
```r
res = numeric(1000)
for (i in 1:1000) {
  X1 = rbinom(1, prob = 0.1, size = 10)
  X2 = rbinom(1, prob = 0.7, size = 3)
  res[i] = ifelse(X2 > X1, 1, 0)
}
print(paste0("The proportion of times X2>X1 is ", sum(res) / 10, "%"))
```

```
[1] "The proportion of times X2>X1 is 71.2%"
```
Indeed, we often observe that the value of \(X_2\) is greater than that of \(X_1\). However, is there a way to reach this conclusion directly, without relying on experiments? Or, put differently, can we explain this phenomenon in mathematical language? For that we need to introduce another important concept: the expected value of a random variable. The expected value is very similar to a concept you are already familiar with, namely the average value. If I ask how often you go to IKSU per week, most likely you will answer that you go, on average, 3 times per week, since the number of visits to IKSU (the largest sports center in Umeå) per week is not a fixed number (you often go there 3 times, but not always; for example, when you are sick or have an important exam to prepare for). Let me show you the number of my IKSU visits in the last 10 weeks. \[
3,5,3,2,2,4,5,3,3,4
\]
We all know that the average value is \[
\frac{3+5+3+2+2+4+5+3+3+4}{10} = 3.4
\] Of course, it is a super easy calculation, but let's take a closer look at it. The same calculation can be represented as \[
\frac{2 \times 2 + 4\times3 + 2\times4 +2\times 5 }{10} = 0.2 \times 2 + 0.4 \times 3 + 0.2 \times 4 + 0.2 \times 5 = 3.4
\]
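For the skeptical reader, here is the same calculation in R (nothing new, just the ten numbers above):

```r
visits = c(3, 5, 3, 2, 2, 4, 5, 3, 3, 4)
mean(visits)  # 3.4, the plain average

# The same number as a weighted sum: each distinct value times the
# proportion of weeks in which it occurred
w = table(visits) / length(visits)  # 0.2, 0.4, 0.2, 0.2 for 2, 3, 4, 5
sum(as.numeric(names(w)) * w)       # 3.4 again
```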
Notice that the decimal in front of each integer (each possible value) is the proportion of weeks, out of the last ten, in which that value occurred. In the rational world, if you still remember it, the proportion is replaced by the probability. Therefore, the expected value is defined as the weighted sum of all possible values, where the weights are the corresponding probabilities. In mathematical notation, the expected value of a random variable is \[
\text{E}(X) = \sum_{k} k \Pr (X = k)
\] We can see that the expected values of a binary distributed random variable and a binomially distributed random variable are \(p\) and \(Np\), respectively. It is a good exercise to verify this. Now we can turn back to the question at the beginning. By a simple calculation, we see that \[
\text{E}(X_2) = 3\times0.7 = 2.1 > \text{E}(X_1) = 10\times0.1 = 1
\] The expected value satisfies linearity. Suppose \(a\) and \(b\) are constants and \(X\) is a random variable; then \(\text{E}(aX+b) = a \text{E}(X) + b\). In other words, linearity means that the expectation operator and linear operations (scalar multiplication and addition) are exchangeable, i.e. \[
\text{E} \left(\sum_{i=1}^n a_iX_i\right) = \sum_{i=1}^n a_i\text{E}(X_i)
\]
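As a quick sanity check, we can let R evaluate the weighted sum from the definition, using `dbinom()` for the probabilities \(\Pr(X = k)\), and verify the linearity rule by simulation (with the arbitrary choices \(a = 2\), \(b = 5\)):

```r
# E(X) as the weighted sum of the possible values: sum of k * Pr(X = k)
k1 = 0:10
sum(k1 * dbinom(k1, size = 10, prob = 0.1))  # 1, i.e. 10 * 0.1
k2 = 0:3
sum(k2 * dbinom(k2, size = 3, prob = 0.7))   # 2.1, i.e. 3 * 0.7

# Linearity: E(2 * X1 + 5) = 2 * E(X1) + 5 = 7
mean(2 * rbinom(100000, prob = 0.1, size = 10) + 5)  # close to 7
```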
3.3.2 Variance
The expected value helps us determine the size of the "common" value of a random variable, so that we can compare two random variables. One can also compare two random variables along another dimension, namely "value stability". For example, suppose we have two coins: one is even (fair) and the other is so uneven that there is a very high probability, 90%, of getting Heads. Imagine that we flip both coins repeatedly: for the uneven coin we will get many Heads and only occasionally a Tail, while for the even coin we will, with high probability, get roughly the same number of Heads and Tails. From the perspective of the values taken, the values of the uneven coin are very stable, while those of the even coin are not. The stability of a random variable also refers to its variation: high variation means low stability. Both can be quantified by the variance.
The variance of a random variable \(X\) is defined as \[
\text{Var}(X) = \text{E}\left[ \left( X - \text{E}(X) \right)^2 \right]
\]
This formula is very intuitive. First, we calculate the "common" value of this random variable, \(\text{E}(X)\), and then compare it with \(X\) by taking the difference. Finally, we take the average (the expected value) of the squared difference. If the random variable has stable values, then \(X\) should often stay around its expected value, so \(X - \text{E}(X)\) will generally be very small; conversely, if it varies significantly, the difference will generally be large. This aligns perfectly with our initial intention.
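Mirroring what we did for the expected value, we can evaluate this definition directly in R, here for \(X_1 \sim \text{Bin}(10, 0.1)\):

```r
# Var(X) from the definition: the weighted average of the squared
# deviations from E(X)
k  = 0:10
p  = dbinom(k, size = 10, prob = 0.1)
EX = sum(k * p)      # 1
sum((k - EX)^2 * p)  # 0.9, which equals 10 * 0.1 * 0.9
```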
Based on this definition, one can easily verify that the variances of a binary distributed random variable and a binomially distributed random variable are \(p(1-p)\) and \(Np(1-p)\), respectively. In the example above, the variance of the even coin is \(0.25\), while that of the uneven coin is \(0.09\).
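You can also convince yourself by simulation; a single coin flip is \(\text{Bin}(1, p)\), so a quick sketch (the simulated values will differ slightly from run to run):

```r
# Compare sample variances with the formula p * (1 - p)
even   = rbinom(100000, prob = 0.5, size = 1)
uneven = rbinom(100000, prob = 0.9, size = 1)
var(even)    # close to 0.5 * 0.5 = 0.25
var(uneven)  # close to 0.9 * 0.1 = 0.09
```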
Different from the expected value, the variance does not satisfy linearity, i.e. the variance operator and linear operations are not exchangeable. However, it satisfies the following rule: \[
\text{Var}(aX+b) = a^2\text{Var}(X)
\]
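A quick simulation check of this rule, again with the arbitrary choices \(a = 2\), \(b = 5\) and \(X \sim \text{Bin}(10, 0.1)\):

```r
# Var(aX + b) = a^2 Var(X): the shift b drops out, the scale a enters squared
X = rbinom(100000, prob = 0.1, size = 10)
var(2 * X + 5)  # close to 4 * 0.9 = 3.6, since Var(X) = 10 * 0.1 * 0.9 = 0.9
```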
Based on the results above, we can easily see that a special linear transformation, \(\left( X-\text{E}(X) \right)/\sqrt{\text{Var}(X)}\), produces a standardized random variable, i.e. one with mean zero and variance one.
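For instance, standardizing \(X_1 \sim \text{Bin}(10, 0.1)\), whose expected value is \(1\) and variance is \(0.9\):

```r
# Standardize X ~ Bin(10, 0.1): E(X) = 1, Var(X) = 0.9
X = rbinom(100000, prob = 0.1, size = 10)
Z = (X - 1) / sqrt(0.9)
mean(Z)  # close to 0
var(Z)   # close to 1
```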