Solutions to the exercises on Probability Knowledge

Task 1: Probabilities of events

Task 1.1 A bag contains 5 red balls, 3 blue balls, and 2 green balls. If you randomly draw one ball from the bag:

  1. What is the probability of drawing a red ball?

\(\Pr(\text{drawing a red ball}) = \frac{5}{5+3+2} = 0.5\)

  2. What is the probability of drawing a blue or green ball?

Since the event is stated with the keyword OR, we apply the sum rule here, i.e. \[ \Pr(\text{drawing a blue or green ball}) = \Pr(\text{a blue ball}) + \Pr(\text{a green ball}) = \frac{3}{10} + \frac{2}{10} = 0.5 \] Alternatively, this event can be described as ‘drawing a ball which is not red’, so \[ \Pr(\text{drawing a blue or green ball}) = 1 - \Pr(\text{drawing a red ball}) = 0.5 \]

Task 1.2 A fair six-sided die is rolled once.

  1. What is the probability of getting an even number?

There are equal numbers of even and odd outcomes among 1 to 6, so the probability is \(\frac{3}{6} = 0.5\).

  2. What is the conditional probability of getting a number larger than 3 given the number is an even number?

Once we know it is an even number, the set of possible outcomes shrinks to \(\{2, 4, 6\}\), and 2 of them are greater than 3, so the probability is \(\frac{2}{3}\).

  3. What is the probability that you get a number less than 2 for the first time on the fifth throw of the die?

Let \(X_i = 1\) denote that we got a number less than 2 on the \(i\)th throw; then the probability can be represented as \[ \Pr(X_1 = 0 \text{ AND } X_2 = 0 \text{ AND } X_3 = 0 \text{ AND } X_4 = 0 \text{ AND } X_5 = 1) \] Applying the product rule to the keyword ‘AND’ (the throws are independent), this probability equals \(\left(\frac{5}{6}\right)^4 \times \frac{1}{6} \approx 0.08\).
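This is in fact a geometric probability, the number of failures before the first success, so we can check the arithmetic in R:

(5/6)^4 * (1/6)      # product rule directly
[1] 0.08037551
dgeom(4, prob = 1/6) # geometric p.m.f.: 4 failures before the first success
[1] 0.08037551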

Task 2: Discrete Random Variable and Distribution

Task 2.1: In R, write a function returning the values of p.m.f of an arbitrary Binomial distribution.

# p.m.f. of Bin(N, p) at k: N!/(k!(N-k)!) * p^k * (1-p)^(N-k)
ProbBin = function(k, N=10, p=0.7){
  (factorial(N)/( factorial(k)*factorial(N-k) ))*p^(k)*(1-p)^(N-k)
}
# test my function with the R function `dbinom`
ProbBin(3)
[1] 0.009001692
dbinom(3, 10, 0.7) 
[1] 0.009001692
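One caveat: factorial() overflows for large N (in R, factorial(171) is already Inf), so a numerically safer variant of the same function computes the binomial coefficient with choose():

# equivalent p.m.f. using the binomial coefficient directly
ProbBin2 = function(k, N=10, p=0.7) choose(N, k) * p^k * (1-p)^(N-k)
ProbBin2(3)
[1] 0.009001692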

Task 2.2: Apply your function to print out the distribution of the Binomial distribution \(\text{Bin}(10,0.7)\).

BinDistribution = numeric(11) # there are 11 possible values (0 to 10) for Bin(10,0.7)
for(i in 1:11){
  BinDistribution[i] = round(ProbBin(k = i-1), 3)
}
BinDistribution = data.frame(X = 0:10, P = BinDistribution)
print(BinDistribution)
    X     P
1   0 0.000
2   1 0.000
3   2 0.001
4   3 0.009
5   4 0.037
6   5 0.103
7   6 0.200
8   7 0.267
9   8 0.233
10  9 0.121
11 10 0.028
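As a sanity check, the probabilities over all 11 possible values should sum to 1 (our rounded column sums to 0.999 only because of rounding):

sum(dbinom(0:10, 10, 0.7))
[1] 1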

Task 3: Characteristic Values

Task 3.1: Explain why the expected value of a Bernoulli distributed random variable, \(X \sim \text{Ber}(p)\), is \(p\).

The expected value of a random variable is the weighted sum of all its possible values, where the weights are the corresponding probabilities: \[ \text{E}(X) = 1\times p + 0 \times (1-p) = p \]

Task 3.2: Explain why the expected value of a Binomial distributed random variable, \(X \sim \text{Bin}(N, p)\), is \(Np\).

The Binomial distribution describes the number of successes in \(N\) independent experiments with binary outcomes. Suppose \(X_i \sim \text{Ber}(p)\) independently for \(i = 1,\dots,N\); then \(X = \sum_{i = 1}^N X_i\) is Binomial distributed. So, by the linearity of the expected value, we have \[ \text{E}(X) = \text{E}\left( \sum_{i = 1}^N X_i \right) = \sum_{i = 1}^N \text{E}(X_i) = \sum_{i = 1}^N p = Np \]
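A quick simulation check of this result, reusing \(\text{Bin}(10, 0.7)\) from Task 2 (the sample mean is random, so expect a value close to, not exactly, 7):

mean(rbinom(1e5, size = 10, prob = 0.7)) # should be close to 10 * 0.7 = 7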

Task 3.3: Explain why the variance of a Bernoulli distributed random variable is \(p(1-p)\).

Variance is the expected value of the squared distance between \(X\) and its expected value. Note that \(X^2 = X\) for a Bernoulli variable (since \(0^2 = 0\) and \(1^2 = 1\)), so \(\text{E}(X^2) = \text{E}(X) = p\) and \[ \text{Var}(X) = \text{E} \left( X - \text{E}(X) \right)^2 = \text{E}(X^2) - (\text{E}(X))^2 = p - p^2 = p(1-p) \]

Task 3.4: Explain why the variance of a Binomial distributed random variable is \(Np(1-p)\).

With the same idea as in 3.2, and using the fact that the variance of a sum of independent random variables is the sum of their variances, \[ \text{Var}(X) = \text{Var}\left( \sum_{i = 1}^N X_i \right) = \sum_{i = 1}^N \text{Var}(X_i) = \sum_{i = 1}^N p(1-p) = Np(1-p) \]

Task 3.5 ( HS ): In Section 3.4.5, we discussed the covariance between two random variables and derived formulas for the characteristic values of a weighted sum (linear combination) of random variables. Actually, these formulas can be represented in matrix form, and the matrix form helps us easily generalize them to the multivariate setting (more than two random variables). For example, the mean of a linear combination of two random variables is \[ \text{E}(a_1X_1+a_2X_2) = a_1\text{E}(X_1) + a_2\text{E}(X_2) \] It can be represented as \[ \text{E}(\textbf{a}^{\top}\textbf{X}) = \textbf{a}^{\top}\text{E}(\textbf{X}) \] where \(\textbf{a} = (a_1, a_2)^{\top}\) and \(\textbf{X} = (X_1, X_2)^{\top}\). In the multivariate setting, we usually call \(\textbf{X}\) a random vector; it is a 2-dimensional random vector in this case, but it can have arbitrary dimension in general.

Now, it is your turn. Can you represent the following result of variance in a matrix form? \[ \text{Var}(a_1X_1+a_2X_2) = a_1^2\text{Var}(X_1) + 2a_1a_2\text{Cov}(X_1,X_2) + a_2^2\text{Var}(X_2) \]

First, we write the formula above in the matrix form \[ \text{Var}(a_1X_1+a_2X_2) = (a_1, a_2)\left( \begin{matrix} \text{Var}(X_1) & \text{Cov}(X_1, X_2) \\ \text{Cov}(X_1, X_2) & \text{Var}(X_2) \end{matrix} \right) \left( \begin{matrix} a_1 \\ a_2 \end{matrix} \right) \] In general, we have \[ \text{Var}(\textbf{a}^{\top}\textbf{X}) = \textbf{a}^{\top} \boldsymbol{\Sigma} \textbf{a} \] where \(\boldsymbol{\Sigma}\) is the covariance matrix of \(\textbf{X}\).
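A minimal numerical check of this identity in R (the coefficients in a and the dependence between X1 and X2 below are arbitrary choices for illustration):

set.seed(1)
n = 1e5
X1 = rnorm(n)
X2 = 0.5 * X1 + rnorm(n)       # make X2 correlated with X1
a = c(2, -1)
var(a[1] * X1 + a[2] * X2)     # left-hand side: variance of the linear combination
S = cov(cbind(X1, X2))         # sample covariance matrix Sigma
t(a) %*% S %*% a               # right-hand side: a' Sigma a

The two numbers agree exactly, since the sample covariance matrix satisfies the same identity as its population counterpart.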

Task 4: Joint distribution

Task 4.1: In Section 3.4.2, I only show you how to calculate the value in the first cell, i.e. \(\Pr( X=1, Y=1 )\). Please calculate the values of the remaining 3 cells, i.e. \(\Pr( X=1, Y=0 )\), \(\Pr( X=0, Y=1 )\), and \(\Pr( X=0, Y=0 )\).
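As a sketch, the remaining cells follow from the same product rule, using the numbers that appear in the Task 4.2 solution below: \(\Pr(X=1) = \frac{4}{6}\) for the red box, \(\Pr(Y=1|X=1) = \frac{2}{8}\), and \(\Pr(Y=1|X=0) = \frac{3}{4}\), where \(Y=1\) means apple. In R:

# product rule: Pr(X = x, Y = y) = Pr(Y = y | X = x) * Pr(X = x)
p_red = 4/6                    # Pr(X = 1), the red box
pA_red = 2/8; pA_blue = 3/4    # Pr(apple | red), Pr(apple | blue)
joint = rbind("X=1" = p_red * c(pA_red, 1 - pA_red),
              "X=0" = (1 - p_red) * c(pA_blue, 1 - pA_blue))
colnames(joint) = c("Y=1 (apple)", "Y=0 (orange)")
joint            # cells: 1/6 and 1/2 (top row); 1/4 and 1/12 (bottom row)
rowSums(joint)   # marginal of X: 4/6 and 2/6
colSums(joint)   # marginal of Y: 5/12 and 7/12

Note that the four cells sum to 1, and the column sums reproduce the marginals of \(Y\) used in Tasks 4.2 and 4.3.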

Task 4.2: With the same background problem, calculate the probability that you eventually get an orange.

We apply the sum rule to the keyword ‘OR’ first: \[ \Pr (Y = 0) = \Pr \left( (Y = 0 \text{ AND } X = 1) \text{ OR } (Y = 0 \text{ AND } X = 0) \right) \] i.e.  \[ \Pr (Y = 0) =\Pr \left( Y = 0 \text{ AND } X = 1 \right) + \Pr \left( Y = 0 \text{ AND } X = 0 \right) \] Then we apply the product rule to the keyword ‘AND’: \[ \Pr (Y = 0) =\Pr \left( Y = 0 | X = 1 \right) \Pr(X = 1) + \Pr \left( Y = 0 | X = 0 \right) \Pr(X = 0) \] Then the final answer is \[ \Pr (Y = 0) = \frac{6}{8}\times\frac{4}{6} + \frac{1}{4}\times\frac{2}{6} = \frac{7}{12} \]

Task 4.3: Apply Bayes Formula to calculate the posterior probability that we chose the red box if we get an apple, i.e. \(\Pr( X=1 | Y=1 )\).
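A one-line check of this posterior in R, with the marginal \(\Pr(Y=1) = 1 - \frac{7}{12} = \frac{5}{12}\) expanded in the denominator via the sum and product rules:

# Bayes formula: Pr(X=1 | Y=1) = Pr(Y=1 | X=1) Pr(X=1) / Pr(Y=1)
(2/8 * 4/6) / (2/8 * 4/6 + 3/4 * 2/6)
[1] 0.4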

Task 4.4: Discuss with your group mates, propose a new example to explain what is the difference between prior and posterior probabilities.

Task 5: Continuous Random Variable and Distribution

Task 5.1: \(X \sim \mathcal{N}(1.2, 1)\), use R to calculate \(\Pr(-1.5 < X < 2.2)\).

pnorm(2.2, 1.2, 1) - pnorm(-1.5, 1.2, 1)
[1] 0.8378778

Task 5.2: Suppose \(X \sim \mathcal{N}(\mu, \sigma^2)\), then what is the distribution of \(\frac{X - \mu}{\sigma}\)?

\(\frac{X - \mu}{\sigma}\) is called the standardization of a Normally distributed variable. A linear transformation of a Normal variable is still Normal, and the resulting random variable has mean zero and variance \(1\), so \(\frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1)\), the standard Normal distribution.
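A quick simulation check, reusing the distribution from Task 5.1 (sample moments are random, so expect values close to 0 and 1 rather than exact):

z = (rnorm(1e5, mean = 1.2, sd = 1) - 1.2) / 1 # standardize draws from N(1.2, 1)
c(mean(z), var(z))                             # should be close to 0 and 1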

Task 5.3: Explain why 95% of the probability is covered within two SDs around the mean in a Normal distribution.

\[ \Pr \left(\mu - 2\sigma \leq X \leq \mu + 2\sigma \right) = \Pr\left(-2 \leq \frac{X-\mu}{\sigma} \leq 2 \right) \] By the result of Task 5.2, the standardized variable follows \(\mathcal{N}(0,1)\), so this probability can be calculated as

pnorm(2) - pnorm(-2)
[1] 0.9544997
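So the coverage within two SDs is about 95% (0.9545, to be precise); the interval that covers exactly 95% uses 1.96 SDs:

qnorm(0.975)
[1] 1.959964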

Task 6: Likelihood Analysis

Task 6.1: With the “Box-Fruits” background problem, let’s make a small adjustment: we use the flip of a fair coin to decide which box to choose.

Suppose you got an apple—then which box do you think you are more likely to have chosen?

Since we have equal probabilities of choosing the red or blue box, i.e. the priors of blue and red are the same, we only need to compare the likelihoods of getting an apple conditional on the color of the box. So, the blue box is more likely, since \(\Pr(\text{apple} | \text{blue}) = \frac{3}{4} > \Pr(\text{apple} | \text{red}) = \frac{1}{4}\).

Task 6.2 Now, we change back to the original setting where we throw a die to decide the box, except that we now choose the red box when getting a number less than 6. Suppose you got an apple—then which box do you think you are more likely to have chosen?

In the new setting, the probabilities of choosing the two colors differ, i.e. \(\Pr(\text{blue}) = \frac{1}{6}\) and \(\Pr(\text{red}) = \frac{5}{6}\), so considering only the likelihood of getting an apple conditional on the color is not sufficient. Simply speaking, although we have a higher probability of getting an apple from the blue box than from the red one, we can’t directly draw a conclusion, since we also have a much higher chance of choosing the red box than the blue one. So, in this setting, we need the posterior probabilities to make the decision, i.e. we need to calculate \[ \Pr(\text{Blue}|\text{Apple}) = \frac{\Pr(\text{Apple}|\text{Blue}) \Pr(\text{Blue})}{\Pr(\text{Apple}|\text{Blue}) \Pr(\text{Blue})+\Pr(\text{Apple}|\text{Red}) \Pr(\text{Red})} = \frac{\frac{3}{4} \times \frac{1}{6} }{\frac{3}{4} \times \frac{1}{6} + \frac{1}{4} \times \frac{5}{6}} = \frac{3}{8} \] and \[ \Pr(\text{Red}|\text{Apple}) = \frac{\Pr(\text{Apple}|\text{Red}) \Pr(\text{Red})}{\Pr(\text{Apple}|\text{Red}) \Pr(\text{Red})+\Pr(\text{Apple}|\text{Blue}) \Pr(\text{Blue})} = \frac{\frac{1}{4} \times \frac{5}{6} }{\frac{3}{4} \times \frac{1}{6} + \frac{1}{4} \times \frac{5}{6}} = \frac{5}{8} \] Therefore, this time we are more likely to have drawn the red box.
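The same posterior computation as a short R sketch, with the priors and likelihoods stated above:

prior = c(blue = 1/6, red = 5/6)   # Pr(box)
lik   = c(blue = 3/4, red = 1/4)   # Pr(apple | box)
prior * lik / sum(prior * lik)     # posterior Pr(box | apple)
 blue   red 
0.375 0.625 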


© 2024 Xijia Liu. All rights reserved. Contact: xijia.liu AT umu.se