- Basic Probability
- Conditional Probability
- Bayes Theorem
- Random Variables
- Discrete Random Variables
- Continuous Random Variables
- Correlation and Covariance
Q. What do you mean by probability?
Answer
It is a mathematical framework for representing uncertain statements. It provides a means of quantifying uncertainty.
Q. Define the probability of an event.
Answer
The probability of an event $A$ is given by:
$$ P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}} $$
Q. Define disjoint sets.
Answer
Disjoint sets are sets that have no elements in common. In other words, the intersection of disjoint sets is the empty set: two sets $A$ and $B$ are disjoint if
$$ A \cap B = \emptyset $$
Q. Write the expression for the conditional probability of $A$ given $B$.
Answer
The conditional probability of $A$ given $B$ is defined as:
$$ P(A | B) = \frac{P(A \cap B)}{P(B)} $$
where:
- $P(A \cap B)$ is the probability that both events $A$ and $B$ occur.
- $P(B)$ is the probability that event $B$ occurs, and it must be greater than zero.
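As a small sketch of the definition, we can compute a conditional probability by exhaustive enumeration. The two-dice example and the events below are assumptions chosen purely for illustration:

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: two fair dice. A = "sum is 8", B = "first die is even".
outcomes = list(product(range(1, 7), repeat=2))      # 36 equally likely outcomes
A = {o for o in outcomes if sum(o) == 8}
B = {o for o in outcomes if o[0] % 2 == 0}

p_B = Fraction(len(B), len(outcomes))                # P(B)
p_A_and_B = Fraction(len(A & B), len(outcomes))      # P(A ∩ B)
p_A_given_B = p_A_and_B / p_B                        # P(A | B) = P(A ∩ B) / P(B)
print(p_A_given_B)                                   # 1/6
```

Using `Fraction` keeps the arithmetic exact, which makes it easy to check the result by hand.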
Q. What is the law of total probability?
Answer
The law of total probability is a fundamental rule that provides a way to compute the probability of an event by considering all possible scenarios or conditions. If $B_1, B_2, \ldots, B_n$ form a partition of the sample space, then:
$$ P(A) = \sum_{i=1}^{n} P(A | B_i) \cdot P(B_i) $$
where:
- $P(A | B_i)$ is the conditional probability of $A$ given $B_i$.
- $P(B_i)$ is the probability of $B_i$.
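A numeric sketch of the law, with assumed numbers: a part comes from one of three suppliers (a partition $B_1, B_2, B_3$) and $A$ is the event that the part is defective:

```python
# Hypothetical supplier shares P(B_i) and per-supplier defect rates P(A | B_i).
p_B = [0.5, 0.3, 0.2]              # must sum to 1 (a partition)
p_A_given_B = [0.01, 0.02, 0.05]

# Law of total probability: P(A) = sum_i P(A | B_i) * P(B_i)
p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))
print(p_A)  # 0.5*0.01 + 0.3*0.02 + 0.2*0.05 = 0.021
```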
Q. What does it mean for two variables to be independent?
Answer
Two random variables $X$ and $Y$ are independent if their joint distribution factorizes as the product of their marginal distributions:
$$ \forall x \in \mathcal{X},\ y \in \mathcal{Y}: \quad P(X = x, Y = y) = P(X = x) \cdot P(Y = y) $$
This means that knowing the value of one variable provides no information about the value of the other.
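The factorization can be checked by full enumeration. A minimal sketch, assuming two fair independent dice as the joint distribution:

```python
from itertools import product

# Enumerate the joint distribution of two fair dice and verify
# P(X=x, Y=y) == P(X=x) * P(Y=y) for every pair (x, y).
outcomes = list(product(range(1, 7), repeat=2))
n = len(outcomes)

def p_joint(x, y):
    return sum(1 for o in outcomes if o == (x, y)) / n

def p_x(x):
    return sum(1 for o in outcomes if o[0] == x) / n

def p_y(y):
    return sum(1 for o in outcomes if o[1] == y) / n

independent = all(
    abs(p_joint(x, y) - p_x(x) * p_y(y)) < 1e-12
    for x in range(1, 7) for y in range(1, 7)
)
print(independent)  # True: the two dice are independent
```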
Q. Given two random variables $X$ and $Y$, express $P(X)$ in terms of $P(X | Y)$ and $P(Y)$.
Answer
To calculate $P(X)$, we marginalize over $Y$ using the law of total probability:
$$ P(X) = \sum_{Y} P(X | Y) \cdot P(Y) $$
This expression accounts for the contribution of each conditional probability $P(X | Y)$, weighted by the probability $P(Y)$ of each value of $Y$.
Q. You know that your colleague Jason has two children and one of them is a boy. What’s the probability that Jason has two sons?
Answer
Q. Consider a room of
Answer
Q. Given two events $A$ and $B$:
- Define the conditional probability of $A$ given $B$. Mind singular cases.
- Annotate each part of the conditional probability formula.
- Draw an instance of a Venn diagram depicting the intersection of the events $A$ and $B$. Assume that $A \cup B = H$.
Answer
Q. Assume you manage an unreliable file storage system that crashed 5 times in the last year, each crash happens independently.
- What's the probability that it will crash in the next month?
- What's the probability that it will crash at any given moment?
Answer
Q. Say you built a classifier to predict the outcome of football matches. In the past, it's made 10 wrong predictions out of 100. Assume all predictions are made independently, what's the probability that the next 20 predictions are all correct?
Answer
Q. State Bayes' theorem.
Answer
Bayes' Theorem provides a way to update the probability of a hypothesis based on new evidence. It relates the conditional and marginal probabilities of random events.
Mathematically, Bayes' Theorem is stated as:
$$ P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)} $$
where:
- $P(A | B)$ is the probability of event $A$ given that $B$ has occurred (posterior probability).
- $P(B | A)$ is the probability of event $B$ given that $A$ has occurred (likelihood).
- $P(A)$ is the probability of event $A$ (prior probability).
- $P(B)$ is the probability of event $B$ (marginal likelihood).
It is named after Reverend Thomas Bayes.
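A numeric sketch of the update rule. The prior, sensitivity, and false-positive rate below are assumed example values, not taken from any question in this document:

```python
# Bayes' theorem with assumed numbers: a diagnostic test for a condition.
p_A = 0.01                    # prior P(A): base rate of the condition
p_B_given_A = 0.95            # likelihood P(B|A): test sensitivity
p_B_given_notA = 0.05         # false-positive rate P(B|¬A)

# Marginal likelihood via the law of total probability.
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

posterior = p_B_given_A * p_A / p_B   # P(A | B)
print(round(posterior, 4))
```

Despite the accurate test, the posterior is only about 16% because the prior is so small, which is exactly the intuition Bayes' theorem formalizes.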
Q. Write the simplified version of Bayes' rule.
Answer
Standard Bayes' theorem:
$$ P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)} $$
We can combine Bayes' theorem with the law of total probability:
$$ P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|A^C) \cdot P(A^C)} $$
More generally, for a partition $A_1, \ldots, A_n$ of the sample space:
$$ P(A_i | B) = \frac{P(B | A_i) \cdot P(A_i)}{\sum_{j=1}^{n} P(B | A_j) \cdot P(A_j)} $$
Q. What is normalization constant in Bayes Theorem?
Answer
The denominator in the standard Bayes' theorem is called the normalization constant.
$$ P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)} $$
It is called the normalization constant since it is the same regardless of whether or not the event $A$ occurs; it normalizes the posterior so that it sums (or integrates) to one.
In case it is unknown, we can use the following expression:
$$ P(B) = P(B|A)P(A) + P(B|A^C)P(A^C) $$
Q. Show the relationship between the prior, posterior and likelihood probabilities.
Answer
The relationship between the prior, posterior, and likelihood probabilities is given by Bayes' Theorem, which can be expressed as:
$$ \text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}} $$
Q. In a Bayesian context, if a first experiment is conducted, and then another experiment is followed, what does the posterior become for the next experiment?
Answer
Q. Suppose there are three closed doors and a car has been placed behind one of the doors at random. There are goats behind the other two doors. You pick door 1, but the admin, who knows where the car is, opens door 2 to reveal a goat (the admin will always open a door with a goat). He then offers you the choice to stay with door 1 or switch to the other closed door, door 3. Should you switch doors to maximize your chances of getting the car?
Answer
Q. There are only two electronic chip manufacturers:
- If you randomly pick a chip from the store, what is the probability that it is defective?
- Suppose you now get two chips coming from the same company, but you don’t know which one. When you test the first chip, it appears to be functioning. What is the probability that the second electronic chip is also good?
Answer
Q. There’s a rare disease that only 1 in 10000 people get. Scientists have developed a test to diagnose the disease with the false positive rate and false negative rate of 1%.
- Given a person is diagnosed positive, what’s the probability that this person actually has the disease?
- What’s the probability that a person has the disease if two independent tests both come back positive?
Answer
Q. A dating site allows users to select
Answer
Q. Consider a person A whose sex we don’t know. We know that for the general human height, there are two distributions: the height of males follows
Answer
Q. There are three weather apps, each the probability of being wrong
Answer
Q. Given n samples from a uniform distribution
Answer
Q. You’re part of a class. How big does the class have to be for the probability that at least one person shares your birthday to be greater than
Answer
Q. You decide to fly to Vegas for a weekend. You pick a table that doesn’t have a bet limit, and for each game, you have the probability
Answer
Q. In national health research in the US, the results show that the top 3 cities with the lowest rate of kidney failure are cities with populations under
Answer
Q. Bayesian inference amalgamates data information in the likelihood function with known prior information. This is done by conditioning the prior on the likelihood using the Bayes formulae. Assume two events A and B in probability space
Answer
Q. In an experiment conducted in the field of particle physics (Fig. 3.2), a certain particle may be in two distinct equally probable quantum states: integer spin or half-integer spin. It is well-known that particles with integer spin are bosons, while particles with half-integer spin are fermions.
Bosons and fermions: particles with half-integer spin are fermions
A physicist is observing two such particles, while at least one of which is in a half-integer state. What is the probability that both particles are fermions?
Answer
Q. During pregnancy, the Placenta Chorion Test is commonly used for the diagnosis of hereditary diseases (Fig. 3.3). The test has a probability of
Foetal surface of the placenta
It is known that
Answer
Q. The Dercum disease is an extremely rare disorder of multiple painful tissue growths. In a population in which the ratio of females to males is equal, 5% of females and 0.25% of males have the Dercum disease (Fig. 3.4).
The Dercum disease
A person is chosen at random and that person has the Dercum disease. Calculate the probability that the person is female.
Answer
Q. There are numerous fraudulent binary options websites scattered around the Internet, and for every site that shuts down, new ones are sprouted like mushrooms. A fraudulent AI based stock-market prediction algorithm utilized at the New York Stock Exchange, (Fig. 3.6) can correctly predict if a certain binary option shifts states from 0 to 1 or the other way around, with
The New York Stock Exchange
A financial engineer has created a portfolio consisting of twice as many
Answer
Q. In an experiment conducted by a hedge fund to determine if monkeys (Fig. 3.6) can outperform humans in selecting better stock market portfolios, 0.05 of humans and 1 out of 15 monkeys could correctly predict stock market trends.
Hedge funds and monkeys
From an equally probable pool of humans and monkeys an “expert” is chosen at random. When tested, that expert was correct in predicting the stock market shift. What is the probability that the expert is a human?
Answer
Q. During the cold war, the U.S.A developed a speech to text (STT) algorithm that could theoretically detect the hidden dialects of Russian sleeper agents. These agents (Fig. 3.7), were trained to speak English in Russia and subsequently sent to the US to gather intelligence. The FBI was able to apprehend ten such hidden Russian spies and accused them of being "sleeper" agents.
Dialect detection
The Algorithm relied on the acoustic properties of Russian pronunciation of the word (v-o-k-s-a-l) which was borrowed from English V-a-u-x-h-a-l-l. It was alleged that it is impossible for Russians to completely hide their accent and hence when a Russian would say V-a-u-x-h-a-l-l, the algorithm would yield the text v-o-k-s-a-l. To test the algorithm at a diplomatic gathering where
Answer
Q. During World War II, forces on both sides of the war relied on encrypted communications. The main encryption scheme used by the German military was an Enigma machine, which was employed extensively by Nazi Germany. Statistically, the Enigma machine sent the symbols X and Z Fig. (3.8) according to the following probabilities:
The Morse telegraph code
In one incident, the German military sent encoded messages while the British army used countermeasures to deliberately tamper with the transmission. Assume that as a result of the British countermeasures, an X is erroneously received as a Z (and mutatis mutandis) with a probability
Answer
Q. In context of random variables define the following terms:
- Distributions
- Expectations
- Variance
- PMFs and CDFs
- Support
Answer
Distributions
A distribution describes how the values of a random variable are spread or distributed. It provides a complete description of the probability of different outcomes of the random variable.
Expectations
The expectation of some function $f(x)$ of a random variable $X$ with distribution $P$ is the average or mean value that $f$ takes on when $x$ is drawn from $P$.
For discrete variables:
$$ E_{x \sim P}[f(x)] = \sum_{x} P(x) f(x) $$
For continuous variables:
$$ E_{x \sim p}[f(x)] = \int p(x) f(x) \, dx $$
Variance
The variance gives a measure of how much the values of a function of a random variable $X$ vary as we sample different values of $x$ from its probability distribution:
$$ \text{Var}(f(x)) = E\left[(f(x) - E[f(x)])^2\right] $$
Low variance means the values of $f(x)$ cluster near their expected value.
PMFs and CDFs
- Probability Mass Function (PMF): A PMF is used to describe the distribution of a discrete random variable. It specifies the probability of each possible value of the random variable. For a discrete random variable $X$, the PMF $P(X = x)$ gives the probability that $X$ takes the value $x$.
- Cumulative Distribution Function (CDF): A CDF describes the probability that a random variable will take a value less than or equal to a certain threshold. For a random variable $X$, the CDF $F(x)$ is defined as:
$$ F(x) = P(X \leq x) $$
Support
The support of a random variable is the set of values that the variable can take with non-zero probability. For a discrete random variable, it is the set of all values where the PMF is positive. For a continuous random variable, it is the interval or set of intervals where the PDF is positive. The support indicates the range over which the variable can realistically take values.
Q. What are the key characteristics of a PDF?
Answer
To be a PDF, a function $p$ must satisfy the following properties:
- The domain of $p$ must be the set of all possible states of $x$.
- $\forall x,\ p(x) \ge 0$. Note that we don't require $p(x) \le 1$.
- $\int p(x) \, dx = 1$.
Q. What are the properties of expectation?
Answer
- Linearity of expectation
$$ E[aX + b] = aE[X] + b $$
where $a$ and $b$ are constants.
- Expectation of the Sum of Random Variables
$$ E[X+Y] = E[X] + E[Y] $$
Note that this is true irrespective of the relationship between $X$ and $Y$; they need not be independent.
- Law of the Unconscious Statistician (LOTUS)
$$ E[g(X)] = \sum_{x}g(x)P(X = x) $$
We can use this to calculate $E[X^2]$:
$$ E[X^2] = \sum_{x}x^{2}P(X=x) $$
- Expectation of constants
$$ E[a] = a $$
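The properties above can be verified numerically on a small PMF. A sketch, assuming a fair die and arbitrary constants $a$, $b$ for illustration:

```python
# Verify linearity E[aX + b] = aE[X] + b and LOTUS E[g(X)] = sum g(x) P(X=x)
# on a fair six-sided die (die and constants are assumed for illustration).
pmf = {x: 1 / 6 for x in range(1, 7)}
a, b = 3.0, 2.0

E_X = sum(x * p for x, p in pmf.items())               # 3.5
E_aXb = sum((a * x + b) * p for x, p in pmf.items())   # LOTUS on g(x) = a*x + b
assert abs(E_aXb - (a * E_X + b)) < 1e-9               # linearity holds

E_X2 = sum(x**2 * p for x, p in pmf.items())           # LOTUS on g(x) = x^2
print(E_X, E_X2)                                       # 3.5 and 91/6
```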
Q. Can the values of PMF be greater than 1?
Answer
No. PMF values are actual probabilities of discrete outcomes and must adhere strictly to the range $[0, 1]$, so they can never exceed 1.
Q. Can a Probability Density Function (PDF) be bounded or unbounded?
Answer
Yes, a Probability Density Function (PDF) can be either bounded or unbounded.
Example
A PDF is unbounded if it can reach arbitrarily large values, especially over narrow intervals. For example, the PDF of the $\text{Beta}(0.5, 0.5)$ distribution grows without bound near $0$ and $1$, yet its total area still integrates to 1.
Q. Write the expression of variance of a random variable?
Answer
Suppose $X$ is a random variable with mean $\mu = E[X]$. Its variance is defined as:
$$ \text{Var}(X) = E[(X - \mu)^2] $$
This is the average squared distance of a sample drawn from the distribution to the mean. We can further simplify the above expression with a little algebra:
$$ \text{Var}(X) = E[X^2] - E[X]^2 $$
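Both forms of the variance can be checked numerically. A sketch on an assumed fair-die PMF:

```python
# Check Var(X) = E[(X - mu)^2] = E[X^2] - E[X]^2 on a fair die (illustrative).
pmf = {x: 1 / 6 for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())

var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())      # E[(X - mu)^2]
var_short = sum(x**2 * p for x, p in pmf.items()) - mu**2     # E[X^2] - E[X]^2
print(var_def, var_short)  # both equal 35/12 ≈ 2.9167
```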
Q. What’s the difference between multivariate distribution and multimodal distribution?
Answer
Multimodal refers to a dataset (variable) in which there is more than one mode, while multivariate refers to a dataset in which there is more than one variable.
Q. What do you mean by log probability? Explain why do we need it?
Answer
A log probability is simply the logarithm of a probability, $\log P(x)$; since $P(x) \in [0, 1]$, log probabilities are non-positive.
Benefits of using log probability:
- It converts products of probabilities into sums of log probabilities.
- Addition is easier and numerically more stable for computers than repeated multiplication.
- Computers use log probabilities to handle extremely small probabilities accurately due to limitations in floating-point precision.
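The underflow problem is easy to demonstrate. The probabilities below are assumed example values:

```python
import math

# The product of many small probabilities underflows to 0.0 in double precision,
# while the sum of their logs stays perfectly representable.
probs = [1e-5] * 100

product = 1.0
for p in probs:
    product *= p
print(product)            # 0.0: the true value 1e-500 underflows

log_prob = sum(math.log(p) for p in probs)   # work in log space instead
print(log_prob)           # ≈ -1151.29, no underflow
```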
Q. How would you turn a probabilistic model into a deterministic model?
Answer
We can do quantization of model's outputs. In models that generate probabilistic outputs (e.g., classification with probabilities), we can convert these into deterministic outputs by selecting the highest probability class or using a thresholding mechanism.
Q. What is a moment of function? Explain the meanings of the zeroth to fourth moments.
Answer
Statistical moments are additional descriptors of a curve/distribution. Moments quantify three parameters of distributions: location, shape, and scale.
- Location: A distribution's location refers to where its center of mass is along the x-axis.
- Scale: The scale refers to how spread out a distribution is. Scale stretches or compresses a distribution along the x-axis.
- Shape: The shape of a distribution refers to its overall geometry: is the distribution bimodal, asymmetric, heavy-tailed?
The $k$th moment of a function about a value $c$ is defined as $\mu_k = E[(X - c)^k]$.
This generalization allows us to make an important distinction:
- a raw moment is a moment about the origin $(c = 0)$
- a central moment is a moment about the distribution's mean $(c = E[X])$
The first five moments, in order from the $0$th to the $4$th, are: total mass, mean, variance, skewness, and kurtosis.
- Zeroth moment (total mass): The zeroth moment is simply the constant value of 1 for any probability distribution. It doesn't provide much information about the distribution itself but is often used in mathematical contexts.
- First moment (mean): The first moment is also known as the mean or expected value. It represents the center of the distribution and is a measure of the average or central location of the data points.
$$ \mu = \frac{1}{n} \sum_{i=1}^{n} x_i $$
where $\mu$ (mu) is the mean, $n$ is the number of data points, and $x_i$ represents individual data points.
- Second moment (variance): The second moment is the variance. It measures the spread or dispersion of the data points around the mean. It is calculated as the average of the squared differences between each data point and the mean.
$$ \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 $$
where $\sigma^2$ (sigma squared) is the variance.
- Third moment (skewness): The third moment is a measure of the skewness of the distribution. It indicates whether the distribution is skewed to the left (negatively skewed) or to the right (positively skewed).
- Fourth moment (kurtosis): The fourth moment measures the kurtosis of the distribution. Kurtosis indicates whether the distribution is more or less peaked (leptokurtic or platykurtic) compared to a normal distribution.
Q. List down some famous discrete distributions.
Answer
- Bernoulli Distribution : It is a distribution over a single binary random variable.
- Binomial Distribution : Describes the number of successes in a fixed number of independent Bernoulli trials with the same probability of success.
- Multinomial Distribution : Generalizes the binomial distribution to more than two possible outcomes per trial, modeling the frequencies of different outcomes in a fixed number of trials.
- Poisson Distribution : Models the number of events occurring in a fixed interval of time or space, given a constant mean rate of occurrence and independence of events.
Q. Define what is meant by a Bernoulli trial.
Answer
Independent repeated trials of an experiment with exactly two possible outcomes are called Bernoulli trials.
Q. Suppose $X \sim \text{Bernoulli}(p)$. Define the following in the context of it:
- Support
- PMF Equation
- Smooth PMF
- Expectation
- Variance
Answer
- Support: $x \in \{0, 1\}$
- PMF Equation
$$ P(X = x) = \begin{cases} p & \text{if } x = 1, \\ 1 - p & \text{if } x = 0. \end{cases} $$
- Smooth PMF
$$ P(X = x) = p^x (1 - p)^{1 - x}, \quad x \in \{0, 1\}. $$
- Expectation
$$ E[X] = p $$
- Variance
$$ \text{Var}(X) = p(1-p) $$
Q. Suppose $X \sim \text{Binomial}(n, p)$. Define the following in the context of it:
- Support
- PMF Equation
- Expectation
- Variance
Answer
- Support: $x \in \{0, 1, \ldots, n\}$
- PMF Equation
$$ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, 2, \ldots, n. $$
- Expectation
$$ E[X] = np $$
- Variance
$$ \text{Var}(X) = np(1-p) $$
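The PMF and moments above can be checked directly with `math.comb`. The values of $n$ and $p$ below are assumed examples:

```python
import math

# Binomial(n, p) PMF via the closed form C(n, k) p^k (1-p)^(n-k).
def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

print(sum(pmf))                                  # probabilities sum to 1
print(sum(k * q for k, q in enumerate(pmf)))     # E[X] = n*p = 3.0
```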
Q. The binomial distribution is often used to model the probability that
Answer
Q. What does the following shorthand stand for?
Answer
It means
Binomial Random Variable
Q. Find the probability mass function (PMF) of the following random variable:
Answer
The probability mass function (PMF) of the binomial distribution is given by:
$$ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} $$
where:
- $n$ is the number of trials,
- $k$ is the number of successes,
- $p$ is the probability of success on each trial,
- $\binom{n}{k}$ is the binomial coefficient, calculated as $\frac{n!}{k!(n - k)!}$.
Binomial Probability Mass Function
The PMF gives the probability of having exactly $k$ successes in $n$ independent trials.
Q. Define the terms likelihood and log-likelihood of a discrete random variable X given a fixed parameter of interest
Answer
Q. Given a fair coin, what’s the number of flips you have to do to get two consecutive heads?
Answer
Q. Derive the expectation and variance of the binomial random variable $X \sim \text{Binomial}(n, p)$.
Answer
Here we can use the fact that a binomial random variable is the sum of $n$ independent Bernoulli indicator random variables:
- Expectation of $X \sim \text{Binomial}(n, p)$:
$$ X = X_1 + X_2 + \ldots + X_n, $$
where each $X_i$ is an independent Bernoulli random variable with
$$ P(X_i = 1) = p \quad \text{and} \quad P(X_i = 0) = 1 - p. $$
The expectation of a Bernoulli random variable $X_i$ is:
$$ E[X_i] = 1 \cdot p + 0 \cdot (1 - p) = p. $$
Using the linearity of expectation:
$$ E[X] = E[X_1 + X_2 + \ldots + X_n] = E[X_1] + E[X_2] + \ldots + E[X_n] = n \cdot p. $$
$$ \boxed{E[X] = n \cdot p.} $$
- Variance of $X \sim \text{Binomial}(n, p)$:
The variance of a Bernoulli random variable $X_i$ is:
$$ \text{Var}(X_i) = E[X_i^2] - (E[X_i])^2. $$
Since $X_i$ takes only the values $0$ and $1$, we have $X_i^2 = X_i$, so:
$$ E[X_i^2] = E[X_i] = p. $$
Thus, the variance of $X_i$ is:
$$ \text{Var}(X_i) = p - p^2 = p(1 - p). $$
Since the $X_i$ are independent, their variances add:
$$ \text{Var}(X) = \text{Var}(X_1 + X_2 + \ldots + X_n) = \text{Var}(X_1) + \text{Var}(X_2) + \ldots + \text{Var}(X_n) = n \cdot p(1 - p). $$
$$ \boxed{\text{Var}(X) = n \cdot p(1 - p).} $$
Q. What is the Poisson distribution? Define the following in the context of it:
- Notation
- Parameters
- Support
- PMF
- Expectation
- Variance
Answer
The Poisson distribution gives the probability of a given number of events occurring in a fixed interval of time (or space).
Notation
$$ X \sim \text{Poi}(\lambda) $$
Parameters
$\lambda > 0$ is the constant average rate of events per interval.
Support
$$ x \in \{0, 1, 2, \ldots\} $$
PMF Expression
$$ P(X = x) = \frac{\lambda^{x}e^{-\lambda}}{x!} $$
Expectation
$$ E[X] = \lambda $$
Variance
$$ \text{Var}(X) = \lambda $$
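We can confirm numerically that both the expectation and variance equal $\lambda$ by summing the PMF over a truncated support. The rate below is an assumed example:

```python
import math

# Poisson(lam) PMF via the closed form lam^k e^{-lam} / k!.
def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 4.0
ks = range(60)   # truncate the infinite support; the tail mass is negligible

mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(mean, var)  # both ≈ lam = 4.0
```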
Q. What are the main assumptions of the Poisson distribution?
Answer
- Events occur with a known constant mean rate, i.e., $\lambda$ is fixed.
- Events are independent of the time since the last event.
Q. Define the categorical distribution.
Answer
The categorical distribution is a fancy name for the distribution of random variables which take on values other than numbers. As an example, imagine a random variable for the weather today. A natural representation for the weather is one of a few categories: {sunny, cloudy, rainy, snowy}.
Q. Proton therapy (PT) is a widely adopted form of treatment for many types of cancer.
A PT device which was not properly calibrated is used to treat a patient with pancreatic cancer (Fig. 3.1). As a result, a PT beam randomly shoots
Histopathology for pancreatic cancer cells
- Find the statistical distribution of the number of correct hits on cancerous cells in the described experiment. What are the expectation and variance of the corresponding random variable?
- A radiologist using the device claims he was able to hit exactly 60 cancerous cells. How likely is it that he is wrong?
Answer
Q. The 2014 west African Ebola epidemic has become the largest and fastest spreading outbreak of the disease in modern history with a death toll far exceeding all past outbreaks combined. Ebola (named after the Ebola River in Zaire) first emerged in 1976 in Sudan and Zaire and infected over 284 people with a mortality rate of 53%.
The Ebola virus
This rare outbreak, underlined the challenge medical teams are facing in containing epidemics. A junior data scientist at the center for disease control (CDC) models the possible spread and containment of the Ebola virus using a numerical simulation. He knows that out of a population of k humans (the number of trials), x are carriers of the virus (success in statistical jargon). He believes the sample likelihood of the virus in the population, follows a Binomial distribution:
As the senior researcher in the team, you guide him that his parameter of interest is
Answer the following; for the likelihood function of the form
- Find the log-likelihood function $l_x(\gamma) = \ln L_x(\gamma)$.
- Find the gradient of $l_x(\gamma)$.
- Find the Hessian matrix $H(\gamma)$.
- Find the Fisher information $I(\gamma)$.
- In a population spanning $10,000$ individuals, $300$ were infected by Ebola. Find the MLE for $\gamma$ and the standard error associated with it.
Answer
Q. Let y be the number of successes in 5 independent trials, where the probability of success is θ in each trial. Suppose your prior distribution for θ is as follows:
- Derive the posterior distribution $p(\theta|y)$ after observing $y$.
- Derive the prior predictive distribution for $y$.
Answer
Q. Prove that the family of beta distributions is conjugate to a binomial likelihood, so that if a prior is in that family then so is the posterior. That is, show that:
For instance, for h heads and t tails, the posterior is:
Answer
Q. A recently published paper presents a new layer for a new Bayesian neural network (BNN). The layer behaves as follows. During the feed-forward operation, each of the hidden neurons
Likelihood in a BNN model
The chance of firing, γ, is the same for each hidden neuron. Using the formal definition, calculate the likelihood function of each of the following cases:
- The hidden neuron is distributed according to an $X \sim \text{Binomial}(n, \gamma)$ random variable and fires with a probability of $\gamma$. There are 100 neurons and only 20 are fired.
- The hidden neuron is distributed according to an $X \sim \text{Uniform}(0, \gamma)$ random variable and fires with a probability of $\gamma$.
Answer
Q. Your colleague, a veteran of the Deep Learning industry, comes up with an idea for a BNN layer entitled OnOffLayer. He suggests that each neuron will stay on (the other state is off) following the distribution
OnOffLayer in a BNN model
Answer
Q. A Dropout layer(Fig. 3.12) is commonly used to regularize a neural network model by randomly equating several outputs (the crossed-out hidden node H) to 0.
A Dropout layer (simplified form)
import torch
import torch.nn as nn
nn.Dropout(0.2)
Where nn.Dropout(0.2) (Line #3 in 3.1) indicates that the probability of zeroing an element is 0.2.
A Bayesian Neural Network Model
A new data scientist in your team suggests the following procedure for a Dropout layer which is based on Bayesian principles. Each of the neurons
During the training of a neural network, the Dropout layer randomly drops out outputs of the previous layer, as indicated in (Fig. 3.12). Here, for illustration purposes, all two neurons are dropped as depicted by the crossed-out hidden nodes
You are interested in the proportion θ of dropped-out neurons. Assume that the chance of drop-out,
Answer
Q. A new data scientist in your team, who was formerly a Quantum Physicist, suggests the following procedure for a Dropout layer entitled Quantum Drop which is based on Quantum principles and the Maxwell Boltzmann distribution. In the Maxwell-Boltzmann distribution, the likelihood of finding a particle with a particular velocity v is provided by:
The Maxwell-Boltzmann distribution
In the suggested QuantumDrop layer (3.15), each of the neurons behaves like a molecule and is distributed according to the Maxwell-Boltzmann distribution and fires only when the most probable speed is reached. This speed is the velocity associated with the highest point in the Maxwell distribution (3.14). Using calculus, brain power and some mathematical manipulation, find the most likely value (speed) at which the neuron will fire.
A QuantumDrop layer
Answer
Q. List down important continuous probability distributions?
Answer
- Uniform distribution
- Exponential distribution
- Normal distribution
Q. Under what conditions can we say that a random variable is drawn from a uniform distribution?
Answer
A random variable can be said to be drawn from a uniform distribution if it exhibits equal likelihood across its range (for continuous) or set of values (for discrete), with a constant probability density (for continuous) or equal probability (for discrete).
Q. Define the following in context of uniform probability distribution?
- Support
- PDF Expression
- Expectations
- Variance
Answer
- Support: $x \in [\alpha, \beta]$
- PDF expression
$$ f(x) = \frac{1}{\beta - \alpha}, \quad x \in [\alpha, \beta] $$
- Expectations
$$ E[X] = \frac{1}{2}(\alpha + \beta) $$
- Variance
$$ Var(X) = \frac{1}{12}(\beta - \alpha)^2 $$
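The closed forms for the mean and variance can be sanity-checked by numerically integrating the density $f(x) = 1/(\beta - \alpha)$. The endpoints below are assumed example values:

```python
# Numeric check of Uniform(a, b): E[X] = (a + b)/2 and Var(X) = (b - a)^2 / 12,
# integrating the constant density with the midpoint rule (a, b are examples).
a, b = 2.0, 10.0
n = 100_000
dx = (b - a) / n
density = 1.0 / (b - a)

mean = second = 0.0
for i in range(n):
    x = a + (i + 0.5) * dx           # midpoint of the i-th subinterval
    mean += x * density * dx          # contributes to E[X]
    second += x * x * density * dx    # contributes to E[X^2]
var = second - mean**2

print(mean, var)  # ≈ 6.0 and ≈ 64/12 ≈ 5.333
```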
Q. Given a uniform random variable X in the range of [0,1] inclusively. What’s the probability that X=0.5 ?
Answer
Zero. For a continuous random variable, such as a uniform random variable on $[0, 1]$, the probability of taking any single exact value is zero:
$$ P(X = x) = 0 $$
Q. In the context of the exponential distribution, define the following terms:
- Notation
- Parameters
- Support
- Pdf expression
- Cdf expression
- Expectation
- Variance
Answer
Notation
$$ X \sim \text{Exp}(\lambda) $$
Parameters
$\lambda > 0$ is the rate parameter.
Support
$x \in [0, \infty)$
PDF Expression
$$ f(x) = \lambda e^{-\lambda x} $$
CDF Equation
$$ F(x) = 1 - e^{-\lambda x} $$
Expectation
$$ E[X] = \frac{1}{\lambda} $$
Variance
$$ Var[X] = \frac{1}{\lambda^2} $$
Q. Based on historical data from the USGS, earthquakes of magnitude 8.0+ happen in a certain location at a rate of 0.002 per year. Earthquakes are known to occur via a poisson process. What is the probability of a major earthquake in the next 4 years?
Answer
Let $Y$ be the number of years until the next major earthquake. Since earthquakes occur via a Poisson process, the waiting time $Y \sim \text{Exp}(\lambda)$ with $\lambda = 0.002$ per year.
$$ P(Y < 4) = F_{Y}(4) = 1 - e^{-\lambda y} $$
$$ P(Y < 4) = 1 - e^{-0.002 \times 4} \approx 0.008 $$
So the probability of a major earthquake in the next 4 years is approximately $0.008$, i.e., about 0.8%.
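The calculation above can be reproduced in a couple of lines:

```python
import math

# Exponential waiting time: P(Y < t) = 1 - exp(-lambda * t)
# with lambda = 0.002 per year and t = 4 years, as in the answer above.
lam, t = 0.002, 4
p = 1 - math.exp(-lam * t)
print(round(p, 5))  # 0.00797, i.e. roughly 0.008
```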
Q. Can the values of PDF be greater than 1? If so, how do we interpret PDF?
Answer
Yes, the values of a Probability Density Function (PDF) can be greater than 1.
PDF Values Are Not Probabilities
For continuous random variables, the probability of the variable taking any exact value is zero. The value $f(x)$ of a PDF is a probability density, not a probability, so it is not required to be at most 1.
Probability from a PDF
To find the probability of a random variable falling within a certain range, we need to integrate the PDF over that range.
$$ P(a \leq X \leq b) = \int_{a}^{b} f(x) \, dx $$
Here, $f(x)$ is the PDF of $X$, and the integral gives the probability mass over $[a, b]$.
Example
Consider a PDF defined as $f(x) = 5$ on the interval $[0, 1/5]$ and $0$ elsewhere. Its total area is:
$$ \int_{0}^{1/5} 5 \, dx = 5 \times \frac{1}{5} = 1 $$
This is a valid PDF even though $f(x) = 5 > 1$ on its support.
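The integral in the example can be confirmed with a simple Riemann sum:

```python
# The example above: f(x) = 5 on [0, 1/5] is a valid PDF even though f(x) > 1,
# because the area under it is exactly 1 (checked with a Riemann sum).
n = 100_000
dx = (1 / 5) / n
area = sum(5 * dx for _ in range(n))
print(area)  # ≈ 1.0
```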
Q. What is the expression for the probability density function (PDF) of a normal distribution, and what are the expectation and variance of this distribution?
Answer
The probability density function (PDF) of a normal distribution with mean $\mu$ and variance $\sigma^2$ is:
$$ f_X(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2 \sigma^2}\right) $$
- Expectation (Mean): $\mu$
- Variance: $\sigma^2$
Q. If $X \sim N(\mu, \sigma^2)$ and $Y = aX + b$, what is the distribution of $Y$?
Answer
$$ Y \sim N(a\mu + b,\ a^2\sigma^2) $$
Q. What is the expression of CDF of normal distribution?
Answer
The cumulative distribution function (CDF) of a normal distribution gives the probability that a random variable $X$ takes a value less than or equal to $x$.
- CDF of Normal Distribution: $F_X(x) = \Phi\left(\frac{x - \mu}{\sigma}\right)$
- CDF of Standard Normal Distribution: $\Phi(z) = \frac{1}{2} \left[1 + \text{erf}\left(\frac{z}{\sqrt{2}}\right)\right]$
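The erf-based expression translates directly into code using `math.erf`; a minimal sketch:

```python
import math

# Normal CDF via the error function, matching the Phi expression above.
def normal_cdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(normal_cdf(0))     # 0.5, by symmetry of the normal distribution
print(normal_cdf(1.96))  # ≈ 0.975, the familiar two-sided 95% cutoff
```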
Q. What are the properties of standard normal distribution?
Answer
The standard normal distribution is a special case of the normal distribution with:
- $\mu = 0$
- $\sigma^2 = 1$
Q. You’re drawing from a random variable that is normally distributed,
Answer
Q. Under what conditions can we approximate the binomial distribution by a normal or Poisson distribution?
Answer
- Use the Poisson approximation when $n$ is large ($> 20$) and $p$ is small ($< 0.05$).
- Use the Normal approximation when $n$ is large ($> 20$) and $p$ is mid-ranged, so that $np(1-p) > 10$.
Q. It’s a common practice to assume an unknown variable to be of the normal distribution. Why is that?
Answer
Assuming that an unknown variable follows a normal distribution is based on theoretical results like the Central Limit Theorem, mathematical convenience, the ability to approximate other distributions, practical evidence from real-world data, and the robustness of statistical methods. This assumption often simplifies analysis and provides useful approximations and insights.
Q. Is it possible to transform non-normal variables into normal variables? How?
Answer
Yes it is possible to transform non-normal variables into normal variables.
We can try the following methods to accomplish that:
- Box-Cox transformation
- Log transformation: in case of right-skewed data
- Quantile Transformation
Q. Define the term prior distribution of a likelihood parameter
Answer
Q. In this question, you are going to derive the Fisher information function for several distributions. Given a probability density function (PDF)
- The natural logarithm of the PDF $\ln f(X|\gamma) = \Phi(X|\gamma)$.
- The first partial derivative $\Phi'(X|\gamma)$.
- The second partial derivative $\Phi''(X|\gamma)$.
- The Fisher information for a continuous random variable:
Find the Fisher information for:
- The Bernoulli distribution $X \sim B(1, \gamma)$.
- The Poisson distribution $X \sim \text{Poiss}(\theta)$.
Answer
Q.
- Define the term posterior distribution.
- Define the term prior predictive distribution.
Answer
Q. What do you mean by covariance?
Answer
Covariance is a quantitative measure of the extent to which the deviation of one variable from its mean matches the deviation of the other from its mean.
$$ \text{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] $$
Q. What is the co-variance of two independent random variables?
Answer
If two random variables $X$ and $Y$ are independent, their covariance is zero:
$$ \text{Cov}(X, Y) = E[XY] - E[X]E[Y] $$
Using the product rule of expectations for independent random variables, $E[XY] = E[X]E[Y]$, so
$$ \text{Cov}(X, Y) = E[X]E[Y] - E[X]E[Y] = 0 $$
Q. Are independence and zero covariance the same? Give a counterexample if not.
Answer
No
Independence and zero covariance are related but not the same. Independence implies zero covariance, but the reverse is not necessarily true. Zero covariance indicates that there is no linear relationship between the variables. However, it does not necessarily mean that the variables are independent. Non-linear relationships can still exist between variables even if their covariance is zero.
Let's illustrate this with a counterexample.
Consider:
- A random variable $X$ with $E[X] = 0$ and $E[X^3] = 0$, e.g. a normal random variable with zero mean.
- $Y = X^2$.

Now it is clear that $X$ and $Y$ are not independent: $Y$ is completely determined by $X$.
However, $Cov(X, Y) = E[XY] - E[X]E[Y] = E[X^3] - E[X]E[X^2] = 0 - 0 = 0$, so the covariance is zero even though the variables are dependent.
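This counterexample is easy to verify numerically. A minimal NumPy sketch (standard normal $X$, $Y = X^2$, fixed seed):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)  # E[X] = 0 and E[X^3] = 0
y = x ** 2                    # fully determined by x, hence dependent

# Sample covariance from the definition; should be near zero
cov = np.mean((x - x.mean()) * (y - y.mean()))
print(cov)  # close to 0 despite the deterministic dependence
```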
Q. Suppose we have a random variable X supported on
Answer
Q. Define variance and co-variance. What is the difference between them?
Answer
- Variance measures the spread of a single random variable, while covariance measures how two random variables vary together.
- Variance can only be non-negative, while covariance can be positive, negative, or zero.
Q. How does the sign of the covariance decide the direction of the relationship between two random variables?
Answer
The sign of the covariance indicates how the two random variables vary together:
- Positive covariance: both variables tend to vary in the same direction.
- Negative covariance: the variables tend to vary in opposite directions.
- Zero covariance: the variables have no linear relationship.
Q. Suppose you are conducting an experiment for studying behavior of two random variables
Answer
From the given information we cannot conclude anything about the strength of the relationship.
Covariance is scale-variant, meaning its value is sensitive to the scale of measurement. To assess the strength of the relationship we need the correlation, which normalizes the covariance and is scale-invariant.
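The scale sensitivity is easy to demonstrate. In this NumPy sketch (synthetic height/weight data, illustrative names), converting heights from meters to centimeters multiplies the covariance by 100 while leaving the correlation unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
height_m = rng.normal(1.7, 0.1, size=1000)                    # heights in meters
weight = 60 + 40 * (height_m - 1.7) + rng.normal(0, 2, size=1000)

height_cm = height_m * 100                                    # same data, new units

cov_m = np.cov(height_m, weight)[0, 1]
cov_cm = np.cov(height_cm, weight)[0, 1]                      # 100x larger
r_m = np.corrcoef(height_m, weight)[0, 1]
r_cm = np.corrcoef(height_cm, weight)[0, 1]                   # unchanged

print(cov_m, cov_cm)  # covariance depends on units
print(r_m, r_cm)      # correlation does not
```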
Q. Prove that
Answer
We know that:
Also
on further simplifications, we get:
Q. Prove that $Cov(X, Y) = E[XY] - E[X]E[Y]$.
Answer
From the definition of covariance:

$$ Cov(X, Y) = E[(X - E[X])(Y - E[Y])] $$

On simplification (expanding the product):

$$ Cov(X, Y) = E[XY - X E[Y] - Y E[X] + E[X]E[Y]] $$

After taking the expectation of each term (linearity of expectation):

$$ Cov(X, Y) = E[XY] - E[X]E[Y] - E[Y]E[X] + E[X]E[Y] $$

Here $E[X]$ and $E[Y]$ are constants, and the expectation of a constant is just the constant itself.
After canceling out terms:

$$ Cov(X, Y) = E[XY] - E[X]E[Y] $$
Q. What will be the value of $Cov(X, c)$, where $c$ is a constant?
- $Cov(X)$
- $cCov(X)$
- $c^2Cov(X)$
- $0$
Answer

$0$. A constant does not deviate from its own mean, so $Cov(X, c) = E[(X - E[X])(c - c)] = 0$.
Q. Write down some properties of Covariance.
Answer
Properties of covariance:
- $Cov(X, Y) = Cov(Y, X)$
- $Cov(X, X) = Var(X)$
- $Cov(X, c) = 0$
- $Cov(cX, Y) = c\,Cov(X, Y)$
- $Cov(X, Y+Z) = Cov(X, Y) + Cov(X, Z)$
- $Cov(X, Y) = 0$ if $X$ and $Y$ are independent.

Here $c$ is a constant and $X$, $Y$, $Z$ are random variables.
Q. Define correlation. How is it related to Covariance?
Answer
Correlation between two random variables $X$ and $Y$ is the covariance normalized by the standard deviations of the two variables. It measures both the strength and the direction of their linear relationship:
$$ \rho(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)Var(Y)}} $$
Q. What are the benefits of using correlation over covariance?
Answer
- Correlation is scale-independent, while covariance is scale-dependent and hence harder to interpret and compare.
- Correlation is bounded, i.e. it always lies between $-1$ and $1$.
- Covariance only measures the direction of the relationship between two variables, while correlation measures both the strength and the direction.
Q. If you are analyzing two random variables
Answer
A correlation of zero indicates only the absence of a linear relationship between the two variables; it does not rule out a non-linear relationship, so we cannot conclude that the variables are independent.
Q. Can the correlation be greater than 1? Why or why not? How to interpret a correlation value of 0.3?
Answer
No, the correlation $r$ cannot be greater than 1; its range is $[-1, 1]$.
Let's look at the expression for the correlation $r$ to establish this.
Suppose we have two variables $X$ and $Y$, observed as samples $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$, and we want to investigate the relationship between them.
Now let's define two centered vectors:

$$ \mathbf{x} = (X_1 - \bar{X}, \ldots, X_n - \bar{X}), \quad \mathbf{y} = (Y_1 - \bar{Y}, \ldots, Y_n - \bar{Y}) $$

We can now write the sample covariance in vector notation as:

$$ \text{Cov}(X, Y) = \frac{1}{n} \, \mathbf{x} \cdot \mathbf{y} $$

Similarly, the sample variances of $X$ and $Y$ can be expressed in vectorized form:

$$ \sigma_X^2 = \frac{1}{n} \|\mathbf{x}\|^2, \quad \sigma_Y^2 = \frac{1}{n} \|\mathbf{y}\|^2 $$

Now we can write the correlation expression using these vectors:

$$ r = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \|\mathbf{y}\|} $$

From the cosine rule (the geometric definition of the dot product), we get $r = \cos\theta$, where $\theta$ is the angle between $\mathbf{x}$ and $\mathbf{y}$.
Since $-1 \leq \cos\theta \leq 1$, it follows that $-1 \leq r \leq 1$.
For the given value $r = 0.3$:
- The relationship is positive but weak.
- An increase in one variable tends to be accompanied by an increase in the other.

Note that the above conclusions assume the relationship between the variables is linear.
Q. What is the Pearson correlation coefficient, and how is it calculated?
Answer
Correlation measures the degree to which there is a linear relationship between two variables.
Mathematical expression:
$$ \rho(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_{X} \sigma_{Y}} $$
Step-by-step calculation:
- Compute the means of $X$ and $Y$:
$$ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \quad \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i $$
- Calculate the covariance:
$$ \text{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) $$
- Calculate the standard deviations of $X$ and $Y$:
$$ \sigma_X = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}, \quad \sigma_Y = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2} $$
- Substitute into the Pearson Correlation Formula
$$ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} $$
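The steps above can be sketched directly in NumPy (synthetic data, fixed seed) and checked against the built-in `np.corrcoef`:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)

# Step-by-step Pearson r, following the formulas above
x_bar, y_bar = x.mean(), y.mean()                     # step 1: means
cov = np.mean((x - x_bar) * (y - y_bar))              # step 2: covariance
sigma_x = np.sqrt(np.mean((x - x_bar) ** 2))          # step 3: std devs
sigma_y = np.sqrt(np.mean((y - y_bar) ** 2))
r_manual = cov / (sigma_x * sigma_y)                  # step 4: substitute

# Compare against NumPy's built-in correlation matrix
r_np = np.corrcoef(x, y)[0, 1]
print(r_manual, r_np)  # the two agree
```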
Q. What is the significance of a correlation coefficient value of 0?
Answer
A correlation coefficient value of 0 suggests the absence of a linear relationship between the two random variables. A non-linear relationship between them may still exist.
Q. What is the difference between positive and negative correlation?
Answer
- Positive Correlation: An increase or decrease in one variable is associated with a similar change in the other variable (
$\rho > 0$ ); both variables move in the same direction. - Negative Correlation: An increase in one variable is associated with a decrease in the other, and vice versa (
$\rho < 0$ ); both variables move in opposite directions.
Q. What are some limitations of the Pearson correlation coefficient?
Answer
- It only measures the strength of a linear relationship and cannot capture non-linear relationships.
- It cannot distinguish between independent and dependent variables; Pearson's $r$ does not indicate which variable is the cause and which is the effect.
- It cannot handle categorical data.
- A high or low Pearson correlation does not imply causation.
- It is sensitive to outliers.
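The outlier sensitivity is worth seeing concretely. In this NumPy sketch (constructed data, illustrative values), a single extreme point flips a perfect positive correlation to a negative one:

```python
import numpy as np

# A clean, perfectly linear relationship
x = np.arange(20, dtype=float)
y = x.copy()

r_clean = np.corrcoef(x, y)[0, 1]  # essentially 1.0

# One extreme outlier dominates the calculation
x_out = np.append(x, 100.0)
y_out = np.append(y, -100.0)
r_outlier = np.corrcoef(x_out, y_out)[0, 1]

print(r_clean)    # ~1.0
print(r_outlier)  # negative: the single outlier flips the sign
```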
Q. Can you have a high correlation without causation?
Answer
Yes, we can have a high correlation without causation. This phenomenon is often summarized by the phrase correlation does not imply causation. A strong correlation between two variables simply indicates that they move together in a predictable way, but it does not necessarily mean that one variable causes the other to change.
We might have high correlation but no causality in the following scenarios:
- Presence of confounding/latent variables: a third variable influences both variables, creating the correlation.
- Coincidence: the variables are related just by chance.