6. Special Random Variables and Repeated Sampling Distributions
A random variable $X$ has a discrete uniform distribution, and is referred to as a discrete uniform random variable, if and only if its probability function is given by $f(x) = \frac{1}{k}$ for $x = x_1, x_2, \dots, x_k$, where the $x_i$ are distinct values.
Properties: Assuming that $x_i = i$ for $i = 1, \dots, k$, then $E[X] = \frac{k+1}{2}$ and $\text{Var}[X] = \frac{k^2 - 1}{12}$.
{.example} Example: Let $X$ be the random variable that represents the number of dots when one rolls a die. Then $X$ follows a discrete uniform distribution taking values $1, 2, \dots, 6$. Its probability function is given by $f(x) = \frac{1}{6}$ for $x = 1, \dots, 6$. Expected value: $E[X] = \frac{7}{2}$. Variance: $\text{Var}[X] = \frac{35}{12}$. Moment generating function: $M_X(t) = \frac{1}{6}\sum_{x=1}^{6} e^{tx}$.
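As a quick numerical check of the die example, the sketch below recomputes the mean and variance of a discrete uniform variable on $\{1, \dots, 6\}$ with exact fractions:

```python
from fractions import Fraction

# Discrete uniform on {1, ..., 6}: each value has probability 1/6
values = range(1, 7)
p = Fraction(1, 6)

mean = sum(x * p for x in values)               # E[X] = (1+6)/2 = 7/2
var = sum((x - mean) ** 2 * p for x in values)  # Var[X] = (6^2 - 1)/12 = 35/12
```

Using `Fraction` avoids floating-point error, so the results match the closed-form values exactly.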
1 Bernoulli random variable
The Bernoulli random variable takes the value $1$ with probability $p$ and the value $0$ with probability $1 - p$, where $0 \le p \le 1$; that is, the probability function is given by $f(x) = p^x (1-p)^{1-x}$ for $x \in \{0, 1\}$.
Properties: $E[X] = p$, $\text{Var}[X] = p(1-p)$, $M_X(t) = 1 - p + pe^t$.
Remark: This random variable is used when the result of the experiment is a success or a failure.
2 Binomial random variable
The Binomial random variable is defined as the number of successes in $n$ independent trials, each of which has probability of success $p$. One can show that its probability function is given by $f(x) = \binom{n}{x} p^x (1-p)^{n-x}$, $x = 0, 1, \dots, n$,
where $\binom{n}{x} = \frac{n!}{x!(n-x)!}$ is the number of combinations of $x$ elements from a set with $n$ elements.
Remark:
The parameters of the random variable are $n$ and $p$.
If $X$ is a Binomial random variable with parameters $n$ and $p$ we write $X \sim B(n, p)$.
In the case of the Bernoulli random variable we write $X \sim B(1, p)$.
Properties: $E[X] = np$, $\text{Var}[X] = np(1-p)$, $M_X(t) = (1 - p + pe^t)^n$.
If $X_i \sim B(1, p)$ and the $X_i$ are independent random variables, then $X = \sum_{i=1}^{n} X_i \sim B(n, p)$; that is, the sum of $n$ independent Bernoulli random variables with parameter $p$ is a Binomial random variable with parameters $n$ and $p$.
If $X \sim B(n, p)$ and $Y \sim B(m, p)$, and $X$ and $Y$ are independent, then $X + Y \sim B(n + m, p)$.
Example: In a given factory, a proportion $p$ of the produced products has a failure. Let $X$ be the random variable that represents the number of products with a failure in a sample of 5 products. Then $X \sim B(5, p)$. Expected value: $E[X] = 5p$. Variance: $\text{Var}[X] = 5p(1-p)$. Probability: $P(X = x) = \binom{5}{x} p^x (1-p)^{5-x}$.
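Since the numeric failure rate of the example is not fixed here, the sketch below uses an assumed rate of $p = 0.1$ purely for illustration and evaluates the binomial quantities directly:

```python
import math

def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1-p)^(n-k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 0.1            # p = 0.1 is an assumed, illustrative failure rate
mean = n * p             # E[X] = np
var = n * p * (1 - p)    # Var[X] = np(1-p)
p_no_failures = binom_pmf(0, n, p)
```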
Example: Let be a random variable that represents the lifetime in hours of a bulb lamp with density function
Compute the probability that 3 bulb lamps in a sample of 5 have a lifetime smaller than 100 hours.
Solution: We can start by computing the probability $p$ that one bulb lamp has a lifetime smaller than 100 hours. Let $X$ be the random variable that represents the number of bulb lamps in a sample of 5 with a lifetime smaller than 100 hours; then $X \sim B(5, p)$. The required probability is $P(X = 3) = \binom{5}{3} p^3 (1-p)^2$.
Example: Suppose that in a group of 1000 computers 50 have a problem in the hardware system. We pick randomly a sample of 100 computers. Let be the random variable that counts the number of computers with hardware problems.
What is the distribution of if the experiment is done with replacement?
Answer: $X$ counts the number of successes in a sample of 100 computers, and the selection is made with replacement, so the trials are independent with success probability $p = 50/1000 = 0.05$; hence $X \sim B(100, 0.05)$.
What is the distribution of if the experiment is done without replacement?
Answer: $X$ counts the number of successes in a sample of 100 computers, but the selection is made without replacement, so the trials are not independent. This means that $X$ does not follow a binomial distribution. Indeed, $X$ follows a hypergeometric distribution.
Consider a finite population of size $N$ that contains exactly $K$ objects with a specific feature. The hyper-geometric distribution is a discrete probability distribution that describes the probability of $k$ successes in $n$ draws (to get $k$ objects with the referred feature), without replacement: $P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$. Properties: If $X \sim H(N, K, n)$, then $E[X] = n\frac{K}{N}$ and $\text{Var}[X] = n\frac{K}{N}\left(1 - \frac{K}{N}\right)\frac{N-n}{N-1}$.
There is no closed-form solution for the moment generating function of the hypergeometric distribution.
Example: Assume that there are 20 balls in a box, where 2 are green, 8 are blue, 5 are red and 5 are yellow. Someone chooses, randomly and without replacement, 3 balls from the box. Compute the probability that exactly 1 of them is blue.
Solution: Let $X$ be the random variable that counts the number of blue balls in a set of 3 when the experiment is made without replacement. Then $X \sim H(20, 8, 3)$ and its probability function is $P(X = k) = \frac{\binom{8}{k}\binom{12}{3-k}}{\binom{20}{3}}$. The required probability is $P(X = 1) = \frac{\binom{8}{1}\binom{12}{2}}{\binom{20}{3}} = \frac{528}{1140} \approx 0.463$.
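The solution above can be verified numerically with the hypergeometric probability function (all numbers come from the example: $N = 20$ balls, $K = 8$ blue, $n = 3$ draws):

```python
import math

def hypergeom_pmf(k, N, K, n):
    # P(X = k) = C(K, k) * C(N-K, n-k) / C(N, n)
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

p_one_blue = hypergeom_pmf(1, N=20, K=8, n=3)  # exactly one blue ball
```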
- In connection with repeated Bernoulli trials, we are sometimes interested in the number of the trial on which the first success occurs.
Assume that we observe a sequence of independent Bernoulli trials.
Each trial has two potential outcomes, called "success" and "failure". In each trial the probability of success is $p$ and the probability of failure is $1 - p$.
We observe this sequence until the first success has occurred.
If $X$ is the random variable that counts the number of trials until the first success, and the success occurs on the $x$-th trial (the first $x - 1$ trials are failures), then $P(X = x) = (1-p)^{x-1} p$ for $x = 1, 2, \dots$, and $X$ follows a geometric distribution with probability of success $p$.
Useful result: $P(X > k) = (1-p)^k$ for $k = 0, 1, 2, \dots$
Memoryless property: $P(X > m + n \mid X > m) = P(X > n)$ for all $m, n \in \{0, 1, 2, \dots\}$.
Remark: A random variable $X$ taking values in $\{1, 2, \dots\}$ satisfies the memoryless property if and only if $X$ has a geometric distribution.
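The memoryless property is easy to confirm numerically from the tail formula $P(X > k) = (1-p)^k$; the value of $p$ below is illustrative:

```python
p = 0.3  # illustrative success probability

def geom_tail(k):
    # P(X > k) = (1-p)^k for the geometric distribution on {1, 2, ...}
    return (1 - p) ** k

m, n = 4, 6
lhs = geom_tail(m + n) / geom_tail(m)  # P(X > m+n | X > m)
rhs = geom_tail(n)                     # P(X > n)
```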
Assume now that we are observing a sequence of Bernoulli trials until a predefined number of successes has occurred.
If the $k$-th success is to occur on the $x$-th trial, there must be $k - 1$ successes in the first $x - 1$ trials, and the probability for this is $P(X = x) = \binom{x-1}{k-1} p^k (1-p)^{x-k}$, $x = k, k+1, \dots$, where $X$ follows a negative binomial distribution with parameters $k$ and $p$.
Properties: $E[X] = \frac{k}{p}$, $\text{Var}[X] = \frac{k(1-p)}{p^2}$.
The Poisson random variable is a discrete rv that describes the number of occurrences within a randomly chosen unit of time or space. For example, within a minute, hour, day, kilometer.
The Poisson probability function is a discrete function defined for non-negative integers. If $X$ is a Poisson random variable with parameter $\lambda$ we write $X \sim P(\lambda)$. The Poisson distribution with parameter $\lambda > 0$ is defined by $P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$, $x = 0, 1, 2, \dots$ Properties: $E[X] = \lambda$, $\text{Var}[X] = \lambda$, $M_X(t) = e^{\lambda(e^t - 1)}$.
If $X_i \sim P(\lambda_i)$, $i = 1, \dots, n$, and the $X_i$ are independent random variables, then $\sum_{i=1}^{n} X_i \sim P\left(\sum_{i=1}^{n} \lambda_i\right)$.
Example: Assume that in one day the number of people that take bus n1 in a small city follows a Poisson distribution with .
Question: What is the probability that in a random day 5 people take bus n1?
Solution: Firstly, we should notice that $X \sim P(\lambda)$, where $\lambda$ is the given daily rate. Now we have to compute the probability $P(X = 5) = \frac{e^{-\lambda}\lambda^5}{5!}$.
Question: What is the probability that 5 people take bus n1 in two days?
Solution: Let $Y$ be the rv that represents the number of people that take bus n1 in two days. By the additivity property of independent Poisson counts, $Y \sim P(2\lambda)$. Now we have to compute the probability $P(Y = 5) = \frac{e^{-2\lambda}(2\lambda)^5}{5!}$.
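Both bus questions reduce to evaluating the Poisson probability function, once with the one-day rate $\lambda$ and once with the two-day rate $2\lambda$; the rate below is an assumed, illustrative value:

```python
import math

def poisson_pmf(k, lam):
    # P(X = k) = e^{-lam} * lam^k / k!
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 4.0                              # assumed daily rate (illustrative)
p5_one_day = poisson_pmf(5, lam)       # P(X = 5) in one day
p5_two_days = poisson_pmf(5, 2 * lam)  # two-day count is Poisson(2*lam)
```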
The probability density function and the cumulative distribution function of an exponential random variable with parameter $\lambda > 0$ are, respectively, $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ ($f(x) = 0$ otherwise) and $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$ ($F(x) = 0$ otherwise).
Remark: If $X$ is an exponential random variable with parameter $\lambda$ we write $X \sim \text{Exp}(\lambda)$.
Properties: Let $X$ be an exponential random variable with parameter $\lambda$. Then $E[X] = \frac{1}{\lambda}$ and $\text{Var}[X] = \frac{1}{\lambda^2}$.
Moment generating function: $M_X(t) = \frac{\lambda}{\lambda - t}$ for $t < \lambda$.
Lack of memory: $P(X > s + t \mid X > s) = P(X > t)$ for any $s \ge 0$ and $t \ge 0$.
Let $X_1, \dots, X_n$ be independent random variables with $X_i \sim \text{Exp}(\lambda_i)$; then $\min(X_1, \dots, X_n) \sim \text{Exp}(\lambda_1 + \cdots + \lambda_n)$.
Example: Let $X$ be a random variable that represents the lifetime of an electronic component in years. It is known that $X$ follows an exponential distribution with parameter $\lambda$. Question: Knowing the value of a tail probability of $X$, what is the value of $\lambda$?
Solution: Taking into account that $P(X > x) = e^{-\lambda x}$, the value of $\lambda$ follows by solving this equation for the given probability.
Question: What is the probability that the lifetime of the electronic component is greater than a given number of years, knowing that it is greater than one year?
Solution: Given the memoryless property, we have that $P(X > 1 + t \mid X > 1) = P(X > t) = e^{-\lambda t}$.
Question: Assume that one has 3 similar electronic components that are independent. What is the probability that the lowest lifetime of these electronic components is lower than 2 years?
Solution: Since we have 3 independent and identical components, then we must have 3 random variables $X_1, X_2, X_3$, representing respectively the lifetimes of the electronic components, with $X_i \sim \text{Exp}(\lambda)$. The lowest lifetime is the random variable $Y = \min(X_1, X_2, X_3)$. According to the properties above, $Y \sim \text{Exp}(3\lambda)$. Therefore, $P(Y < 2) = 1 - e^{-6\lambda}$.
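The identity used above, that the minimum of independent exponentials is again exponential with the summed rate, can be checked directly from the CDF; the rate $\lambda$ below is an assumed value:

```python
import math

lam = 0.5  # assumed rate per component (illustrative)

def exp_cdf(x, rate):
    # F(x) = 1 - e^{-rate * x} for x >= 0
    return 1 - math.exp(-rate * x)

# P(min(X1, X2, X3) <= 2): via "all three exceed 2" vs. via Exp(3*lam)
p_direct = 1 - (1 - exp_cdf(2, lam)) ** 3
p_min = exp_cdf(2, 3 * lam)
```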
Poisson Process:
$N(t)$ represents the number of occurrences in the interval $(0, t]$, where $t > 0$. The collection of random variables $\{N(t), t \ge 0\}$ is called a Poisson process with intensity $\lambda > 0$ if
the numbers of occurrences in disjoint intervals are independent random variables;
the numbers of occurrences in intervals of the same size are random variables with the same distribution;
$N(t) \sim P(\lambda t)$ for every $t > 0$.
Relationship between the Poisson and Exponential distributions: Let
$N(t)$ be the number of occurrences in the interval $(0, t]$, where $t > 0$;
$T_i$ be the time spent between the two consecutive occurrences $i - 1$ and $i$ of the event.
If the collection of random variables $\{N(t), t \ge 0\}$ is a Poisson process with intensity $\lambda$, then $T_i \sim \text{Exp}(\lambda)$.
Example: Assume that $N(t)$ represents the number of clients that go to a store in $t$ hours. The average number of clients in two hours is 5, and $\{N(t), t \ge 0\}$ follows a Poisson process.
Compute the probability that in 1 hour at least 2 clients go to the store.
Answer: Firstly, we can notice that $E[N(2)] = 2\lambda = 5$, meaning that $\lambda = 2.5$ clients per hour. Additionally, $N(1) \sim P(2.5)$. The requested probability is $P(N(1) \ge 2) = 1 - P(N(1) = 0) - P(N(1) = 1) = 1 - e^{-2.5}(1 + 2.5) \approx 0.713$.
Compute the probability that 5 clients go to the store in 1 hour and a half knowing that no clients went there in the first 30 minutes.
Answer: By the independence of the numbers of occurrences in disjoint intervals, the requested probability is $P(N(1.5) - N(0.5) = 5 \mid N(0.5) = 0) = P(N(1) = 5) = \frac{e^{-2.5}\, 2.5^5}{5!} \approx 0.067$.
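Both store probabilities follow from the Poisson probability function with intensity $\lambda = 2.5$ clients per hour, derived from the stated average of 5 clients per two hours:

```python
import math

lam = 5 / 2  # 5 clients on average per 2 hours => 2.5 clients per hour

def poisson_pmf(k, mu):
    # P(N = k) = e^{-mu} * mu^k / k!
    return math.exp(-mu) * mu**k / math.factorial(k)

# P(N(1) >= 2): at least two clients in one hour
p_at_least_two = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)

# 5 clients in 1.5 h given none in the first 0.5 h:
# by independent increments this is P(N(1) = 5)
p_five_given_none = poisson_pmf(5, lam)
```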
Example:
Compute the probability that the first client arrives more than 45 minutes after the opening hour.
Let $T_1$ be the rv that represents the time, in hours, until the first client arrives. Then $T_1 \sim \text{Exp}(2.5)$ and $P(T_1 > 0.75) = e^{-2.5 \times 0.75} = e^{-1.875} \approx 0.153$.
Question: How can we model the time between two, three, or more occurrences in a Poisson process?
Gamma distribution: The gamma cumulative distribution function is defined for $x > 0$, $\alpha > 0$, $\beta > 0$, by the integral $F(x) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)} \int_0^x t^{\alpha - 1} e^{-t/\beta}\, dt$, where $\Gamma(\alpha) = \int_0^{\infty} t^{\alpha - 1} e^{-t}\, dt$ is the Gamma function. The parameters $\alpha$ and $\beta$ are called the shape parameter and scale parameter, respectively.
The probability density function for the gamma distribution is $f(x) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)} x^{\alpha - 1} e^{-x/\beta}$ for $x > 0$.
Remarks:
If $X$ is a gamma random variable with parameters $\alpha$ and $\beta$ we write $X \sim \text{Gamma}(\alpha, \beta)$;
if $\alpha = 1$ and $\beta = 1/\lambda$, the Gamma distribution reduces to the exponential distribution with parameter $\lambda$.
Important case: When $\alpha = n/2$ and $\beta = 2$ we have the chi-squared distribution, which has the notation $X \sim \chi^2_{(n)}$; $n$ is known as the number of degrees of freedom.
Relationship between the Poisson and Gamma distributions: Let
$N(t)$ be the number of occurrences in the interval $(0, t]$, where $t > 0$;
$T_i$ be the time spent between the two consecutive occurrences $i - 1$ and $i$;
$S_k = T_1 + \cdots + T_k$ be the time spent until occurrence $k$ of the event.
If the collection of random variables $\{N(t), t \ge 0\}$ is a Poisson process with intensity $\lambda$, then $S_k \sim \text{Gamma}(k, 1/\lambda)$.
Properties: Let $X$ be a Gamma random variable with parameters $\alpha$ and $\beta$. Then $E[X] = \alpha\beta$ and $\text{Var}[X] = \alpha\beta^2$.
The moment generating function of the Gamma distribution is given by $M_X(t) = (1 - \beta t)^{-\alpha}$ for $t < 1/\beta$.
Let $X_1, X_2, \dots, X_n$ be independent random variables with Gamma distribution, $X_i \sim \text{Gamma}(\alpha_i, \beta)$; then $\sum_{i=1}^{n} X_i \sim \text{Gamma}\left(\sum_{i=1}^{n} \alpha_i, \beta\right)$.
If $X \sim \text{Gamma}(\alpha, \beta)$ and $c > 0$, then $cX \sim \text{Gamma}(\alpha, c\beta)$.
In the case of the chi-squared random variables we have: if $X \sim \chi^2_{(n)}$, then $E[X] = n$ and $\text{Var}[X] = 2n$.
Let $X_1, X_2, \dots, X_k$ be independent random variables with chi-squared distribution, $X_i \sim \chi^2_{(n_i)}$; then $\sum_{i=1}^{k} X_i \sim \chi^2_{(n_1 + \cdots + n_k)}$.
Exercise 13: Compute the following probabilities:
If is distributed find
If is distributed find
If is find
Exercise 14: Using the moment generating function, show that if and , then .
The probability density function of the uniform random variable on an interval $[a, b]$, where $a < b$, is the function $f(x) = \frac{1}{b - a}$ for $a \le x \le b$ ($f(x) = 0$ otherwise).
The cumulative distribution function is the function $F(x) = 0$ for $x < a$, $F(x) = \frac{x - a}{b - a}$ for $a \le x \le b$, and $F(x) = 1$ for $x > b$.
Remark: If $X$ is a uniform random variable on the interval $[a, b]$ we write $X \sim U(a, b)$.
The moment generating function is $M_X(t) = \frac{e^{tb} - e^{ta}}{t(b - a)}$ for $t \neq 0$, with $M_X(0) = 1$. (The moment-generating function is not differentiable at zero, but the moments can be calculated by differentiating and then taking the limit $t \to 0$.)
Moments about the origin: $E[X^n] = \frac{b^{n+1} - a^{n+1}}{(n+1)(b - a)}$. In particular, $E[X] = \frac{a + b}{2}$ and $\text{Var}[X] = \frac{(b - a)^2}{12}$.
Example: Let be a continuous uniform random variable in the interval . Compute the following probabilities:
Questions: Compute the following probabilities:
Inverse transform sampling: Important result in simulation.
This result shows us that, under certain conditions, if $U \sim U(0, 1)$ and $F$ is a continuous, strictly increasing cumulative distribution function, then $X = F^{-1}(U)$ has cumulative distribution function $F$; conversely, if $X$ has continuous CDF $F$, then $F(X) \sim U(0, 1)$.
Example: Assume that $X$ follows an exponential distribution with parameter 1. Find the distribution of $Y = F(X) = 1 - e^{-X}$.
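A minimal sketch of inverse transform sampling: inverting the exponential CDF $F(x) = 1 - e^{-x}$ maps uniform draws to exponential draws, which we can check through the sample mean (rate 1, so the mean should be near 1):

```python
import math
import random

random.seed(0)

def exp_inverse_transform(u, lam=1.0):
    # F^{-1}(u) = -ln(1 - u) / lam inverts F(x) = 1 - e^{-lam * x}
    return -math.log(1 - u) / lam

samples = [exp_inverse_transform(random.random()) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)  # should be close to 1/lam = 1
```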
The most famous continuous distribution is the normal distribution (introduced by Abraham de Moivre, 1667-1754). The normal probability density function is given by $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, $x \in \mathbb{R}$.
The cumulative distribution function does not have a closed-form solution: $F(x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(t - \mu)^2}{2\sigma^2}}\, dt$.
When a random variable follows a normal distribution with parameters $\mu$ and $\sigma^2$ we write $X \sim N(\mu, \sigma^2)$.
Properties: $E[X] = \mu$, $\text{Var}[X] = \sigma^2$.
Moment generating function: $M_X(t) = e^{\mu t + \sigma^2 t^2 / 2}$.
There is no closed-form solution to the CDF of a normal distribution, which means that one has to use adequate software to compute the probabilities. Alternatively, one may use the tables with probabilities for the normal distribution with mean equal to $0$ and variance equal to $1$. To use this strategy one has to notice that if $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.
When $\mu = 0$ and $\sigma^2 = 1$ the distribution is called the standard normal distribution.
The probability density function of the standard normal distribution is denoted $\phi$ and is given by $\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$. The standard normal cumulative distribution function is denoted $\Phi$.
Properties of the standard normal cumulative distribution function:
$\Phi(-z) = 1 - \Phi(z)$ for $z \in \mathbb{R}$.
Examples: Assume that the weight of a certain population is modeled by a normal distribution with mean 50 kg and standard deviation 5 kg.
Question: What is the probability that someone weighs more than 65kg?
Solution: Let $X$ be the random variable that represents the weight of a person in the given population, so that $X \sim N(50, 5^2)$. The required probability is $P(X > 65) = P\left(Z > \frac{65 - 50}{5}\right) = 1 - \Phi(3) \approx 0.0013$, where $Z = \frac{X - 50}{5} \sim N(0, 1)$.
Question: What is the weight that is exceeded by 80% of the population?
Solution: We want to find the level $c$ such that $P(X > c) = 0.8$. Now, taking into account the shape of the normal density function we know that the threshold $c < 50$. Therefore, noticing that $P(X > c) = 1 - \Phi\left(\frac{c - 50}{5}\right) = \Phi\left(\frac{50 - c}{5}\right) = 0.8$, one may easily check at the tables that $\frac{50 - c}{5} \approx 0.84$, i.e., $c \approx 45.8$ kg.
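Both weight questions can be reproduced with the standard library's `statistics.NormalDist`, with no need for tables:

```python
from statistics import NormalDist

W = NormalDist(mu=50, sigma=5)  # weight in kg

p_over_65 = 1 - W.cdf(65)  # P(W > 65) = 1 - Phi(3), about 0.0013
c = W.inv_cdf(0.20)        # level exceeded with probability 0.80
```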
Exercise 20: A baker knows that the daily demand for a specific type of bread is a random variable such that . Find the demand which has probability of being exceeded.
Theorem: (Linear combinations of Normal random variables): Let $X$ and $Y$ be two independent random variables such that $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$. Let $W = aX + bY + c$; then $W \sim N(\mu_W, \sigma_W^2)$, where $\mu_W = a\mu_X + b\mu_Y + c$ and $\sigma_W^2 = a^2\sigma_X^2 + b^2\sigma_Y^2$.
Remarks:
A special case is obtained when $a = b = 1$ and $c = 0$: if $X$ and $Y$ are independent, then $X + Y \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$;
if $a = 1$, $b = -1$ and $c = 0$: $X - Y \sim N(\mu_X - \mu_Y, \sigma_X^2 + \sigma_Y^2)$.
Example: Let and be two independent random variables such that
Question: Compute the following probability
Solution: Firstly, we notice that the combination in question is again normal, by the theorem above. Therefore, the probability can be computed by standardizing, where the mean and variance follow from the theorem.
Theorem: If the random variables $X_1, \dots, X_n$ have a normal distribution and are independent, then the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ also has a normal distribution.
Assuming that $E[X_i] = \mu$ and $\text{Var}[X_i] = \sigma^2$ for $i = 1, \dots, n$, we have $E[\bar{X}] = \mu$ and $\text{Var}[\bar{X}] = \frac{\sigma^2}{n}$.
Thus $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
If we standardize $\bar{X}$ we have $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
We have seen the following result:
If $X_i \sim N(\mu, \sigma^2)$, $i = 1, \dots, n$, are independent, then the following holds true: $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
However, what happens if the $X_i$ are not normally distributed?
The answer is given by the Central Limit Theorem:
Theorem: (The Central Limit Theorem - Lindeberg-Lévy)
Assume that $X_1, X_2, \dots$ are independent and identically distributed, with $E[X_i] = \mu$ and $\text{Var}[X_i] = \sigma^2 < \infty$. Then the distribution of $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ converges to a standard normal distribution as $n$ tends to infinity.
We write $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \overset{a}{\sim} N(0, 1)$, where the symbol $\overset{a}{\sim}$ reads "distributed asymptotically".
Remarks:
This means that if the sample size $n$ is large enough (a common rule of thumb is $n \ge 30$), then the distribution of $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is close to the standard normal.
The previous result is useful when the $X_i$, $i = 1, \dots, n$, do not follow a normal distribution (when they do, we know the exact distribution of $\bar{X}$).
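A small simulation illustrates the theorem: standardized means of $n = 50$ uniform draws should fall in $(-1.96, 1.96)$ about 95% of the time, just as for a standard normal (the sample sizes below are illustrative):

```python
import random

random.seed(1)
n, reps = 50, 20_000
mu, var = 0.5, 1 / 12          # mean and variance of U(0, 1)
sd_mean = (var / n) ** 0.5     # standard deviation of the sample mean

inside = 0
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    z = (xbar - mu) / sd_mean  # standardized sample mean
    if abs(z) <= 1.96:
        inside += 1

coverage = inside / reps       # should be close to 0.95
```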
Assume that $X$ represents the profit of a store, in thousands of euros, on a random day, with a given density function. Compute the probability that the store has a profit greater than 29 thousand euros in a month (30 days).
Solution: We start by computing $\mu = E[X]$ and $\sigma^2 = \text{Var}[X]$ from the given density. Assume that $X_i$ represents the profit of the store, in thousands of euros, on day $i$; then we want to compute $P\left(\sum_{i=1}^{30} X_i > 29\right)$. By using the central limit theorem we know that $\frac{\sum_{i=1}^{30} X_i - 30\mu}{\sigma\sqrt{30}} \overset{a}{\sim} N(0, 1)$. Therefore, $P\left(\sum_{i=1}^{30} X_i > 29\right) \approx 1 - \Phi\left(\frac{29 - 30\mu}{\sigma\sqrt{30}}\right)$.
A special case of the Central Limit Theorem of Lindeberg-Lévy is the Central Limit Theorem of De Moivre-Laplace, which corresponds to the case in which each $X_i$ is Bernoulli with parameter $p$.
Theorem: (The Central Limit Theorem - De Moivre-Laplace) If the $X_i$ are independent Bernoulli random variables with parameter $p$, then $\frac{\sum_{i=1}^{n} X_i - np}{\sqrt{np(1-p)}}$ converges to a standard normal distribution as $n$ tends to infinity. We write $\sum_{i=1}^{n} X_i \overset{a}{\sim} N(np, np(1-p))$. {.example}
Example: Assume that a person is infected with a virus with probability 0.05. If we analyze 100 people, what is the probability that at least 7 are infected?
Solution: Let $X_i$ be a random variable defined by $X_i = 1$ if person $i$ is infected and $X_i = 0$ otherwise, with $p = 0.05$. Therefore, $X_i$ is a Bernoulli random variable. Assuming independence between the rvs, we have that $X = \sum_{i=1}^{100} X_i \overset{a}{\sim} N(5, 4.75)$. Therefore, $P(X \ge 7) \approx 1 - \Phi\left(\frac{7 - 5}{\sqrt{4.75}}\right) \approx 1 - \Phi(0.92) \approx 0.18$.
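The normal approximation can be compared against the exact binomial answer; the sketch below also applies a continuity correction, a refinement not used in the notes:

```python
import math

n, p = 100, 0.05

def binom_pmf(k):
    # P(X = k) for X ~ B(100, 0.05)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

exact = 1 - sum(binom_pmf(k) for k in range(7))  # P(X >= 7), exact

# De Moivre-Laplace approximation with continuity correction
mu, sd = n * p, math.sqrt(n * p * (1 - p))
phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
approx = 1 - phi((6.5 - mu) / sd)
```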
Exercise 21: Assume that , with represent the profit, in million of euros, of different companies located in different countries. If
Which company is more likely to have a profit greater than millions?
What is the probability that the profit of these companies does not exceed the given number of millions of euros? (Assume independence.)