5. Expected Values of Functions of Random Vectors
Let $(X,Y)$ be a two-dimensional random variable and $D_{(X,Y)}$ the set of points of discontinuity of the joint cumulative distribution function $F_{X,Y}(x,y)$.
Definition: Let $g(X,Y)$ be a function of the two-dimensional random variable $(X,Y)$. Then, the expected value of $g(X,Y)$ is given by
If $(X,Y)$ is a two-dimensional discrete random variable: $$E[g(X,Y)]=\sum_{(x,y)\in D_{(X,Y)}} g(x,y)\, f_{X,Y}(x,y),$$ provided that $\sum_{(x,y)\in D_{(X,Y)}} |g(x,y)|\, f_{X,Y}(x,y)<+\infty$.
If $(X,Y)$ is a two-dimensional continuous random variable: $$E[g(X,Y)]=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x,y)\, f_{X,Y}(x,y)\,dx\,dy,$$ provided that $\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} |g(x,y)|\, f_{X,Y}(x,y)\,dx\,dy<+\infty$.
Example 1: Let $(X,Y)$ be a continuous bidimensional random variable such that $$f_{X,Y}(x,y)=\begin{cases}x, & 0<x<1,\ 0<y<2\\ 0, & \text{otherwise}\end{cases}$$ Compute the expected value of $g(X,Y)=X+Y$.
Answer: Using the definition of expected value, one gets $$E(X+Y)=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}(x+y)\, f_{X,Y}(x,y)\,dx\,dy=\int_{0}^{2}\int_{0}^{1} x(x+y)\,dx\,dy=\int_{0}^{2}\left(\frac{1}{3}+\frac{y}{2}\right)dy=\frac{5}{3}$$
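For readers who want to double-check the integral, here is a small sketch using Python's sympy library (the density is the one from Example 1; the script itself is not part of the original notes):

```python
import sympy as sp

# Joint density of Example 1: f(x, y) = x on 0 < x < 1, 0 < y < 2.
x, y = sp.symbols('x y')
f = x

# E(X + Y) = double integral of (x + y) * f(x, y) over the support.
E_X_plus_Y = sp.integrate((x + y) * f, (x, 0, 1), (y, 0, 2))
print(E_X_plus_Y)  # 5/3
```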
Theorem: Let $(X,Y)$ be a discrete two-dimensional random variable with joint probability function $f_{X,Y}(x,y)$:
If $g(X,Y)=h(X)$, that is, $g(X,Y)$ only depends on $X$, then $$E[g(X,Y)]=E[h(X)]=\sum_{(x,y)\in D_{(X,Y)}} h(x)\, f_{X,Y}(x,y)=\sum_{x\in D_X} h(x)\sum_{y\in D_Y} f_{X,Y}(x,y)=\sum_{x\in D_X} h(x)\, f_X(x),$$ provided that $\sum_{(x,y)\in D_{(X,Y)}} |h(x)|\, f_{X,Y}(x,y)<+\infty$.
If $g(X,Y)=v(Y)$, that is, $g(X,Y)$ only depends on $Y$, then $$E[v(Y)]=\sum_{(x,y)\in D_{(X,Y)}} v(y)\, f_{X,Y}(x,y)=\sum_{y\in D_Y} v(y)\sum_{x\in D_X} f_{X,Y}(x,y)=\sum_{y\in D_Y} v(y)\, f_Y(y),$$ provided that $\sum_{(x,y)\in D_{(X,Y)}} |v(y)|\, f_{X,Y}(x,y)<+\infty$.
Example 2: Let $(X,Y)$ be a two-dimensional discrete random variable such that $$f_{X,Y}(x,y)=\begin{cases}1/5, & x=1,2,\ y=0,1,2,\ y\le x\\ 0, & \text{otherwise}\end{cases}$$ Compute the expected value of $X$.
Solution:
(i) By using the joint probability function: $$E(X)=\sum_{(x,y)\in D_{(X,Y)}} x\, f_{X,Y}(x,y)=\sum_{x=1}^{2} x\sum_{y=0}^{x}\frac{1}{5}=\frac{8}{5}$$
(ii) By using the marginal probability function: $$f_X(x)=\sum_{y=0}^{x} f_{X,Y}(x,y)=\begin{cases}2/5, & x=1\\ 3/5, & x=2\\ 0, & \text{otherwise}\end{cases}$$ Therefore, $E(X)=\sum_{x=1}^{2} x\, f_X(x)=1\times\frac{2}{5}+2\times\frac{3}{5}=\frac{8}{5}.$
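As a quick sanity check of both approaches, here is a short Python sketch (not part of the original notes) that enumerates the five support points of this joint pmf:

```python
from fractions import Fraction

# Joint pmf of Example 2: f(x, y) = 1/5 for x = 1, 2 and y = 0, ..., x.
f = {(x, y): Fraction(1, 5) for x in (1, 2) for y in range(3) if y <= x}

# (i) E(X) directly from the joint pmf.
E_X_joint = sum(x * p for (x, y), p in f.items())

# (ii) E(X) from the marginal pmf of X.
f_X = {x: sum(p for (xx, y), p in f.items() if xx == x) for x in (1, 2)}
E_X_marginal = sum(x * p for x, p in f_X.items())

print(E_X_joint, E_X_marginal)  # 8/5 8/5
```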
Theorem: Let $(X,Y)$ be a continuous two-dimensional random variable with joint probability density function $f_{X,Y}(x,y)$:
- If $g(X,Y)=h(X)$, that is, $g(X,Y)$ only depends on $X$, then
$$E[h(X)]=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} h(x)\, f_{X,Y}(x,y)\,dx\,dy=\int_{-\infty}^{+\infty} h(x)\left(\int_{-\infty}^{+\infty} f_{X,Y}(x,y)\,dy\right)dx=\int_{-\infty}^{+\infty} h(x)\, f_X(x)\,dx,$$ provided that $\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} |h(x)|\, f_{X,Y}(x,y)\,dx\,dy<+\infty$.
- If $g(X,Y)=v(Y)$, that is, $g(X,Y)$ only depends on $Y$, then
$$E[v(Y)]=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} v(y)\, f_{X,Y}(x,y)\,dx\,dy=\int_{-\infty}^{+\infty} v(y)\left(\int_{-\infty}^{+\infty} f_{X,Y}(x,y)\,dx\right)dy=\int_{-\infty}^{+\infty} v(y)\, f_Y(y)\,dy,$$ provided that $\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} |v(y)|\, f_{X,Y}(x,y)\,dx\,dy<+\infty$.
Example 3: Let $(X,Y)$ be a continuous bidimensional random variable such that $$f_{X,Y}(x,y)=\begin{cases}x, & 0<x<1,\ 0<y<2\\ 0, & \text{otherwise}\end{cases}$$ Compute the expected value of $3X+2$.
Answer:
(i) Using the joint density function.
Using the definition of expected value and the linearity of the expectation, one gets $$E(3X+2)=3E(X)+2=3\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x\, f_{X,Y}(x,y)\,dx\,dy+2=3\int_{0}^{2}\int_{0}^{1} x^2\,dx\,dy+2=3\int_{0}^{2}\frac{1}{3}\,dy+2=4$$
(ii) Using the marginal density function.
The marginal density function of $X$ is given by $$f_X(x)=\int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy=\begin{cases}2x, & 0<x<1\\ 0, & \text{otherwise}\end{cases}$$ Therefore, $E(3X+2)=3E(X)+2=4$, because $E(X)=\int_{-\infty}^{+\infty} x\, f_X(x)\,dx=\int_{0}^{1} 2x^2\,dx=\frac{2}{3}$.
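Both routes can again be verified symbolically; the following sympy sketch (not part of the original notes) reproduces the two computations of $E(3X+2)$:

```python
import sympy as sp

# Joint density of Example 3: f(x, y) = x on 0 < x < 1, 0 < y < 2.
x, y = sp.symbols('x y')
f = x

# (i) Using the joint density directly.
E_from_joint = sp.integrate((3 * x + 2) * f, (x, 0, 1), (y, 0, 2))

# (ii) Using the marginal density of X: f_X(x) = 2x on (0, 1).
f_X = sp.integrate(f, (y, 0, 2))
E_from_marginal = sp.integrate((3 * x + 2) * f_X, (x, 0, 1))

print(E_from_joint, E_from_marginal)  # 4 4
```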
Properties:
$E[h(X)+v(Y)]=E[h(X)]+E[v(Y)]$, provided that $E[|h(X)|]<+\infty$ and $E[|v(Y)|]<+\infty$.
$E\left[\sum_{i=1}^{N} X_i\right]=\sum_{i=1}^{N} E[X_i]$, where $N$ is a finite integer, provided that $E[|X_i|]<+\infty$ for $i=1,2,\dots,N$.
Example 4: Let $(X,Y)$ be a continuous bidimensional random variable such that $$f_{X,Y}(x,y)=\begin{cases}x, & 0<x<1,\ 0<y<2\\ 0, & \text{otherwise}\end{cases}$$ Compute the expected value of $Y$.
Answer: We know that $E(X+Y)=E(X)+E(Y)=\frac{5}{3}$. Since $E(X)=\frac{2}{3}$, we get that $E(Y)=1$.
Definition: The $r$th and $s$th moment of products about the origin of the random variables $X$ and $Y$, denoted by $\mu'_{r,s}$, is the expected value of $X^r Y^s$, for $r=1,2,\dots$; $s=1,2,\dots$, which is given by
if $X$ and $Y$ are discrete random variables: $\mu'_{r,s}=E[X^r Y^s]=\sum_{(x,y)\in D_{(X,Y)}} x^r y^s\, f_{X,Y}(x,y)$
if $X$ and $Y$ are continuous random variables: $\mu'_{r,s}=E[X^r Y^s]=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x^r y^s\, f_{X,Y}(x,y)\,dx\,dy$
Remarks:
If $r=s=1$, we have $\mu'_{1,1}=E[XY]$.
Cauchy-Schwarz Inequality: For any two random variables $X$ and $Y$, we have $|E[XY]|\le E[X^2]^{1/2}\, E[Y^2]^{1/2}$, provided that $E[|XY|]$ is finite.
If $X$ and $Y$ are independent random variables, $E[h(X)v(Y)]=E[h(X)]\,E[v(Y)]$ for any two functions $h(X)$ and $v(Y)$.
[Warning: The converse is not true.]
If $X_1,X_2,\dots,X_n$ are independent random variables, $E[X_1 X_2\cdots X_n]=E(X_1)E(X_2)\cdots E(X_n)$.
[Warning: The converse is not true; see the sketch below.]
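To see why the converse fails, consider a classic counterexample (added here for illustration, not taken from the notes): $X$ uniform on $\{-1,0,1\}$ and $Y=X^2$. Then $E[XY]=E[X]E[Y]$ even though $Y$ is a deterministic function of $X$, hence not independent of it. A short numerical check:

```python
import numpy as np

# X uniform on {-1, 0, 1}; Y = X**2 is completely determined by X.
x_vals = np.array([-1.0, 0.0, 1.0])
p = np.array([1/3, 1/3, 1/3])

E_X = np.sum(x_vals * p)          # 0
E_Y = np.sum(x_vals**2 * p)       # 2/3
E_XY = np.sum(x_vals**3 * p)      # E[X * Y] = E[X**3] = 0

# E[XY] = E[X]E[Y] holds, yet X and Y are clearly dependent.
print(np.isclose(E_XY, E_X * E_Y))  # True
```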
Definition: The $r$th and $s$th moment of products about the mean of the discrete random variables $X$ and $Y$, denoted by $\mu_{r,s}$, is the expected value of $(X-\mu_X)^r (Y-\mu_Y)^s$, for $r=1,2,\dots$; $s=1,2,\dots$, which is given by $$\mu_{r,s}=E[(X-\mu_X)^r (Y-\mu_Y)^s]=\sum_{(x,y)\in D_{(X,Y)}} (x-\mu_X)^r (y-\mu_Y)^s\, f_{X,Y}(x,y)$$
Definition: The $r$th and $s$th moment of products about the mean of the continuous random variables $X$ and $Y$, denoted by $\mu_{r,s}$, for $r=1,2,\dots$; $s=1,2,\dots$, is given by $$\mu_{r,s}=E[(X-\mu_X)^r (Y-\mu_Y)^s]=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} (x-\mu_X)^r (y-\mu_Y)^s\, f_{X,Y}(x,y)\,dx\,dy$$
The covariance is a measure of the joint variability of two random variables. Formally, it is defined as $$Cov(X,Y)=\sigma_{XY}=\mu_{1,1}=E[(X-\mu_X)(Y-\mu_Y)]$$
How can we interpret the covariance?
When the variables tend to show similar behavior, the covariance is positive:
- If high (low) values of one variable mainly correspond to high (low) values of the other variable;
When the variables tend to show opposite behavior, the covariance is negative:
- When high (low) values of one variable mainly correspond to low (high) values of the other;
If there is no linear association, then the covariance will be zero.
Properties:
Cov(X,Y)=E(XY)−E(X)E(Y).
If X and Y are independent, Cov(X,Y)=0.
If Y=bZ, where b is constant, Cov(X,Y)=bCov(X,Z).
If Y=V+W, Cov(X,Y)=Cov(X,V)+Cov(X,W).
If Y=b, where b is constant, Cov(X,Y)=0.
It follows from the Cauchy-Schwarz Inequality that $|Cov(X,Y)|\le\sqrt{Var(X)\,Var(Y)}$.
The covariance has the drawback of depending on the scale of both random variables. For what values of the covariance can we say that there is a strong association between the two random variables?
The correlation coefficient is a measure of the joint variability of two random variables that does not depend on the scale: $$\rho_{X,Y}=\frac{Cov(X,Y)}{\sqrt{Var(X)\,Var(Y)}}.$$
Properties:
- It follows from the Cauchy-Schwarz Inequality that $-1\le\rho_{X,Y}\le 1$.
If Y=bX+a, where b and a are constants
ρX,Y=1 if b>0.
ρX,Y=−1 if b<0.
If b=0, it is not defined.
Summary of important results:
If Y=V±W, Var(Y)=Var(V)+Var(W)±2Cov(V,W).
If $X_1,\dots,X_n$ are random variables, $a_1,\dots,a_n$ are constants and $Y=\sum_{i=1}^{n} a_i X_i$, then $$Var(Y)=\sum_{i=1}^{n} a_i^2\, Var(X_i)+2\sum_{i=1}^{n}\sum_{\substack{j=1\\ j<i}}^{n} a_i a_j \underbrace{Cov(X_i,X_j)}_{=0,\ \text{if } X_i, X_j \text{ are independent}}$$
If $X_1,\dots,X_n$ are random variables, $a_1,\dots,a_n$ and $b_1,\dots,b_n$ are constants, $Y_1=\sum_{i=1}^{n} a_i X_i$ and $Y_2=\sum_{i=1}^{n} b_i X_i$, then $$Cov(Y_1,Y_2)=\sum_{i=1}^{n} a_i b_i\, Var(X_i)+\sum_{i=1}^{n}\sum_{\substack{j=1\\ j<i}}^{n} (a_i b_j+a_j b_i)\underbrace{Cov(X_i,X_j)}_{=0,\ \text{if } X_i, X_j \text{ are independent}}$$
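These two formulas are simply the term-by-term expansions of $a'\Sigma a$ and $a'\Sigma b$, where $\Sigma$ is the covariance matrix of $(X_1,\dots,X_n)$. The following numpy sketch (with an arbitrary, hypothetical covariance matrix, not taken from the notes) checks the expansions against the matrix form:

```python
import numpy as np

# Hypothetical covariance matrix of (X1, X2, X3) and coefficient vectors.
Sigma = np.array([[2.0, 0.3, -0.5],
                  [0.3, 1.0,  0.2],
                  [-0.5, 0.2, 1.5]])
a = np.array([1.0, -2.0, 0.5])
b = np.array([0.5, 1.0, 1.0])
n = len(a)

# Var(Y1), Y1 = sum_i a_i X_i, expanded term by term as in the formula above.
var_Y1 = sum(a[i]**2 * Sigma[i, i] for i in range(n)) + \
         2 * sum(a[i] * a[j] * Sigma[i, j] for i in range(n) for j in range(i))

# Cov(Y1, Y2), Y2 = sum_i b_i X_i.
cov_Y1Y2 = sum(a[i] * b[i] * Sigma[i, i] for i in range(n)) + \
           sum((a[i] * b[j] + a[j] * b[i]) * Sigma[i, j] for i in range(n) for j in range(i))

# Same quantities in matrix form: a' Sigma a and a' Sigma b.
print(np.isclose(var_Y1, a @ Sigma @ a))    # True
print(np.isclose(cov_Y1Y2, a @ Sigma @ b))  # True
```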
Definition: Let (X,Y) be a two dimensional random variable and u(Y,X) a function of Y and X. Then, the conditional expectation of u(Y,X) given X=x, is given by
if $X$ and $Y$ are discrete random variables: $$E[u(Y,X)\mid X=x]=\sum_{y\in D_Y} u(y,x)\, f_{Y|X=x}(y)$$ where $D_Y$ is the set of discontinuity points of $F_Y(y)$ and $f_{Y|X=x}(y)$ is the value of the conditional probability function of $Y$ given $X=x$ at $y$;
if $X$ and $Y$ are continuous random variables: $$E[u(Y,X)\mid X=x]=\int_{-\infty}^{+\infty} u(y,x)\, f_{Y|X=x}(y)\,dy$$ where $f_{Y|X=x}(y)$ is the value of the conditional probability density function of $Y$ given $X=x$ at $y$,
provided that the expected values exist and are finite.
Remarks:
If $u(Y,X)=Y$, then we have the conditional mean of $Y$: $E[u(Y,X)\mid X=x]=E[Y\mid X=x]=\mu_{Y|x}$ (notice that this is a function of $x$).
If $u(Y,X)=(Y-\mu_{Y|x})^2$, then we have the conditional variance of $Y$: $$E[u(Y,X)\mid X=x]=E[(Y-\mu_{Y|x})^2\mid X=x]=E[(Y-E[Y\mid X=x])^2\mid X=x]=Var[Y\mid X=x]$$
As usual, $Var[Y\mid X=x]=E[Y^2\mid X=x]-\left(E[Y\mid X=x]\right)^2$.
If $Y$ and $X$ are independent, $E(Y\mid X=x)=E(Y)$.
Of course, we can reverse the roles of $Y$ and $X$, that is, we can compute $E(u(X,Y)\mid Y=y)$, using definitions similar to those above.
Example: Let $(X,Y)$ be a two-dimensional random variable such that $$f_{X,Y}(x,y)=\begin{cases}1/2, & 0<x<2,\ 0<y<x\\ 0, & \text{otherwise}\end{cases}$$ Then the conditional density function of $Y\mid X=1$ is given by $$f_{Y|X=1}(y)=\begin{cases}\dfrac{f_{X,Y}(1,y)}{f_X(1)}, & 0<y<1\\ 0, & \text{otherwise}\end{cases}=\begin{cases}\dfrac{1/2}{1/2}, & 0<y<1\\ 0, & \text{otherwise}\end{cases}=\begin{cases}1, & 0<y<1\\ 0, & \text{otherwise}\end{cases}$$ where $$f_X(x)=\begin{cases}\int_0^x f_{X,Y}(x,y)\,dy, & 0<x<2\\ 0, & \text{otherwise}\end{cases}=\begin{cases}\dfrac{x}{2}, & 0<x<2\\ 0, & \text{otherwise}\end{cases}$$
The conditional expected value can then be computed as follows:
$$E(Y\mid X=1)=\int_{0}^{1} y\, f_{Y|X=1}(y)\,dy=\int_{0}^{1} y\,dy=\frac{1}{2}.$$ To compute the conditional variance, one may start by computing the following conditional expected value: $$E(Y^2\mid X=1)=\int_{0}^{1} y^2\, f_{Y|X=1}(y)\,dy=\int_{0}^{1} y^2\,dy=\frac{1}{3}.$$ Therefore, $$Var(Y\mid X=1)=E(Y^2\mid X=1)-\left(E(Y\mid X=1)\right)^2=\frac{1}{3}-\frac{1}{4}=\frac{1}{12}$$
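A quick symbolic check of these three conditional quantities (a sympy sketch, assuming the conditional density $f_{Y|X=1}(y)=1$ on $(0,1)$ derived above):

```python
import sympy as sp

y = sp.symbols('y')
f_cond = sp.Integer(1)   # f_{Y|X=1}(y) = 1 for 0 < y < 1

E_Y = sp.integrate(y * f_cond, (y, 0, 1))        # conditional mean: 1/2
E_Y2 = sp.integrate(y**2 * f_cond, (y, 0, 1))    # conditional second moment: 1/3
Var_Y = E_Y2 - E_Y**2                            # conditional variance: 1/12

print(E_Y, E_Y2, Var_Y)
```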
Example: Let $X$ and $Y$ be two random variables such that $f_{X,Y}(x,y)=\frac{1}{9}$, for $x=1,2,3$, $y=0,1,2,3$, $y\le x$.
To compute the conditional expected value, one has to compute the conditional probability function:
$$f_{Y|X=1}(y)=\begin{cases}\dfrac{f_{X,Y}(1,y)}{f_X(1)}, & y=0,1\\ 0, & \text{otherwise}\end{cases}=\begin{cases}\dfrac{1}{2}, & y=0,1\\ 0, & \text{otherwise}\end{cases}$$ where $$f_X(1)=\sum_{y=0}^{1} f_{X,Y}(1,y)=\sum_{y=0}^{1}\frac{1}{9}=\frac{2}{9}$$ Therefore, $E(Y\mid X=1)=\sum_{y\in D_Y} y\, f_{Y|X=1}(y)=0\times\frac{1}{2}+1\times\frac{1}{2}=\frac{1}{2}.$
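The same conditional expectation can be obtained by enumerating the support of the pmf; here is a short Python check (not part of the original notes):

```python
from fractions import Fraction

# Joint pmf: f(x, y) = 1/9 for x = 1, 2, 3 and y = 0, ..., 3 with y <= x.
f = {(x, y): Fraction(1, 9) for x in (1, 2, 3) for y in range(4) if y <= x}

f_X1 = sum(p for (x, y), p in f.items() if x == 1)                    # f_X(1) = 2/9
E_Y_given_X1 = sum(y * p / f_X1 for (x, y), p in f.items() if x == 1)

print(f_X1, E_Y_given_X1)  # 2/9 1/2
```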
Notice that $g(y)=E(X\mid Y=y)$ is indeed a function of $y$. Therefore, $g(Y)$ is a random variable, because $Y$ can take different values according to its distribution; i.e., if $Y$ can take the value $y$, then $g(Y)$ can take the value $g(y)$ with probability $P(Y=y)>0$.
- Discrete random variables
The random variable $Z=g(Y)=E(X\mid Y)$ takes the values $g(y)=E(X\mid Y=y)$. Assume that all values of $g(y)$ are different. Then, $Z$ takes the value $g(y)$ with probability $P(Y=y)$.
In general, the probability function of $Z=g(Y)=E(X\mid Y)$ can be computed in the following way: $$P(Z=z)=P(g(Y)=z)=P(Y\in\{y: g(y)=z\})$$
Example: Let (X,Y) be a discrete random variable such that fX,Y(x,y) is represented in the following table
| X/Y | 1 | 2 | 3 |
|-----|------|------|------|
| 0 | 0.20 | 0.10 | 0.15 |
| 1 | 0.05 | 0.35 | 0.15 |
One may compute the following conditional probability functions: $$f_{Y|X=0}(y)=\begin{cases}4/9, & y=1\\ 2/9, & y=2\\ 3/9, & y=3\\ 0, & \text{otherwise}\end{cases}\qquad\text{and}\qquad f_{Y|X=1}(y)=\begin{cases}1/11, & y=1\\ 7/11, & y=2\\ 3/11, & y=3\\ 0, & \text{otherwise}\end{cases}$$ Consequently, $E(Y\mid X=0)=17/9$ and $E(Y\mid X=1)=24/11$. Therefore, the random variable $Z=E(Y\mid X)$ has the following probability function: $$P(Z=z)=\begin{cases}P(X=0), & z=17/9\\ P(X=1), & z=24/11\\ 0, & \text{otherwise}\end{cases}=\begin{cases}0.45, & z=17/9\\ 0.55, & z=24/11\\ 0, & \text{otherwise}\end{cases}$$
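The whole example can be reproduced with a few lines of Python (a sketch added here, starting from the table of joint probabilities):

```python
from fractions import Fraction

# Joint pmf of (X, Y) from the table above.
f = {(0, 1): Fraction(20, 100), (0, 2): Fraction(10, 100), (0, 3): Fraction(15, 100),
     (1, 1): Fraction(5, 100),  (1, 2): Fraction(35, 100), (1, 3): Fraction(15, 100)}

# For each x: marginal P(X = x) and conditional mean E(Y | X = x).
for x in (0, 1):
    p_x = sum(p for (xx, y), p in f.items() if xx == x)
    e_y = sum(y * p / p_x for (xx, y), p in f.items() if xx == x)
    print(x, p_x, e_y)
# 0 9/20 17/9
# 1 11/20 24/11
```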
- Continuous random variables
The cumulative distribution function of $Z=g(Y)=E(X\mid Y)$ is, indeed, $$F_Z(z)=P(Z\le z)=P(g(Y)\le z)=P(Y\in\{y: g(y)\le z\})$$ When $g$ is a strictly increasing function, we get that $F_Z(z)=F_Y(g^{-1}(z))$, or equivalently $F_Z(g(y))=F_Y(y)$.
Therefore, we can calculate all the quantities that we know (the expected value, variance, ...) for $E(X\mid Y)$ or $E(Y\mid X)$.
Theorem (Law of Iterated Expectations): Let $(X,Y)$ be a two-dimensional random variable. Then, $E(Y)=E(E[Y\mid X])$ provided that $E(|Y|)$ is finite, and $E(X)=E(E[X\mid Y])$ provided that $E(|X|)$ is finite.
Remark: This theorem shows that there are two ways to compute E(Y) (resp., E(X)). The first is the direct way. The second way is to consider the following steps:
- compute $E[Y\mid X=x]$ and notice that this is a function solely of $x$; that is, we can write $g(x)=E[Y\mid X=x]$;
- according to the theorem, replacing $g(x)$ by $g(X)$ and taking the mean, we obtain $E[g(X)]=E[Y]$ for this specific form of $g(X)$.
This theorem is useful in practice in the calculation of $E(Y)$ if we know $f_{Y|X=x}(y)$ or $E[Y\mid X=x]$ and $f_X(x)$ (or some moments of $X$), but not $f_{X,Y}(x,y)$.
Remarks: The results presented can be generalized for functions of X and Y, i.e., E(u(X,Y))=E(E(u(X,Y)|X)), if E(u(X,Y)) exists.
Example: Let $(X,Y)$ be a bi-dimensional continuous random variable such that $$E(X\mid Y=y)=\frac{3y-1}{3}\qquad\text{and}\qquad f_Y(y)=\begin{cases}1/2, & 0<y<2\\ 0, & \text{otherwise}\end{cases}$$ Taking into account the previous theorem, $E(X)=E(E(X\mid Y))=E\!\left(\frac{3Y-1}{3}\right)=\int_{0}^{2}\frac{3y-1}{6}\,dy=2/3.$
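A symbolic check of this application of the law of iterated expectations (a sympy sketch, not part of the original notes):

```python
import sympy as sp

y = sp.symbols('y')
g = (3 * y - 1) / 3      # E(X | Y = y)
f_Y = sp.Rational(1, 2)  # density of Y on (0, 2)

# E(X) = E(E(X | Y)) = integral of g(y) * f_Y(y) over (0, 2).
E_X = sp.integrate(g * f_Y, (y, 0, 2))
print(E_X)  # 2/3
```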
Theorem: Assuming that $E(Y^2)$ exists, $Var(Y)=Var[E(Y\mid X)]+E[Var[Y\mid X]]$.
Theorem: Let $X$ and $Y$ be two random variables. Then $Cov(X,Y)=Cov(X,E(Y\mid X))$.
Example: Let $(X,Y)$ be a bidimensional random variable such that $$f_{X|Y=y}(x)=\frac{1}{y},\ 0<x<y\ (\text{for a fixed } y>1)\qquad\text{and}\qquad f_Y(y)=3y^{-4},\ y>1$$ Compute $Var(X)$ using the previous theorem.
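One possible way to carry out this computation is sketched below in sympy, using the facts that $X\mid Y=y$ is uniform on $(0,y)$, so $E(X\mid Y)=Y/2$ and $Var(X\mid Y)=Y^2/12$ (the script is an added illustration, not part of the original notes):

```python
import sympy as sp

y = sp.symbols('y', positive=True)
f_Y = 3 * y**(-4)   # density of Y on (1, +oo)

E_Y = sp.integrate(y * f_Y, (y, 1, sp.oo))        # 3/2
E_Y2 = sp.integrate(y**2 * f_Y, (y, 1, sp.oo))    # 3
var_Y = E_Y2 - E_Y**2                             # 3/4

# Law of total variance: Var(X) = Var[E(X|Y)] + E[Var(X|Y)]
#                                = Var(Y/2)    + E(Y**2/12)
var_X = var_Y / 4 + E_Y2 / 12
print(var_X)  # 7/16
```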
Exam question: Let X and Y be two random variables such that E(X|Y=y)=y, for all y such that fY(y)>0. Prove that Cov(X,Y)=Var(Y). Are the random variables independent? Justify your answer.