5. Expected Values of Functions of Random Vectors

Let $(X,Y)$ be a two-dimensional random variable and $D_{(X,Y)}$ the set of points of discontinuity of the joint cumulative distribution function $F_{X,Y}(x,y)$.

Definition: Let $g(X,Y)$ be a function of the two-dimensional random variable $(X,Y)$. Then, the expected value of $g(X,Y)$ is given by

  • If $(X,Y)$ is a two-dimensional discrete random variable: $$E[g(X,Y)]=\sum_{(x,y)\in D_{(X,Y)}} g(x,y)\,f_{X,Y}(x,y)$$ provided that $\sum_{(x,y)\in D_{(X,Y)}} |g(x,y)|\,f_{X,Y}(x,y)<+\infty$.

  • If $(X,Y)$ is a two-dimensional continuous random variable: $$E[g(X,Y)]=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x,y)\,f_{X,Y}(x,y)\,dx\,dy$$ provided that $\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} |g(x,y)|\,f_{X,Y}(x,y)\,dx\,dy<+\infty$.

Example 1: Let $(X,Y)$ be a continuous bidimensional random variable such that $$f_{X,Y}(x,y)=\begin{cases}x, & 0<x<1,\ 0<y<2\\ 0, & \text{otherwise}\end{cases}$$ Compute the expected value of $g(X,Y)=X+Y$.

Answer: Using the definition of expected value, one gets $$E(X+Y)=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}(x+y)\,f_{X,Y}(x,y)\,dx\,dy=\int_0^2\!\!\int_0^1 x(x+y)\,dx\,dy=\int_0^2\left(\frac{1}{3}+\frac{y}{2}\right)dy=\frac{5}{3}$$
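
As a quick numerical cross-check of this answer (not part of the original derivation), the double integral can be evaluated with SciPy's `dblquad`; a minimal sketch, assuming NumPy/SciPy are available:

```python
from scipy import integrate

# E(X+Y) = ∫₀² ∫₀¹ (x+y)·x dx dy for the density f(x,y) = x on (0,1)×(0,2).
# Note: dblquad integrates func(y, x), with y as the inner variable.
value, abserr = integrate.dblquad(
    lambda y, x: (x + y) * x,   # integrand g(x,y)·f(x,y)
    0, 1,                       # limits for x
    lambda x: 0, lambda x: 2,   # limits for y
)
print(value)  # ≈ 1.6667 = 5/3
```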

Theorem: Let $(X,Y)$ be a discrete two-dimensional random variable with joint probability function $f_{X,Y}(x,y)$:

  1. If $g(X,Y)=h(X)$, that is, $g(X,Y)$ only depends on $X$, then $$E[g(X,Y)]=E[h(X)]=\sum_{(x,y)\in D_{(X,Y)}} h(x)\,f_{X,Y}(x,y)=\sum_{x\in D_X} h(x)\sum_{y\in D_Y} f_{X,Y}(x,y)=\sum_{x\in D_X} h(x)\,f_X(x)$$ provided that $\sum_{(x,y)\in D_{(X,Y)}} |h(x)|\,f_{X,Y}(x,y)<+\infty$.

  2. If $g(X,Y)=v(Y)$, that is, $g(X,Y)$ only depends on $Y$, then $$E[v(Y)]=\sum_{(x,y)\in D_{(X,Y)}} v(y)\,f_{X,Y}(x,y)=\sum_{y\in D_Y} v(y)\sum_{x\in D_X} f_{X,Y}(x,y)=\sum_{y\in D_Y} v(y)\,f_Y(y)$$ provided that $\sum_{(x,y)\in D_{(X,Y)}} |v(y)|\,f_{X,Y}(x,y)<+\infty$.

Example 2: Let $(X,Y)$ be a two-dimensional random variable such that $$f_{X,Y}(x,y)=\begin{cases}\frac{1}{5}, & x=1,2,\ y=0,1,2,\ y\le x\\ 0, & \text{otherwise.}\end{cases}$$ Compute the expected value of $X$.

Solution:

(i) By using the joint probability function: $$E(X)=\sum_{(x,y)\in D_{(X,Y)}} x\,f_{X,Y}(x,y)=\sum_{x=1}^{2}x\sum_{y=0}^{x}\frac{1}{5}=\frac{8}{5}$$

(ii) By using the marginal probability function: $$f_X(x)=\sum_{y=0}^{x}f_{X,Y}(x,y)=\begin{cases}\frac{2}{5}, & x=1\\ \frac{3}{5}, & x=2\\ 0, & \text{otherwise.}\end{cases}$$ Therefore, $$E(X)=\sum_{x=1}^{2}x\,f_X(x)=1\times\frac{2}{5}+2\times\frac{3}{5}=\frac{8}{5}.$$
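
Both routes are easy to mirror numerically. A minimal sketch in plain Python (the variable names are just illustrative), summing over the five support points of the pmf above:

```python
# Support of Example 2: f(x,y) = 1/5 for x in {1,2} and y in {0,...,x}.
support = [(x, y) for x in (1, 2) for y in range(0, x + 1)]
f = {pt: 1 / 5 for pt in support}

# (i) E(X) directly from the joint pmf.
EX_joint = sum(x * f[(x, y)] for (x, y) in support)

# (ii) E(X) from the marginal pmf of X.
fX = {}
for (x, y), p in f.items():
    fX[x] = fX.get(x, 0.0) + p
EX_marginal = sum(x * p for x, p in fX.items())

print(EX_joint, EX_marginal)  # both 1.6 = 8/5
```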

Theorem: Let $(X,Y)$ be a continuous two-dimensional random variable with joint probability density function $f_{X,Y}(x,y)$:

  • If $g(X,Y)=h(X)$, that is, $g(X,Y)$ only depends on $X$, then

$$E[h(X)]=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} h(x)\,f_{X,Y}(x,y)\,dx\,dy=\int_{-\infty}^{+\infty} h(x)\left(\int_{-\infty}^{+\infty} f_{X,Y}(x,y)\,dy\right)dx=\int_{-\infty}^{+\infty} h(x)\,f_X(x)\,dx$$ provided that $\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}|h(x)|\,f_{X,Y}(x,y)\,dx\,dy<+\infty$.

  • If $g(X,Y)=v(Y)$, that is, $g(X,Y)$ only depends on $Y$, then

$$E[v(Y)]=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} v(y)\,f_{X,Y}(x,y)\,dx\,dy=\int_{-\infty}^{+\infty} v(y)\left(\int_{-\infty}^{+\infty} f_{X,Y}(x,y)\,dx\right)dy=\int_{-\infty}^{+\infty} v(y)\,f_Y(y)\,dy$$ provided that $\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}|v(y)|\,f_{X,Y}(x,y)\,dx\,dy<+\infty$.

Example 3: Let $(X,Y)$ be a continuous bidimensional random variable such that $$f_{X,Y}(x,y)=\begin{cases}x, & 0<x<1,\ 0<y<2\\ 0, & \text{otherwise}\end{cases}$$ Compute the expected value of $3X+2$.

Answer:

(i) Using the joint density function. By linearity of the expected value, one gets $$E(3X+2)=3E(X)+2=3\int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} x\,f_{X,Y}(x,y)\,dx\,dy+2=3\int_0^2\!\!\int_0^1 x^2\,dx\,dy+2=3\int_0^2\frac{1}{3}\,dy+2=4$$

(ii) Using the marginal density function. The marginal density function of $X$ is given by $$f_X(x)=\int_{-\infty}^{+\infty} f_{X,Y}(x,y)\,dy=\begin{cases}2x, & 0<x<1\\ 0, & \text{otherwise.}\end{cases}$$ Therefore, $E(3X+2)=3E(X)+2=4$, because $$E(X)=\int_{-\infty}^{+\infty}x\,f_X(x)\,dx=\int_0^1 2x^2\,dx=\frac{2}{3}.$$
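
The same answer can be reproduced numerically; a minimal sketch with SciPy's `quad`, using the marginal density $f_X(x)=2x$ derived above:

```python
from scipy import integrate

# E(X) = ∫₀¹ x · 2x dx using the marginal density f_X(x) = 2x on (0,1).
EX, _ = integrate.quad(lambda x: x * 2 * x, 0, 1)
print(3 * EX + 2)  # ≈ 4.0, matching E(3X+2) = 3·(2/3) + 2
```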

Properties:

  1. $E[h(X)+v(Y)]=E[h(X)]+E[v(Y)]$, provided that $E[|h(X)|]<+\infty$ and $E[|v(Y)|]<+\infty$.

  2. $E\left[\sum_{i=1}^{N}X_i\right]=\sum_{i=1}^{N}E[X_i]$, where $N$ is a finite integer, provided that $E[|X_i|]<+\infty$ for $i=1,2,\ldots,N$.

Example 4: Let $(X,Y)$ be a continuous bidimensional random variable such that $$f_{X,Y}(x,y)=\begin{cases}x, & 0<x<1,\ 0<y<2\\ 0, & \text{otherwise}\end{cases}$$ Compute the expected value of $Y$.

Answer: We know that $E(X+Y)=E(X)+E(Y)=\frac{5}{3}$. Since $E(X)=\frac{2}{3}$, we get that $E(Y)=1$.

Definition: The $r$th and $s$th product moment about the origin of the random variables $X$ and $Y$, denoted by $\mu'_{r,s}$, is the expected value of $X^rY^s$, for $r=1,2,\ldots;\ s=1,2,\ldots$, which is given by

  • if $X$ and $Y$ are discrete random variables: $$\mu'_{r,s}=E[X^rY^s]=\sum_{(x,y)\in D_{(X,Y)}}x^ry^s\,f_{X,Y}(x,y)$$

  • if $X$ and $Y$ are continuous random variables: $$\mu'_{r,s}=E[X^rY^s]=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}x^ry^s\,f_{X,Y}(x,y)\,dx\,dy$$

Remarks:

  • If $r=s=1$, we have $\mu'_{1,1}=E[XY]$.

  • Cauchy–Schwarz Inequality: For any two random variables $X$ and $Y$, we have $|E[XY]|\le E[X^2]^{1/2}\,E[Y^2]^{1/2}$, provided that $E[|XY|]$ is finite.

  • If $X$ and $Y$ are independent random variables, $E[h(X)v(Y)]=E(h(X))E(v(Y))$ for any two functions $h(X)$ and $v(Y)$.

    [Warning: The reverse is not true.]

  • If $X_1,X_2,\ldots,X_n$ are independent random variables, $E[X_1X_2\cdots X_n]=E(X_1)E(X_2)\cdots E(X_n)$ (see the simulation sketch after this list).

    [Warning: The reverse is not true.]
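
A small Monte Carlo sketch of the last two remarks (illustrative only; the chosen distributions and sample size are arbitrary assumptions, and NumPy is assumed available):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Independent X ~ Exp(1) and Y ~ Uniform(0,1): E[XY] should match E[X]E[Y].
x = rng.exponential(1.0, n)
y = rng.uniform(0.0, 1.0, n)
print(np.mean(x * y), np.mean(x) * np.mean(y))  # both ≈ 0.5

# Cauchy–Schwarz holds even for dependent pairs, e.g. (X, X + Y):
z = x + y
lhs = abs(np.mean(x * z))                               # |E[XZ]|
rhs = np.sqrt(np.mean(x**2)) * np.sqrt(np.mean(z**2))   # E[X²]^½ E[Z²]^½
print(lhs <= rhs)  # True
```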

Definition: The $r$th and $s$th product moment about the mean of the discrete random variables $X$ and $Y$, denoted by $\mu_{r,s}$, is the expected value of $(X-\mu_X)^r(Y-\mu_Y)^s$, for $r=1,2,\ldots;\ s=1,2,\ldots$, which is given by $$\mu_{r,s}=E[(X-\mu_X)^r(Y-\mu_Y)^s]=\sum_{(x,y)\in D_{(X,Y)}}(x-\mu_X)^r(y-\mu_Y)^s\,f_{X,Y}(x,y)$$

Definition: The $r$th and $s$th product moment about the mean of the continuous random variables $X$ and $Y$, denoted by $\mu_{r,s}$, for $r=1,2,\ldots;\ s=1,2,\ldots$, is given by $$\mu_{r,s}=E[(X-\mu_X)^r(Y-\mu_Y)^s]=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}(x-\mu_X)^r(y-\mu_Y)^s\,f_{X,Y}(x,y)\,dx\,dy$$

The covariance is a measure of the joint variability of two random variables. Formally, it is defined as $$\operatorname{Cov}(X,Y)=\sigma_{XY}=\mu_{1,1}=E[(X-\mu_X)(Y-\mu_Y)]$$

How can we interpret the covariance?

  • When the variables tend to show similar behavior, the covariance is positive: high (low) values of one variable mainly correspond to high (low) values of the other variable.

  • When the variables tend to show opposite behavior, the covariance is negative: high (low) values of one variable mainly correspond to low (high) values of the other.

  • If there is no linear association, then the covariance will be zero.

Properties:

  • $\operatorname{Cov}(X,Y)=E(XY)-E(X)E(Y)$ (see the numerical check after this list).

  • If $X$ and $Y$ are independent, $\operatorname{Cov}(X,Y)=0$.

  • If $Y=bZ$, where $b$ is a constant, $\operatorname{Cov}(X,Y)=b\operatorname{Cov}(X,Z)$.

  • If $Y=V+W$, $\operatorname{Cov}(X,Y)=\operatorname{Cov}(X,V)+\operatorname{Cov}(X,W)$.

  • If $Y=b$, where $b$ is a constant, $\operatorname{Cov}(X,Y)=0$.

  • It follows from the Cauchy–Schwarz Inequality that $|\operatorname{Cov}(X,Y)|\le\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}$.
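
The shortcut formula is easy to check by hand or in code; a minimal sketch in plain Python, applied to the pmf of Example 2 (names illustrative):

```python
# Joint pmf from Example 2: f(x,y) = 1/5 on {(1,0),(1,1),(2,0),(2,1),(2,2)}.
pts = [(1, 0), (1, 1), (2, 0), (2, 1), (2, 2)]
p = 1 / 5

EX  = sum(x * p for x, y in pts)      # 8/5
EY  = sum(y * p for x, y in pts)      # 4/5
EXY = sum(x * y * p for x, y in pts)  # 7/5

print(EXY - EX * EY)  # Cov(X,Y) = 7/5 − (8/5)(4/5) = 0.12
```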

The covariance has the inconvenience of depending on the scales of both random variables. For which values of the covariance can we say that there is a strong association between the two random variables?

The correlation coefficient is a measure of the joint variability of two random variables that does not depend on the scale: $$\rho_{X,Y}=\frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}.$$

Properties:

  • It follows from the Cauchy–Schwarz Inequality that $-1\le\rho_{X,Y}\le 1$.

If $Y=bX+a$, where $b$ and $a$ are constants:

  • $\rho_{X,Y}=1$ if $b>0$;

  • $\rho_{X,Y}=-1$ if $b<0$;

  • if $b=0$, the correlation coefficient is not defined.

Summary of important results:

  • If $Y=V\pm W$, $\operatorname{Var}(Y)=\operatorname{Var}(V)+\operatorname{Var}(W)\pm 2\operatorname{Cov}(V,W)$.

  • If $X_1,\ldots,X_n$ are random variables, $a_1,\ldots,a_n$ are constants and $Y=\sum_{i=1}^{n}a_iX_i$, then $$\operatorname{Var}(Y)=\sum_{i=1}^{n}a_i^2\operatorname{Var}(X_i)+2\sum_{i=1}^{n}\sum_{\substack{j=1\\ j<i}}^{n}a_ia_j\operatorname{Cov}(X_i,X_j),$$ where the double sum vanishes if the $X_i$ are independent, leaving $\operatorname{Var}(Y)=\sum_{i=1}^{n}a_i^2\operatorname{Var}(X_i)$.

  • If $X_1,\ldots,X_n$ are random variables, $a_1,\ldots,a_n$ and $b_1,\ldots,b_n$ are constants, $Y_1=\sum_{i=1}^{n}a_iX_i$ and $Y_2=\sum_{i=1}^{n}b_iX_i$, then $$\operatorname{Cov}(Y_1,Y_2)=\sum_{i=1}^{n}a_ib_i\operatorname{Var}(X_i)+\sum_{i=1}^{n}\sum_{\substack{j=1\\ j<i}}^{n}(a_ib_j+a_jb_i)\operatorname{Cov}(X_i,X_j),$$ where again the covariance terms vanish if the $X_i$ are independent. A numerical verification of both formulas follows below.
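
A minimal NumPy sketch of the two formulas above (the covariance matrix and coefficient vectors are arbitrary illustrative choices): both expressions reduce to quadratic forms in the covariance matrix $S$, which makes them easy to verify.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
a = np.array([1.0, -2.0, 0.5])
b = np.array([0.0, 1.0, 3.0])

# An arbitrary valid covariance matrix S, with S[i, j] = Cov(Xi, Xj).
M = rng.normal(size=(n, n))
S = M @ M.T

# Var(Y1) = Σ aᵢ² Var(Xᵢ) + 2 Σ_{j<i} aᵢ aⱼ Cov(Xᵢ, Xⱼ)
var_Y1 = (sum(a[i] ** 2 * S[i, i] for i in range(n))
          + 2 * sum(a[i] * a[j] * S[i, j]
                    for i in range(n) for j in range(i)))
print(np.isclose(var_Y1, a @ S @ a))  # True: equals the quadratic form aᵀSa

# Cov(Y1,Y2) = Σ aᵢ bᵢ Var(Xᵢ) + Σ_{j<i} (aᵢbⱼ + aⱼbᵢ) Cov(Xᵢ, Xⱼ)
cov_Y12 = (sum(a[i] * b[i] * S[i, i] for i in range(n))
           + sum((a[i] * b[j] + a[j] * b[i]) * S[i, j]
                 for i in range(n) for j in range(i)))
print(np.isclose(cov_Y12, a @ S @ b))  # True: equals aᵀSb
```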

Definition: Let $(X,Y)$ be a two-dimensional random variable and $u(Y,X)$ a function of $Y$ and $X$. Then, the conditional expectation of $u(Y,X)$ given $X=x$ is given by

  • if $X$ and $Y$ are discrete random variables: $$E[u(Y,X)\mid X=x]=\sum_{y\in D_Y}u(y,x)\,f_{Y|X=x}(y)$$ where $D_Y$ is the set of discontinuity points of $F_Y(y)$ and $f_{Y|X=x}(y)$ is the value of the conditional probability function of $Y$ given $X=x$ at $y$;

  • if $X$ and $Y$ are continuous random variables: $$E[u(Y,X)\mid X=x]=\int_{-\infty}^{+\infty}u(y,x)\,f_{Y|X=x}(y)\,dy$$ where $f_{Y|X=x}(y)$ is the value of the conditional probability density function of $Y$ given $X=x$ at $y$,

provided that the expected values exist and are finite.

Remarks:

  1. If $u(Y,X)=Y$, then we have the conditional mean of $Y$: $E[u(Y,X)\mid X=x]=E[Y\mid X=x]=\mu_{Y|x}$ (notice that this is a function of $x$).

  2. If $u(Y,X)=(Y-\mu_{Y|x})^2$, then we have the conditional variance of $Y$: $$E[u(Y,X)\mid X=x]=E[(Y-\mu_{Y|x})^2\mid X=x]=E[(Y-E[Y\mid X=x])^2\mid X=x]=\operatorname{Var}[Y\mid X=x]$$

  3. As usual, $\operatorname{Var}[Y\mid X=x]=E[Y^2\mid X=x]-\left(E[Y\mid X=x]\right)^2$.

  4. If $Y$ and $X$ are independent, $E(Y\mid X=x)=E(Y)$.

  5. Of course, we can reverse the roles of $Y$ and $X$; that is, we can compute $E(u(X,Y)\mid Y=y)$ using definitions similar to those above.

Example: Let $(X,Y)$ be a two-dimensional random variable such that $$f_{X,Y}(x,y)=\begin{cases}1/2, & 0<x<2,\ 0<y<x\\ 0, & \text{otherwise.}\end{cases}$$ Then the conditional density function of $Y\mid X=1$ is given by $$f_{Y|X=1}(y)=\begin{cases}\dfrac{f_{X,Y}(1,y)}{f_X(1)}, & 0<y<1\\ 0, & \text{otherwise}\end{cases}=\begin{cases}\dfrac{1/2}{1/2}, & 0<y<1\\ 0, & \text{otherwise}\end{cases}=\begin{cases}1, & 0<y<1\\ 0, & \text{otherwise}\end{cases}$$ where $$f_X(x)=\begin{cases}\int_0^x f_{X,Y}(x,y)\,dy, & 0<x<2\\ 0, & \text{otherwise}\end{cases}=\begin{cases}\dfrac{x}{2}, & 0<x<2\\ 0, & \text{otherwise.}\end{cases}$$

The conditional expected value can then be computed as follows: $$E(Y\mid X=1)=\int_0^1 y\,f_{Y|X=1}(y)\,dy=\int_0^1 y\,dy=\frac{1}{2}.$$ To compute the conditional variance, one may start by computing the conditional second moment $$E(Y^2\mid X=1)=\int_0^1 y^2\,f_{Y|X=1}(y)\,dy=\int_0^1 y^2\,dy=\frac{1}{3}.$$ Therefore $$\operatorname{Var}(Y\mid X=1)=E(Y^2\mid X=1)-(E(Y\mid X=1))^2=\frac{1}{3}-\frac{1}{4}=\frac{1}{12}$$
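
These two conditional moments are easy to confirm numerically; a minimal sketch with SciPy's `quad` on the conditional density $f_{Y|X=1}(y)=1$ on $(0,1)$:

```python
from scipy import integrate

# Conditional density of Y given X = 1 from the example: f(y) = 1 on (0,1).
f = lambda y: 1.0

EY, _  = integrate.quad(lambda y: y * f(y), 0, 1)     # E(Y | X=1)  = 1/2
EY2, _ = integrate.quad(lambda y: y**2 * f(y), 0, 1)  # E(Y²| X=1)  = 1/3
print(EY, EY2 - EY**2)  # 0.5 and 0.0833... = 1/12
```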

Example: Let $X$ and $Y$ be two random variables such that $f_{X,Y}(x,y)=\frac{1}{9}$, for $x=1,2,3$, $y=0,1,2,3$, $y\le x$.

To compute the conditional expected value, one has to compute the conditional probability function: $$f_{Y|X=1}(y)=\begin{cases}\dfrac{f_{X,Y}(1,y)}{f_X(1)}, & y=0,1\\ 0, & \text{otherwise}\end{cases}=\begin{cases}\dfrac{1}{2}, & y=0,1\\ 0, & \text{otherwise}\end{cases}$$ where $$f_X(1)=\sum_{y=0}^{1}f_{X,Y}(1,y)=\sum_{y=0}^{1}\frac{1}{9}=\frac{2}{9}.$$ Therefore, $$E(Y\mid X=1)=\sum_{y\in D_Y}y\,f_{Y|X=1}(y)=0\times\frac{1}{2}+1\times\frac{1}{2}=\frac{1}{2}.$$
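
The same conditioning step in plain Python (a sketch; the dictionary representation of the pmf is just an illustrative choice):

```python
# Joint pmf: f(x,y) = 1/9 for x in {1,2,3} and y in {0,...,x}.
f = {(x, y): 1 / 9 for x in (1, 2, 3) for y in range(0, x + 1)}

# Marginal f_X(1), then the conditional pmf of Y given X = 1.
fX1 = sum(p for (x, y), p in f.items() if x == 1)   # 2/9
fY_given_X1 = {y: f[(1, y)] / fX1 for y in (0, 1)}  # {0: 0.5, 1: 0.5}

print(sum(y * p for y, p in fY_given_X1.items()))   # E(Y | X=1) = 0.5
```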

Notice that $g(y)=E(X\mid Y=y)$ is indeed a function of $y$. Therefore, $g(Y)$ is a random variable, because $Y$ can take different values according to its distribution; i.e., if $Y$ can take the value $y$, then $g(Y)$ can take the value $g(y)$ with probability $P(Y=y)>0$.

  • Discrete random variables

The random variable $Z=g(Y)=E(X\mid Y)$ takes the values $g(y)=E(X\mid Y=y)$. Assume that all the values $g(y)$ are different. Then $Z$ takes the value $g(y)$ with probability $P(Y=y)$.

In general, the probability function of $Z=g(Y)=E(X\mid Y)$ can be computed in the following way: $$P(Z=z)=P(g(Y)=z)=P(Y\in\{y: g(y)=z\})$$

Example: Let $(X,Y)$ be a discrete random variable such that $f_{X,Y}(x,y)$ is represented in the following table:

| $X\backslash Y$ | 1    | 2    | 3    |
|-----------------|------|------|------|
| 0               | 0.20 | 0.10 | 0.15 |
| 1               | 0.05 | 0.35 | 0.15 |

One may compute the following conditional probability functions: $$f_{Y|X=0}(y)=\begin{cases}4/9, & y=1\\ 2/9, & y=2\\ 3/9, & y=3\\ 0, & \text{otherwise}\end{cases}\qquad\text{and}\qquad f_{Y|X=1}(y)=\begin{cases}1/11, & y=1\\ 7/11, & y=2\\ 3/11, & y=3\\ 0, & \text{otherwise.}\end{cases}$$ Consequently, $E(Y\mid X=0)=17/9$ and $E(Y\mid X=1)=24/11$. Therefore, the random variable $Z=E(Y\mid X)$ has the following probability function: $$P(Z=z)=\begin{cases}P(X=0), & z=17/9\\ P(X=1), & z=24/11\\ 0, & \text{otherwise}\end{cases}=\begin{cases}0.45, & z=17/9\\ 0.55, & z=24/11\\ 0, & \text{otherwise.}\end{cases}$$
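
A sketch of the same construction in plain Python, which also confirms in passing that $E(Z)=E(Y)$ (the law of iterated expectations discussed below):

```python
# Joint pmf from the table: x in {0, 1} (rows), y in {1, 2, 3} (columns).
f = {(0, 1): 0.20, (0, 2): 0.10, (0, 3): 0.15,
     (1, 1): 0.05, (1, 2): 0.35, (1, 3): 0.15}

fX = {x: sum(p for (xx, y), p in f.items() if xx == x) for x in (0, 1)}

# g(x) = E(Y | X=x); the random variable Z = g(X) takes g(x) w.p. f_X(x).
g = {x: sum(y * f[(x, y)] / fX[x] for y in (1, 2, 3)) for x in (0, 1)}
pmf_Z = {g[x]: fX[x] for x in (0, 1)}
print(pmf_Z)  # {1.888...: 0.45, 2.1818...: 0.55}, i.e. z = 17/9 and 24/11

EZ = sum(z * p for z, p in pmf_Z.items())
EY = sum(y * p for (x, y), p in f.items())
print(EZ, EY)  # both 2.05
```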

  • Continuous random variables

The cumulative distribution function of $Z=g(Y)=E(X\mid Y)$ is, indeed, $$F_Z(z)=P(Z\le z)=P(g(Y)\le z)=P(Y\in\{y: g(y)\le z\}).$$ When $g$ is an injective increasing function, we get that $F_Z(z)=F_Y(g^{-1}(z))$, i.e., $F_Z(g(y))=F_Y(y)$.

Therefore, we can calculate all the quantities that we know (the expected value, the variance, ...) for $E(X\mid Y)$ or $E(Y\mid X)$.

Theorem (Law of Iterated Expectations): Let $(X,Y)$ be a two-dimensional random variable. Then $E(Y)=E(E[Y\mid X])$, provided that $E(|Y|)$ is finite, and $E(X)=E(E[X\mid Y])$, provided that $E(|X|)$ is finite.

Remark: This theorem shows that there are two ways to compute $E(Y)$ (resp., $E(X)$). The first is the direct way. The second way is to consider the following steps:

  1. compute $E[Y\mid X=x]$ and notice that this is a function solely of $x$, that is, we can write $g(x)=E[Y\mid X=x]$;

  2. according to the theorem, replacing $g(x)$ by $g(X)$ and taking the mean, we obtain $E[g(X)]=E[Y]$ for this specific form of $g(X)$.

  3. This theorem is useful in practice in the calculation of $E(Y)$ when we know $f_{Y|X=x}(y)$ or $E[Y\mid X=x]$ and $f_X(x)$ (or some moments of $X$), but not $f_{X,Y}(x,y)$.

Remark: The results presented can be generalized to functions of $X$ and $Y$, i.e., $E(u(X,Y))=E(E(u(X,Y)\mid X))$, if $E(u(X,Y))$ exists.

Example: Let $(X,Y)$ be a bi-dimensional continuous random variable such that $$E(X\mid Y=y)=\frac{3y-1}{3}\qquad\text{and}\qquad f_Y(y)=\begin{cases}1/2, & 0<y<2\\ 0, & \text{otherwise.}\end{cases}$$ Taking into account the previous theorem, $$E(X)=E(E(X\mid Y))=E\left(\frac{3Y-1}{3}\right)=\int_0^2\frac{3y-1}{6}\,dy=\frac{2}{3}.$$
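
A one-line numerical confirmation of this computation (a sketch assuming SciPy is available):

```python
from scipy import integrate

# E(X) = E(E(X|Y)) = ∫₀² (3y−1)/3 · f_Y(y) dy with f_Y(y) = 1/2 on (0,2).
EX, _ = integrate.quad(lambda y: (3 * y - 1) / 3 * 0.5, 0, 2)
print(EX)  # ≈ 0.6667 = 2/3
```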

Theorem (Law of Total Variance): Assuming that $E(Y^2)$ exists, then $$\operatorname{Var}(Y)=\operatorname{Var}[E(Y\mid X)]+E[\operatorname{Var}(Y\mid X)].$$

Theorem: Let $X$ and $Y$ be two random variables. Then $$\operatorname{Cov}(X,Y)=\operatorname{Cov}(X,E(Y\mid X)).$$

Example: Let $(X,Y)$ be a bidimensional random variable such that $$f_{X|Y=y}(x)=\frac{1}{y},\quad 0<x<y\ \ (\text{for a fixed } y>1),\qquad\text{and}\qquad f_Y(y)=3y^{-4},\quad y>1.$$ Compute $\operatorname{Var}(X)$ using the previous theorem.
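
One way to carry out the suggested computation numerically (a sketch, not the official solution): for a Uniform$(0,y)$ conditional law, $E(X\mid Y=y)=y/2$ and $\operatorname{Var}(X\mid Y=y)=y^2/12$ are standard facts, so only moments of $Y$ are needed.

```python
import numpy as np
from scipy import integrate

# f_Y(y) = 3 y^(-4) on (1, ∞); given Y = y, X ~ Uniform(0, y).
fY = lambda y: 3 * y**-4

EY, _  = integrate.quad(lambda y: y * fY(y), 1, np.inf)     # E(Y)  = 3/2
EY2, _ = integrate.quad(lambda y: y**2 * fY(y), 1, np.inf)  # E(Y²) = 3

var_EXY = EY2 / 4 - (EY / 2) ** 2  # Var(E(X|Y)) = Var(Y/2)
E_varXY = EY2 / 12                 # E(Var(X|Y)) = E(Y²/12)
print(var_EXY + E_varXY)           # Var(X) = 3/16 + 1/4 = 7/16
```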

Exam question: Let $X$ and $Y$ be two random variables such that $E(X\mid Y=y)=y$, for all $y$ such that $f_Y(y)>0$. Prove that $\operatorname{Cov}(X,Y)=\operatorname{Var}(Y)$. Are the random variables independent? Justify your answer.
