Let (X,Y) be a two-dimensional random variable and D_{(X,Y)} the set of points of discontinuity of the joint cumulative distribution function F_{X,Y}(x,y).
Definition: Let g(X,Y) be a function of the two-dimensional random variable (X,Y). Then, the expected value of g(X,Y) is given by:
If (X,Y) is a two-dimensional discrete random variable:
E[g(X,Y)] = \sum_{(x,y) \in D_{(X,Y)}} g(x,y) f_{X,Y}(x,y),
provided that
\sum_{(x,y) \in D_{(X,Y)}} |g(x,y)| f_{X,Y}(x,y) < +\infty.
If (X,Y) is a two-dimensional continuous random variable:
E[g(X,Y)] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} g(x,y) f_{X,Y}(x,y) \, dx \, dy,
provided that
\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} |g(x,y)| f_{X,Y}(x,y) \, dx \, dy < +\infty.
Example 1: Let (X,Y) be a continuous bidimensional random variable such that
f_{X,Y}(x,y) = \begin{cases} x, & 0<x<1, \ 0<y<2 \\ 0, & \text{otherwise} \end{cases}
Compute the expected value of g(X,Y) = X+Y.
Answer: Using the definition of expected value, one gets
E(X+Y) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (x+y) f_{X,Y}(x,y) \, dx \, dy = \int_{0}^{2} \int_{0}^{1} x(x+y) \, dx \, dy = \int_{0}^{2} \left( \tfrac{1}{3} + \tfrac{y}{2} \right) dy = \tfrac{5}{3}.
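As a quick sanity check (a sketch using Python's sympy, not part of the original notes; the variable names are illustrative), the double integral above can be evaluated symbolically:

import sympy as sp

x, y = sp.symbols('x y')
f = x                                                    # joint density on 0 < x < 1, 0 < y < 2
E_sum = sp.integrate((x + y) * f, (x, 0, 1), (y, 0, 2))  # E(X+Y)
print(E_sum)                                             # 5/3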
Theorem: Let (X,Y) be a discrete two-dimensional random variable with joint probability function f_{X,Y}(x,y):
If g(X,Y) = h(X), that is, g(X,Y) only depends on X, then
E[g(X,Y)] = E[h(X)] = \sum_{(x,y) \in D_{(X,Y)}} h(x) f_{X,Y}(x,y) = \sum_{x \in D_X} h(x) \sum_{y \in D_Y} f_{X,Y}(x,y) = \sum_{x \in D_X} h(x) f_X(x),
provided that
\sum_{(x,y) \in D_{(X,Y)}} |h(x)| f_{X,Y}(x,y) < +\infty.
If g(X,Y) = v(Y), that is, g(X,Y) only depends on Y, then
E[v(Y)] = \sum_{(x,y) \in D_{(X,Y)}} v(y) f_{X,Y}(x,y) = \sum_{y \in D_Y} v(y) \sum_{x \in D_X} f_{X,Y}(x,y) = \sum_{y \in D_Y} v(y) f_Y(y),
provided that
\sum_{(x,y) \in D_{(X,Y)}} |v(y)| f_{X,Y}(x,y) < +\infty.
Example 2: Let (X,Y) be a two-dimensional random variable such that
f_{X,Y}(x,y) = \begin{cases} \tfrac{1}{5}, & x=1,2, \ y=0,1,2, \ y \le x \\ 0, & \text{otherwise.} \end{cases}
Compute the expected value of X.
Solution:
(i) By using the joint probability function:
E(X) = \sum_{(x,y) \in D_{(X,Y)}} x f_{X,Y}(x,y) = \sum_{x=1}^{2} x \sum_{y=0}^{x} \tfrac{1}{5} = \tfrac{8}{5}.
Example 3: Let (X,Y) be a two-dimensional random variable such that
f_{X,Y}(x,y) = \begin{cases} \tfrac{1}{5}, & x=1,2, \ y=0,1,2, \ y \le x \\ 0, & \text{otherwise.} \end{cases}
Compute the expected value of X.
Solution:
(ii) By using the marginal function:
f_X(x) = \sum_{y=0}^{x} f_{X,Y}(x,y) = \begin{cases} \tfrac{2}{5}, & x=1 \\ \tfrac{3}{5}, & x=2 \\ 0, & \text{otherwise.} \end{cases}
Therefore,
E(X) = \sum_{x=1}^{2} x f_X(x) = 1 \times \tfrac{2}{5} + 2 \times \tfrac{3}{5} = \tfrac{8}{5}.
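Both routes (joint and marginal) can be reproduced mechanically; a minimal Python sketch, assuming only the support and probabilities stated in Examples 2 and 3:

from fractions import Fraction

# Support of (X,Y): pairs with x in {1,2}, y in {0,1,2}, y <= x, each with probability 1/5.
support = [(x, y) for x in (1, 2) for y in (0, 1, 2) if y <= x]
p = Fraction(1, 5)

E_X_joint = sum(x * p for x, y in support)                        # via the joint probability function
fX = {x: sum(p for xx, y in support if xx == x) for x in (1, 2)}  # marginal of X: {1: 2/5, 2: 3/5}
E_X_marginal = sum(x * fx for x, fx in fX.items())                # via the marginal
print(E_X_joint, E_X_marginal)                                    # 8/5 8/5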
Theorem: Let (X,Y) be a continuous two-dimensional random variable with joint probability density function f_{X,Y}(x,y):
If g(X,Y) = h(X), that is, g(X,Y) only depends on X, then
E[h(X)] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} h(x) f_{X,Y}(x,y) \, dx \, dy = \int_{-\infty}^{+\infty} h(x) \left( \int_{-\infty}^{+\infty} f_{X,Y}(x,y) \, dy \right) dx = \int_{-\infty}^{+\infty} h(x) f_X(x) \, dx,
provided that
\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} |h(x)| f_{X,Y}(x,y) \, dx \, dy < +\infty.
If g(X,Y) = v(Y), that is, g(X,Y) only depends on Y, then
E[v(Y)] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} v(y) f_{X,Y}(x,y) \, dx \, dy = \int_{-\infty}^{+\infty} v(y) \left( \int_{-\infty}^{+\infty} f_{X,Y}(x,y) \, dx \right) dy = \int_{-\infty}^{+\infty} v(y) f_Y(y) \, dy,
provided that
\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} |v(y)| f_{X,Y}(x,y) \, dx \, dy < +\infty.
Example 4: Let (X,Y) be a continuous bidimensional random variable such that
f_{X,Y}(x,y) = \begin{cases} x, & 0<x<1, \ 0<y<2 \\ 0, & \text{otherwise} \end{cases}
Compute the expected value of 3X+2.
Answer:
(i) Using the joint density function.
Using the definition of expected value, one gets
E(3X+2) = 3E(X) + 2 = 3 \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x f_{X,Y}(x,y) \, dx \, dy + 2 = 3 \int_{0}^{2} \int_{0}^{1} x^2 \, dx \, dy + 2 = 3 \int_{0}^{2} \tfrac{1}{3} \, dy + 2 = 4.
Example 5: Let (X,Y) be a continuous bidimensional random variable such that
f_{X,Y}(x,y) = \begin{cases} x, & 0<x<1, \ 0<y<2 \\ 0, & \text{otherwise} \end{cases}
Compute the expected value of 3X+2.
Answer:
(ii) Using the marginal density function.
The marginal density function of X is given by
f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dy = \begin{cases} 2x, & 0<x<1 \\ 0, & \text{otherwise.} \end{cases}
Therefore, E(3X+2) = 3E(X) + 2 = 4, because
E(X) = \int_{-\infty}^{+\infty} x f_X(x) \, dx = \int_{0}^{1} 2x^2 \, dx = \tfrac{2}{3}.
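Again, both computations can be checked symbolically; a sketch with sympy (the names f_joint and f_X are illustrative, not from the notes):

import sympy as sp

x, y = sp.symbols('x y')
f_joint = x                                                  # joint density on 0 < x < 1, 0 < y < 2
f_X = sp.integrate(f_joint, (y, 0, 2))                       # marginal density of X: 2*x on 0 < x < 1

E_X_joint = sp.integrate(x * f_joint, (x, 0, 1), (y, 0, 2))  # 2/3, via the joint density
E_X_marg = sp.integrate(x * f_X, (x, 0, 1))                  # 2/3, via the marginal density
print(3 * E_X_joint + 2, 3 * E_X_marg + 2)                   # 4 4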
Properties:
E[h(X) + v(Y)] = E[h(X)] + E[v(Y)],
provided that
E[|h(X)|] < +\infty and E[|v(Y)|] < +\infty.
E\left[\sum_{i=1}^{N} X_i\right] = \sum_{i=1}^{N} E[X_i], where N is a finite integer, provided that
E[|X_i|] < +\infty for i = 1, 2, \ldots, N.
Example 6: Let (X,Y) be a continuous bidimensional random variable such that
f_{X,Y}(x,y) = \begin{cases} x, & 0<x<1, \ 0<y<2 \\ 0, & \text{otherwise} \end{cases}
Compute the expected value of Y.
Answer: We know that E(X+Y) = E(X) + E(Y) = \tfrac{5}{3}. Since E(X) = \tfrac{2}{3}, we get that E(Y) = 1.
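The linearity argument can also be cross-checked against the marginal density of Y; a sketch with sympy (the marginal f_Y is computed here, it is not given in the notes):

import sympy as sp

x, y = sp.symbols('x y')
f_Y = sp.integrate(x, (x, 0, 1))         # marginal density of Y: 1/2 on 0 < y < 2
E_Y = sp.integrate(y * f_Y, (y, 0, 2))   # direct computation of E(Y)
print(f_Y, E_Y)                          # 1/2 1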
Definition: The r-th and s-th moment of products about the origin of the random variables X and Y, denoted by \mu'_{r,s}, is the expected value of X^r Y^s, for r = 1, 2, \ldots; s = 1, 2, \ldots, which is given by
if X and Y are discrete random variables:
\mu'_{r,s} = E[X^r Y^s] = \sum_{(x,y) \in D_{(X,Y)}} x^r y^s f_{X,Y}(x,y)
if X and Y are continuous random variables:
\mu'_{r,s} = E[X^r Y^s] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x^r y^s f_{X,Y}(x,y) \, dx \, dy
Remarks:
If r = s = 1, we have \mu'_{1,1} = E[XY].
Cauchy-Schwarz Inequality: For any two random variables X and Y, we have
|E[XY]| \le E[X^2]^{1/2} E[Y^2]^{1/2}, provided that E[|XY|] is finite.
If X and Y are independent random variables, E[h(X) v(Y)] = E[h(X)] E[v(Y)] for any two functions h(X) and v(Y).
[Warning: The reverse is not true.]
If X_1, X_2, \ldots, X_n are independent random variables,
E[X_1 X_2 \cdots X_n] = E(X_1) E(X_2) \cdots E(X_n).
[Warning: The reverse is not true.]
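A small numerical illustration of the last two remarks (a sketch with numpy, not from the notes; the normal samples and the choice W = Z^2 are my own illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Independent X and Y: E[XY] matches E[X]E[Y] up to Monte Carlo error.
X = rng.normal(1.0, 1.0, n)
Y = rng.normal(2.0, 1.0, n)
print(np.mean(X * Y), np.mean(X) * np.mean(Y))   # both close to 2

# The reverse is not true: Z and W = Z**2 satisfy E[ZW] = E[Z]E[W] = 0,
# yet W is completely determined by Z, so they are not independent.
Z = rng.normal(0.0, 1.0, n)
W = Z**2
print(np.mean(Z * W), np.mean(Z) * np.mean(W))   # both close to 0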
Definition: The r-th and s-th moment of products about the mean of the discrete random variables X and Y, denoted by \mu_{r,s}, is the expected value of (X-\mu_X)^r (Y-\mu_Y)^s, for r = 1, 2, \ldots; s = 1, 2, \ldots, which is given by
\mu_{r,s} = E[(X-\mu_X)^r (Y-\mu_Y)^s] = \sum_{(x,y) \in D_{(X,Y)}} (x-\mu_X)^r (y-\mu_Y)^s f_{X,Y}(x,y)
Definition: The r-th and s-th moment of products about the mean of the continuous random variables X and Y, denoted by \mu_{r,s}, for r = 1, 2, \ldots; s = 1, 2, \ldots, is given by
\mu_{r,s} = E[(X-\mu_X)^r (Y-\mu_Y)^s] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (x-\mu_X)^r (y-\mu_Y)^s f_{X,Y}(x,y) \, dx \, dy
The covariance is a measure of the joint variability of two random variables. Formally, it is defined as
Cov(X,Y) = \sigma_{XY} = \mu_{1,1} = E[(X-\mu_X)(Y-\mu_Y)]
How can we interpret the covariance?
When the variables tend to show similar behavior, the covariance is positive:
high (low) values of one variable mainly correspond to high (low) values of the other variable;
When the variables tend to show opposite behavior, the covariance is negative:
high (low) values of one variable mainly correspond to low (high) values of the other;
If there is no linear association, then the covariance will be zero.
Properties:
Cov(X,Y) = E(XY) - E(X)E(Y).
If X and Y are independent, Cov(X,Y) = 0.
If Y = bZ, where b is a constant, Cov(X,Y) = b\,Cov(X,Z).
If Y = V + W, Cov(X,Y) = Cov(X,V) + Cov(X,W).
If Y = b, where b is a constant, Cov(X,Y) = 0.
It follows from the Cauchy-Schwarz Inequality that |Cov(X,Y)| \le \sqrt{Var(X)Var(Y)}.
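For the discrete distribution of Examples 2 and 3, the first property above can be checked directly; a sketch in Python (the support and probabilities are taken from those examples):

from fractions import Fraction

support = [(x, y) for x in (1, 2) for y in (0, 1, 2) if y <= x]
p = Fraction(1, 5)

E_X = sum(x * p for x, y in support)        # 8/5
E_Y = sum(y * p for x, y in support)        # 4/5
E_XY = sum(x * y * p for x, y in support)   # 7/5
print(E_XY - E_X * E_Y)                     # Cov(X,Y) = 3/25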
The covariance has the inconvenience of depending on the scale of both
random variables. For what values of the covariance can we say that
there is a strong association between the two random variables?
The correlation coefficient is a measure of the joint variability of two random variables that does not depend on the scale:
\rho_{X,Y} = \frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}.
Properties:
It follows from the Cauchy-Schwarz Inequality that -1 \le \rho_{X,Y} \le 1.
If Y = bX + a, where b and a are constants:
\rho_{X,Y} = 1 if b > 0;
\rho_{X,Y} = -1 if b < 0;
if b = 0, \rho_{X,Y} is not defined.
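A brief numerical illustration of these properties (a sketch with numpy; the simulated variables are my own illustrative assumptions, not from the notes):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, 100_000)

print(np.corrcoef(X, 3 * X + 2)[0, 1])    # exact linear relation with b > 0: correlation 1
print(np.corrcoef(X, -3 * X + 2)[0, 1])   # exact linear relation with b < 0: correlation -1

# Rescaling either variable changes the covariance but not the correlation.
Y = X + rng.normal(0.0, 1.0, 100_000)
print(np.cov(X, Y)[0, 1], np.cov(X, 100 * Y)[0, 1])            # covariance scales with Y
print(np.corrcoef(X, Y)[0, 1], np.corrcoef(X, 100 * Y)[0, 1])  # correlation does not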
Summary of important results:
If Y = V \pm W, then Var(Y) = Var(V) + Var(W) \pm 2Cov(V,W).
If X_1, \ldots, X_n are random variables, a_1, \ldots, a_n are constants, and Y = \sum_{i=1}^{n} a_i X_i, then
Var(Y) = \sum_{i=1}^{n} a_i^2 Var(X_i) + 2 \sum_{i=1}^{n} \sum_{j=1, \, j<i}^{n} a_i a_j Cov(X_i, X_j),
where Cov(X_i, X_j) = 0 if X_i and X_j are independent.
If X_1, \ldots, X_n are random variables, a_1, \ldots, a_n and b_1, \ldots, b_n are constants, Y_1 = \sum_{i=1}^{n} a_i X_i and Y_2 = \sum_{i=1}^{n} b_i X_i, then
Cov(Y_1, Y_2) = \sum_{i=1}^{n} a_i b_i Var(X_i) + \sum_{i=1}^{n} \sum_{j=1, \, j<i}^{n} (a_i b_j + a_j b_i) Cov(X_i, X_j),
where Cov(X_i, X_j) = 0 if X_i and X_j are independent.
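These two identities can be verified numerically for a small set of dependent variables; a sketch with simulated data (all names and distributions are illustrative assumptions). The quadratic forms a @ S @ a and a @ S @ b expand exactly into the double sums displayed above:

import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Three (dependent) variables X1, X2, X3 and two linear combinations Y1, Y2.
X1 = rng.normal(0, 1, n)
X2 = X1 + rng.normal(0, 1, n)
X3 = rng.normal(0, 2, n)
X = np.vstack([X1, X2, X3])
a = np.array([1.0, -2.0, 0.5])
b = np.array([3.0, 1.0, -1.0])
Y1, Y2 = a @ X, b @ X

S = np.cov(X)                            # sample covariance matrix of (X1, X2, X3)
print(np.var(Y1, ddof=1), a @ S @ a)     # Var(Y1) vs the variance formula
print(np.cov(Y1, Y2)[0, 1], a @ S @ b)   # Cov(Y1, Y2) vs the covariance formula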
Definition: Let (X,Y) be a two-dimensional random variable and u(Y,X) a function of Y and X. Then, the conditional expectation of u(Y,X) given X = x is given by
if X and Y are discrete random variables:
E[u(Y,X) \mid X=x] = \sum_{y \in D_Y} u(y,x) f_{Y|X=x}(y),
where D_Y is the set of discontinuity points of F_Y(y) and f_{Y|X=x}(y) is the value of the conditional probability function of Y given X = x at y;
if X and Y are continuous random variables:
E[u(Y,X) \mid X=x] = \int_{-\infty}^{+\infty} u(y,x) f_{Y|X=x}(y) \, dy,
where f_{Y|X=x}(y) is the value of the conditional probability density function of Y given X = x at y;
provided that the expected values exist and are finite.
Remarks:
If u(Y,X) = Y, then we have the conditional mean of Y: E[u(Y,X) \mid X=x] = E[Y \mid X=x] = \mu_{Y|x} (notice that this is a function of x).
If u(Y,X) = (Y - \mu_{Y|x})^2, then we have the conditional variance of Y: E[u(Y,X) \mid X=x] = E[(Y - \mu_{Y|x})^2 \mid X=x] = E[(Y - E[Y \mid X=x])^2 \mid X=x] = Var[Y \mid X=x].
As usual, Var[Y \mid X=x] = E[Y^2 \mid X=x] - (E[Y \mid X=x])^2.
If Y and X are independent, E(Y \mid X=x) = E(Y).
Of course we can reverse the roles of Y and X, that is, we can compute E(u(X,Y) \mid Y=y) using definitions similar to those above.
Example: Let (X,Y) be a two-dimensional random variable such that
f_{X,Y}(x,y) = \begin{cases} 1/2, & 0<x<2, \ 0<y<x \\ 0, & \text{otherwise.} \end{cases}
Then the conditional density function of Y \mid X=1 is given by
f_{Y|X=1}(y) = \begin{cases} \frac{f_{X,Y}(1,y)}{f_X(1)}, & 0<y<1 \\ 0, & \text{otherwise} \end{cases} = \begin{cases} \frac{1/2}{1/2}, & 0<y<1 \\ 0, & \text{otherwise} \end{cases} = \begin{cases} 1, & 0<y<1 \\ 0, & \text{otherwise,} \end{cases}
where
f_X(x) = \begin{cases} \int_{0}^{x} f_{X,Y}(x,y) \, dy, & 0<x<2 \\ 0, & \text{otherwise} \end{cases} = \begin{cases} \frac{x}{2}, & 0<x<2 \\ 0, & \text{otherwise.} \end{cases}
Example: The conditional expected value can be computed as follows:
E(Y \mid X=1) = \int_{0}^{1} y f_{Y|X=1}(y) \, dy = \int_{0}^{1} y \, dy = \tfrac{1}{2}.
To compute the conditional variance, one may start by computing the following conditional expected value:
E(Y^2 \mid X=1) = \int_{0}^{1} y^2 f_{Y|X=1}(y) \, dy = \int_{0}^{1} y^2 \, dy = \tfrac{1}{3}.
Therefore,
Var(Y \mid X=1) = E(Y^2 \mid X=1) - (E(Y \mid X=1))^2 = \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12}.
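A short symbolic check of this example (a sketch with sympy; f_cond is my name for the conditional density found above):

import sympy as sp

y = sp.symbols('y')
f_cond = sp.Integer(1)                           # f_{Y|X=1}(y) = 1 on 0 < y < 1
E_Y = sp.integrate(y * f_cond, (y, 0, 1))        # 1/2
E_Y2 = sp.integrate(y**2 * f_cond, (y, 0, 1))    # 1/3
print(E_Y, E_Y2 - E_Y**2)                        # 1/2 1/12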
Example: Let X and Y be two random variables such that
f_{X,Y}(x,y) = \tfrac{1}{9}, for x = 1, 2, 3, \ y = 0, 1, 2, 3, \ y \le x.
To compute the conditional expected value, one has to compute the conditional probability function:
f_{Y|X=1}(y) = \begin{cases} \frac{f_{X,Y}(1,y)}{f_X(1)}, & y=0,1 \\ 0, & \text{otherwise} \end{cases} = \begin{cases} \tfrac{1}{2}, & y=0,1 \\ 0, & \text{otherwise,} \end{cases}
where
f_X(1) = \sum_{y=0}^{1} f_{X,Y}(1,y) = \sum_{y=0}^{1} \tfrac{1}{9} = \tfrac{2}{9}.
Therefore,
E(Y \mid X=1) = \sum_{y \in D_Y} y f_{Y|X=1}(y) = 0 \times \tfrac{1}{2} + 1 \times \tfrac{1}{2} = \tfrac{1}{2}.
Notice that g(y)=E(X|Y=y) is indeed a function of y. Therefore,
g(Y) is a random variable because Y can take different values
according to its distribution, i.e., if Y can take the value y, then g(Y) can take the value g(y) with probability P(Y=y) > 0.
Discrete random variables
The random variable Z=g(Y)=E(X|Y) takes the values
g(y)=E(X|Y=y). Assume that all values of g(y) are
different. Then,
Z takes the value g(y) with probability P(Y=y)
In general, the probability function of Z=g(Y)=E(X|Y) can be
computed in the following way
P(Z=z)=P(g(Y)=z)=P(Y∈{y:g(y)=z})
Example: Let (X,Y) be a discrete random variable such that f_{X,Y}(x,y) is represented in the following table:

X \ Y      1       2       3
0          0.20    0.10    0.15
1          0.05    0.35    0.15
One may compute the following conditional probability functions:
f_{Y|X=0}(y) = \begin{cases} 4/9, & y=1 \\ 2/9, & y=2 \\ 3/9, & y=3 \\ 0, & \text{otherwise} \end{cases} \quad \text{and} \quad f_{Y|X=1}(y) = \begin{cases} 1/11, & y=1 \\ 7/11, & y=2 \\ 3/11, & y=3 \\ 0, & \text{otherwise.} \end{cases}
Consequently, E(Y|X=0) = 17/9 and E(Y|X=1) = 24/11. Therefore, the random variable Z = E(Y|X) has the following probability function
P(Z=z) = \begin{cases} P(X=0), & z=17/9 \\ P(X=1), & z=24/11 \\ 0, & \text{otherwise} \end{cases} = \begin{cases} 0.45, & z=17/9 \\ 0.55, & z=24/11 \\ 0, & \text{otherwise.} \end{cases}
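The whole example can be reproduced mechanically from the table; a sketch in Python (the dictionary f simply encodes the joint probabilities above, the other names are illustrative):

f = {(0, 1): 0.20, (0, 2): 0.10, (0, 3): 0.15,
     (1, 1): 0.05, (1, 2): 0.35, (1, 3): 0.15}   # keys are (x, y)

fX = {x: sum(p for (xx, y), p in f.items() if xx == x) for x in (0, 1)}
E_Y_given = {x: sum(y * p / fX[x] for (xx, y), p in f.items() if xx == x) for x in (0, 1)}
print(fX)          # {0: 0.45, 1: 0.55}
print(E_Y_given)   # {0: 1.888... (= 17/9), 1: 2.1818... (= 24/11)}

# Z = E(Y|X) takes the value E(Y|X=x) with probability P(X=x).
dist_Z = {E_Y_given[x]: fX[x] for x in (0, 1)}
E_Z = sum(z * p for z, p in dist_Z.items())
E_Y = sum(y * p for (x, y), p in f.items())
print(E_Z, E_Y)    # both 2.05, anticipating the Law of Iterated Expectations below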
Continuous random variables
The cumulative distribution function of Z = g(Y) = E(X|Y) is, indeed,
F_Z(z) = P(Z \le z) = P(g(Y) \le z) = P(Y \in \{y : g(y) \le z\}).
When g is an injective and increasing function, we get that F_Z(z) = F_Y(g^{-1}(z)), or equivalently F_Z(g(y)) = F_Y(y).
Therefore, we can calculate all the quantities that we know (the expected value, variance, ...) for E(X|Y) or E(Y|X).
Theorem (Law of Iterated Expectations): Let (X,Y) be a two-dimensional random variable. Then, E(Y) = E(E[Y|X]) provided that E(|Y|) is finite, and E(X) = E(E[X|Y]) provided that E(|X|) is finite.
Remark: This theorem shows that there are two ways to compute
E(Y) (resp., E(X)). The first is the direct way. The second way is
to consider the following steps:
compute E[Y|X=x] and notice that this is a function solely of x, that is, we can write g(x) = E[Y|X=x];
according to the theorem, replacing g(x) by g(X) and taking the mean, we obtain E[g(X)] = E[Y] for this specific form of g(X).
This theorem is useful in practice in the calculation of E(Y) if we know f_{Y|X=x}(y) or E[Y|X=x] and f_X(x) (or some moments of X), but not f_{X,Y}(x,y).
Remarks: The results presented can be generalized to functions of X and Y, i.e., E(u(X,Y)) = E(E(u(X,Y)|X)), if E(u(X,Y)) exists.
Example: Let (X,Y) be a bi-dimensional continuous random variable
such that
E(X|Y=y) = \frac{3y-1}{3} \quad \text{and} \quad f_Y(y) = \begin{cases} 1/2, & 0<y<2 \\ 0, & \text{otherwise} \end{cases}
Taking into account the previous theorem,
E(X) = E(E(X|Y)) = E\left(\frac{3Y-1}{3}\right) = \int_{0}^{2} \frac{3y-1}{6} \, dy = \tfrac{2}{3}.
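A one-line symbolic check of this computation (a sketch with sympy; the names are illustrative):

import sympy as sp

y = sp.symbols('y')
E_X_given_y = (3 * y - 1) / 3
f_Y = sp.Rational(1, 2)                            # density of Y on 0 < y < 2
print(sp.integrate(E_X_given_y * f_Y, (y, 0, 2)))  # 2/3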
Theorem: Assuming that E(Y^2) exists, then
Var(Y) = Var[E(Y|X)] + E[Var(Y|X)].
Theorem: Let X and Y be two random variables. Then
Cov(X,Y) = Cov(X, E(Y|X)).
Example: Let (X,Y) be a bidimensional random variable such that
f_{X|Y=y}(x) = \frac{1}{y}, \ 0<x<y \ (\text{for a fixed } y>1), \qquad f_Y(y) = 3y^{-4}, \ y>1.
Compute Var(X) using the previous theorem.
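The notes leave this computation to the reader. A sketch of one way to carry it out with sympy, using the facts that X|Y=y is uniform on (0,y), hence E(X|Y) = Y/2 and Var(X|Y) = Y^2/12 (these follow from the given conditional density; the final value printed below is my own calculation):

import sympy as sp

y = sp.symbols('y', positive=True)
f_Y = 3 * y**(-4)                                # density of Y on y > 1

E_Y = sp.integrate(y * f_Y, (y, 1, sp.oo))       # 3/2
E_Y2 = sp.integrate(y**2 * f_Y, (y, 1, sp.oo))   # 3
Var_Y = E_Y2 - E_Y**2                            # 3/4

# Var(X) = Var(E(X|Y)) + E(Var(X|Y)) = Var(Y/2) + E(Y^2/12)
Var_X = Var_Y / 4 + E_Y2 / 12
print(sp.simplify(Var_X))                        # 7/16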
Exam question: Let X and Y be two random variables such that
E(X|Y=y)=y,
for all y such that fY(y)>0. Prove that
Cov(X,Y)=Var(Y). Are the random variables independent? Justify your
answer.