Let (X,Y) be a two-dimensional random variable and D_{(X,Y)} the set of points of discontinuity of the joint cumulative distribution function F_{X,Y}(x,y).
Definition: Let g(X,Y) be a function of the two-dimensional random variable (X,Y). Then, the expected value of g(X,Y) is given by:
If (X,Y) is a two-dimensional discrete random variable:
E[g(X,Y)] = \sum_{(x,y) \in D_{(X,Y)}} g(x,y) f_{X,Y}(x,y),
provided that
\sum_{(x,y) \in D_{(X,Y)}} |g(x,y)| f_{X,Y}(x,y) < +\infty.
If (X,Y) is a two-dimensional continuous random variable:
E[g(X,Y)] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} g(x,y) f_{X,Y}(x,y) \, dx \, dy,
provided that
\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} |g(x,y)| f_{X,Y}(x,y) \, dx \, dy < +\infty.
Example 1: Let (X,Y) be a continuous bidimensional random variable such that
f_{X,Y}(x,y) = \begin{cases} x, & 0<x<1, \ 0<y<2 \\ 0, & \text{otherwise} \end{cases}
Compute the expected value of g(X,Y) = X+Y.
Answer: Using the definition of expected value, one gets
E(X+Y) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (x+y) f_{X,Y}(x,y) \, dx \, dy = \int_{0}^{2} \int_{0}^{1} x(x+y) \, dx \, dy = \int_{0}^{2} \left( \tfrac{1}{3} + \tfrac{y}{2} \right) dy = \tfrac{5}{3}.
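As a quick sanity check (a sketch using Python's sympy, not part of the original notes; the variable names are illustrative), the double integral above can be evaluated symbolically:

import sympy as sp

x, y = sp.symbols('x y')
f = x                                                    # joint density on 0 < x < 1, 0 < y < 2
E_sum = sp.integrate((x + y) * f, (x, 0, 1), (y, 0, 2))  # E(X+Y)
print(E_sum)                                             # 5/3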
Theorem: Let (X,Y) be a discrete two-dimensional random variable with joint probability function f_{X,Y}(x,y):
If g(X,Y) = h(X), that is, g(X,Y) only depends on X, then
E[g(X,Y)] = E[h(X)] = \sum_{(x,y) \in D_{(X,Y)}} h(x) f_{X,Y}(x,y) = \sum_{x \in D_X} h(x) \sum_{y \in D_Y} f_{X,Y}(x,y) = \sum_{x \in D_X} h(x) f_X(x),
provided that
\sum_{(x,y) \in D_{(X,Y)}} |h(x)| f_{X,Y}(x,y) < +\infty.
If g(X,Y) = v(Y), that is, g(X,Y) only depends on Y, then
E[v(Y)] = \sum_{(x,y) \in D_{(X,Y)}} v(y) f_{X,Y}(x,y) = \sum_{y \in D_Y} v(y) \sum_{x \in D_X} f_{X,Y}(x,y) = \sum_{y \in D_Y} v(y) f_Y(y),
provided that
\sum_{(x,y) \in D_{(X,Y)}} |v(y)| f_{X,Y}(x,y) < +\infty.
Example 2: Let (X,Y) be a two-dimensional random variable such that
f_{X,Y}(x,y) = \begin{cases} \tfrac{1}{5}, & x=1,2, \ y=0,1,2, \ y \le x \\ 0, & \text{otherwise.} \end{cases}
Compute the expected value of X.
Solution:
(i) By using the joint probability function:
E(X) = \sum_{(x,y) \in D_{(X,Y)}} x f_{X,Y}(x,y) = \sum_{x=1}^{2} x \sum_{y=0}^{x} \tfrac{1}{5} = \tfrac{8}{5}.
Example 3: Let (X,Y) be a two-dimensional random variable such that
f_{X,Y}(x,y) = \begin{cases} \tfrac{1}{5}, & x=1,2, \ y=0,1,2, \ y \le x \\ 0, & \text{otherwise.} \end{cases}
Compute the expected value of X.
Solution:
(ii) By using the marginal function:
f_X(x) = \sum_{y=0}^{x} f_{X,Y}(x,y) = \begin{cases} \tfrac{2}{5}, & x=1 \\ \tfrac{3}{5}, & x=2 \\ 0, & \text{otherwise.} \end{cases}
Therefore,
E(X) = \sum_{x=1}^{2} x f_X(x) = 1 \times \tfrac{2}{5} + 2 \times \tfrac{3}{5} = \tfrac{8}{5}.
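Both routes (joint and marginal) can be reproduced mechanically; a minimal Python sketch, assuming only the support and probabilities stated in Examples 2 and 3:

from fractions import Fraction

# Support of (X,Y): pairs with x in {1,2}, y in {0,1,2}, y <= x, each with probability 1/5.
support = [(x, y) for x in (1, 2) for y in (0, 1, 2) if y <= x]
p = Fraction(1, 5)

E_X_joint = sum(x * p for x, y in support)                        # via the joint probability function
fX = {x: sum(p for xx, y in support if xx == x) for x in (1, 2)}  # marginal of X: {1: 2/5, 2: 3/5}
E_X_marginal = sum(x * fx for x, fx in fX.items())                # via the marginal
print(E_X_joint, E_X_marginal)                                    # 8/5 8/5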
Theorem: Let (X,Y) be a continuous two-dimensional random variable with joint probability density function f_{X,Y}(x,y):
If g(X,Y) = h(X), that is, g(X,Y) only depends on X, then
E[h(X)] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} h(x) f_{X,Y}(x,y) \, dx \, dy = \int_{-\infty}^{+\infty} h(x) \left( \int_{-\infty}^{+\infty} f_{X,Y}(x,y) \, dy \right) dx = \int_{-\infty}^{+\infty} h(x) f_X(x) \, dx,
provided that
\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} |h(x)| f_{X,Y}(x,y) \, dx \, dy < +\infty.
If g(X,Y) = v(Y), that is, g(X,Y) only depends on Y, then
E[v(Y)] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} v(y) f_{X,Y}(x,y) \, dx \, dy = \int_{-\infty}^{+\infty} v(y) \left( \int_{-\infty}^{+\infty} f_{X,Y}(x,y) \, dx \right) dy = \int_{-\infty}^{+\infty} v(y) f_Y(y) \, dy,
provided that
\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} |v(y)| f_{X,Y}(x,y) \, dx \, dy < +\infty.
Example 4: Let (X,Y) be a continuous bidimensional random variable such that
f_{X,Y}(x,y) = \begin{cases} x, & 0<x<1, \ 0<y<2 \\ 0, & \text{otherwise} \end{cases}
Compute the expected value of 3X+2.
Answer:
(i) Using the joint density function.
Using the definition of expected value, one gets
E(3X+2) = 3E(X) + 2 = 3 \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x f_{X,Y}(x,y) \, dx \, dy + 2 = 3 \int_{0}^{2} \int_{0}^{1} x^2 \, dx \, dy + 2 = 3 \int_{0}^{2} \tfrac{1}{3} \, dy + 2 = 4.
Example 5: Let (X,Y) be a continuous bidimensional random variable such that
f_{X,Y}(x,y) = \begin{cases} x, & 0<x<1, \ 0<y<2 \\ 0, & \text{otherwise} \end{cases}
Compute the expected value of 3X+2.
Answer:
(ii) Using the marginal density function.
The marginal density function of X is given by
f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dy = \begin{cases} 2x, & 0<x<1 \\ 0, & \text{otherwise.} \end{cases}
Therefore, E(3X+2) = 3E(X) + 2 = 4, because
E(X) = \int_{-\infty}^{+\infty} x f_X(x) \, dx = \int_{0}^{1} 2x^2 \, dx = \tfrac{2}{3}.
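Again, both computations can be checked symbolically; a sketch with sympy (the names f_joint and f_X are illustrative, not from the notes):

import sympy as sp

x, y = sp.symbols('x y')
f_joint = x                                                  # joint density on 0 < x < 1, 0 < y < 2
f_X = sp.integrate(f_joint, (y, 0, 2))                       # marginal density of X: 2*x on 0 < x < 1

E_X_joint = sp.integrate(x * f_joint, (x, 0, 1), (y, 0, 2))  # 2/3, via the joint density
E_X_marg = sp.integrate(x * f_X, (x, 0, 1))                  # 2/3, via the marginal density
print(3 * E_X_joint + 2, 3 * E_X_marg + 2)                   # 4 4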
Properties:
E[h(X) + v(Y)] = E[h(X)] + E[v(Y)],
provided that
E[|h(X)|] < +\infty and E[|v(Y)|] < +\infty.
E\left[\sum_{i=1}^{N} X_i\right] = \sum_{i=1}^{N} E[X_i], where N is a finite integer, provided that
E[|X_i|] < +\infty for i = 1, 2, \ldots, N.
Example 6: Let (X,Y) be a continuous bidimensional random variable such that
f_{X,Y}(x,y) = \begin{cases} x, & 0<x<1, \ 0<y<2 \\ 0, & \text{otherwise} \end{cases}
Compute the expected value of Y.
Answer: We know that E(X+Y) = E(X) + E(Y) = \tfrac{5}{3}. Since E(X) = \tfrac{2}{3}, we get that E(Y) = 1.
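The linearity argument can also be cross-checked against the marginal density of Y; a sketch with sympy (the marginal f_Y is computed here, it is not given in the notes):

import sympy as sp

x, y = sp.symbols('x y')
f_Y = sp.integrate(x, (x, 0, 1))         # marginal density of Y: 1/2 on 0 < y < 2
E_Y = sp.integrate(y * f_Y, (y, 0, 2))   # direct computation of E(Y)
print(f_Y, E_Y)                          # 1/2 1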
Definition: The r-th and s-th moment of products about the origin of the random variables X and Y, denoted by \mu'_{r,s}, is the expected value of X^r Y^s, for r = 1, 2, \ldots; s = 1, 2, \ldots, which is given by
if X and Y are discrete random variables:
\mu'_{r,s} = E[X^r Y^s] = \sum_{(x,y) \in D_{(X,Y)}} x^r y^s f_{X,Y}(x,y)
if X and Y are continuous random variables:
\mu'_{r,s} = E[X^r Y^s] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x^r y^s f_{X,Y}(x,y) \, dx \, dy
Remarks:
If r = s = 1, we have \mu'_{1,1} = E[XY].
Cauchy-Schwarz Inequality: For any two random variables X and Y, we have
|E[XY]| \le E[X^2]^{1/2} E[Y^2]^{1/2}, provided that E[|XY|] is finite.
If X and Y are independent random variables, E[h(X) v(Y)] = E[h(X)] E[v(Y)] for any two functions h(X) and v(Y).
[Warning: The reverse is not true.]
If X_1, X_2, \ldots, X_n are independent random variables,
E[X_1 X_2 \cdots X_n] = E(X_1) E(X_2) \cdots E(X_n).
[Warning: The reverse is not true.]
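A small numerical illustration of the last two remarks (a sketch with numpy, not from the notes; the normal samples and the choice W = Z^2 are my own illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Independent X and Y: E[XY] matches E[X]E[Y] up to Monte Carlo error.
X = rng.normal(1.0, 1.0, n)
Y = rng.normal(2.0, 1.0, n)
print(np.mean(X * Y), np.mean(X) * np.mean(Y))   # both close to 2

# The reverse is not true: Z and W = Z**2 satisfy E[ZW] = E[Z]E[W] = 0,
# yet W is completely determined by Z, so they are not independent.
Z = rng.normal(0.0, 1.0, n)
W = Z**2
print(np.mean(Z * W), np.mean(Z) * np.mean(W))   # both close to 0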
Definition: The r-th and s-th moment of products about the mean of the discrete random variables X and Y, denoted by \mu_{r,s}, is the expected value of (X-\mu_X)^r (Y-\mu_Y)^s, for r = 1, 2, \ldots; s = 1, 2, \ldots, which is given by
\mu_{r,s} = E[(X-\mu_X)^r (Y-\mu_Y)^s] = \sum_{(x,y) \in D_{(X,Y)}} (x-\mu_X)^r (y-\mu_Y)^s f_{X,Y}(x,y)
Definition: The r-th and s-th moment of products about the mean of the continuous random variables X and Y, denoted by \mu_{r,s}, for r = 1, 2, \ldots; s = 1, 2, \ldots, is given by
\mu_{r,s} = E[(X-\mu_X)^r (Y-\mu_Y)^s] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (x-\mu_X)^r (y-\mu_Y)^s f_{X,Y}(x,y) \, dx \, dy
The covariance is a measure of the joint variability of two random variables. Formally, it is defined as
Cov(X,Y) = \sigma_{XY} = \mu_{1,1} = E[(X-\mu_X)(Y-\mu_Y)]
How can we interpret the covariance?
When the variables tend to show similar behavior, the covariance is positive:
high (low) values of one variable mainly correspond to high (low) values of the other variable;
When the variables tend to show opposite behavior, the covariance is negative:
high (low) values of one variable mainly correspond to low (high) values of the other;
If there is no linear association, then the covariance will be zero.
Properties:
Cov(X,Y) = E(XY) - E(X)E(Y).
If X and Y are independent, Cov(X,Y) = 0.
If Y = bZ, where b is a constant, Cov(X,Y) = b\,Cov(X,Z).
If Y = V + W, Cov(X,Y) = Cov(X,V) + Cov(X,W).
If Y = b, where b is a constant, Cov(X,Y) = 0.
It follows from the Cauchy-Schwarz Inequality that |Cov(X,Y)| \le \sqrt{Var(X)Var(Y)}.
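For the discrete distribution of Examples 2 and 3, the first property above can be checked directly; a sketch in Python (the support and probabilities are taken from those examples):

from fractions import Fraction

support = [(x, y) for x in (1, 2) for y in (0, 1, 2) if y <= x]
p = Fraction(1, 5)

E_X = sum(x * p for x, y in support)        # 8/5
E_Y = sum(y * p for x, y in support)        # 4/5
E_XY = sum(x * y * p for x, y in support)   # 7/5
print(E_XY - E_X * E_Y)                     # Cov(X,Y) = 3/25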
The covariance has the inconvenience of depending on the scale of both
random variables. For what values of the covariance can we say that
there is a strong association between the two random variables?
The correlation coefficient is a measure of the joint variability of two random variables that does not depend on the scale:
\rho_{X,Y} = \frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}.
Properties:
It follows from the Cauchy-Schwarz Inequality that -1 \le \rho_{X,Y} \le 1.
If Y = bX + a, where b and a are constants:
\rho_{X,Y} = 1 if b > 0;
\rho_{X,Y} = -1 if b < 0;
if b = 0, \rho_{X,Y} is not defined.
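A brief numerical illustration of these properties (a sketch with numpy; the simulated variables are my own illustrative assumptions, not from the notes):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, 100_000)

print(np.corrcoef(X, 3 * X + 2)[0, 1])    # exact linear relation with b > 0: correlation 1
print(np.corrcoef(X, -3 * X + 2)[0, 1])   # exact linear relation with b < 0: correlation -1

# Rescaling either variable changes the covariance but not the correlation.
Y = X + rng.normal(0.0, 1.0, 100_000)
print(np.cov(X, Y)[0, 1], np.cov(X, 100 * Y)[0, 1])            # covariance scales with Y
print(np.corrcoef(X, Y)[0, 1], np.corrcoef(X, 100 * Y)[0, 1])  # correlation does not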
Summary of important results:
If Y = V \pm W, then Var(Y) = Var(V) + Var(W) \pm 2Cov(V,W).
If X_1, \ldots, X_n are random variables, a_1, \ldots, a_n are constants, and Y = \sum_{i=1}^{n} a_i X_i, then
Var(Y) = \sum_{i=1}^{n} a_i^2 Var(X_i) + 2 \sum_{i=1}^{n} \sum_{j=1, \, j<i}^{n} a_i a_j Cov(X_i, X_j),
where Cov(X_i, X_j) = 0 if X_i and X_j are independent.
If X_1, \ldots, X_n are random variables, a_1, \ldots, a_n and b_1, \ldots, b_n are constants, Y_1 = \sum_{i=1}^{n} a_i X_i and Y_2 = \sum_{i=1}^{n} b_i X_i, then
Cov(Y_1, Y_2) = \sum_{i=1}^{n} a_i b_i Var(X_i) + \sum_{i=1}^{n} \sum_{j=1, \, j<i}^{n} (a_i b_j + a_j b_i) Cov(X_i, X_j),
where Cov(X_i, X_j) = 0 if X_i and X_j are independent.
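These two identities can be verified numerically for a small set of dependent variables; a sketch with simulated data (all names and distributions are illustrative assumptions). The quadratic forms a @ S @ a and a @ S @ b expand exactly into the double sums displayed above:

import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Three (dependent) variables X1, X2, X3 and two linear combinations Y1, Y2.
X1 = rng.normal(0, 1, n)
X2 = X1 + rng.normal(0, 1, n)
X3 = rng.normal(0, 2, n)
X = np.vstack([X1, X2, X3])
a = np.array([1.0, -2.0, 0.5])
b = np.array([3.0, 1.0, -1.0])
Y1, Y2 = a @ X, b @ X

S = np.cov(X)                            # sample covariance matrix of (X1, X2, X3)
print(np.var(Y1, ddof=1), a @ S @ a)     # Var(Y1) vs the variance formula
print(np.cov(Y1, Y2)[0, 1], a @ S @ b)   # Cov(Y1, Y2) vs the covariance formula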
Definition: Let (X,Y) be a two-dimensional random variable and u(Y,X) a function of Y and X. Then, the conditional expectation of u(Y,X) given X = x is given by
if X and Y are discrete random variables:
E[u(Y,X) \mid X=x] = \sum_{y \in D_Y} u(y,x) f_{Y|X=x}(y),
where D_Y is the set of discontinuity points of F_Y(y) and f_{Y|X=x}(y) is the value of the conditional probability function of Y given X = x at y;
if X and Y are continuous random variables:
E[u(Y,X) \mid X=x] = \int_{-\infty}^{+\infty} u(y,x) f_{Y|X=x}(y) \, dy,
where f_{Y|X=x}(y) is the value of the conditional probability density function of Y given X = x at y;
provided that the expected values exist and are finite.
Remarks:
If u(Y,X) = Y, then we have the conditional mean of Y: E[u(Y,X) \mid X=x] = E[Y \mid X=x] = \mu_{Y|x} (notice that this is a function of x).
If u(Y,X) = (Y - \mu_{Y|x})^2, then we have the conditional variance of Y: E[u(Y,X) \mid X=x] = E[(Y - \mu_{Y|x})^2 \mid X=x] = E[(Y - E[Y \mid X=x])^2 \mid X=x] = Var[Y \mid X=x].
As usual, Var[Y \mid X=x] = E[Y^2 \mid X=x] - (E[Y \mid X=x])^2.
If Y and X are independent, E(Y \mid X=x) = E(Y).
Of course we can reverse the roles of Y and X, that is, we can compute E(u(X,Y) \mid Y=y) using definitions similar to those above.
Example: Let (X,Y) be a two-dimensional random variable such that
f_{X,Y}(x,y) = \begin{cases} 1/2, & 0<x<2, \ 0<y<x \\ 0, & \text{otherwise.} \end{cases}
Then the conditional density function of Y \mid X=1 is given by
f_{Y|X=1}(y) = \begin{cases} \frac{f_{X,Y}(1,y)}{f_X(1)}, & 0<y<1 \\ 0, & \text{otherwise} \end{cases} = \begin{cases} \frac{1/2}{1/2}, & 0<y<1 \\ 0, & \text{otherwise} \end{cases} = \begin{cases} 1, & 0<y<1 \\ 0, & \text{otherwise,} \end{cases}
where
f_X(x) = \begin{cases} \int_{0}^{x} f_{X,Y}(x,y) \, dy, & 0<x<2 \\ 0, & \text{otherwise} \end{cases} = \begin{cases} \frac{x}{2}, & 0<x<2 \\ 0, & \text{otherwise.} \end{cases}
Example: The conditional expected value can be computed as follows:
E(Y \mid X=1) = \int_{0}^{1} y f_{Y|X=1}(y) \, dy = \int_{0}^{1} y \, dy = \tfrac{1}{2}.
To compute the conditional variance, one may start by computing the following conditional expected value:
E(Y^2 \mid X=1) = \int_{0}^{1} y^2 f_{Y|X=1}(y) \, dy = \int_{0}^{1} y^2 \, dy = \tfrac{1}{3}.
Therefore,
Var(Y \mid X=1) = E(Y^2 \mid X=1) - (E(Y \mid X=1))^2 = \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12}.
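A short symbolic check of this example (a sketch with sympy; f_cond is my name for the conditional density found above):

import sympy as sp

y = sp.symbols('y')
f_cond = sp.Integer(1)                           # f_{Y|X=1}(y) = 1 on 0 < y < 1
E_Y = sp.integrate(y * f_cond, (y, 0, 1))        # 1/2
E_Y2 = sp.integrate(y**2 * f_cond, (y, 0, 1))    # 1/3
print(E_Y, E_Y2 - E_Y**2)                        # 1/2 1/12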
Example: Let X and Y be two random variables such that
f_{X,Y}(x,y) = \tfrac{1}{9}, for x = 1, 2, 3, \ y = 0, 1, 2, 3, \ y \le x.
To compute the conditional expected value, one has to compute the conditional probability function:
f_{Y|X=1}(y) = \begin{cases} \frac{f_{X,Y}(1,y)}{f_X(1)}, & y=0,1 \\ 0, & \text{otherwise} \end{cases} = \begin{cases} \tfrac{1}{2}, & y=0,1 \\ 0, & \text{otherwise,} \end{cases}
where
f_X(1) = \sum_{y=0}^{1} f_{X,Y}(1,y) = \sum_{y=0}^{1} \tfrac{1}{9} = \tfrac{2}{9}.
Therefore,
E(Y \mid X=1) = \sum_{y \in D_Y} y f_{Y|X=1}(y) = 0 \times \tfrac{1}{2} + 1 \times \tfrac{1}{2} = \tfrac{1}{2}.
Notice that g(y)=E(X|Y=y) is indeed a function of y. Therefore,
g(Y) is a random variable because Y can take different values
according to its distribution, i.e., if Y can take the value y, then g(Y) can take the value g(y) with probability P(Y=y) > 0.
Discrete random variables
The random variable Z=g(Y)=E(X|Y) takes the values
g(y)=E(X|Y=y). Assume that all values of g(y) are
different. Then,
Z takes the value g(y) with probability P(Y=y)
In general, the probability function of Z=g(Y)=E(X|Y) can be
computed in the following way
P(Z=z)=P(g(Y)=z)=P(Y∈{y:g(y)=z})
Example: Let (X,Y) be a discrete random variable such that f_{X,Y}(x,y) is represented in the following table:

X \ Y      1       2       3
0          0.20    0.10    0.15
1          0.05    0.35    0.15
One may compute the following conditional probability functions:
f_{Y|X=0}(y) = \begin{cases} 4/9, & y=1 \\ 2/9, & y=2 \\ 3/9, & y=3 \\ 0, & \text{otherwise} \end{cases} \quad \text{and} \quad f_{Y|X=1}(y) = \begin{cases} 1/11, & y=1 \\ 7/11, & y=2 \\ 3/11, & y=3 \\ 0, & \text{otherwise.} \end{cases}
Consequently, E(Y|X=0) = 17/9 and E(Y|X=1) = 24/11. Therefore, the random variable Z = E(Y|X) has the following probability function
P(Z=z) = \begin{cases} P(X=0), & z=17/9 \\ P(X=1), & z=24/11 \\ 0, & \text{otherwise} \end{cases} = \begin{cases} 0.45, & z=17/9 \\ 0.55, & z=24/11 \\ 0, & \text{otherwise.} \end{cases}
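The whole example can be reproduced mechanically from the table; a sketch in Python (the dictionary f simply encodes the joint probabilities above, the other names are illustrative):

f = {(0, 1): 0.20, (0, 2): 0.10, (0, 3): 0.15,
     (1, 1): 0.05, (1, 2): 0.35, (1, 3): 0.15}   # keys are (x, y)

fX = {x: sum(p for (xx, y), p in f.items() if xx == x) for x in (0, 1)}
E_Y_given = {x: sum(y * p / fX[x] for (xx, y), p in f.items() if xx == x) for x in (0, 1)}
print(fX)          # {0: 0.45, 1: 0.55}
print(E_Y_given)   # {0: 1.888... (= 17/9), 1: 2.1818... (= 24/11)}

# Z = E(Y|X) takes the value E(Y|X=x) with probability P(X=x).
dist_Z = {E_Y_given[x]: fX[x] for x in (0, 1)}
E_Z = sum(z * p for z, p in dist_Z.items())
E_Y = sum(y * p for (x, y), p in f.items())
print(E_Z, E_Y)    # both 2.05, anticipating the Law of Iterated Expectations below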
Continuous random variables
The cumulative distribution function of Z = g(Y) = E(X|Y) is, indeed,
F_Z(z) = P(Z \le z) = P(g(Y) \le z) = P(Y \in \{y : g(y) \le z\}).
When g is an injective and increasing function, we get that F_Z(z) = F_Y(g^{-1}(z)), or equivalently F_Z(g(y)) = F_Y(y).
Therefore, we can calculate all the quantities that we know (the expected value, variance, ...) for E(X|Y) or E(Y|X).
Theorem (Law of Iterated Expectations): Let (X,Y) be a two-dimensional random variable. Then, E(Y) = E(E[Y|X]) provided that E(|Y|) is finite, and E(X) = E(E[X|Y]) provided that E(|X|) is finite.
Remark: This theorem shows that there are two ways to compute
E(Y) (resp., E(X)). The first is the direct way. The second way is
to consider the following steps:
compute E[Y|X=x] and notice that this is a function solely of x, that is, we can write g(x) = E[Y|X=x];
according to the theorem, replacing g(x) by g(X) and taking the mean, we obtain E[g(X)] = E[Y] for this specific form of g(X).
This theorem is useful in practice in the calculation of E(Y) if we know f_{Y|X=x}(y) or E[Y|X=x] and f_X(x) (or some moments of X), but not f_{X,Y}(x,y).
Remarks: The results presented can be generalized to functions of X and Y, i.e., E(u(X,Y)) = E(E(u(X,Y)|X)), if E(u(X,Y)) exists.
Example: Let (X,Y) be a bi-dimensional continuous random variable
such that
E(X|Y=y) = \frac{3y-1}{3} \quad \text{and} \quad f_Y(y) = \begin{cases} 1/2, & 0<y<2 \\ 0, & \text{otherwise} \end{cases}
Taking into account the previous theorem,
E(X) = E(E(X|Y)) = E\left(\frac{3Y-1}{3}\right) = \int_{0}^{2} \frac{3y-1}{6} \, dy = \tfrac{2}{3}.
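A one-line symbolic check of this computation (a sketch with sympy; the names are illustrative):

import sympy as sp

y = sp.symbols('y')
E_X_given_y = (3 * y - 1) / 3
f_Y = sp.Rational(1, 2)                            # density of Y on 0 < y < 2
print(sp.integrate(E_X_given_y * f_Y, (y, 0, 2)))  # 2/3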
Theorem: Assuming that E(Y^2) exists, then
Var(Y) = Var[E(Y|X)] + E[Var(Y|X)].
Theorem: Let X and Y be two random variables. Then
Cov(X,Y) = Cov(X, E(Y|X)).
Example: Let (X,Y) be a bidimensional random variable such that
f_{X|Y=y}(x) = \frac{1}{y}, \ 0<x<y \ (\text{for a fixed } y>1), \qquad f_Y(y) = 3y^{-4}, \ y>1.
Compute Var(X) using the previous theorem.
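The notes leave this computation to the reader. A sketch of one way to carry it out with sympy, using the facts that X|Y=y is uniform on (0,y), hence E(X|Y) = Y/2 and Var(X|Y) = Y^2/12 (these follow from the given conditional density; the final value printed below is my own calculation):

import sympy as sp

y = sp.symbols('y', positive=True)
f_Y = 3 * y**(-4)                                # density of Y on y > 1

E_Y = sp.integrate(y * f_Y, (y, 1, sp.oo))       # 3/2
E_Y2 = sp.integrate(y**2 * f_Y, (y, 1, sp.oo))   # 3
Var_Y = E_Y2 - E_Y**2                            # 3/4

# Var(X) = Var(E(X|Y)) + E(Var(X|Y)) = Var(Y/2) + E(Y^2/12)
Var_X = Var_Y / 4 + E_Y2 / 12
print(sp.simplify(Var_X))                        # 7/16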
Exam question: Let X and Y be two random variables such that
E(X|Y=y)=y,
for all y such that fY(y)>0. Prove that
Cov(X,Y)=Var(Y). Are the random variables independent? Justify your
answer.