6. Hypothesis Testing
This chapter is strongly related to the chapter about confidence intervals. In the confidence interval chapter, we learned how to combine information about our estimator and its RSD to construct a (possibly approximate) confidence interval for some unknown parameter, as in:
In this chapter, we will answer a slightly different question:
- Suppose that somebody claims to know that (this is called the null-hypothesis).
- How can we use our random sample of data (and our assumptions about the population distribution) to detect if this claim is false?
- Again, we need to include a quantification of uncertainty due to sampling.
We will see that the end-result of step 2 of the construction of a statistical test is a so-called decision rule. The decision rule of a particular test for a particular problem could for instance be:
Reject the null hypothesis that $\mu$ is equal to $\mu_0$ if the sample average $\bar{X}$ is larger than some critical value.
We can then calculate the sample average using our sample of data, and make the decision (this is step 3).
The null-hypothesis $H_0$ (pronounced “H nought”) is the statement that we want to test. Suppose that a guy called Pierre claims to know that . We then have . Testing this null-hypothesis means that we will confront this statement with the information in the data. If it is highly unlikely that we would observe the information in the data that we have observed, under the conditions that are implied by the null hypothesis being true, we reject the null-hypothesis (in other words, “I don’t believe you Pierre”), in favour of the alternative hypothesis, $H_1$. If no such statement can be made, we do not reject the null-hypothesis.
A judge in a court of law tests null-hypotheses all the time: in any respectable democracy, the null hypothesis entertained by the judicial system is “suspect is not guilty”. If there is sufficient evidence to the contrary, the judge rejects this null-hypothesis, and the suspect is sent to jail, say. If the judge does not reject the null-hypothesis, it does not mean that the judicial system claims that the suspect is innocent. All it says is that there was not enough evidence against the validity of the null hypothesis. In statistics, as in law, we never accept the null-hypothesis. We either reject $H_0$ or we do not reject $H_0$.
1 The Significance Level and The Power-function of a Statistical Test
We will discuss a method to construct statistical tests shortly. For now, suppose that we have constructed a test for , and have obtained the following decision rule:
Reject the null hypothesis that $\mu$ is equal to $\mu_0$ if the sample average $\bar{X}$ is larger than some critical value.
We cannot be sure that our decision is correct. We can make two types of errors:
We can reject the null-hypothesis when it is, in fact, true (i.e. in reality $\mu = \mu_0$). This error is called a type-I error. The probability that we make this error is called the level or significance level of the test, and is denoted by $\alpha$. We want this probability to be small. We often take $\alpha = 0.05$ or $\alpha = 0.01$.
We can fail to reject the null-hypothesis when it is, in fact, false (i.e. in reality $\mu \neq \mu_0$). This error is called a type-II error. The probability that we make this error is denoted by $\beta(\mu)$. Note that this is a function of $\mu$: the probability of making a type-II error when the true $\mu$ takes one value differs from the probability of making a type-II error when the true $\mu$ takes another value. The power-function of the test evaluated at $\mu$ is defined as $1-\beta(\mu)$: it is the probability of (correctly) rejecting the null-hypothesis when it is, in fact, false. We want the power to be high (close to one) for each value $\mu \neq \mu_0$. See Example 1 for an example of a power-function.
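To make the significance level concrete, here is a minimal simulation sketch. The Normal(5, 100) population, the sample size of 50 and the use of R's one-sample t-test (introduced later in this chapter) are illustrative assumptions; under the null-hypothesis, a test with level $\alpha = 0.05$ should reject in roughly 5% of repeated samples.
set.seed(1)
# draw many samples for which the null-hypothesis mu = 5 is true, test at the 5% level,
# and record how often we (incorrectly) reject
rejections <- replicate(2000, t.test(rnorm(50, mean=5, sd=10), mu=5)$p.value < 0.05)
mean(rejections)  # should be close to alpha = 0.05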
2 The Relationship Between Tests and Confidence Intervals
There is a one-to-one relationship between hypothesis testing and confidence intervals; suppose that we have obtained the following confidence interval for :
The link with hypothesis testing is as follows:
Suppose that we test $H_0: \mu = \mu_0$, where $\mu_0$ is some known value. All hypothesis tests that use a value of $\mu_0$ that is contained within the $(1-\alpha)$ confidence interval will not lead to a rejection of $H_0$ at the $\alpha$ level.
For instance, from the confidence interval above, we can conclude that would not be rejected. The null hypothesis , however, would be rejected. More about this in Example 1.
3 Statistical Hypothesis Testing
We will now discuss the method that can be used to obtain a statistical test that tests the null-hypothesis that , for some pre-specified value of .
The Method: Testing versus .
Step 1: Find Pivotal Statistic (or: Test statistic) .
Step 2: Obtain the quantiles (called critical values now) from under .
We can now formulate the decision rule.
Step 3: Calculate and compare with critical values (or use the p-value).
The hardest part is step 1. A pivotal statistic (T) is a function of the data (our random sample) alone, given that we know the value of , the value of the parameter that we want to test. This value is called the hypothesised value of . It was in the example above. It turns out that the pivotal statistic is often equal to the pivotal quantity that we needed to obtain confidence intervals. It is a statistic in the current context, because the value of is known (it is ), whereas in confidence intervals it was not known. That is why was called a pivotal quantity, not a pivotal statistic.
The pivotal statistic (or: test statistic) must satisfy the following two conditions:
- The distribution of T should be completely known if the null-hypothesis is true (for instance, or ).
- T itself should not depend on any parameters that are unknown if the null hypothesis is true (i.e. given that the value of is a known constant). It is only allowed to depend on the data (which is the definition of a statistic).
- Condition 1 allows us to perform step 2 of the method.
- Condition 2 allows us to perform step 3 of the method.
The easiest way to clarify the method is to apply it to some examples.
4 Testing or .
For the case that we want to test a null-hypothesis about the mean of a population distribution, we will discuss examples where the population distribution is Bernoulli, Poisson, Normal or Exponential.
4.1 Example 1:
We are considering the situation where we are willing to assume that the population distribution is a member of the family of Normal distributions. We are interested in testing the null-hypothesis .
Step 1: Find a Test Statistic (or: Pivotal Statistic)
The method requires us to find a pivotal statistic T. As this is a question about the mean of the population distribution, it makes sense to consider the sample average (this is the Method of Moments estimator of , as well as the Maximum Likelihood estimator of ). We know that linear combinations of Normals are Normal, so that we know the RSD of :
If the null-hypothesis is true, we obtain:
The quantity is not a pivotal statistic, as the RSD of is not completely known: it depends on the unknown parameter , even given a known value of .
Let’s try standardising with the population-variance:
The quantity on the left-hand side satisfies the first condition of a pivotal statistic: the standard normal distribution is completely known (i.e. has no unknown parameters). The quantity itself does not satisfy the second condition: although it is allowed to depend on the known , it also depends on the unknown parameter , and is hence not a statistic.
We can solve this problem by standardising with the estimated variance (instead of ):
We have obtained a pivotal statistic : itself depends on the sample alone (given that is a known constant if is known to be true) and is hence a statistic. Moreover, if is true, the RSD of T is completely known. We can progress to step two.
Step 2: Obtain the critical values from under .
We need to find the values and such that under (i.e. assuming that the null-hypothesis is true). In other words, we need to find the following:
- : the quantile of the distribution with degrees of freedom.
- : the quantile of the distribution with degrees of freedom.
These are easily found using the statistical tables or by using R. If, for example, we have a sample of size $n = 50$:
qt(0.025, df=49)
## [1] -2.009575
qt(0.975, df=49)
## [1] 2.009575
As the distribution is symmetric, it would have sufficed to calculate only one of the two quantiles.
We have the following decision rule:
Reject the null hypothesis if the observed test-statistic is larger than 2.009575 or smaller than -2.009575.
One-sided Tests: We have been using the two-sided alternative hypothesis in this example. Sometimes we prefer the right-sided (or left-sided) alternative hypothesis ().
In that case, there would be evidence against the null-hypothesis if is a sufficient amount larger than (a sufficient amount smaller than ), which implies that is sufficiently positive (negative).
What constitutes sufficiently positive (negative)? With $\alpha = 0.05$, a value of $T$ larger than the 0.95 quantile (smaller than the 0.05 quantile) of its distribution under $H_0$ would be considered sufficiently unlikely to have occurred under the null hypothesis. In our example, the right-sided alternative hypothesis would therefore have led to the following decision rule:
Reject the null hypothesis if the observed test-statistic is larger than 1.676551.
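For completeness, the one-sided critical value 1.676551 quoted above is the 0.95 quantile of the $t$ distribution with 49 degrees of freedom, mirroring the two-sided calculation:
qt(0.95, df=49)  # 1.676551, the one-sided critical value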
We test against if we are sure that is not smaller than . Likewise, we test against if we are sure that is not larger than .
The link between Hypothesis Testing and Confidence Intervals:
To understand this link, let’s focus on the following true statement (we consider the two-sided test):
- Note that, if we replace by , this probability is true for any value of that is contained in the confidence interval (by step 3 of the confidence interval method).
- Therefore, any hypothesised value that is contained within the confidence interval will not be rejected by the test.
Step 3: Calculate and compare with critical values.
Using a sample of size , we can calculate the values of and . As the value of is known, the pivotal statistic is truly a statistic: it is a function of the random sample alone. Because we have observed a random sample from the population distribution, we can calculate the value that takes using our observed sample: . Suppose that we obtain , using a sample of size .
The argument of a statistical test is as follows:
- In step 2, we have derived that, if is true, our observed test statistic is a draw from the distribution.
- Observing a value from the is highly unlikely: , which is the quantile of the distribution.
- We therefore think that the null-hypothesis is false: we reject .
To give an example, suppose that we want to test the null-hypothesis against the two-sided alternative hypothesis .
We draw a random sample of size from . This means that, in reality, the null-hypothesis is false.
set.seed(12345)
data <- rnorm(50, mean=7, sd=10)
data
## [1] 12.8552882 14.0946602 5.9069669 2.4650283 13.0588746 -11.1795597
## [7] 13.3009855 4.2381589 4.1584026 -2.1932200 5.8375219 25.1731204
## [13] 10.7062786 12.2021646 -0.5053199 15.1689984 -1.8635752 3.6842241
## [19] 18.2071265 9.9872370 14.7962192 21.5578508 0.5567157 -8.5313741
## [25] -8.9770952 25.0509752 2.1835264 13.2037980 13.1212349 5.3768902
## [31] 15.1187318 28.9683355 27.4919034 23.3244564 9.5427119 11.9118828
## [37] 3.7591342 -9.6205024 24.6773385 7.2580105 18.2851083 -16.8035806
## [43] -3.6026555 16.3714054 15.5445172 21.6072940 -7.1309878 12.6740325
## [49] 12.8318765 -6.0679883
We calculate the sample average and the adjusted sample variance:
xbar <- mean(data)
xbar
## [1] 8.795663
S2 <- var(data)
S2
## [1] 120.2501
We calculate the quantiles (it turns out that we only need ):
b <- qt(0.975, df=49)
b
## [1] 2.009575
The observed value of the test-statistic is equal to:
T_obs <- (xbar - 5)/sqrt(S2/50)
T_obs
## [1] 2.447541
As $T_{obs} = 2.447541 > 2.009575$, we reject $H_0$. The test gives the correct result.
The t.test command can do the calculations for us:
t.test(data, mu=5, alternative="two.sided", conf.level=0.95)
##
## One Sample t-test
##
## data: data
## t = 2.4475, df = 49, p-value = 0.01801
## alternative hypothesis: true mean is not equal to 5
## 95 percent confidence interval:
## 5.67920 11.91213
## sample estimates:
## mean of x
## 8.795663
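We can also use the t.test output to illustrate the link with confidence intervals discussed earlier: hypothesised values inside the reported 95% confidence interval are not rejected at the 5% level, while values outside it are. The values 6 and 5 below are illustrative choices (a quick sketch):
t.test(data, mu=6)$p.value > 0.05  # 6 lies inside the 95% CI, so mu = 6 is not rejected (TRUE)
t.test(data, mu=5)$p.value < 0.05  # 5 lies outside the 95% CI, so mu = 5 is rejected (TRUE)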
The p-value associated with :
The p-value for a two-sided test (which is shown in the output) is defined as the probability, computed under the null-hypothesis, of observing a test-statistic at least as extreme (in absolute value) as the one we have observed: $p = P(|T| \geq |T_{obs}| \mid H_0) = 2\,P(t_{n-1} \geq |T_{obs}|)$.
We reject a null-hypothesis if the p-value is smaller than $\alpha$ (0.05 here). The conclusion will be the same as the conclusion obtained from comparing $T_{obs}$ with critical values (step 3). The p-value above actually tells us something more: we would not have rejected the null-hypothesis for a two-sided alternative with $\alpha = 0.01$. If you use the p-value, there is no need to find critical values (step 2).
If our alternative hypothesis was the right-sided $H_1: \mu > 5$, we would have used the p-value for a right-sided test: $p = P(t_{n-1} \geq T_{obs})$.
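As a check, both p-values can be computed directly from the observed test-statistic T_obs = 2.447541 obtained above:
2*(1 - pt(abs(T_obs), df=49))  # two-sided p-value, approximately 0.018
1 - pt(T_obs, df=49)           # right-sided p-value, approximately 0.009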
4.1.1 Approximation using CLT
We are considering the same example, but now we are not willing to assume that the population distribution is a member of the family of Normal distributions. We are still interested in testing against the two-sided alternative .
The narrative does not have to be changed much:
Step 1: Find a Test-statistic (or: pivotal statistic) :
The method requires us to find a pivotal statistic. For the sample average, we know, by the central limit theorem, that
We can obtain a test-statistic by standardising with the estimated variance:
Step 2: Find the critical values and with .
The quantile of the standard normal distribution is:
b2 <- qnorm(0.975, mean=0, sd=1)
b2
## [1] 1.959964
We can therefore write down the decision rule:
Reject the null hypothesis if the observed test-statistic is larger than 1.96 or smaller than -1.96,
where .
Step 3: Calculate and compare with critical values (or use the p-value).
We will use the same dataset, so we obtain the same observed test-statistic: $T_{obs} = 2.447541$. Note that the data were generated using $\mu = 7$, so that the null-hypothesis is, in fact, false. As $2.447541 > 1.96$, we reject the null-hypothesis (correctly).
The power at :
In reality, we would not know for sure that our conclusion is correct, because we would not know that the true mean is equal to 7. The power at tells us how likely it is that we draw the correct conclusion using our test when, in fact, .
The top graph in the following figure gives a graphical explanation of Power():
If the null-hypothesis is true, the RSD of T is the standard-normal distribution. If, in reality, , the RSD of T is . Therefore, the Power at is the area with dashed diagonal gray lines:
The bottom graph in the figure shows that the power at is higher than the power at : as the RSD of T is shifted more to the right and the critical value does not change, the area with dashed diagonal gray lines increases.
Plotting the Power-curve
We want to plot the power-curve of our statistical test.
The Power-curve (the power at any ):
The power-curve is a plot of the power-function. The power-function of the test is equal to the probability that we reject the null hypothesis, given that it is indeed false. As the null-hypothesis can be false in many ways (it is false whenever the true is unequal to , which is 5 in our example), the power of the test is a function of :
We want these probabilities to be as high as possible (or, equivalently, we want the probability of making a type-II error to be as small as possible) for all values of .
In our example, the power-function is equal to
The second-to-last line uses the symbolic notation for a random variable that has the distribution. This distribution is obtained from noting that if , for instance, the RSD of is , so that . Therefore, for general values of , . To obtain the required probabilities, we subtract the mean in the next step to obtain a standard-normal distribution. We now plot the power-function using R:
mu <- seq(from=0, to=10, by=0.1)
mu
## [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4
## [16] 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9
## [31] 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1 4.2 4.3 4.4
## [46] 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9
## [61] 6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 7.0 7.1 7.2 7.3 7.4
## [76] 7.5 7.6 7.7 7.8 7.9 8.0 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9
## [91] 9.0 9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9 10.0
y <- 1 - pnorm( 6.96 - mu) + pnorm( 2.96 - mu)
y
## [1] 0.99846180 0.99788179 0.99710993 0.99609297 0.99476639 0.99305315
## [7] 0.99086253 0.98808937 0.98461367 0.98030073 0.97500211 0.96855724
## [13] 0.96079610 0.95154278 0.94062007 0.92785499 0.91308508 0.89616539
## [19] 0.87697572 0.85542791 0.83147274 0.80510607 0.77637368 0.74537467
## [25] 0.71226284 0.67724599 0.64058294 0.60257833 0.56357538 0.52394672
## [31] 0.48408404 0.44438669 0.40525008 0.36705437 0.33015398 0.29486860
## [37] 0.26147601 0.23020706 0.20124304 0.17471547 0.15070815 0.12926136
## [43] 0.11037777 0.09402971 0.08016731 0.06872703 0.05964005 0.05284013
## [49] 0.04827045 0.04588912 0.04567306 0.04762015 0.05174936 0.05809910
## [55] 0.06672357 0.07768766 0.09106026 0.10690664 0.12528008 0.14621336
## [61] 0.16971050 0.19573926 0.22422494 0.25504581 0.28803058 0.32295817
## [67] 0.35955989 0.39752390 0.43650205 0.47611856 0.51598016 0.55568737
## [73] 0.59484605 0.63307886 0.67003594 0.70540430 0.73891544 0.77035107
## [79] 0.79954646 0.82639161 0.85083028 0.87285699 0.89251238 0.90987737
## [85] 0.92506633 0.93821984 0.94949743 0.95907050 0.96711588 0.97381016
## [91] 0.97932484 0.98382262 0.98745454 0.99035813 0.99265637 0.99445738
## [97] 0.99585470 0.99692804 0.99774432 0.99835894 0.99881711
plot(mu, y, type="l", xlab="mu", ylab="P(reject H0 | mu)", col="blue")
abline(v=7, b=0, col="red")
abline(a=0.05, b=0, lty="dashed", col="gray")
abline(a=0.5159802, b=0, lty="dashed", col="gray")
The power-curve shows that we were quite lucky: if $\mu = 7$, the test correctly rejects the null-hypothesis in only about 52% of the repeated samples. Our observed sample was one of those, but need not have been. If, in reality, $\mu$ were close to 0 or close to 10, the test would almost always give the correct answer.
Note that if in reality $\mu = 5$ (i.e. the null hypothesis is true) the test rejects in only 5% of the repeated samples. This is the meaning of $\alpha$, the probability of a type-I error. Note also that the probability of a type-II error if $\mu = 7$ ($\beta(7)$) is around 48%.
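These numbers can be checked with the same expression that was used to compute the power-curve above:
power7 <- 1 - pnorm(6.96 - 7) + pnorm(2.96 - 7)
power7      # the power at mu = 7, approximately 0.516
1 - power7  # the type-II error probability at mu = 7, approximately 0.484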
If the alternative hypothesis was , the power curve would look different:
The power-curve of a right-sided test does not increase on the left:
mu <- seq(from=0, to=10, by=0.1)
y <- 1 - pnorm(6.645 - mu)
y2 <- 1 - pnorm( 6.96 - mu) + pnorm( 2.96 - mu)
plot(mu, y, type="l", xlab="mu", ylab="P(reject H0 | mu)")
lines(mu, y2, lty="dashed", col="blue")
abline(v=7, b=0, col="red")
abline(a=0.05, b=0, lty="dashed", col="gray")
abline(a=0.6387052, b=0, lty="dashed", col="gray")
1 - pnorm(6.645 - 7)
## [1] 0.6387052
The power of the right-sided test at $\mu = 7$ is about 64%, higher than the 52% for the two-sided test. The power-curve of the two-sided test is plotted in dashed blue for comparison.
4.2 Example 2:
We are considering the situation where we are willing to assume that the population distribution is a member of the family of Poisson($\lambda$) distributions. As the Poisson distribution has $E(X) = \lambda$, we are interested in testing $H_0: \lambda = \lambda_0$ versus $H_1: \lambda \neq \lambda_0$. As an example, we take $\lambda_0 = 0.2$ and $n = 100$.
Step 1: Find a Test-statistic (or: pivotal statistic) :
The method requires us to find a pivotal statistic . One requirement is that the T is a statistic (depends only on the data) given that is a known constant (the hypothesised value). It therefore makes sense to consider the sample average (this is the Method of Moments estimator of , as well as the Maximum Likelihood estimator of ). We do not know the repeated sampling distribution of in this case. All we know is that
If the null-hypothesis is true, we have
This is a pivotal statistic: the RSD of the sum is completely known if we know that , and the sum itself is a statistic. We can proceed to step 2.
Step 2: Find the critical values and with .
The 0.025 and 0.975 quantiles of the Poisson(20) distribution are (approximately):
a <- qpois(0.025, lambda=20)
a
## [1] 12
b <- qpois(0.975, lambda=20)
b
## [1] 29
ppois(12, 20) #P(T<=12)
## [1] 0.03901199
ppois(11,20) #P(T<12)
## [1] 0.02138682
1-ppois(29, 20) #P(T>29)
## [1] 0.02181822
1-ppois(28, 20) #P(T>=29)
## [1] 0.03433352
We can therefore write down the decision rule:
Reject the null hypothesis if is smaller than 12 or larger than 29.
Step 3: Calculate and compare with critical values (or use the p-value).
We draw a sample of size 100 from the Poisson() distribution, and calculate the sum:
set.seed(1234)
data <- rpois(100, 0.2)
data
## [1] 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
## [38] 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0
## [75] 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0
T_obs <- sum(data)
T_obs
## [1] 14
As $T_{obs} = 14$ is neither smaller than 12 nor larger than 29, we do not reject the null-hypothesis that $\lambda = 0.2$.
4.2.1 Approximation using CLT
We are now interested in finding an approximate test for . There is no good reason to do this, other than checking how good the approximation is.
Step 1: Find a Test-statistic (or: pivotal statistic) :
As for Poisson() distributions, the central limit theorem states that
If the null-hypothesis is true, we have
We can now standardise:
This is a pivotal statistic.
Step 2: Find the quantiles and such that .
qnorm(0.975)
## [1] 1.959964
The decision rule is:
Reject the null-hypothesis if $T < -1.96$ or $T > 1.96$.
Note that, as is equivalent to
we can also write:
Reject the null-hypothesis if is smaller than 11.23 or larger than 28.77.
This is quite similar to the exact test derived above.
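As a quick sketch, the approximate bounds for the sum follow from its null mean $n\lambda_0 = 20$, its null standard deviation $\sqrt{20}$ and the standard-normal critical value:
20 - qnorm(0.975)*sqrt(20)  # approximately 11.23
20 + qnorm(0.975)*sqrt(20)  # approximately 28.77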
Step 3: Calculate and compare with critical values (or use the p-value).
Using the same sample, we obtain:
T_obs <- (mean(data) - 0.2)/sqrt(0.002)
T_obs
## [1] -1.341641
As $T_{obs} = -1.34$ is not smaller than $-1.96$ (and not larger than 1.96), we do not reject the null-hypothesis that $\lambda = 0.2$.
We can also use the p-value:
pnorm(T_obs) + (1-pnorm(-T_obs))
## [1] 0.1797125
As the p-value is not smaller than $\alpha = 0.05$, we do not reject the null-hypothesis.
4.3 Example 3:
We are considering the situation where we are willing to assume that the population distribution is a member of the family of Exponential distributions with (unknown) parameter $\lambda$. As the exponential distribution has $E(X) = 1/\lambda$, we are interested in testing $H_0: E(X) = \mu_0$ versus the alternative hypothesis $H_1: E(X) \neq \mu_0$. As an example, we will take $\mu_0 = 5$ and $n = 100$. Hence, we want to test the null-hypothesis that $\lambda = 1/5 = 0.2$.
Step 1: Find a test statistic :
The parameter can be estimated by , the Method of Moments and Maximum Likelihood estimator. We cannot obtain the exact RSD for this statistic. As we have a random sample from the Exponential distribution, we also know that
If the null-hypothesis is true, we obtain
This is a pivotal statistic. We cannot get the critical values of this statistic using our Statistical Tables. We can transform the Gamma distribution to a chi-squared distribution (for which we have tables) as follows:
This is also a pivotal statistic.
Step 2: Find the quantiles and such that .
The 0.025 and 0.975 quantiles are:
a <- qchisq(0.025, df=200)
a
## [1] 162.728
b <- qchisq(0.975, df=200)
b
## [1] 241.0579
The decision rule is:
Reject the null-hypothesis if is smaller than 162.728 or larger than 241.0579.
or
Reject the null-hypothesis if $\sum_{i=1}^n X_i$ is smaller than 406.82 or larger than 602.65.
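As a quick check, these bounds for the sum are the chi-squared critical values divided by $2\lambda_0 = 0.4$:
a/(2*0.2)  # approximately 406.82
b/(2*0.2)  # approximately 602.65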
Step 3: Calculate and compare with critical values (or use the p-value).
We draw a sample of size 100 from the Exp() distribution, and calculate the sum:
set.seed(12432)
data <- rexp(100, rate=0.2)
data
## [1] 0.14907795 8.07766447 2.02745666 6.49205265 0.37654896 10.23011103
## [7] 6.50703386 1.99843257 3.19968787 3.81472730 11.59536625 3.10470183
## [13] 6.22892454 2.70810540 4.79461150 8.46509304 0.30120264 3.05038690
## [19] 7.29538310 0.16704212 6.78359305 2.21707233 5.15083448 0.73830057
## [25] 5.53390365 10.54390572 0.55507295 4.18100134 7.73419513 9.33628288
## [31] 1.47064050 17.69536803 6.77209163 3.69946593 4.18173227 7.68760043
## [37] 1.43140818 0.18152633 11.45252555 0.21562584 1.87887548 4.99501155
## [43] 5.55790541 0.26258148 0.23179928 2.08911416 0.42976580 1.64560789
## [49] 1.08356090 1.38374444 3.20642702 2.58905530 3.80587957 1.21345423
## [55] 0.68163493 1.13175232 9.19108407 0.75484126 1.45888929 9.99728440
## [61] 0.05778570 2.27937926 3.96084369 6.18156264 1.94891606 2.39714234
## [67] 0.29781575 0.09192151 3.68075910 0.18130849 4.54638749 1.82050722
## [73] 3.66988372 4.97436924 12.16494984 2.34126599 4.06593290 5.68069283
## [79] 1.84244461 0.74156984 2.69434477 4.86109398 3.51903898 14.82506115
## [85] 0.91001743 0.03604018 5.18614162 2.99553917 2.26465601 1.79149822
## [91] 6.80382571 2.76859507 3.60379507 2.90056841 4.64767496 0.50438227
## [97] 12.26700920 9.39569936 2.13883634 0.02376590
T_obs <- sum(data)
T_obs
## [1] 398.797
As $T_{obs} = 398.797 < 406.82$, we reject the null-hypothesis that $E(X) = 5$ (or that $\lambda = 0.2$). Note that the null-hypothesis is, in fact, true, as our data was generated using $\lambda = 0.2$. We have incorrectly rejected the null hypothesis. As $\alpha = 0.05$, we know that this type-I error occurs in 5% of the repeated samples. Unfortunately, our observed sample was one of those 5%. In practice, we would not have known this, so we would not have known that we made the wrong decision.
4.3.1 Approximation using CLT
We will now derive the test using a large sample approximation.
Step 1: Find a pivotal statistic :
If we have a random sample from the Exponential distribution, which has and , the CLT states that
If the null-hypothesis is true, we obtain
We can standardise to obtain a standard-normal distribution (for which we have tables):
Step 2: Find the quantiles and such that .
As before, the 0.975 quantile of $N(0,1)$ is (more or less) 1.96.
The decision rule is:
Reject the null-hypothesis if $T < -1.96$ or $T > 1.96$.
Note that, as is equivalent to
we can also write:
Reject the null-hypothesis if is smaller than 402 or larger than 598.
This is very similar to the exact test, which uses 406.82 and 602.65.
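As a sketch, these bounds follow from $n = 100$, a null mean of 5 and a standard deviation of $\bar{X}$ equal to $\sqrt{0.25} = 0.5$:
100*(5 - qnorm(0.975)*0.5)  # approximately 402
100*(5 + qnorm(0.975)*0.5)  # approximately 598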
Step 3: Calculate and compare with critical values (or use the p-value).
Using the same sample we obtain
T_obs <- (mean(data) - 5)/sqrt(0.25)
T_obs
## [1] -2.024059
pnorm(T_obs) + (1-pnorm(-T_obs))
## [1] 0.04296408
As the p-value is smaller than $\alpha = 0.05$, we reject the null-hypothesis that $E(X) = 5$. Hence, the approximate test gives the incorrect result, as did the exact test.
4.4 Example 4:
We are considering the situation where we are willing to assume that the population distribution is a member of the family of Bernoulli distributions. We are interested in testing $H_0: p = p_0$ versus $H_1: p \neq p_0$. As an example, we take $p_0 = 0.7$ and $n = 100$.
Both the Method of Moments estimator and the Maximum Likelihood estimator of $p$ are equal to the sample average.
Step 1: Find a pivotal statistic :
When sampling from the Bernoulli() distribution, we only have the following result:
If the null-hypothesis is true, we obtain
As is a statistic and its distribution is completely known if the null-hypothesis is true, it is a test-statistic.
Step 2: Find the critical values and with .
The 0.025 and 0.975 quantiles of the Binomial(n=100, p=0.7) distribution are (approximately):
a <- qbinom(0.025, size=100, prob=0.7)
a
## [1] 61
b <- qbinom(0.975, size=100, prob=0.7)
b
## [1] 79
pbinom(61, size=100, prob=0.7) #P(T<=61)
## [1] 0.033979
pbinom(60, size=100, prob=0.7) #P(T<61)
## [1] 0.02098858
1 - pbinom(79, size=100, prob=0.7) #P(T>79)
## [1] 0.01646285
1 - pbinom(78, size=100, prob=0.7) #P(T>=79)
## [1] 0.02883125
We can therefore write down the decision rule:
Reject the null hypothesis if is smaller than 61 or larger than 79.
Step 3: Calculate and compare with critical values (or use the p-value).
We draw a sample of size 100 from the Bernoulli() distribution, and calculate the sum:
set.seed(12345)
data <- rbinom(100, size=1, prob=0.7)
data
## [1] 0 0 0 0 1 1 1 1 0 0 1 1 0 1 1 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 0
## [38] 0 1 1 0 1 0 0 1 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 0 0 0 1 0 1 1 0 1 1 1 1
## [75] 1 1 0 1 1 1 0 1 1 1 1 1 0 1 0 1 0 0 1 1 0 1 0 0 1 1
T_obs <- sum(data)
T_obs
## [1] 65
As $T_{obs} = 65$ is neither smaller than 61 nor larger than 79, we (correctly) do not reject the null-hypothesis that $p = 0.7$.
4.4.1 Approximation using CLT
We are interested in finding an approximate test for the null-hypothesis .
Step 1: Find a pivotal statistic :
If , we know that and . The central limit theorem then states that
If the null-hypothesis is true, we obtain
Standardising gives
This is a pivotal statistic.
Step 2: Find the quantiles and such that .
As before, the 0.975 quantile of $N(0,1)$ is about 1.96.
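As in the previous examples, we can translate the approximate rule back to the scale of the sum (a sketch, using $p_0 = 0.7$ and $n = 100$); the resulting bounds are close to the exact critical values 61 and 79 found above:
100*(0.7 - qnorm(0.975)*sqrt(0.7*0.3/100))  # approximately 61.0
100*(0.7 + qnorm(0.975)*sqrt(0.7*0.3/100))  # approximately 79.0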
Step 3: Calculate and compare with critical values (or use the p-value).
Using the same data, we obtain
T_obs <- (mean(data) - 0.7) / sqrt(0.0021)
T_obs
## [1] -1.091089
As $-1.96 < T_{obs} = -1.09 < 1.96$, we (correctly) do not reject the null-hypothesis.
5 Testing (or )
For the case that we want to test a null-hypothesis about the difference in means of two populations, we will discuss examples where the population distributions are Normal and Bernoulli.
Suppose that we have two random samples. The first random sample ($X$) is a sample of incomes from Portugal and has size $n_1$. The second random sample ($Y$) is a sample of incomes from England and has size $n_2$. We want to test whether the two population means are equal.
5.1 Example 5: and
Suppose that we are willing to assume that both income distributions are members of the Normal family of distributions.
Step 1: Find a pivotal statistic :
From example of the RSD chapter, we know that:
This is not a pivotal statistic, even if it is known that . To obtain a pivotal quantity, we need to standardise. How we standardise will depend on whether or not and are known. If they are known, we will denote them by and .
5.1.1 Variances Known: and
If the variances are known, we can standardise using the known variances and obtain a repeated sampling distribution that is standard-normal:
If the null hypothesis is true, we have
Step 2: Find the quantiles and such that .
The 0.975 quantile of $N(0,1)$ is about 1.96.
Step 3: Calculate and compare with critical values (or use the p-value).
As an example, we draw a random sample of size from and another random sample of size from . We know that the distributions are Normal and that and . Note that, in this case, the null-hypothesis is false.
set.seed(12345)
data1 <- rnorm(50, mean=10, sd=10)
data2 <- rnorm(60, mean=15, sd=9)
We calculate the observed test statistic:
xbar1 <- mean(data1)
xbar2 <- mean(data2)
T_obs <- (xbar1-xbar2)/sqrt(100/50 + 80/60)
T_obs
## [1] -2.940584
As $T_{obs} = -2.94 < -1.96$, we (correctly) reject the null-hypothesis.
5.1.2 Variances Unknown but The Same:
If the variances $\sigma_1^2$ and $\sigma_2^2$ are not known, we need to estimate them. If they are unknown, but known to be equal (to the number $\sigma^2$, say), we can estimate the single unknown variance using only the first sample or using only the second sample. It would be better, however, to use the information contained in both samples. As discussed in the RSD chapter, we can use the pooled adjusted sample variance $S_p^{\prime 2} = \frac{(n_1-1)S_1^{\prime 2} + (n_2-1)S_2^{\prime 2}}{n_1+n_2-2}$, leading to:
If the null-hypothesis is true, we have
This is a pivotal statistic.
Step 2: Find the quantiles and such that .
In our example, $n_1 = 50$ and $n_2 = 60$. The 0.975 quantile of the $t$ distribution with $n_1 + n_2 - 2 = 108$ degrees of freedom is:
q2 <- qt(0.975, df=108)
q2
## [1] 1.982173
Step 3: Calculate and compare with critical values (or use the p-value).
As an example, we again take a random sample of size 50 from $N(10, 100)$ and another random sample of size 60 from $N(10, 100)$. Note that, this time, the variances are the same, but we do not know the value. Note that the null-hypothesis is true.
set.seed(12345)
data1 <- rnorm(50, mean=10, sd=10)
xbar1 <- mean(data1)
S2_1 <- var(data1)
S2_1
## [1] 120.2501
data2 <- rnorm(60, mean=10, sd=10)
xbar2 <- mean(data2)
S2_2 <- var(data2)
S2_2
## [1] 121.1479
The observed test-statistic is equal to:
S2_p <- (49*S2_1 + 59*S2_2)/(50 + 60 - 2)  # weights are n1 - 1 = 49 and n2 - 1 = 59
S2_p
## [1] 120.7406
T_obs <- (xbar1 - xbar2) / sqrt( (S2_p*(1/60 + 1/50)) )
T_obs
## [1] -0.2895507
As $-1.98 < T_{obs} = -0.29 < 1.98$, we (correctly) do not reject the null-hypothesis.
5.1.3 Variances Unknown, Approximation Using CLT
If the variances $\sigma_1^2$ and $\sigma_2^2$ are unknown and not known to be the same, we have to estimate both of them. Standardising with two estimated variances does not lead to a $t$ distribution, so we have no exact RSD. We can approximate the RSD using the Central Limit Theorem:
Standardising with two estimated variances gives:
If the null-hypothesis is true, we have
This is a pivotal quantity.
Step 2: Find the quantiles and such that .
The 0.975 quantile of $N(0,1)$ is about 1.96.
Step 3: Calculate and compare with critical values (or use the p-value).
We generate the data again according to our first example:
data1 <- rnorm(50, mean=10, sd=10)
xbar1 <- mean(data1)
data2 <- rnorm(60, mean=15, sd=9)
xbar2 <- mean(data2)
In this case, we do not know and , so we estimate them:
S2_1 <- var(data1)
S2_1
## [1] 127.94
S2_2 <- var(data2)
S2_2
## [1] 74.59016
We calculate the observed test-statistic:
T_obs <- (xbar1 - xbar2) / sqrt( S2_1/60 + S2_2/50 )
T_obs
## [1] -3.029133
As $T_{obs} = -3.03 < -1.96$, we (correctly) reject the null-hypothesis.
5.2 Example 6: and
We are now assuming that and both have Bernoulli distributions, with , and , , respectively. For example, could represent the unemployment rate in Portugal and the unemployment rate in England. We want a two-sided test for the null-hypothesis .
Step 1: Find a pivotal statistic :
We base our analysis of on , the difference of the fraction of unemployed in sample 1 with the fraction of unemployed in sample 2. The central limit theorem states that both sample averages have an RSD that is approximately Normal:
As is a linear combination of (approximately) Normals, its RSD is (approximately) Normal:
We can estimate by and by . Using section 3.1.3 of the RSD chapter, standardising with the estimated variances gives:
If the null-hypothesis is true, we have
Step 2: Find the quantiles and such that .
The 0.975 quantile is around 1.96.
Step 3: Calculate and compare with critical values (or use the p-value).
As an example, we draw a random sample of size 50 from Bernoulli(0.7), and another random sample of size 40 from Bernoulli(0.5):
set.seed(1234)
data1 <- rbinom(50, size=1, prob=0.7)
data2 <- rbinom(40, size=1, prob=0.5)
We calculate the sample averages:
xbar1 <- mean(data1)
xbar1
## [1] 0.8
xbar2 <- mean(data2)
xbar2
## [1] 0.425
diff <- xbar1 - xbar2
diff
## [1] 0.375
The observed test statistic is equal to:
T_obs <- (diff - 0)/sqrt(xbar1*(1-xbar1)/50 + xbar2*(1-xbar2)/40)
T_obs
## [1] 3.88661
As $T_{obs} = 3.89 > 1.96$, we (correctly) reject the null-hypothesis: the true difference is equal to $0.7 - 0.5 = 0.2$.
6 Testing
For this test we will only discuss the example where the population distribution is Normal.
6.1 Example 7: :
We are considering the situation where we are willing to assume that the population distribution is a member of the family of Normal distributions. We are now interested in testing .
Step 1: Find a pivotal statistic :
The natural estimator for the population variance is the adjusted sample variance
We do not know the repeated sampling distribution of . We do know that
If the null-hypothesis is true we have
We can take \(T=\frac{(n-1) S^{\prime 2}}{\sigma^2_0}\) as our pivotal statistic.
Step 2: Find the quantiles and such that .
The Chi-squared distribution is not symmetric, so we need to calculate both quantiles separately. If we have a sample of size 100 from a Normal distribution, and we test $H_0: \sigma^2 = 16$, we need the 0.025 and 0.975 quantiles of the $\chi^2_{99}$ distribution:
a <- qchisq(0.025, df=99)
a
## [1] 73.36108
b <- qchisq(0.975, df=99)
b
## [1] 128.422
The decision rule is:
Reject the null-hypothesis if T < 73.361 or T > 128.422.
Step 3: Calculate and compare with critical values (or use the p-value).
As an example, we draw a sample of size 100 from $N(4, 16)$. We want to test $H_0: \sigma^2 = 16$. We first calculate the adjusted sample variance:
set.seed(12345)
data <- rnorm(100, mean=4, sd=4)
S2 <- var(data)
S2
## [1] 19.882
The observed test-statistic is
T_obs <- 99*S2/16
T_obs
## [1] 123.0199
As $73.36 < T_{obs} = 123.02 < 128.42$, we (correctly) do not reject the null-hypothesis.
7 Testing
If we want to test whether the variances of two population distributions are equal, we will only discuss the example where the population distributions are Normal.
Suppose that we have two random samples. The first random sample ($X$) is a sample of incomes from Portugal and has size $n_1$. The second random sample ($Y$) is a sample of incomes from England and has size $n_2$. We want to test $H_0: \sigma_1^2 = \sigma_2^2$ (or, equivalently, $H_0: \sigma_1^2/\sigma_2^2 = 1$).
The obvious thing to do is to compare , the adjusted sample variance of the Portuguese sample, with , the adjusted sample variance of the English sample. In particular, we would compute . Unfortunately, there is no easy way to obtain the repeated sampling distribution of .
If we assume that both population distributions are Normal, we can derive a result that comes close enough.
7.1 Example 8: and
Instead of comparing the difference of the two sample variances, we can also calculate the ratio $S_1^{\prime 2}/S_2^{\prime 2}$. If this number is close to one, perhaps $\sigma_1^2/\sigma_2^2$ is also close to one, which implies that $\sigma_1^2 - \sigma_2^2$ is close to zero.
Step 1: Find a pivotal statistic :
We have seen in Example 8 of the RSD chapter that
If the null hypothesis is true, the denominator is equal to one, so that
This is a pivotal statistic.
Step 2: Find the quantiles and such that .
As the F-distribution is not symmetric, we need to calculate both quantiles. As an example, let $n_1 = 100$ and $n_2 = 150$. Then we have
a <- qf(0.025, df1=99, df2=149)
a
## [1] 0.6922322
b <- qf(0.975, df1=99, df2=149)
b
## [1] 1.425503
If you use the statistical tables, then you need the following result to compute $a$: the lower quantile of an F-distribution is the reciprocal of the corresponding upper quantile with the degrees of freedom swapped, $a = F_{0.025}(d_1, d_2) = 1/F_{0.975}(d_2, d_1)$.
See Example of the RSD chapter.
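A quick sketch of this check in R: the lower critical value equals the reciprocal of the upper quantile with the degrees of freedom swapped.
1/qf(0.975, df1=149, df2=99)  # approximately 0.692, the same value as a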
The decision rule is:
Reject the null-hypothesis if or .
Step 3: Calculate and compare with critical values (or use the p-value).
As an example, we use a random sample of size 100 from $N(10, 100)$ and a second random sample of size 150 from $N(20, 400)$ (so $H_0$ is false), and calculate the adjusted sample variances:
set.seed(12345)
data1 <- rnorm(100, mean=10, sd=10)
S2_1 <- var(data1)
S2_1
## [1] 124.2625
data2 <- rnorm(150, mean=20, sd=sqrt(400))
S2_2 <- var(data2)
S2_2
## [1] 402.0311
The observed test-statistic is equal to:
T_obs <- S2_1 / S2_2
T_obs
## [1] 0.3090867
As $T_{obs} = 0.31 < 0.69$, we (correctly) reject the null-hypothesis.