Two-Sample t-Test for Comparing Two Means
We often want to know whether the means of two populations differ on some outcome. For example, many questions involve comparing two categories of a categorical variable (e.g., males and females) or two populations receiving different treatments in the context of an experiment. The two-sample t-test is a hypothesis test for answering questions about the difference between two means when the data are collected from two random samples of independent observations, each from an underlying normal distribution. The steps of conducting a two-sample t-test are quite similar to those of the one-sample test.
This example requires two independent, normally distributed populations whose standard deviations (σ) are unknown.
The first step in examining this question is to establish the specific hypotheses we wish to examine. Specifically, we want to establish a null hypothesis and an alternative hypothesis to be evaluated with data.
An experiment is conducted to determine whether intensive tutoring (covering a great deal of material in a fixed amount of time) is more effective than paced tutoring (covering less material in the same amount of time). Two randomly chosen groups are tutored separately and then administered proficiency tests. Use α = 0.05 for the significance level.
In this case:
- Null hypothesis: the difference between the two group means is 0. In other words, the population mean of the group that received intensive tutoring minus the population mean of the group that received paced tutoring is zero.
- Alternative hypothesis: the population mean of the group that received intensive tutoring is greater than the population mean of the group that received paced tutoring, so the difference is greater than zero.
Let μ1 represent the population mean for the intensive tutoring group and μ2 represent the population mean for the paced tutoring group.
Null hypothesis: H0: μ1 = μ2
or H0: μ1 – μ2 = 0
Alternative hypothesis: Ha: μ1 > μ2
or Ha: μ1 – μ2 > 0
The test statistic is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the means of the two samples, $\Delta$ is the hypothesized difference between the population means (0 if testing for equal means), $s_1$ and $s_2$ are the standard deviations of the two samples, and $n_1$ and $n_2$ are the sizes of the two samples. The number of degrees of freedom for the problem is the smaller of $n_1 - 1$ and $n_2 - 1$.
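The calculation described above (difference in sample means, minus the hypothesized difference, divided by the unpooled standard error) can be sketched in a few lines of Python. This is only an illustration: the function name `two_sample_t` is made up here, and the sample means and standard deviations passed to it are hypothetical placeholders, since the sample summaries are not reproduced in this text. Only the sample sizes (12 and 10) come from the example; the placeholders were picked so that the statistic lands near the 1.166 reported for it.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2, delta=0.0):
    """Return (t, df) for a two-sample t-test from summary statistics.

    Uses the unpooled standard error sqrt(s1^2/n1 + s2^2/n2) and the
    conservative degrees of freedom min(n1 - 1, n2 - 1).
    """
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2 - delta) / standard_error
    df = min(n1 - 1, n2 - 1)
    return t, df

# Hypothetical summary statistics (placeholders, not taken from the text);
# only the sample sizes 12 and 10 come from the example.
t_stat, df = two_sample_t(mean1=46.31, sd1=6.44, n1=12,
                          mean2=42.79, sd2=7.52, n2=10)
print(f"t = {t_stat:.3f}, df = {df}")
```

Passing a nonzero `delta` tests a hypothesized difference other than zero; the default of 0 corresponds to testing for equal means.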
In this example the sample sizes are 12 and 10, so the degrees of freedom are the smaller of (12 – 1) and (10 – 1), or 9. Because this is a one-tailed test, the alpha level (0.05) is not divided by two. The next step is to look up t.05,9 in the t-table (Table VI in the text), which gives a critical value of 1.833.
The computed t of 1.166 does not exceed the critical value of 1.833, so the null hypothesis cannot be rejected. This test has not provided statistically significant evidence that intensive tutoring is superior to paced tutoring.
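If a t-table is not at hand, the critical value can be approximated numerically. The sketch below is one way to do it with only the Python standard library, under stated assumptions: it integrates the t density with Simpson's rule and then bisects for the point whose upper-tail area equals α. The helper names (`t_pdf`, `upper_tail`, `critical_value`) are illustrative, not part of any library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=1000):
    """P(T > t): 0.5 minus the Simpson's-rule area under the density on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3

def critical_value(alpha, df, lo=0.0, hi=50.0):
    """Bisection for t* with P(T > t*) = alpha (assumes 0 < alpha < 0.5)."""
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > alpha:
            lo = mid  # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

t_crit = critical_value(alpha=0.05, df=9)
t_stat = 1.166                      # computed t from the example
reject_null = t_stat > t_crit       # one-tailed (upper) decision rule
print(f"critical value = {t_crit:.3f}, reject H0: {reject_null}")
```

For 9 degrees of freedom and α = 0.05 this should agree closely with the tabled value of 1.833, and the decision matches the one reached from the table.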
Let us examine how to do this problem using StatCrunch. Click on Stat > T Stats > Two Sample > With Summary. Enter the data in the next window and click Compute. The results are shown below.
As stated before, the computed t of 1.166 does not exceed the value found in the table (1.833), so the null hypothesis cannot be rejected. In conclusion, this test has not provided statistically significant evidence that intensive tutoring is superior to paced tutoring.
Use the t-Value to Determine the P-Value
Having calculated the t-statistic, compare the t-value with a standard table of t-values to determine whether the t-statistic reaches the threshold of statistical significance. You can also use the P Value from T Score Calculator to find the p-value.
A P value evaluates how well the sample data support the devil’s advocate argument that the null hypothesis is true. It measures how compatible your data are with the null hypothesis: how likely is the effect observed in your sample data if the null hypothesis is true?
- High P values: your data are likely with a true null.
- Low P values: your data are unlikely with a true null.
A low P value suggests that your sample provides enough evidence to reject the null hypothesis for the entire population. In this case, the p-value is 0.1294. Because 0.1294 is greater than the significance level of 0.05, the result is not statistically significant and the null hypothesis is not rejected.
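As a rough cross-check, a one-tailed p-value can be computed directly from the t statistic. The following is a minimal sketch using only the Python standard library; `t_pdf` and `one_tailed_p` are illustrative names, and the tail area is approximated with Simpson's rule. It assumes the conservative 9 degrees of freedom used in the hand calculation. The degrees of freedom behind the 0.1294 quoted above are not stated in the text, so this sketch need not reproduce that figure exactly; the conclusion at α = 0.05 is the same either way.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def one_tailed_p(t, df, steps=2000):
    """Upper-tail p-value P(T > t), via Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3

alpha = 0.05
p_value = one_tailed_p(1.166, 9)    # t from the example, assumed df = 9
significant = p_value < alpha
print(f"p = {p_value:.3f}, significant at {alpha}: {significant}")
```

For a two-tailed alternative, the upper-tail area for |t| would be doubled before comparing it with α.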