Difference in Means Hypothesis Test Calculator
Calculate the results of your two sample t-test.
Use the calculator below to analyze the results of a difference in sample means hypothesis test. Enter your sample means, sample standard deviations, sample sizes, hypothesized difference in means, test type, and significance level to calculate your results.
You will find a description of how to conduct a two sample t-test below the calculator.
[Interactive calculator: Define the Two Sample t-test. Results reported: significance level, difference in means, t-score, and probability, with a plot of the difference between the sample means under the null distribution.]
Conducting a Hypothesis Test for the Difference in Means
When two populations are related, you can compare them by analyzing the difference between their means.
A hypothesis test for the difference in sample means can help you make inferences about the relationship between two population means.
Testing for a Difference in Means
For the results of a hypothesis test to be valid, you should follow these steps:
- Check Your Conditions
- State Your Hypothesis
- Determine Your Analysis Plan
- Analyze Your Sample
- Interpret Your Results
Check Your Conditions
To use the testing procedure described below, you should check the following conditions:
- Independence of Samples - Your samples should be collected independently of one another.
- Simple Random Sampling - You should collect your samples with simple random sampling. This type of sampling requires that every occurrence of a value in a population has an equal chance of being selected when taking a sample.
- Normality of Sampling Distributions - The sampling distributions for both samples should follow the Normal or a nearly Normal distribution. A sampling distribution will be nearly Normal when the samples are collected independently and when the population distribution is nearly Normal. Generally, the larger the sample size, the closer the sampling distribution is to Normal. Additionally, outlier data points can make a distribution less Normal, so if your data contains many outliers, exercise caution when verifying this condition.
State Your Hypothesis
You must state a null hypothesis and an alternative hypothesis to conduct a hypothesis test of the difference in means.
The null hypothesis is a skeptical claim that you would like to test.
The alternative hypothesis represents the alternative claim to the null hypothesis.
Your null hypothesis and alternative hypothesis should be stated in one of three mutually exclusive ways listed in the table below.
Null Hypothesis | Alternative Hypothesis | Number of Tails | Description |
---|---|---|---|
μ1 - μ2 = D | μ1 - μ2 ≠ D | Two | Tests whether the sample means come from populations with a difference in means equal to D. If D = 0, then tests if the samples come from populations with means that are different from each other. |
μ1 - μ2 ≤ D | μ1 - μ2 > D | One (right) | Tests whether sample one's population mean minus sample two's population mean is greater than D. If D = 0, then tests if sample one comes from a population with a mean greater than sample two's population mean. |
μ1 - μ2 ≥ D | μ1 - μ2 < D | One (left) | Tests whether sample one's population mean minus sample two's population mean is less than D. If D = 0, then tests if sample one comes from a population with a mean less than sample two's population mean. |
D is the hypothesized difference between the populations' means that you would like to test.
Determine Your Analysis Plan
Before conducting a hypothesis test, you must determine a reasonable significance level, α, which is the probability of rejecting the null hypothesis when it is actually true. The lower your significance level, the less likely you are to reject a true null hypothesis, so the more confident you can be in a decision to reject. Common significance levels are 10%, 5%, and 1%.
To evaluate your hypothesis test at the significance level that you set, consider whether you are conducting a one-tail or two-tail test:
- Two-tail tests divide the rejection region, or critical region, evenly between the upper and lower tails of the null sampling distribution. For example, in a two-tail test with a 5% significance level, your rejection region would be the upper and lower 2.5% of the null distribution. An alternative hypothesis of μ1 - μ2 ≠ D requires a two-tail test.
- One-tail tests place the rejection region entirely on one side of the distribution, i.e. in the right or left tail of the null distribution. For example, in a one-tail test with a 5% significance level evaluating if the actual difference in means is greater than the hypothesized difference, D, your rejection region would be the upper 5% of the null distribution. μ1 - μ2 > D and μ1 - μ2 < D alternative hypotheses require one-tail tests.
The graphical results section of the calculator above shades rejection regions blue.
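As a rough illustration of how the significance level is split across the tails, here is a minimal Python sketch. The function name `rejection_tail_areas` and the labels `"two"`, `"right"`, and `"left"` are hypothetical names chosen for this example and are not part of the calculator.

```python
def rejection_tail_areas(alpha, tails):
    """Return (lower_tail_area, upper_tail_area) of the rejection region.

    alpha: significance level as a fraction, e.g. 0.05 for 5%.
    tails: "two", "right", or "left" (labels assumed for this sketch).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if tails == "two":
        # Two-tail test: split alpha evenly between both tails.
        return (alpha / 2, alpha / 2)
    if tails == "right":
        # One-tail (right) test: all of alpha sits in the upper tail.
        return (0.0, alpha)
    if tails == "left":
        # One-tail (left) test: all of alpha sits in the lower tail.
        return (alpha, 0.0)
    raise ValueError("tails must be 'two', 'right', or 'left'")


# A two-tail test at the 5% level puts 2.5% in each tail.
print(rejection_tail_areas(0.05, "two"))
# A right-tail test at the 5% level puts the full 5% in the upper tail.
print(rejection_tail_areas(0.05, "right"))
```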
Analyze Your Sample
After checking your conditions, stating your hypothesis, determining your significance level, and collecting your sample, you are ready to analyze your hypothesis.
Under the null hypothesis, the difference in sample means approximately follows a t-distribution described by the following parameters:
- The Difference in the Population Means, D - The true difference in the population means is unknown, but we use the hypothesized difference in the means, D, from the null hypothesis in the calculations.
- The Standard Error, SE - The standard error of the difference in the sample means can be computed as follows:
$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
with s1 being the standard deviation of sample one, n1 being the sample size of sample one, s2 being the standard deviation of sample two, and n2 being the sample size of sample two.
The standard error defines how differences in sample means are expected to vary around the null difference in means sampling distribution given the sample sizes and under the assumption that the null hypothesis is true.
- The Degrees of Freedom, DF - The degrees of freedom calculation can be estimated as the smaller of n1 - 1 or n2 - 1. For more accurate results, use the following formula for the degrees of freedom (DF):
$$DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
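The standard error and the two degrees-of-freedom estimates above can be sketched in Python as follows. The function names are hypothetical and chosen only for this illustration. The more accurate DF formula is commonly called the Welch-Satterthwaite approximation. The inputs are the battery figures from the example at the end of this page.

```python
import math


def standard_error(s1, n1, s2, n2):
    """Standard error of the difference in sample means."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def conservative_df(n1, n2):
    """Quick estimate: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


def welch_df(s1, n1, s2, n2):
    """More accurate degrees of freedom from the formula above."""
    v1 = s1**2 / n1  # variance contribution of sample one
    v2 = s2**2 / n2  # variance contribution of sample two
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Battery example: s1 = 0.8, n1 = 100, s2 = 0.2, n2 = 1000
se = standard_error(0.8, 100, 0.2, 1000)
print(f"SE = {se:.4f}")
print(f"conservative DF = {conservative_df(100, 1000)}")
print(f"Welch DF = {welch_df(0.8, 100, 0.2, 1000):.1f}")
```

For these inputs the standard error should come out near 0.080 and both degrees-of-freedom estimates near 100, because the smaller sample with the larger spread dominates the calculation.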
In a difference in means hypothesis test, we calculate the probability of observing a difference in sample means (x̄1 - x̄2) at least as extreme as the one in our samples, assuming the null hypothesis is true. This probability is known as the p-value. If the p-value is less than the significance level, then we can reject the null hypothesis.
You can determine a precise p-value using the calculator above, but we can find an estimate of the p-value manually by calculating the t-score, or t-statistic, as follows: t = (x̄1 - x̄2 - D) / SE
The t-score is a test statistic that tells you how many standard errors your observed difference in sample means lies from the hypothesized difference, D, under the null distribution. Using any t-score table, you can look up the probability of observing your results under the null distribution. Be sure to look up the probability for your degrees of freedom and for the type of test you are conducting, i.e. one or two tail. A hypothesis test for the difference in means is sometimes known as a two sample mean t-test because of the use of a t-score in analyzing results.
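In place of a printed t-score table, the lookup can be approximated numerically. The sketch below is one possible approach using only the Python standard library: it integrates the t-distribution density with Simpson's rule. It is not the method the calculator uses, and all function names and the `"two"`, `"right"`, `"left"` labels are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density from 0 to |t| with Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def t_score(mean1, mean2, d, se):
    """t = (x̄1 - x̄2 - D) / SE"""
    return (mean1 - mean2 - d) / se


def p_value(t, df, tails):
    """p-value for a 'two', 'right', or 'left' tail test."""
    if tails == "right":
        return 1.0 - t_cdf(t, df)
    if tails == "left":
        return t_cdf(t, df)
    if tails == "two":
        return 2.0 * (1.0 - t_cdf(abs(t), df))
    raise ValueError("tails must be 'two', 'right', or 'left'")


# Battery example: x̄1 = 10.4, x̄2 = 8.2, D = 2, SE ≈ 0.08025, DF ≈ 100
t = t_score(10.4, 8.2, 2, 0.08025)
p = p_value(t, 100, "right")
print(f"t = {t:.2f}, one-tail p-value = {p:.4f}")
```

With the battery figures, the t-score should be about 2.49 and the p-value should land close to the 0.72% quoted in the example. If SciPy is available, `scipy.stats.t.sf` is the more usual way to get the upper-tail probability.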
Interpret Your Results
The conclusion of a hypothesis test for the difference in means is always either:
- Reject the null hypothesis
- Do not reject the null hypothesis
If you reject the null hypothesis, you cannot say that your sample difference in means is the true difference between the means. If you do not reject the null hypothesis, you cannot say that the hypothesized difference in means is true.
A hypothesis test is simply a way to assess whether your sample provides sufficient evidence to reject the null hypothesis.
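The decision rule itself is small enough to write down directly. This is a minimal sketch in which the function name `interpret` is hypothetical and the p-value and significance level are passed as fractions.

```python
def interpret(p_value, alpha):
    """Return the conclusion of the test for a given p-value and significance level."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if p_value < alpha:
        return "Reject the null hypothesis"
    return "Do not reject the null hypothesis"


# A p-value of 0.72% against a 1% significance level
print(interpret(0.0072, 0.01))
```

Note that neither outcome says the null or the alternative hypothesis is true. The function only reports whether the evidence cleared the threshold you set in advance.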
Example: Hypothesis Test for the Difference in Two Means
Let’s say you are a manager at a company that designs batteries for smartphones. One of your engineers believes that she has developed a battery that will last more than two hours longer than your standard battery.
Before you can consider if you should replace your standard battery with the new one, you need to test the engineer’s claim. So, you decide to run a difference in means hypothesis test to see if her claim that the new battery will last more than two hours longer than the standard one is reasonable.
You direct your team to run a study. They will take a sample of 100 of the new batteries and compare their performance to 1,000 of the old standard batteries.
- Check Your Conditions - Your test consists of independent samples. Your team collects your samples using simple random sampling, and you have reason to believe that the charge times of both battery types are close to normally distributed. So, the conditions are met to conduct a two sample t-test.
- State Your Hypothesis - Your null hypothesis is that the charge of the new battery lasts at most two hours longer than your standard battery (i.e. μ1 - μ2 ≤ 2). Your alternative hypothesis is that the new battery lasts more than two hours longer than the standard battery (i.e. μ1 - μ2 > 2).
- Determine Your Analysis Plan - You believe that a 1% significance level is reasonable. As your test is a one-tail test, you will evaluate whether the difference in mean charge between the samples falls in the upper 1% of the null distribution.
- Analyze Your Sample - After collecting your samples (which you do after steps 1-3), you find the new battery sample had a mean charge of 10.4 hours, x̄1, with a 0.8 hour standard deviation, s1. Your standard battery sample had a mean charge of 8.2 hours, x̄2, with a standard deviation of 0.2 hours, s2. Using the calculator above, you find that a difference in sample means of 2.2 hours [2.2 = 10.4 - 8.2] would result in a t-score of 2.49 under the null distribution, which translates to a p-value of 0.72%.
- Interpret Your Results - Since your p-value of 0.72% is less than the significance level of 1%, you have sufficient evidence to reject the null hypothesis.
In this example, you found that you can reject your null hypothesis that the new battery design does not result in more than 2 hours of extra battery life. The test does not guarantee that your engineer’s new battery lasts more than two hours longer than your standard battery, but it does give you strong reason to believe her claim.
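The arithmetic behind the t-score in this example can be checked by hand. The values below use the sample sizes from the study (100 new batteries and 1,000 standard batteries) and are rounded, so treat them as an approximate check and not as the calculator's exact output.

$$SE = \sqrt{\frac{0.8^2}{100} + \frac{0.2^2}{1000}} = \sqrt{0.0064 + 0.00004} \approx 0.0802$$

$$t = \frac{(10.4 - 8.2) - 2}{0.0802} = \frac{0.2}{0.0802} \approx 2.49$$

The degrees of freedom formula gives a value of roughly 100 for these samples, and a t-score of 2.49 with about 100 degrees of freedom falls between the usual one-tail table entries for 1% and 0.5%, which is consistent with the reported p-value of 0.72%.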