SOC 221 • Lecture 7
Monday, July 15, 2024
\(\bar{X}\) → INFERENCE → \(\mu_x\)
Sample statistic: The characteristic of the sample that we actually observe (e.g., the mean study time of a SAMPLE of UW students)
For example: draw a random sample of 100 students and observe \(\bar{X} = 14.5\)
Population parameter: The characteristic of the population that we are interested in knowing (e.g., the mean study time of all UW students)
Our goal: Estimate the unknown population parameter \(\mu_x = ?\)
\[ \bar{X} = 14.5\text{ hours/week study time} \]
Hypothesis:
Hypothesis testing:
Say we know that, on average, college students across the country study 13 hrs/week, with a standard deviation of 8.
Question: Do UW students really study more than the national average?
We have insufficient time, energy, and money to survey every UW student, so we draw a random sample of UW students:
N = 100, Mean = 14.5 hrs/week
Looks like UW students study more than the national average.
But there are at least two explanations for this observation. . .
Explanation 1:
\(\mu_{UW} > 13\)
Explanation 2:
\(\mu_{UW} = 13\)
Question: Do UW students really study more than the national average?
We have to decide between two explanations. . . TWO HYPOTHESES
Explanation 1: \(\mu_{UW} > 13\)
Explanation 2: \(\mu_{UW} = 13\)
ALTERNATIVE HYPOTHESIS (\(H_a\))
States that there IS a real difference in the population (difference not just due to chance)
NULL HYPOTHESIS (\(H_0\))
States that there is NO real difference in the population (sample result happened by chance)
ALTERNATIVE HYPOTHESIS (\(H_a\))
\[\mu_{UW} > 13\] (sample result reflects a real difference)
NULL HYPOTHESIS (\(H_0\))
\[\mu_{UW} = 13\] (sample result just happened by chance)
How likely is it that we would observe a sample mean of 14.5 if, in reality, UW students do not study more than 13 hours per week on average?
If the probability of observing the sample result is low (i.e., sample results are really inconsistent with the null hypothesis) then we REJECT the null hypothesis.
This would SUPPORT the alternative hypothesis.
If, on the other hand, the probability of observing the sample result is too high (i.e., sample results are somewhat consistent with the null hypothesis) then we FAIL TO REJECT the null hypothesis.
This would FAIL TO SUPPORT the alternative hypothesis.
Sampling distribution of the sample mean, assuming \(H_0\) is true:
\(\mu_{\bar{X}_{UW}} = \mu_{UW} = 13\)
\(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{8.0}{\sqrt{100}} = 0.8\)
Think about where our ONE SINGLE sample falls in this distribution.
14.5 is out on this side of the distribution, but how far?
We need to convert our score to a z-score
\(\sigma_{\bar{X}} = 0.8\)
\(z = \frac{\bar{X} - \mu_{0}}{\sigma_{\bar{X}}} = \frac{14.5 - 13}{0.8} = 1.875\)
\(\mu_{\bar{X}_{UW}} = \mu_{UW} = 13\)
14.5 is 1.875 standard errors away from what we assume under the null hypothesis
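The standard error and z-score above can be checked with a few lines of Python. This is only a sketch: the function name `z_statistic` and its parameter names are illustrative choices, not part of the lecture.

```python
import math

def z_statistic(sample_mean, null_mean, sigma, n):
    """Return (standard_error, z) for a one-sample z test with known sigma."""
    standard_error = sigma / math.sqrt(n)           # sigma / sqrt(n)
    z = (sample_mean - null_mean) / standard_error  # distance from H0 in SE units
    return standard_error, z

# Study-time example: sample mean 14.5, null mean 13, sigma 8, n 100
se, z = z_statistic(sample_mean=14.5, null_mean=13, sigma=8.0, n=100)
print(f"standard error = {se:.3f}, z = {z:.3f}")
```

The same function works for any one-sample problem in which the population standard deviation is given.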
What is the probability of randomly selecting a case that is at least 1.875 standard deviations above the mean of a normal distribution?
We need to find the probability associated with a z-score of 1.875 in the standard normal table
The proportion of cases below z of 1.875 is 0.9696
Probability of observing a case in the upper tail (above \(z = 1.875\)) is \(1 - 0.9696 = 0.0304\)
So, the probability of randomly selecting a sample with a mean as large as 14.5 from a population with a mean of 13 is only 0.0304
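Instead of reading the standard normal table, the upper-tail area can be computed directly. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available; the helper name `upper_tail_probability` is made up for illustration.

```python
from statistics import NormalDist

def upper_tail_probability(z):
    """Area under the standard normal curve above z (a one-sided p-value)."""
    return 1.0 - NormalDist().cdf(z)

# z-score of the UW sample mean under the null hypothesis
p_value = upper_tail_probability(1.875)
print(f"p-value = {p_value:.4f}")
```

The result should agree with the table-based value of about 0.0304, up to rounding.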
P-VALUE: The probability of observing a sample result at least as extreme as ours if the null hypothesis were actually true.
Note: this is NOT the probability that the null hypothesis is true, given our sample results
Since the P-VALUE is small (the sample result is unlikely to have occurred if the null hypothesis were actually true), we
REJECT THE NULL HYPOTHESIS
SUPPORT THE ALTERNATIVE HYPOTHESIS
When the P-VALUE is small we:
REJECT THE NULL HYPOTHESIS,
SUPPORT THE ALTERNATIVE HYPOTHESIS
and say that the result is
STATISTICALLY SIGNIFICANT
Question: How small does the P-value have to be before we reject the null hypothesis?
Answer: We decide on that standard before our test by setting the ALPHA LEVEL (\(\alpha\)) for the test
ALPHA \((\alpha)\): The probability threshold at which we are willing to reject the null hypothesis.
When \(\text{p-value} \lt \alpha\)
we REJECT THE NULL HYPOTHESIS,
SUPPORT THE ALTERNATIVE HYPOTHESIS
and say that the result is
STATISTICALLY SIGNIFICANT
Standard choices for \(\alpha\):
0.05
0.01
0.001
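The decision rule can be written as a one-line comparison. The sketch below applies it to the one-sided p-value from the study-time example (0.0304) at each of the standard alpha levels; the function name `decide` is an illustrative assumption.

```python
def decide(p_value, alpha):
    """Compare a p-value with the alpha level chosen before the test."""
    if p_value < alpha:
        return "reject H0 (statistically significant)"
    return "fail to reject H0"

p_value = 0.0304  # one-sided p-value from the study-time example
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: {decide(p_value, alpha)}")
```

The same sample result is significant at the .05 level but not at the stricter .01 or .001 levels, which is why alpha has to be chosen before running the test.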
With \(\alpha = 0.05\), for example, we are not willing to reject the null hypothesis unless there is less than a 5% chance that a sample result (difference) at least this large would appear just because of random sampling error.
Even with these high standards, we never call this proof because another sample may lead to a different decision.
So we NEVER ACCEPT A HYPOTHESIS!
(can only reject or retain / support or fail to support)
The university tells us that the average student on campus consumes 2 drinks per week with a standard deviation of 1.9. They have asked us to determine whether students living in the Greek system are different from the university average in terms of average number of drinks per week.
Draw a random sample of students from Greek system:
\(n = 150\), Mean = \(2.3\) drinks/week
NULL HYPOTHESIS:
\[ H_0: \mu_{greek} = 2 \]
ALTERNATIVE HYPOTHESIS:
\[ H_a: \mu_{greek} \ne 2 \]
Set up a test assuming that the null \(H_0\) is true and see whether the facts of our sample contradict that assumption.
Stated in terms of unknown population parameters
This is a TWO-SIDED (non-directional) hypothesis
A ONE-SIDED test is one in which we are interested in whether the unknown population parameter is specifically HIGHER (or specifically LOWER) than the value assumed under the null hypothesis
Example: Do women have a higher level of emotional intelligence than do men?
Reject null hypothesis if the test result is different enough from the null in the right direction
A TWO-SIDED test is one in which we are interested in whether the unknown population parameter is just DIFFERENT from the value assumed under the null hypothesis
Example: Are men and women different in terms of emotional intelligence?
Reject null hypothesis if the test result is EITHER much higher OR much lower than what we assume under the null
ONE-SIDED test: Only one tail of the distribution contains strong results consistent with our research hypothesis
TWO-SIDED test: Both tails of the distribution contain strong results consistent with our research hypothesis
Same null hypothesis (that there is no difference) contradicts both one- and two-sided alternative hypotheses.
Critical value: The minimum value at which the test statistic would lead you to reject the null hypothesis
Normal distribution, \(\alpha = 0.05\):
One-sided test, \(H_a: \mu \lt 0\): critical value of \(z = -1.65\)
Two-sided test, \(H_a: \mu \ne 0\): critical value of \(z = \pm 1.96\)
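These critical values come from the inverse of the standard normal CDF. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the helper `critical_z` is a hypothetical name introduced here.

```python
from statistics import NormalDist

def critical_z(alpha, two_sided):
    """Positive critical z for the given alpha (negate it for a lower-tail test)."""
    # A two-sided test splits alpha evenly between the two tails
    tail_area = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1.0 - tail_area)

print(f"one-sided, alpha = 0.05: {critical_z(0.05, two_sided=False):.3f}")
print(f"two-sided, alpha = 0.05: {critical_z(0.05, two_sided=True):.3f}")
```

Splitting alpha across both tails is what pushes the two-sided critical value further from zero than the one-sided one.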
\[ H_0: \mu_{greek} = 2 \]
\[ H_a: \mu_{greek} \ne 2 \]
This is a TWO-SIDED (non-directional) hypothesis
Normal distribution, \(\alpha = 0.05\), two-sided test: critical value of \(z = \pm 1.96\)
So our sample result has to be at least 1.96 standard errors away from what is assumed under \(H_0\) for us to reject \(H_0\)
Figuring out how far the sample result is from \(H_0\) and putting it in standard-error units
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{1.9}{\sqrt{150}} = 0.155 \]
\[ z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{2.3-2}{0.155} = 1.94 \]
So, the average number of drinks for the Greeks is \(1.94\) standard errors above what we assume under \(H_0\)
Since our test statistic (\(1.94\)) is less extreme than the critical value (\(1.96\)) we
FAIL TO REJECT \(H_0\)
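The whole two-sided test for the Greek-system example can be sketched in Python as follows. The function name `two_sided_z_test` is an illustrative assumption, and the critical value of 1.96 is taken from the slide above rather than computed.

```python
import math

def two_sided_z_test(sample_mean, null_mean, sigma, n, critical_z=1.96):
    """Return (z, reject) for a two-sided z test with known sigma."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Two-sided: reject when z is far from zero in EITHER direction
    return z, abs(z) >= critical_z

z, reject = two_sided_z_test(sample_mean=2.3, null_mean=2, sigma=1.9, n=150)
print(f"z = {z:.2f}, reject H0: {reject}")
```

Because the code does not round the standard error to 0.155 first, its z is slightly smaller than the hand-calculated 1.94, but the decision is the same.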
You are interested in knowing whether immigrants (US residents born outside of the country) are different from the US population as a whole in terms of educational attainment. You draw a random sample of 225 adult immigrants and find that their average level of education is 13.3 years. Compare this to the statistics for the population of American adults which has an average education of 12.75 with a standard deviation of 4 years. Use a .05 alpha level to test the statistical significance of the observed difference.
\[ H_0: \mu_{immig} = 12.75 \]
\[ H_a: \mu_{immig} \ne 12.75 \]
Set up a test assuming that the null \(H_0\) is true and see whether the facts of our sample contradict that assumption
Normal distribution, \(\alpha = 0.05\), two-sided test: critical value of \(z = \pm 1.96\)
So our sample result has to be at least 1.96 standard errors away from what is assumed under \(H_0\) for us to reject \(H_0\)
Figuring out how far the sample result is from \(H_0\) and putting it in standard-error units
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{225}} = 0.267 \]
\[ z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{13.3-12.75}{0.267} = 2.060 \]
Mean education for immigrants is 2.06 standard errors above the national average (assumed under \(H_0\))
Since our test statistic (2.06) is more extreme than the critical value (1.96), we
REJECT \(H_0\), FIND EVIDENCE TO SUPPORT \(H_a\)
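The same decision can be reached by comparing a two-sided p-value with alpha instead of comparing z with the critical value. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; `two_sided_p_value` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def two_sided_p_value(sample_mean, null_mean, sigma, n):
    """Return (z, p) where p counts results this extreme in BOTH tails."""
    z = (sample_mean - null_mean) / (sigma / math.sqrt(n))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

z, p = two_sided_p_value(sample_mean=13.3, null_mean=12.75, sigma=4, n=225)
print(f"z = {z:.3f}, two-sided p-value = {p:.4f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

The unrounded standard error gives a z slightly above the hand-calculated 2.06; either way the p-value falls below .05, matching the critical-value decision.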
Confidence intervals: Assume our sample of 100 UW students comes from a population with a standard deviation of 8.0; estimate the population mean…
Hypothesis testing: Say we know that the average study time for the population of college students in the country is 13 hrs/week, with a standard deviation of 8.0. Do UW students really study more than the national average?
These types of examples are UNREALISTIC (or at least really rare) because we rarely know the population standard deviation.
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]
\[ s_{\bar{X}} = \frac{s}{\sqrt{n}} \]
When the population standard deviation is unknown, use the SAMPLE standard deviation to estimate the standard error used in inferences.
Confidence intervals: You draw a sample of 100 UW students and find that the average study time is 14.5 hours/week with a sample standard deviation of 8.25. Estimate the population mean.
Hypothesis testing: You draw a sample of 100 UW students and find that the average study time is 14.5 hours/week with a sample standard deviation of 8.25. Test the hypothesis that UW students study more than the national average of 13 hours/week.
PROBLEM: Using sample statistics to make two different inferences
\(s_x \Rightarrow \sigma_x\)
\(\bar{X} \Rightarrow \mu_x\)
= INCREASED UNCERTAINTY
This is especially problematic when the sample size is small!
Account for the extra uncertainty by assuming that the sampling distribution follows a t-distribution instead of a normal (z) distribution.
Using the t-distribution tends to give more conservative inferences (e.g., wider confidence intervals, more extreme critical values)
Correction is most dramatic when the sample size is small
How different t is from Normal depends on the degrees of freedom
Degrees of freedom (df)
The number of scores that are free to vary in the calculation of a statistic
Q: If a sample of \(n = 5\) has a mean of 3, how many scores are free to vary?
___ + ___ + ___ + ___ + ___ = 15
Example: 2 + 1 + 7 + 2 + ___ = 15
A: \(n-1=4\)
Once you know the first four numbers, there is only one value that the fifth could take to produce a mean of \(3\).
df for calculation of a mean = \(n-1\)
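The idea that only \(n-1\) scores are free to vary can be illustrated with a short Python sketch. The function name `last_score` is made up for this example.

```python
def last_score(known_scores, mean, n):
    """Given n - 1 scores and the mean of all n, return the forced final score."""
    required_total = mean * n  # the n scores must add up to mean * n
    return required_total - sum(known_scores)

# First four scores chosen freely; the mean of all five must be 3
print(last_score([2, 1, 7, 2], mean=3, n=5))
```

Whatever four values are chosen first, the function returns exactly one possible fifth value, which is why a mean based on \(n\) scores has \(n-1\) degrees of freedom.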
t-distribution | critical value (two-sided, \(\alpha = 0.05\)) |
---|---|
df = 3 | \(\pm3.182\) |
df = 30 | \(\pm2.042\) |
df = 100 | \(\pm1.984\) |
df = 1000 | \(\pm1.962\) |
Notice: Tests with the t-distribution are more conservative (wider confidence intervals, harder to reject \(H_0\)) than with normal (critical value: \(\pm1.960\)) and results are especially conservative when the sample size (df) is small.
You draw a sample of 100 UW students and find that the average study time is 14.5 hours/week with a sample standard deviation of 8.25. Estimate the population mean.
\[ s_{\bar{X}} = \frac{s_x}{\sqrt{n}} = \frac{8.25}{\sqrt{100}} = 0.825 \]
\(df = 100 - 1 = 99\)
\[ \text{Confidence interval} = \bar{X} \pm t\left(\frac{s_x}{\sqrt{n}}\right) \]
\[ \text{95% C.I.} = 14.5 \pm 1.984(0.825) = 14.5 \pm 1.6368 \]
We are 95% confident that the average number of hours studied for the population of UW students is between 12.86 and 16.14 hours
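The interval can be reproduced with a small Python sketch. The critical t of 1.984 is taken from the lecture's table rather than computed (the standard library has no t-distribution), and the function name `t_confidence_interval` is an illustrative assumption.

```python
import math

def t_confidence_interval(sample_mean, sample_sd, n, t_critical):
    """Return (lower, upper) bounds of mean +/- t * (s / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    margin = t_critical * standard_error
    return sample_mean - margin, sample_mean + margin

# 95% interval for UW study time; t = 1.984 is the table value used in the lecture
low, high = t_confidence_interval(14.5, 8.25, 100, t_critical=1.984)
print(f"95% C.I.: {low:.2f} to {high:.2f} hours/week")
```

Passing a different table value for `t_critical` gives other confidence levels, for example a 90% interval.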
You draw a sample of 100 UW students and find that the average study time is 14.5 hours/week with a sample standard deviation of 8.25. Test the hypothesis that UW students study more than the national average of 13 hours/week.
\(H_0: \mu_{UW} = 13\) \(H_a: \mu_{UW} \gt 13\)
Critical value of t?
Population standard deviation is unknown
Use the sample standard deviation to estimate the standard error
\[ s_\bar{X} = \frac{s_x}{\sqrt{n}} = \frac{8.25}{\sqrt{100}} = 0.825 \]
\(df = 100 - 1 = 99\)
Critical value of t = 1.660 (one-sided test, \(\alpha = 0.05\), \(df = 99\))
Test statistic from the sample:
\(t = \frac{\bar{X}-\mu_0}{s_{\bar{X}}} = \frac{14.5 - 13}{0.825} = 1.818\)
Since obtained t is more extreme than critical value, reject \(H_0\) and support the claim that the population of UW students really do study more than the national average.
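A minimal Python sketch of this one-sample t test follows. The critical value of 1.660 is the one given in the lecture, not computed here, and `one_sample_t` is a hypothetical helper name.

```python
import math

def one_sample_t(sample_mean, null_mean, sample_sd, n):
    """t statistic using the SAMPLE standard deviation to estimate the standard error."""
    estimated_se = sample_sd / math.sqrt(n)
    return (sample_mean - null_mean) / estimated_se

t = one_sample_t(sample_mean=14.5, null_mean=13, sample_sd=8.25, n=100)
critical_t = 1.660  # critical value given in the lecture for this upper-tail test
print(f"t = {t:.3f}")
print("reject H0" if t > critical_t else "fail to reject H0")
```

The comparison uses `t > critical_t` because the alternative hypothesis is directional (UW students study MORE than 13 hours).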
In a random sample of 121 UW students, the mean number of visits to family in the past six months was 5.5 with a standard deviation of 0.9. Calculate a 90% confidence interval for the average number of visits in the population.
STEPS:
You want to know whether high school students are different than the national average (µ=220) in terms of the number of Facebook friends. A random sample of 30 high school students shows a mean number of Facebook friends of 245 with a standard deviation of 90. Is this difference from the national average statistically significant at the .01 level?
STEPS:
1a. State the null and alternative hypotheses
1b. Choose your alpha level and find the critical value
2. Calculate the test statistic (t-score) and p-value
3. Make a decision (Reject or fail to reject the null hypothesis (\(H_0\)) and support or fail to support the alternative hypothesis \(H_a\))
In a random sample of 121 UW students, the mean number of visits to family in the past six months was 5.5 with a standard deviation of 0.9. Calculate a 90% confidence interval for the average number of visits in the population.
\[ s_{\bar{X}} = \frac{s_x}{\sqrt{n}} = \frac{0.9}{\sqrt{121}} = 0.082 \]
\(df = 121 - 1 = 120\)
\[ \text{Confidence interval} = \bar{X} \pm t\left(\frac{s_x}{\sqrt{n}}\right) \]
\[ \text{90% C.I.} = 5.5 \pm 1.66(0.082) = 5.5 \pm 0.136 \]
We are 90% confident that the average number of visits to family for the population of UW students is between 5.364 and 5.636 visits
You want to know whether high school students are different than the national average (µ=220) in terms of the number of Facebook friends. A random sample of 30 high school students shows a mean number of Facebook friends of 245 with a standard deviation of 90. Is this difference from the national average statistically significant at the .01 level?
\(H_0: \mu_{HS} = 220\) \(H_a: \mu_{HS} \ne 220\)
Critical value of t?
Population standard deviation is unknown
Use the sample standard deviation to estimate the standard error
\[ s_\bar{X} = \frac{s_x}{\sqrt{n}} = \frac{90}{\sqrt{30}} = 16.432 \]
\(df = 30 - 1 = 29\)
Critical value of t = ±2.756 (two-sided test, \(\alpha = 0.01\), \(df = 29\))
Test statistic from the sample:
\(t = \frac{\bar{X}-\mu_0}{s_{\bar{X}}} = \frac{245 - 220}{16.432} = 1.521\)
Since obtained t is NOT more extreme than critical value, we FAIL TO REJECT \(H_0\) that the average number of FB friends in the high school population is the same as the national average.
We want to know if UW students differ from WSU students in terms of average study time. We draw a random sample of 100 UW students and find an average study time of 14.5 hours per week with a standard deviation of 8.25. In contrast, our random sample of 81 WSU students has an average study time of 12.5 hours per week with a standard deviation of 7.0. Use a .05 alpha level to test the statistical significance of the observed difference between means.
\(H_0: \mu_{UW} = \mu_{WSU}\), equivalently \(H_0: \mu_{UW} - \mu_{WSU} = 0\)
Observed difference between sample means reflects chance sampling error.
\(H_a: \mu_{UW} \ne \mu_{WSU}\), equivalently \(H_a: \mu_{UW} - \mu_{WSU} \ne 0\)
Observed difference between sample means reflects a real difference between population means.
Question just asks about any difference
\(H_a: \mu_{1} \ne \mu_{2}\)
↓
\(H_a: \mu_{1} - \mu_{2} \ne 0\)
Question implies a direction of difference
\(H_a: \mu_{1} \gt \mu_{2}\)
↓
\(H_a: \mu_{1} - \mu_{2} \gt 0\)
\(H_a: \mu_{1} \lt \mu_{2}\)
↓
\(H_a: \mu_{1} - \mu_{2} \lt 0\)
Can restate any in terms of differences between population means
Note: all of these hypothesize a real difference between population means
All are contradicted by the same null hypothesis
\(H_0: \mu_{1} = \mu_{2}\) → \(H_0: \mu_{1} - \mu_{2} = 0\)
\(H_0: \mu_{UW} - \mu_{WSU} = 0\) \(H_a: \mu_{UW} - \mu_{WSU} \ne 0\)
Sampling Distribution for the difference between means
A theoretical probability distribution that would be obtained by calculating all of the possible mean differences (\(\bar{X_1}-\bar{X_2}\)) for all possible pairs of random, independent samples of size \(n_1\) and \(n_2\) drawn from two populations.
The DIFFERENCES between sample means drawn from every possible PAIR of samples create a SAMPLING DISTRIBUTION OF DIFFERENCES BETWEEN (SAMPLE) MEANS
Characteristics of the Sampling Distribution for Difference between Two Means:
\(\mu_{\bar{X_1} - \bar{X_2}} = \mu_1 - \mu_2\)
\(\sigma_{\bar{X_1} - \bar{X_2}} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\) → \(s_{\bar{X_1} - \bar{X_2}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\)
INFERENCE
\(\alpha = 0.05\)
Two-sided test
Because population standard deviations are unknown, use the t-distribution
\(df_1= n_1-1 = 100-1 = 99\) \(df_2= n_2-1 = 81-1 = 80\)
Use the smaller of the two df’s (\(df = 80\)) to create a more conservative test!
Our sample result has to be at least 1.99 standard errors away from the null hypothesis for us to reject the null hypothesis
Figuring out how far the sample result is from \(H_0\) and putting it in standard-error units
\(t = \frac{(\bar{X_1}-\bar{X_2})-0}{s_{\bar{X_1}-\bar{X_2}}}\)
\((\bar{X_1}-\bar{X_2})\): Observed sample result (difference between means)
\(0\): Difference expected under the null hypothesis
\(s_{\bar{X_1}-\bar{X_2}}\): Standard error (s.d. of the sampling distribution)
\(s_{\bar{X_1} - \bar{X_2}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{8.25^2}{100} + \frac{7.0^2}{81}} = \sqrt{1.285563} = 1.134\)
\(t = \frac{(14.5 - 12.5)-0}{1.134} = 1.764\)
Observed sample difference is 1.764 standard errors away from the assumption of the null hypothesis
Since the obtained t (\(1.764\)) is less extreme than the critical value (\(\pm 1.99\)), we FAIL TO REJECT \(H_0\). The observed difference between sample means is not statistically significant at the .05 level.
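The two-sample calculation for UW versus WSU can be sketched in Python as below. The function name `two_sample_t` is an illustrative assumption, and the critical value of 1.99 (for \(df = 80\)) is taken from the lecture rather than computed.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (standard_error, t) for the difference between two sample means."""
    # Estimated standard error of the difference: sqrt(s1^2/n1 + s2^2/n2)
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Under H0 the expected difference between population means is 0
    t = ((mean1 - mean2) - 0) / standard_error
    return standard_error, t

se, t = two_sample_t(14.5, 8.25, 100, 12.5, 7.0, 81)
critical_t = 1.99  # two-sided, alpha = .05, df = 80 (value from the lecture)
print(f"standard error = {se:.3f}, t = {t:.3f}")
print("reject H0" if abs(t) >= critical_t else "fail to reject H0")
```

The absolute value in the last line reflects the two-sided alternative: a difference in either direction would count as evidence against the null hypothesis.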
We want to know if high school students have more Facebook (FB) friends than do senior citizens. A random sample of 30 high school students shows a mean number of FB friends of 245 with a standard deviation of 90. In contrast, a random sample of 40 senior citizens (age 65+) has a mean of 185 FB friends with a standard deviation of 65. Use a .05 alpha level to test the statistical significance of the observed difference between means.
\(H_0: \mu_{HS} - \mu_{SC} = 0\) \(H_a: \mu_{HS} - \mu_{SC} \gt 0\)
Critical value of t?
Population standard deviation is unknown
Use the sample standard deviation to estimate the standard error
\[ s_{\bar{X_1} - \bar{X_2}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{90^2}{30} + \frac{65^2}{40}} = \sqrt{270 + 105.625} = 19.381 \]
\(df_{HS} = 30 - 1 = 29\), \(df_{SC} = 40 - 1 = 39\); use the smaller, \(df = 29\)
Critical value of t = 1.699 (one-sided test, \(\alpha = 0.05\), \(df = 29\))
Test statistic from the sample:
\(t = \frac{(\bar{X_1}-\bar{X_2})-0}{s_{\bar{X_1} - \bar{X_2}}} = \frac{(245-185)-0}{19.381} = 3.096\)
Since obtained t is MORE extreme than the critical value, we REJECT \(H_0\). Based on this evidence, we are reasonably confident that the population of high schoolers has a higher average number of FB friends than senior citizens do. The observed difference between sample means is statistically significant.
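For a directional two-sample test like this one, the two ingredients are the conservative df rule and an upper-tail comparison. The sketch below is only an illustration: `conservative_df` and `upper_tail_decision` are hypothetical helper names, and the critical value of 1.699 is taken from the lecture.

```python
import math

def conservative_df(n1, n2):
    """Use the smaller sample's df, as recommended in the lecture."""
    return min(n1, n2) - 1

def upper_tail_decision(t, critical_t):
    """One-sided test (Ha: difference > 0): reject only for large positive t."""
    return t > critical_t

# High school students (n = 30) vs. senior citizens (n = 40)
standard_error = math.sqrt(90**2 / 30 + 65**2 / 40)
t = ((245 - 185) - 0) / standard_error
df = conservative_df(30, 40)
reject = upper_tail_decision(t, critical_t=1.699)  # lecture value for df = 29
print(f"df = {df}, standard error = {standard_error:.3f}, t = {t:.3f}, reject H0: {reject}")
```

A large negative t would not lead to rejection here, because only results in the hypothesized direction count as evidence for a one-sided alternative.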