SOC 221 • Lecture 8
Monday, July 22, 2024
Have used one sample to draw inferences about one population
Have drawn inferences about the difference between two populations
New goal: test statistical significance for associations in two-way tables
Association
Two variables are “associated” with each other when variation in one variable corresponds with variation in the other variable.
Synonym: relationship
Independent Variable
An independent variable is assumed to influence a dependent variable. The assumed “cause” in an association
Dependent Variable
A dependent variable is affected by one or more other variables. The assumed “effect” in an association
Generally avoid using language of “cause” and “effect” since establishing a case for causality is always difficult and rarely certain.
Say you were presented with the following two tables. Can you tell whether there is an association between race and support for the death penalty?
Frequency distribution of attitudes towards the death penalty
Black and White respondents in GSS 2000

 | f | Percent |
---|---|---|
Support Death Penalty | 719 | 80.16 |
Oppose Death Penalty | 178 | 19.84 |
TOTAL | 897 | 100 |

Frequency distribution of race
Black and White respondents in GSS 2000

 | f | Percent |
---|---|---|
Black | 104 | 11.59 |
White | 793 | 88.41 |
TOTAL | 897 | 100 |
Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000

 | Black | White | TOTAL |
---|---|---|---|
Support Death Penalty | 60 | 659 | 719 |
Oppose Death Penalty | 44 | 134 | 178 |
TOTAL | 104 | 793 | 897 |
Column percentage: 57.69% = (60/104) * 100 (the share of Black respondents who support the death penalty)
Can detect the ASSOCIATION between variables by comparing conditional distributions…
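The comparison of conditional distributions can be sketched in a few lines of Python. This is a minimal illustration using the counts from the two-way table of attitudes towards the death penalty by race; the dictionary layout and the function name `column_percentages` are assumptions made for this example, not part of the lecture.

```python
# Observed counts from the two-way table (GSS 2000), keyed by column (race).
observed = {
    "Black": {"Support": 60, "Oppose": 44},
    "White": {"Support": 659, "Oppose": 134},
}


def column_percentages(table):
    """Return the conditional distribution (in %) of the DV within each IV category."""
    result = {}
    for group, counts in table.items():
        column_total = sum(counts.values())
        result[group] = {
            outcome: round(100 * count / column_total, 2)
            for outcome, count in counts.items()
        }
    return result


col_pct = column_percentages(observed)
for group, distribution in col_pct.items():
    print(group, distribution)
```

The two columns come out clearly different (about 57.69% support among Black respondents versus about 83.10% among White respondents), which is what signals an association in the sample.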
STRENGTH
How strong is the tendency for certain values of Y to go with particular values of X?
DIRECTION
Is the association positive or negative?
STATISTICAL SIGNIFICANCE
How certain can we be that the association exists in the population?
STRENGTH
How strong is the tendency for certain values of Y to go with particular values of X?
Conditional distributions are completely dissimilar (maximum difference in column %s across values of the IV)
Two-way table of vote on Affordable Care Act (ACA) by political party
US Senators in 2009

 | Democrat | Republican | TOTAL |
---|---|---|---|
Voted for ACA | 60 | 0 | 60 |
Voted against ACA | 0 | 39 | 39 |
TOTAL | 60 | 39 | 99 |
No conditional variation (all cases with a particular IV value have the same DV value)
Conditional distributions are exactly the same (no difference in column %s across values of the IV; all match the marginal %s)
Two-way table of transportation to work by gender
Alltech Corp. workers 2014

 | Female | Male | TOTAL |
---|---|---|---|
Drive | 100 | 150 | 250 |
Public Transportation | 30 | 45 | 75 |
Walk/bike | 10 | 15 | 25 |
TOTAL | 140 | 210 | 350 |
High conditional variation (lots of different values of DV for cases with same IV value)
Two-way table: Number of Crimes Committed by Education
Sample of parolees from Florida Prisons

 | Low Education | High Education | TOTAL |
---|---|---|---|
0 Crimes | 80 | 130 | 210 |
1 Crime | 24 | 15 | 39 |
2+ Crimes | 56 | 5 | 61 |
TOTAL | 160 | 150 | 310 |
YES: Conditional distributions are different
How STRONG is the association?
RISK RATIO (a.k.a. relative risk)
The ratio of the probability of some outcome among one group to the probability of the outcome among a different group.
For Black respondents:
\(P(SUPPORT) = 60/104 = 0.5769\)
For White respondents:
\(P(SUPPORT) = 659/793 = 0.8310\)
\(\text{RISK RATIO} = 0.8310 / 0.5769 = 1.44\)
The probability of supporting the death penalty is 1.44 times as high for White respondents as for Black respondents
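The risk ratio arithmetic above can be reproduced directly from the cell counts. The sketch below is only an illustration; the function name `risk_ratio` and its argument order are assumptions made here.

```python
def risk_ratio(events_a, total_a, events_b, total_b):
    """Ratio of P(outcome) in group A to P(outcome) in group B."""
    p_a = events_a / total_a
    p_b = events_b / total_b
    return p_a / p_b


# Group A: White respondents (659 of 793 support).
# Group B: Black respondents (60 of 104 support).
rr = risk_ratio(659, 793, 60, 104)
print(round(rr, 2))
```

Swapping the two groups gives the reciprocal (about 0.69), so it is worth stating which group is in the numerator whenever a risk ratio is reported.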
DIRECTION
Is the association positive or negative?
What is the direction of this association?

Two-way table: Number of Crimes Committed by Education
Sample of parolees from Florida Prisons

 | Low Education | High Education | TOTAL |
---|---|---|---|
0 Crimes | 80 | 130 | 210 |
1 Crime | 24 | 15 | 39 |
2+ Crimes | 56 | 5 | 61 |
TOTAL | 160 | 150 | 310 |

Negative: higher education is associated with a lower number of crimes.
What is the direction of the association between race and attitudes towards the death penalty?
Not relevant since these are nominal variables (no higher or lower values)
STATISTICAL SIGNIFICANCE
How certain can we be that the association exists in the population?
There is an association in this sample
Key question: Is this association in the sample strong enough to convince us that there is a real association in the POPULATION from which the sample was drawn?
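Chi-square (\(\chi^2\)) test
Used to test the statistical significance of associations in a two-way table (so, between categorical variables)
Intended to test whether a pattern or association observed in a set of sample data reflects a real association in the population or could have arisen by chance (sampling error)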
Based on a comparison of our observed frequencies to expected frequencies.
Chi-square test is based on comparison of the counts/frequencies we actually observe in the sample to what the table would look like if there were no association between the variables
What would the table look like if there were no association between the variables?
Conditional distributions of the DV would match across the values of the IV (same as marginals)
EXPECTED FREQUENCIES for attitudes towards the death penalty by race
Black and White respondents in GSS 2000

 | Black | White | TOTAL |
---|---|---|---|
Support Death Penalty | 83.36 | 635.64 | 719 |
Oppose Death Penalty | 20.64 | 157.36 | 178 |
TOTAL | 104 | 793 | 897 |
\[
\chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e}
\] where
\(f_o =\) observed frequency in a given cell
\(f_e =\) frequency in the cell expected under the assumption of the null hypothesis (no association in the population)
\[ f_e = \frac{\text{(row marginal)(column marginal)}}{n} \]
Note: Chi-square will take a value of 0 if there is no association in the sample.
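As a rough sketch of the expected-frequency formula, the snippet below applies (row marginal)(column marginal)/n to every cell of the death penalty table. The list-of-rows layout and the name `expected_frequencies` are assumptions made for this illustration.

```python
def expected_frequencies(observed):
    """Apply f_e = (row marginal)(column marginal) / n to each cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Support, Oppose. Columns: Black, White.
observed = [[60, 659], [44, 134]]
expected = expected_frequencies(observed)
for row in expected:
    print([round(f_e, 2) for f_e in row])
```

The expected frequencies keep the same row and column totals as the observed table; only the way cases are spread across the cells changes.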
OBSERVED and EXPECTED Frequencies

 | Black | White | TOTAL |
---|---|---|---|
Support Death Penalty | \(f_o = 60\), \(f_e = 83.36\) | \(f_o = 659\), \(f_e = 635.64\) | 719 |
Oppose Death Penalty | \(f_o = 44\), \(f_e = 20.64\) | \(f_o = 134\), \(f_e = 157.36\) | 178 |
TOTAL | 104 | 793 | 897 |

 | \(f_o\) | \(f_e\) | \((f_o - f_e)\) | \((f_o - f_e)^2\) | \(\frac{(f_o - f_e)^2}{f_e}\) |
---|---|---|---|---|---|
Cell #1 | 60 | 83.36 | -23.36 | 545.69 | 6.55 |
Cell #2 | 659 | 635.64 | 23.36 | 545.69 | 0.86 |
Cell #3 | 44 | 20.64 | 23.36 | 545.69 | 26.44 |
Cell #4 | 134 | 157.36 | -23.36 | 545.69 | 3.47 |

\[ \chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e} \]
\(\chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e} = 37.31\)
Chi-square score summarizes the difference between what we observe in the sample and what we would expect to observe if there were no association between the variables.
Question: Is that difference big enough to convince us that it did not just happen by chance (sampling error)?
Need a hypothesis test
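The cell-by-cell calculation can be sketched in Python as a check on the hand arithmetic. The function name `chi_square` is an assumption made for this example. Note that the code keeps full precision, so it returns roughly 37.32, while the 37.31 in the worked table comes from dividing the rounded intermediate values.

```python
def chi_square(observed):
    """Sum (f_o - f_e)^2 / f_e over all cells of a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected under no association
            statistic += (f_o - f_e) ** 2 / f_e
    return statistic


death_penalty = [[60, 659], [44, 134]]          # rows: Support, Oppose
transportation = [[100, 150], [30, 45], [10, 15]]  # rows: Drive, Public, Walk/bike

chi2_death_penalty = chi_square(death_penalty)
chi2_transportation = chi_square(transportation)
print(round(chi2_death_penalty, 2), round(chi2_transportation, 2))
```

The transportation table, where the conditional distributions are identical, gives a chi-square of 0, matching the note that chi-square is 0 when there is no association in the sample.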
\(H_0\): No association between race and attitudes toward death penalty in the population
\(H_a\): A real association between race and attitudes toward death penalty in the population
The default alpha level is \(0.05\) (\(0.01\) and \(0.001\) are tougher alternatives)
\(df = (R-1)(C-1) = (2-1)(2-1) = 1\), where \(R\) is the number of rows and \(C\) is the number of columns
Now we can go to the chi-square distribution table to see what critical value is associated with an alpha level of 0.05 and 1 degree of freedom
Our critical value is \(3.84\)
Since the obtained chi-square (\(37.31\)) is greater than the critical value (\(3.84\)), I can reject the null hypothesis
This supports the research hypothesis that there IS a real association between race and attitudes toward the death penalty IN THE POPULATION
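The table lookup can be cross-checked with a p-value. The sketch below relies on a standard identity: a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability can be written with the complementary error function from Python's standard library. The function and variable names are assumptions made for this illustration, and the shortcut applies only when df = 1.

```python
import math


def chi2_upper_tail_df1(x):
    """P(chi-square with 1 df > x), via the standard normal identity."""
    return math.erfc(math.sqrt(x / 2))


obtained = 37.31  # chi-square from the worked example
critical = 3.84   # table value for alpha = 0.05 and df = 1
alpha = 0.05

reject_null = obtained > critical
p_value = chi2_upper_tail_df1(obtained)
tail_at_critical = chi2_upper_tail_df1(critical)

print("Reject H0:", reject_null)
print("p-value below alpha:", p_value < alpha)
print("Tail area beyond the critical value:", round(tail_at_critical, 3))
```

Comparing the obtained value to the critical value and comparing the p-value to alpha are two views of the same decision rule, so they always agree.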
You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table below.
In this sample, is there an association between diet and feelings towards Thanksgiving? How do you know?

Feelings about Thanksgiving

 | Vegetarian | Carnivore | TOTAL |
---|---|---|---|
Dislikes | 25 | | 80 |
Indifferent | 36 | | 111 |
Likes | | | 184 |
TOTAL | 100 | 275 | 375 |
Fill in the missing observed frequencies
(note that once two cells are completed, and you have the marginals, you can complete the table)

Feelings about Thanksgiving

 | Vegetarian | Carnivore | TOTAL |
---|---|---|---|
Dislikes | 25 | 55 | 80 |
Indifferent | 36 | 75 | 111 |
Likes | 39 | 145 | 184 |
TOTAL | 100 | 275 | 375 |
Add column percentages to better understand conditional distributions

Feelings about Thanksgiving (column percentages in parentheses)

 | Vegetarian | Carnivore | TOTAL |
---|---|---|---|
Dislikes | 25 (25.00%) | 55 (20.00%) | 80 (21.33%) |
Indifferent | 36 (36.00%) | 75 (27.27%) | 111 (29.60%) |
Likes | 39 (39.00%) | 145 (52.73%) | 184 (49.07%) |
TOTAL | 100 (100%) | 275 (100%) | 375 (100%) |

Differences in the conditional distributions indicate that there IS an association in the sample
Use risk ratios to quantify the association. For example, the probability of liking Thanksgiving is 36% higher for carnivores than for vegetarians:
\(P(LIKES) = 145/275 = 0.53\) for carnivores and \(P(LIKES) = 39/100 = 0.39\) for vegetarians
\(\frac{0.53}{0.39} = 1.36\)
Is this association statistically significant?
\[ \chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e} \]
\(H_0\): There is no association between diet and attitudes towards Thanksgiving IN THE POPULATION.
\(H_a\): IN THE POPULATION, there is an association between diet and attitudes towards Thanksgiving.
Find EXPECTED FREQUENCIES:
\(f_e = \frac{\text{(row marginal)(column marginal)}}{n}\)
For example: \(f_e\) for Cell 5 (Vegetarian, Likes) = \(\frac{(184)(100)}{375} = 49.07\)
Expected frequencies reflect how the table would look if there were no association between the variables (i.e., if the null hypothesis were true)
 | \(f_o\) | \(f_e\) | \((f_o - f_e)\) | \((f_o - f_e)^2\) | \(\frac{(f_o - f_e)^2}{f_e}\) |
---|---|---|---|---|---|
Cell 1 | 25 | 21.33 | 3.67 | 13.44 | 0.63 |
Cell 2 | 55 | 58.67 | -3.67 | 13.44 | 0.23 |
Cell 3 | 36 | 29.60 | 6.40 | 40.96 | 1.38 |
Cell 4 | 75 | 81.40 | -6.40 | 40.96 | 0.50 |
Cell 5 | 39 | 49.07 | -10.07 | 101.34 | 2.07 |
Cell 6 | 145 | 134.93 | 10.07 | 101.34 | 0.75 |
\(\chi^2 =\) | | | | | 5.56 |
\[ \text{obtained } \chi^2 = 5.56 \]
\(df = (R-1)(C-1) = (3-1)(2-1) = 2\)
\(\text{critical value of } \chi^2 = 5.99\) (alpha level of \(0.05\), 2 degrees of freedom)
Since obtained value of chi-square is LESS EXTREME than the critical value we FAIL TO REJECT THE NULL HYPOTHESIS. The association observed is NOT statistically significant. Cannot be confident that the association exists in the population.
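The Thanksgiving test can be replayed in code as a sanity check. This sketch recomputes chi-square from the observed table and uses a closed form that holds only for 2 degrees of freedom, where the upper-tail probability equals exp(-x/2). The helper names are assumptions made for this example.

```python
import math


def chi_square(observed):
    """Sum (f_o - f_e)^2 / f_e over all cells of a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (f_o - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(observed)
        for j, f_o in enumerate(row)
    )


# Rows: Dislikes, Indifferent, Likes. Columns: Vegetarian, Carnivore.
thanksgiving = [[25, 55], [36, 75], [39, 145]]

obtained = chi_square(thanksgiving)
df = (len(thanksgiving) - 1) * (len(thanksgiving[0]) - 1)
critical = 5.99                     # table value for alpha = 0.05 and df = 2
p_value = math.exp(-obtained / 2)   # upper tail; valid only because df == 2

print(round(obtained, 2), df, obtained > critical, round(p_value, 3))
```

The p-value lands a little above 0.05, which is the same conclusion as the table lookup: fail to reject the null hypothesis.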
What happens if the sample is doubled, with the same conditional distributions?
Since the conditional distributions are the same as in the original sample, and still do not match across diet groups, there still appears to be an association IN THE SAMPLE.

Feelings about Thanksgiving (sample doubled)

 | Vegetarian | Carnivore | TOTAL |
---|---|---|---|
Dislikes | 50 | 110 | 160 |
Indifferent | 72 | 150 | 222 |
Likes | 78 | 290 | 368 |
TOTAL | 200 | 550 | 750 |
 | \(f_o\) | \(f_e\) | \((f_o - f_e)\) | \((f_o - f_e)^2\) | \(\frac{(f_o - f_e)^2}{f_e}\) |
---|---|---|---|---|---|
Cell 1 | 50 | 42.67 | 7.33 | 53.73 | 1.26 |
Cell 2 | 110 | 117.33 | -7.33 | 53.73 | 0.46 |
Cell 3 | 72 | 59.20 | 12.80 | 163.84 | 2.77 |
Cell 4 | 150 | 162.80 | -12.80 | 163.84 | 1.01 |
Cell 5 | 78 | 98.13 | -20.13 | 405.22 | 4.13 |
Cell 6 | 290 | 269.87 | 20.13 | 405.22 | 1.50 |
\(\chi^2 =\) | | | | | 11.13 |

\[ \text{obtained } \chi^2 = 11.13 \]
\(\text{critical value of } \chi^2 = 5.99\)
The obtained value of chi-square roughly doubles (from \(5.56\) to \(11.13\)) even though the conditional distributions are unchanged. Now the obtained value of chi-square \(\gt\) the critical value of chi-square (\(11.13 \gt 5.99\)). The association observed in the sample IS statistically significant. We REJECT THE NULL HYPOTHESIS and find SUPPORT FOR THE ALTERNATIVE HYPOTHESIS that there is an association in the population.
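The effect of sample size can be checked numerically. In the sketch below (names are assumptions for this illustration), every cell of the Thanksgiving table is multiplied by 2, which leaves the column percentages untouched but doubles the chi-square statistic.

```python
def chi_square(observed):
    """Sum (f_o - f_e)^2 / f_e over all cells of a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n
            statistic += (f_o - f_e) ** 2 / f_e
    return statistic


original = [[25, 55], [36, 75], [39, 145]]
doubled = [[2 * count for count in row] for row in original]

chi2_original = chi_square(original)
chi2_doubled = chi_square(doubled)
critical = 5.99  # alpha = 0.05, df = 2

print(round(chi2_original, 2), chi2_original > critical)
print(round(chi2_doubled, 2), chi2_doubled > critical)
```

Holding the conditional distributions fixed, chi-square grows in proportion to the sample size, so the same pattern that was not significant with 375 cases becomes significant with 750.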
Establishing an association is the FIRST step in assessing arguments that one variable (independent variable) has a causal impact on another (dependent variable)
We’ve looked for associations between nominal and ordinal variables in bivariate tables
Now we want to measure association for interval variables
STRENGTH
How strong is the tendency for certain values of Y to go with particular values of X?
DIRECTION
Is the association positive or negative?
STATISTICAL SIGNIFICANCE
How certain can we be that the association exists in the population?
Scatterplot: A graph that uses points to simultaneously display the value on two variables for each case in the data
Allows us to picture the association between variables
Example: Association between hiker weight and weight of backpack carried
body | backpack |
---|---|
120 | 26 |
187 | 30 |
109 | 26 |
103 | 24 |
131 | 29 |
165 | 35 |
158 | 31 |
116 | 28 |
X-axis (horizontal) displays all values in the IV
Y-axis (vertical) displays all values on the DV
Each dot represents a case, positioned along the X and Y axes
Can see the (positive) association
(high values on one variable tend to go with high values on the other)
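One rough way to check the "high goes with high" pattern without drawing the plot is to count how many hikers fall on the same side of both means. This is only an illustrative sketch using the eight cases from the table above; the variable names are assumptions, and the count is not a formal measure of association.

```python
from statistics import mean

body = [120, 187, 109, 103, 131, 165, 158, 116]
backpack = [26, 30, 26, 24, 29, 35, 31, 28]

body_mean = mean(body)
backpack_mean = mean(backpack)

# A positive product means the case is above both means or below both means.
same_side = sum(
    1 for x, y in zip(body, backpack)
    if (x - body_mean) * (y - backpack_mean) > 0
)
opposite_side = sum(
    1 for x, y in zip(body, backpack)
    if (x - body_mean) * (y - backpack_mean) < 0
)

print(same_side, opposite_side)
```

Seven of the eight hikers are either above both means or below both means, which is the pattern a positive association produces in a scatterplot.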
The two most common tools for measuring associations between interval variables
CORRELATION
standardized summary of association between our variables
REGRESSION
characterizes the substantive effect of X on Y
Both correlation and regression are based on a description of a line used to characterize data points in a scatterplot
Correlation Coefficient (r)
A measure of association reflecting both the strength and the direction of the association between two interval-level variables
Why it’s helpful
Why you need to be careful
Interpretation
What type of correlations are the following?
Stronger associations = more tightly clustered points
Weak associations have lots of conditional variation and not much difference in conditional distributions (distribution of the DV across values of the IV)
Strong associations have very little conditional variation and lots of difference in conditional distributions (distribution of the DV across values of the IV)
Hours studied | Dates |
---|---|
10 | 1 |
15 | 2 |
10 | 5 |
6 | 1 |
2 | 0 |
7 | 3 |
10 | 4 |
12 | 3 |
7 | 1 |
20 | 3 |
Can see the association
Correlation coefficient allows us to quantify/summarize that association
\[ r = \frac{1}{n-1}\Sigma(\frac{x_i - \bar{x}}{s_x})(\frac{y_i - \bar{y}}{s_y}) \]
where
\(x_i =\) value of x variable for an individual
\(\bar{x} =\) mean of x variable
\(s_x =\) standard deviation of x variable
\(y_i =\) value of y variable for an individual
\(\bar{y} =\) mean of y variable
\(s_y =\) standard deviation of y variable
\(\Sigma =\) add up for all individuals
\(\frac{1}{n-1} =\) divide the whole mess by \(n-1\)
\(\frac{x_i - \bar{x}}{s_x}\) is just the standardized score on x (i.e., how many standard deviations the individual’s value on x is from the mean of x)
\(\frac{y_i - \bar{y}}{s_y}\) is just the standardized score on y (i.e., how many standard deviations the individual’s value on y is from the mean of y)
Association between hours studied and number of dates
\[ r = \frac{1}{n-1}\Sigma(\frac{x_i - \bar{x}}{s_x})(\frac{y_i - \bar{y}}{s_y}) \]
Person | Hours studied (\(x_i\)) | Dates (\(y_i\)) | \(x_i -\bar{x}\) | \(\frac{x_i-\bar{x}}{s_x}\) | \(y_i -\bar{y}\) | \(\frac{y_i-\bar{y}}{s_y}\) | \((\frac{x_i-\bar{x}}{s_x})(\frac{y_i-\bar{y}}{s_y})\) |
---|---|---|---|---|---|---|---|
1 | 10 | 1 | 0.10 | 0.02 | -1.30 | -0.83 | -0.02 |
2 | 15 | 2 | 5.10 | 1.02 | -0.30 | -0.19 | -0.19 |
3 | 10 | 5 | 0.10 | 0.02 | 2.70 | 1.72 | 0.03 |
4 | 6 | 1 | -3.90 | -0.78 | -1.30 | -0.83 | 0.64 |
5 | 2 | 0 | -7.90 | -1.57 | -2.30 | -1.47 | 2.31 |
6 | 7 | 3 | -2.90 | -0.58 | 0.70 | 0.45 | -0.26 |
7 | 10 | 4 | 0.10 | 0.02 | 1.70 | 1.08 | 0.02 |
8 | 12 | 3 | 2.10 | 0.42 | 0.70 | 0.45 | 0.19 |
9 | 7 | 1 | -2.90 | -0.58 | -1.30 | -0.83 | 0.48 |
10 | 20 | 3 | 10.10 | 2.01 | 0.70 | 0.45 | 0.90 |
sum | 99.00 | 23.00 | | | | | 4.11 |
mean | 9.90 | 2.30 | | | | | |
st. dev. | 5.02 | 1.57 | | | | | |
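The columns of the table above can be reproduced with a short script. This is a minimal sketch that follows the formula for r term by term; the function name `correlation` is an assumption made here, and `statistics.stdev` is used because it computes the sample standard deviation (dividing by n-1), which matches the 5.02 and 1.57 in the table.

```python
from statistics import mean, stdev

hours = [10, 15, 10, 6, 2, 7, 10, 12, 7, 20]
dates = [1, 2, 5, 1, 0, 3, 4, 3, 1, 3]


def correlation(xs, ys):
    """Average product of standardized scores, dividing by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    ]
    return sum(products) / (n - 1)


r = correlation(hours, dates)
print(round(stdev(hours), 2), round(stdev(dates), 2), round(r, 2))
```

With these ten cases r comes out at roughly 0.46, which is the sum of the last column (about 4.11) divided by n-1 = 9.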
Calculate means and standard
deviations for both variables
Get the deviation of the x score from the mean of x
Get the standardized score of x
Get the deviation of the y score from the mean of y
Get the standardized score of y
Calculate the product of standardized scores of x and y
Do the same thing for every case
Take the sum of the products of standardized x and y values
Divide by n-1
\(r = \frac{4.11}{10-1} = 0.456\)
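The steps above can be sketched in a few lines of Python. This is only an illustrative sketch using the ten cases from the table; the helper names `mean`, `sample_sd`, and `pearson_r` are made up for this example and are not part of the lecture.

```python
from math import sqrt

# Data from the table: hours studied (x) and number of dates (y) for 10 people
hours = [10, 15, 10, 6, 2, 7, 10, 12, 7, 20]
dates = [1, 2, 5, 1, 0, 3, 4, 3, 1, 3]


def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(x, y):
    """Pearson's r: sum of products of standardized scores, divided by n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = sample_sd(x), sample_sd(y)
    products = [
        ((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)
    ]
    return sum(products) / (n - 1)


r = pearson_r(hours, dates)
print(f"mean x = {mean(hours):.2f}, s_x = {sample_sd(hours):.2f}")
print(f"mean y = {mean(dates):.2f}, s_y = {sample_sd(dates):.2f}")
print(f"r = {r:.3f}")
```

The printed means, standard deviations, and \(r\) should agree with the bottom rows of the table and with \(r = 0.456\), up to rounding of the intermediate values.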
Hours studied | Dates |
---|---|
10 | 1 |
15 | 2 |
10 | 5 |
6 | 1 |
2 | 0 |
7 | 3 |
10 | 4 |
12 | 3 |
7 | 1 |
20 | 3 |
Description of the association?
Correlation \(r = 0.456\)
Positive, moderate association
This is the DIRECTION and STRENGTH of the association in the SAMPLE
Want to know whether there is an association in the POPULATION (i.e., whether the observed association is statistically significant)
Need a hypothesis test…
GOAL: We want to know if the association seen in the sample (as revealed by \(r\)) reflects
a real association between the two variables in the population
OR
chance sampling error, when in reality the two variables are not associated in the population
Similar error of prediction (similar spread around the line) at all values of X
Association between hours studied and number of dates
\(r = 0.456\)
Positive, moderate association in the sample
\(H_0: \rho = 0\)
\(H_1: \rho \ne 0\)
2-sided (direction typically not specified)
Use alpha = 0.05 by default
Use table for critical values: with \(n = 10\), \(df = n - 2 = 8\), so \(r \text{ critical} = 0.6319\)
\(r \text{ observed} = 0.456\)
Since \(r \text{ observed (0.456)}\) < \(r\text{ critical (0.6319)}\), we FAIL TO REJECT \(H_0\)
We cannot say that there is an association between hours studied and number of dates in the population of students
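The decision rule above can be checked with a short Python sketch. It relies on one relationship that is not stated on the slides: the critical value of \(r\) follows from the critical value of \(t\) with \(df = n - 2\) through \(r_{crit} = t_{crit} / \sqrt{t_{crit}^2 + df}\). The value 2.306 is the standard two-sided \(t\) critical value for \(df = 8\) at alpha = 0.05, entered by hand from a \(t\) table; the variable names are illustrative.

```python
from math import sqrt

n = 10               # sample size in the hours studied / dates example
r_observed = 0.456   # sample correlation from the example
df = n - 2           # degrees of freedom for testing rho = 0
t_critical = 2.306   # two-sided t critical value, alpha = 0.05, df = 8 (from a t table)

# Critical r implied by the critical t (assumed relationship, see lead-in)
r_critical = t_critical / sqrt(t_critical ** 2 + df)

# Equivalent t statistic for the observed correlation
t_observed = r_observed * sqrt(df) / sqrt(1 - r_observed ** 2)

# Two-sided test: compare the absolute value of r with the critical r
reject_h0 = abs(r_observed) > r_critical

print(f"r critical = {r_critical:.4f}")
print(f"t observed = {t_observed:.2f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Comparing \(|r|\) with the critical \(r\) and comparing \(|t|\) with the critical \(t\) lead to the same decision, which is why a table of critical \(r\) values is enough for this test.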
\[ r = \frac{1}{n-1}\Sigma(\frac{x_i - \bar{x}}{s_x})(\frac{y_i - \bar{y}}{s_y}) \]
Example 1
Association between hours on TikTok and # of dates?
Example 2
Association between hours playing video games and # of dates?
Person | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | s |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Hours of playing video games | 10 | 5 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 2.00 | 3.50 |
Hours on TikTok | 0 | 3 | 5 | 15 | 14 | 25 | 1 | 1 | 5 | 22 | 9.10 | 9.21 |
# of dates | 1 | 2 | 5 | 1 | 0 | 3 | 4 | 3 | 1 | 3 | 2.30 | 1.57 |
Example 1
Association between hours on TikTok and # of dates?
\[ r = \frac{1}{n-1}\Sigma(\frac{x_i - \bar{x}}{s_x})(\frac{y_i - \bar{y}}{s_y}) \]
Person | TikTok hours (\(x_i\)) | Dates (\(y_i\)) | \(x_i -\bar{x}\) | \(\frac{x_i-\bar{x}}{s_x}\) | \(y_i -\bar{y}\) | \(\frac{y_i-\bar{y}}{s_y}\) | \((\frac{x_i-\bar{x}}{s_x})(\frac{y_i-\bar{y}}{s_y})\) |
---|---|---|---|---|---|---|---|
1 | 0 | 1 | -9.10 | -0.99 | -1.30 | -0.83 | 0.82 |
2 | 3 | 2 | -6.10 | -0.66 | -0.30 | -0.19 | 0.13 |
3 | 5 | 5 | -4.10 | -0.45 | 2.70 | 1.72 | -0.77 |
4 | 15 | 1 | 5.90 | 0.64 | -1.30 | -0.83 | -0.53 |
5 | 14 | 0 | 4.90 | 0.53 | -2.30 | -1.47 | -0.78 |
6 | 25 | 3 | 15.90 | 1.73 | 0.70 | 0.45 | 0.77 |
7 | 1 | 4 | -8.10 | -0.88 | 1.70 | 1.08 | -0.95 |
8 | 1 | 3 | -8.10 | -0.88 | 0.70 | 0.45 | -0.39 |
9 | 5 | 1 | -4.10 | -0.45 | -1.30 | -0.83 | 0.37 |
10 | 22 | 3 | 12.90 | 1.40 | 0.70 | 0.45 | 0.63 |
sum | 91.00 | 23.00 | | | | | -0.71 |
mean | 9.10 | 2.30 | | | | | |
st. dev. | 9.21 | 1.57 | | | | | |
\(r = \frac{-0.71}{10-1} =\) \(-0.079\)
Interpretation?
Weak negative association between hours on TikTok and number of dates
Statistical significance?
Since the absolute value of r-obtained (0.079) is less extreme than the critical value of r (0.6319), we fail to reject \(H_0\) that \(\rho = 0\).
We do not have enough evidence to say that the association observed in the sample exists in the population. It is not statistically significant.
Example 2
Association between hours playing video games and # of dates?
\[ r = \frac{1}{n-1}\Sigma(\frac{x_i - \bar{x}}{s_x})(\frac{y_i - \bar{y}}{s_y}) \]
Person | Video game hours (\(x_i\)) | Dates (\(y_i\)) | \(x_i -\bar{x}\) | \(\frac{x_i-\bar{x}}{s_x}\) | \(y_i -\bar{y}\) | \(\frac{y_i-\bar{y}}{s_y}\) | \((\frac{x_i-\bar{x}}{s_x})(\frac{y_i-\bar{y}}{s_y})\) |
---|---|---|---|---|---|---|---|
1 | 10 | 1 | 8.00 | 2.29 | -1.30 | -0.83 | -1.90 |
2 | 5 | 2 | 3.00 | 0.86 | -0.30 | -0.19 | -0.16 |
3 | 0 | 5 | -2.00 | -0.57 | 2.70 | 1.72 | -0.99 |
4 | 0 | 1 | -2.00 | -0.57 | -1.30 | -0.83 | 0.47 |
5 | 5 | 0 | 3.00 | 0.86 | -2.30 | -1.47 | -1.26 |
6 | 0 | 3 | -2.00 | -0.57 | 0.70 | 0.45 | -0.26 |
7 | 0 | 4 | -2.00 | -0.57 | 1.70 | 1.08 | -0.62 |
8 | 0 | 3 | -2.00 | -0.57 | 0.70 | 0.45 | -0.26 |
9 | 0 | 1 | -2.00 | -0.57 | -1.30 | -0.83 | 0.47 |
10 | 0 | 3 | -2.00 | -0.57 | 0.70 | 0.45 | -0.26 |
sum | 20.00 | 23.00 | | | | | -4.75 |
mean | 2.00 | 2.30 | | | | | |
st. dev. | 3.50 | 1.57 | | | | | |
\(r = \frac{-4.75}{10-1} =\) \(-0.528\)
Interpretation?
Moderate negative association between hours of video games played and number of dates.
Statistical significance?
Since the absolute value of r-obtained (0.528) is less extreme than the critical value of r (0.6319), we fail to reject \(H_0\) that \(\rource = 0\).
We do not have enough evidence to say that the association observed in the sample exists in the population. It is not statistically significant.
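Both examples can be reproduced with a short Python sketch. It uses the data rows from the table of video game hours, TikTok hours, and dates, and the critical value 0.6319 quoted in the lecture. The function `pearson_r` is an illustrative helper that uses the sums-of-squares form of the formula, which is algebraically the same as summing the products of standardized scores and dividing by \(n - 1\).

```python
from math import sqrt

# Data rows from the lecture table (10 people)
video_games = [10, 5, 0, 0, 5, 0, 0, 0, 0, 0]
tiktok = [0, 3, 5, 15, 14, 25, 1, 1, 5, 22]
dates = [1, 2, 5, 1, 0, 3, 4, 3, 1, 3]

R_CRITICAL = 0.6319  # from the lecture: n = 10, alpha = 0.05, two-sided


def pearson_r(x, y):
    """Pearson's r from sums of squares and cross-products of deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


r_tiktok = pearson_r(tiktok, dates)
r_games = pearson_r(video_games, dates)

for label, r in [("TikTok", r_tiktok), ("Video games", r_games)]:
    # Two-sided test: compare |r| with the critical value
    decision = "reject H0" if abs(r) > R_CRITICAL else "fail to reject H0"
    print(f"{label}: r = {r:.3f}, {decision}")
```

Small differences in the third decimal place relative to the hand calculation can come from rounding the intermediate values in the table.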
Data from a sample of 102 adults results in the correlation matrix to the right
Variable | Age | Hours Worked | Hours on leisure |
---|---|---|---|
Age | 1.00 | 0.28 | -0.40 |
Hours Worked | 0.28 | 1.00 | -0.61 |
Hours on leisure | -0.40 | -0.61 | 1.00 |
Interpret the correlation coefficients
Weak positive association between age and hours worked in this sample.
Moderate negative association between age and leisure hours in this sample.
Strong negative association between hours worked and leisure hours in this sample.
Which of these correlation coefficients is statistically significant at the 0.05 alpha level?
\(H_0: \rho = 0\)
\(H_1: \rho \ne 0\)
Compare observed values of \(r\) to the critical value of \(r\)
With \(n = 102\), \(df = n - 2 = 100\): critical value of \(r = 0.1946\)
Since all observed values of \(r\) are MORE EXTREME than the critical value of \(r\), we can reject the null hypothesis in each case and conclude that the correlations are likely non-zero IN THE POPULATION (all three sample correlations are statistically significant)
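The same comparison can be written as a small Python sketch. The coefficients and the critical value 0.1946 are taken from the slides; the dictionary layout and variable names are illustrative only.

```python
# Off-diagonal entries of the correlation matrix (sample of 102 adults)
correlations = {
    ("Age", "Hours Worked"): 0.28,
    ("Age", "Hours on leisure"): -0.40,
    ("Hours Worked", "Hours on leisure"): -0.61,
}

R_CRITICAL = 0.1946  # from the lecture: df = 102 - 2 = 100, alpha = 0.05, two-sided

# Two-sided test: a coefficient is significant when |r| exceeds the critical value
results = {pair: abs(r) > R_CRITICAL for pair, r in correlations.items()}

for (var_a, var_b), significant in results.items():
    r = correlations[(var_a, var_b)]
    verdict = "significant" if significant else "not significant"
    print(f"{var_a} vs {var_b}: r = {r:+.2f} -> {verdict}")
```

Because the test is two-sided, the sign of each coefficient is ignored in the comparison; only its absolute value is checked against the critical value.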