Two-way tables and chi-square

SOC 221 • Lecture 8

Victoria Sass

Wednesday, July 30, 2025

Bivariate tables
and chi-square

Overview

Have used one sample to draw inferences about one population
Have drawn inferences about the difference between two populations
- Beginning of efforts to test statistical significance of associations between variables
  - Example: testing for gender differences in emotional intelligence (EI) = testing for an association between gender and EI
New goal: test statistical significance for associations in two-way tables
- Two-way tables: tables that simultaneously cross-classify values of one variable by values of another
  - Also called bivariate tables, contingency tables, or crosstabs
- Useful for assessing the bivariate associations between categorical variables

Association
Two variables are “associated” with each other when variation in one variable corresponds with variation in the other variable.

Synonym: relationship

If the variables X and Y are associated:

particular values of Y tend to coincide with particular values of X
the values of Y are different across different values of X
the average value of Y depends on the value of X

Examples:

Education and income are associated because people with higher levels of education tend to have higher levels of income.
Political attitudes are associated with age in that older people tend to be more politically conservative than younger people.

Independent Variable
An independent variable is assumed to influence a dependent variable. The assumed “cause” in an association

Examples:

If we assume that education affects income, education is the independent variable and income is the dependent variable.
If we believe that political attitudes change with age, then age is the independent variable and political attitudes are the dependent variable.

Dependent Variable
A dependent variable is affected by one or more other variables. The assumed “effect” in an association

Generally avoid using language of “cause” and “effect” since establishing a case for causality is always difficult and rarely certain.

Associations in
bivariate tables

Goal: Understand the association between race and attitudes about the death penalty

Say you were presented with the following two tables. Can you tell whether there is an association between race and support for the death penalty?

Frequency distribution of attitudes towards the death penalty
Black and White respondents in GSS 2000
	f	Percent
Support Death Penalty	719	80.16
Oppose Death Penalty	178	19.84
TOTAL	897	100

Frequency distribution of race
Black and White respondents in GSS 2000
	f	Percent
Black	104	11.59
White	793	88.41
TOTAL	897	100

Two-way (contingency/bivariate) tables

Purpose:
- Simultaneously cross-classify the distribution of two variables
- Common first step in detecting and summarizing an association between variables

Goal: Understand the association between race and attitudes about the death penalty

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	60	659	719
Oppose Death Penalty	44	134	178
TOTAL	104	793	897

Goal: Understand the association between race and attitudes about the death penalty

Title: Values of the DV by Values of the IV

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	60	659	719
Oppose Death Penalty	44	134	178
TOTAL	104	793	897

Goal: Understand the association between race and attitudes about the death penalty

Column variable (usually the independent variable)

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	60	659	719
Oppose Death Penalty	44	134	178
TOTAL	104	793	897

Goal: Understand the association between race and attitudes about the death penalty

Row variable (usually the dependent variable)

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	60	659	719
Oppose Death Penalty	44	134	178
TOTAL	104	793	897

Goal: Understand the association between race and attitudes about the death penalty

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	60	659	719
Oppose Death Penalty	44	134	178
TOTAL	104	793	897

Observed frequencies (Each cell of the table includes the count of cases with the specific combination of attributes on the two variables)

Goal: Understand the association between race and attitudes about the death penalty

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	60	659	719
Oppose Death Penalty	44	134	178
TOTAL	104	793	897

Marginals show the basic distribution of the two variables
(same information from frequency tables)

“Total” row and column are called marginals

Goal: Understand the association between race and attitudes about the death penalty

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	60 (57.69%)	659 (83.10%)	719 (80.16%)
Oppose Death Penalty	44 (42.31%)	134 (16.90%)	178 (19.84%)
TOTAL	104 (100.00%)	793 (100.00%)	897

Percentages can be used to compare the distribution of the dependent variable (DV) across values of the independent variable (IV)

Goal: Understand the association between race and attitudes about the death penalty

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	60 (57.69%)	659 (83.10%)	719 (80.16%)
Oppose Death Penalty	44 (42.31%)	134 (16.90%)	178 (19.84%)
TOTAL	104 (100.00%)	793 (100.00%)	897

Percentages can be used to compare the distribution of the dependent variable (DV) across values of the independent variable (IV)

57.69% = (60/104) * (100)

Calculate percentages of the DV within values of IV (here use column %s)
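
As a rough illustration of the column-percentage step, the sketch below recomputes the conditional distributions from the observed counts in the table. The function name `column_percentages` and the nested-dictionary layout are assumptions made for this example, not part of the lecture materials.

```python
# Observed counts from the GSS 2000 two-way table above.
# Outer keys are the DV (attitude); inner keys are the IV (race).
observed = {
    "Support Death Penalty": {"Black": 60, "White": 659},
    "Oppose Death Penalty": {"Black": 44, "White": 134},
}


def column_percentages(table):
    """Express each cell as a percentage of its column (IV) total."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        label: {c: 100 * row[c] / col_totals[c] for c in columns}
        for label, row in table.items()
    }


conditional = column_percentages(observed)
for label, row in conditional.items():
    print(label, {c: round(p, 2) for c, p in row.items()})
```

Each column sums to 100%, which is what makes the two columns directly comparable even though the groups differ greatly in size.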

Goal: Understand the association between race and attitudes about the death penalty

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	60 (57.69%)	659 (83.10%)	719 (80.16%)
Oppose Death Penalty	44 (42.31%)	134 (16.90%)	178 (19.84%)
TOTAL	104 (100.00%)	793 (100.00%)	897

Percentages can be used to compare the distribution of the dependent variable (DV) across values of the independent variable (IV)

Percentages of the DV within values of the IV =
Conditional Distributions

Can detect the ASSOCIATION between variables by comparing conditional distributions…

Key characteristics of an association (e.g. between X and Y)

STRENGTH
How strong is the tendency for certain values of Y to go with particular values of X?

DIRECTION
Is the association positive or negative?

STATISTICAL SIGNIFICANCE
How certain can we be that the association exists in the population?

Key characteristics of an association (e.g. between X and Y)

STRENGTH
How strong is the tendency for certain values of Y to go with particular values of X?

Maximum (perfect) association:
- All cases with a particular value of X have the same value on Y (no conditional variation)
- Knowing the value of X allows for perfect prediction of Y
Minimum (no) association:
- High conditional variation (value of Y varies even among those cases with the same value of X)
- Knowing the value of X does not improve ability to predict Y

Key characteristics of an association (e.g. between X and Y)

STRENGTH
How strong is the tendency for certain values of Y to go with particular values of X?

Maximum (perfect) association:
- All cases with a particular value of X have the same value on Y (no conditional variation)
- Knowing the value of X allows for perfect prediction of Y
Minimum (no) association:
- High conditional variation (value of Y varies even among those cases with the same value of X)
- Knowing the value of X does not improve ability to predict Y

Conditional distributions are completely dissimilar (maximum difference in column %s across values of the IV)

Two-way table of vote on Affordable Care Act (ACA) by political party
US Senators in 2009
	Democrat	Republican	TOTAL
Voted for ACA	60 (100.00%)	0 (0.00%)	60 (60.61%)
Voted against ACA	0 (0.00%)	39 (100.00%)	39 (39.39%)
TOTAL	60 (100.00%)	39 (100.00%)	99

No conditional variation (all cases with a particular IV value have the same DV value)

Key characteristics of an association (e.g. between X and Y)

STRENGTH
How strong is the tendency for certain values of Y to go with particular values of X?

Maximum (perfect) association:
- All cases with a particular value of X have the same value on Y (no conditional variation)
- Knowing the value of X allows for perfect prediction of Y
Minimum (no) association:
- High conditional variation (value of Y varies even among those cases with the same value of X)
- Knowing the value of X does not improve ability to predict Y

Conditional distributions are exactly the same (no difference in column %s across values of the IV; all match the marginal %s)

Two-way table of transportation to work by gender
Alltech Corp. workers 2014
	Female	Male	TOTAL
Drive	100 (71.43%)	150 (71.43%)	250 (71.43%)
Public Transportation	30 (21.43%)	45 (21.43%)	75 (21.43%)
Walk/bike	10 (7.14%)	15 (7.14%)	25 (7.14%)
TOTAL	140 (100.00%)	210 (100.00%)	350 (100.00%)

High conditional variation (lots of different values of DV for cases with same IV value)

Using conditional distributions to detect an association

Two-way table: Number of Crimes Committed by Education
Sample of parolees from Florida Prisons
	Low Education	High Education	TOTAL
0 Crimes	80 (50.00%)	130 (86.67%)	210 (67.74%)
1 Crime	24 (15.00%)	15 (10.00%)	39 (12.58%)
2+ Crimes	56 (35.00%)	5 (3.33%)	61 (19.68%)
TOTAL	160 (100.00%)	150 (100.00%)	310 (100.00%)

Is there any association?

YES: Conditional distributions are different

Goal: Understand the association between race and attitudes about the death penalty

Is there an association between race and support for the death penalty? How do you know?

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	60 (57.69%)	659 (83.10%)	719 (80.16%)
Oppose Death Penalty	44 (42.31%)	134 (16.90%)	178 (19.84%)
TOTAL	104 (100.00%)	793 (100.00%)	897

Conditional distributions are different from each other and the marginal percentages
If attitudes were completely unassociated with race, we’d expect \(80.16\%\) of both races to support the death penalty

Goal: Understand the association between race and attitudes about the death penalty

How STRONG is the association?

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	60 (57.69%)	659 (83.10%)	719 (80.16%)
Oppose Death Penalty	44 (42.31%)	134 (16.90%)	178 (19.84%)
TOTAL	104 (100.00%)	793 (100.00%)	897

Somewhere between the extremes of no association and a perfect association
- We can quantify the strength of the association using risk ratios

Goal: Understand the association between race and attitudes about the death penalty

How STRONG is the association?

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	60 (57.69%)	659 (83.10%)	719 (80.16%)
Oppose Death Penalty	44 (42.31%)	134 (16.90%)	178 (19.84%)
TOTAL	104 (100.00%)	793 (100.00%)	897

RISK RATIO (a.k.a. relative risk)
The ratio of the probability of some outcome among one group to the probability of the outcome among a different group.

Goal: Understand the association between race and attitudes about the death penalty

How STRONG is the association?

Probability of supporting the death penalty

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	60 (57.69%)	659 (83.10%)	719 (80.16%)
Oppose Death Penalty	44 (42.31%)	134 (16.90%)	178 (19.84%)
TOTAL	104 (100.00%)	793 (100.00%)	897

For Black respondents:
\(P(SUPPORT) = 0.5769\)

For White respondents:
\(P(SUPPORT) = 0.8310\)

RISK RATIO \(= 0.8310 / 0.5769 = 1.44\)

The probability of supporting the death penalty is 1.44 times as high for White respondents as for Black respondents
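
The risk ratio can also be computed straight from the cell counts. The following is a minimal sketch; the helper name `risk_ratio` and its argument order are assumptions chosen for this example.

```python
def risk_ratio(events_a, total_a, events_b, total_b):
    """Probability of the outcome in group A divided by that in group B."""
    return (events_a / total_a) / (events_b / total_b)


# Supporters and column totals from the GSS 2000 table.
white_vs_black = risk_ratio(659, 793, 60, 104)
black_vs_white = risk_ratio(60, 104, 659, 793)

print(round(white_vs_black, 2))  # White relative to Black respondents
print(round(black_vs_white, 2))  # the same comparison, reversed
```

Which group goes in the numerator is a choice; reversing it gives the reciprocal, so the reference group should always be stated alongside the ratio.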

Key characteristics of an association (e.g. between X and Y)

STRENGTH
How strong is the tendency for certain values of Y to go with particular values of X?

Maximum (perfect) association:
- All cases with a particular value of X have the same value on Y (no conditional variation)
- Knowing the value of X allows for perfect prediction of Y
Minimum (no) association:
- High conditional variation (value of Y varies even among those cases with the same value of X)
- Knowing the value of X does not improve ability to predict Y

DIRECTION
Is the association positive or negative?

Positive association: High values on Y tend to coincide with high values on X
Negative association: High values on Y tend to coincide with low values on X

Using conditional distributions to detect an association

Two-way table: Number of Crimes Committed by Education
Sample of parolees from Florida Prisons
	Low Education	High Education	TOTAL
0 Crimes	80 (50.00%)	130 (86.67%)	210 (67.74%)
1 Crime	24 (15.00%)	15 (10.00%)	39 (12.58%)
2+ Crimes	56 (35.00%)	5 (3.33%)	61 (19.68%)
TOTAL	160 (100.00%)	150 (100.00%)	310 (100.00%)

What is the direction of this association?

Negative:
Higher education associated with lower number of crimes

Goal: Understand the association between race and attitudes about the death penalty

What is the direction of this association?

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	60 (57.69%)	659 (83.10%)	719 (80.16%)
Oppose Death Penalty	44 (42.31%)	134 (16.90%)	178 (19.84%)
TOTAL	104 (100.00%)	793 (100.00%)	897

Not relevant since these are nominal variables (no higher or lower values)

Key characteristics of an association (e.g. between X and Y)

STRENGTH
How strong is the tendency for certain values of Y to go with particular values of X?

Maximum (perfect) association:
- All cases with a particular value of X have the same value on Y (no conditional variation)
- Knowing the value of X allows for perfect prediction of Y
Minimum (no) association:
- High conditional variation (value of Y varies even among those cases with the same value of X)
- Knowing the value of X does not improve ability to predict Y

DIRECTION
Is the association positive or negative?

Positive association: High values on Y tend to coincide with high values on X
Negative association: High values on Y tend to coincide with low values on X

STATISTICAL SIGNIFICANCE
How certain can we be that the association exists in the population?

Relevant when drawing inferences from an association observed in a sample to a possible association in the population
Determined by a statistical hypothesis test

Goal: Understand the association between race and attitudes about the death penalty

There is an association in this sample

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	60 (57.69%)	659 (83.10%)	719 (80.16%)
Oppose Death Penalty	44 (42.31%)	134 (16.90%)	178 (19.84%)
TOTAL	104 (100.00%)	793 (100.00%)	897

Key question: Is this association in the sample strong enough to convince us that there is a real association in the POPULATION from which the sample was drawn?

Chi-square test

Used to test statistical significance of associations in a two-way table (so, between categorical variables)
Intended to test whether a pattern or association observed in a set of sample data:
1. represents a real association in the population from which the sample was drawn
  OR
2. reflects random sampling error when, in reality, there is no real association in the population
Based on a comparison of our observed frequencies to expected frequencies.
- Observed frequencies = the cell counts actually observed in the sample data
- Expected frequencies = the cell counts that we would expect if there were no association in the data

Goal: Understand the association between race and attitudes about the death penalty

What would the table look like if there were no association between the variables?

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	60 (57.69%)	659 (83.10%)	719 (80.16%)
Oppose Death Penalty	44 (42.31%)	134 (16.90%)	178 (19.84%)
TOTAL	104 (100.00%)	793 (100.00%)	897

Chi-square test is based on comparison of the counts/frequencies we actually observe in the sample to what the table would look like if there were no association between the variables

Goal: Understand the association between race and attitudes about the death penalty

What would the table look like if there were no association between the variables?

Conditional distributions of the DV would match across the values of the IV (same as marginals)

EXPECTED FREQUENCIES for attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	(80.16%)	(80.16%)	719 (80.16%)
Oppose Death Penalty	(19.84%)	(19.84%)	178 (19.84%)
TOTAL	104 (100.00%)	793 (100.00%)	897

Chi-square test is based on comparison of the counts/frequencies we actually observe in the sample to what the table would look like if there were no association between the variables

Goal: Understand the association between race and attitudes about the death penalty

What would the table look like if there were no association between the variables?

Conditional distributions of the DV would match across the values of the IV (same as marginals)

EXPECTED FREQUENCIES for attitudes towards the death penalty by race
Black and White respondents in GSS 2000
	Black	White	TOTAL
Support Death Penalty	83.36 (80.16%)	635.64 (80.16%)	719 (80.16%)
Oppose Death Penalty	20.64 (19.84%)	157.36 (19.84%)	178 (19.84%)
TOTAL	104 (100.00%)	793 (100.00%)	897

Chi-square test is based on comparison of the counts/frequencies we actually observe in the sample to what the table would look like if there were no association between the variables

Calculation of chi-square

Obtained value of chi-square:

\[ \chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e} \] where
\(f_o =\) observed frequency in a given cell
\(f_e =\) frequency expected in the cell under the null hypothesis (no association in the population)

Shortcut to calculate expected cell frequencies:

\[ f_e = \frac{\text{(row marginal)(column marginal)}}{n} \]

Note: Chi-square will take a value of 0 if there is no association in the sample.
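
To make the shortcut and the note about a zero chi-square concrete, here is a small sketch applied to the transportation-by-gender table from earlier, where the conditional distributions are identical. The function names `expected_frequencies` and `chi_square` are assumptions for this example.

```python
def expected_frequencies(observed):
    """f_e = (row marginal)(column marginal) / n for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (f_o - f_e)^2 / f_e over all cells."""
    expected = expected_frequencies(observed)
    return sum(
        (f_o - f_e) ** 2 / f_e
        for obs_row, exp_row in zip(observed, expected)
        for f_o, f_e in zip(obs_row, exp_row)
    )


# Rows: Drive, Public Transportation, Walk/bike; columns: Female, Male.
transport = [[100, 150], [30, 45], [10, 15]]

print(expected_frequencies(transport))  # matches the observed counts
print(chi_square(transport))            # no association in the sample
```

Because every observed count equals its expected count in this table, every term in the sum is zero.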

Goal: Understand the association between race and attitudes about the death penalty

OBSERVED and EXPECTED Frequencies
	Black	White	TOTAL
Support Death Penalty	\(f_o = 60\) \(f_e = 83.36\) (57.69%)	\(f_o = 659\) \(f_e = 635.64\) (83.10%)	719 (80.16%)
Oppose Death Penalty	\(f_o = 44\) \(f_e = 20.64\) (42.31%)	\(f_o = 134\) \(f_e = 157.36\) (16.90%)	178 (19.84%)
TOTAL	104 (100.00%)	793 (100.00%)	897

	\(f_o\)	\(f_e\)	\((f_o - f_e)\)	\((f_o - f_e)^2\)	\(\frac{(f_o - f_e)^2}{f_e}\)
Cell #1	60	83.36	-23.36	545.69	6.55
Cell #2	659	635.64	23.36	545.69	0.86
Cell #3	44	20.64	23.36	545.69	26.44
Cell #4	134	157.36	-23.36	545.69	3.47

\[ \chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e} \]

Goal: Understand the association between race and attitudes about the death penalty

OBSERVED and EXPECTED Frequencies
	Black	White	TOTAL
Support Death Penalty	\(f_o = 60\) \(f_e = 83.36\) (57.69%)	\(f_o = 659\) \(f_e = 635.64\) (83.10%)	719 (80.16%)
Oppose Death Penalty	\(f_o = 44\) \(f_e = 20.64\) (42.31%)	\(f_o = 134\) \(f_e = 157.36\) (16.90%)	178 (19.84%)
TOTAL	104 (100.00%)	793 (100.00%)	897

	\(f_o\)	\(f_e\)	\((f_o - f_e)\)	\((f_o - f_e)^2\)	\(\frac{(f_o - f_e)^2}{f_e}\)
Cell #1	60	83.36	-23.36	545.69	6.55
Cell #2	659	635.64	23.36	545.69	0.86
Cell #3	44	20.64	23.36	545.69	26.44
Cell #4	134	157.36	-23.36	545.69	3.47

\[ \chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e} \]

\(\chi^2 = 37.32\)

Chi-square score summarizes the difference between what we observe in the sample and what we would expect to observe if there were no association between the variables.
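
The cell-by-cell arithmetic can be reproduced with a short loop. This is only a sketch of the hand calculation; variable names are chosen for the example, and because it keeps full precision instead of rounding intermediate values, individual entries may differ from the worked table in the last decimal place.

```python
# Rows: Support, Oppose; columns: Black, White (GSS 2000 counts).
observed = [[60, 659], [44, 134]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
cells = []  # (f_o, f_e, contribution) for each cell, in row-major order
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / n
        contribution = (f_o - f_e) ** 2 / f_e
        cells.append((f_o, f_e, contribution))
        chi2 += contribution
        print(f"f_o={f_o:4d}  f_e={f_e:7.2f}  contribution={contribution:6.2f}")

print(f"chi-square = {chi2:.2f}")
```

The largest contribution comes from the cell with the smallest expected count (Black respondents opposing), since the same squared difference is divided by a smaller \(f_e\).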

Question: Is that difference big enough to convince us that it did not just happen by chance (sampling error)?

Need a hypothesis test

Hypothesis test for two-way Chi-square

Check assumptions
- Random sample, scores are independent (i.e., each subject is allowed only one preference); no expected cell frequencies below 5.
State the hypotheses
- Null: No association in the population
- Alternative: A real association in the population.
Identify alpha and the critical value of chi-square
- Use a chi-square distribution table to get critical value of chi-square
- degrees of freedom \((df) = (R-1)(C-1)\)
Calculate the test statistic
- chi-square obtained
Make a decision
- Reject or fail to reject null hypothesis
- Make a statement about the implications for the population
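
The five steps can be sketched as one function. This is an illustrative outline under stated assumptions, not a replacement for the hand procedure: the name `chi_square_test` is hypothetical, the critical values are the standard table entries for alpha = 0.05 only, and the random-sampling and independence assumptions cannot be checked from the table itself.

```python
# Critical values of chi-square at alpha = 0.05 (standard table, df 1 to 4).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_test(observed, critical_values=CRITICAL_05):
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Step 1: the only assumption checkable here is expected counts >= 5.
    if min(min(row) for row in expected) < 5:
        raise ValueError("an expected cell frequency is below 5")

    # Step 2: H0 = no association in the population; Ha = an association.

    # Step 3: degrees of freedom and the critical value.
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    critical = critical_values[df]

    # Step 4: the obtained chi-square.
    chi2 = sum(
        (f_o - f_e) ** 2 / f_e
        for obs_row, exp_row in zip(observed, expected)
        for f_o, f_e in zip(obs_row, exp_row)
    )

    # Step 5: reject H0 when the obtained value exceeds the critical value.
    return {"chi2": chi2, "df": df, "critical": critical,
            "reject_null": chi2 > critical}


result = chi_square_test([[60, 659], [44, 134]])
print(result)
```

Raising an error when an expected count falls below 5 is a design choice for this sketch; in practice one might instead combine categories or use a different test.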

Step 1: Check assumptions

OBSERVED and EXPECTED Frequencies
	Black	White	TOTAL
Support Death Penalty	\(f_o = 60\) \(f_e = 83.36\) (57.69%)	\(f_o = 659\) \(f_e = 635.64\) (83.10%)	719 (80.16%)
Oppose Death Penalty	\(f_o = 44\) \(f_e = 20.64\) (42.31%)	\(f_o = 134\) \(f_e = 157.36\) (16.90%)	178 (19.84%)
TOTAL	104 (100.00%)	793 (100.00%)	897

Random sample ✔️
Scores are independent ✔️
No expected cell frequencies below 5 ✔️

Step 2: State the hypotheses

\(H_0\): No association between race and attitudes toward death penalty in the population

\(H_a\): A real association between race and attitudes toward death penalty in the population

Step 3: Identify alpha and the critical value of chi-square

The default alpha level is \(0.05\) (\(0.01\) and \(0.001\) are tougher alternatives)

\(df = (R-1)(C-1) = (2-1)(2-1) = 1\)

Now we can go to the chi-square distribution table to see what critical value is associated with an alpha level of 0.05 and 1 degree of freedom

Our critical value is \(3.84\)
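
As a cross-check on the table lookup, the tail area beyond the critical value can be computed directly when there is 1 degree of freedom. The sketch below relies on the fact that a chi-square variable with 1 df is a squared standard normal, so the upper-tail probability is `erfc(sqrt(x / 2))`; this shortcut holds only for df = 1, and the function name `chi2_tail_df1` is an assumption for this example.

```python
import math


def chi2_tail_df1(x):
    """P(chi-square with 1 df > x), using the standard normal link."""
    return math.erfc(math.sqrt(x / 2))


# The area beyond the critical value should be close to alpha = 0.05.
print(round(chi2_tail_df1(3.84), 4))

# Larger chi-square values leave much less area in the tail.
print(chi2_tail_df1(10.0) < 0.01)
```

Seen this way, the critical value is simply the point on the chi-square distribution that cuts off the top 5% of values expected under the null hypothesis.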

Step 4: Calculate the test statistic

OBSERVED and EXPECTED Frequencies
	Black	White	TOTAL
Support Death Penalty	\(f_o = 60\) \(f_e = 83.36\) (57.69%)	\(f_o = 659\) \(f_e = 635.64\) (83.10%)	719 (80.16%)
Oppose Death Penalty	\(f_o = 44\) \(f_e = 20.64\) (42.31%)	\(f_o = 134\) \(f_e = 157.36\) (16.90%)	178 (19.84%)
TOTAL	104 (100.00%)	793 (100.00%)	897

	\(f_o\)	\(f_e\)	\((f_o - f_e)\)	\((f_o - f_e)^2\)	\(\frac{(f_o - f_e)^2}{f_e}\)
Cell #1	60	83.36	-23.36	545.69	6.55
Cell #2	659	635.64	23.36	545.69	0.86
Cell #3	44	20.64	23.36	545.69	26.44
Cell #4	134	157.36	-23.36	545.69	3.47

\[ \chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e} \]

\(\chi^2 = 37.32\)

Step 5: Make a decision

Since the obtained chi-square (\(37.32\)) is greater than the critical value (\(3.84\)), we can reject the null hypothesis

This supports the research hypothesis that there IS a real association between race and attitudes toward the death penalty IN THE POPULATION

Break!

Practice

You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table to the right.

In this sample, is there an association between diet and feelings towards Thanksgiving? How do you know?

Feelings about Thanksgiving
	Vegetarian	Carnivore	TOTAL
Dislikes	25		80
Indifferent	36		111
Likes			184
TOTAL	100	275	375

Practice

You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table to the right.

In this sample, is there an association between diet and feelings towards Thanksgiving? How do you know?

Feelings about Thanksgiving
	Vegetarian	Carnivore	TOTAL
Dislikes	25	55	80
Indifferent	36	75	111
Likes	39	145	184
TOTAL	100	275	375

Fill in the missing observed frequencies

(note that once two cells are completed (and you have the marginals) you can complete the table)
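
Filling in the table is just repeated subtraction from the marginals. The sketch below shows one way to do it; the function name `complete_table` and the dictionary layout are assumptions for this example, and the approach only works when each step leaves a row or column with exactly one missing cell.

```python
def complete_table(known, row_totals, col_totals):
    """Fill missing cells by subtracting known cells from the marginals."""
    cells = dict(known)
    rows, cols = list(row_totals), list(col_totals)
    progress = True
    while progress:
        progress = False
        for r in rows:
            missing = [c for c in cols if (r, c) not in cells]
            if len(missing) == 1:
                filled = sum(cells[(r, c)] for c in cols if c != missing[0])
                cells[(r, missing[0])] = row_totals[r] - filled
                progress = True
        for c in cols:
            missing = [r for r in rows if (r, c) not in cells]
            if len(missing) == 1:
                filled = sum(cells[(r, c)] for r in rows if r != missing[0])
                cells[(missing[0], c)] = col_totals[c] - filled
                progress = True
    return cells


row_totals = {"Dislikes": 80, "Indifferent": 111, "Likes": 184}
col_totals = {"Vegetarian": 100, "Carnivore": 275}
known = {("Dislikes", "Vegetarian"): 25, ("Indifferent", "Vegetarian"): 36}

table = complete_table(known, row_totals, col_totals)
for key in sorted(table):
    print(key, table[key])
```

That two known cells are enough here is the same fact expressed by the degrees of freedom for a 3 x 2 table: \((3-1)(2-1) = 2\) cells are free to vary once the marginals are fixed.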

Practice

You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table to the right.

In this sample, is there an association between diet and feelings towards Thanksgiving? How do you know?

Feelings about Thanksgiving
	Vegetarian	Carnivore	TOTAL
Dislikes	25 (25.00%)	55 (20.00%)	80
Indifferent	36 (36.00%)	75 (27.27%)	111
Likes	39 (39.00%)	145 (52.73%)	184
TOTAL	100	275	375

Add column percentages to better understand conditional distributions

Practice

You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table to the right.

In this sample, is there an association between diet and feelings towards Thanksgiving? How do you know?

Feelings about Thanksgiving
	Vegetarian	Carnivore	TOTAL
Dislikes	25 (25.00%)	55 (20.00%)	80
Indifferent	36 (36.00%)	75 (27.27%)	111
Likes	39 (39.00%)	145 (52.73%)	184
TOTAL	100	275	375

Differences in the conditional distributions indicate that there IS an association in the sample

Practice

You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table to the right.

In this sample, is there an association between diet and feelings towards Thanksgiving? How do you know?

\(\frac{0.53}{0.39} = 1.36\)

Feelings about Thanksgiving
	Vegetarian	Carnivore	TOTAL
Dislikes	25 (25.00%)	55 (20.00%)	80
Indifferent	36 (36.00%)	75 (27.27%)	111
Likes	39 (39.00%)	145 (52.73%)	184
TOTAL	100	275	375

Use risk ratios to quantify the association. For example, the probability of liking Thanksgiving is 36% higher for carnivores than for vegetarians

Practice

You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table to the right.

Is this association statistically significant?

\[ \chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e} \]

Feelings about Thanksgiving
	Vegetarian	Carnivore	TOTAL
Dislikes	25 (25.00%)	55 (20.00%)	80
Indifferent	36 (36.00%)	75 (27.27%)	111
Likes	39 (39.00%)	145 (52.73%)	184
TOTAL	100	275	375

\(H_a\): IN THE POPULATION, there is an association
between diet and attitudes towards Thanksgiving.

\(H_0\): There is no association between diet and
attitudes towards Thanksgiving IN THE POPULATION.

Practice

You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table to the right.

Is this association statistically significant?

\[ \chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e} \]

Feelings about Thanksgiving
	Vegetarian	Carnivore	TOTAL
Dislikes	25 \(f_e = 21.33\) (25.00%)	55 \(f_e = 58.67\) (20.00%)	80
Indifferent	36 \(f_e = 29.60\) (36.00%)	75 \(f_e = 81.40\) (27.27%)	111
Likes	39 \(f_e = 49.07\) (39.00%)	145 \(f_e = 134.93\) (52.73%)	184
TOTAL	100	275	375

Find EXPECTED FREQUENCIES:

\(f_e = \frac{\text{(row marginal)(column marginal)}}{n}\)

For example: \(f_e\) for Cell 5 \(= \frac{(184)(100)}{375} = 49.07\)

Practice

You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table to the right.

Is this association statistically significant?

\[ \chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e} \]

Feelings about Thanksgiving
	Vegetarian	Carnivore	TOTAL
Dislikes	25 \(f_e = 21.33\) (25.00%)	55 \(f_e = 58.67\) (20.00%)	80
Indifferent	36 \(f_e = 29.60\) (36.00%)	75 \(f_e = 81.40\) (27.27%)	111
Likes	39 \(f_e = 49.07\) (39.00%)	145 \(f_e = 134.93\) (52.73%)	184
TOTAL	100	275	375

Expected frequencies reflect how the table would look if there were no association between the variables (i.e., if the null hypothesis were true)

Practice

You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table to the right.

Is this association statistically significant?

\[ \chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e} \]

	\(f_o\)	\(f_e\)	\((f_o - f_e)\)	\((f_o - f_e)^2\)	\(\frac{(f_o - f_e)^2}{f_e}\)
Cell 1	25	21.33	3.67	13.44	0.63
Cell 2	55	58.67	-3.67	13.44	0.23
Cell 3	36	29.60	6.40	40.96	1.38
Cell 4	75	81.40	-6.40	40.96	0.50
Cell 5	39	49.07	-10.07	101.34	2.07
Cell 6	145	134.93	10.07	101.34	0.75

Expected frequencies reflect how the table would look if there were no association between the variables (i.e., if the null hypothesis were true)

Practice

You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table to the right.

Is this association statistically significant?

\[ \chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e} \]

	\(f_o\)	\(f_e\)	\((f_o - f_e)\)	\((f_o - f_e)^2\)	\(\frac{(f_o - f_e)^2}{f_e}\)
Cell 1	25	21.33	3.67	13.44	0.63
Cell 2	55	58.67	-3.67	13.44	0.23
Cell 3	36	29.60	6.40	40.96	1.38
Cell 4	75	81.40	-6.40	40.96	0.50
Cell 5	39	49.07	-10.07	101.34	2.07
Cell 6	145	134.93	10.07	101.34	0.75
				\(\chi^2 =\)	5.56

Need to compare our obtained \(\chi^2\) value of 5.56 with the critical value of \(\chi^2\)
By default we use an alpha level of 0.05
\(df = (R-1)(C-1) = (3-1)(2-1) = 2\)

Expected frequencies reflect how the table would look if there were no association between the variables (i.e., if the null hypothesis were true)

Practice

You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table to the right.

\[ \text{obtained } \chi^2 = 5.56 \]

\(\text{critical value of }\)

\(\chi^2 = 5.99\)

Feelings about Thanksgiving
	Vegetarian	Carnivore	TOTAL
Dislikes	25 \(f_e = 21.33\) (25.00%)	55 \(f_e = 58.67\) (20.00%)	80
Indifferent	36 \(f_e = 29.60\) (36.00%)	75 \(f_e = 81.40\) (27.27%)	111
Likes	39 \(f_e = 49.07\) (39.00%)	145 \(f_e = 134.93\) (52.73%)	184
TOTAL	100	275	375

Since the obtained value of chi-square (5.56) is LESS EXTREME than the critical value (5.99), we FAIL TO REJECT THE NULL HYPOTHESIS. The association observed is NOT statistically significant. We cannot be confident that the association exists in the population.
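
The practice result can be checked numerically. The sketch below recomputes chi-square for the Thanksgiving table and, because the table has 2 degrees of freedom, uses the closed-form tail probability \(e^{-x/2}\) to get both the critical value and a p-value. That closed form applies only when df = 2; variable names are assumptions for this example.

```python
import math

# Rows: Dislikes, Indifferent, Likes; columns: Vegetarian, Carnivore.
observed = [[25, 55], [36, 75], [39, 145]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = sum(
    (f_o - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(observed, row_totals)
    for f_o, c in zip(row, col_totals)
)

alpha = 0.05
critical = -2 * math.log(alpha)  # solves exp(-x / 2) = alpha, valid for df = 2
p_value = math.exp(-chi2 / 2)    # P(chi-square with 2 df > obtained value)

print(f"obtained = {chi2:.2f}, critical = {critical:.2f}, p = {p_value:.3f}")
print("reject H0" if chi2 > critical else "fail to reject H0")
```

Comparing the obtained value with the critical value and comparing the p-value with alpha are two views of the same decision, so they always agree.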

Practice

You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table to the right.

Feelings about Thanksgiving
	Vegetarian	Carnivore	TOTAL
Dislikes	25 \(f_e = 21.33\) (25.00%)	55 \(f_e = 58.67\) (20.00%)	80
Indifferent	36 \(f_e = 29.60\) (36.00%)	75 \(f_e = 81.40\) (27.27%)	111
Likes	39 \(f_e = 49.07\) (39.00%)	145 \(f_e = 134.93\) (52.73%)	184
TOTAL	100	275	375

What happens if the sample is doubled, with the same conditional distributions?

Practice

You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table to the right.

Since the conditional distributions are the same as before, and still differ between vegetarians and carnivores, there still appears to be an association IN THE SAMPLE.

Feelings about Thanksgiving
	Vegetarian	Carnivore	TOTAL
Dislikes	50 (25.00%)	110 (20.00%)	160
Indifferent	72 (36.00%)	150 (27.27%)	222
Likes	78 (39.00%)	290 (52.73%)	368
TOTAL	200	550	750

What happens if the sample is doubled, with the same conditional distributions?

Practice

You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table to the right.

Since the conditional distributions are the same, and do not match, there still appears to be an association IN THE SAMPLE.

Feelings about Thanksgiving
	Vegetarian	Carnivore	TOTAL
Dislikes	50 \(f_e = 42.67\) (25.00%)	110 \(f_e = 117.33\) (20.00%)	160
Indifferent	72 \(f_e = 59.2\) (36.00%)	150 \(f_e = 162.8\) (27.27%)	222
Likes	78 \(f_e = 98.13\) (39.00%)	290 \(f_e = 269.87\) (52.73%)	368
TOTAL	200	550	750

What happens if the sample is doubled, with the same conditional distributions?

Practice

You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table to the right.

\[ \text{obtained } \chi^2 = 11.13 \]

\[ \text{critical value of } \chi^2 = 5.99 \]

	\(f_0\)	\(f_e\)	\((f_0 - f_e)\)	\((f_0 - f_e)^2\)	\(\frac{(f_0 - f_e)^2}{f_e}\)
Cell 1	50	42.67	7.33	53.73	1.26
Cell 2	110	117.33	-7.33	53.73	0.46
Cell 3	72	59.2	12.8	163.84	2.77
Cell 4	150	162.8	-12.8	163.84	1.01
Cell 5	78	98.13	-20.13	405.22	4.13
Cell 6	290	269.87	20.13	405.22	1.5
				\(\chi^2 =\)	11.13

The obtained value of chi-square doubles, from 5.56 to 11.13. Now, the obtained value of chi-square \(\gt\) critical value of chi-square. The association observed in the sample IS statistically significant. We REJECT THE NULL HYPOTHESIS and find SUPPORT FOR THE ALTERNATIVE HYPOTHESIS that there is an association in the population.
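
A minimal sketch of the same comparison in Python, assuming the original 375-person table and a doubled copy of it. The helper `chi_square_stat` is a hypothetical name used only for this illustration, and 5.99 is the critical value quoted on the slide.

```python
def chi_square_stat(table):
    """Chi-square statistic for a two-way table of counts (illustrative helper)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, f_obs in enumerate(row):
            f_exp = row_totals[i] * col_totals[j] / n  # expected frequency
            total += (f_obs - f_exp) ** 2 / f_exp
    return total


# Rows: Dislikes, Indifferent, Likes. Columns: Vegetarian, Carnivore.
base = [[25, 55], [36, 75], [39, 145]]
# Double every cell: same conditional distributions, twice the sample size.
doubled = [[2 * count for count in row] for row in base]

chi2_base = chi_square_stat(base)
chi2_doubled = chi_square_stat(doubled)
ratio = chi2_doubled / chi2_base
critical_value = 5.99  # taken from the slide

print(round(chi2_base, 2), chi2_base > critical_value)
print(round(chi2_doubled, 2), chi2_doubled > critical_value)
```

The percentages in each column are identical in both tables; only the sample size differs, and that alone moves the result across the critical value.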

What happens if the sample is doubled, with the same conditional distributions?

Chi-square and strength of an association

Size of chi-square (obtained) statistic is directly proportional to sample size
- Double cell counts = double chi-square, regardless of strength of association in the sample
- Cut cell counts by \(\frac{1}{4}\) = \(\frac{1}{4}\) reduction in chi-square, regardless of strength of association in the sample
Can have a large chi-square with a weak association if \(n\) is large
Hard to find statistically significant associations with small \(n\)
- Note the key assumption of no expected frequencies below 5
Avoid drawing conclusions about strength of an association based on size of chi-square
- Large chi-square = stronger confidence in the inference, not a stronger association
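
A short derivation of why this happens, sketched under the assumption that every observed count \(f_0\) is multiplied by the same factor \(k\) (so the conditional distributions are unchanged). The marginals and \(n\) are also multiplied by \(k\), so each expected frequency becomes \(k f_e\):

\[ \chi^2_{\text{new}} = \sum \frac{(k f_0 - k f_e)^2}{k f_e} = \sum \frac{k^2 (f_0 - f_e)^2}{k f_e} = k \sum \frac{(f_0 - f_e)^2}{f_e} = k \, \chi^2 \]

With \(k = 2\), the Thanksgiving example goes from 5.56 to about 11.13 (the small difference from exactly double is rounding).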
