Two-way tables and chi-square

SOC 221 • Lecture 8

Victoria Sass

Wednesday, July 30, 2025

Bivariate tables
and chi-square

Overview

  • Have used one sample to draw inferences about one population

  • Have drawn inferences about the difference between two populations

    • Beginning of efforts to test statistical significance of associations between variables
      • Example: testing for gender differences in emotional intelligence (EI) = testing for an association between gender and EI
  • New goal: test statistical significance for associations in two-way tables

    • Two-way tables: tables that simultaneously cross-classify values of one variable by values of another
      • Also called bivariate tables, contingency tables, or crosstabs
    • Useful for assessing the bivariate associations between categorical variables


Association
Two variables are “associated” with each other when variation in one variable corresponds with variation in the other variable.

Synonym: relationship


If the variables X and Y are associated:

  • particular values of Y tend to coincide with particular values of X
  • the values of Y are different across different values of X
  • the average value of Y depends on the value of X

Examples:

  • Education and income are associated because people with higher levels of education tend to have higher levels of income.
  • Political attitudes are associated with age in that older people tend to be more politically conservative than younger people.

Independent Variable
An independent variable is assumed to influence a dependent variable. The assumed “cause” in an association


Examples:

  • If we assume that education affects income, education is the independent variable and income is the dependent variable.
  • If we believe that political attitudes change with age, then age is the independent variable and political attitudes are the dependent variable.

Dependent Variable
A dependent variable is affected by one or more other variables. The assumed “effect” in an association



Generally avoid using language of “cause” and “effect” since establishing a case for causality is always difficult and rarely certain.

Associations in
bivariate tables

Goal: Understand the association between race and attitudes about the death penalty


Say you were presented with the following two tables. Can you tell whether there is an association between race and support for the death penalty?

Frequency distribution of attitudes towards the death penalty
Black and White respondents in GSS 2000

                         f      Percent
Support Death Penalty    719    80.16
Oppose Death Penalty     178    19.84
TOTAL                    897    100.00

Frequency distribution of race
Black and White respondents in GSS 2000

         f      Percent
Black    104    11.59
White    793    88.41
TOTAL    897    100.00


Two-way (contingency/bivariate) tables

  • Purpose:
    • Simultaneously cross-classify the distribution of two variables
    • Common first step in detecting and summarizing an association between variables

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000

                         Black    White    TOTAL
Support Death Penalty    60       659      719
Oppose Death Penalty     44       134      178
TOTAL                    104      793      897

Title: Values of the DV by Values of the IV


Column variable (usually the independent variable)


Row variable (usually the dependent variable)


Observed frequencies (Each cell of the table includes the count of cases with the specific combination of attributes on the two variables)

The “Total” row and column are called marginals

Marginals show the basic distribution of the two variables (the same information as in the frequency tables)

Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000

                         Black            White            TOTAL
Support Death Penalty    60 (57.69%)      659 (83.10%)     719 (80.16%)
Oppose Death Penalty     44 (42.31%)      134 (16.90%)     178 (19.84%)
TOTAL                    104 (100.00%)    793 (100.00%)    897

Percentages can be used to compare the distribution of the dependent variable (DV) across values of the independent variable (IV)

57.69% = (60/104) * (100)

Calculate percentages of the DV within values of IV (here use column %s)
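
The column-percentage calculation can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function name `column_percentages` are assumptions made for the example, while the counts come from the GSS 2000 table.

```python
# Observed counts from the two-way table (rows = DV, columns = IV).
# The dictionary layout and labels are illustrative choices.
observed = {
    "Support": {"Black": 60, "White": 659},
    "Oppose": {"Black": 44, "White": 134},
}


def column_percentages(table):
    """Return the percentage of each row category within each column."""
    columns = list(next(iter(table.values())))
    # Column marginals: total number of cases for each value of the IV.
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        label: {c: 100 * row[c] / col_totals[c] for c in columns}
        for label, row in table.items()
    }


col_pct = column_percentages(observed)
for label, row in col_pct.items():
    print(label, {c: round(p, 2) for c, p in row.items()})
```

Each column of the result is one conditional distribution of the DV, so the two columns can be compared directly.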

Percentages of the DV within values of the IV = conditional distributions

Can detect the ASSOCIATION between variables by comparing conditional distributions…

Key characteristics of an association (e.g. between X and Y)

STRENGTH
How strong is the tendency for certain values of Y to go with particular values of X?

DIRECTION
Is the association positive or negative?


STATISTICAL SIGNIFICANCE
How certain can we be that the association exists in the population?

Key characteristics of an association (e.g. between X and Y)

STRENGTH
How strong is the tendency for certain values of Y to go with particular values of X?

  • Maximum (perfect) association:
    • All cases with a particular value of X have the same value on Y (no conditional variation)
    • Knowing the value of X allows for perfect prediction of Y
  • Minimum (no) association:
    • High conditional variation (value of Y varies even among those cases with the same value of X)
    • Knowing the value of X does not improve ability to predict Y

Conditional distributions are completely dissimilar (maximum difference in column %s across values of the IV)

Two-way table of vote on Affordable Care Act (ACA) by political party
US Senators in 2009

                     Democrat        Republican      TOTAL
Voted for ACA        60 (100.00%)    0 (0.00%)       60 (60.61%)
Voted against ACA    0 (0.00%)       39 (100.00%)    39 (39.39%)
TOTAL                60 (100.00%)    39 (100.00%)    99

No conditional variation (all cases with a particular IV value have the same DV value)

Conditional distributions are exactly the same (no difference in column %s across values of the IV; all match the marginal %s)

Two-way table of transportation to work by gender
Alltech Corp. workers 2014

                         Female           Male             TOTAL
Drive                    100 (71.43%)     150 (71.43%)     250 (71.43%)
Public Transportation    30 (21.43%)      45 (21.43%)      75 (21.43%)
Walk/bike                10 (7.14%)       15 (7.14%)       25 (7.14%)
TOTAL                    140 (100.00%)    210 (100.00%)    350 (100.00%)

High conditional variation (lots of different values of DV for cases with same IV value)

Using conditional distributions to detect an association


Two-way table: Number of Crimes Committed by Education
Sample of parolees from Florida Prisons

             Low Education    High Education    TOTAL
0 Crimes     80 (50.00%)      130 (86.67%)      210 (67.74%)
1 Crime      24 (15.00%)      15 (10.00%)       39 (12.58%)
2+ Crimes    56 (35.00%)      5 (3.33%)         61 (19.68%)
TOTAL        160 (100.00%)    150 (100.00%)     310 (100.00%)

Is there any association?

YES: Conditional distributions are different

Goal: Understand the association between race and attitudes about the death penalty


Is there an association between race and support for the death penalty? How do you know?


Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000

                         Black            White            TOTAL
Support Death Penalty    60 (57.69%)      659 (83.10%)     719 (80.16%)
Oppose Death Penalty     44 (42.31%)      134 (16.90%)     178 (19.84%)
TOTAL                    104 (100.00%)    793 (100.00%)    897

  • Conditional distributions are different from each other and from the marginal percentages
  • If attitudes were completely unassociated with race, we’d expect \(80.16\%\) of both races to support the death penalty

How STRONG is the association?



  • Somewhere between the extremes of no association and a perfect association
    • We can quantify the strength of the association using risk ratios

RISK RATIO (a.k.a. relative risk)
The ratio of the probability of some outcome among one group to the probability of the outcome among a different group.

Probability of supporting the death penalty

For Black respondents:
\(P(\text{SUPPORT}) = 60/104 = 0.5769\)

For White respondents:
\(P(\text{SUPPORT}) = 659/793 = 0.8310\)

RISK RATIO \(= 0.8310 / 0.5769 = 1.44\)

The probability of supporting the death penalty is 1.44 times as high for White respondents as for Black respondents
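
As a rough sketch, the same risk ratio can be computed directly from the cell counts and column totals. The helper name `risk_ratio` and its argument order are assumptions made for this illustration.

```python
def risk_ratio(events_a, total_a, events_b, total_b):
    """P(outcome | group A) divided by P(outcome | group B)."""
    return (events_a / total_a) / (events_b / total_b)


# Support for the death penalty: White respondents relative to Black respondents.
rr_support = risk_ratio(659, 793, 60, 104)
print(round(rr_support, 2))

# The comparison can also be run on the other outcome:
# opposition among Black respondents relative to White respondents.
rr_oppose = risk_ratio(44, 104, 134, 793)
print(round(rr_oppose, 2))
```

A risk ratio of 1 would mean the outcome is equally likely in both groups; the further the ratio is from 1, the stronger the association.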

DIRECTION
Is the association positive or negative?

  • Positive association: High values on Y tend to coincide with high values on X
  • Negative association: High values on Y tend to coincide with low values on X

Using conditional distributions to detect an association


Two-way table: Number of Crimes Committed by Education
Sample of parolees from Florida Prisons

             Low Education    High Education    TOTAL
0 Crimes     80 (50.00%)      130 (86.67%)      210 (67.74%)
1 Crime      24 (15.00%)      15 (10.00%)       39 (12.58%)
2+ Crimes    56 (35.00%)      5 (3.33%)         61 (19.68%)
TOTAL        160 (100.00%)    150 (100.00%)     310 (100.00%)

What is the direction of this association?

Negative: Higher education is associated with a lower number of crimes

Goal: Understand the association between race and attitudes about the death penalty


What is the direction of this association?


Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000

                         Black            White            TOTAL
Support Death Penalty    60 (57.69%)      659 (83.10%)     719 (80.16%)
Oppose Death Penalty     44 (42.31%)      134 (16.90%)     178 (19.84%)
TOTAL                    104 (100.00%)    793 (100.00%)    897


Not relevant since these are nominal variables (no higher or lower values)

STATISTICAL SIGNIFICANCE
How certain can we be that the association exists in the population?

  • Relevant when drawing inferences from an association observed in a sample to a possible association in the population
  • Determined by a statistical hypothesis test

Goal: Understand the association between race and attitudes about the death penalty


There is an association in this sample


Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000

                         Black            White            TOTAL
Support Death Penalty    60 (57.69%)      659 (83.10%)     719 (80.16%)
Oppose Death Penalty     44 (42.31%)      134 (16.90%)     178 (19.84%)
TOTAL                    104 (100.00%)    793 (100.00%)    897


Key question: Is this association in the sample strong enough to convince us that there is a real association in the POPULATION from which the sample was drawn?

Chi-square test

  • Used to test statistical significance of associations in a two-way table (so, between categorical variables)

  • Intended to test whether a pattern or association observed in a set of sample data:

    1. represents a real association in the population from which the sample was drawn
      OR
    2. reflects random sampling error when, in reality, there is no real association in the population
  • Based on a comparison of our observed frequencies to expected frequencies.

    • Observed frequencies = the cell counts actually observed in the sample data
    • Expected frequencies = the cell counts we would expect if there were no association between the variables

Goal: Understand the association between race and attitudes about the death penalty


What would the table look like if there were no association between the variables?


Two-way table of attitudes towards the death penalty by race
Black and White respondents in GSS 2000

                         Black            White            TOTAL
Support Death Penalty    60 (57.69%)      659 (83.10%)     719 (80.16%)
Oppose Death Penalty     44 (42.31%)      134 (16.90%)     178 (19.84%)
TOTAL                    104 (100.00%)    793 (100.00%)    897

Chi-square test is based on comparison of the counts/frequencies we actually observe in the sample to what the table would look like if there were no association between the variables

Conditional distributions of the DV would match across the values of the IV (same as marginals)

EXPECTED FREQUENCIES for attitudes towards the death penalty by race
Black and White respondents in GSS 2000

                         Black             White              TOTAL
Support Death Penalty    83.36 (80.16%)    635.64 (80.16%)    719 (80.16%)
Oppose Death Penalty     20.64 (19.84%)    157.36 (19.84%)    178 (19.84%)
TOTAL                    104 (100.00%)     793 (100.00%)      897


Calculation of chi-square

  • Obtained value of chi-square:

\[ \chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e} \] where
\(f_o =\) observed frequency in a given cell
\(f_e =\) frequency in the cell expected under the assumption of the null hypothesis (no association in the population)

  • Shortcut to calculate expected cell frequencies:

\[ f_e = \frac{\text{(row marginal)(column marginal)}}{n} \]

Note: Chi-square will take a value of 0 if there is no association in the sample.
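
The two formulas above can be sketched in plain Python as follows. The function names `expected_frequencies` and `chi_square` are assumptions made for this illustration; the counts are the observed frequencies for the death penalty example (rows: support, oppose; columns: Black, White).

```python
def expected_frequencies(observed):
    """f_e = (row marginal)(column marginal) / n for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (f_o - f_e)^2 / f_e over all cells of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (f_o - f_e) ** 2 / f_e
        for obs_row, exp_row in zip(observed, expected)
        for f_o, f_e in zip(obs_row, exp_row)
    )


observed = [[60, 659], [44, 134]]
expected = expected_frequencies(observed)
chi2 = chi_square(observed)

for row in expected:
    print([round(f_e, 2) for f_e in row])
print(round(chi2, 2))
```

Because the code keeps full precision for the expected frequencies, its result can differ in the second decimal place from a hand calculation that rounds each step.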

Goal: Understand the association between race and attitudes about the death penalty


OBSERVED and EXPECTED Frequencies

                         Black                                     White                                       TOTAL
Support Death Penalty    \(f_o = 60\), \(f_e = 83.36\) (57.69%)    \(f_o = 659\), \(f_e = 635.64\) (83.10%)    719 (80.16%)
Oppose Death Penalty     \(f_o = 44\), \(f_e = 20.64\) (42.31%)    \(f_o = 134\), \(f_e = 157.36\) (16.90%)    178 (19.84%)
TOTAL                    104 (100.00%)                             793 (100.00%)                               897

           \(f_o\)    \(f_e\)    \((f_o - f_e)\)    \((f_o - f_e)^2\)    \(\frac{(f_o - f_e)^2}{f_e}\)
Cell #1    60         83.36      -23.36             545.69               6.55
Cell #2    659        635.64     23.36              545.69               0.86
Cell #3    44         20.64      23.36              545.69               26.44
Cell #4    134        157.36     -23.36             545.69               3.47

\[ \chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e} = 37.31 \]

Chi-square score summarizes the difference between what we observe in the sample and what we would expect to observe if there were no association between the variables.

Question: Is that difference big enough to convince us that it did not just happen by chance (sampling error)?

Need a hypothesis test

Hypothesis test for two-way Chi-square

  1. Check assumptions
    • Random sample, scores are independent (i.e., each subject is allowed only one preference); no expected cell frequencies below 5.
  2. State the hypotheses
    • Null: No association in the population
    • Alternative: A real association in the population.
  3. Identify alpha and the critical value of chi-square
  4. Calculate the test statistic
    • chi-square obtained
  5. Make a decision
    • Reject or fail to reject null hypothesis
    • Make a statement about the implications for the population

Step 1: Check assumptions


OBSERVED and EXPECTED Frequencies

                         Black                                     White                                       TOTAL
Support Death Penalty    \(f_o = 60\), \(f_e = 83.36\) (57.69%)    \(f_o = 659\), \(f_e = 635.64\) (83.10%)    719 (80.16%)
Oppose Death Penalty     \(f_o = 44\), \(f_e = 20.64\) (42.31%)    \(f_o = 134\), \(f_e = 157.36\) (16.90%)    178 (19.84%)
TOTAL                    104 (100.00%)                             793 (100.00%)                               897


  • Random sample ✔️
  • Scores are independent ✔️
  • No expected cell frequencies below 5 ✔️

Step 2: State the hypotheses



\(H_0\): No association between race and attitudes toward death penalty in the population


\(H_a\): A real association between race and attitudes toward death penalty in the population

Step 3: Identify alpha and the critical value of chi-square


The default alpha level is \(0.05\) (\(0.01\) and \(0.001\) are tougher alternatives)


\(df = (R-1)(C-1) = (2-1)(2-1) = 1\)


Now we can go to the chi-square distribution table to see what critical value is associated with an alpha level of 0.05 and 1 degree of freedom



Our critical value is \(3.84\)
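
For readers without a printed table, the critical values used in this lecture can be reproduced with the Python standard library. This is a limited sketch, not a general routine: the function name `chi_square_critical` is an assumption, and it handles only \(df = 1\) (where a chi-square variable is a squared standard normal) and \(df = 2\) (where the upper-tail probability is \(e^{-x/2}\)). If SciPy is installed, `scipy.stats.chi2.ppf(1 - alpha, df)` covers any degrees of freedom.

```python
import math
from statistics import NormalDist


def chi_square_critical(alpha, df):
    """Critical value of chi-square for df = 1 or df = 2 only."""
    if df == 1:
        # Chi-square with 1 df is the square of a standard normal score,
        # so the cutoff is the squared two-tailed z critical value.
        z = NormalDist().inv_cdf(1 - alpha / 2)
        return z ** 2
    if df == 2:
        # With 2 df, P(chi-square > x) = exp(-x / 2); solve for x.
        return -2 * math.log(alpha)
    raise ValueError("this sketch only covers df = 1 and df = 2")


print(round(chi_square_critical(0.05, 1), 2))
print(round(chi_square_critical(0.05, 2), 2))
```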

Step 4: Calculate the test statistic


OBSERVED and EXPECTED Frequencies

                         Black                                     White                                       TOTAL
Support Death Penalty    \(f_o = 60\), \(f_e = 83.36\) (57.69%)    \(f_o = 659\), \(f_e = 635.64\) (83.10%)    719 (80.16%)
Oppose Death Penalty     \(f_o = 44\), \(f_e = 20.64\) (42.31%)    \(f_o = 134\), \(f_e = 157.36\) (16.90%)    178 (19.84%)
TOTAL                    104 (100.00%)                             793 (100.00%)                               897

           \(f_o\)    \(f_e\)    \((f_o - f_e)\)    \((f_o - f_e)^2\)    \(\frac{(f_o - f_e)^2}{f_e}\)
Cell #1    60         83.36      -23.36             545.69               6.55
Cell #2    659        635.64     23.36              545.69               0.86
Cell #3    44         20.64      23.36              545.69               26.44
Cell #4    134        157.36     -23.36             545.69               3.47

\[ \chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e} = 37.31 \]

Step 5: Make a decision



Since the obtained chi-square (\(37.31\)) is greater than the critical value (\(3.84\)), I can reject the null hypothesis


This supports the research hypothesis that there IS a real association between race and attitudes toward the death penalty IN THE POPULATION

Break!

Practice

You collect data from a random sample of 375 individuals to look at whether feelings toward Thanksgiving differ by dietary preferences. The partial data are in the table below.


In this sample, is there an association between diet and feelings towards Thanksgiving? How do you know?

Feelings about Thanksgiving

               Vegetarian    Carnivore    TOTAL
Dislikes       25            ?            80
Indifferent    36            ?            111
Likes          ?             ?            184
TOTAL          100           275          375

Feelings about Thanksgiving

               Vegetarian    Carnivore    TOTAL
Dislikes       25            55           80
Indifferent    36            75           111
Likes          39            145          184
TOTAL          100           275          375

Fill in the missing observed frequencies

(note that once two cells are completed (and you have the marginals) you can complete the table)
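
The fill-in step is simple subtraction from the marginals, which the following sketch makes explicit. The variable names are illustrative assumptions; the two known cells and the marginals are taken from the partial table.

```python
# Marginals and the two known cells from the partial table.
row_totals = {"Dislikes": 80, "Indifferent": 111, "Likes": 184}
col_totals = {"Vegetarian": 100, "Carnivore": 275}
known_vegetarian = {"Dislikes": 25, "Indifferent": 36}

# The last vegetarian cell is the column total minus the known cells.
vegetarian = dict(known_vegetarian)
vegetarian["Likes"] = col_totals["Vegetarian"] - sum(known_vegetarian.values())

# Each carnivore cell is the row total minus the vegetarian cell.
carnivore = {label: row_totals[label] - vegetarian[label] for label in row_totals}

print(vegetarian)
print(carnivore)
```

Only two cells are free to vary once the marginals are fixed, which is the same count given by \((R-1)(C-1)\) for a table with 3 rows and 2 columns.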

Feelings about Thanksgiving

               Vegetarian     Carnivore       TOTAL
Dislikes       25 (25.00%)    55 (20.00%)     80
Indifferent    36 (36.00%)    75 (27.27%)     111
Likes          39 (39.00%)    145 (52.73%)    184
TOTAL          100            275             375

Add column percentages to better understand conditional distributions

Differences in the conditional distributions indicate that there IS an association in the sample

\(\frac{0.53}{0.39} = 1.36\)

Use risk ratios to quantify the association. For example, the probability of liking Thanksgiving is 36% higher for carnivores than for vegetarians

Is this association statistically significant?

\[ \chi^2 = \Sigma\frac{(f_o - f_e)^2}{f_e} \]

\(H_0\): There is no association between diet and attitudes towards Thanksgiving IN THE POPULATION.

\(H_a\): IN THE POPULATION, there is an association between diet and attitudes towards Thanksgiving.

Feelings about Thanksgiving

               Vegetarian                       Carnivore                          TOTAL
Dislikes       25, \(f_e = 21.33\) (25.00%)     55, \(f_e = 58.67\) (20.00%)       80
Indifferent    36, \(f_e = 29.60\) (36.00%)     75, \(f_e = 81.40\) (27.27%)       111
Likes          39, \(f_e = 49.07\) (39.00%)     145, \(f_e = 134.93\) (52.73%)     184
TOTAL          100                              275                                375

Find EXPECTED FREQUENCIES:

\(f_e = \frac{\text{(row marginal)(column marginal)}}{n}\)

For example: \(f_e\) for Cell 5 \(= \frac{(184)(100)}{375} = 49.07\)

Expected frequencies reflect how the table would look if there were no association between the variables (i.e., if the null hypothesis were true)

          \(f_o\)    \(f_e\)    \((f_o - f_e)\)    \((f_o - f_e)^2\)    \(\frac{(f_o - f_e)^2}{f_e}\)
Cell 1    25         21.33      3.67               13.44                0.63
Cell 2    55         58.67      -3.67              13.44                0.23
Cell 3    36         29.60      6.40               40.96                1.38
Cell 4    75         81.40      -6.40              40.96                0.50
Cell 5    39         49.07      -10.07             101.34               2.07
Cell 6    145        134.93     10.07              101.34               0.75

\(\chi^2 = 5.56\)

  • Need to compare our obtained \(\chi^2\) value of 5.56 with the critical value of \(\chi^2\)
  • By default we use an alpha level of 0.05
  • \(df = (R-1)(C-1) = (3-1)(2-1) = 2\)


\[ \text{obtained } \chi^2 = 5.56 \]

\[ \text{critical value of } \chi^2 = 5.99 \]

Since obtained value of chi-square is LESS EXTREME than the critical value we FAIL TO REJECT THE NULL HYPOTHESIS. The association observed is NOT statistically significant. Cannot be confident that the association exists in the population.
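
The whole procedure for this practice problem can be sketched as one small function. The name `chi_square_test` and its return format are assumptions made for this illustration; the critical value of 5.99 is the one read from the chi-square table for alpha = 0.05 and 2 degrees of freedom.

```python
def chi_square_test(observed, critical_value):
    """Return (chi-square obtained, df, reject_null) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Step 1: check the assumption of no expected frequencies below 5.
    if min(min(row) for row in expected) < 5:
        raise ValueError("an expected cell frequency is below 5")

    # Step 4: chi-square obtained.
    chi2 = sum(
        (f_o - f_e) ** 2 / f_e
        for obs_row, exp_row in zip(observed, expected)
        for f_o, f_e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)

    # Step 5: reject the null only if the obtained value exceeds the critical value.
    return chi2, df, chi2 > critical_value


# Rows: dislikes, indifferent, likes; columns: vegetarian, carnivore.
thanksgiving = [[25, 55], [36, 75], [39, 145]]
chi2, df, reject = chi_square_test(thanksgiving, critical_value=5.99)
print(round(chi2, 2), df, reject)
```

Random sampling and independence of observations still have to be judged from the study design; the code can only check the expected-frequency condition.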

What happens if the sample is doubled, with the same conditional distributions?

Since the conditional distributions are the same as before, and still differ across diet groups, there still appears to be an association IN THE SAMPLE.

Feelings about Thanksgiving

               Vegetarian     Carnivore       TOTAL
Dislikes       50 (25.00%)    110 (20.00%)    160
Indifferent    72 (36.00%)    150 (27.27%)    222
Likes          78 (39.00%)    290 (52.73%)    368
TOTAL          200            550             750


Feelings about Thanksgiving

               Vegetarian                       Carnivore                          TOTAL
Dislikes       50, \(f_e = 42.67\) (25.00%)     110, \(f_e = 117.33\) (20.00%)     160
Indifferent    72, \(f_e = 59.20\) (36.00%)     150, \(f_e = 162.80\) (27.27%)     222
Likes          78, \(f_e = 98.13\) (39.00%)     290, \(f_e = 269.87\) (52.73%)     368
TOTAL          200                              550                                750


\[ \text{obtained } \chi^2 = 11.13 \]

\[ \text{critical value of } \chi^2 = 5.99 \]

          \(f_o\)    \(f_e\)    \((f_o - f_e)\)    \((f_o - f_e)^2\)    \(\frac{(f_o - f_e)^2}{f_e}\)
Cell 1    50         42.67      7.33               53.73                1.26
Cell 2    110        117.33     -7.33              53.73                0.46
Cell 3    72         59.20      12.80              163.84               2.77
Cell 4    150        162.80     -12.80             163.84               1.01
Cell 5    78         98.13      -20.13             405.22               4.13
Cell 6    290        269.87     20.13              405.22               1.50

\(\chi^2 = 11.13\)

The obtained value of the chi-square goes way up. Now, the obtained value of chi-square \(\gt\) critical value of chi-square. The association observed in the sample IS statistically significant. We REJECT THE NULL HYPOTHESIS and find SUPPORT FOR THE ALTERNATIVE HYPOTHESIS that there is an association in the population.


Chi-square and strength of an association

  • For a fixed pattern of cell percentages, the size of the chi-square (obtained) statistic is directly proportional to sample size
    • Double cell counts = double chi-square, regardless of strength of association in the sample
    • Cut cell counts by \(\frac{1}{4}\) = \(\frac{1}{4}\) reduction in chi-square, regardless of strength of association in the sample
  • Can have a large chi-square with a weak association if \(n\) is large
  • Hard to find statistically significant associations with small \(n\)
    • Note the key assumption of no expected frequencies below 5
  • Avoid drawing conclusions about strength of an association based on size of chi-square
    • Large chi-square = stronger confidence in inference, not strength
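
A short sketch of the points above, using the Thanksgiving counts: doubling every cell doubles chi-square, while a strength measure such as the risk ratio does not move. The helper names `chi_square` and `risk_ratio_likes` are assumptions made for this illustration.

```python
def chi_square(observed):
    """Chi-square obtained for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n
            total += (f_o - f_e) ** 2 / f_e
    return total


def risk_ratio_likes(table):
    """P(likes | carnivore) / P(likes | vegetarian); the last row is 'Likes'."""
    veg_total = sum(row[0] for row in table)
    carn_total = sum(row[1] for row in table)
    return (table[-1][1] / carn_total) / (table[-1][0] / veg_total)


# Rows: dislikes, indifferent, likes; columns: vegetarian, carnivore.
original = [[25, 55], [36, 75], [39, 145]]
doubled = [[2 * count for count in row] for row in original]

print(round(chi_square(original), 2), round(chi_square(doubled), 2))
print(risk_ratio_likes(original) == risk_ratio_likes(doubled))
```

The statistic crosses the critical value only because \(n\) grew; the pattern in the table, and therefore the strength of the association, is unchanged.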

Homework