Multivariate regression

SOC 221 • Lecture 10

Victoria Sass

Monday, July 29, 2024

Multivariate regression

Where are we?

  • Bivariate correlation and regression are good for detecting and describing basic associations between two variables (e.g., \(X\) and \(Y\))
  • Often want to bring in additional variables
    • Build case for causality
      • Control for alternative explanations for the main association
    • Develop better predictions
      • Predict Y as a function of multiple variables, not just X
    • Explain more
      • Use multiple variables to explain more of the variation in Y
  • Multivariate regression: A key tool for bringing additional variables into consideration

Bivariate regression


\(\widehat{y} = a + bx\)


\(\widehat{y}\) is the predicted value of \(Y\)

\(x\) is the value of \(X\) for the individual

\(a = Y\)-intercept (i.e., constant);
predicted value of \(Y\) when \(X = 0\)

\(b =\) slope of the line;
how much \(Y\) changes with each one-unit difference in \(X\)

Multivariate regression


\(\widehat{y} = a + b_1x + b_2z\)


\(\widehat{y}\) is the predicted value of \(Y\)

\(x\) is the value of \(X\) for the individual

\(z\) is the value of \(Z\) for the individual

\(a = Y\)-intercept (i.e., constant);
predicted value of \(Y\) when \(X = 0\) and \(Z = 0\)

\(b_1 =\) the effect of \(X\) on \(Y\) after controlling for the effects of \(Z\)

\(b_2 =\) the effect of \(Z\) on \(Y\) after controlling for the effects of \(X\)

When we control for a variable, we are holding its effect constant (i.e., removing its influence)

Note that we are ADDING variables to the model in order to REMOVE their influence from the effects of \(X\).

When they are not in the model, we are ignoring their influence

Multivariate regression

\(x\) is the value of \(X\) for the individual

\(b_1 =\) the effect of \(X\) on \(Y\) after controlling for the effects of \(Z_1\) through \(Z_k\)

\(b_2 =\) the effect of \(Z_1\) on \(Y\) after controlling for the effects of \(X\) and the other \(Z\)s

\(b_k =\) the effect of \(Z_k\) on \(Y\) after controlling for the effects of \(X\) and the other \(Z\)s


\(\color{#a68100}{\widehat{y}} = \color{#754cbc}{a} + \color{#e93cac}{b_1}\color{#1b8883}{x} + \color{#690c48}{b_2}\color{#6e8e13}{z_1} + \text{. . . } + \color{#bf5700}{b_k}\color{#808080}{z_k}\)


\(\widehat{y}\) is the predicted value of \(Y\)

\(a = Y\)-intercept (i.e., constant);
predicted value of \(Y\) when \(X, Z_1, \ldots, Z_k\) all equal \(0\)

\(z_1\) is the value of \(Z_1\) for the individual

\(z_k\) is the value of \(Z_k\) for the individual
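The general equation above is just the constant plus a running sum of slope-times-value terms. A minimal sketch in Python (the function name and the example numbers are hypothetical, not from the lecture):

```python
def predict(a, slopes, values):
    """Compute y-hat = a + b1*x1 + b2*x2 + ... + bk*xk.

    a      -- the Y-intercept (constant)
    slopes -- list of slope coefficients [b1, ..., bk]
    values -- one individual's values on the predictors [x, z1, ..., zk]
    """
    return a + sum(b * v for b, v in zip(slopes, values))


# Hypothetical two-predictor model: y-hat = 5 + 2x + 3z, evaluated at x=10, z=4
y_hat = predict(5, [2, 3], [10, 4])  # 5 + 20 + 12 = 37
```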

Why multivariate regression?


  • Purpose 1: building the case for causality
    • Control for alternate explanations for the main association
  • Purpose 2: developing better predictions
    • Predict \(Y\) as a function of multiple variables, not just \(X\)
  • Purpose 3: explain more
    • Use multiple variables to explain more of the variation in \(Y\)

Purpose 1: building the case for causality


Causality: A situation in which one condition, event, or process contributes to the production of another condition, event, process, or state.

Bivariate table (all months): Violent crime by ice cream sales

                        Low ice cream sales   High ice cream sales   Total
  Low violent crime     20 (83.3%)            4 (33.3%)              24 (66.7%)
  High violent crime    4 (16.7%)             8 (66.7%)              12 (33.3%)
  Total                 24                    12                     36
Is there an association between
ice cream sales and crime?

Yes! Differences in conditional distributions (column \(\%\)s) show a positive association between ice cream sales and crime.

Does this mean that ice cream sales cause crime to increase?

3 Criteria for Causality (what we need to show to claim that \(X\) causes \(Y\)):

  1. There is a statistical association between \(X\) and \(Y\)
  2. \(X\) happens before \(Y\) (correct time ordering)
  3. No other variable accounts for the statistical link between \(X\) and \(Y\)

The third criterion motivates us to control for other variables
(i.e., account for their effects; remove their influence)

Or is the association spurious?

Spurious relationship: A situation in which two variables are statistically associated but not causally related, owing to the influence of a third, unseen factor that affects them both.

Certain values on one variable (\(Y\)) tend to correspond with certain values on the other (\(X\)) because both are affected by a third outside variable.


Note: Weather might create a link between ice cream sales and crime. Failing to control for weather might make the association look causal even though that is false.

To control for weather, we look at the association between ice cream sales and crime among cases (months) with similar weather.

Any association between ice cream sales and crime among cases (months) with similar weather cannot be attributed to the effects of differences in weather.

Purpose 1: building the case for causality

Bivariate table (WARM months): Violent crime by ice cream sales

                        Low ice cream sales   High ice cream sales   Total
  Low violent crime     0 (0.0%)              0 (0.0%)               0 (0.0%)
  High violent crime    3 (100.0%)            8 (100.0%)             11 (100.0%)
  Total                 3                     8                      11

Is there an association between ice cream sales and crime?

There are no differences in the conditional distributions (column %s), indicating that there is no association between ice cream sales and crime among warm months.

Purpose 1: building the case for causality

Bivariate table (COOL months): Violent crime by ice cream sales

                        Low ice cream sales   High ice cream sales   Total
  Low violent crime     20 (95.2%)            4 (100.0%)             24 (96.0%)
  High violent crime    1 (4.8%)              0 (0.0%)               1 (4.0%)
  Total                 21                    4                      25

Is there an association between ice cream sales and crime?

Only small differences in the conditional distributions (column %s), indicating that there is little association between ice cream sales and crime among cool months.
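The control-by-stratification logic can be verified directly from the cell counts. A minimal sketch in Python (the month-level records below are reconstructed from the lecture's tables; the function name is my own):

```python
# Each month: (weather, ice_cream_sales, violent_crime), expanded from the
# cell counts in the lecture's tables (36 months total).
months = (
    [("warm", "high", "high")] * 8 + [("warm", "low", "high")] * 3 +
    [("cool", "low", "low")] * 20 + [("cool", "low", "high")] * 1 +
    [("cool", "high", "low")] * 4
)

def pct_high_crime(rows, sales):
    """% of months with high violent crime, among months at the given sales level."""
    group = [r for r in rows if r[1] == sales]
    return 100 * sum(r[2] == "high" for r in group) / len(group)

# Ignoring weather: a large gap in conditional %s -> apparent association
overall_low = pct_high_crime(months, "low")    # 4/24  -> 16.7%
overall_high = pct_high_crime(months, "high")  # 8/12  -> 66.7%

# Controlling for weather (comparing only months with similar weather):
warm = [r for r in months if r[0] == "warm"]
cool = [r for r in months if r[0] == "cool"]
warm_gap = pct_high_crime(warm, "high") - pct_high_crime(warm, "low")  # 0.0
cool_gap = pct_high_crime(cool, "high") - pct_high_crime(cool, "low")  # about -4.8
```

Within similar-weather strata the sales/crime gap vanishes, which is exactly what a spurious association driven by weather looks like.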

Purpose 1: building the case for causality

Summary of logic

  • To build the case that \(X\) causes \(Y\), we have to show that no other variable accounts for the statistical association between \(X\) and \(Y\) (association is not spurious)
  • We need to see if the association between \(X\) and \(Y\) persists even after we control (remove the influence of) other explanatory variables (for example, \(Z\))
  • If the association between \(X\) and \(Y\) goes away when we control for \(Z\), then the association was probably spurious (i.e., not causal)
  • The part of the association between \(X\) and \(Y\) that remains after we control for \(Z\) MIGHT represent the causal effect of \(X\) on \(Y\) (but other potential explanations remain)

Purpose 1: building the case for causality

\(b_1\) and \(b_2\) are called PARTIAL SLOPES


PARTIAL SLOPE:
A slope coefficient from a multivariate regression model, representing the effect of one predictor on the dependent variable while controlling for all other predictors in the model.


Multivariate regression is a convenient tool for assessing the association between two variables (\(X\) and \(Y\)) while controlling for others (\(Z\))


\(\color{#000000}{\widehat{y}} = \color{#000000}{a} + \color{#e93cac}{b_1}\color{#000000}{x} + \color{#690c48}{b_2}\color{#000000}{z}\)


\(b_1 =\) the effect of \(X\) on \(Y\) after controlling for the effects of \(Z\)

\(b_2 =\) the effect of \(Z\) on \(Y\) after controlling for the effects of \(X\)

Purpose 1: building the case for causality

BIVARIATE SLOPE: Association between \(X\) and \(Y\)

\[ b = r_{yx}(\frac{s_y}{s_x}) \]

PARTIAL SLOPE: Association between \(X\) and \(Y\) after controlling for \(Z\)

\[ b_1 = (\frac{r_{yx}-r_{yz}r_{xz}}{1-r^2_{xz}})(\frac{s_y}{s_x}) \]

(Venn diagram showing the overlapping correlations \(r_{yx}\), \(r_{yz}\), and \(r_{xz}\))

The partial slope looks at the correlation between \(X\) and \(Y\) (\(r_{yx}\)) and subtracts out the common connection of these variables to \(Z\) (\(r_{yz}r_{xz}\)).
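The partial slope formula can be implemented from scratch and cross-checked against the two-predictor least-squares solution. A minimal sketch in Python (the data are hypothetical; the helper names are my own):

```python
import math

def mean(u):
    return sum(u) / len(u)

def s(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def pearson_r(u, v):
    return s(u, v) / math.sqrt(s(u, u) * s(v, v))

def sd(u):
    return math.sqrt(s(u, u) / len(u))

def partial_slope(y, x, z):
    """b1 = ((r_yx - r_yz*r_xz) / (1 - r_xz**2)) * (s_y / s_x)."""
    r_yx, r_yz, r_xz = pearson_r(y, x), pearson_r(y, z), pearson_r(x, z)
    return ((r_yx - r_yz * r_xz) / (1 - r_xz ** 2)) * (sd(y) / sd(x))

# Hypothetical data in which z is positively related to both x and y
x = [1, 2, 3, 4, 5, 6]
z = [2, 1, 4, 3, 6, 5]
y = [3, 4, 8, 9, 14, 16]

b_bivariate = pearson_r(y, x) * (sd(y) / sd(x))  # b = r_yx * (s_y / s_x)
b1 = partial_slope(y, x, z)

# Cross-check: b1 should equal the x-coefficient from solving the
# two-predictor least-squares normal equations directly.
sxx, szz, sxz = s(x, x), s(z, z), s(x, z)
sxy, szy = s(x, y), s(z, y)
b1_ols = (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)
```

With these data the partial slope comes out smaller than the bivariate slope, since part of the \(X\)-\(Y\) association runs through \(Z\).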

Purpose 1: building the case for causality

Another example: Does education have a causal effect on income?

Bivariate regression model
\(\widehat{Income} = 1816.16 + 1582.58(educ)\)

ASSOCIATION:
Each additional year of education
is associated with $1,582.58
in additional income


Does this mean that education causes income to increase?


Or is the association at least partly spurious?


Purpose 1: building the case for causality

Another example: Does education have a causal effect on income?

Bivariate regression model
\(\widehat{Income} = 1816.16 + 1582.58(educ)\)

\[ b = r_{yx}(\frac{s_y}{s_x}) \]



Look at the association
between education and income,
subtracting out their
common connection to parental wealth


\[ b_1 = (\frac{\color{#4b2e83}{r_{yx}}-\color{#85754d}{r_{yz}r_{xz}}}{1-r^2_{xz}})(\frac{s_y}{s_x}) \]


Or is the association at least partly spurious?

Purpose 1: building the case for causality

Another example: Does education have a causal effect on income?

Bivariate regression model
\(\widehat{Income} = 1816.16 + 1582.58(educ)\)

ASSOCIATION:
Each additional year of education
is associated with $1,582.58
in additional income


Multivariate regression model
\(\widehat{Income} = \color{#a68100}{799.22} + \color{#6e8e13}{623.35}(educ) + \color{#754cbc}{0.31}(\text{par_wealth})\)

\(a\): A person with 0 years of education and 0 dollars of parental wealth is predicted to have an income of \(\$799.22\)

\(b_1\): Controlling for parental wealth, each additional year of education is associated with \(\$623.35\) in additional income

\(b_2\): Controlling for education, each additional dollar of parental wealth is associated with \(\$0.31\) in additional income

Purpose 1: building the case for causality

Another example: Does education have a causal effect on income?

Bivariate regression model
\(\widehat{Income} = 1816.16 + \color{#6e8e13}{1582.58}(educ)\)

Multivariate regression model
\(\widehat{Income} = 799.22 + \color{#6e8e13}{623.35}(educ) + \color{#754cbc}{0.31}(\text{par_wealth})\)

The apparent association between education and income is reduced after controlling for parental wealth.

The association between education and income appears to be at least PARTIALLY spurious.

Part of the basic association between education and income is due to the fact that both are affected by parental wealth.

Next step: Add other controls that might account for the remaining association.

Why multivariate regression?


  • Purpose 1: building the case for causality
    • Control for alternate explanations for the main association
  • Purpose 2: developing better predictions
    • Predict \(Y\) as a function of multiple variables, not just \(X\)
  • Purpose 3: explain more
    • Use multiple variables to explain more of the variation in \(Y\)

Purpose 2: developing better predictions

Can use the multivariate regression model to predict values of the dependent variable for any combination of values of the predictors.

\[ \widehat{Income} = 799.22 + 623.35(educ) + 0.31(\text{par_wealth}) \]

Example: What is the predicted level of income for a person with 14 years of education and whose parents have total wealth of $250,000?

\[ \begin{aligned} \widehat{Income} &= 799.22 + 623.35(\color{#e93cac}{14}) + 0.31(\color{#e93cac}{250000}) \\ &= \$87,026.12 \end{aligned} \]

Example: What is the predicted level of income for a person with 16 years of education and whose parents have total wealth of $50,000?

\[ \begin{aligned} \widehat{Income} &= 799.22 + 623.35(\color{#1b8883}{16}) + 0.31(\color{#1b8883}{50000}) \\ &= \$26,272.82 \end{aligned} \]
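Both predictions can be checked by plugging the values into the fitted equation. A minimal sketch in Python (the function name is my own):

```python
def predict_income(educ, par_wealth):
    """Predicted income from the fitted model:
    income-hat = 799.22 + 623.35*educ + 0.31*par_wealth"""
    return 799.22 + 623.35 * educ + 0.31 * par_wealth


# 14 years of education, $250,000 parental wealth
first = predict_income(14, 250_000)
# 16 years of education, $50,000 parental wealth
second = predict_income(16, 50_000)
```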

Why multivariate regression?


  • Purpose 1: building the case for causality
    • Control for alternate explanations for the main association
  • Purpose 2: developing better predictions
    • Predict \(Y\) as a function of multiple variables, not just \(X\)
  • Purpose 3: explain more
    • Use multiple variables to explain more of the variation in \(Y\)

Purpose 3: explain more

Multivariate regression allows for more complete explanation of the variation in the dependent variable.

Bivariate regression model

\(\widehat{Income} = 1816.16 + 1582.58(educ)\)


Coefficient of determination
(\(R^2\))

\[ R^2 = 0.728 \]

Interpretations:

  1. The proportional reduction of prediction error achieved by using the independent variable (\(X\)) in the regression equation to predict the values of the dependent variable (\(Y\))
  2. The proportion of the variance in the dependent variable (\(Y\)) explained by the independent variable (\(X\))

In this example:
  1. We can reduce our errors by almost 73% if we take into consideration individuals’ education when we predict their income (i.e., use the regression line in our predictions).
  2. Almost 73% of the variance in income is explained by differences in education.

Purpose 3: explain more

Multivariate regression allows for more complete explanation of the variation in the dependent variable.

Multivariate regression model

\(\widehat{Income} = 799.22 + 623.35(educ) + 0.31(\text{par_wealth})\)


Coefficient of MULTIPLE determination
(\(R^2\))

\[ R^2 = 0.813 \]

Interpretations:

  1. The proportional reduction of prediction error achieved by using all of the predictors (\(X\), \(Z\), etc.) in the regression equation to predict the values of the dependent variable (\(Y\))
  2. The proportion of the variance in the dependent variable (\(Y\)) explained by the combination of all of the predictors (\(X\), \(Z\), etc.)

In this example:
  1. We can reduce our errors by about 81% if we take into consideration individuals’ education and the wealth of their parents when we predict their income (i.e., use the regression line in our predictions).
  2. About 81% of the variance in income is explained by the combination of education and parental wealth.
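The coefficient of (multiple) determination can be computed directly from the predictions, whatever the number of predictors. A minimal sketch in Python (the function name and the toy data are my own):

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_residual / SS_total: the proportion of the variance
    in y explained by the model's predictions y_hat."""
    mean_y = sum(y) / len(y)
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_resid = sum((v - h) ** 2 for v, h in zip(y, y_hat))
    return 1 - ss_resid / ss_total


y = [2.0, 4.0, 6.0, 8.0]
r2_perfect = r_squared(y, y)             # perfect predictions -> 1.0
r2_mean_only = r_squared(y, [5.0] * 4)   # always predicting the mean -> 0.0
```

Adding predictors can only increase \(R^2\), which is why the multivariate model's 0.813 exceeds the bivariate model's 0.728.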

Break!

Regression tables

Results of OLS regression models predicting annual income ($s)
Individual-level data

  Predictor        Model 1        Model 2
  Education        1582.58 ***    623.35 *
  Parental wealth                 0.31 **
  Constant         1816.16        799.22
  Model \(R^2\)    0.73           0.81

  N = 12
  * p < 0.05   ** p < 0.01   *** p < 0.001

Same results in table form

The table title indicates the dependent variable and the units of analysis; \(N\) gives the number of cases.

The Model 1 entry for Education (1582.58) is the slope coefficient for the effect of education on income.

Statistical significance of the slope coefficient

  • Testing the null hypothesis (\(H_0\)) that \(\beta = 0\) (that the slope is zero in the population)
  • The number of stars indicates the p-value for the coefficient (the probability of observing the sample coefficient if \(H_0\) were true)
  • Reject \(H_0\) if p-value < alpha
  • Given ***, our model rejects \(H_0\) with \(99.9\%\) confidence (alpha of \(0.001\))
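The star convention in the table notes is just a set of p-value thresholds. A minimal sketch in Python (the function name is my own):

```python
def stars(p_value):
    """Map a p-value to the significance stars used in the table notes:
    * p < 0.05, ** p < 0.01, *** p < 0.001."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""
```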

Y-intercept (a.k.a. constant):

\[ \widehat{Income} = 1816.16 + 1582.58(educ) \]

Model \(R^2\) is the coefficient of determination.

Model 2 reports partial slope coefficients for the effects of education and parental wealth on income, along with the Y-intercept and the coefficient of multiple determination:

\[ \widehat{Income} = 799.22 + 623.35(educ) + 0.31(\text{par_wealth}) \]

Regression tables

Results of OLS regression models predicting annual income ($s)
Individual-level data

| Predictor       | Model 1     | Model 2  |
|-----------------|-------------|----------|
| Education       | 1582.58 *** | 623.35 * |
| Parental wealth |             | 0.31 **  |
| Constant        | 1816.16     | 799.22   |
| Model \(R^2\)   | 0.73        | 0.81     |

N = 12
* p < 0.05   ** p < 0.01   *** p < 0.001

Same results in table form

Statistical significance of the slope coefficients

  • Testing the null hypothesis (\(H_0\)) that \(\beta = 0\) (that the slope is zero in the population)
  • Number of stars indicates the p-value for the coefficient (the probability of observing a sample coefficient this extreme if \(H_0\) were true).
  • Reject \(H_0\) if p-value < alpha
  • Given * for the partial slope coefficient of education, we reject \(H_0\) with \(95\%\) confidence (alpha of \(0.05\))
  • Given ** for the partial slope coefficient of parental wealth, we reject \(H_0\) with \(99\%\) confidence (alpha of \(0.01\))

Practice

Results of OLS regression models predicting crimes per 100,000 population in U.S. neighborhoods

| Predictor     | Model 1    | Model 2    |
|---------------|------------|------------|
| % Minority    | 0.327 ***  | -0.247 *** |
| % Poverty     |            | 2.095 ***  |
| Constant      | 61.284 *** | 48.216 *** |
| Model \(R^2\) | 0.015      | 0.071      |

N = 6,935
* p < 0.05   ** p < 0.01   *** p < 0.001
  1. According to the results in Model 1, is the level of crime in the neighborhood associated with the concentration of minorities in the area? Make sure to touch on the direction, magnitude, and statistical significance of the association.
  2. How much of the variation in crime rates across neighborhoods can be explained by the racial composition of the neighborhood alone?

Practice

Results of OLS regression models predicting crimes per 100,000 population in U.S. neighborhoods

| Predictor     | Model 1    | Model 2    |
|---------------|------------|------------|
| % Minority    | 0.327 ***  | -0.247 *** |
| % Poverty     |            | 2.095 ***  |
| Constant      | 61.284 *** | 48.216 *** |
| Model \(R^2\) | 0.015      | 0.071      |

N = 6,935
* p < 0.05   ** p < 0.01   *** p < 0.001
  1. According to the results in Model 1, is the level of crime in the neighborhood associated with the concentration of minorities in the area? Make sure to touch on the direction, magnitude, and statistical significance of the association.
  • Direction: positive slope coefficient
    = positive association
  • Magnitude: A one-percentage-point difference in the concentration of minorities is associated with an increase of 0.327 crimes per 100,000 population
  • Statistical significance: Coefficient is statistically significant even if we use alpha of \(0.001\). Therefore, the association in the sample is large enough to convince us that the association likely exists in the population of neighborhoods.

Practice

Results of OLS regression models predicting crimes per 100,000 population in U.S. neighborhoods

| Predictor     | Model 1    | Model 2    |
|---------------|------------|------------|
| % Minority    | 0.327 ***  | -0.247 *** |
| % Poverty     |            | 2.095 ***  |
| Constant      | 61.284 *** | 48.216 *** |
| Model \(R^2\) | 0.015      | 0.071      |

N = 6,935
* p < 0.05   ** p < 0.01   *** p < 0.001
  1. How much of the variation in crime rates across neighborhoods can be explained by the racial composition of the neighborhood alone?

The coefficient of determination indicates that about \(1.5\%\) of the variation in crime rates can be explained by the neighborhood's racial composition alone.

Practice

Results of OLS regression models predicting crimes per 100,000 population in U.S. neighborhoods

| Predictor     | Model 1    | Model 2    |
|---------------|------------|------------|
| % Minority    | 0.327 ***  | -0.247 *** |
| % Poverty     |            | 2.095 ***  |
| Constant      | 61.284 *** | 48.216 *** |
| Model \(R^2\) | 0.015      | 0.071      |

N = 6,935
* p < 0.05   ** p < 0.01   *** p < 0.001
  1. According to the results in Model 2, how do the local poverty rate and concentration of minorities in the neighborhood affect local crime rates? Make sure to touch on the direction, magnitude, and statistical significance of the effects.
  2. How would you explain the dramatic change in the slope coefficient for % Minority from Model 1 to Model 2? What does this change in coefficients imply about the causal effect of %Minority on crime rates?
  3. What is the predicted level of crime in a neighborhood in which 30% of the residents are poor and 60% are members of minority groups? How does this compare to the predicted crime rate in a neighborhood in which 30% of residents are poor and none are members of minority groups?

Practice

Results of OLS regression models predicting crimes per 100,000 population in U.S. neighborhoods

| Predictor     | Model 1    | Model 2    |
|---------------|------------|------------|
| % Minority    | 0.327 ***  | -0.247 *** |
| % Poverty     |            | 2.095 ***  |
| Constant      | 61.284 *** | 48.216 *** |
| Model \(R^2\) | 0.015      | 0.071      |

N = 6,935
* p < 0.05   ** p < 0.01   *** p < 0.001
  1. According to the results in Model 2, how do the local poverty rate and concentration of minorities in the neighborhood affect local crime rates? Make sure to touch on the direction, magnitude, and statistical significance of the effects.

Controlling for % Poverty, a one-percentage-point increase in the local % Minority is associated with a decrease of \(0.247\) crimes per 100,000, and this effect is strong enough to convince us that there is an association in the population.

Practice

Results of OLS regression models predicting crimes per 100,000 population in U.S. neighborhoods

| Predictor     | Model 1    | Model 2    |
|---------------|------------|------------|
| % Minority    | 0.327 ***  | -0.247 *** |
| % Poverty     |            | 2.095 ***  |
| Constant      | 61.284 *** | 48.216 *** |
| Model \(R^2\) | 0.015      | 0.071      |

N = 6,935
* p < 0.05   ** p < 0.01   *** p < 0.001
  1. According to the results in Model 2, how do the local poverty rate and concentration of minorities in the neighborhood affect local crime rates? Make sure to touch on the direction, magnitude, and statistical significance of the effects.

Controlling for % Minority, a one-percentage-point increase in the local poverty rate is associated with an increase of \(2.095\) crimes per 100,000, and this effect is strong enough to convince us that there is an association in the population.

Practice

Results of OLS regression models predicting crimes per 100,000 population in U.S. neighborhoods

| Predictor     | Model 1    | Model 2    |
|---------------|------------|------------|
| % Minority    | 0.327 ***  | -0.247 *** |
| % Poverty     |            | 2.095 ***  |
| Constant      | 61.284 *** | 48.216 *** |
| Model \(R^2\) | 0.015      | 0.071      |

N = 6,935
* p < 0.05   ** p < 0.01   *** p < 0.001
  1. How would you explain the dramatic change in the slope coefficient for % Minority from Model 1 to Model 2? What does this change in coefficients imply about the causal effect of % Minority on crime rates?

This indicates that the positive association between % Minority and crime observed in Model 1 was not causal: it reflects the higher poverty rates of minority neighborhoods. Once the poverty rate is accounted for, higher minority concentrations are actually associated with lower crime.

Practice

Results of OLS regression models predicting crimes per 100,000 population in U.S. neighborhoods

| Predictor     | Model 1    | Model 2    |
|---------------|------------|------------|
| % Minority    | 0.327 ***  | -0.247 *** |
| % Poverty     |            | 2.095 ***  |
| Constant      | 61.284 *** | 48.216 *** |
| Model \(R^2\) | 0.015      | 0.071      |

N = 6,935
* p < 0.05   ** p < 0.01   *** p < 0.001
  1. What is the predicted level of crime in a neighborhood in which 30% of the residents are poor and 60% are members of minority groups? How does this compare to the predicted crime rate in a neighborhood in which 30% of residents are poor and none are members of minority groups?
  • \(\widehat{crime} = 48.216 - 0.247(\text{\% Minority}) + 2.095(\text{\% Poverty})\)
  • 30% poverty, 60% minority: \(\widehat{crime} = 48.216 - 0.247(60) + 2.095(30) = 96.246\)
  • 30% poverty, 0% minority: \(\widehat{crime} = 48.216 - 0.247(0) + 2.095(30) = 111.066\)
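The two predictions above can be verified with a short Python sketch:

```python
def predict_crime(pct_minority, pct_poverty):
    """Predicted crimes per 100,000 population (Model 2 estimates)."""
    return 48.216 - 0.247 * pct_minority + 2.095 * pct_poverty

print(round(predict_crime(60, 30), 3))  # 96.246
print(round(predict_crime(0, 30), 3))   # 111.066
```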

Practice

Results of OLS regression models predicting crimes per 100,000 population in U.S. neighborhoods

| Predictor     | Model 1    | Model 2    |
|---------------|------------|------------|
| % Minority    | 0.327 ***  | -0.247 *** |
| % Poverty     |            | 2.095 ***  |
| Constant      | 61.284 *** | 48.216 *** |
| Model \(R^2\) | 0.015      | 0.071      |

N = 6,935
* p < 0.05   ** p < 0.01   *** p < 0.001

BONUS: What’s the interpretation for our Y-intercept?

A neighborhood in which \(\text{\% Minority} = 0\) and \(\text{\% Poverty} = 0\) is predicted to have a crime rate of \(48.216\) crimes per 100,000 population.

Practice

Results of OLS regression models predicting crimes per 100,000 population in U.S. neighborhoods

| Predictor     | Model 1    | Model 2    |
|---------------|------------|------------|
| % Minority    | 0.327 ***  | -0.247 *** |
| % Poverty     |            | 2.095 ***  |
| Constant      | 61.284 *** | 48.216 *** |
| Model \(R^2\) | 0.015      | 0.071      |

N = 6,935
* p < 0.05   ** p < 0.01   *** p < 0.001

BONUS BONUS: What’s the interpretation for our coefficient of multiple determination?

The coefficient of multiple determination indicates that about \(7.1\%\) of the variation in crime rates can be explained by racial composition and the poverty rate combined.
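For reference, the coefficient of determination is \(1 - SS_{error}/SS_{total}\): the share of the variation in \(Y\) around its mean that the model's predictions account for. A minimal Python sketch on made-up data:

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_error / SS_total."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)              # variation around the mean
    ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
    return 1 - ss_error / ss_total

# Toy observed values and model predictions (made up for illustration)
print(r_squared([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5]))  # 0.95
```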

Homework