SOC 221 • Lecture 2
Wednesday, June 25, 2025
Apply some STATISTICS to answer interesting questions…
Descriptive statistics
Procedures that help us organize, summarize, and describe the distribution of a variable.
Distribution
The list of the entire set of observed values of a variable, indicating all values and how often those values occur.
| Person # | birth_year | gender | ethnicity | race |
|---|---|---|---|---|
| 1 | 1992 | 1 | 1 | 5 |
| 2 | 1993 | 2 | 0 | 5 |
| 3 | 1995 | 2 | 0 | 2 |
| 4 | 1980 | 2 | 0 | 3 |
| 5 | 1991 | 1 | 1 | 5 |
| 6 | 1975 | 4 | 1 | 1 |
| 7 | 1960 | 1 | 0 | 6 |
| 8 | 1952 | 1 | 0 | 5 |
| 9 | 2000 | 3 | 0 | 3 |
| 10 | 1990 | 1 | 1 | 2 |
| 11 | 1993 | 2 | 1 | 4 |
| 12 | 1992 | 3 | 0 | 4 |
Each column of our dataset contains the distribution of a different variable
> [1] 5 8 3 6 5 8 3 7 9 10 6 8 7 4 6 5
> [17] 8 7 4 6 5 3 6 1 8 6 8 5 6 6 8 9
> [33] 6 9 6 5 9 9 6 5 8 5 8 4 8 6 7 4
> [49] 8 6 8 7 6 8 5 7 6 9 6 10 4 8 6 8
> [65] 6 5 4 7 8 5 7 4 5 5 9 6 6 7 5 3
> [81] 10 3 5 7 9 10 6 7 4 10 6 9 8 7 8 5
> [97] 7 4 10 6 7 4 2 3 8 5 4 7 7 7
| Movie Rating (score) | Tally |
|---|---|
| 0 | |
| 1 | / |
| 2 | / |
| 3 | ///// / |
| 4 | ///// ///// / |
| 5 | ///// ///// ///// // |
| 6 | ///// ///// ///// ///// /// |
| 7 | ///// ///// ///// // |
| 8 | ///// ///// ///// //// |
| 9 | ///// //// |
| 10 | ///// / |
| Movie Rating (score) | Frequency (f) |
|---|---|
| 0 | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 6 |
| 4 | 11 |
| 5 | 17 |
| 6 | 23 |
| 7 | 17 |
| 8 | 19 |
| 9 | 9 |
| 10 | 6 |
| Total (N) | 110 |
Descriptive title
First column: all possible values (sometimes labeled with a variable symbol, e.g., x)
Second column: number of times each value appears in the data
N = total number of cases in the data
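The frequency table above can be rebuilt from the raw scores. The following is a minimal Python sketch, not part of the lecture: the names `ratings`, `counts`, `freq`, and `n` are illustrative, and the list is a transcription of the 110 ratings printed earlier.

```python
from collections import Counter

# The 110 movie ratings listed above, transcribed row by row.
ratings = [
    5, 8, 3, 6, 5, 8, 3, 7, 9, 10, 6, 8, 7, 4, 6, 5,
    8, 7, 4, 6, 5, 3, 6, 1, 8, 6, 8, 5, 6, 6, 8, 9,
    6, 9, 6, 5, 9, 9, 6, 5, 8, 5, 8, 4, 8, 6, 7, 4,
    8, 6, 8, 7, 6, 8, 5, 7, 6, 9, 6, 10, 4, 8, 6, 8,
    6, 5, 4, 7, 8, 5, 7, 4, 5, 5, 9, 6, 6, 7, 5, 3,
    10, 3, 5, 7, 9, 10, 6, 7, 4, 10, 6, 9, 8, 7, 8, 5,
    7, 4, 10, 6, 7, 4, 2, 3, 8, 5, 4, 7, 7, 7,
]

counts = Counter(ratings)

# First column: every possible value (0-10), even values that never occur.
# Second column: how many times each value appears.
freq = {score: counts.get(score, 0) for score in range(11)}
n = sum(freq.values())  # N = total number of cases

for score, f in freq.items():
    print(f"{score:>2}  {f:>3}")
print(f"N = {n}")
```

Listing all possible values, rather than only the observed ones, is what keeps the row for 0 in the table even though nobody gave that rating.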
| Movie Rating (score) | Frequency (f) | Percent (%) |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 1 | 0.9 |
| 2 | 1 | 0.9 |
| 3 | 6 | 5.4 |
| 4 | 11 | 10 |
| 5 | 17 | 15.5 |
| 6 | 23 | 20.9 |
| 7 | 17 | 15.5 |
| 8 | 19 | 17.3 |
| 9 | 9 | 8.2 |
| 10 | 6 | 5.4 |
| Total (N) | 110 | 100 |
\[ Pct_x = \frac{f_x}{N} (100) \]
\[ Pct_0 = \frac{0}{110} (100) \]
\[ Pct_5 = \frac{17}{110} (100) \]
Percentages always add to 100 (apart from small rounding differences)
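The percentage formula can be applied to every row at once. This is a hedged sketch in Python, with the function name `percent` and the dictionary layout chosen only for illustration. Note that conventional rounding of 6/110 (5.45...) prints 5.5, so the last digit can differ from the 5.4 shown in the table.

```python
# Frequencies from the movie-rating table (score -> f).
freq = {0: 0, 1: 1, 2: 1, 3: 6, 4: 11, 5: 17, 6: 23, 7: 17, 8: 19, 9: 9, 10: 6}
n = sum(freq.values())  # N = 110


def percent(f, n):
    """Pct_x = (f_x / N) * 100."""
    return f / n * 100


pct = {score: percent(f, n) for score, f in freq.items()}

for score in freq:
    print(f"{score:>2}  {freq[score]:>3}  {pct[score]:5.1f}")

# The unrounded percentages sum to 100; rounded ones may be off by a tenth or two.
print(f"Sum of unrounded percentages: {sum(pct.values()):.1f}")
```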
Comparing the relative frequency of different values:
\[ Ratio = \frac{Pct_x}{Pct_y} = \frac{f_x}{f_y} \]
Ratio of 5s to 10s: \[ \frac{15.5}{5.4} = 2.87 \]
(Computed from the rounded percentages; the frequencies themselves give 17/6 ≈ 2.83.)
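A quick check of the ratio, sketched in Python with illustrative variable names. Within a single distribution the N cancels, so the ratio of percentages equals the ratio of frequencies; using percentages that were already rounded shifts the answer slightly.

```python
f5, f10, n = 17, 6, 110  # frequencies of ratings 5 and 10, and N

ratio_from_counts = f5 / f10
ratio_from_pcts = (f5 / n * 100) / (f10 / n * 100)  # N cancels out
ratio_from_rounded = 15.5 / 5.4                     # percentages as printed in the table

print(round(ratio_from_counts, 2))   # 2.83
print(round(ratio_from_pcts, 2))     # 2.83
print(round(ratio_from_rounded, 2))  # 2.87
```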
| Movie Rating (score) | Frequency (f) | Percent (%) |
|---|---|---|
| 0 | 0 | 0.0 |
| 1 | 0 | 0.0 |
| 2 | 0 | 0.0 |
| 3 | 1 | 2.0 |
| 4 | 3 | 6.0 |
| 5 | 5 | 10.0 |
| 6 | 9 | 18.0 |
| 7 | 10 | 20.0 |
| 8 | 12 | 24.0 |
| 9 | 6 | 12.0 |
| 10 | 4 | 8.0 |
| Total (N) | 50 | 100 |

| Movie Rating (score) | Frequency (f) | Percent (%) |
|---|---|---|
| 0 | 0 | 0.0 |
| 1 | 1 | 1.7 |
| 2 | 1 | 1.7 |
| 3 | 5 | 8.3 |
| 4 | 8 | 13.3 |
| 5 | 12 | 20.0 |
| 6 | 14 | 23.3 |
| 7 | 7 | 11.7 |
| 8 | 7 | 11.7 |
| 9 | 3 | 5.0 |
| 10 | 2 | 3.3 |
| Total (N) | 60 | 100 |
Gender ratio of 10s: \[ \frac{8.0}{3.3} = 2.42 \]
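When the two groups differ in size, the ratio of raw frequencies and the ratio of percentages no longer agree, which is why the comparison above uses percentages. A small Python sketch, assuming the first table (N = 50) and second table (N = 60) are the two gender groups; the source does not say which group is which, so the names below are neutral placeholders.

```python
# Frequency of a 10 rating and group size, read from the two tables above.
f10_first, n_first = 4, 50
f10_second, n_second = 2, 60

pct_first = f10_first / n_first * 100     # 8.0
pct_second = f10_second / n_second * 100  # 3.33...

ratio_of_counts = f10_first / f10_second  # ignores the different group sizes
ratio_of_pcts = pct_first / pct_second    # adjusts for group size

print(round(ratio_of_counts, 2))  # 2.0
print(round(ratio_of_pcts, 2))    # 2.4
print(round(8.0 / 3.3, 2))        # 2.42, using the rounded percentages
```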
| Movie Rating (score) | Frequency (f) | Percent (%) | Cf |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 1 | 1 | 0.9 | 1 |
| 2 | 1 | 0.9 | 2 |
| 3 | 6 | 5.4 | 8 |
| 4 | 11 | 10 | 19 |
| 5 | 17 | 15.5 | 36 |
| 6 | 23 | 20.9 | 59 |
| 7 | 17 | 15.5 | 76 |
| 8 | 19 | 17.3 | 95 |
| 9 | 9 | 8.2 | 104 |
| 10 | 6 | 5.4 | 110 |
| Total (N) | 110 | 100 | |
Cumulative frequency (Cf) = number of cases in the given category and lower categories
Cf in highest category = N
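Cumulative frequency is a running total of the frequency column, taken from the lowest category upward. A minimal Python sketch using the standard library's `itertools.accumulate`; the list names are illustrative.

```python
from itertools import accumulate

scores = list(range(11))                        # 0 through 10, lowest first
freqs = [0, 1, 1, 6, 11, 17, 23, 17, 19, 9, 6]  # f for each score

# Running total: each Cf is the frequency of that score plus all lower scores.
cum_freqs = list(accumulate(freqs))

for score, f, cf in zip(scores, freqs, cum_freqs):
    print(f"{score:>2}  {f:>3}  {cf:>3}")

# The Cf of the highest category equals N.
print(cum_freqs[-1] == sum(freqs))
```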
| Movie Rating (score) | Frequency (f) | Percent (%) | Cf | CPcnt |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0.9 | 1 | 0.9 |
| 2 | 1 | 0.9 | 2 | 1.8 |
| 3 | 6 | 5.4 | 8 | 7.2 |
| 4 | 11 | 10 | 19 | 17.2 |
| 5 | 17 | 15.5 | 36 | 32.7 |
| 6 | 23 | 20.9 | 59 | 53.6 |
| 7 | 17 | 15.5 | 76 | 69.1 |
| 8 | 19 | 17.3 | 95 | 86.4 |
| 9 | 9 | 8.2 | 104 | 94.6 |
| 10 | 6 | 5.4 | 110 | 100 |
| Total (N) | 110 | 100 | | |
Cumulative percentage (C%, labeled CPcnt in the tables) = percentage of cases in the given category and lower categories
\[ C\%_x = \frac{Cf_x}{N} (100) \]
\[ C\%_4 = \frac{19}{110} (100) \]
C% in highest category = 100
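The cumulative percentage formula divides each cumulative frequency by N. The Python sketch below (illustrative names) computes C% directly from Cf. A table built instead by adding up already-rounded percentages can differ in the last digit: for a rating of 4, 19/110 gives 17.3, while summing the rounded percentages gives 17.2.

```python
from itertools import accumulate

freqs = [0, 1, 1, 6, 11, 17, 23, 17, 19, 9, 6]  # scores 0 through 10
n = sum(freqs)

cum_freqs = list(accumulate(freqs))
cum_pcts = [cf / n * 100 for cf in cum_freqs]   # C%_x = (Cf_x / N) * 100

for score, cf, cp in zip(range(11), cum_freqs, cum_pcts):
    print(f"{score:>2}  {cf:>3}  {cp:5.1f}")
```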
| Number of courses | Frequency (f) | Percent (%) | Cf | CPcnt |
|---|---|---|---|---|
| 6 | 2 | | | |
| 5 | 4 | | | |
| 4 | 8 | | | |
| 3 | 24 | | | |
| 2 | 26 | | | |
| 1 | 18 | | | |
| 0 | 3 | | | |
| Total (N) | 85 | | | |
| Number of courses | Frequency (f) | Percent (%) | Cf | CPcnt |
|---|---|---|---|---|
| 6 | 2 | 2.35 | 85 | 100.00 |
| 5 | 4 | 4.71 | 83 | 97.65 |
| 4 | 8 | 9.41 | 79 | 92.94 |
| 3 | 24 | 28.24 | 71 | 83.53 |
| 2 | 26 | 30.59 | 47 | 55.29 |
| 1 | 18 | 21.18 | 21 | 24.71 |
| 0 | 3 | 3.53 | 3 | 3.53 |
| Total (N) | 85 | 100 | | |
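This table lists the values from highest to lowest, so the cumulative columns build from the bottom row upward. A Python sketch of that calculation, with illustrative names: the values are sorted in ascending order before accumulating, then printed in the table's descending order.

```python
from itertools import accumulate

# Number of courses -> frequency, as listed in the table (6 down to 0).
freq = {6: 2, 5: 4, 4: 8, 3: 24, 2: 26, 1: 18, 0: 3}
n = sum(freq.values())

# Accumulate from the lowest value upward, whatever order the rows are shown in.
values = sorted(freq)
cum_freq = dict(zip(values, accumulate(freq[v] for v in values)))
cum_pct = {v: cf / n * 100 for v, cf in cum_freq.items()}

for v in sorted(freq, reverse=True):
    pct = freq[v] / n * 100
    print(f"{v}  {freq[v]:>2}  {pct:6.2f}  {cum_freq[v]:>2}  {cum_pct[v]:6.2f}")
```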
| Major | Frequency (f) | Percent (%) | Cf | CPcnt |
|---|---|---|---|---|
| Arts | 61 | | | |
| Humanities | 385 | | | |
| Social Science | 425 | | | |
| Business | 265 | | | |
| Physical Sciences | 194 | | | |
| Other | 10 | | | |
| Total (N) | 1340 | | | |
Level of measurement?
| Major | Frequency (f) | Percent (%) | Cf | CPcnt |
|---|---|---|---|---|
| Arts | 61 | 4.55 | | |
| Humanities | 385 | 28.73 | | |
| Social Science | 425 | 31.72 | | |
| Business | 265 | 19.78 | | |
| Physical Sciences | 194 | 14.48 | | |
| Other | 10 | 0.75 | | |
| Total (N) | 1340 | 100 | | |
Frequency or % “above” or “below” makes no sense without inherent ordering
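One way to see why cumulative columns are left blank for a nominal variable: the result depends entirely on the row order, and no order is the "right" one. A Python sketch with a hypothetical helper `cum_pct_at`, comparing the table's listed order with alphabetical order.

```python
from itertools import accumulate

majors = {
    "Arts": 61,
    "Humanities": 385,
    "Social Science": 425,
    "Business": 265,
    "Physical Sciences": 194,
    "Other": 10,
}
n = sum(majors.values())


def cum_pct_at(order, target):
    """Cumulative percent at `target` when categories are arranged in `order`."""
    running = dict(zip(order, accumulate(majors[m] for m in order)))
    return running[target] / n * 100


listed = list(majors)          # order used in the table
alphabetical = sorted(majors)  # an equally arbitrary order

print(round(cum_pct_at(listed, "Humanities"), 2))        # 33.28
print(round(cum_pct_at(alphabetical, "Humanities"), 2))  # 53.06
```

The same category gets two different "cumulative" values, so neither number describes the data.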
| Response | Frequency (f) | Percent (%) | Cf | CPcnt |
|---|---|---|---|---|
| Very Dissatisfied | 7 | 9.33 | 7 | 9.33 |
| Somewhat Dissatisfied | 22 | 29.33 | 29 | 38.67 |
| Somewhat Satisfied | 20 | 26.67 | 49 | 65.33 |
| Very Satisfied | 8 | 10.67 | 57 | 76.00 |
| No Answer | 18 | 24.00 | 75 | 100.00 |
| Total (N) | 75 | 100 | | |
Question: Was a majority at least somewhat dissatisfied with the course (somewhat dissatisfied or worse)?
| Response | Frequency (f) | Percent (%) | Cf | CPcnt |
|---|---|---|---|---|
| Very Dissatisfied | 7 | 12.28 | 7 | 12.28 |
| Somewhat Dissatisfied | 22 | 38.60 | 29 | 50.88 |
| Somewhat Satisfied | 20 | 35.09 | 49 | 85.96 |
| Very Satisfied | 8 | 14.04 | 57 | 100.00 |
| No Answer (excluded) | 18 | | | |
| Valid (N) | 57 | 100 | | |
Question: Was a majority at least somewhat dissatisfied with the course (somewhat dissatisfied or worse)?
Best practice: Calculate all statistics based on information from those cases with valid (known) information on the variables of interest.
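The effect of dropping cases without valid information can be checked directly. A Python sketch with illustrative names, treating "No Answer" as the only missing category (an assumption taken from the tables above).

```python
responses = {
    "Very Dissatisfied": 7,
    "Somewhat Dissatisfied": 22,
    "Somewhat Satisfied": 20,
    "Very Satisfied": 8,
    "No Answer": 18,
}
missing = {"No Answer"}  # categories without valid information

valid = {r: f for r, f in responses.items() if r not in missing}
n_total = sum(responses.values())  # 75
n_valid = sum(valid.values())      # 57

dissatisfied = valid["Very Dissatisfied"] + valid["Somewhat Dissatisfied"]

pct_of_total = dissatisfied / n_total * 100
pct_of_valid = dissatisfied / n_valid * 100

print(round(pct_of_total, 2))  # 38.67
print(round(pct_of_valid, 2))  # 50.88
```

The count of dissatisfied respondents is the same in both cases; only the denominator changes, and that alone moves the answer across the 50% line.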
The following table contains the total number of deaths worldwide as a result of earthquakes for the period from 2000 to 2012.
| Year | Total Number of Deaths |
|---|---|
| 2000 | 231 |
| 2001 | 21,357 |
| 2002 | 11,685 |
| 2003 | 33,819 |
| 2004 | 228,802 |
| 2005 | 88,003 |
| 2006 | 6,605 |
| 2007 | 712 |
| 2008 | 88,011 |
| 2009 | 1,790 |
| 2010 | 320,120 |
| 2011 | 21,953 |
| 2012 | 768 |
| Total (N) | 823,856 |
Answer the following questions:
Graphical Techniques
Descriptive tools to display the distribution of variables, or the association between variables, using pictures or figures.


Categories do not necessarily need to add up to 100%



What does this chart imply about divorce rates?

Note the difference in the X and Y axes.


Graphic Presented by Congressman Jason Chaffetz (R-UT) During Meeting About Planned Parenthood Funding. September 29, 2015




If you need help sourcing bad data visualizations, try the subreddit r/dataisugly
If you need help sourcing good data visualizations, try the subreddit r/dataisbeautiful