Visualizing Labor Market Trends

Ten years after the onset of the Great Recession, the national labor market is humming, with the official unemployment rate hovering around 4 percent and wage growth slowly accelerating. However, the rosy national figures mask important variation in labor market trends across industries, occupations, demographic groups, and geographic areas. This variation should matter to policymakers who are concerned about the extent to which different sectors and groups have actually recovered, and the degree to which economic growth is broadly shared.

These visualizations aim to examine how the recession impacted these different groups as well as the extent to which the recovery is proceeding (or not) by looking at the pattern of job losses and job gains during and after the recession. I use data from a few different U.S. Bureau of Labor Statistics surveys, made easily accessible thanks to the excellent blscrapeR package.

How did the recession affect different industries?

One of the immediate causes of the recession was a crash in the U.S. housing and financial sectors due to mass defaults on sub-prime home mortgages that had been collateralized and sold as “safe” assets. The economic pain soon spread well beyond these two industries, but there was substantial variation in how different industries experience the recession and how long they took to recover. The chart below illustrates that the construction sector was hit much harder than most, while education and health services continued to grow jobs even as the nation overall shed them. Meanwhile, mining and logging followed an entirely different trajectory - likely in response to the shale boom.

industry_colors <- c("Education and health services" = lt_green, 
                     "Mining and logging" = lt_orange,
                     "Construction" = lt_pink,
                     "All industries" = lt_blue, 
                     "Other industries" = "gray50")

source("scripts/set_theme.R")

# plot indexed employment in selected industries
ggplot(index_naics2, aes(x = date, y = index_jobs * 100, group = industry_name,
                          color = selected_industry, alpha = selected)) +
  geom_rect(aes(xmin = start08, xmax = end08,
                ymin = 60, ymax = 130),
            fill = "gray85", color = "gray85", alpha = 0.75) +
  geom_path(size = 0.75) +
  scale_color_manual(values = industry_colors, name = "") +
  scale_alpha(range = c(0.5, 1), guide = "none") +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y",
               limits = c(min(index_naics2$date), max(index_naics2$date))) + 
  lt_theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = c(0.48, .1),
        legend.direction = "horizontal",
        legend.text = element_text(size = rel(1.05)),
        axis.text = element_text(size =  rel(1.05)),
        axis.title = element_text(size =  rel(1.1))) +
  labs(title = "Education and health services boomed while construction busted and mining and logging fluctuated",
       subtitle = "Employment by industry, indexed to January 2007",
       x = "Year", y = "Indexed employment (Jan. 2007 = 100)",
       caption = "Source: Bureau of Labor Statistics Current Employment Statistics")

Which industries have grown the fastest since the recession?

It wasn’t until April of 2014 that the economy returned to its pre-recession employment level. But I wanted to know whether the recession might have caused changes in the economy’s composition; that is, what kind of jobs have we grown? The chart below shows that job growth was uneven across sectors. Worryingly, employment in sectors where average hourly earnings are below the figure for all private jobs grew faster than employment in higher-earnings sectors. For example, average earnings in the fast-growing leisure and hospitality sector were $16 per hour, compared with close to $30 per hour in higher-paying but still-recovering construction sector. In fact, the only higher-earning industry that saw a larger percentage increase than the economy as a whole was professional services.

source("scripts/set_theme.R")

ggplot(filter(industry_growth, !is.na(industry_code)),
       aes(x = reorder(industry_name, pct_change),
           y = pct_change,
           fill = earncat)) +
  geom_col() +
  scale_fill_manual(values = c(lt_blue, lt_pink)) +
  geom_text(aes(label = round(pct_change * 100, 1)), hjust = 0.5, 
            vjust = 1, nudge_y = -0.005, family = "Cabin") +
  scale_y_continuous(labels = scales::percent) +
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 15)) +
  facet_grid(cols = vars(earncat), scales = "free", space = "free") +
  geom_hline(aes(yintercept = nonfarm_growth), color = "gray60",
             linetype = "dashed") +
  geom_text(data = ann_text, label = "Growth rate across all industries",
            hjust = "left", nudge_x = -0.5, nudge_y = 0.005, family = "Cabin") +
  labs(title = "The fastest-growing industries have had lower average earnings",
       subtitle = "Growth rate (%) in employment by industry, 2007-2018",
       x = "Industry", y = "Growth rate (% change) in employment",
       caption = "Source: Bureau of Labor Statistics Current Employment Statistics
       Note: High-and low-earning categories based on comparing an industry's average hourly earnings to average hourly earnings for all private jobs.")+
  lt_theme(axis.text.x = element_text(angle = 45, hjust = 1, size = rel(1.1)),
           axis.title = element_text(size = rel(1.1)),
           axis.ticks.x = element_blank(), 
           legend.position = "none",
           panel.grid.major = element_blank(),
           panel.grid.minor = element_blank())

How were different parts of the country affected?

The chart below shows that the recession did not treat all states equally. Even when the national unemployment rate peaked at 10% in October 2009, the Dakotas and Nebraska experienced unemployment rates between 3 and 5 percent, likely due to the shale boom. Meanwhile, states like Michigan, Nevada, Alabama, and California - all of which were hit hard by the foreclosure crisis - saw their unemployment rates hang near 12%.

load("plot_data/state_unemployment_dot_plot.Rdata")
source("scripts/set_theme.R")


ggplot(filter(stdata, period == "M10", year %in% c(2009)), 
       aes(x = reorder(state_abb, value), y = value)) +
  geom_point(aes(size = lf), color = lt_blue) +
  scale_size_continuous(breaks = c(0, 5000000, 10000000, 15000000),
                        labels = c("Under 5M", "5 to 10M", "10 to 15M", "15+ M"),
                        limits = c(0,20000000),
                        name = "Labor Force Size") +
  scale_y_continuous(labels = scales::percent) +
  geom_hline(yintercept = 0.1, color = "gray70", linetype = "longdash" ) +
  labs(title = "States saw widely different unemployment rates in the depth of the recession",
       subtitle = "Unemployment rate and state labor force size (millions), October 2009",
       x = "State", y = "Unemployment Rate",
       caption = "Source: Bureau of Labor Statistics Current Population Survey.
       Note: Peak national unemployment occurred in October 2009.") +
  lt_theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.ticks.x = element_blank(), legend.position = "top",
        panel.grid.major = element_line(color = "gray75")) +
  annotate(geom = "text", x = 10, y = .105, label = "National unemployment rate: 10%",
           size = 4, family = "Cabin") +
  annotate(geom = "text", x = 13, y = .038, label = "The Dakotas benefited from the shale boom",
           family = "Cabin", size = 3.5) +
  annotate(geom = "text", x = 41, y = .131, label = "Michigan's auto industry was hit hard",
           family = "Cabin", size = 3.5)

The map below explores this geographic variation further: for each U.S. county, it shows the number of quarters between 2007 and 2017 where that county’s unemployment rate was higher than the national rate. The relative strength of labor markets in the middle of the country is immediately evident. However, another trend that wasn’t evident in the state-based figure above emerges: metropolitan areas had high unemployment rates for substantially shorter periods than the surrounding rural areas. Interestingly, this pattern seems to be strongest in the Southeast and weakest on the Pacific Coast (the tech boom notwithstanding).

source("scripts/set_theme.R")

# grab county population and geometry
# ctpop <- get_acs(geography = "county", variable = "B01001_001",
#                  geometry = TRUE, shift_geo = TRUE) %>% 
#   dplyr::rename(stcofips = GEOID, pop = estimate) %>% 
#   filter(!stcofips %in% c("72")) %>% select(-moe) 
load("plot_data/ctpop.Rdata")

# read in the CBSA shapefile, transform projection, grab metros
cbsa <- suppressMessages(st_read("shp/cbsa_2017.geojson"))  %>%
  st_transform(st_crs(ctpop)$proj4string) %>%
  filter(LSAD == "M1", !str_detect(NAME, "PR"),
         !str_detect(NAME, "AK"), !str_detect(NAME, "HI"))

# grab relevant counties from AK and HI
ak_cty <- filter(ctpop, substr(stcofips, 1, 2) == "02") 
anchorage <- st_union(ak_cty[ak_cty$stcofips=="02020",],
                        ak_cty[ak_cty$stcofips=="02170",])
fairbanks <- ak_cty[ak_cty$stcofips == "02090", ]

hi_cty <- filter(ctpop, substr(stcofips, 1, 2) == "15") 
kahului <- st_union(hi_cty[hi_cty$stcofips == "15009", ],
                    hi_cty[hi_cty$stcofips == "15005", ])
honolulu <- hi_cty[hi_cty$stcofips == "15003",]

cps_codes <- fread("https://download.bls.gov/pub/time.series/ln/ln.series")

# grab code for quarterly unemployment rate
cps_query <- filter(cps_codes, periodicity_code == "Q",
                    substr(series_id, 4, 11) == "14000000",
                    sexs_code == 0)


natl_data <- suppressMessages(bls_api(cps_query$series_id, startyear = 2007, endyear = 2018,
                     registrationKey = blskey, annualaverage = TRUE))  %>% 
  dplyr::rename(series_id = seriesID) %>%
  mutate(quarter = paste0(year, "-Q", substr(period, 3, 3))) %>% 
  select(quarter, value) %>% dplyr::rename(natl_urate = value)

load("raw/cturate_data.Rdata")

# determine month of highest unemployments
cturate <- mutate(cturate_data, stfips = substr(series_id, 6, 7),
                  stcofips = substr(series_id, 6, 10),
                  month = paste0(year, "-", period),
                  quarter = case_when(
                    period %in% c("M01", "M02", "M03") ~ paste0(year, "-", "Q1"),
                    period %in% c("M04", "M05", "M06") ~ paste0(year, "-", "Q2"),
                    period %in% c("M07", "M08", "M09") ~ paste0(year, "-", "Q3"),
                    period %in% c("M10", "M11", "M12") ~ paste0(year, "-", "Q4")
                  )) %>% 
  dplyr::rename(urate = value) %>% 
  group_by(stcofips) %>% 
  mutate(max_urate = max(urate),
         is_max = ifelse(urate == max_urate, 1, 0))

annual <- cturate %>% filter(period == "M13") %>%
  dplyr::rename(ann_urate = urate) %>% 
  select(year, stcofips, ann_urate)

adj_factor <- cturate %>% filter(period != "M13", year < 2018) %>% 
  left_join(annual, by = c("stcofips", "year")) %>% 
  mutate(ann_delta = urate - ann_urate) %>% 
  group_by(period) %>% summarize(month_adj = mean(ann_delta))

adj_cturate <- cturate %>% filter(period != "M13", year < 2018) %>% 
  left_join(adj_factor, by = "period") %>% 
  mutate(adj_urate = urate - month_adj) 

rm(cturate)

qt_above <- adj_cturate %>% group_by(stcofips, quarter) %>%
  summarize(adj_urate = mean(adj_urate)) %>% ungroup() %>%
  group_by(stcofips) %>% 
  left_join(natl_data, by = "quarter") %>% 
  mutate(above_natl = ifelse(adj_urate > natl_urate, 1, 0),
         urate_delta = adj_urate - natl_urate,
         delta_quintile = ntile(urate_delta, 5),
         delta_ratio = adj_urate / natl_urate) %>%
  group_by(stcofips) %>% 
  summarize(qt_over_natl = sum(above_natl)) %>% ungroup() %>% 
  mutate(above_quintile = ntile(qt_over_natl, 5))

natl_map <- ctpop %>% inner_join(qt_above, by = "stcofips") %>% 
  mutate(qt_cat = case_when(
    qt_over_natl == 0 ~ 1,
    qt_over_natl < 20 ~ 2,
    qt_over_natl < 40 ~ 3,
    qt_over_natl >= 40 ~ 4
  ), pop_alpha = case_when(
    pop < 100000 ~ "A",
    pop < 1000000 ~ "B",
    pop >= 1000000 ~ "C" 
  ), color_cat = paste0(qt_cat, pop_alpha))

# construct a bivariate color matrix:
# I want it to run from lt_yellow to lt_blue (4 categories),
# and from low to high intensity (3 categories)
color_matrix <- data.frame("colors" = c("#FFD81680", "#FFD816BF", "#FFD816FF",
                                        "#BFA21080", "#BFA210BF", "#BFA210FF",
                                        "#217FBE80", "#217FBEBF", "#217FBEFF",
                                        "#3C657E80", "#3C657EBF", "#3C657EFF"),
                           "xaxis" = rep(c("A", "B", "C"), 4),
                           "yaxis" = matrix(sapply(seq(1, 4), function(x) rep(x, 3)), ncol = 1)) %>% 
  mutate(cat = paste0(yaxis, xaxis))

map_legend <- ggplot(color_matrix, aes(x = xaxis, y = yaxis)) +
  geom_tile(fill = color_matrix$colors, color = "gray50") +
  scale_x_discrete(labels = c("Under 100,000", "100,000 to 1,000,000", "1M or more")) +
  scale_y_continuous(labels = c("0", "1-19", "20-39", "40+"),
                     breaks = seq(1, 4)) +
  labs(title = "Quarters of high unemployment by population", 
       x = "Population", y = "Quarters") +
  lt_theme(panel.grid.major = element_blank(),
           plot.title = element_text(face = "bold"))


metro <- data.frame("x" = 0.5, "y" = 1.5) %>% 
  ggplot(aes(x, y)) +
  geom_tile(fill = NA, color = "gray50", lwd = 1) +
  scale_x_continuous(limits = c(0, 2)) +
  scale_y_continuous(limits = c(0, 2)) +
  labs(title = "Metropolitan areas") +
  lt_theme(axis.text = element_blank(), 
           panel.grid.major = element_blank(),
           axis.title = element_blank(), 
           plot.title = element_text(size = rel(1.1)))


# create the map itself
mainmap <- ggplot() +
  geom_sf(data = natl_map, aes(fill = color_cat), lwd = 0) + 
  geom_sf(data = cbsa, color = "gray50", fill = NA, lwd = .20) +
  geom_sf(data = anchorage, color = "gray50", fill = NA, lwd = .20) +
  geom_sf(data = fairbanks, color = "gray50", fill = NA, lwd = .20) +
  geom_sf(data = honolulu, color = "gray50", fill = NA, lwd = .20) +
  geom_sf(data = kahului, color = "gray50", fill = NA, lwd = .20) +
  scale_fill_manual(values = c("1A" = "#FFD81680", "1B" = "#FFD816BF",
                               "1C" = "#FFD816FF", "2A" = "#BFA21080",
                               "2B" = "#BFA210BF", "2C" = "#BFA210FF",
                               "3A" = "#217FBE80", "3B" = "#217FBEBF",
                               "3C" = "#217FBEFF", "4A" = "#3C657E80",
                               "4B" = "#3C657EBF", "4C" = "#3C657EFF")) +
  coord_sf() +
  labs(title = "The Great Plains states and most metropolitan cores have had shorter spells of high unemployment",
       subtitle = "Quarters where county unemployment rates were higher than the nation's, 2007-2017",
       caption = "Source: Bureau of Labor Statistics
       Note: County unemployment rates were seasonally smoothed.") +
  lt_theme(legend.position = "none",
           axis.title = element_blank(),
           axis.text = element_blank())

grid.arrange(mainmap, metro, map_legend,
             layout_matrix = rbind(c(1, 1, 1, 1, 1, 1),
                                   c(1, 1, 1, 1, 1, 1),
                                   c(1, 1, 1, 1, 1, 1),
                                   c(1, 1, 1, 1, 1, 1),
                                   c(1, 1, 1, 1, 1, 1),
                                   c(1, 1, 1, 1, 1, 1),
                                   c(1, 1, 1, 1, 1, 1),
                                   c(3, 3, 3, 2, NA, NA),
                                   c(3, 3, 3, NA, NA, NA)))

In addition, the “official” dates of the recession didn’t align with what was happening in labor markets. Technically, a recession is defined as two consecutive quarters of negative growth in gross domestic product (GDP); while unemployment rates are strongly correlated with GDP, they are of course not identical. While the Great Recession technically ended at the end of the second quarter of 2009, unemployment continued to rise and stay elevated in some areas. The chart below shows the distribution of quarterly seasonally-adjusted unemployment rates across counties, weighted by population (h/t my classmate Jonathan Tan for the chart concept). Clearly, Americans in some parts of the country were still experiencing very high unemployment rates even years after the crisis had technically ended.

load("raw/cturate_data.Rdata")

# determine month of highest unemployments
cturate <- mutate(cturate_data, stfips = substr(series_id, 6, 7),
                  stcofips = substr(series_id, 6, 10),
                  month = paste0(year, "-", period),
                  quarter = case_when(
                    period %in% c("M01", "M02", "M03") ~ paste0(year, "-", "Q1"),
                    period %in% c("M04", "M05", "M06") ~ paste0(year, "-", "Q2"),
                    period %in% c("M07", "M08", "M09") ~ paste0(year, "-", "Q3"),
                    period %in% c("M10", "M11", "M12") ~ paste0(year, "-", "Q4")
                  )) %>% 
  dplyr::rename(urate = value) %>%  group_by(stcofips) %>% 
  mutate(max_urate = max(urate), is_max = ifelse(urate == max_urate, 1, 0))
rm(cturate_data)


annual <- cturate %>% filter(period == "M13") %>% dplyr::rename(ann_urate = urate) %>% 
  select(year, stcofips, ann_urate)

adj_factor <- cturate %>% filter(period != "M13", year < 2018) %>% 
  left_join(annual, by = c("stcofips", "year")) %>% 
  mutate(ann_delta = urate - ann_urate) %>% 
  group_by(period) %>% summarize(month_adj = mean(ann_delta))

adj_cturate <- cturate %>% filter(period != "M13", year < 2018) %>% 
  left_join(adj_factor, by = "period") %>% 
  mutate(adj_urate = urate - month_adj)

by_quarter <- adj_cturate %>% group_by(stcofips, quarter) %>%
  summarize(adj_urate = mean(adj_urate)) %>% ungroup() %>%
  group_by(stcofips) %>% 
  mutate(max_urate = max(adj_urate),
         is_max = ifelse(adj_urate == max_urate, 1, 0))

adj_cturate %<>% left_join(select(ctpop, stcofips, pop), by = "stcofips")  

recession_quarters <- unlist(Map(function(x, y) paste0(x, "-Q", y),
                          c(rep(2008, 4), rep(2009, 3)),
                          c(seq(1, 4), seq(1, 3))))

qbreaks <- unlist(Map(function(x, y) paste0(x, "-Q", y),
                          sapply(seq(2007, 2013), function(x) rep(x, 2)),
                          c(1, 3)))

plotdata <- adj_cturate %>% filter(!is.na(pop), between(year, 2007, 2013)) %>% 
  mutate(quarter = as.factor(quarter),
         recession = ifelse(quarter %in% recession_quarters, "1", "0")) 

plotdata %>% 
ggplot(aes(x = adj_urate/100,
           y = fct_rev(quarter),
           height = ..density..,
           weight = pop,
           fill = recession)) +
  geom_density_ridges(stat = "density", color = "white") +
  scale_x_continuous(limits = c(0, .20), labels = scales::percent) +
  scale_fill_manual(values = c("1" = lt_pink, "0" = lt_blue)) +
  scale_y_discrete(breaks = qbreaks) +
  labs(title = "Unemployment stayed high in many counties well after the end of the official recession",
       subtitle = "Population-weighted distribution of county unemployment rates",
       x = "Unemployment rate (%)", y = "Quarter",
       caption = "Source: Bureau of Labor Statistics \nNote: County unemployment rates were seasonally smoothed.") +
  lt_theme(panel.grid.major.y = element_blank(),
           panel.grid.minor.x = element_line(color = "gray75", size = rel(0.65),
                                  linetype = "dotted"),
           legend.position = "none") +
  annotate(geom = "text", x = 0.00, y = "2008-Q4", label = "Recession", family = "Cabin") +
  annotate(geom = "errorbar", x = 0.01, ymin = "2008-Q1", ymax = "2009-Q3",
           color = "gray75", width = 0.01/10)

How did the recession affect longer-term labor market disparities?

Economists sometimes categorize labor market trends as either cyclical or structural to denote whether fluctuations are primarily driven by the economic cycle or whether they reflect deeper changes in the labor market. Unfortunately, racial disparities in unemployment are both cyclical and structural, as the chart below illustrates. The unemployment rate among black Americans has been higher than the unemployment rate among white Americans for as long as the Current Population Survey has measured it (the white unemployment rate is available back to 1954, but the black unemployment rate is only available after 1971). That gap only widens during recessions. However, the current disparity is the smallest it has ever been.

load("plot_data/black_white_unemployment_gap.Rdata")
source("scripts/set_theme.R")

ggplot(raceth_unemp_yr, aes(x = year)) +
  geom_linerange(aes(ymin = white/100, ymax = black/100, alpha = bw_delta),
                 size = 1.75, color = lt_orange) +
  geom_point(aes(y = white/100), color = lt_orange, size = 2) + 
  geom_point(aes(y = black/100), color = lt_orange, size = 2) +
  scale_x_continuous(breaks = seq(1970, 2020, 5)) +
  scale_y_continuous(labels = scales::percent) +
  scale_alpha_continuous(name = "Black-white\ngap (pp)") +
  labs(title = "Black unemployment has always been higher than white unemployment
  but the gap is smaller than ever",
       subtitle = "Average annual unemployment rate by race, 1971-2018",
       x = "Year", y = "Average annual unemployment rate",
       caption = "Source: Bureau of Labor Statistics Current Population Survey") +
  annotate(geom = "linerange", x = 1998, ymin = 0.18, ymax = 0.195, 
           color = "orange") +
  annotate(geom = "point", x = 1998, y = 0.18, color = "orange") +
  annotate(geom = "point", x = 1998, y = 0.195, color = "orange") +
  annotate(geom = "text", x = 2004.5, y = 0.195, label = "Black unemployment",
           size = 4, family = "Cabin") +
  annotate(geom = "text", x = 2004.5, y = 0.18, label = "White unemployment",
           size = 4, family = "Cabin") +
  lt_theme(legend.position = c(0.95, 0.8))

The educational profile of the unemployed is slowly changing

A few decades ago, a high school diploma - let alone a college degree - used to be sufficient education to guarantee reasonably steady, sufficiently well-paid employment. Over time, however, the workforce has become more educated, and as a result, the unemployed population contains more post-secondary degree-holders even when the labor market is strong. The chart below shows educational attainment among the unemployed at three such moments of relatively low unemployment.

#===============================================================================#
# MAKE WAFFLES, FROM SCRATCH
#===============================================================================#

load("plot_data/unemployment_education.Rdata")
source("scripts/set_theme.R")
 
# create an array of values to plot for a single year
gridfn <- function(df, blockval, per_row) {
  
  scaled_df <- df %>% mutate(value = value / blockval)
  total <- sum(scaled_df$value)
  
  grid <- expand.grid(x = seq(1, per_row), y = seq(1, ceiling(total / per_row)))
  z <- unlist(sapply(unique(scaled_df$names),
                     function(x) rep(x, scaled_df$value[scaled_df$names == x])))
  grid$z <- c(z, rep(NA, nrow(grid) - length(z)))
  return(grid)
}

grid_more_yrs <- function(df, years, blockval, per_row) {
  grids <- data.frame(x = c(0), y = c(0), z = c(""))
  for(yr in years){
    g <- filter(df, year == yr) %>% ungroup() %>% select(names, value) %>% 
      gridfn(blockval = blockval, per_row = per_row) %>% mutate(year = yr)
    grids <- bind_rows(grids, data.frame(g))
  }
  grids %<>% filter(x > 0)
  return(grids)
}

grids <- grid_more_yrs(unemp_edu, years = c(1997, 2007, 2017), 
                       blockval = 100, per_row = 4) %>% 
  mutate(x = x + (year - 1997) / 2,
         order = case_when(
           z == "Less than high school" ~ 1,
           z == "High school graduate" ~ 2,
           z == "Some college or associate's degree " ~ 3,
           z == "Bachelor's degree or higher" ~ 4
         ))

colors <- c("Less than high school" = lt_blue,
            "High school graduate" = lt_pink,
            "Some college or associate's degree" = lt_green,
            "Bachelor's degree or higher" = lt_orange)

ggplot(grids, aes(x = x, y = y, fill = reorder(z, order))) +
  geom_tile(color = "white") +
  scale_fill_manual(values = colors, na.value = "white") +
  scale_x_continuous(breaks = c(2.5, 7.5, 12.5), labels = c(1997, 2007, 2017),
                     limits = c(0, 20)) +
  coord_equal() +
  labs(title = "Today's unemployed population includes more individuals with at least
  some college education",
       subtitle = "Unemployed population by educational attainment near three business cycle peaks",
  caption = "Source: Bureau of Labor Statistics Current Population Survey") +
  annotate(geom = "tile", x = 1, y = 13, fill = "gray50") +
  annotate(geom = "text", x = 4, y = 13, label = "Represents \n100,000 people",
           size = 3.5, family = "Cabin") +
  annotate(geom = "text", x = 17, y = 1, label = "Less than\nhigh school", 
           size = 3.5, family = "Cabin", color = lt_blue) +
  annotate(geom = "text", x = 17, y = 4, label = "High school\ngraduate",
           size = 3.5, family = "Cabin", color = lt_pink) +
  annotate(geom = "text", x = 17, y = 8, label = "Some college or\nassociate's degree",
           size = 3.5, family = "Cabin", color = lt_green) +
  annotate(geom = "text", x = 17, y = 11, label = "Bachelor's degree\n or higher", 
           size = 3.5, family = "Cabin", color = lt_orange) +
  lt_theme(axis.title = element_blank(), axis.text.y = element_blank(),
           legend.position = "none", panel.grid.major = element_blank(),
           panel.grid.minor = element_blank(),
           axis.text.x = element_text(face = "bold", size = rel(1.25)))

The labor market probably has more slack than first meets the eye

In order to be considered officially unemployed, an individual must be out of work, available to work, and have actively looked for work within the last month. However, this measure understates the total slack in the labor market because it does not account for “marginally attached” workers, who don’t have work and want it but have only searched for a job within the last year, or “discouraged” workers, who want a job but have given up searching. In addition, the official unemployment number does not include people who are working part-time but would prefer full-time work. The chart below compares the official measure to these alternative measures over time, showing that the officially unemployed only represent about half of a larger group of individuals who want a full-time job and don’t have one.

series_list <- c("LNU03000000", "LNU05026645", "LNU05026642", "LNU02032194")

# construct query
cps_query <- filter(cps_codes, series_id %in% series_list)

year_intervals <- sapply(seq(1954, 2018, 19), function(x) seq(x, x + 18, 18))
year_intervals[2,4] <- 2018

raw_data <- map2(year_intervals[1, ] , year_intervals[2, ],
                 function(x, y) suppressMessages(bls_api(cps_query$series_id, startyear = x, endyear = y,
                                        registrationKey = blskey,
                                        annualaverage = TRUE))) %>% bind_rows() %>% 
  dplyr::rename(series_id = seriesID) %>%
  mutate(month = paste0(year, "-", period))

unemp_monthly <- raw_data %>% filter(period != "M13") %>% 
  mutate(names = case_when(
    series_id == "LNU03000000" ~ "Unemployed",
    series_id == "LNU05026645" ~ "Discouraged",
    series_id == "LNU05026642" ~ "Marginally attached",
    series_id == "LNU02032194"  ~ "Part-time for economic reasons"
  ),
  order = case_when(
    names == "Unemployed" ~ 1,
    names == "Discouraged" ~ 2,
    names == "Marginally attached" ~ 3,
    names == "Part-time for economic reasons" ~ 4
  )) %>% ungroup()

monthly_totals <- unemp_monthly %>% select(month, value) %>% 
  group_by(month) %>% summarize(total = sum(value)) #%>% filter(year > 1993) 

unemp_monthly %<>% left_join(monthly_totals, by = "month")

#===============================================================================#
# SMOOTH SEASONAL VARIATION
#===============================================================================#

annual <- raw_data %>% filter(period == "M13") %>% dplyr::rename(ann_value = value) %>% 
  select(year, series_id, ann_value) %>% 
  mutate(names = case_when(
    series_id == "LNU03000000" ~ "Unemployed",
    series_id == "LNU05026645" ~ "Discouraged",
    series_id == "LNU05026642" ~ "Marginally attached",
    series_id == "LNU02032194"  ~ "Part-time for economic reasons"
  ))

adj_factor <- unemp_monthly %>% 
  left_join(annual, by = c("year", "names")) %>% 
  mutate(ann_delta = value - ann_value) %>% 
  group_by(period, names) %>% summarize(month_adj = mean(ann_delta))

adj_levels <- unemp_monthly  %>% 
  left_join(adj_factor, by = c("period", "names")) %>% 
  mutate(adj_value = value - month_adj) 

# save.image("plot_data/different_unemployment_measures.Rdata")

#===============================================================================#
# MAKE A STACKED BAR
#===============================================================================#

underutilized <- adj_levels %>% filter(year > 1993) %>% 
  mutate(month_date = lubridate::ymd(paste0(str_remove(month, "M"), "-01"))) 

ggplot(underutilized, aes(x = month_date, y = adj_value / 1000, group = names)) +
  geom_area(aes(fill = names)) +
  scale_x_date(date_breaks = "2 year", date_labels = "%Y",
               limits = c(min(underutilized$month_date), max(underutilized$month_date))) +
  scale_y_continuous(labels = scales::number) +
  scale_fill_manual(values = c("Discouraged" = lt_green, 
                               "Marginally attached" = lt_pink,
                               "Part-time for economic reasons" = lt_orange,
                               "Unemployed" = lt_blue),
                    name = "") +
  labs(title = "The official unemployment measure doesn't count millions of workers",
       subtitle = "Alternative measures of underutilized workers, in millions, 1994-2018",
       x = "Year", y = "Workers (millions)",
       caption = "Source: Bureau of Labor Statistics Current Population Survey") +
  lt_theme(legend.position = c(0.23, 0.85),
           axis.text.x = element_text(angle = 45),
           legend.text = element_text(size = rel(1.1)),
           panel.grid.minor = element_line(color = "gray85", size = rel(0.75),
                                  linetype = "dotted"),
           panel.grid.major = element_line(color = "white", size = rel(0.75),
                                  linetype = "dotted"),
           panel.ontop = TRUE)

References

Data came from a variety of Bureau of Labor Statistics and Census Bureau surveys, including the Current Population Survey, the American Community Survey, Current Employment Statistics, and Local Area Unemployment Statistics.
I used blscrapeR to download the BLS data and the tidycensus package to download American Community Survey data.
Shapefiles come from the Census Bureau’s TIGER program.
Jonathan Tan’s excellent chart on solar panel cost in the U.S. inspired my county unemployment distribution chart.

The theme for this project is defined below:

lt_theme <- function (...) {
  theme(text = element_text(family = "Cabin Medium"),
  plot.title = element_text(size = rel(1.25), face = "bold"),
  plot.subtitle = element_text(family = "Cabin"),
  plot.caption = element_text(family = "Cabin", face = "italic"),
  panel.background = element_rect(fill = NA), 
  panel.grid.major = element_line(color = "gray75", size = rel(0.75),
                                  linetype = "dotted"),
  panel.grid.minor = element_blank(),
  axis.ticks = element_blank(),
  axis.text = element_text(family = "Cabin SemiBold"),
  legend.background = element_blank(),
  legend.key = element_blank()) +
  theme(...)
  
}