Visualizing COVID cases in Belgium

using R and gganimate

I was wondering how to visualize some of the COVID case load maps I was seeing around the internet and on government sites. Most of the time they use coloured polygons to represent the COVID case load in a particular area.

However, this way of representing the data is not free from bias. In particular the artificial division into towns leads to what is known as the Modifiable Areal Unit Problem. This source of bias orginitates from the fact that small changes in location could tip statistics in favour of a certain conclusion (just because of how lines are drawn on a map). This problem is often leveraged in a political context when Gerrymandering. This shows that maps are idealized representations of a rather messy world, and don’t always reflect this messy state (or can be abused within this context).

In the map below I’ve introduced some randomness to an otherwise rather well defined map of town polygons. I’ve sampled the towns randomly according to the number of COVID cases on a particular day. These locations are not tied to addresses but break the uniformity of the town polygons. I’ve also included a delay, where previous locations remain shown in a grey shade. These can be considered “active” or “hot” cases, representing people who are still contagious. The latter helps define clusters in space.

Below you find the source code of the figure, it allows you to recreate the figure for any given day. The code should be easily adapted for other countries. For Belgium all data is sourced from Sciensano, who govern health related data in Belgium. Instances where there are fewer than 5 cases are set to 0, since this is the most conservative approach.

# read all required libraries
library(sf)
library(geojsonsf)
library(tidyverse)

# read in latest data from sciensano
# the Belgian agency overseeing health stats
cases <- jsonlite::fromJSON("https://epistat.sciensano.be/Data/COVID19BE_CASES_MUNI.json")

# only retain what we need, both in columns
# as in case count (remove <5 values, set to 0)
cases <- cases %>%
  select("DATE","NIS5","CASES","TX_DESCR_NL") %>%
  rename(name = "TX_DESCR_NL",
         date = "DATE",
         cases = "CASES") %>%
  mutate(cases = ifelse(
    cases == "<5",
    0,
    cases)) %>%
  mutate(cases = as.numeric(cases))
  
  
# remove towns with no cases
towns_with_cases <- cases %>%
  group_by(NIS5) %>%
  summarize(contagion = any(cases > 0)) %>%
  filter(contagion)

cases <- cases %>%
  filter(NIS5 %in% towns_with_cases$NIS5)

# read in all Belgian towns
# grabbed from https://github.com/pduchesne/data
download.file(
  "https://raw.githubusercontent.com/pduchesne/data/master/geo/Communes-Gemeenten.geojson",
  file.path(tempdir(), "towns.geojson"))
towns <- geojsonsf::geojson_sf(file.path(tempdir(), "towns.geojson"))

# drop unneeded columns, and rename the "name" column
towns <- towns %>%
  select(-c("description","DESN","name","ISN","ICC","Shape_Area","Shape_Leng")) %>%
  mutate(NIS5 = substr(SHN,4,9)) %>%
  select(-"SHN") %>%
  st_zm() %>%
  drop_na()
  
# Cleanup geometries and reproject
# to the local Belgian Lambert projection
towns <- suppressWarnings(st_buffer(towns, 0.0)) %>%
  st_transform(crs = 31370) # Belgian Lambert 72 projection

# Do a union on the towns to
# define the Belgian border
outline <- st_union(towns)

# Offset this polygon for a drop shadow
# effect later on
shadow <- st_sfc(outline + c(0,-3000),
                 crs = 31370)

# define a rolling mean function
# to be used in the tidy routine
# below
roll_mean <- function(data, max_date){
  df <- tibble(
    date = as.character(
      seq(as.Date("2020/02/23"),
          max_date,
          "days")))
  
  df <- left_join(df, data, by = "date")
  
  # fill in missing row names
  df$NIS5 <- data$NIS5[1]
  df$name <- data$name[1]
  
  # apply rolling mean to cases
  x <- zoo::rollmean(
    zoo::zoo(df$cases, df$date),
    k = 7,
    align = "right",
    fill = NA,
    na.rm = TRUE
  )
  
  df$cases_roll <- suppressWarnings(as.vector(round(x)))
  return(df)
}

# take the rolling mean by town
# start at 24/02 end today
cases <- cases %>%
  group_by(NIS5) %>%
  do(roll_mean(.,Sys.Date()))

# do a full join of the case data
# with the town outlines
cases <- right_join(towns, cases) %>%
  mutate(date = as.factor(date)) %>%
  mutate(
    cases = ifelse(is.na(cases),0,cases),
    cases_roll = ifelse(is.na(cases_roll),0,cases_roll)
    ) %>%
  drop_na()

# select random points by date and polygon
cases <- cases %>%
  group_by(NIS5, date) %>%
  mutate(geom2 = st_combine(st_sample(geometry, cases_roll)))

st_geometry(cases) <- "geom2"

# create a rolling "sum" of all locations in the past
# 5 days (assuming that people will recover in ~week)
# this shows the persistence of infections in a region

dates <- unique(cases$date)
cases$date <- as.Date(cases$date)
sick <- lapply(dates, function(x){
  sel <- seq(as.Date(x) - 4, as.Date(x), by = "days")
  sel <- cases[which(cases$date %in% sel),]
  geom <- st_combine(st_geometry(sel))
  st_sf(date = x, geom = geom)
})

sick <- do.call("rbind", sick)

# Define a global mapping theme
theme_map <- function(...) {
  theme_minimal() +
    theme(
      text = element_text(
        family = "Ubuntu Thin",
        color = "#222222"),
      axis.line = element_blank(),
      axis.text.x = element_blank(),
      axis.text.y = element_blank(),
      axis.ticks = element_blank(),
      axis.title.x = element_blank(),
      axis.title.y = element_blank(),
      panel.grid.major = element_blank(),
      plot.background = element_rect(
        fill = "#FFFFFF",
        color = NA),
      panel.background = element_rect(
        fill = "#FFFFFF",
        color = "#FFFFFF"),
      legend.direction = "vertical",
      legend.position = c(.2, .3),
      ...
    ) 
}

# set colours
cols = c("new cases" = "orange",
         "old cases (past 5 days)" = "#C3C3C3")

# use the point geometry
# not the polygon one
# switch using st_geometry()
st_geometry(cases) <- "geom2"

# create ggplot based
# animation setup
covid_map <- ggplot() +
  geom_sf( # Belgium outline
    data = shadow,
    fill = "#C3C3C3",
    col = NA,
    lwd = 0) +
  geom_sf( # Belgium outline
    data = outline,
    fill = "#222222",
    col = NA,
    lwd = 0) +
  geom_sf(data = sick, # past random dots
          aes(color = "old cases (past 5 days)"),
          cex = 0.1,
          show.legend = "point") +
  geom_sf( # the random dots
    data = cases,
    aes(color = "new cases"),
    cex = 0.1,
    show.legend = "point") +
  scale_color_manual(
    name = "",
    values = cols,
    guide = guide_legend(
      override.aes = list(shape = c(20, 20),
                          cex = 3))
    ) +
  labs(
    caption = "graphics & analysis by @koen_hufkens for bluegreen.ai"
  ) +
  theme_map() +
  transition_manual(
    date
  )

anim <- covid_map +
  ggtitle("Belgian COVID cases",
          subtitle = "visualized by random sampling\n on {current_frame}")

animate(
  anim,
  width = 500,
  height = 500,
  fps = 2
)

anim_save(filename = "belgium_covid.gif")

Avatar
Koen Hufkens, PhD
Founder, Research Associate

As an earth system scientist and ecologist I model ecosystem processes.

Next
Previous