Getting the data

A couple of months ago I discovered a package developed by Public Health England that makes it easier to extract the data behind their fingertips website (https://fingertips.phe.org.uk/). I wanted to try it and as I like maps I decided to download some data at CCG level and make an interactive choropleth map using the leaflet package. I use select_indicators to browse the indicators and pick one at random that seems to have CCG level data. Sometimes the page will not load so be aware that you might have to try several times.

inds <- select_indicators()
df <- fingertips_data(IndicatorID = inds,
                        AreaTypeID = 152)
glimpse(df)
## Observations: 6,882
## Variables: 24
## $ IndicatorID                                   <int> 92165, 92165, 92...
## $ IndicatorName                                 <fctr> Total number of...
## $ ParentCode                                    <fctr> NA, E92000001, ...
## $ ParentName                                    <fctr> NA, England, En...
## $ AreaCode                                      <fctr> E92000001, E390...
## $ AreaName                                      <fctr> England, London...
## $ AreaType                                      <fctr> Country, Sub-re...
## $ Sex                                           <fctr> Persons, Person...
## $ Age                                           <fctr> All ages, All a...
## $ CategoryType                                  <fctr> NA, NA, NA, NA,...
## $ Category                                      <fctr> NA, NA, NA, NA,...
## $ Timeperiod                                    <fctr> 2010 Q1, 2010 Q...
## $ Value                                         <dbl> 167.1858, 138.07...
## $ LowerCI95.0limit                              <dbl> NA, NA, NA, NA, ...
## $ UpperCI95.0limit                              <dbl> NA, NA, NA, NA, ...
## $ LowerCI99.8limit                              <dbl> NA, NA, NA, NA, ...
## $ UpperCI99.8limit                              <dbl> NA, NA, NA, NA, ...
## $ Count                                         <dbl> 9080969.0, 11790...
## $ Denominator                                   <dbl> 54316618, 853868...
## $ Valuenote                                     <fctr> NA, NA, NA, NA,...
## $ RecentTrend                                   <fctr> NA, NA, NA, NA,...
## $ ComparedtoEnglandvalueorpercentiles           <fctr> Not compared, N...
## $ Comparedtosubnationalparentvalueorpercentiles <fctr> Not compared, N...
## $ TimeperiodSortable                            <int> 20100100, 201001...

Cleaning the data

From a first look at the data I can see that there are a lot of variables with only one value or only missing values. I drop these variables as they add little value.

df <- df %>% 
  select_if(function(x) !all(is.na(x))) %>% # Drop columns with all missing values
  select_if(function(x) !(is.factor(x) && nlevels(x)==1)) # Drop factors with only one level

I don’t like CamelCase so I use the to_snake_case function from the snakecase package to change the names to a format I like (I also really wanted to try this package).

df_names <- names(df)
snake_names <- to_snake_case(df_names)
names(df)<- snake_names
names(df)
##  [1] "indicator_id"        "parent_code"         "parent_name"        
##  [4] "area_code"           "area_name"           "area_type"          
##  [7] "timeperiod"          "value"               "count"              
## [10] "denominator"         "valuenote"           "recent_trend"       
## [13] "timeperiod_sortable"

I now have a smaller data frame with variable names that I like but there is still some data I will not use. I only care about the CCG so I filter my data based on area_type. At some point I want to create a map with time slider to show changes over time but for now I start with only one time period.

df_small <- df %>% 
  filter(area_type=="CCGs (since 4/2017)") %>% 
  filter(timeperiod=="2017 Q2") %>% 
  droplevels()

Getting a shapefile

I need a shapefile to create a map so I just googled CCG shapefile and found one on data.gov.uk. I ended up having to simplify the spatial dataframe and create a new spatial object with the data in as the very large file size made the map massive and slow.

setwd("/Users/emmavestesson/Documents/R folder/Fingertips/Data/Clinical_Commissioning_Groups_April_2017_Full_Clipped_Boundaries_in_England_V4")
shape_temp <- readOGR(dsn = "clinical_Commissioning_Groups_April_2017_Full_Clipped_Boundaries_in_England_V4.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "clinical_Commissioning_Groups_April_2017_Full_Clipped_Boundaries_in_England_V4.shp", layer: "clinical_Commissioning_Groups_April_2017_Full_Clipped_Boundaries_in_England_V4"
## with 207 features
## It has 9 fields
## Integer64 fields read as strings:  objectid bng_e bng_n
shape2 <- gSimplify(shape_temp, tol=0.01, topologyPreserve=TRUE) # simplify shapefile
shape <- SpatialPolygonsDataFrame(shape2, data=shape_temp@data) # add data object 

I look at the shape data and the PHE data to identify which variables to merge on and then merge the PHE data to the shape data.

## Observations: 207
## Variables: 9
## $ objectid   <fctr> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,...
## $ ccg17cd    <fctr> E38000001, E38000002, E38000003, E38000004, E38000...
## $ ccg17nm    <fctr> NHS Airedale, Wharfedale and Craven CCG, NHS Ashfo...
## $ bng_e      <fctr> 393990, 597640, 481343, 547758, 523471, 429979, 56...
## $ bng_n      <fctr> 462191, 140644, 219317, 185109, 191753, 403330, 19...
## $ long       <dbl> -2.093300, 0.823374, -0.820020, 0.129493, -0.218220...
## $ lat        <dbl> 54.05565, 51.13096, 51.86651, 51.54553, 51.61108, 5...
## $ st_areasha <dbl> 1220633023, 580617232, 979028121, 36091040, 8673643...
## $ st_lengths <dbl> 221998.42, 149424.22, 278005.39, 40663.92, 50837.78...
## Observations: 207
## Variables: 13
## $ indicator_id        <int> 92165, 92165, 92165, 92165, 92165, 92165, ...
## $ parent_code         <fctr> E39000029, E39000035, E39000034, E3900001...
## $ parent_name         <fctr> Yorkshire and Humber NHS region, South Ea...
## $ area_code           <fctr> E38000001, E38000002, E38000003, E3800000...
## $ area_name           <fctr> NHS Airedale, Wharfdale And Craven CCG, N...
## $ area_type           <fctr> CCGs (since 4/2017), CCGs (since 4/2017),...
## $ timeperiod          <fctr> 2017 Q2, 2017 Q2, 2017 Q2, 2017 Q2, 2017 ...
## $ value               <dbl> 138.0813, 213.9027, 140.2690, 124.4213, 11...
## $ count               <dbl> 21899, 26371, 28469, 24672, 41658, 38176, ...
## $ denominator         <dbl> 158595, 123285, 202960, 198294, 374915, 23...
## $ valuenote           <fctr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ recent_trend        <fctr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ timeperiod_sortable <int> 20170200, 20170200, 20170200, 20170200, 20...

I now have everything I need to create the map. I add the variable name and the value I am mapping to a label as well as a legend. Pay no attention to the breaks used for colour fill, I just wanted to try the function.

pal <- colorNumeric("magma", shape@data$count)

leaflet(shape, options= leafletOptions( minZoom=6, maxZoom=12) )%>% 
  addProviderTiles(providers$OpenStreetMap.Mapnik) %>%
  addPolygons(stroke=TRUE, weight=1, color="black", 
              fillColor = ~pal(count), fillOpacity=0.5,  
              label = ~paste0(area_name, ": ", prettyNum(count, big.mark = ",", format="f")),
              highlight = highlightOptions(
                weight = 5,
                color = "#666",
                dashArray = "",
                fillOpacity = 0.7,
                bringToFront = TRUE)) %>%
  addLegend("topleft", pal = pal, values = ~count,
            title = "Prescribed antibiotic items <br> per 1000 residents ",
            labFormat = labelFormat(prefix = ""),
            opacity = 1
   )

Next step

I want to add a timeslider to display changes over time but I decided this was a good place to stop seeing how this is my first blog post.

Packages I used

library(sp)
library(leaflet)
library(fingertipsR)
library(tidyverse)
library(snakecase)
library(rgdal)
library(rgeos)