Tidy Tuesday

I have seen some cool graphs on Twitter created for Tidy Tuesday. I wanted to join in on the fun, so I downloaded the data from week 3 and started playing. The data are from Our World in Data, and I downloaded the data file from GitHub.

mortality <- readxl::read_excel(here::here("global_mortality.xlsx"))
glimpse(mortality)

## Observations: 6,156
## Variables: 35
## $ country                                    <chr> "Afghanistan", "Afg...
## $ country_code                               <chr> "AFG", "AFG", "AFG"...
## $ year                                       <dbl> 1990, 1991, 1992, 1...
## $ `Cardiovascular diseases (%)`              <dbl> 17.61040, 17.80181,...
## $ `Cancers (%)`                              <dbl> 4.025975, 4.054145,...
## $ `Respiratory diseases (%)`                 <dbl> 2.106626, 2.134176,...
## $ `Diabetes (%)`                             <dbl> 3.832555, 3.822228,...
## $ `Dementia (%)`                             <dbl> 0.5314287, 0.532497...
## $ `Lower respiratory infections (%)`         <dbl> 10.886362, 10.35696...
## $ `Neonatal deaths (%)`                      <dbl> 9.184653, 8.938897,...
## $ `Diarrheal diseases (%)`                   <dbl> 2.497141, 2.572228,...
## $ `Road accidents (%)`                       <dbl> 3.715944, 3.729142,...
## $ `Liver disease (%)`                        <dbl> 0.8369093, 0.845515...
## $ `Tuberculosis (%)`                         <dbl> 5.877075, 5.891704,...
## $ `Kidney disease (%)`                       <dbl> 1.680611, 1.671115,...
## $ `Digestive diseases (%)`                   <dbl> 1.058771, 1.049322,...
## $ `HIV/AIDS (%)`                             <dbl> 0.01301948, 0.01451...
## $ `Suicide (%)`                              <dbl> 0.4366105, 0.442280...
## $ `Malaria (%)`                              <dbl> 0.4488863, 0.455019...
## $ `Homicide (%)`                             <dbl> 1.287020, 1.290991,...
## $ `Nutritional deficiencies (%)`             <dbl> 0.3505045, 0.343212...
## $ `Meningitis (%)`                           <dbl> 3.037603, 2.903202,...
## $ `Protein-energy malnutrition (%)`          <dbl> 0.3297599, 0.322171...
## $ `Drowning (%)`                             <dbl> 0.9838624, 0.954586...
## $ `Maternal deaths (%)`                      <dbl> 1.769213, 1.749264,...
## $ `Parkinson disease (%)`                    <dbl> 0.02515859, 0.02545...
## $ `Alcohol disorders (%)`                    <dbl> 0.02899828, 0.02917...
## $ `Intestinal infectious diseases (%)`       <dbl> 0.1833303, 0.178107...
## $ `Drug disorders (%)`                       <dbl> 0.04120540, 0.04203...
## $ `Hepatitis (%)`                            <dbl> 0.1387378, 0.135008...
## $ `Fire (%)`                                 <dbl> 0.1741567, 0.170671...
## $ `Heat-related (hot and cold exposure) (%)` <dbl> 0.1378229, 0.134826...
## $ `Natural disasters (%)`                    <dbl> 0.00000000, 0.79760...
## $ `Conflict (%)`                             <dbl> 0.932, 2.044, 2.408...
## $ `Terrorism (%)`                            <dbl> 0.007, 0.040, 0.027...

The variable names contained spaces and special characters such as "(%)", "-" and "/", so I started by tidying them.

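# Convert names to snake_case, then strip or replace the leftover special characters
names(mortality) <- mortality %>% 
  names() %>% 
  to_snake_case() %>% 
  str_remove(pattern = "\\(%\\)") %>%        # drop the "(%)" suffix
  str_replace_all(pattern = "-", "_") %>%    # hyphens -> underscores
  str_replace_all(pattern = "/", "_") %>%    # slashes (e.g. HIV/AIDS) -> underscores
  str_replace_all(pattern = "\\(", "_") %>%  # opening parentheses -> underscores
  str_remove(pattern = "\\)")                # drop the (first) closing parenthesis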
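One possible alternative, sketched here under the assumption of stringr and R 4.1 or later (for the native |> pipe), collapses the chain of replacements into a single regular expression. clean_name() is a hypothetical helper, not part of any package, and it lower-cases the names itself instead of calling to_snake_case().

```r
library(stringr)

# Hypothetical helper: clean column names with one regex pass
clean_name <- function(x) {
  x |>
    str_remove("\\s*\\(%\\)$") |>         # drop a trailing " (%)"
    str_to_lower() |>
    str_replace_all("[^a-z0-9]+", "_") |>  # any run of other characters -> "_"
    str_remove_all("^_|_$")                # trim leading/trailing underscores
}

clean_name(c("Cardiovascular diseases (%)", "HIV/AIDS (%)",
             "Heat-related (hot and cold exposure) (%)"))
# → "cardiovascular_diseases" "hiv_aids" "heat_related_hot_and_cold_exposure"
```

Applied to the data this would be names(mortality) <- clean_name(names(mortality)); the resulting names may differ slightly from the to_snake_case() chain, so any later code that refers to specific columns would need checking.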

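Next, I kept only the rows for the world as a whole and reshaped the data from wide to long format, with one row per year and cause of death.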

world <- mortality %>% 
  filter(country == "World") %>%   # global totals only
  select(-country_code) %>% 
  # wide -> long: one row per year and cause of death
  gather(key = "disease", value = "percent", -(country:year))
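gather() still works but has been superseded by pivot_longer() since tidyr 1.0.0. A minimal sketch of the same wide-to-long reshape, on a made-up two-year table (toy_wide and its values are illustrative only):

```r
library(dplyr)
library(tidyr)

# Made-up toy values in the same wide layout as the mortality data
toy_wide <- tibble(
  country = "World",
  year = c(1990, 1991),
  cardiovascular_diseases = c(1, 2),
  cancers = c(3, 4)
)

# Every column except country and year becomes a (disease, percent) pair
toy_long <- toy_wide |>
  pivot_longer(-c(country, year), names_to = "disease", values_to = "percent")

toy_long
```

Each input row is expanded into one row per disease column, so the two-row table becomes four rows.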

There is a lot of data, so I decided to plot a time series for the five causes of death that accounted for the largest share of global deaths in 2016.

top5_world <- world %>% 
  filter(year==2016) %>% 
  arrange(desc(percent)) %>% 
  top_n(5)

## Selecting by percent
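top_n() has since been superseded by slice_max() in dplyr 1.0.0. Naming the ranking column explicitly also avoids the "Selecting by percent" message. A sketch on made-up values (toy and its numbers are illustrative only):

```r
library(dplyr)

# Made-up shares for six causes in a single year
toy <- tibble(
  disease = c("a", "b", "c", "d", "e", "f"),
  year = 2016,
  percent = c(5, 1, 4, 3, 6, 2)
)

# slice_max() takes the ranking column explicitly and returns rows
# sorted from largest to smallest, so no separate arrange() is needed
top5 <- toy |>
  filter(year == 2016) |>
  slice_max(percent, n = 5)

top5$disease
# → "e" "a" "c" "d" "f"
```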

I have been trying to get a different project working with tweenr for a while, so this seemed like a good chance to try my luck on a new data set. Its structure is very similar to the gapminder data used in a tweenr tutorial, so I relied heavily on that tutorial's code.
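Conceptually, with ease = "linear" each disease's yearly values are joined by straight lines and sampled at evenly spaced time points, one per frame. The base-R sketch below illustrates that idea with approx() on made-up data; linear_frames() is a hypothetical helper, and this is not how tweenr is implemented internally.

```r
# Hypothetical helper: linearly interpolate one group's values onto
# `nframes` evenly spaced time points (a conceptual stand-in for linear easing)
linear_frames <- function(time, value, nframes) {
  out <- approx(time, value, n = nframes)
  data.frame(time = out$x, value = out$y, frame = seq_len(nframes))
}

# Made-up yearly values for two groups
toy <- data.frame(
  disease = rep(c("a", "b"), each = 3),
  year = rep(1990:1992, times = 2),
  percent = c(10, 12, 11, 5, 5, 8)
)

# Interpolate each disease separately, then stack the results
tweened <- do.call(rbind, lapply(split(toy, toy$disease), function(d) {
  cbind(disease = d$disease[1], linear_frames(d$year, d$percent, nframes = 5))
}))

tweened[tweened$disease == "a", c("time", "value")]
```

With five frames over 1990 to 1992 the points fall every half year, so group "a" passes through 11 at 1990.5 and 11.5 at 1991.5.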

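mortality_edit <- world %>% 
  filter(disease %in% top5_world$disease) %>%  # keep the top-5 causes only
  select(-country) %>% 
  mutate(time = year) %>%                      # tweenr's time column (copy of year)
  rename(x = year, y = percent) %>%            # x/y naming from the gapminder example
  mutate(ease = "linear")                      # linear interpolation between years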

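# Interpolate 150 frames between the yearly points, one path per disease
mortality_tween <- tween_elements(mortality_edit, time = "time", group = "disease",
                                  ease = "ease", nframes = 150) %>% 
  mutate(year = round(time), disease = .group) %>%  # nearest year, disease name
  left_join(world, by = c("disease", "year")) %>%   # observed percent for that year
  # str_replace_all() so every underscore becomes a space, not just the first
  mutate(disease = Hmisc::capitalize(str_replace_all(disease, pattern = "_", " ")))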

## Warning: Column `disease` joining factor and character vector, coercing
## into character vector

It worked! Now I just need to apply my new skills to my old project.

# Uses the original gganimate API (gganimate() with a `frame` aesthetic),
# which predates gganimate 1.0
p2 <- ggplot(mortality_tween,
             aes(x = year, y = percent, group = disease, frame = .frame, cumulative = TRUE)) +
  geom_line(aes(color = disease), size = 1) +
  theme_minimal() +
  ggtitle("Top 5 causes of death in the world") +
  xlab("Year") +
  scale_colour_discrete(name = "Disease")

gganimate(p2, title_frame = FALSE, interval = 0.05, filename = "world_mortality_2016.gif")
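The gganimate() function belongs to the original gganimate API; gganimate 1.0 and later replace the frame and cumulative aesthetics with transitions. A sketch of a similar cumulative line chart with transition_reveal(), assuming gganimate 1.0 or later and ggplot2 3.4.0 or later (for linewidth), on made-up data with placeholder cause names:

```r
library(ggplot2)
library(gganimate)

# Made-up values with placeholder names, standing in for the real top-5 series
toy <- data.frame(
  disease = rep(c("Cause A", "Cause B"), each = 3),
  year = rep(1990:1992, times = 2),
  percent = c(1, 2, 3, 4, 5, 6)
)

# transition_reveal() draws each line progressively along `year`
p <- ggplot(toy, aes(x = year, y = percent, colour = disease)) +
  geom_line(linewidth = 1) +
  theme_minimal() +
  labs(title = "Top 5 causes of death in the world",
       x = "Year", colour = "Disease") +
  transition_reveal(year)

# Rendering needs a GIF renderer such as gifski, so only run it interactively
if (interactive()) {
  anim <- animate(p, nframes = 150)
  anim_save("world_mortality.gif", anim)
}
```

Because transition_reveal() interpolates between observations itself, the separate tweenr step should not be needed with this API; nframes = 150 simply mirrors the value used above.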

Packages I used

library(tidyverse)
library(snakecase)
library(gganimate)
library(tweenr)
library(readxl)

My first tidy Tuesday

Emma Vestesson