Emma goes to useR! 2019
It is Friday evening and my train from Toulouse is delayed [update: train ended up being 2h30 min late, please take this into account when judging this post on its length and coherence.]. My bag is heavy from all the stickers I have aquired and it is too hot for me to stroll around Toulouse. What better than to write up my thoughts on useR!2019?
I have had such a good time and I am leaving full of inspiration to try new packages and ways of working. People were incredibly friendly and I chatted to people I had never met before about everything from R to how great the football world cup was. Toulouse is also a beautiful city.
I was lucky enough to be around for the tutorials on the Tuesday. As one of my next projects at work is to build packages I of course booked onto the package development workshop with Jenny Bryan, Hadley Wickham and Jim Hester (no further introductions needed). I had high expectation and was not disappointed, the session was great and I left feeling confident that not only can I build packages but also test and document them. In particular, I liked the focus on the workflow because I think that it will really help me remember what to do and in what order. I am also keen to have a closer look at the resources that are linked.
The keynote speakers were really good and Julia Stewart Lowndes was my favourite (Julien Cornebise was a close second). Julia spoke passionately about open science and the power of communities and her slides had beautiful illustrations by Allison Horst. Watch it!
Packages I had never heard of that I am excited to try
Conferences and meetups are great ways to find out about new (sometimes only to me) packages. An example of such package is
disk.frame. The way I understand it,
disk.frame is for data that is too big to be kept in RAM but small enough to fit on your hard drive. That description seems to fit a lot of different data sets so I am sure that it will be useful. The idea of it seems beautifully simple, large files are slit into smaller chunks and then you manipulate the chunks. slides and github
Another package that I heard about through the grapevine (there were so many great talks to pick from) is
vroom. My source tells me that it reads data super fast and even let’s you only read in certain columns. Sounds like the dream!
I am all about version control nowadays but I didn’t realise that you can version control your data! The package
git2rdata is likely to make me even more obsessed with version control. No more ‘but I thought this variable hadn’t changed!?!’ or ‘why did we create a new version on 4 April?’! (I mean I could have written tests or notes for this but this seems so much more convenient!). Thierry Onkelinx presented (and gave me a sticker). github
I like to think that I am good at writing code to streamline a process but tables are often the one thing that I end doing manually, everything from copying values from different files to formatting and merging cells. Usually creating a tables is one of the last steps and I have run out of time/automation-energy. The
flextable seems to be able to help me with some of these problems. github
Package updates that I am excited about
Most people who use R for statistics have probably used the
survival package at some point. The creator Terry Therneau showed us some of the new features that will be in the next big release, including better support for competing risk models and functions to check that the structure of the data is correct. Some of this is on github and should be on CRAN soonish. Survival analysis is one of those statistical areas that I find fascinating but I sometimes get confused because the structure of the data is very particular (at least to me). This future update looked promising. My favourite part of this talk was the anecdote shared that Terry Therneau had presented this package at a S conference in Toulouse 30 years ago and was now back in Toulouse for the first time to present an update.
Going from wide to long data or the other way is basically impossible without spending at least ten minutes getting it wrong, randomly changing the arguments and getting a coffee before either succeeding or giving up. This has been the case for me both in my old life as a Stata user and my current life as an R person. The presentation on the updated
tidyr package gave me some very much needed hope. The new functions
pivot_wide seem to have function arguments that actually describe what they are eg
names_from and you no longer need to quote variables that actually exist. I have high hopes that I will be able to use the new functions without banging my head against the wall. Some types of data wrangling that previously required multiple steps now seem to be able to be done in one step and I already have specific chunks of code that I want to try this on because I hate how clunky the current version is. Not all features are on CRAN so here are the details of the talk.
@hadleywickham) July 10, 2019
Colin Gillespie’s presentation on R and security. I expected something dry but useful but it turned out to be one the most amusing (and scary) talks I went to. People give away their passwords way too quickly. This talk also included the life goal of only ever being hacked by an adult which I think is a good goal. Link to slides
The children are our future
At least from the perspective of someone with no children, useR seemed to be a really child friendly conference and this made me really happy to see. Many people had brought their children and some of those children even made it to the stage.
R-ladiesSo much of my development as an R user has been shaped by R-ladies. It was so nice to see so many R-ladies in the same place and the lunch was lovely. Look how many of us there were!
@RLadiesGlobal) July 10, 2019