High-performance R
Although R is probably not the fastest language ever made, it sees a lot of use in scientific and data analysis applications. This tutorial focuses on a number of techniques we can use to parallelize and speed up our code, along with a couple of other useful techniques for our work.
In particular, these materials combine a review of basic R syntax and concepts with more advanced performance-related content. The goal of this format is two-fold: it makes sure everyone is on the same page, and it ensures a relatively logical progression of concepts (the alternative is to have two distinct lessons and have the difficulty level go from 0 to 9000 real fast). Hopefully these materials will be useful for R users of all levels.
Topics covered include the following:
- Basic R syntax and concepts
- Doing work with the “tidyverse” packages (dplyr, purrr, ggplot2)
- Measuring code performance
- Common performance optimizations
- Parallelization with doParallel, plyr, and multidplyr
Setup
Before you start, make sure you have both R and RStudio installed and ready to go.
You may also wish to install the tidyverse packages beforehand with install.packages("tidyverse"). We’ll be using them a lot.
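If you want to check ahead of time which of the packages named in the topic list are already available, a minimal sketch like the one below works in base R. The variable names (pkgs, installed, not_installed) are illustrative only, and the commented-out install step assumes every listed package can be installed from CRAN in your setup.

```r
# Packages named in this tutorial's topic list.
pkgs <- c("dplyr", "purrr", "ggplot2", "doParallel", "plyr", "multidplyr")

# requireNamespace() returns TRUE if a package can be loaded and FALSE otherwise;
# vapply() applies it to each name and returns a named logical vector.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Collect and report anything that is not installed yet.
not_installed <- pkgs[!installed]
if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # uncomment to install the missing packages
} else {
  message("All packages are installed.")
}
```

Using requireNamespace() rather than library() means the check does not attach anything to your search path; it only tells you whether each package could be loaded.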
Credits
Giving credit where credit is due, these materials borrow liberally from Software Carpentry’s R for Reproducible Science workshop and draw many concepts from Hadley Wickham’s “Advanced R” book.