If you don’t know what parallel processing is, the short version is this: it speeds up computation by splitting a task into pieces and running those pieces simultaneously on separate cores of your computer’s processor, i.e. in “parallel”. This is commonly used for repeated tasks such as for loops: instead of running each iteration one after the other, you can split the iterations up, send them to different cores, and run them at the same time. You can think of this as “division of labor”.
This is especially useful for processes that take a long time, because the time saved by parallel processing scales with how long the process takes. Roughly speaking, if your computer has 4 cores, your process can run up to 4 times faster; with 8 cores, up to 8 times faster. (In practice the overhead of splitting tasks and combining results means the actual speedup is somewhat less.)
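If you are curious how many cores your own machine has, you can check before setting anything up. The parallel package ships with base R, so this quick check needs no extra installation:

```r
# 'parallel' is bundled with base R, no installation needed
library(parallel)

# Number of cores R can see (counts logical cores by default)
detectCores()

# Physical cores only, ignoring hyper-threading
detectCores(logical = FALSE)
```

On machines with hyper-threading the first call typically reports twice the second, since each physical core presents two logical cores.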
In this tutorial I will show you how to set up R for parallel processing, which only takes a few lines of code. For the setup we will use the R package doParallel. We will install/load it with p_load from the package pacman (which was introduced here).
```r
pacman::p_load(doParallel)
cores <- detectCores()
cl <- makeCluster(cores)
registerDoParallel(cl)
```
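After those four lines have run, you can optionally sanity-check that the backend is actually registered. getDoParWorkers and getDoParName come from the foreach package (which doParallel loads for you); this check is not required for the setup, just reassuring:

```r
pacman::p_load(doParallel)   # also loads foreach and parallel

cl <- makeCluster(detectCores())
registerDoParallel(cl)

getDoParWorkers()   # number of worker processes registered
getDoParName()      # name of the currently registered backend
```

If getDoParWorkers() returns the number of cores you expected, the cluster is ready to use.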
detectCores identifies the number of cores your computer’s processor has (by default it counts logical cores, which on machines with hyper-threading can be twice the number of physical cores). makeCluster then sets up the parallel processing framework, a cluster of worker processes that tasks can be split across and results combined from, using the number of cores detected in the previous line. Finally, registerDoParallel registers this cluster as the parallel backend for foreach, a package designed for running for loops in parallel.
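As a quick taste of what running a loop on this backend looks like, here is a minimal foreach example (the task i^2 is an arbitrary placeholder). Note that stopCluster should be called when you are done, to shut the worker processes down:

```r
pacman::p_load(doParallel)

cl <- makeCluster(detectCores())
registerDoParallel(cl)

# %dopar% sends iterations to the workers; .combine = c
# flattens the per-iteration results into one vector
results <- foreach(i = 1:8, .combine = c) %dopar% {
  i^2
}
print(results)   # 1 4 9 16 25 36 49 64

# Release the worker processes when finished
stopCluster(cl)
```

Replacing %dopar% with %do% runs the same loop sequentially, which is handy for comparing the two.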
Now that R is set up for parallel processing, the next post will show you how to define and run your tasks in parallel, and demonstrate the difference in performance between regular and parallel processing.