If you don’t know what parallel processing is, long story short, it optimizes your computer’s ability to compute by splitting tasks and running them on separate cores on your computer’s processor, to run these tasks in “parallel”. This is commonly used for running repeated tasks such as in for loops, where instead of running each iteration of the loop one after the other, you can split up these tasks, send them to different cores, and run these tasks at the same time. You can think of this as “division of labor”.

This becomes very useful especially in processes that take a long time, because the time you save from performing parallel processing scales with how long the process takes. Basically, if your computer has 4 cores, your process can run to 4 times faster. With 8 cores, it can run up to 8 times faster.

Set-up with doParallel

In this tutorial I will show you how to set up R for parallel processing, which only takes a few lines of code. For the set up we will use the R package: doParallel. We will install/load it with p_load from the package pacman (which was introduced here.)

pacman::p_load(doParallel)

cores <- detectCores()
cl <- makeCluster(cores[1])
registerDoParallel(cl)

The function detectCores is used to identify the number of cores your computer’s processor has. Then makeCluster will set up the parallel processing framework (to easily split tasks and combine results) with the number of cores detected from the previous line. Finally, the registerDoParallel will register the parallel processing backend to be used with foreach, a package designed for performing for loops in parallel.

Now that your R is set up for parallel processing, the next post will show you how to set up and perform your tasks in parallel, and also demonstrate the difference in performance between regular and parallel processing.