This is a brief tutorial on making your R code run faster. Code optimization in R can become a very advanced topic. For our purposes, we discuss memory management briefly and show two ways to measure how long parts of your code take (the Rprof function and the microbenchmark package), so that you can identify which changes are likely to speed up your code.

memory management

When you create objects in R, R allocates part of your computer's memory to store them. Generally speaking, the objects you can create in R are limited by the memory of your computer. Additionally, allocating memory and copying objects into it takes time. So memory management involves understanding how to represent information in memory efficiently and how to avoid unnecessary copies.

The size of an object in memory depends on its type. Compare four vectors that each hold 1000 zeros:

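# numeric (double) by default: 8 bytes per element
a <- rep(0, 1000)
object.size(a)
## 8040 bytes
# as.character: one 8-byte pointer per element (64-bit R) to a shared string
a <- as.character(rep(0, 1000))
object.size(a)
## 8088 bytes
# as.integer: 4 bytes per element, half the size of a double
a <- as.integer(rep(0, 1000))
object.size(a)
## 4040 bytes
# as.double: identical to the numeric default
a <- as.double(rep(0, 1000))
object.size(a)
## 8040 bytes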
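Because every element of a given vector type takes the same number of bytes, subtracting the sizes of two vectors of different lengths cancels the fixed header and isolates the per-element cost. The sketch below assumes a 64-bit build of R; `per_element_bytes` is a hypothetical helper written only for this illustration.

```r
# Hypothetical helper: estimate bytes per element for a vector constructor
# by comparing vectors of length 1000 and 2000, so the fixed header cancels.
per_element_bytes <- function(make) {
  small <- as.numeric(object.size(make(1000)))
  large <- as.numeric(object.size(make(2000)))
  (large - small) / 1000
}

per_element_bytes(numeric)                  # 8 on a 64-bit build (doubles)
per_element_bytes(integer)                  # 4
per_element_bytes(logical)                  # 4: logicals are stored like integers
per_element_bytes(function(n) rep("0", n))  # 8: one pointer per element; the
                                            # shared string "0" is counted once
```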

matrices and data frames

# a 100 x 10 matrix of zeros
b <- matrix(0, ncol = 10, nrow = 100)
object.size(b)
## 8200 bytes
# the same values stored as a data frame take more memory
b <- as.data.frame(matrix(0, ncol = 10, nrow = 100))
object.size(b)
## 9704 bytes
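The gap comes from how the two objects are stored: a matrix is a single numeric vector with a `dim` attribute, whereas a data frame is a list holding a separate vector for each column, plus `names`, `class`, and `row.names` attributes. A quick way to inspect this, as a sketch:

```r
m <- matrix(0, ncol = 10, nrow = 100)
df <- as.data.frame(m)

# A matrix is one double vector of 1000 elements with a dim attribute
typeof(m)              # "double"
names(attributes(m))   # "dim"

# A data frame is a list with one 100-element vector per column
typeof(df)             # "list"
length(df)             # 10 columns
names(attributes(df))  # includes "names", "class" and "row.names"

# Each column is its own vector with its own header, which adds overhead
sapply(df, object.size)
```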

Rprof

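We can use the Rprof function, together with summaryRprof, to find out how long parts of our code take. Rprof interrupts R at a fixed interval (0.02 seconds by default) and records which functions are running; summaryRprof then tabulates those samples. In its output, self.time is the time spent in a function's own code, while total.time also includes the time spent in the functions it called.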

In this example, we create a matrix with two rows and 1000 columns. Then we use rbind to add 1,000 new rows to the bottom, one at a time.

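Rprof(tmp <- tempfile())   # start profiling, writing samples to a temporary file

a <- matrix(0, nrow = 2, ncol = 1000)
for (i in 1:1000) {
  a <- rbind(a, rnorm(1000, 0, 1))   # builds a new, one-row-larger matrix each time
}

Rprof(NULL)                # stop profiling (Rprof() with no arguments would start a new profile)
summaryRprof(tmp)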
## $by.self
##         self.time self.pct total.time total.pct
## "rbind"     12.22       94      12.22        94
## "eval"       0.78        6      13.00       100
## 
## $by.total
##                       total.time total.pct self.time self.pct
## "eval"                     13.00       100      0.78        6
## "block_exec"               13.00       100      0.00        0
## "call_block"               13.00       100      0.00        0
## "evaluate_call"            13.00       100      0.00        0
## "evaluate::evaluate"       13.00       100      0.00        0
## "evaluate"                 13.00       100      0.00        0
## "handle"                   13.00       100      0.00        0
## "in_dir"                   13.00       100      0.00        0
## "knitr::knit"              13.00       100      0.00        0
## "process_file"             13.00       100      0.00        0
## "process_group.block"      13.00       100      0.00        0
## "process_group"            13.00       100      0.00        0
## "rmarkdown::render"        13.00       100      0.00        0
## "timing_fn"                13.00       100      0.00        0
## "withCallingHandlers"      13.00       100      0.00        0
## "withVisible"              13.00       100      0.00        0
## "rbind"                    12.22        94     12.22       94
## 
## $sample.interval
## [1] 0.02
## 
## $sampling.time
## [1] 13

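The profile shows that rbind accounts for about 94% of the 13 seconds: each call creates a new, larger matrix and copies every existing row into it, so each iteration costs more than the last. (Entries such as knitr::knit and rmarkdown::render in $by.total appear only because this document was rendered with R Markdown.) In the next example, we do the same work, except that we pre-allocate the full 1002-by-1000 matrix and fill rows 3 through 1002 by index.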

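Rprof(tmp <- tempfile())

a <- matrix(0, nrow = 1002, ncol = 1000)   # allocate the full matrix once
for (i in 3:1002) {
  a[i, ] <- rnorm(1000, 0, 1)              # overwrite row i; no resizing needed
}

Rprof(NULL)                                # stop profiling
summaryRprof(tmp)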
## $by.self
##         self.time self.pct total.time total.pct
## "rnorm"      0.12    85.71       0.12     85.71
## "eval"       0.02    14.29       0.14    100.00
## 
## $by.total
##                       total.time total.pct self.time self.pct
## "eval"                      0.14    100.00      0.02    14.29
## "block_exec"                0.14    100.00      0.00     0.00
## "call_block"                0.14    100.00      0.00     0.00
## "evaluate_call"             0.14    100.00      0.00     0.00
## "evaluate::evaluate"        0.14    100.00      0.00     0.00
## "evaluate"                  0.14    100.00      0.00     0.00
## "handle"                    0.14    100.00      0.00     0.00
## "in_dir"                    0.14    100.00      0.00     0.00
## "knitr::knit"               0.14    100.00      0.00     0.00
## "process_file"              0.14    100.00      0.00     0.00
## "process_group.block"       0.14    100.00      0.00     0.00
## "process_group"             0.14    100.00      0.00     0.00
## "rmarkdown::render"         0.14    100.00      0.00     0.00
## "timing_fn"                 0.14    100.00      0.00     0.00
## "withCallingHandlers"       0.14    100.00      0.00     0.00
## "withVisible"               0.14    100.00      0.00     0.00
## "rnorm"                     0.12     85.71      0.12    85.71
## 
## $sample.interval
## [1] 0.02
## 
## $sampling.time
## [1] 0.14
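Pre-allocation requires knowing the final size in advance. When the number of rows is not known beforehand, one common alternative, sketched here, is to collect each row in a list and call `rbind` only once at the end with `do.call`. Growing a list is much cheaper than growing a matrix, because copying a list copies only pointers to the rows, not the numbers themselves. The stopping rule below is hypothetical, chosen only so that the loop length is not fixed in advance.

```r
set.seed(1)
rows <- list()
i <- 0
repeat {
  i <- i + 1
  rows[[i]] <- rnorm(1000, 0, 1)   # store the new row; the numbers are not re-copied
  # Hypothetical stopping rule: keep at least 5 rows, then stop at the
  # first row whose mean is positive
  if (i >= 5 && mean(rows[[i]]) > 0) break
}

a <- do.call(rbind, rows)          # bind all rows in a single call
dim(a)
```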
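Finally, we drop the loop altogether: a single vectorized call to rnorm generates all 1,002,000 values, and matrix arranges them into 1002 rows. (Unlike the previous versions, the first two rows here are random rather than zero.)

Rprof(tmp <- tempfile())

a <- matrix(rnorm(1002 * 1000, 0, 1), nrow = 1002, ncol = 1000)

Rprof(NULL)
summaryRprof(tmp)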
## $by.self
##          self.time self.pct total.time total.pct
## "rnorm"       0.08       80       0.08        80
## "matrix"      0.02       20       0.10       100
## 
## $by.total
##                       total.time total.pct self.time self.pct
## "matrix"                    0.10       100      0.02       20
## "block_exec"                0.10       100      0.00        0
## "call_block"                0.10       100      0.00        0
## "eval"                      0.10       100      0.00        0
## "evaluate_call"             0.10       100      0.00        0
## "evaluate::evaluate"        0.10       100      0.00        0
## "evaluate"                  0.10       100      0.00        0
## "handle"                    0.10       100      0.00        0
## "in_dir"                    0.10       100      0.00        0
## "knitr::knit"               0.10       100      0.00        0
## "process_file"              0.10       100      0.00        0
## "process_group.block"       0.10       100      0.00        0
## "process_group"             0.10       100      0.00        0
## "rmarkdown::render"         0.10       100      0.00        0
## "timing_fn"                 0.10       100      0.00        0
## "withCallingHandlers"       0.10       100      0.00        0
## "withVisible"               0.10       100      0.00        0
## "rnorm"                     0.08        80      0.08       80
## 
## $sample.interval
## [1] 0.02
## 
## $sampling.time
## [1] 0.1
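The start/stop pattern used in these examples can be wrapped in a small function so that profiling is always switched off, even if the profiled code throws an error. `profile_expr` is a hypothetical helper, not part of base R; it relies on lazy evaluation, so the expression is evaluated only when `force(expr)` runs, while profiling is active.

```r
# Hypothetical helper: profile one expression and return summaryRprof() output
profile_expr <- function(expr, interval = 0.02) {
  tmp <- tempfile()
  # Runs when the function exits, including on error:
  # stop profiling and delete the temporary file
  on.exit({
    Rprof(NULL)
    unlink(tmp)
  }, add = TRUE)
  Rprof(tmp, interval = interval)  # start sampling every `interval` seconds
  force(expr)                      # the argument is evaluated here, while profiling
  Rprof(NULL)                      # stop profiling before reading the file
  summaryRprof(tmp)
}

# Usage: profile a shortened version of the rbind example
res <- profile_expr({
  a <- matrix(0, nrow = 2, ncol = 1000)
  for (i in 1:300) a <- rbind(a, rnorm(1000, 0, 1))
})
res$by.self
```

Because Rprof works by sampling, an expression that finishes in well under one interval may record few or no samples; profile something that runs long enough to be worth optimizing.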

microbenchmark

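The microbenchmark package runs each expression many times and reports summary statistics of the time per evaluation: the minimum, lower quartile (lq), mean, median, upper quartile (uq), and maximum. You can pass several expressions at once to compare them. By default each expression is run 100 times; the times argument changes this. The cld column, when present, is a compact letter display: expressions marked with different letters are judged to differ significantly in timing.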
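For comparison, a rough base-R approximation of what microbenchmark automates is to time many repetitions in one batch and divide by the number of runs. This only recovers an average, not the spread, and `system.time()` typically resolves only to about a millisecond, which is too coarse to time a single microsecond-scale call on its own. `n_runs` is an arbitrary choice for this sketch.

```r
a <- rnorm(10000, 500, 25)
n_runs <- 1000

# Time all runs together, then divide: only the average per call is
# recoverable, not the min/median/max that microbenchmark reports
elapsed <- system.time(for (i in seq_len(n_runs)) mean(a))[["elapsed"]]
elapsed / n_runs * 1e6   # approximate microseconds per call
```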

library(microbenchmark)
a <- rnorm(10000, 500, 25)

microbenchmark(mean(a),
               sum(a)/length(a))
## Unit: microseconds
##              expr    min      lq     mean  median      uq     max neval cld
##           mean(a) 20.029 20.4575 24.93915 24.6755 25.8020 171.597   100   b
##  sum(a)/length(a)  9.065  9.2775 11.51821 11.3535 12.8905  31.147   100  a
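In this run, sum(a)/length(a) takes roughly half as long as mean(a). Likely contributors are that mean() dispatches to mean.default, and that for double vectors it makes a second pass over the data to reduce rounding error. The times argument changes the number of evaluations; here each expression runs 50 times:

microbenchmark(mean(a),
               sum(a)/length(a), times = 50)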
## Unit: microseconds
##              expr    min     lq     mean median     uq    max neval cld
##           mean(a) 19.481 20.179 21.53392 20.365 20.722 44.403    50   b
##  sum(a)/length(a)  8.904  9.146 10.20708  9.282 10.593 17.943    50  a
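microbenchmark is also a convenient way to revisit the pre-allocation lesson from the Rprof section on a smaller scale. The sketch below uses three hypothetical functions that all build the vector of squares 1, 4, ..., n^2: one grows the vector with `c()`, one fills a pre-allocated vector, and one is vectorized. Naming the expressions sets the labels in the `expr` column, and `summary()` returns the timing table as a data frame. `n = 5000` and `times = 20` are arbitrary choices to keep the run short.

```r
library(microbenchmark)

# Three ways to build the same vector of squares (hypothetical examples)
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)  # copies the whole vector every iteration
  out
}
prealloc <- function(n) {
  out <- numeric(n)                         # allocate once
  for (i in seq_len(n)) out[i] <- i^2       # overwrite element i
  out
}
vectorized <- function(n) seq_len(n)^2      # one vectorized operation, no loop

n <- 5000
# Confirm all three produce the same result before comparing speed
stopifnot(identical(grow(n), prealloc(n)), identical(prealloc(n), vectorized(n)))

mb <- microbenchmark(grow = grow(n),
                     prealloc = prealloc(n),
                     vectorized = vectorized(n),
                     times = 20)
summary(mb)[, c("expr", "median", "neval")]
```

Because each `c()` call copies everything accumulated so far, the total work in `grow` increases quadratically with `n`, so its disadvantage should widen as `n` grows; run the comparison on your own machine rather than assuming specific numbers.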