ASA Connect

The speed improvement in R 3.2 is a game changer

    Posted 03-01-2015 11:13

    Like many statisticians, I depend on R not only for data analysis but also for intensive numerical simulations. R has a reputation for being slow, especially when an algorithm cannot be easily vectorized, as with many Markov chain Monte Carlo algorithms. One way to deal with this is to write the critical parts of the code in C or C++ via the Rcpp package. Newer numerical languages such as Julia (julia-lang.org) are specifically designed to run much faster. But leaving the comfortable home of R can lead not only to increased technical complexity and lower productivity but also to huge emotional stress.

    The upcoming R 3.2, however, features an improved byte-code compiler by Professor Luke Tierney that brings a substantial speed improvement to non-vectorized code. The following is from the R development log:

    "The byte-code compiler and interpreter include new instructions that allow many scalar subsetting and assignment and scalar arithmetic operations to be handled more efficiently. This can result in significant performance improvements in scalar numerical code."

    This post provides two examples that compare the speed of R 3.1, which has the previous byte-code compiler, with that of R 3.2, which has the enhanced compiler.
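
For readers who want to try this kind of comparison themselves, here is a minimal sketch of byte-compiling a function explicitly with the compiler package that ships with R. The function scalar_sum is a hypothetical toy example, not code from the benchmarks in this post; optimize = 3 is assumed to correspond to the "optimization level 3" mentioned here. Recent R versions enable JIT compilation by default, so the sketch calls enableJIT(0) to keep the baseline interpreted.

```r
library(compiler)

# Turn off JIT so the plain function stays interpreted.
# enableJIT() returns the previous level, which is restored at the end.
old_jit <- enableJIT(0)

# Hypothetical toy function: a scalar loop with no vectorized calls
scalar_sum <- function(n) {
  total <- 0
  for (i in 1:n) total <- total + i^2
  total
}

# Byte-compile the function at optimization level 3
scalar_sum_cmp <- cmpfun(scalar_sum, options = list(optimize = 3))

# Time the interpreted and the byte-compiled versions
t_interp <- system.time(r1 <- scalar_sum(1e6))
t_cmp <- system.time(r2 <- scalar_sum_cmp(1e6))
print(t_interp)
print(t_cmp)

# Both versions should return the same value
same_result <- isTRUE(all.equal(r1, r2))

# Restore the previous JIT level
enableJIT(old_jit)
```

The actual timings depend on the machine and the R version, so none are shown here.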

    Example 1. Code from "A comparison of programming languages in economics" by S. Borağan Aruoba and Jesús Fernández-Villaverde: https://github.com/jesusfv/Comparison-Programming-Languages-Economics/blob/master/RBC_R.R

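    The three numbers on each Time line are user, system, and elapsed seconds, in the order R's proc.time() reports them; the trailing NA entries are child-process times.

    R 3.1 without byte code:

    Time =  541.99 1.92 549.97 NA NA

    R 3.1 with byte code at optimization level 3:

    Time =  264.23 17.58 281.88 NA NA

    R 3.2 without byte code:

    Time =  536.26 0.97 537.35 NA NA

    R 3.2 with byte code at optimization level 3:

    Time =  48.99 0.73 49.72 NA NA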

    With byte code, R 3.2 is more than 5 times as fast as R 3.1 (49.72 vs. 281.88 seconds elapsed).
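
The speedup factors can be worked out directly from the timings reported above. The short sketch below only copies those numbers into vectors (the variable names are my own) and divides them.

```r
# Timings copied from the Example 1 runs (seconds)
r31_bc <- c(user = 264.23, elapsed = 281.88)     # R 3.1 with byte code
r32_bc <- c(user = 48.99, elapsed = 49.72)       # R 3.2 with byte code
r32_plain <- c(user = 536.26, elapsed = 537.35)  # R 3.2 without byte code

# Ratio of old time to new time, for user and elapsed separately
speedup_vs_r31 <- r31_bc / r32_bc
speedup_vs_plain <- r32_plain / r32_bc

print(round(speedup_vs_r31, 1))
print(round(speedup_vs_plain, 1))
```

The first ratio compares the two byte-code compilers; the second shows how much byte compilation gains over plain interpretation within R 3.2.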

    Example 2. This code compares a looped version (liao2) against a vectorized version (liao1). Surprisingly, the looped code runs faster in R 3.2 with byte code.

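    # Vectorized version: the inner sum() handles all d rows at once.
    liao1 = function(x)
    {
      d = nrow(x)
      n = ncol(x)
      result1 = numeric(n)
      one.to.d = 1:d   # not used in this version; kept to mirror liao2

      for(i in 1:n)
      {
        sum1 = 0
        for(j in 1:n) sum1 = sum1 + sum((x[,j] - x[,i])^2)
        result1[i] = sum1
      }
      result1
    }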

     

    ###################################

    # Looped version: the sum over rows is an explicit scalar loop over k.
    liao2 = function(x)
    {
      d = nrow(x)
      n = ncol(x)
      result1 = numeric(n)
      one.to.d = 1:d

      for(i in 1:n)
      {
        sum1 = 0
        for(j in 1:n)
        {
          for(k in one.to.d) sum1 = sum1 + (x[k,j] - x[k,i])^2
        }
        result1[i] = sum1
      }
      result1
    }

    ################################ 

    n = 1000   # number of columns
    d = 10     # number of rows
    x = rnorm(n*d)
    dim(x) = c(d, n)

    system.time(y1 <<- liao1(x))
    system.time(y2 <<- liao2(x))
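
Before comparing timings, it is worth confirming that the two functions compute the same thing. The sketch below restates both functions compactly so that it runs on its own, then checks them against each other on a small test matrix. The 5 x 20 size, the seed, and the names x_small and agree are arbitrary choices for illustration, not part of the original runs.

```r
# Compact restatement of liao1: inner sum over rows is vectorized
liao1 <- function(x) {
  n <- ncol(x)
  result1 <- numeric(n)
  for (i in 1:n) {
    sum1 <- 0
    for (j in 1:n) sum1 <- sum1 + sum((x[, j] - x[, i])^2)
    result1[i] <- sum1
  }
  result1
}

# Compact restatement of liao2: inner sum over rows is a scalar loop
liao2 <- function(x) {
  d <- nrow(x)
  n <- ncol(x)
  result1 <- numeric(n)
  for (i in 1:n) {
    sum1 <- 0
    for (j in 1:n) {
      for (k in 1:d) sum1 <- sum1 + (x[k, j] - x[k, i])^2
    }
    result1[i] <- sum1
  }
  result1
}

# Small hypothetical test input, so the check runs quickly
set.seed(1)
x_small <- matrix(rnorm(5 * 20), nrow = 5)

# all.equal() allows for tiny floating-point differences in summation order
agree <- isTRUE(all.equal(liao1(x_small), liao2(x_small)))
print(agree)
```

all.equal() is used rather than identical() because the two versions add the terms in a different order, which can change the last few bits of the result.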

    R 3.1 without byte code:

    > system.time(y1 <<- liao1(x))
       user  system elapsed
          3       0       3
    > system.time(y2 <<- liao2(x))
       user  system elapsed
      13.83    0.00   13.83

    R 3.1 with byte code at optimization level 3:

    > system.time(y1 <<- liao1(x))
       user  system elapsed
       2.22    0.00    2.21
    > system.time(y2 <<- liao2(x))
       user  system elapsed
       3.48    0.00    3.48

    R 3.2 without byte code:

    > system.time(y1 <<- liao1(x))
       user  system elapsed
       2.52    0.00    2.51
    > system.time(y2 <<- liao2(x))
       user  system elapsed
      14.33    0.00   14.32

    R 3.2 with byte code at optimization level 3:

    > system.time(y1 <<- liao1(x))
       user  system elapsed
       1.68    0.00    1.67
    > system.time(y2 <<- liao2(x))
       user  system elapsed
       1.14    0.00    1.14

    In summary, the change to the byte-code compiler in R 3.2 is a big deal. It allows R to do much more demanding computational work, and it may shift programming style away from often-forced vectorization toward whatever form is most natural for a particular algorithm.


    -------------------------------------------
    Jiangang Liao
    -------------------------------------------