---
output: github_document
editor_options:
  chunk_output_type: console
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# armacmp
<!-- badges: start -->
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
[![R-CMD-check](https://github.com/dirkschumacher/armacmp/workflows/R-CMD-check/badge.svg)](https://github.com/dirkschumacher/armacmp/actions)
[![Codecov test coverage](https://codecov.io/gh/dirkschumacher/armacmp/branch/master/graph/badge.svg)](https://app.codecov.io/gh/dirkschumacher/armacmp?branch=master)
<!-- badges: end -->
The goal of `armacmp` is to create a DSL to formulate linear algebra code in R that is compiled to C++ using the Armadillo Template Library. It also offers mathematical optimization, using `RcppEnsmallen` to optimize functions in C++.
The scope of the package is linear algebra and Armadillo. It is not meant to evolve into a general purpose R to C++ transpiler.
It has three main functions:
* `compile` compiles an R function to C++ and makes that function available again in your R session.
* `translate` translates an R function to C++ and returns the code as text.
* `compile_optimization_problem` uses `RcppEnsmallen` and the functions above to compile continuous mathematical optimization problems to C++.
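As a minimal sketch of `translate`, the snippet below asks for the C++ source of a small cross-product function instead of compiling it. The second argument, a name for the generated C++ function, is an assumption about the signature and is not taken from this README (check `?translate`). The `requireNamespace` guard only keeps the snippet harmless when `armacmp` is not installed.

```r
# Sketch only: the function-name argument ("cross_prod") is an assumption
# about translate()'s signature. See ?translate for the actual arguments.
has_armacmp <- requireNamespace("armacmp", quietly = TRUE)

if (has_armacmp) {
  cpp <- armacmp::translate(function(X) {
    return(t(X) %*% X)
  }, "cross_prod")
  print(cpp) # inspect the generated Armadillo code
}
```

Reading the generated code before calling `compile` can be a cheap way to see how an R expression maps to Armadillo.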
This is currently an *experimental prototype* that most certainly contains bugs and unexpected behaviour. However, I would be happy for any type of feedback, alpha testers, feature requests and potential use cases.
Potential use cases:
* Speed up your code :)
* Quickly estimate `Rcpp` speedup gain for linear algebra code
* Learn how R linear algebra code can be expressed in C++ using `translate` and use the code as a starting point for further development.
* Mathematical optimization with `compile_optimization_problem`
* ...
## Installation
``` r
remotes::install_github("dirkschumacher/armacmp")
```
## Caveats and limitations
* *speed*: R is already really fast when it comes to linear algebra operations. So simply compiling your code to C++ might not give you a *significant and relevant* speed boost. The best way to check is to measure it yourself and see whether, for your specific use case, compiling your code to C++ justifies the additional complexity.
* *NAs*: there is currently no NA handling. In fact everything is assumed to be double (if you use matrices/vectors).
* *numerical stability*: Note that your C++ code might produce different results in certain situations. Always validate before you use it for important applications.
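One way to act on the *speed* and *numerical stability* points is to time a candidate against a trusted baseline and compare their results in the same step. The helper below is a minimal base R sketch: `compare_impl` and its arguments are hypothetical names, and the pure R `crossprod_alt` only stands in for a function returned by `compile`.

```r
# Hypothetical helper: run a baseline and a candidate on the same inputs,
# record elapsed time, and check that the results agree numerically.
compare_impl <- function(baseline, candidate, ..., tolerance = 1e-8) {
  t_base <- system.time(res_base <- baseline(...))[["elapsed"]]
  t_cand <- system.time(res_cand <- candidate(...))[["elapsed"]]
  list(
    baseline_sec = t_base,
    candidate_sec = t_cand,
    equal = isTRUE(all.equal(res_base, res_cand, tolerance = tolerance))
  )
}

# Stand-in candidate; in practice this would be the result of compile().
crossprod_alt <- function(X) t(X) %*% X

set.seed(1)
X_cmp <- matrix(rnorm(200 * 5), 200, 5)
result <- compare_impl(crossprod, crossprod_alt, X_cmp)
result$equal
```

A single `system.time` call is noisy, so for a real decision it is worth repeating the measurement several times on inputs of realistic size.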
## Example
You can compile R-like code to C++. Not all R functions are supported.
```{r}
library(armacmp)
```
The following function takes a matrix and returns its transpose.
```{r}
trans <- compile(function(X) {
  return(t(X))
})
trans(matrix(1:10))
```
Or a slightly larger example using QR decomposition:
```{r, echo=TRUE, eval=TRUE}
# from Arnold, T., Kane, M., & Lewis, B. W. (2019). A Computational Approach to Statistical Learning. CRC Press.
lm_cpp <- compile(function(X, y = type_colvec()) {
  qr_res <- qr(X)
  qty <- t(qr.Q(qr_res)) %*% y
  beta_hat <- backsolve(qr.R(qr_res), qty)
  return(beta_hat, type = type_colvec())
})
# example from the R docs of lm.fit
n <- 70000 ; p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)
all.equal(
  as.numeric(coef(lm.fit(X, y))),
  as.numeric(lm_cpp(X, y))
)
```
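Why `lm_cpp` returns the least-squares coefficients: with `X = Q R`, the normal equations `t(X) %*% X %*% beta = t(X) %*% y` reduce to `R %*% beta = t(Q) %*% y`, which `backsolve` solves by back substitution. The base R sketch below runs the same body as an ordinary R function and checks that the residual is orthogonal to the columns of the design matrix. The names `lm_qr_r`, `X_small`, `y_small` and `normal_eq_residual` are illustrative only.

```r
# Same body as lm_cpp, run as plain R (no type annotations, no compilation).
lm_qr_r <- function(X, y) {
  qr_res <- qr(X)
  qty <- t(qr.Q(qr_res)) %*% y
  backsolve(qr.R(qr_res), qty)
}

set.seed(42)
X_small <- matrix(rnorm(100 * 3), 100, 3)
y_small <- rnorm(100)
beta_hat <- lm_qr_r(X_small, y_small)

# At the least-squares solution, t(X) %*% (y - X %*% beta_hat) is numerically zero.
normal_eq_residual <- max(abs(t(X_small) %*% (y_small - X_small %*% beta_hat)))
normal_eq_residual < 1e-8
```

Because the body is ordinary R, the uncompiled function doubles as a reference implementation when validating the compiled one.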
## API
`armacmp` always compiles functions. Every function needs to have a `return` statement with an optional type argument.
```{r, eval=FALSE}
my_fun <- compile(function(X, y = type_colvec()) {
  return(X %*% y, type = type_colvec())
})
```
A lot of linear algebra functions/operators are defined, as well as some control flow (for loops and if/else).
Please take a look at the [function reference article](https://dirkschumacher.github.io/armacmp/articles/function-reference.html) for more details on what can be expressed.
### Optimization of arbitrary and differentiable functions using `ensmallen`
The package now also supports optimization of functions using `RcppEnsmallen`. Find out more at [ensmallen.org](https://ensmallen.org/).
All code is compiled to C++. During the optimization there is no context switch back to R.
#### Arbitrary function
Here we minimize `2 * norm(x)^2` using simulated annealing.
```{r}
# taken from the docs of ensmallen.org
optimize <- compile_optimization_problem(
  data = list(),
  evaluate = function(x) {
    return(2 * norm(x)^2)
  },
  optimizer = optimizer_SA()
)
# should be roughly 0
optimize(matrix(c(1, -1, 1), ncol = 1))
```
Optimizers:
* Simulated Annealing through `optimizer_SA`
* Conventional Neural Evolution through `optimizer_CNE`
* ...
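As a rough cross-check in plain R, base R's `optim` also offers a simulated annealing variant (`method = "SANN"`). It is a different implementation from the one in ensmallen, so only the qualitative outcome (an objective value close to 0) is comparable. The objective is written with `sum(x^2)` because base R's `norm()` defaults to the one norm rather than the Euclidean norm.

```r
# Same objective as in the example above, 2 * ||x||^2, written for base R.
objective <- function(x) 2 * sum(x^2)

start <- c(1, -1, 1)
set.seed(123) # SANN is stochastic
res <- optim(par = start, fn = objective, method = "SANN")

objective(start) # value at the starting point
res$value        # best objective value found by SANN
```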
#### Differentiable functions
Here we solve a linear regression problem using L-BFGS.
```{r}
optimize_lbfgs <- compile_optimization_problem(
  data = list(design_matrix = type_matrix(), response = type_colvec()),
  evaluate = function(beta) {
    return(norm(response - design_matrix %*% beta)^2)
  },
  gradient = function(beta) {
    return(-2 %*% t(design_matrix) %*% (response - design_matrix %*% beta))
  },
  optimizer = optimizer_L_BFGS()
)
# this example is taken from the RcppEnsmallen package
# https://github.com/coatless/rcppensmallen/blob/master/src/example-linear-regression-lbfgs.cpp
n <- 1e6
beta <- c(-2, 1.5, 3, 8.2, 6.6)
p <- length(beta)
X <- cbind(1, matrix(rnorm(n), ncol = p - 1))
y <- X %*% beta + rnorm(n / (p - 1))
# Run optimization with lbfgs fully in C++
optimize_lbfgs(
  design_matrix = X,
  response = y,
  beta = matrix(runif(p), ncol = 1)
)
```
Optimizers:
* L-BFGS through `optimizer_L_BFGS`
* Gradient Descent through `optimizer_GradientDescent`
* ...
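The `gradient` function in the L-BFGS example follows from differentiating the squared Euclidean norm of `response - design_matrix %*% beta` with respect to `beta`, which gives `-2 * t(X) (y - X beta)`. A finite-difference check in plain R is a quick guard against sign or scaling mistakes before compiling. The sketch below uses hypothetical helper names (`loss`, `grad`, `numeric_grad`) and `sum(r^2)` for the squared Euclidean norm.

```r
loss <- function(beta, X, y) {
  r <- y - X %*% beta
  sum(r^2)
}

# Analytic gradient, same formula as in the L-BFGS example
# (a plain `*` is used for the scalar factor in base R).
grad <- function(beta, X, y) {
  -2 * t(X) %*% (y - X %*% beta)
}

# Central finite differences, one coordinate at a time.
numeric_grad <- function(beta, X, y, h = 1e-6) {
  sapply(seq_along(beta), function(i) {
    step <- replace(numeric(length(beta)), i, h)
    (loss(beta + step, X, y) - loss(beta - step, X, y)) / (2 * h)
  })
}

set.seed(7)
X_chk <- cbind(1, matrix(rnorm(50 * 2), 50, 2))
y_chk <- rnorm(50)
beta_chk <- c(0.5, -1, 2)

# Largest absolute difference between analytic and numeric gradient
grad_gap <- max(abs(as.numeric(grad(beta_chk, X_chk, y_chk)) -
                      numeric_grad(beta_chk, X_chk, y_chk)))
grad_gap
```

A gap many orders of magnitude smaller than the gradient entries themselves suggests the analytic formula is consistent with the objective.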
### When does `armacmp` improve performance?
It really depends on the use case and your code. In general, Armadillo can combine linear algebra operations. For example, the addition of 4 matrices `A + B + C + D` can be done in a single for loop. Armadillo can detect that and generate efficient code.
So whenever you combine many different operations, `armacmp` _might_ be helpful in speeding things up.
We gather some examples on the wiki to further explore if compiling linear algebra code to C++ actually makes sense for pure speed reasons.
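To illustrate what "a single for loop" means for `A + B + C + D`: evaluated step by step, the expression is three additions, each of which can materialise an intermediate matrix, whereas a fused loop visits every element once and writes the result directly. The base R sketch below only mimics such a fused loop to show the idea. An explicit loop in R is slow, and any benefit comes from Armadillo generating the equivalent loop in C++. `add4_fused` is a hypothetical name.

```r
# Conceptual sketch of loop fusion: one pass over the elements, no temporaries.
add4_fused <- function(A, B, C, D) {
  stopifnot(
    identical(dim(A), dim(B)),
    identical(dim(A), dim(C)),
    identical(dim(A), dim(D))
  )
  out <- matrix(0, nrow(A), ncol(A))
  for (i in seq_along(out)) {
    out[i] <- A[i] + B[i] + C[i] + D[i]
  }
  out
}

set.seed(1)
A <- matrix(rnorm(6), 2, 3)
B <- matrix(rnorm(6), 2, 3)
C <- matrix(rnorm(6), 2, 3)
D <- matrix(rnorm(6), 2, 3)
all.equal(add4_fused(A, B, C, D), A + B + C + D)
```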
### Related projects
* [nCompiler](https://github.com/nimble-dev/nCompiler) - Code-generate C++ from R. Inspired the approach to compile R functions directly instead of just a code block as in the initial version.
### Contribute
`armacmp` is experimental and has a volatile codebase. The best way to contribute is to write issues/report bugs/propose features and test the package with your specific use-case.
### Code of conduct
Please note that the 'armacmp' project is released with a
[Contributor Code of Conduct](CODE_OF_CONDUCT.md).
By contributing to this project, you agree to abide by its terms.
### References
* Conrad Sanderson and Ryan Curtin. Armadillo: a template-based C++ library for linear algebra. Journal of Open Source Software, Vol. 1, pp. 26, 2016.
* S. Bhardwaj, R. Curtin, M. Edel, Y. Mentekidis, C. Sanderson. ensmallen: a flexible C++ library for efficient function optimization. Workshop on Systems for ML and Open Source Software at NIPS 2018.
* Dirk Eddelbuettel, Conrad Sanderson (2014). RcppArmadillo: Accelerating R
with high-performance C++ linear algebra. Computational Statistics and Data
Analysis, Volume 71, March 2014, pages 1054-1063. URL
http://dx.doi.org/10.1016/j.csda.2013.02.005
* Dirk Eddelbuettel and Romain Francois (2011). Rcpp: Seamless R and C++
Integration. Journal of Statistical Software, 40(8), 1-18. URL
https://www.jstatsoft.org/v40/i08/.