title | subtitle | talkdate | author | job1 | job2 | job3 | logoUni | logoDep | framework | highlighter | hitheme | widgets | mode |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Authoring R Packages | IAMCS Machine Learning and Applied Statistics Workshop Series | February 18, 2014 | Mathew McLean | Research Assistant Professor | Texas A & M University | | Tamulogo.png | IAMCSlogo.svg | io2012 | highlight.js | tomorrow | | selfcontained |
FunParts <- function(fun.name){
return(list(formals(fun.name), body(fun.name), environment(fun.name)))
}
FunParts(ncol)
## [[1]]
## [[1]]$x
##
##
##
## [[2]]
## dim(x)[2L]
##
## [[3]]
## <environment: namespace:base>
- `...` is a special object type in R used to pass extra arguments to a function
- Arguments that don't match any formal arguments of a function are matched with `...`
- Arguments matched to `...` can be extracted using `list(...)`
  - can also use `c(...)` or `as.list(...)`
- Full names of arguments need not always be specified when calling a function
  - True for extraction (`$`) and attributes (`attr`) as well
- Note: partial argument matching occurs before `...`, but not after it
DemoPartialMatch <- function(first.arg, ..., second.arg){c(...)}
DemoPartialMatch(fir = 1, sec = 2)
## sec
## 2
- Bad practise, leads to bugs, don't use partial matching
- R has global options which can be set to warn about this
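In current versions of R, the options that warn about partial matching are `warnPartialMatchArgs`, `warnPartialMatchDollar`, and `warnPartialMatchAttr`. A minimal sketch of how they might be used follows; the function `Demo` and its argument names are made up for illustration.

```r
# Hypothetical function used only to trigger partial argument matching
Demo <- function(first.arg, second.arg = 0) first.arg + second.arg

# Turn on warnings for all three kinds of partial matching,
# keeping the old settings so they can be restored afterwards
old.opts <- options(warnPartialMatchArgs = TRUE,    # partial argument names
                    warnPartialMatchDollar = TRUE,  # partial names with `$`
                    warnPartialMatchAttr = TRUE)    # partial names in attr()

# The partially matched call now signals a warning; capture its message
caught <- tryCatch(Demo(fir = 1, sec = 2),
                   warning = function(w) conditionMessage(w))
caught

options(old.opts)  # restore the previous settings
```

Setting these in a development session (rather than inside package code) is one way to find accidental partial matches before users do.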
- Use `missing(x)` to check if the argument `x` is specified in the call to a function
- Use `hasArg(x)` to check if argument `x` is in `...` or in formal arguments
TestingMissing <- function(x, ...){
print(c(hasArg(x), hasArg(y), missing(x)))
missing(y)
invisible(x)
}
TestingMissing(y = 1)
## [1] FALSE TRUE TRUE
## Error in missing(y): 'missing' can only be used for arguments
- Alternative: `function(x = NULL){ is.null(x) ...`
- Note: the use of `invisible` prevents `x` from being printed if no assignment occurs
- We need to make sure the arguments are in the form we expect
- errors, warnings, and messages should be issued as necessary
- messages should be meaningful for the user
Some useful functions include
- `is.na`, `is.null`, `is.numeric`, etc. for checking type
- `all`, `any`, e.g. `any(is.na(x))`
- use `inherits` to check the class of an object
- `identical` is the proper way to check equality in `if` and `while` statements
  - consider `x == y` when `x` or `y` does not have length one or are `NA` or `NULL`
  - `all.equal` is for near equality; though `isTRUE(all.equal(...))` is fine too
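The checks above can be combined at the top of a function. The following is only a sketch: `CheckedMean` is a hypothetical helper, and the particular checks and messages are assumptions chosen for illustration.

```r
# Hypothetical helper: validate arguments before doing any work
CheckedMean <- function(x, trim = 0) {
  if (!is.numeric(x))
    stop("'x' must be numeric, not ", class(x)[1L])
  # identical() gives a single TRUE/FALSE, so it is safe inside if()
  if (!identical(length(trim), 1L) || is.na(trim))
    stop("'trim' must be a single non-missing number")
  if (any(is.na(x))) {
    warning("removing ", sum(is.na(x)), " missing value(s) from 'x'")
    x <- x[!is.na(x)]
  }
  mean(x, trim = trim)
}

CheckedMean(c(1, 2, 3))
```

Errors stop the function for input that cannot be handled, while the warning tells the user what was changed and lets the computation continue.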
- `try` - wrapper to safely run an expression that may fail
result <- try(FooFun(x), silent = TRUE) # silent = TRUE means error message suppressed
if (inherits(result, "try-error"))
result <- NA
- Equivalent alternative:
result <- tryCatch(FooFun(x), error = function(e) return(NA))
- Try `demo(error.catching)` in R and see `?conditions`
- if messages can be ignored when function is called by your code, use `suppressMessages`
  - `suppressWarnings` for suppressing warnings
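A self-contained sketch of these tools working together. `FooFun` is not defined in the slides, so the version here is a made-up stand-in that emits a message, sometimes a warning, and sometimes an error.

```r
# Hypothetical stand-in for FooFun: noisy, and fails for negative input
FooFun <- function(x) {
  message("FooFun was called")
  if (x < 0) stop("'x' must be non-negative")
  if (x == 0) warning("'x' is zero")
  sqrt(x)
}

# Wrapper that silences messages and warnings and turns errors into NA
SafeFoo <- function(x) {
  tryCatch(suppressWarnings(suppressMessages(FooFun(x))),
           error = function(e) NA_real_)
}

results <- sapply(c(4, 0, -1), SafeFoo)
results
```

Suppressing conditions is only reasonable when the calling code knows they are harmless; otherwise it is usually better to let them reach the user.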
--- .smallerSpacing
- a `call` is one of three language objects in R (the others are `expression`s and `name`s)
- We may use `match.call()` inside a function to capture the original call
- This is useful for example:
  - to modify the contents of a call and pass them on to another function; see e.g. `lm`
  - to save the call for later use (e.g. returning it as part of a list)
GetCallParts <- function(x, ...){
.call <- match.call(expand.dots = TRUE)
return(list(.call[[1]], .call$z))
}
GetCallParts(x, y = 1, z = 2)
## [[1]]
## GetCallParts
##
## [[2]]
## [1] 2
- We can use `do.call` to call a function by name and argument list (example coming later)
- We can evaluate calls using `eval`
StupidSum <- function(x, ...){
.call <- match.call(expand.dots = TRUE)
.call$x <- NULL
.call$na.rm <- TRUE
.call[[1L]] <- as.name("sum")
return(eval(.call))
}
StupidSum(x = "hi", y = 1, z = 2, z2 = NA)
## [1] 3
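The other use mentioned earlier, saving the call for later, might look like the sketch below. `FitMean` is a hypothetical function invented for this illustration; the pattern of storing `match.call()` in the returned list is what matters.

```r
# Hypothetical fitting function that stores the call that created its result
FitMean <- function(x, trim = 0) {
  cl <- match.call()
  list(estimate = mean(x, trim = trim), call = cl)
}

fit <- FitMean(c(1, 2, 30), trim = 0)
fit$call                  # the original call, with argument names filled in

# Modify one argument of the saved call and re-evaluate it
new.call <- fit$call
new.call$trim <- 0.4
refit <- eval(new.call)
c(fit$estimate, refit$estimate)
```

This is roughly the idea behind functions that refit a model from a stored call with some arguments changed.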
--- .smallerSpacing
Functions may be
- assigned to variables and stored in data structures
- used as arguments to other functions
- returned by other functions
- anonymous (not bound to a variable)
Example: Change error distribution used to simulate complicated model
GetErrorDist <- function(err.fun.name, ...){
err.fun.name <- match.fun(err.fun.name)
return(function(n) do.call(err.fun.name, list(n, ...)))
}
ErrorDist <- GetErrorDist("rnorm", sd = 3, mean = 10)
## data <- SimulateModel(n = 100, ErrorDist, other.args)
ErrorDist <- GetErrorDist("rt", df = 2); ErrorDist(3)
## [1] -0.01098557 0.58717775 3.81613738
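The four properties listed above can each be shown in a few lines. This is a small sketch; the names `summaries`, `MakePower`, and `Cube` are made up for the example.

```r
# Stored in a data structure (the last element is an anonymous function)
summaries <- list(mean = mean, median = median,
                  range.width = function(x) diff(range(x)))

x <- c(1, 2, 3, 10)

# Used as arguments: sapply receives each stored function in turn
results <- sapply(summaries, function(f) f(x))
results

# Returned by another function
MakePower <- function(p) function(x) x^p
Cube <- MakePower(3)
Cube(2)
```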
--- .biggerSpacing
- Assignment (e.g. `x <- 1`) and function calls (e.g. `f(x = 1)`) bind values to variables
- A frame is a set of bindings (collection of objects)
- A variable is allowed only one binding per frame
- Can have different bindings in different frames
- A free variable in a particular frame is one that is unbound (has no assigned value)
- Scoping rules determine where R looks for values for free variables
- An environment is made up of a frame and a pointer to an enclosing environment
- The scope of a binding is the set of environments in which it is visible
--- .biggerSpacing
- Recall, a function has three parts: arguments, a body, and an enclosing environment
- The function's enclosing environment is the one it was created in
- When a function is called, the following occurs
- Args. in the call are matched to the formal args of the function definition
- A new environment is created and assignments are made to it for each formal arg.
- The function's environment becomes the enclosing environment for this one
- The body of the function is evaluated in this new env. and result is returned
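The steps above can be observed from inside a function. In this sketch, `ShowEnvs` is a hypothetical helper that returns the environment created for the call, that environment's enclosure, and the bindings it holds.

```r
# Hypothetical helper that inspects its own evaluation environment
ShowEnvs <- function(x) {
  eval.env <- environment()  # the new environment created for this call
  list(eval.env = eval.env,
       enclosing = parent.env(eval.env),  # should be the function's environment
       bindings = ls(eval.env))           # the formal argument is bound here
}

res <- ShowEnvs(1)
identical(res$enclosing, environment(ShowEnvs))
res$bindings
```

Each call gets a fresh evaluation environment, so two calls to `ShowEnvs` return different `eval.env` objects even though they share the same enclosing environment.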
--- .biggerSpacing
- User's workspace is `R_GlobalEnv`, which is bound to `.GlobalEnv`
- `search()` gives a list of all attached packages - "the search path"
  - i.e. all env. where R will look for bindings (including `.GlobalEnv`)
- When a package is attached, its environment becomes the enclosing env. of R_GlobalEnv
- `x <<- 1` will bind value `1` to the first instance of a variable `x` found in the enclosing environments (starting from the parent of the current one and continuing along the search path)
  - or create a binding for `x` in `R_GlobalEnv` if no existing binding is found
- Craziness: Functions `parent.frame` and `parent.env` are very different
  - `parent.frame(1)` gives the environment of the calling function - used frequently
  - `parent.env` gives the enclosing environment of its argument - rarely of direct use
- Also see `?with` and `?local`
- Two nice references on frames, environments and scope in R here and here
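A short sketch of `<<-` and of the difference between `parent.frame` and `parent.env`. All function names here (`MakeCounter`, `Callee`, `Caller`) are invented for the example.

```r
# `<<-` rebinds `count` in the enclosing environment, not in the call's own frame
MakeCounter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}
Counter <- MakeCounter()
Counter()
Counter()

# parent.frame: where the function was called from
# parent.env:   where the function was defined
Callee <- function() {
  list(calling = parent.frame(),
       enclosing = parent.env(environment()))
}
Caller <- function() {
  res <- Callee()
  res$caller.env <- environment()
  res
}
out <- Caller()
identical(out$calling, out$caller.env)         # TRUE
identical(out$enclosing, environment(Callee))  # TRUE
```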
--- .smallerSpacing
- Every closure is associated with an enclosing environment (the one it was defined in)
Fun1 <- function(x){
y <- 1
H(x)
}
Fun2 <- function(x){
y <- 1
G <- function(z) z + y
G(x)
}
H <- function(z) z + y
y <- 0; x <- 100
c(Fun1(1), Fun2(1)) # Because of local bindings for x in both Fun.'s, x in .GlobalEnv is ignored
## [1] 1 2
- Since `H` is defined in `.GlobalEnv`, `H` uses `y` in `.GlobalEnv` -- this is lexical scoping
env <- new.env(); env # create and print a new environment
## <environment: 0x00000000183b5ea0>
env$x <- 1; as.list(env) # add object x to env and view its contents
## $x
## [1] 1
env$f <- function(x, env){
assign("x", x, envir = env);
return(environment(NULL)) # return the current evaluation environment
}
ls(envir = env) # list objects in env
## [1] "f" "x"
env$f(100, env) # assign 100 to x in env
## <environment: 0x000000000e1f20c0>
get("x", env) # retrieve value of x in env
## [1] 100
env$f(10, parent.env(env)) # assign 10 to x in env's parent environment: R_GlobalEnv
## <environment: 0x000000000db730c0>
x
## [1] 10
f <- function(n) parent.frame(n)
identical(f(1), eval(f(2), envir = new.env())) # both are R_GlobalEnv
## [1] TRUE
--- .smallerSpacing
- A loaded package, `pkgname`, has three environments associated with it
  - "package:pkgname" - all exported objects (really pointers to the objects);
    - returned by `search()`
  - "namespace:pkgname" - all objects (pointers), including internal ones
  - "imports:pkgname" - objects from other packages required by `pkgname`
- These are locked: trying to add or remove bindings in them causes error; see `?bindenv`
# Function for viewing the environments associated with an attached package
GetPkgEnvirons <- function(pkgname){
ns <- getNamespace(pkgname)
return(list(package = as.environment(paste0("package:", pkgname)),
namespace = ns, imports = parent.env(ns)))
}
sapply(GetPkgEnvirons("stats"), environmentIsLocked)
## package namespace imports
## TRUE TRUE TRUE
--- .smallerSpacing
- source code can be viewed by typing the function name at the console
  - only works if function is exported by namespace on the search path
  - sometimes backticks (`) are necessary
`::`
## function (pkg, name)
## {
##     pkg <- as.character(substitute(pkg))
##     name <- as.character(substitute(name))
##     getExportedValue(pkg, name)
## }
## <bytecode: 0x000000000b607278>
## <environment: namespace:base>
- `::` - function for viewing exported objects from a namespace, e.g. `pkgname::FunName`
- `:::` - function for viewing internal objects in a namespace
--- .largerSpacing
- My favourite R function: `getAnywhere()`
  - access any internal or external function on the search path
  - no need to specify package
  - who has time to read the help files? Learn by example!
- Download source package (the .tar.gz one) view files in src (for C code)
- Note: `:::` should not be used in production code;
  - function is not exported for a reason, package author could change it without notice
- Source code for R-devel
- U. Ligges on accessing source code: Rnews 6.4, pp. 43-45
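As a sketch of the access routes described above, the unexported S3 method `print.lm` in the stats package is used here purely as a convenient example of an internal object.

```r
# Exported object: reachable with `::`
is.function(stats::sd)

# Internal object: not exported, but reachable with `:::`
internal <- stats:::print.lm
is.function(internal)

# getAnywhere() finds it without being told the package
found <- getAnywhere("print.lm")
found$where            # where the object was found
found$objs[[1]]        # the function itself

# For S3 methods specifically, getS3method() is another option
method <- getS3method("print", "lm")
```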
--- .largerSpacing
- Classes provide a template for the creation of objects
- Methods are the special functions of a class that work on the class's objects
- Classes in R can inherit features from multiple other classes
- Classes simplify code, making it easier to use, and reuse
- `S3` is R's oldest, simplest, and most used way to implement OOP
  - Uses generic functions, special functions that usually only call `UseMethod`
    - Create a new generic: `MyGeneric <- function(x, ...) UseMethod("MyGeneric")`
- `S4` is newer, more elegant, and has many more features
- `S3` classes do not require a formal definition
  - simply change the `class` attribute of an object
    - e.g. `class(x) <- "MyClass"` or `class(x) <- c("MyClass", "data.frame")`
- We can define a new method for `MyGeneric` as follows
MyGeneric.MyClass <- function(x, ...){ # note the matching signature
... # some expressions
}
y <- structure(list(a = 1, b = 2), class = "MyClass") # create instance of MyClass
MyGeneric(y) # no need to say MyGeneric.MyClass(y)
- `UseMethod` decides which method to dispatch based on the first argument of the generic
- Use `methods(print)` to list all the methods defined for the generic `print`
  - or `methods(class = "Date")` to list all methods defined for class `Date`
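A complete, runnable version of the S3 pattern might look like the following sketch. The generic `Describe` and the class `MySubClass` are made-up names; `NextMethod` is used to show how a method for a subclass can defer to the method for its parent class.

```r
# A new generic and three methods for it
Describe <- function(x, ...) UseMethod("Describe")
Describe.default <- function(x, ...) "an object"
Describe.MyClass <- function(x, ...) {
  paste("MyClass with", length(x), "element(s)")
}
Describe.MySubClass <- function(x, ...) {
  # NextMethod() dispatches to the next class in the class vector
  paste("MySubClass;", NextMethod())
}

obj <- structure(list(a = 1, b = 2), class = "MyClass")
sub <- structure(list(a = 1), class = c("MySubClass", "MyClass"))

Describe(obj)   # uses Describe.MyClass
Describe(sub)   # uses Describe.MySubClass, then Describe.MyClass
Describe(1:3)   # falls back to Describe.default
```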
library(fortunes); fortune(121)
##
## Sean Davis: It got me going quickly with S4 methods, which it seems to me
## are the way to go in most cases.
## Rolf Turner: If you want to simultaneously handcuff yourself, strap
## yourself into a strait jacket, and tie yourself in knots, and moreover
## write code which is incomprehensible to the human mind, then S4 methods
## are indeed the way to go.
## -- Sean Davis and Rolf Turner (expressing different views about the
## benefits of S4 classes)
## R-help (May 2005)
- H. Wickham on object oriented programming in R: intro and more advanced
- `?setRefClass` for creating reference classes - methods belong to obj., obj. are mutable
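A minimal sketch of a reference class, to show what "methods belong to objects" and "objects are mutable" mean in practice. The class `Account` and its field and method names are invented for the example.

```r
library(methods)

# Hypothetical reference class with one field and one method
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    deposit = function(amount) {
      balance <<- balance + amount  # modifies the object in place
      invisible(.self)
    }
  )
)

a <- Account$new(balance = 100)
b <- a            # b refers to the same object; nothing is copied
b$deposit(50)
a$balance         # the change made through b is visible through a

a2 <- a$copy()    # an explicit copy is independent of the original
a2$deposit(1)
c(a$balance, a2$balance)
```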
- My second favourite R function: `browser`
  - stops execution and allows for inspection of current environment when called
- `debug(Foo)`: calls `browser` as soon as `Foo` is entered
  - can then step through function line by line
- `recover` allows for browsing in any active function call
- `trace`: allows for insertion of arbitrary debugging code at chosen points in a function
  - `trace(Foo, edit = TRUE)` opens editor with copy of `Foo`'s code
    - changes made in the editor are then used when `Foo` is executed
  - e.g. `trace(Foo, browser, exit = browser)` calls browser on entry and exit of `Foo`
  - to insert code at certain steps, see the `at` argument and the examples at `?trace`
  - Use `untrace(Foo)` to stop tracing `Foo`
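The interactive tools cannot be shown usefully in a static example, but the bookkeeping around them can. This sketch assumes `Foo` is a toy function defined at top level; the tracer expression simply prints the argument instead of opening the browser.

```r
# Toy function to debug
Foo <- function(x) {
  y <- x^2
  y + 1
}

# debug() only sets a flag; the browser would open on the next call to Foo
debug(Foo)
isdebugged(Foo)
undebug(Foo)

# trace() with a non-interactive tracer: run some code on entry to Foo
trace(Foo, tracer = quote(cat("Foo entered with x =", x, "\n")), print = FALSE)
out <- capture.output(res <- Foo(3))
out
untrace(Foo)   # restore the original definition
```

Replacing the `cat` call with `browser` gives the interactive behaviour described in the bullets.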
- When developing a package, you may find the following options useful:
options(error = utils::recover)
- post-mortem debugging of errors when they occur
options(error = function() traceback(2))
- print call stack automatically on error
options(showWarnCalls = TRUE, showErrorCalls = TRUE)
- Call stack is printed when a warning or error occurs
options(warn = 2)
- all warnings are treated as errors
options(prompt = "Howdy! ")
- For determining which parts of code are slowest and use the most memory
library(mgcv, quietly = TRUE)
X <- matrix(rnorm(5e5), 1e3, 5e2); y <- rowSums(X) + rt(1e3, df = 1)
Rprof("mcgvEx.out")
invisible(gam(y ~ te(X[, 1], X[, 2]) + X[, c(-1, -2)]))
Rprof(NULL) # turn off profiling
summaryRprof("mcgvEx.out")
## $by.self
## self.time self.pct total.time total.pct
## ".C" 14.00 96.95 14.00 96.95
## "%*%" 0.14 0.97 0.14 0.97
## "eigen" 0.08 0.55 0.08 0.55
## "crossprod" 0.04 0.28 0.04 0.28
## "tcrossprod" 0.04 0.28 0.04 0.28
## "gam.fit" 0.02 0.14 14.26 98.75
## "FUN" 0.02 0.14 0.06 0.42
## "na.omit.data.frame" 0.02 0.14 0.04 0.28
## "sort.int" 0.02 0.14 0.04 0.28
## "[.data.frame" 0.02 0.14 0.02 0.14
## "double" 0.02 0.14 0.02 0.14
## "is.na" 0.02 0.14 0.02 0.14
##
## $by.total
## total.time total.pct self.time self.pct
## "block_exec" 14.44 100.00 0.00 0.00
## "call_block" 14.44 100.00 0.00 0.00
## "doTryCatch" 14.44 100.00 0.00 0.00
## "eval" 14.44 100.00 0.00 0.00
## "evaluate" 14.44 100.00 0.00 0.00
## "evaluate_call" 14.44 100.00 0.00 0.00
## "force" 14.44 100.00 0.00 0.00
## "gam" 14.44 100.00 0.00 0.00
## "handle" 14.44 100.00 0.00 0.00
## "ifelse" 14.44 100.00 0.00 0.00
## "in_dir" 14.44 100.00 0.00 0.00
## "knit" 14.44 100.00 0.00 0.00
## "parse_page" 14.44 100.00 0.00 0.00
## "process_file" 14.44 100.00 0.00 0.00
## "process_group" 14.44 100.00 0.00 0.00
## "process_group.block" 14.44 100.00 0.00 0.00
## "slidify" 14.44 100.00 0.00 0.00
## "try" 14.44 100.00 0.00 0.00
## "tryCatch" 14.44 100.00 0.00 0.00
## "tryCatchList" 14.44 100.00 0.00 0.00
## "tryCatchOne" 14.44 100.00 0.00 0.00
## "withCallingHandlers" 14.44 100.00 0.00 0.00
## "withVisible" 14.44 100.00 0.00 0.00
## "estimate.gam" 14.34 99.31 0.00 0.00
## "gam.fit" 14.26 98.75 0.02 0.14
## "magic" 14.02 97.09 0.00 0.00
## ".C" 14.00 96.95 14.00 96.95
## "magic.post.proc" 0.22 1.52 0.00 0.00
## "%*%" 0.14 0.97 0.14 0.97
## "eigen" 0.08 0.55 0.08 0.55
## "totalPenaltySpace" 0.08 0.55 0.00 0.00
## "FUN" 0.06 0.42 0.02 0.14
## "apply" 0.06 0.42 0.00 0.00
## "matrix" 0.06 0.42 0.00 0.00
## "quantile.default" 0.06 0.42 0.00 0.00
## "variable.summary" 0.06 0.42 0.00 0.00
## "crossprod" 0.04 0.28 0.04 0.28
## "tcrossprod" 0.04 0.28 0.04 0.28
## "na.omit.data.frame" 0.04 0.28 0.02 0.14
## "sort.int" 0.04 0.28 0.02 0.14
## ".External2" 0.04 0.28 0.00 0.00
## "model.frame" 0.04 0.28 0.00 0.00
## "model.frame.default" 0.04 0.28 0.00 0.00
## "na.omit" 0.04 0.28 0.00 0.00
## "sort" 0.04 0.28 0.00 0.00
## "sort.default" 0.04 0.28 0.00 0.00
## "[.data.frame" 0.02 0.14 0.02 0.14
## "double" 0.02 0.14 0.02 0.14
## "is.na" 0.02 0.14 0.02 0.14
## "[" 0.02 0.14 0.00 0.00
## "format_perc" 0.02 0.14 0.00 0.00
## "formatC" 0.02 0.14 0.00 0.00
## "paste0" 0.02 0.14 0.00 0.00
## "pmax" 0.02 0.14 0.00 0.00
## "vapply" 0.02 0.14 0.00 0.00
##
## $sample.interval
## [1] 0.02
##
## $sampling.time
## [1] 14.44
--- .smallerSpacing
- For shorter runs, Rprof can be misleading and the `microbenchmark` package can be useful
library(microbenchmark)
x <- numeric(10000)
microbenchmark(unlist(lapply(x, function(y) y + 1)), for (i in seq_along(x)) x[i] <- 1)
## Unit: milliseconds
## expr min lq mean median uq max neval
## unlist(lapply(x, function(y) y + 1)) 4.511981 4.829220 5.218576 4.964749 5.241842 7.643321 100
## for (i in seq_along(x)) x[i] <- 1 7.600157 7.687542 8.199546 7.955580 8.073601 13.383509 100
- Need to consider garbage collection; see Radford Neal on `microbenchmark` issues
- See `tracemem`, `Rprofmem`, or `Rprof` with `memory.profiling` on for tracking memory usage
- Contributed packages `profr` and `proftools` provide additional analysis of profiling results
- See Chapter 3 of Writing R Extensions manual
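When no extra package is available, base R's `system.time` gives a rougher version of the same comparison. This is only a sketch: the functions `LoopAdd` and `VecAdd` are made up, the repetition count is arbitrary, and timings will differ between machines and runs.

```r
# Two hypothetical implementations of the same operation
LoopAdd <- function(x) {
  for (i in seq_along(x)) x[i] <- x[i] + 1
  x
}
VecAdd <- function(x) x + 1

x <- numeric(1e5)

# Check that both give the same answer before comparing speed
stopifnot(identical(LoopAdd(x), VecAdd(x)))

# Repeat each call several times so the elapsed time is measurable
timings <- sapply(list(loop = LoopAdd, vectorised = VecAdd),
                  function(f) system.time(for (k in 1:20) f(x))[["elapsed"]])
timings
```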
--- .mediumSpacing
- To start a package use function `package.skeleton`
  - specify package name, directory to create pkg in and specify objects to include in pkg
- What to put in the package is specified using one of the following arguments
  - `list` - character vector specifying object names
  - `environment` - an environment containing objects to add to the package
  - `code_files` - character vector of path names to `.R` files (sourced to create pkg)
- The result is a directory named after your package containing:
  - Directory `R` containing source code (one file per object or a copy of `code_files`)
  - Directory `man` containing outlines of documentation files for each object
  - Directory `data` for data in .rda files
  - NAMESPACE file - objects to be imported and exported
  - DESCRIPTION file - basic info about the package
  - Read-and-delete-me file - do as the name suggests
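A sketch of a call to `package.skeleton` that writes into a temporary directory. The package name `demoPkg` and the objects `AddOne` and `my.data` are hypothetical.

```r
# Put the objects for the new package in their own environment
pkg.env <- new.env()
pkg.env$AddOne <- function(x) x + 1
pkg.env$my.data <- data.frame(a = 1:3)

# Create the skeleton in a temporary directory rather than the working directory
out.dir <- tempfile("skeleton")
dir.create(out.dir)
package.skeleton(name = "demoPkg",
                 list = c("AddOne", "my.data"),
                 environment = pkg.env,
                 path = out.dir)

# Inspect what was generated
pkg.dir <- file.path(out.dir, "demoPkg")
list.files(pkg.dir, recursive = TRUE)
```

Using a dedicated environment avoids accidentally sweeping everything in the workspace into the package.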
--- .mediumSpacing
x <- read.dcf(file = system.file("DESCRIPTION", package = "splines"))
writeLines(formatDL(names(as.data.frame(x)), as.character(x), style = "list"))
## Package: splines
## Version: 3.2.0
## Priority: base
## Imports: graphics, stats
## Title: Regression Spline Functions and Classes
## Author: Douglas M. Bates <[email protected]> and William N. Venables <[email protected]>
## Maintainer: R Core Team <[email protected]>
## Description: Regression spline functions and classes.
## License: Part of R 3.2.0
## Built: R 3.2.0; x86_64-w64-mingw32; 2015-02-27 03:19:04 UTC; windows
These are all the required fields, plus `Built`, which should not be specified by the author
--- .mediumSpacing
- `Collate`: used to specify the order the R code files should be processed in
  - If used, all files must be given; can be OS-specific, e.g. Collate.unix
- `Authors@R`: for machine-readable author specification with the `person` class
- `BugReports`: A URL to send bug reports to
- `Imports`: List of packages whose namespaces you use (comma-separated, no quotes)
- `Suggests`: Packages not necessarily needed, e.g. only used in tests or vignettes
- `LinkingTo`: Package names, to make use of their C header files
- `Enhances`: Packages your package enhances
- `Depends`: Packages that must be attached to load your package
  - `Depends` is often incorrectly used instead of `Imports`
- Can also specify version number requirements, including for R itself
  - e.g. `Depends: R (>= 3.0.0)`
- Many others. Only fields that should not be used are `Built` and `Packaged`
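To see how several of these fields fit together, the sketch below writes a DESCRIPTION file with `write.dcf` and reads it back. Every value (package name, author, e-mail address, dependencies) is a placeholder invented for the example.

```r
# Hypothetical field values for an imaginary package
desc <- data.frame(
  Package = "demoPkg",
  Version = "0.1.0",
  Title = "A Demonstration Package",
  `Authors@R` = 'person("Jane", "Doe", email = "jane@example.com", role = c("aut", "cre"))',
  Description = "Shows the layout of a DESCRIPTION file.",
  License = "GPL-3",
  Depends = "R (>= 3.0.0)",
  Imports = "stats, utils",
  Suggests = "testthat",
  check.names = FALSE,       # keep the name `Authors@R` as written
  stringsAsFactors = FALSE
)

desc.file <- tempfile("DESCRIPTION")
write.dcf(desc, file = desc.file)

writeLines(readLines(desc.file))  # show the file as it would appear on disk
back <- read.dcf(desc.file)       # parse it again into a character matrix
```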
--- .mediumSpacing
# lists possible functions for NAMESPACE file
names(parseNamespaceFile("utils", R.home("library")))
## [1] "imports" "exports" "exportPatterns" "importClasses"
## [5] "importMethods" "exportClasses" "exportMethods" "exportClassPatterns"
## [9] "dynlibs" "nativeRoutines" "S3methods"
- What should be exported? (user can use with `::`)
  - `export(FnctnName)`
  - `exportPattern(reg.ex)` - exports all objects whose names match `reg.ex`
- What needs to be imported from other packages?
  - `import(pkg.name)`, `importFrom(pkg.name, fnctn.name)`
  - importing an entire package (using `import`) should be avoided, if possible
- What compiled code needs to be included: `useDynLib(foo)`
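A sketch of a small NAMESPACE file, written to a temporary library directory and parsed with `parseNamespaceFile` so the directives can be inspected. The package name `demoPkg`, the function `AddOne`, and the class `MyClass` are hypothetical.

```r
# Build <temp lib>/demoPkg/NAMESPACE
pkg.lib <- tempfile("lib")
dir.create(file.path(pkg.lib, "demoPkg"), recursive = TRUE)
writeLines(c('export(AddOne)',             # export a single function
             'exportPattern("^Get")',      # export every object whose name starts with "Get"
             'importFrom(stats, sd)',      # import one function rather than all of stats
             'S3method(print, MyClass)'),  # register an S3 method
           file.path(pkg.lib, "demoPkg", "NAMESPACE"))

ns <- parseNamespaceFile("demoPkg", pkg.lib)
ns$exports
ns$exportPatterns
ns$S3methods
```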
- LaTeX-like help files that get converted to LaTeX, HTML, and text when pkg built
  - use function `prompt` to generate an outline (skeleton) for a new function for the pkg
  - Should contain examples that are checked when package is built
- Chapter 2 of Writing R Extensions Manual
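A sketch of `prompt` in use. The function `MyFun` is a placeholder, and the outline is written to a temporary file instead of the working directory.

```r
# Hypothetical function to document
MyFun <- function(x = "hi") toupper(x)

# Generate a documentation outline for it
rd.file <- tempfile(fileext = ".Rd")
prompt(MyFun, filename = rd.file)

# The outline contains the LaTeX-like sections to be filled in by hand
rd.lines <- readLines(rd.file)
head(rd.lines, 10)
```

The generated file still has to be edited: the title, description, argument descriptions, and examples are placeholders.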
--- .mediumSpacing
- With the `roxygen2` package, you write documentation above function definition in source file
- Function `roxygenise()` parses this code into a documentation file
- Can also create NAMESPACE and DESCRIPTION files
- Each line starts with `#'`, then just replace LaTeX-like markup with `@`
For example:
#' This is the title of the documentation file
#'
#' This is the description of the function
#' @param x a character vector
#' @return string; x - processed some way
#' @details These are the details
#' @export
#' @importFrom tools toRd
#' @seealso \code{\link{table}}
#' @examples MyFun(c("hi", "bye"))
MyFun <- function(x = "hi"){
  # function body goes here
}
- tests - additional code for testing the package
- demo - code the user can run with the `demo` function
- inst - additional data that is not in `.rda` format
- src - for compiled code
- RStudio
  - Can add a special pane for checking, building, and installing packages to its GUI
- package `testthat`
  - tools for making testing easier (important)
- package `devtools`
  - Many useful functions when writing code for packages
system2("open", file.path(R.home("doc"), "manual", "R-exts.pdf"))
- Other manuals: `list.files(R.home("doc/manual"))`
- Tutorial on creating packages by R Core Team member F. Leisch
- Lecture notes on how to create R packages by R Core member B. Ripley
- CRAN policies
- John Chambers - Newest book, the "Green Book" (S4), the "White Book" (S3)
- Hadley Wickham - Advanced R Programming and the companion package `pryr`
- Search and ask questions on Stack Overflow and the R mailing lists
- R functions `??` or `help.search`; `RSiteSearch` to use the R-project search engine
- CRANberries for new packages and updates