# tocID <- "RPR-Pipe.R"
#
# Purpose: A Bioinformatics Course:
# Discussing pipe operators.
#
# Version: 1.0
#
# Date: 2021 10
# Author: Boris Steipe
#
# Versions:
# 1.0 New code
#
#
# TODO:
# - find more interesting examples
#
# == DO NOT SIMPLY source() THIS FILE! =======================================
#
# If there are portions you don't understand, use R's help system, Google for an
# answer, or ask your instructor. Don't continue if you don't understand what's
# going on. That's not how it works ...
#
# ==============================================================================
#TOC> ==========================================================================
#TOC>
#TOC>   Section  Title
#TOC> ------------------------------------------------
#TOC>   1        Pipe Concept
#TOC>   2        Nested Expression
#TOC>   3        magrittr:: Pipe
#TOC>   4        Base R Pipe
#TOC>   5        Intermediate Assignment
#TOC>   6        Postscript
#TOC>
#TOC> ==========================================================================
# = 1 Pipe Concept =======================================================
# Pipes are actually an awesome idea for any code that implements a workflow -
# a sequence of operations, each of which transforms data in a specialized way.
#
# This principle is familiar from maths: chained functions. If I have a function
# y = f(x) and want to use its result in z = g(y), I can just write
# z = g(f(x))
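#
# A minimal sketch of this idea in R. The functions f() and g() are hypothetical
# and only serve as an illustration; they are not part of the course code.
f <- function(x) { x + 1 }    # hypothetical: y = f(x)
g <- function(y) { y * 2 }    # hypothetical: z = g(y)
y <- f(3)                     # first step
z <- g(y)                     # second step, uses the result of the first
zNested <- g(f(3))            # the same computation as one chained expression
identical(z, zNested)         # TRUE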
#
# On the Unix command line, pipes were used from the very beginning, implemented
# with the "|" pipe character.
#
# In R, the magrittr package provided the %>% operator, and recently the |>
# operator has been introduced into base R.
#
# However, there are alternatives: intermediate assignment and nested function
# calls, both of which have always existed in base R.
#
# Let us look at an example. In writing this, I found that virtually ALL
# non-trivial examples I came up with don't translate well into this idiom at
# all. It is actually quite limited to simple filtering operations on data. A
# more interesting example might be added in the future; let me know if you
# have a good idea.
#
# A somewhat contrived example is to sort a list of files by the
# length of the file names:
myFiles <- list.files(pattern = "\\.R$")
# nchar() gives the number of characters in a string, order() produces indices
# that map an array to its sorted form.
#
# = 2 Nested Expression ===================================================
myFiles[order(nchar(myFiles))]
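# To see what each layer of the nested expression does, here is a small worked
# example. The vector demoFiles is made up for illustration; the files do not
# need to exist on disk. The expression is evaluated from the inside out.
demoFiles <- c("longerName.R", "a.R", "mid.R")
nchar(demoFiles)                      # 12 3 5
order(nchar(demoFiles))               # 2 3 1
demoFiles[order(nchar(demoFiles))]    # "a.R" "mid.R" "longerName.R"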
# = 3 magrittr:: Pipe =====================================================
if (! requireNamespace("magrittr", quietly = TRUE)) {
  install.packages("magrittr")
}
# Package information:
#  library(help = magrittr)       # basic information
#  browseVignettes("magrittr")    # available vignettes
#  data(package = "magrittr")     # available datasets
library(magrittr)
myFiles %>% nchar %>% order %>% myFiles[.]
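# The "." in the last step is magrittr's placeholder: it stands for the value
# that arrives from the left-hand side. As a sketch, the following compares the
# piped form with the nested form on a made-up vector (pipeDemo, nestedOut and
# pipedOut are names assumed for illustration). The piped part only runs if
# magrittr is installed.
pipeDemo <- c("ccc.R", "a.R", "bb.R")
nestedOut <- pipeDemo[order(nchar(pipeDemo))]
if (requireNamespace("magrittr", quietly = TRUE)) {
  library(magrittr)
  pipedOut <- pipeDemo %>% nchar %>% order %>% pipeDemo[.]
  print(identical(pipedOut, nestedOut))    # both forms give the same vector
}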
# = 4 Base R Pipe =========================================================
# Since version 4.1, base R supports a pipe operator without the need to load
# a special package. Such an introduction of external functionality into the
# language is very rare.
#
# Unfortunately the right-hand side of |> must be a function call and there is
# no "." placeholder, so it won't (yet) work with the '[' function the way
# myFiles[.] did above. We need to write an intermediate function for this
# example:
extract <- function(x, v) {
  # return the elements of v that are selected by the index vector x
  return(v[x])
}
myFiles |> nchar() |> order() |> extract(myFiles)
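# A possible alternative that avoids a named helper: the right-hand side of |>
# may be a call to an anonymous function wrapped in parentheses. This is only a
# sketch; it assumes R 4.1 or later (for both |> and the \(i) shorthand), and
# baseDemo and basePiped are names made up for illustration.
baseDemo <- c("ccc.R", "a.R", "bb.R")
basePiped <- baseDemo |> nchar() |> order() |> (\(i) baseDemo[i])()
basePiped    # "a.R" "bb.R" "ccc.R"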
# = 5 Intermediate Assignment =============================================
# So what's the problem? As you can see, the piped code may be concise and
# expressive. But there is also a large amount of implicit assignment and
# processing going on, and that is usually a bad idea because it makes code
# hard to maintain. I am NOT a big fan of the nested syntax, but I don't think
# that replacing it with the pipe makes things much better. My preferred idiom
# is to use intermediate assignments. Only then is it convenient to examine
# the code step by step and validate every single step. And that is the most
# important objective of all: no code is good if it does not compute
# correctly.
x <- nchar(myFiles)
x <- order(x)
myFiles[x]
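# A sketch of what validating every single step can look like. The vector
# stepDemo, the variable names and the particular checks are assumptions chosen
# for illustration; other checks may suit real data better.
stepDemo <- c("ccc.R", "a.R", "bb.R")
nameLengths <- nchar(stepDemo)
stopifnot(length(nameLengths) == length(stepDemo))   # one length per name
sortIdx <- order(nameLengths)
stopifnot(setequal(sortIdx, seq_along(stepDemo)))    # a permutation of 1..n
sortedDemo <- stepDemo[sortIdx]
stopifnot(! is.unsorted(nchar(sortedDemo)))          # lengths are non-decreasing
sortedDemo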
# = 6 Postscript ==========================================================
# I tried to write an example that strips all comments from a list of files, and
# another example that finds all files that were not yet updated this year
# (according to the "# Date:" line in the header). Neither example can be
# written well without intermediate assignments, or at least without sapply()
# calls that are no simpler than the intermediate assignments.
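#
# As a rough sketch of the comment-stripping idea with intermediate
# assignments, for a single file: the temporary file and its contents are made
# up for illustration, and only whole-line comments are removed (trailing
# comments after code are left alone).
tmpFile <- tempfile(fileext = ".R")
writeLines(c("# a comment", "a <- 1", "  # indented comment", "b <- a + 1"),
           tmpFile)
srcLines <- readLines(tmpFile)                     # read the file line by line
isComment <- grepl("^[[:space:]]*#", srcLines)     # TRUE for comment-only lines
codeLines <- srcLines[! isComment]                 # keep everything else
codeLines                                          # "a <- 1" "b <- a + 1"
unlink(tmpFile)                                    # clean up the temporary file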
# [END]