forked from swcarpentry/shell-novice
-
Notifications
You must be signed in to change notification settings - Fork 0
/
04-loop.html
334 lines (334 loc) · 23.2 KB
/
04-loop.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: The Unix Shell</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<meta charset="UTF-8" />
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<article>
<div class="row">
<div class="col-md-10 col-md-offset-1">
<h1 class="title">The Unix Shell</h1>
<h2 class="subtitle">Loops</h2>
<section class="objectives panel panel-warning">
<div class="panel-heading">
<h2 id="learning-objectives"><span class="glyphicon glyphicon-certificate"></span>Learning Objectives</h2>
</div>
<div class="panel-body">
<ul>
<li>Write a loop that applies one or more commands separately to each file in a set of files.</li>
<li>Trace the values taken on by a loop variable during execution of the loop.</li>
<li>Explain the difference between a variable’s name and its value.</li>
<li>Explain why spaces and some punctuation characters shouldn’t be used in files’ names.</li>
<li>Demonstrate how to see what commands have recently been executed.</li>
<li>Re-run recently executed commands without retyping them.</li>
</ul>
</div>
</section>
<p>Wildcards and tab completion are two ways to reduce typing (and typing mistakes). Another is to tell the shell to do something over and over again. Suppose we have several hundred genome data files named <code>basilisk.dat</code>, <code>unicorn.dat</code>, and so on. In this example, we’ll use the <code>creatures</code> directory which only has two example files, but the principles can be applied to many many more files at once. We would like to modify these files, but also save a version of the original files and rename them as <code>original-basilisk.dat</code> and <code>original-unicorn.dat</code>. We can’t use:</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">mv</span> *.dat original-*.dat</code></pre>
<p>because that would expand to:</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">mv</span> basilisk.dat unicorn.dat original-*.dat</code></pre>
<p>This wouldn’t back up our files, instead we get an error</p>
<pre class="error"><code>mv: target `original-*.dat' is not a directory</code></pre>
<p>This a problem arises when <code>mv</code> receives more than two inputs. When this happens, it expects the last input to be a directory where it can move all the files it was passed. Since there is no directory named <code>original-*.dat</code> in the <code>creatures</code> directory we get an error.</p>
<p>Instead, we can use a <strong>loop</strong> to do some operation once for each thing in a list. Here’s a simple example that displays the first three lines of each file in turn:</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">for</span> <span class="kw">filename</span> in basilisk.dat unicorn.dat
<span class="kw">></span> <span class="kw">do</span>
<span class="kw">></span> <span class="kw">head</span> -3 <span class="ot">$filename</span>
<span class="kw">></span> <span class="kw">done</span></code></pre>
<pre class="output"><code>COMMON NAME: basilisk
CLASSIFICATION: basiliscus vulgaris
UPDATED: 1745-05-02
COMMON NAME: unicorn
CLASSIFICATION: equus monoceros
UPDATED: 1738-11-24</code></pre>
<p>When the shell sees the keyword <code>for</code>, it knows it is supposed to repeat a command (or group of commands) once for each thing in a list. In this case, the list is the two filenames. Each time through the loop, the name of the thing currently being operated on is assigned to the <strong>variable</strong> called <code>filename</code>. Inside the loop, we get the variable’s value by putting <code>$</code> in front of it: <code>$filename</code> is <code>basilisk.dat</code> the first time through the loop, <code>unicorn.dat</code> the second, and so on.</p>
<p>By using the dollar sign we are telling the shell interpreter to treat <code>filename</code> as a variable name and substitute its value on its place, but not as some text or external command. When using variables it is also possible to put the names into curly braces to clearly delimit the variable name: <code>$filename</code> is equivalent to <code>${filename}</code>, but is different from <code>${file}name</code>. You may find this notation in other people’s programs.</p>
<p>Finally, the command that’s actually being run is our old friend <code>head</code>, so this loop prints out the first three lines of each data file in turn.</p>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2 id="follow-the-prompt"><span class="glyphicon glyphicon-pushpin"></span>Follow the Prompt</h2>
</div>
<div class="panel-body">
<p>The shell prompt changes from <code>$</code> to <code>></code> and back again as we were typing in our loop. The second prompt, <code>></code>, is different to remind us that we haven’t finished typing a complete command yet.</p>
</div>
</aside>
<p>We have called the variable in this loop <code>filename</code> in order to make its purpose clearer to human readers. The shell itself doesn’t care what the variable is called; if we wrote this loop as:</p>
<pre class="sourceCode bash"><code class="sourceCode bash"><span class="kw">for</span> <span class="kw">x</span> in basilisk.dat unicorn.dat
<span class="kw">do</span>
<span class="kw">head</span> -3 <span class="ot">$x</span>
<span class="kw">done</span></code></pre>
<p>or:</p>
<pre class="sourceCode bash"><code class="sourceCode bash"><span class="kw">for</span> <span class="kw">temperature</span> in basilisk.dat unicorn.dat
<span class="kw">do</span>
<span class="kw">head</span> -3 <span class="ot">$temperature</span>
<span class="kw">done</span></code></pre>
<p>it would work exactly the same way. <em>Don’t do this.</em> Programs are only useful if people can understand them, so meaningless names (like <code>x</code>) or misleading names (like <code>temperature</code>) increase the odds that the program won’t do what its readers think it does.</p>
<p>Here’s a slightly more complicated loop:</p>
<pre class="sourceCode bash"><code class="sourceCode bash"><span class="kw">for</span> <span class="kw">filename</span> in *.dat
<span class="kw">do</span>
<span class="kw">echo</span> <span class="ot">$filename</span>
<span class="kw">head</span> -100 <span class="ot">$filename</span> <span class="kw">|</span> <span class="kw">tail</span> -20
<span class="kw">done</span></code></pre>
<p>The shell starts by expanding <code>*.dat</code> to create the list of files it will process. The <strong>loop body</strong> then executes two commands for each of those files. The first, <code>echo</code>, just prints its command-line parameters to standard output. For example:</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">echo</span> hello there</code></pre>
<p>prints:</p>
<pre class="output"><code>hello there</code></pre>
<p>In this case, since the shell expands <code>$filename</code> to be the name of a file, <code>echo $filename</code> just prints the name of the file. Note that we can’t write this as:</p>
<pre class="sourceCode bash"><code class="sourceCode bash"><span class="kw">for</span> <span class="kw">filename</span> in *.dat
<span class="kw">do</span>
<span class="ot">$filename</span>
<span class="kw">head</span> -100 <span class="ot">$filename</span> <span class="kw">|</span> <span class="kw">tail</span> -20
<span class="kw">done</span></code></pre>
<p>because then the first time through the loop, when <code>$filename</code> expanded to <code>basilisk.dat</code>, the shell would try to run <code>basilisk.dat</code> as a program. Finally, the <code>head</code> and <code>tail</code> combination selects lines 81-100 from whatever file is being processed.</p>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2 id="spaces-in-names"><span class="glyphicon glyphicon-pushpin"></span>Spaces in Names</h2>
</div>
<div class="panel-body">
<p>Filename expansion in loops is another reason you should not use spaces in filenames. Suppose our data files are named:</p>
<pre><code>basilisk.dat
red dragon.dat
unicorn.dat</code></pre>
<p>If we try to process them using:</p>
<pre><code>for filename in *.dat
do
head -100 $filename | tail -20
done</code></pre>
<p>then the shell will expand <code>*.dat</code> to create:</p>
<pre><code>basilisk.dat red dragon.dat unicorn.dat</code></pre>
<p>With older versions of Bash, or most other shells, <code>filename</code> will then be assigned the following values in turn:</p>
<pre><code>basilisk.dat
red
dragon.dat
unicorn.dat</code></pre>
<p>That’s a problem: <code>head</code> can’t read files called <code>red</code> and <code>dragon.dat</code> because they don’t exist, and won’t be asked to read the file <code>red dragon.dat</code>.</p>
<p>We can make our script a little bit more robust by <strong>quoting</strong> our use of the variable:</p>
<pre><code>for filename in *.dat
do
head -100 "$filename" | tail -20
done</code></pre>
<p>but it’s simpler just to avoid using spaces (or other special characters) in filenames.</p>
</div>
</aside>
<p>Going back to our original file renaming problem, we can solve it using this loop:</p>
<pre class="sourceCode bash"><code class="sourceCode bash"><span class="kw">for</span> <span class="kw">filename</span> in *.dat
<span class="kw">do</span>
<span class="kw">mv</span> <span class="ot">$filename</span> original-<span class="ot">$filename</span>
<span class="kw">done</span></code></pre>
<p>This loop runs the <code>mv</code> command once for each filename. The first time, when <code>$filename</code> expands to <code>basilisk.dat</code>, the shell executes:</p>
<pre class="sourceCode bash"><code class="sourceCode bash"><span class="kw">mv</span> basilisk.dat original-basilisk.dat</code></pre>
<p>The second time, the command is:</p>
<pre class="sourceCode bash"><code class="sourceCode bash"><span class="kw">mv</span> unicorn.dat original-unicorn.dat</code></pre>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2 id="measure-twice-run-once"><span class="glyphicon glyphicon-pushpin"></span>Measure Twice, Run Once</h2>
</div>
<div class="panel-body">
<p>A loop is a way to do many things at once — or to make many mistakes at once if it does the wrong thing. One way to check what a loop <em>would</em> do is to echo the commands it would run instead of actually running them. For example, we could write our file renaming loop like this:</p>
<pre><code>for filename in *.dat
do
echo mv $filename original-$filename
done</code></pre>
<p>Instead of running <code>mv</code>, this loop runs <code>echo</code>, which prints out:</p>
<pre><code>mv basilisk.dat original-basilisk.dat
mv unicorn.dat original-unicorn.dat</code></pre>
<p><em>without</em> actually running those commands. We can then use up-arrow to redisplay the loop, back-arrow to get to the word <code>echo</code>, delete it, and then press “enter” to run the loop with the actual <code>mv</code> commands. This isn’t foolproof, but it’s a handy way to see what’s going to happen when you’re still learning how loops work.</p>
</div>
</aside>
<h2 id="nelles-pipeline-processing-files">Nelle’s Pipeline: Processing Files</h2>
<p>Nelle is now ready to process her data files. Since she’s still learning how to use the shell, she decides to build up the required commands in stages. Her first step is to make sure that she can select the right files — remember, these are ones whose names end in ‘A’ or ‘B’, rather than ‘Z’. Starting from her home directory, Nelle types:</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">cd</span> north-pacific-gyre/2012-07-03
$ <span class="kw">for</span> <span class="kw">datafile</span> in *[AB].txt
<span class="kw">></span> <span class="kw">do</span>
<span class="kw">></span> <span class="kw">echo</span> <span class="ot">$datafile</span>
<span class="kw">></span> <span class="kw">done</span></code></pre>
<pre class="output"><code>NENE01729A.txt
NENE01729B.txt
NENE01736A.txt
...
NENE02043A.txt
NENE02043B.txt</code></pre>
<p>Her next step is to decide what to call the files that the <code>goostats</code> analysis program will create. Prefixing each input file’s name with “stats” seems simple, so she modifies her loop to do that:</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">for</span> <span class="kw">datafile</span> in *[AB].txt
<span class="kw">></span> <span class="kw">do</span>
<span class="kw">></span> <span class="kw">echo</span> <span class="ot">$datafile</span> stats-<span class="ot">$datafile</span>
<span class="kw">></span> <span class="kw">done</span></code></pre>
<pre class="output"><code>NENE01729A.txt stats-NENE01729A.txt
NENE01729B.txt stats-NENE01729B.txt
NENE01736A.txt stats-NENE01736A.txt
...
NENE02043A.txt stats-NENE02043A.txt
NENE02043B.txt stats-NENE02043B.txt</code></pre>
<p>She hasn’t actually run <code>goostats</code> yet, but now she’s sure she can select the right files and generate the right output filenames.</p>
<p>Typing in commands over and over again is becoming tedious, though, and Nelle is worried about making mistakes, so instead of re-entering her loop, she presses the up arrow. In response, the shell redisplays the whole loop on one line (using semi-colons to separate the pieces):</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">for</span> <span class="kw">datafile</span> in *[AB].txt<span class="kw">;</span> <span class="kw">do</span> <span class="kw">echo</span> <span class="ot">$datafile</span> stats-<span class="ot">$datafile</span><span class="kw">;</span> <span class="kw">done</span></code></pre>
<p>Using the left arrow key, Nelle backs up and changes the command <code>echo</code> to <code>goostats</code>:</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">for</span> <span class="kw">datafile</span> in *[AB].txt<span class="kw">;</span> <span class="kw">do</span> <span class="kw">bash</span> goostats <span class="ot">$datafile</span> stats-<span class="ot">$datafile</span><span class="kw">;</span> <span class="kw">done</span></code></pre>
<p>When she presses enter, the shell runs the modified command. However, nothing appears to happen — there is no output. After a moment, Nelle realizes that since her script doesn’t print anything to the screen any longer, she has no idea whether it is running, much less how quickly. She kills the job by typing Control-C, uses up-arrow to repeat the command, and edits it to read:</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">for</span> <span class="kw">datafile</span> in *[AB].txt<span class="kw">;</span> <span class="kw">do</span> <span class="kw">echo</span> <span class="ot">$datafile</span><span class="kw">;</span> <span class="kw">bash</span> goostats <span class="ot">$datafile</span> stats-<span class="ot">$datafile</span><span class="kw">;</span> <span class="kw">done</span></code></pre>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2 id="beginning-and-end"><span class="glyphicon glyphicon-pushpin"></span>Beginning and End</h2>
</div>
<div class="panel-body">
<p>We can move to the beginning of a line in the shell by typing <code>^A</code> (which means Control-A) and to the end using <code>^E</code>.</p>
</div>
</aside>
<p>When she runs her program now, it produces one line of output every five seconds or so:</p>
<pre class="output"><code>NENE01729A.txt
NENE01729B.txt
NENE01736A.txt
...</code></pre>
<p>1518 times 5 seconds, divided by 60, tells her that her script will take about two hours to run. As a final check, she opens another terminal window, goes into <code>north-pacific-gyre/2012-07-03</code>, and uses <code>cat stats-NENE01729B.txt</code> to examine one of the output files. It looks good, so she decides to get some coffee and catch up on her reading.</p>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2 id="those-who-know-history-can-choose-to-repeat-it"><span class="glyphicon glyphicon-pushpin"></span>Those Who Know History Can Choose to Repeat It</h2>
</div>
<div class="panel-body">
<p>Another way to repeat previous work is to use the <code>history</code> command to get a list of the last few hundred commands that have been executed, and then to use <code>!123</code> (where “123” is replaced by the command number) to repeat one of those commands. For example, if Nelle types this:</p>
<pre><code>$ history | tail -5
456 ls -l NENE0*.txt
457 rm stats-NENE01729B.txt.txt
458 bash goostats NENE01729B.txt stats-NENE01729B.txt
459 ls -l NENE0*.txt
460 history</code></pre>
<p>then she can re-run <code>goostats</code> on <code>NENE01729B.txt</code> simply by typing <code>!458</code>.</p>
</div>
</aside>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="variables-in-loops"><span class="glyphicon glyphicon-pencil"></span>Variables in Loops</h2>
</div>
<div class="panel-body">
<p>Suppose that <code>ls</code> initially displays:</p>
<pre><code>fructose.dat glucose.dat sucrose.dat</code></pre>
<p>What is the output of:</p>
<pre><code>for datafile in *.dat
do
ls *.dat
done</code></pre>
<p>Now, what is the output of:</p>
<pre><code>for datafile in *.dat
do
ls $datafile
done</code></pre>
<p>Why do these two loops give you different outputs?</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="saving-to-a-file-in-a-loop---part-one"><span class="glyphicon glyphicon-pencil"></span>Saving to a File in a Loop - Part One</h2>
</div>
<div class="panel-body">
<p>In the same directory, what is the effect of this loop?</p>
<pre><code>for sugar in *.dat
do
echo $sugar
cat $sugar > xylose.dat
done</code></pre>
<ol style="list-style-type: decimal">
<li>Prints <code>fructose.dat</code>, <code>glucose.dat</code>, and <code>sucrose.dat</code>, and copies <code>sucrose.dat</code> to create <code>xylose.dat</code>.</li>
<li>Prints <code>fructose.dat</code>, <code>glucose.dat</code>, and <code>sucrose.dat</code>, and concatenates all three files to create <code>xylose.dat</code>.</li>
<li>Prints <code>fructose.dat</code>, <code>glucose.dat</code>, <code>sucrose.dat</code>, and <code>xylose.dat</code>, and copies <code>sucrose.dat</code> to create <code>xylose.dat</code>.</li>
<li>None of the above.</li>
</ol>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="saving-to-a-file-in-a-loop---part-two"><span class="glyphicon glyphicon-pencil"></span>Saving to a File in a Loop - Part Two</h2>
</div>
<div class="panel-body">
<p>In another directory, where <code>ls</code> returns:</p>
<pre><code>fructose.dat glucose.dat sucrose.dat maltose.txt</code></pre>
<p>What would be the output of the following loop?</p>
<pre><code>for datafile in *.dat
do
cat $datafile >> sugar.dat
done</code></pre>
<ol style="list-style-type: decimal">
<li>All of the text from <code>fructose.dat</code>, <code>glucose.dat</code> and <code>sucrose.dat</code> would be concatenated and saved to a file called <code>sugar.dat</code>.</li>
<li>The text from <code>sucrose.dat</code> will be saved to a file called <code>sugar.dat</code>.</li>
<li>All of the text from <code>fructose.dat</code>, <code>glucose.dat</code>, <code>sucrose.dat</code> and <code>maltose.txt</code> would be concatenated and saved to a file called <code>sugar.dat</code>.</li>
<li>All of the text from <code>fructose.dat</code>, <code>glucose.dat</code> and <code>sucrose.dat</code> would be printed to the screen and saved to a file called <code>sugar.dat</code></li>
</ol>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="doing-a-dry-run"><span class="glyphicon glyphicon-pencil"></span>Doing a Dry Run</h2>
</div>
<div class="panel-body">
<p>Suppose we want to preview the commands the following loop will execute without actually running those commands:</p>
<pre class="sourceCode bash"><code class="sourceCode bash"><span class="kw">for</span> <span class="kw">file</span> in *.dat
<span class="kw">do</span>
<span class="kw">analyze</span> <span class="ot">$file</span> <span class="kw">></span> analyzed-<span class="ot">$file</span>
<span class="kw">done</span></code></pre>
<p>What is the difference between the the two loops below, and which one would we want to run?</p>
<pre class="sourceCode bash"><code class="sourceCode bash"><span class="co"># Version 1</span>
<span class="kw">for</span> <span class="kw">file</span> in *.dat
<span class="kw">do</span>
<span class="kw">echo</span> analyze <span class="ot">$file</span> <span class="kw">></span> analyzed-<span class="ot">$file</span>
<span class="kw">done</span></code></pre>
<pre class="sourceCode bash"><code class="sourceCode bash"><span class="co"># Version 2</span>
<span class="kw">for</span> <span class="kw">file</span> in *.dat
<span class="kw">do</span>
<span class="kw">echo</span> <span class="st">"analyze </span><span class="ot">$file</span><span class="st"> > analyzed-</span><span class="ot">$file</span><span class="st">"</span>
<span class="kw">done</span></code></pre>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="nested-loops-and-command-line-expressions"><span class="glyphicon glyphicon-pencil"></span>Nested Loops and Command-Line Expressions</h2>
</div>
<div class="panel-body">
<p>The <code>expr</code> does simple arithmetic using command-line parameters:</p>
<pre><code>$ expr 3 + 5
8
$ expr 30 / 5 - 2
4</code></pre>
<p>Given this, what is the output of:</p>
<pre><code>for left in 2 3
do
for right in $left
do
expr $left + $right
done
done</code></pre>
</div>
</section>
</div>
</div>
</article>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/swcarpentry/shell-novice">Source</a>
<a class="label swc-blue-bg" href="mailto:[email protected]">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
</body>
</html>