diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 9e13a29f..37eb7480 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,106 +1,57 @@ ## Contributing -[The Carpentries][cp-site] ([Software Carpentry][swc-site], [Data -Carpentry][dc-site], and [Library Carpentry][lc-site]) are open source -projects, and we welcome contributions of all kinds: new lessons, fixes to -existing material, bug reports, and reviews of proposed changes are all -welcome. +[The Carpentries][cp-site] ([Software Carpentry][swc-site], [Data Carpentry][dc-site], and [Library Carpentry][lc-site]) are open source projects, and we welcome contributions of all kinds: new lessons, fixes to existing material, bug reports, and reviews of proposed changes are all welcome. ### Contributor Agreement -By contributing, you agree that we may redistribute your work under [our -license](LICENSE.md). In exchange, we will address your issues and/or assess -your change proposal as promptly as we can, and help you become a member of our -community. Everyone involved in [The Carpentries][cp-site] agrees to abide by -our [code of conduct](CODE_OF_CONDUCT.md). +By contributing, you agree that we may redistribute your work under [our license](LICENSE.md). In exchange, we will address your issues and/or assess your change proposal as promptly as we can, and help you become a member of our community. Everyone involved in [The Carpentries][cp-site] agrees to abide by our [code of conduct](CODE_OF_CONDUCT.md). ### How to Contribute -The easiest way to get started is to file an issue to tell us about a spelling -mistake, some awkward wording, or a factual error. This is a good way to -introduce yourself and to meet some of our community members. +The easiest way to get started is to file an issue to tell us about a spelling mistake, some awkward wording, or a factual error. This is a good way to introduce yourself and to meet some of our community members. -1. 
If you do not have a [GitHub][github] account, you can [send us comments by - email][contact]. However, we will be able to respond more quickly if you use - one of the other methods described below. +1. If you do not have a [GitHub][github] account, you can [send us comments by email][contact]. However, we will be able to respond more quickly if you use one of the other methods described below. -2. If you have a [GitHub][github] account, or are willing to [create - one][github-join], but do not know how to use Git, you can report problems - or suggest improvements by [creating an issue][repo-issues]. This allows us - to assign the item to someone and to respond to it in a threaded discussion. +2. If you have a [GitHub][github] account, or are willing to [create one][github-join], but do not know how to use Git, you can report problems or suggest improvements by [creating an issue][repo-issues]. This allows us to assign the item to someone and to respond to it in a threaded discussion. -3. If you are comfortable with Git, and would like to add or change material, - you can submit a pull request (PR). Instructions for doing this are - [included below](#using-github). For inspiration about changes that need to - be made, check out the [list of open issues][issues] across the Carpentries. +3. If you are comfortable with Git, and would like to add or change material, you can submit a pull request (PR). Instructions for doing this are [included below](#using-github). For inspiration about changes that need to be made, check out the [list of open issues][issues] across the Carpentries. -Note: if you want to build the website locally, please refer to [The Workbench -documentation][template-doc]. +Note: if you want to build the website locally, please refer to [The Workbench documentation][template-doc]. ### Where to Contribute 1. If you wish to change this lesson, add issues and pull requests here. -2. 
If you wish to change the template used for workshop websites, please refer - to [The Workbench documentation][template-doc]. - +2. If you wish to change the template used for workshop websites, please refer to [The Workbench documentation][template-doc]. ### What to Contribute -There are many ways to contribute, from writing new exercises and improving -existing ones to updating or filling in the documentation and submitting [bug -reports][issues] about things that do not work, are not clear, or are missing. -If you are looking for ideas, please see [the list of issues for this -repository][repo-issues], or the issues for [Data Carpentry][dc-issues], -[Library Carpentry][lc-issues], and [Software Carpentry][swc-issues] projects. +There are many ways to contribute, from writing new exercises and improving existing ones to updating or filling in the documentation and submitting [bug reports][issues] about things that do not work, are not clear, or are missing. If you are looking for ideas, please see [the list of issues for this repository][repo-issues], or the issues for [Data Carpentry][dc-issues], [Library Carpentry][lc-issues], and [Software Carpentry][swc-issues] projects. -Comments on issues and reviews of pull requests are just as welcome: we are -smarter together than we are on our own. **Reviews from novices and newcomers -are particularly valuable**: it's easy for people who have been using these -lessons for a while to forget how impenetrable some of this material can be, so -fresh eyes are always welcome. +Comments on issues and reviews of pull requests are just as welcome: we are smarter together than we are on our own. **Reviews from novices and newcomers are particularly valuable**: it's easy for people who have been using these lessons for a while to forget how impenetrable some of this material can be, so fresh eyes are always welcome. 
### What *Not* to Contribute -Our lessons already contain more material than we can cover in a typical -workshop, so we are usually *not* looking for more concepts or tools to add to -them. As a rule, if you want to introduce a new idea, you must (a) estimate how -long it will take to teach and (b) explain what you would take out to make room -for it. The first encourages contributors to be honest about requirements; the -second, to think hard about priorities. +Our lessons already contain more material than we can cover in a typical workshop, so we are usually *not* looking for more concepts or tools to add to them. As a rule, if you want to introduce a new idea, you must (a) estimate how long it will take to teach and (b) explain what you would take out to make room for it. The first encourages contributors to be honest about requirements; the second, to think hard about priorities. -We are also not looking for exercises or other material that only run on one -platform. Our workshops typically contain a mixture of Windows, macOS, and -Linux users; in order to be usable, our lessons must run equally well on all -three. +We are also not looking for exercises or other material that only run on one platform. Our workshops typically contain a mixture of Windows, macOS, and Linux users; in order to be usable, our lessons must run equally well on all three. ### Using GitHub -If you choose to contribute via GitHub, you may want to look at [How to -Contribute to an Open Source Project on GitHub][how-contribute]. In brief, we -use [GitHub flow][github-flow] to manage changes: +If you choose to contribute via GitHub, you may want to look at [How to Contribute to an Open Source Project on GitHub][how-contribute]. In brief, we use [GitHub flow][github-flow] to manage changes: -1. Create a new branch in your desktop copy of this repository for each - significant change. +1. Create a new branch in your desktop copy of this repository for each significant change. 2. 
Commit the change in that branch. 3. Push that branch to your fork of this repository on GitHub. 4. Submit a pull request from that branch to the [upstream repository][repo]. -5. If you receive feedback, make changes on your desktop and push to your - branch on GitHub: the pull request will update automatically. +5. If you receive feedback, make changes on your desktop and push to your branch on GitHub: the pull request will update automatically. NB: The published copy of the lesson is usually in the `main` branch. -Each lesson has a team of maintainers who review issues and pull requests or -encourage others to do so. The maintainers are community volunteers, and have -final say over what gets merged into the lesson. +Each lesson has a team of maintainers who review issues and pull requests or encourage others to do so. The maintainers are community volunteers, and have final say over what gets merged into the lesson. ### Other Resources -The Carpentries is a global organisation with volunteers and learners all over -the world. We share values of inclusivity and a passion for sharing knowledge, -teaching and learning. There are several ways to connect with The Carpentries -community listed at including via social -media, slack, newsletters, and email lists. You can also [reach us by -email][contact]. +The Carpentries is a global organisation with volunteers and learners all over the world. We share values of inclusivity and a passion for sharing knowledge, teaching and learning. There are several ways to connect with The Carpentries community listed at including via social media, slack, newsletters, and email lists. You can also [reach us by email][contact]. 
[repo]: https://github.com/LibraryCarpentry/lc-shell/ [repo-issues]: https://github.com/LibraryCarpentry/lc-shell/issues diff --git a/episodes/02-navigating-the-filesystem.md b/episodes/02-navigating-the-filesystem.md index a6865823..1afc800d 100644 --- a/episodes/02-navigating-the-filesystem.md +++ b/episodes/02-navigating-the-filesystem.md @@ -21,19 +21,11 @@ exercises: 10 We will begin with the basics of navigating the Unix shell. -Let's start by opening the shell. This likely results in seeing a black or white window with a cursor flashing next to a dollar sign. -This is our command line, and the `$` is the command **prompt** to show that the system is ready for our input. -The appearance of the prompt will vary from system to system, depending on how the set up has been configured. -Other common prompts include the `%` or `#` signs, but we will use `$` in this lesson to represent the prompt generally. - -When working in the shell, you are always *somewhere* in the computer's -file system, in some folder (directory). We will therefore start by finding out -where we are by using the `pwd` command, which you can use whenever you are unsure -about where you are. It stands for "print working directory" and the result of the -command is printed to your standard output, which is the screen. - -Let's type `pwd` and press enter to execute the command -(Note that the `$` sign is used to indicate a command to be typed on the command prompt, +Let's start by opening the shell. This likely results in seeing a black or white window with a cursor flashing next to a dollar sign. This is our command line, and the `$` is the command **prompt** to show that the system is ready for our input. The appearance of the prompt will vary from system to system, depending on how the set up has been configured. Other common prompts include the `%` or `#` signs, but we will use `$` in this lesson to represent the prompt generally. 
+ +When working in the shell, you are always *somewhere* in the computer's file system, in some folder (directory). We will therefore start by finding out where we are by using the `pwd` command, which you can use whenever you are unsure about where you are. It stands for "print working directory" and the result of the command is printed to your standard output, which is the screen. + +Let's type `pwd` and press enter to execute the command (Note that the `$` sign is used to indicate a command to be typed on the command prompt, but we never type the `$` sign itself, just what follows after it.): ```bash @@ -44,8 +36,7 @@ $ pwd /Users/riley ``` -The output will be a path to your home directory. Let's check if we recognise it -by looking at the contents of the directory. To do that, we use the `ls` command. This stands for "list" and the result is a print out of all the contents in the directory: +The output will be a path to your home directory. Let's check if we recognise it by looking at the contents of the directory. To do that, we use the `ls` command. This stands for "list" and the result is a print out of all the contents in the directory: ```bash $ ls @@ -56,14 +47,9 @@ Applications Documents Library Music Public Desktop Downloads Movies Pictures ``` -We may want more information than just a list of files and directories. -We can get this by specifying various **flags** (also known as `options`, `parameters`, or, most frequently, -`arguments`) to go with our basic commands. -Arguments modify the workings of the command by telling the computer what sort of output or manipulation we want. +We may want more information than just a list of files and directories. We can get this by specifying various **flags** (also known as `options`, `parameters`, or, most frequently, `arguments`) to go with our basic commands. Arguments modify the workings of the command by telling the computer what sort of output or manipulation we want. 
-If we type `ls -l` and press enter, the computer returns a list of files that contains -information similar to what we would find in our Finder (Mac) or Explorer (Windows): -the size of the files in bytes, the date it was created or last modified, and the file name. +If we type `ls -l` and press enter, the computer returns a list of files that contains information similar to what we would find in our Finder (Mac) or Explorer (Windows): the size of the files in bytes, the date it was created or last modified, and the file name. ```bash $ ls -l @@ -81,14 +67,9 @@ drwx------+ 3 riley staff 102 Jul 16 11:30 Pictures drwxr-xr-x+ 5 riley staff 170 Jul 16 11:30 Public ``` -In everyday usage we are more accustomed to units of measurement like kilobytes, megabytes, and gigabytes. -Luckily, there's another flag `-h` that when used with the -l option, prints unit suffixes: -Byte, Kilobyte, Megabyte, Gigabyte, Terabyte and Petabyte in order to reduce the -number of digits to three or fewer using base 2 for sizes. +In everyday usage we are more accustomed to units of measurement like kilobytes, megabytes, and gigabytes. Luckily, there's another flag `-h` that when used with the -l option, prints unit suffixes: Byte, Kilobyte, Megabyte, Gigabyte, Terabyte and Petabyte in order to reduce the number of digits to three or fewer using base 2 for sizes. -Now `ls -h` won't work on its own. When we want to combine two flags, -we can just run them together. So, by typing `ls -lh` and pressing -enter we receive an output in a human-readable format (note: the order here doesn't matter). +Now `ls -h` won't work on its own. When we want to combine two flags, we can just run them together. So, by typing `ls -lh` and pressing enter we receive an output in a human-readable format (note: the order here doesn't matter). 
```bash $ ls -lh @@ -106,17 +87,13 @@ drwx------+ 3 riley staff 102B Jul 16 11:30 Pictures drwxr-xr-x+ 5 riley staff 170B Jul 16 11:30 Public ``` -We've now spent a great deal of time in our home directory. -Let's go somewhere else. We can do that through the `cd` or Change Directory command: -(Note: On Windows and Mac, by default, the case of the file/directory doesn't matter. -On Linux it does.) +We've now spent a great deal of time in our home directory. Let's go somewhere else. We can do that through the `cd` or Change Directory command: (Note: On Windows and Mac, by default, the case of the file/directory doesn't matter. On Linux it does.) ```bash $ cd Desktop ``` -Notice that the command didn't output anything. This means that it was carried -out successfully. Let's check by using `pwd`: +Notice that the command didn't output anything. This means that it was carried out successfully. Let's check by using `pwd`: ```bash $ pwd @@ -126,8 +103,7 @@ $ pwd /Users/riley/Desktop ``` -If something had gone wrong, however, the command would have told you. Let's -test that by trying to move into a non-existent directory: +If something had gone wrong, however, the command would have told you. Let's test that by trying to move into a non-existent directory: ```bash $ cd "things to learn about the shell" @@ -137,15 +113,9 @@ $ cd "things to learn about the shell" bash: cd: things to learn about the shell: No such file or directory ``` -Notice that we surrounded the name by quotation marks. The *arguments* given -to any shell command are separated by spaces, so a way to let them know that -we mean 'one single thing called "things to learn about the shell"', not -'six different things', is to use (single or double) quotation marks. +Notice that we surrounded the name by quotation marks. 
The *arguments* given to any shell command are separated by spaces, so a way to let them know that we mean 'one single thing called "things to learn about the shell"', not 'six different things', is to use (single or double) quotation marks. -We've now seen how we can go 'down' through our directory structure -(as in into more nested directories). If we want to go back, we can type `cd ..`. -This moves us 'up' one directory, putting us back where we started. -**If we ever get completely lost, the command `cd` without any arguments will bring +We've now seen how we can go 'down' through our directory structure (as in into more nested directories). If we want to go back, we can type `cd ..`. This moves us 'up' one directory, putting us back where we started. **If we ever get completely lost, the command `cd` without any arguments will bring us right back to the home directory, the place where we started.** ::::::::::::::::::::::::::::::::::::::::: callout @@ -160,14 +130,9 @@ To switch back and forth between two directories use `cd -`. ## Try exploring -Move around the computer, get used to moving in and out of directories, -see how different file types appear in the Unix shell. Be sure to use the `pwd` and -`cd` commands, and the different flags for the `ls` command you learned so far. +Move around the computer, get used to moving in and out of directories, see how different file types appear in the Unix shell. Be sure to use the `pwd` and `cd` commands, and the different flags for the `ls` command you learned so far. -If you run Windows, -also try typing `explorer .` to open Explorer for the current directory -(the single dot means "current directory"). If you're on a Mac, -try `open .` and for Linux try `xdg-open .` to open their graphical file manager. +If you run Windows, also try typing `explorer .` to open Explorer for the current directory (the single dot means "current directory"). 
If you're on a Mac, try `open .` and for Linux try `xdg-open .` to open their graphical file manager. :::::::::::::::::::::::::::::::::::::::::::::::::: @@ -178,17 +143,11 @@ As we become more comfortable, we can get very quickly to the directory that we ## Getting help -Use the `man` command to invoke the manual page (documentation) for a shell command. -For example, `man ls` displays all the arguments available to you - which saves -you remembering them all! Try this for each command you've learned so far. -Use the spacebar to navigate the manual pages. Use q at any time to quit. +Use the `man` command to invoke the manual page (documentation) for a shell command. For example, `man ls` displays all the arguments available to you - which saves you remembering them all! Try this for each command you've learned so far. Use the spacebar to navigate the manual pages. Use q at any time to quit. -***Note*: this command is for Mac and Linux users only**. It does not work directly for Windows users. -If you use Windows, you can search for the shell command on [http://man.he.net/](https://man.he.net/), -and view the associated manual page. In some systems the command name followed by `--help` will work, e.g. `ls --help`. +***Note*: this command is for Mac and Linux users only**. It does not work directly for Windows users. If you use Windows, you can search for the shell command on [http://man.he.net/](https://man.he.net/), and view the associated manual page. In some systems the command name followed by `--help` will work, e.g. `ls --help`. -Also, the manual lists commands individually, e.g., although `-h` can only be used together with the `-l` option, -you'll find it listed as `-h` in the manual, not as `-lh`. +Also, the manual lists commands individually, e.g., although `-h` can only be used together with the `-l` option, you'll find it listed as `-h` in the manual, not as `-lh`. 
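To illustrate the point that flags are documented one by one but can be typed together, here is a minimal sketch. It assumes a Bash-like shell with `mktemp` available; the scratch directory and the file name `sample.txt` are hypothetical and exist only for this example.

```bash
# Work in a throwaway directory so none of your own files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf 'some sample text\n' > sample.txt   # hypothetical sample file

# Three spellings of the same request: a long listing with human-readable sizes.
separate=$(ls -l -h)
combined=$(ls -lh)
reversed=$(ls -hl)

# Compare the captured listings instead of reading them by eye.
if [ "$separate" = "$combined" ] && [ "$combined" = "$reversed" ]; then
    echo "all three listings are identical"
fi
```

Writing the flags separately mirrors how the manual page lists them, while the combined form is what you would normally type.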
::::::::::::::: solution @@ -247,13 +206,9 @@ BSD May 19, 2002 BSD ## Find out about advanced `ls` commands -Find out, using the manual page, how to list the files in a -directory ordered by their filesize. Try it out in different directories. Can you combine it -with the `-l` *argument* you learned before? +Find out, using the manual page, how to list the files in a directory ordered by their filesize. Try it out in different directories. Can you combine it with the `-l` *argument* you learned before? -Afterwards, -find out how you can order a list of files based on their last modification date. -Try ordering files in different directories. +Afterwards, find out how you can order a list of files based on their last modification date. Try ordering files in different directories. ::::::::::::::: solution diff --git a/episodes/03-working-with-files-and-folders.md b/episodes/03-working-with-files-and-folders.md index ca802d95..323b1e3c 100644 --- a/episodes/03-working-with-files-and-folders.md +++ b/episodes/03-working-with-files-and-folders.md @@ -22,13 +22,7 @@ exercises: 10 ## Working with files and folders -As well as navigating directories, we can interact with files on the command line: -we can read them, open them, run them, and even edit them. In fact, there's really -no limit to what we *can* do in the shell, but even experienced shell users still switch to -graphical user interfaces (GUIs) for many tasks, such as editing formatted text -documents (Word or OpenOffice), browsing the web, editing images, etc. But if we -wanted to make the same crop on hundreds of images, say, the pages of a scanned book, -then we could automate that cropping work by using shell commands. +As well as navigating directories, we can interact with files on the command line: we can read them, open them, run them, and even edit them. 
In fact, there's really no limit to what we *can* do in the shell, but even experienced shell users still switch to graphical user interfaces (GUIs) for many tasks, such as editing formatted text documents (Word or OpenOffice), browsing the web, editing images, etc. But if we wanted to make the same crop on hundreds of images, say, the pages of a scanned book, then we could automate that cropping work by using shell commands. Before getting started, we will use `ls` to list the contents of our current directory. Using `ls` periodically to view your options is useful to orient oneself. @@ -41,8 +35,7 @@ Applications Documents Library Music Public Desktop Downloads Movies Pictures ``` -We will try a few basic ways to interact with files. Let's first move into the -`shell-lesson` directory on your desktop. +We will try a few basic ways to interact with files. Let's first move into the `shell-lesson` directory on your desktop. ```bash $ cd @@ -61,8 +54,7 @@ $ mkdir firstdir $ cd firstdir ``` -Here we used the `mkdir` command (meaning 'make directories') to create a directory -named 'firstdir'. Then we moved into that directory using the `cd` command. +Here we used the `mkdir` command (meaning 'make directories') to create a directory named 'firstdir'. Then we moved into that directory using the `cd` command. But wait! There's a trick to make things a bit quicker. Let's go up one directory. @@ -70,19 +62,13 @@ But wait! There's a trick to make things a bit quicker. Let's go up one director $ cd .. ``` -Instead of typing `cd firstdir`, let's try to type `cd f` and then press the Tab key. -We notice that the shell completes the line to `cd firstdir/`. +Instead of typing `cd firstdir`, let's try to type `cd f` and then press the Tab key. We notice that the shell completes the line to `cd firstdir/`. 
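The `mkdir`, `cd` and `cd ..` steps can also be replayed as a short script. This is only a sketch: it assumes a Bash-like shell with `mktemp`, and it uses a hypothetical scratch directory in place of `shell-lesson` so that it can be run anywhere. Tab completion is interactive, so it is not shown.

```bash
# Start in a throwaway directory standing in for shell-lesson.
start_dir=$(mktemp -d)
cd "$start_dir"

# Create the directory and move into it.
mkdir firstdir
cd firstdir
pwd    # the path now ends in /firstdir

# Go back up one level; we are where we started.
cd ..
pwd
```

Running `pwd` after each move is a cheap way to confirm where you are before creating or moving files.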
::::::::::::::::::::::::::::::::::::::::: callout ## Tab for Auto-complete -Pressing tab at any time within the shell will prompt it to attempt to auto-complete -the line based on the files or sub-directories in the current directory. -Where two or more files have the same characters, the auto-complete will only fill up to the -first point of difference, after which we can add more characters, and -try using tab again. We would encourage using this method throughout -today to see how it behaves (as it saves loads of time and effort!). +Pressing tab at any time within the shell will prompt it to attempt to auto-complete the line based on the files or sub-directories in the current directory. Where two or more files have the same characters, the auto-complete will only fill up to the first point of difference, after which we can add more characters, and try using tab again. We would encourage using this method throughout today to see how it behaves (as it saves loads of time and effort!). :::::::::::::::::::::::::::::::::::::::::::::::::: @@ -90,9 +76,7 @@ today to see how it behaves (as it saves loads of time and effort!). If you are in `firstdir`, use `cd ..` to get back to the `shell-lesson` directory. -Here there are copies of two public domain books downloaded from -[Project Gutenberg](https://www.gutenberg.org/) along with other files we will -cover later. +Here there are copies of two public domain books downloaded from [Project Gutenberg](https://www.gutenberg.org/) along with other files we will cover later. ```bash $ ls -lh @@ -111,21 +95,15 @@ total 33M drwxr-xr-x 1 riley staff 64B Feb 22 2017 firstdir ``` -The files `829-0.txt` and `33504-0.txt` holds the content of book #829 -and #33504 on Project Gutenberg. But we've forgot *which* books, so -we try the `cat` command to read the text of the first file: +The files `829-0.txt` and `33504-0.txt` holds the content of book #829 and #33504 on Project Gutenberg. 
But we've forgot *which* books, so we try the `cat` command to read the text of the first file: ```bash $ cat 829-0.txt ``` -The terminal window erupts and the whole book cascades by (it is printed to -your terminal), leaving us with a new prompt and the last few lines of the book -above this prompt. +The terminal window erupts and the whole book cascades by (it is printed to your terminal), leaving us with a new prompt and the last few lines of the book above this prompt. -Often we just want a quick glimpse of the first or the last part of a file to -get an idea about what the file is about. To let us do that, the Unix shell -provides us with the commands `head` and `tail`. +Often we just want a quick glimpse of the first or the last part of a file to get an idea about what the file is about. To let us do that, the Unix shell provides us with the commands `head` and `tail`. ```bash $ head 829-0.txt @@ -159,38 +137,25 @@ Archive Foundation, how to help produce our new eBooks, and how to subscribe to our email newsletter to hear about new eBooks. ``` -If ten lines is not enough (or too much), we would check `man head`(or `head --help` when using Windows) -to see if there exists an option to specify the number of lines to get -(there is: `head -n 20` will print 20 lines). +If ten lines is not enough (or too much), we would check `man head` (or `head --help` when using Windows) to see if there exists an option to specify the number of lines to get (there is: `head -n 20` will print 20 lines). -Another way to navigate files is to view the contents one screen at a time. -Type `less 829-0.txt` to see the first screen, `spacebar` to see the -next screen and so on, then `q` to quit (return to the command prompt). +Another way to navigate files is to view the contents one screen at a time. Type `less 829-0.txt` to see the first screen, `spacebar` to see the next screen and so on, then `q` to quit (return to the command prompt). 
```bash $ less 829-0.txt ``` -Like many other shell commands, the commands `cat`, `head`, `tail` and `less` -can take any number of arguments (they can work with any number of files). -We will see how we can get the first lines of several files at once. -To save some typing, we introduce a very useful trick first. +Like many other shell commands, the commands `cat`, `head`, `tail` and `less` can take any number of arguments (they can work with any number of files). We will see how we can get the first lines of several files at once. To save some typing, we introduce a very useful trick first. ::::::::::::::::::::::::::::::::::::::::: callout ## Re-using commands -On a blank command prompt, press the up arrow key and notice that the previous -command you typed appears before your cursor. We can continue pressing the -up arrow to cycle through your previous commands. The down arrow cycles back -toward your most recent command. This is another important labour-saving -function and something we'll use a lot. +On a blank command prompt, press the up arrow key and notice that the previous command you typed appears before your cursor. We can continue pressing the up arrow to cycle through your previous commands. The down arrow cycles back toward your most recent command. This is another important labour-saving function and something we'll use a lot. :::::::::::::::::::::::::::::::::::::::::::::::::: -Press the up arrow until you get to the `head 829-0.txt` command. Add a space -and then `33504-0.txt` (Remember your friend Tab? Type `3` followed by Tab to -get `33504-0.txt`), to produce the following command: +Press the up arrow until you get to the `head 829-0.txt` command. Add a space and then `33504-0.txt` (Remember your friend Tab? 
Type `3` followed by tab to get `33504-0.txt`), to produce the following command: ```bash $ head 829-0.txt 33504-0.txt @@ -236,12 +201,7 @@ $ head *.txt ## More on wildcards -Wildcards are a feature of the shell and will therefore work with *any* command. -The shell will expand wildcards to a list of files and/or directories before -the command is executed, and the command will never see the wildcards. -As an exception, if a wildcard expression does not match any file, Bash -will pass the expression as a parameter to the command as it is. For example -typing `ls *.pdf` results in an error message that there is no file called \*.pdf. +Wildcards are a feature of the shell and will therefore work with *any* command. The shell will expand wildcards to a list of files and/or directories before the command is executed, and the command will never see the wildcards. As an exception, if a wildcard expression does not match any file, Bash will pass the expression as a parameter to the command as it is. For example typing `ls *.pdf` results in an error message that there is no file called \*.pdf. :::::::::::::::::::::::::::::::::::::::::::::::::: @@ -249,10 +209,7 @@ typing `ls *.pdf` results in an error message that there is no file called \*.pd ### Moving, copying and deleting files -We may also want to change the file name to something more descriptive. -We can **move** it to a new name by using the `mv` or move command, -giving it the old name as the first argument and the new name as the second -argument: +We may also want to change the file name to something more descriptive. We can **move** it to a new name by using the `mv` or move command, giving it the old name as the first argument and the new name as the second argument: ```bash $ mv 829-0.txt gulliver.txt @@ -276,11 +233,7 @@ $ ls ## Copying a file -Instead of *moving* a file, you might want to *copy* a file (make a duplicate), -for instance to make a backup before modifying a file. 
-Just like the `mv` command, the `cp` command takes two arguments: the old name -and the new name. How would you make a copy of the file `gulliver.txt` called -`gulliver-backup.txt`? Try it! +Instead of *moving* a file, you might want to *copy* a file (make a duplicate), for instance to make a backup before modifying a file. Just like the `mv` command, the `cp` command takes two arguments: the old name and the new name. How would you make a copy of the file `gulliver.txt` called `gulliver-backup.txt`? Try it! ::::::::::::::: solution @@ -298,8 +251,7 @@ cp gulliver.txt gulliver-backup.txt ## Renaming a directory -Renaming a directory works in the same way as renaming a file. Try using the -`mv` command to rename the `firstdir` directory to `backup`. +Renaming a directory works in the same way as renaming a file. Try using the `mv` command to rename the `firstdir` directory to `backup`. ::::::::::::::: solution @@ -317,10 +269,7 @@ mv firstdir backup ## Moving a file into a directory -If the last argument you give to the `mv` command is a directory, not a file, -the file given in the first argument will be moved to that directory. Try -using the `mv` command to move the file `gulliver-backup.txt` into the -`backup` folder. +If the last argument you give to the `mv` command is a directory, not a file, the file given in the first argument will be moved to that directory. Try using the `mv` command to move the file `gulliver-backup.txt` into the `backup` folder. ::::::::::::::: solution @@ -344,12 +293,9 @@ mv gulliver-backup.txt backup/gulliver-backup.txt ## The wildcards and regular expressions -The `?` wildcard matches one character. The `*` wildcard matches zero or -more characters. If you attended the lesson on regular expressions, do you -remember how you would express that as regular expressions? +The `?` wildcard matches one character. The `*` wildcard matches zero or more characters. 
If you attended the lesson on regular expressions, do you remember how you would express that as regular expressions? -(Regular expressions are not a feature of the shell, but some commands support -them. We'll get back to that.) +(Regular expressions are not a feature of the shell, but some commands support them. We'll get back to that.) ::::::::::::::: solution @@ -366,18 +312,7 @@ them. We'll get back to that.) ## Using `history` -Use the `history` command to see a list of all the commands you've entered during the -current session. You can also use Ctrl + r to do a reverse lookup. Press Ctrl + r, -then start typing any part of the command you're looking for. The past command will -autocomplete. Press `enter` to run the command again, or press the arrow keys to start -editing the command. If multiple past commands contain the text you input, you can -Ctrl + r repeatedly to cycle through them. If you can't find what you're looking for -in the reverse lookup, use Ctrl + c to return to the prompt. If you want to save -your history, maybe to extract some commands from which to build a script later on, you -can do that with `history > history.txt`. This will output all history to a text file -called `history.txt` that you can later edit. To recall a command from history, enter -`history`. Note the command number, e.g. 2045. Recall the command by entering -`!2045`. This will execute the command. +Use the `history` command to see a list of all the commands you've entered during the current session. You can also use Ctrl + r to do a reverse lookup. Press Ctrl + r, then start typing any part of the command you're looking for. The past command will autocomplete. Press `enter` to run the command again, or press the arrow keys to start editing the command. If multiple past commands contain the text you input, you can Ctrl + r repeatedly to cycle through them. If you can't find what you're looking for in the reverse lookup, use Ctrl + c to return to the prompt. 
If you want to save your history, maybe to extract some commands from which to build a script later on, you can do that with `history > history.txt`. This will output all history to a text file called `history.txt` that you can later edit. To recall a command from history, enter `history`. Note the command number, e.g. 2045. Recall the command by entering `!2045`. This will execute the command. :::::::::::::::::::::::::::::::::::::::::::::::::: @@ -385,18 +320,11 @@ called `history.txt` that you can later edit. To recall a command from history, ## Using the `echo` command -The `echo` command simply prints out a text you specify. Try it out: `echo 'Library Carpentry is awesome!'`. -Interesting, isn't it? +The `echo` command simply prints out a text you specify. Try it out: `echo 'Library Carpentry is awesome!'`. Interesting, isn't it? -You can also specify a variable. First type `NAME=` followed by your name, and press enter. -Then type `echo "$NAME is a fantastic library carpentry student"` and press enter. What happens? +You can also specify a variable. First type `NAME=` followed by your name, and press enter. Then type `echo "$NAME is a fantastic library carpentry student"` and press enter. What happens? -You can combine both text and normal shell commands using `echo`, for example the -`pwd` command you have learned earlier today. You do this by enclosing a shell -command in `$(` and `)`, for instance `$(pwd)`. Now, try out the following: -`echo "Finally, it is nice and sunny on" $(date)`. -Note that the output of the `date` command is printed together with the text -you specified. You can try the same with some of the other commands you have learned so far. +You can combine both text and normal shell commands using `echo`, for example the `pwd` command you have learned earlier today. You do this by enclosing a shell command in `$(` and `)`, for instance `$(pwd)`. Now, try out the following: `echo "Finally, it is nice and sunny on" $(date)`. 
Note that the output of the `date` command is printed together with the text you specified. You can try the same with some of the other commands you have learned so far. **Why do you think the echo command is actually quite important in the shell environment?** @@ -404,26 +332,19 @@ you specified. You can try the same with some of the other commands you have lea ## Answer -You may think there is not much value in such a basic command like `echo`. However, from the moment you -start writing automated shell scripts, it becomes very useful. For instance, you often need -to output text to the screen, such as the current status of a script. +You may think there is not much value in such a basic command like `echo`. However, from the moment you start writing automated shell scripts, it becomes very useful. For instance, you often need to output text to the screen, such as the current status of a script. -Moreover, you just used a shell variable for the first time, which can be used to temporarily store information, -that you can reuse later on. It will give many opportunities from the moment you start writing automated scripts. +Moreover, you just used a shell variable for the first time, which can be used to temporarily store information, that you can reuse later on. It will give many opportunities from the moment you start writing automated scripts. ::::::::::::::::::::::::: :::::::::::::::::::::::::::::::::::::::::::::::::: -Finally, onto deleting. We won't use it now, but if you do want to delete a file, -for whatever reason, the command is `rm`, or remove. +Finally, onto deleting. We won't use it now, but if you do want to delete a file, for whatever reason, the command is `rm`, or remove. -Using wildcards, we can even delete lots of files. And adding the `-r` flag we -can delete folders with all their content. +Using wildcards, we can even delete lots of files. And adding the `-r` flag we can delete folders with all their content. 
-**Unlike deleting from within our graphical user interface, there is *no* warning, -*no* recycling bin from which you can get the files back and no other undo options!** -For that reason, please be very careful with `rm` and extremely careful with `rm -r`. +**Unlike deleting from within our graphical user interface, there is *no* warning, *no* recycling bin from which you can get the files back and no other undo options!** For that reason, please be very careful with `rm` and extremely careful with `rm -r`. :::::::::::::::::::::::::::::::::::::::: keypoints diff --git a/episodes/04-loops.md b/episodes/04-loops.md index bdc13497..fef6c29e 100644 --- a/episodes/04-loops.md +++ b/episodes/04-loops.md @@ -20,15 +20,9 @@ exercises: 10 ## Writing a Loop -**Loops** are key to productivity improvements through automation as they allow us to execute -commands repetitively. Similar to wildcards and tab completion, using loops also reduces the -amount of typing (and typing mistakes). -Suppose we have several hundred document files named `project_1825.txt`, `project_1863.txt`, `XML_project.txt` and so on. -We would like to change these files, but also save a version of the original files, naming the copies -`backup_project_1825.txt` and so on. +**Loops** are key to productivity improvements through automation as they allow us to execute commands repetitively. Similar to wildcards and tab completion, using loops also reduces the amount of typing (and typing mistakes). Suppose we have several hundred document files named `project_1825.txt`, `project_1863.txt`, `XML_project.txt` and so on. We would like to change these files, but also save a version of the original files, naming the copies `backup_project_1825.txt` and so on. -We can use a **loop** to do that. -Here's a simple example that creates a backup copy of four text files in turn. +We can use a **loop** to do that. Here's a simple example that creates a backup copy of four text files in turn. 
Let's first create those files: @@ -64,50 +58,23 @@ c.txt d.txt ``` -When the shell sees the keyword `for`, -it knows to repeat a command (or group of commands) once for each thing `in` a list. -For each iteration, -the name of each thing is sequentially assigned to -the **loop variable** and the commands inside the loop are executed before moving on to -the next thing in the list. -Inside the loop, -we call for the variable's value by putting `$` in front of it. -The `$` tells the shell interpreter to treat -the **variable** as a variable name and substitute its value in its place, -rather than treat it as text or an external command. +When the shell sees the keyword `for`, it knows to repeat a command (or group of commands) once for each thing `in` a list. For each iteration, the name of each thing is sequentially assigned to the **loop variable** and the commands inside the loop are executed before moving on to the next thing in the list. Inside the loop, we call for the variable's value by putting `$` in front of it. The `$` tells the shell interpreter to treat the **variable** as a variable name and substitute its value in its place, rather than treat it as text or an external command. ::::::::::::::::::::::::::::::::::::::::: callout ## Double-quoting variable substitutions -Because real-world filenames often contain white-spaces, -we wrap `$filename` in double quotes (`"`). If we didn't, the -shell would treat the white-space within a filename as a separator -between two different filenames, which usually results in errors. -Therefore, it's best and generally safer to use `"$..."` unless -you are absolutely sure that no elements with white-space can ever -enter your loop variable (such as in [episode 5](05-counting-mining.md)). +Because real-world filenames often contain white-spaces, we wrap `$filename` in double quotes (`"`). 
If we didn't, the shell would treat the white-space within a filename as a separator between two different filenames, which usually results in errors. Therefore, it's best and generally safer to use `"$..."` unless you are absolutely sure that no elements with white-space can ever enter your loop variable (such as in [episode 5](05-counting-mining.md)). :::::::::::::::::::::::::::::::::::::::::::::::::: -In this example, the list is four filenames: 'a.txt', 'b.txt', 'c.txt', and 'd.txt' -Each time the loop iterates, it will assign a file name to the variable `filename` -and run the `cp` command. -The first time through the loop, -`$filename` is `a.txt`. -The interpreter prints the filename to the screen and then runs the command `cp` on `a.txt`, (because we asked it to echo each filename as it works its way through the loop). -For the second iteration, `$filename` becomes -`b.txt`. This time, the shell prints the filename `b.txt` to the screen, then runs `cp` on `b.txt`. The loop performs the same operations for `c.txt` and then for `d.txt` and then, since -the list only included these four items, the shell exits the `for` loop at that point. +In this example, the list is four filenames: 'a.txt', 'b.txt', 'c.txt', and 'd.txt'. Each time the loop iterates, it will assign a file name to the variable `filename` and run the `cp` command. The first time through the loop, `$filename` is `a.txt`. The interpreter prints the filename to the screen and then runs the command `cp` on `a.txt`, (because we asked it to echo each filename as it works its way through the loop). For the second iteration, `$filename` becomes `b.txt`. This time, the shell prints the filename `b.txt` to the screen, then runs `cp` on `b.txt`. The loop performs the same operations for `c.txt` and then for `d.txt` and then, since the list only included these four items, the shell exits the `for` loop at that point. 
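As a rough sketch of the loop walked through above: the loop itself sits outside this excerpt, so the body below is reconstructed from the description rather than copied from the lesson, and the throwaway working directory is an assumption added so the example does not touch any real files.

```bash
# Assumption: work in a temporary directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create the four empty files named in the text.
touch a.txt b.txt c.txt d.txt

# For each name in the list, print it, then copy it to backup_<name>.
# "$filename" is double-quoted so names containing spaces stay in one piece.
for filename in a.txt b.txt c.txt d.txt
do
    echo "$filename"
    cp "$filename" "backup_$filename"
done

# List the directory to confirm that the backup copies exist.
ls
```

Each pass through the loop handles exactly one name from the list, so after four passes there is one `backup_` copy per original file and the loop ends.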
::::::::::::::::::::::::::::::::::::::::: callout ## Follow the Prompt -The shell prompt changes from `$` to `>` and back again as we were -typing in our loop. The second prompt, `>`, is different to remind -us that we haven't finished typing a complete command yet. A semicolon, `;`, -can be used to separate two commands written on a single line. +The shell prompt changes from `$` to `>` and back again as we were typing in our loop. The second prompt, `>`, is different to remind us that we haven't finished typing a complete command yet. A semicolon, `;`, can be used to separate two commands written on a single line. :::::::::::::::::::::::::::::::::::::::::::::::::: @@ -115,33 +82,23 @@ can be used to separate two commands written on a single line. ## Same Symbols, Different Meanings -Here we see `>` being used as a shell prompt, but `>` can also be -used to redirect output from a command (i.e. send it somewhere else, such as to a file, instead of displaying the output in the terminal) --- -we'll use redirection in [episode 5](05-counting-mining.md). -Similarly, `$` is used as a shell prompt, but, as we saw earlier, -it is also used to ask the shell to get the value of a variable. +Here we see `>` being used as a shell prompt, but `>` can also be used to redirect output from a command (i.e. send it somewhere else, such as to a file, instead of displaying the output in the terminal) --- we'll use redirection in [episode 5](05-counting-mining.md). Similarly, `$` is used as a shell prompt, but, as we saw earlier, it is also used to ask the shell to get the value of a variable. -If the *shell* prints `>` or `$` then it expects you to type something, -and the symbol is a prompt. +If the *shell* prints `>` or `$` then it expects you to type something, and the symbol is a prompt. -If *you* type `>` in the shell, it is an instruction from you to -the shell to redirect output. +If *you* type `>` in the shell, it is an instruction from you to the shell to redirect output. 
-If *you* type `$` in the shell, it is an instruction from you to -the shell to get the value of a variable. +If *you* type `$` in the shell, it is an instruction from you to the shell to get the value of a variable. :::::::::::::::::::::::::::::::::::::::::::::::::: -We have called the variable in this loop `filename` -in order to make its purpose clearer to human readers. -The shell itself doesn't care what the variable is called. +We have called the variable in this loop `filename` in order to make its purpose clearer to human readers. The shell itself doesn't care what the variable is called. ::::::::::::::::::::::::::::::::::::::: challenge ## For loop exercise -Complete the blanks in the for loop below to print the name, first line, and last line -of each text file in the current directory. +Complete the blanks in the for loop below to print the name, first line, and last line of each text file in the current directory. ```bash ___ file in *.txt @@ -169,8 +126,7 @@ done :::::::::::::::::::::::::::::::::::::::::::::::::: -This is our first look at loops. We will run another loop in the -[Counting and Mining with the Shell](05-counting-mining.md) episode. +This is our first look at loops. We will run another loop in the [Counting and Mining with the Shell](05-counting-mining.md) episode. ![](fig/shell_script_for_loop_flow_chart.svg){alt='For Loop in Action'} @@ -178,17 +134,7 @@ This is our first look at loops. We will run another loop in the ## Running the loop from a Bash script -Alternatively, rather than running the loop above on the command line, you can -save it in a script file and run it from the command line without having to rewrite -the loop again. This is what is called a Bash script which is a plain text file that -contains a series of commands like the loop you created above. 
In the example script below, -the first line of the file contains what is called a Shebang (`#!`) followed by the path to the interpreter -(or program) that will run the rest of the lines in the file (`/bin/bash`). The second line demonstrates how -comments are made in scripts. This provides you with more information about what the script does. -The remaining lines contain the loop you created above. You can create this file in the same directory -you've been using for the lesson and by using the text editor of your choice (e.g. nano) but when you save the -file, make sure it has the extension **.sh** (e.g. `my_first_bash_script.sh`). When you've done this, you can run the -Bash script by typing the command bash and the file name via the command line (e.g. `bash my_first_bash_script.sh`). +Alternatively, rather than running the loop above on the command line, you can save it in a script file and run it from the command line without having to rewrite the loop again. This is what is called a Bash script which is a plain text file that contains a series of commands like the loop you created above. In the example script below, the first line of the file contains what is called a Shebang (`#!`) followed by the path to the interpreter (or program) that will run the rest of the lines in the file (`/bin/bash`). The second line demonstrates how comments are made in scripts. This provides you with more information about what the script does. The remaining lines contain the loop you created above. You can create this file in the same directory you've been using for the lesson and by using the text editor of your choice (e.g. nano) but when you save the file, make sure it has the extension **.sh** (e.g. `my_first_bash_script.sh`). When you've done this, you can run the Bash script by typing the command bash and the file name via the command line (e.g. `bash my_first_bash_script.sh`). 
``` #!/bin/bash diff --git a/episodes/05-counting-mining.md b/episodes/05-counting-mining.md index 84032081..1cc3c5b2 100644 --- a/episodes/05-counting-mining.md +++ b/episodes/05-counting-mining.md @@ -26,27 +26,19 @@ exercises: 30 ## Counting and mining data -Now that you know how to navigate the shell, we will move onto -learning how to count and mine data using a few of the standard shell commands. -While these commands are unlikely to revolutionise your work by themselves, -they're very versatile and will add to your foundation for working in the shell -and for learning to code. The commands also replicate the sorts of uses library users might make of library data. +Now that you know how to navigate the shell, we will move onto learning how to count and mine data using a few of the standard shell commands. While these commands are unlikely to revolutionise your work by themselves, they're very versatile and will add to your foundation for working in the shell and for learning to code. The commands also replicate the sorts of uses library users might make of library data. ## Counting and sorting -We will begin by counting the contents of files using the Unix shell. -We can use the Unix shell to quickly generate counts from across files, -something that is tricky to achieve using the graphical user interfaces of standard office suites. +We will begin by counting the contents of files using the Unix shell. We can use the Unix shell to quickly generate counts from across files, something that is tricky to achieve using the graphical user interfaces of standard office suites. 
-Let's start by navigating to the directory that contains our data using the -`cd` command: +Let's start by navigating to the directory that contains our data using the `cd` command: ```bash $ cd shell-lesson ``` -Remember, if at any time you are not sure where you are in your directory structure, -use the `pwd` command to find out: +Remember, if at any time you are not sure where you are in your directory structure, use the `pwd` command to find out: ```bash $ pwd @@ -56,8 +48,7 @@ $ pwd /Users/riley/Desktop/shell-lesson ``` -And let's just check what files are in the directory and how large they -are with `ls -lhS`: +And let's just check what files are in the directory and how large they are with `ls -lhS`: ```bash $ ls -lhS @@ -74,42 +65,27 @@ total 139M drwxr-xr-x 2 riley staff 68 Feb 2 00:58 backup ``` -In this episode we'll focus on the dataset `2014-01_JA.tsv`, that contains -journal article metadata, and the three `.tsv` files derived from the original -dataset. Each of these three `.tsv` files includes all data where a keyword such -as `africa` or `america` appears in the 'Title' field of `2014-01_JA.tsv`. +In this episode we'll focus on the dataset `2014-01_JA.tsv`, that contains journal article metadata, and the three `.tsv` files derived from the original dataset. Each of these three `.tsv` files includes all data where a keyword such as `africa` or `america` appears in the 'Title' field of `2014-01_JA.tsv`. ::::::::::::::::::::::::::::::::::::::::: callout ## CSV and TSV Files -CSV (Comma-separated values) is a common plain text format for storing tabular -data, where each record occupies one line and the values are separated by commas. -TSV (Tab-separated values) is just the same except that values are separated by -tabs rather than commas. Confusingly, CSV is sometimes used to refer to both CSV, -TSV and variations of them. The simplicity of the formats make them great for -exchange and archival. 
They are not bound to a specific program (unlike Excel -files, say, there is no `CSV` program, just lots and lots of programs that -support the format, including Excel by the way.), and you wouldn't have any -problems opening a 40 year old file today if you came across one. +CSV (Comma-separated values) is a common plain text format for storing tabular data, where each record occupies one line and the values are separated by commas. TSV (Tab-separated values) is just the same except that values are separated by tabs rather than commas. Confusingly, CSV is sometimes used to refer to both CSV, TSV and variations of them. The simplicity of the formats makes them great for exchange and archival. They are not bound to a specific program (unlike Excel files, say, there is no `CSV` program, just lots and lots of programs that support the format, including Excel by the way.), and you wouldn't have any problems opening a 40-year-old file today if you came across one. :::::::::::::::::::::::::::::::::::::::::::::::::: -First, let's have a look at the largest data file, using the tools we learned in -[Reading files](03-working-with-files-and-folders.md): +First, let's have a look at the largest data file, using the tools we learned in [Reading files](03-working-with-files-and-folders.md): ```bash $ cat 2014-01_JA.tsv ``` -Like `829-0.txt` before, the whole dataset cascades by and can't really make any -sense of that amount of text. To cancel this on-going con`cat`enation, or indeed any -process in the Unix shell, press Ctrl\+C. +Like `829-0.txt` before, the whole dataset cascades by and we can't really make any sense of that amount of text. To cancel this on-going con`cat`enation, or indeed any process in the Unix shell, press Ctrl\+C.
-In most data files a quick glimpse of the first few lines already tells us a lot -about the structure of the dataset, for example the table/column headers: +In most data files a quick glimpse of the first few lines already tells us a lot about the structure of the dataset, for example the table/column headers: ```bash $ head -n 3 2014-01_JA.tsv @@ -124,11 +100,7 @@ History_1a-rdf.tsv Nelson, M. C. 1 59 KIVA -ARIZONA- 0023-1940 (Uk)RN00157187 In the header, we can see the common metadata fields of academic papers: `Creator`, `Issue`, `Citation`, etc. -Next, let's learn about a basic data analysis tool: -`wc` is the "word count" command: it counts the number of lines, words, and bytes. -Since we love the wildcard operator, let's run the command -`wc *.tsv` to get counts for all the `.tsv` files in the current directory -(it takes a little time to complete): +Next, let's learn about a basic data analysis tool: `wc` is the "word count" command: it counts the number of lines, words, and bytes. Since we love the wildcard operator, let's run the command `wc *.tsv` to get counts for all the `.tsv` files in the current directory (it takes a little time to complete): ```bash $ wc *.tsv @@ -144,15 +116,9 @@ $ wc *.tsv The first three columns contains the number of lines, words and bytes. -If we only have a handful of files to compare, it might be faster or more convenient -to just check with Microsoft Excel, OpenRefine or your favourite text editor, but -when we have tens, hundreds or thousands of documents, the Unix shell has a clear -speed advantage. The real power of the shell comes from being able to combine commands -and automate tasks, though. We will touch upon this slightly. +If we only have a handful of files to compare, it might be faster or more convenient to just check with Microsoft Excel, OpenRefine or your favourite text editor, but when we have tens, hundreds or thousands of documents, the Unix shell has a clear speed advantage. 
The real power of the shell comes from being able to combine commands and automate tasks, though. We will touch upon this slightly. -For now, we'll see how we can build a simple pipeline to find the shortest file -in terms of number of lines. We start by adding the `-l` flag to get only the -number of lines, not the number of words and bytes: +For now, we'll see how we can build a simple pipeline to find the shortest file in terms of number of lines. We start by adding the `-l` flag to get only the number of lines, not the number of words and bytes: ```bash $ wc -l *.tsv @@ -166,20 +132,15 @@ $ wc -l *.tsv 554211 total ``` -The `wc` command itself doesn't have a flag to sort the output, but as we'll -see, we can combine three different shell commands to get what we want. +The `wc` command itself doesn't have a flag to sort the output, but as we'll see, we can combine three different shell commands to get what we want. -First, we have the `wc -l *.tsv` command. We will save the output from this -command in a new file. To do that, we *redirect* the output from the command -to a file using the ‘greater than' sign (>), like so: +First, we have the `wc -l *.tsv` command. We will save the output from this command in a new file. To do that, we *redirect* the output from the command to a file using the ‘greater than' sign (>), like so: ```bash $ wc -l *.tsv > lengths.txt ``` -There's no output now since the output went into the file `lengths.txt`, but -we can check that the output indeed ended up in the file using `cat` or `less` -(or Notepad or any text editor). +There's no output now since the output went into the file `lengths.txt`, but we can check that the output indeed ended up in the file using `cat` or `less` (or Notepad or any text editor). ```bash $ cat lengths.txt @@ -193,9 +154,7 @@ $ cat lengths.txt 554211 total ``` -Next, there is the `sort` command. 
We'll use the `-n` flag to specify that we -want numerical sorting, not lexical sorting, we output the results into -yet another file, and we use `cat` to check the results: +Next, there is the `sort` command. We'll use the `-n` flag to specify that we want numerical sorting, not lexical sorting, we output the results into yet another file, and we use `cat` to check the results: ```bash $ sort -n lengths.txt > sorted-lengths.txt @@ -210,8 +169,7 @@ $ cat sorted-lengths.txt 554211 total ``` -Finally we have our old friend `head`, that we can use to get the first line -of the `sorted-lengths.txt`: +Finally we have our old friend `head`, that we can use to get the first line of the `sorted-lengths.txt`: ```bash $ head -n 1 sorted-lengths.txt @@ -221,12 +179,7 @@ $ head -n 1 sorted-lengths.txt 5375 2014-02-02_JA-britain.tsv ``` -But we're really just interested in the end result, not the intermediate -results now stored in `lengths.txt` and `sorted-lengths.txt`. What if we could -send the results from the first command (`wc -l *.tsv`) directly to the next -command (`sort -n`) and then the output from that command to `head -n 1`? -Luckily we can, using a concept called pipes. On the command line, you make a -pipe with the vertical bar character `|`. Let's try with one pipe first: +But we're really just interested in the end result, not the intermediate results now stored in `lengths.txt` and `sorted-lengths.txt`. What if we could send the results from the first command (`wc -l *.tsv`) directly to the next command (`sort -n`) and then the output from that command to `head -n 1`? Luckily we can, using a concept called pipes. On the command line, you make a pipe with the vertical bar character `|`. Let's try with one pipe first: ```bash $ wc -l *.tsv | sort -n @@ -240,8 +193,7 @@ $ wc -l *.tsv | sort -n 554211 total ``` -Notice that this is exactly the same output that ended up in our `sorted-lengths.txt` -earlier. 
Let's add another pipe: +Notice that this is exactly the same output that ended up in our `sorted-lengths.txt` earlier. Let's add another pipe: ```bash $ wc -l *.tsv | sort -n | head -n 1 @@ -251,9 +203,7 @@ $ wc -l *.tsv | sort -n | head -n 1 5375 2014-02-02_JA-britain.tsv ``` -It can take some time to fully grasp pipes and use them efficiently, but it's a -very powerful concept that you will find not only in the shell, but also in most -programming languages. +It can take some time to fully grasp pipes and use them efficiently, but it's a very powerful concept that you will find not only in the shell, but also in most programming languages. ![](fig/redirects-and-pipes.png){alt='Redirects and Pipes'} @@ -261,19 +211,9 @@ programming languages. ## Pipes and Filters -This simple idea is why Unix has been so successful. Instead of creating enormous -programs that try to do many different things, Unix programmers focus on creating -lots of simple tools that each do one job well, and that work well with each other. -This programming model is called "pipes and filters". We've already seen pipes; a -filter is a program like `wc` or `sort` that transforms a stream of input into a -stream of output. Almost all of the standard Unix tools can work this way: unless -told to do otherwise, they read from standard input, do something with what they've -read, and write to standard output. +This simple idea is why Unix has been so successful. Instead of creating enormous programs that try to do many different things, Unix programmers focus on creating lots of simple tools that each do one job well, and that work well with each other. This programming model is called "pipes and filters". We've already seen pipes; a filter is a program like `wc` or `sort` that transforms a stream of input into a stream of output. 
Almost all of the standard Unix tools can work this way: unless told to do otherwise, they read from standard input, do something with what they've read, and write to standard output. -The key is that any program that reads lines of text from standard input and writes -lines of text to standard output can be combined with every other program that -behaves this way as well. You can and should write your programs this way so that -you and other people can put those programs into pipes to multiply their power. +The key is that any program that reads lines of text from standard input and writes lines of text to standard output can be combined with every other program that behaves this way as well. You can and should write your programs this way so that you and other people can put those programs into pipes to multiply their power. :::::::::::::::::::::::::::::::::::::::::::::::::: @@ -283,15 +223,13 @@ you and other people can put those programs into pipes to multiply their power. ## Adding another pipe -We have our `wc -l *.tsv | sort -n | head -n 1` pipeline. What would happen -if you piped this into `cat`? Try it! +We have our `wc -l *.tsv | sort -n | head -n 1` pipeline. What would happen if you piped this into `cat`? Try it! ::::::::::::::: solution ## Solution -The `cat` command just outputs whatever it gets as input, so you get exactly -the same output from +The `cat` command just outputs whatever it gets as input, so you get exactly the same output from ```bash $ wc -l *.tsv | sort -n | head -n 1 @@ -317,8 +255,7 @@ To count the total lines in every `tsv` file, sort the results and then print th wc -l *.tsv | sort -n | head -n 1 ``` -Now let's change the scenario. We want to know the 10 files that contain *the most* words. Check the manual for the `wc` command (either using `man wc` or `wc --help`) -to see if you can find out what flag to use to print out the number of words (but not the number of lines and bytes). 
Fill in the blanks below to count the words for each file, put them into order, and then make an output of the 10 files with the most words (Hint: The sort command sorts in ascending order by default). +Now let's change the scenario. We want to know the 10 files that contain *the most* words. Check the manual for the `wc` command (either using `man wc` or `wc --help`) to see if you can find out what flag to use to print out the number of words (but not the number of lines and bytes). Fill in the blanks below to count the words for each file, put them into order, and then make an output of the 10 files with the most words (Hint: The sort command sorts in ascending order by default). ```bash wc __ *.tsv | sort __ | ____ @@ -342,9 +279,7 @@ wc -w *.tsv | sort -n | tail -n 11 ## Counting number of files -Let's make a different pipeline. You want to find out how many files and -directories there are in the current directory. Try to see if you can pipe -the output from `ls` into `wc` to find the answer. +Let's make a different pipeline. You want to find out how many files and directories there are in the current directory. Try to see if you can pipe the output from `ls` into `wc` to find the answer. ::::::::::::::: solution @@ -362,9 +297,7 @@ $ ls | wc -l ## Writing to files -The `date` command outputs the current date and time. Can you write the -current date and time to a new file called `logfile.txt`? Then check -the contents of the file. +The `date` command outputs the current date and time. Can you write the current date and time to a new file called `logfile.txt`? Then check the contents of the file. ::::::::::::::: solution @@ -377,8 +310,7 @@ $ cat logfile.txt To check the contents, you could also use `less` or many other commands. -Beware that `>` will happily overwrite an existing file without warning you, -so please be careful. +Beware that `>` will happily overwrite an existing file without warning you, so please be careful. 
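
The warning about `>` can be tried out safely in a scratch directory. The following is a minimal sketch, not part of the lesson itself: the directory comes from `mktemp -d`, the variable names are purely illustrative, and the `noclobber` shell option is an optional safeguard that the lesson does not cover.

```bash
# Work in a throwaway directory so no real file is at risk (illustrative path).
workdir=$(mktemp -d)
log="$workdir/logfile.txt"

echo "first entry" > "$log"     # creates the file
echo "second entry" > "$log"    # silently replaces the first entry
after_overwrite=$(cat "$log")

# Optional safeguard, not part of the lesson: with noclobber set,
# `>` refuses to overwrite a file that already exists.
set -o noclobber
if { echo "third entry" > "$log"; } 2>/dev/null; then
  guard="overwritten"
else
  guard="refused"
fi
set +o noclobber
after_guard=$(cat "$log")

echo "after overwrite: $after_overwrite"
echo "noclobber attempt: $guard"
echo "after guard: $after_guard"
```

If the safeguard behaves as sketched, the file still holds the second entry at the end, because the third write was rejected rather than applied.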
::::::::::::::::::::::::: @@ -388,8 +320,7 @@ so please be careful. ## Appending to a file -While `>` writes to a file, `>>` appends something to a file. Try to append the -current date and time to the file `logfile.txt`? +While `>` writes to a file, `>>` appends something to a file. Try to append the current date and time to the file `logfile.txt`? ::::::::::::::: solution @@ -410,15 +341,13 @@ $ cat logfile.txt We learned about the -w flag above, so now try using it with the `.tsv` files. -If you have time, you can also try to sort the results by piping it to `sort`. -And/or explore the other flags of `wc`. +If you have time, you can also try to sort the results by piping it to `sort`. And/or explore the other flags of `wc`. ::::::::::::::: solution ## Solution -From `man wc`, you will see that there is a `-w` flag to print the number of -words: +From `man wc`, you will see that there is a `-w` flag to print the number of words: ```output -w The number of words in each input file is written to the standard @@ -459,14 +388,9 @@ $ wc -w *.tsv | sort -n ## Mining or searching -Searching for something in one or more files is something we'll often need to do, -so let's introduce a command for doing that: `grep` (short for **global regular -expression print**). As the name suggests, it supports regular expressions and -is therefore only limited by your imagination, the shape of your data, and - when -working with thousands or millions of files - the processing power at your disposal. +Searching for something in one or more files is something we'll often need to do, so let's introduce a command for doing that: `grep` (short for **global regular expression print**). As the name suggests, it supports regular expressions and is therefore only limited by your imagination, the shape of your data, and - when working with thousands or millions of files - the processing power at your disposal. 
-To begin using `grep`, first navigate to the `shell-lesson` directory if not already -there. Then create a new directory "results": +To begin using `grep`, first navigate to the `shell-lesson` directory if not already there. Then create a new directory "results": ```bash $ mkdir results @@ -478,9 +402,7 @@ Now let's try our first search: $ grep 1999 *.tsv ``` -Remember that the shell will expand `*.tsv` to a list of all the `.tsv` files in the -directory. `grep` will then search these for instances of the string "1999" and -print the matching lines. +Remember that the shell will expand `*.tsv` to a list of all the `.tsv` files in the directory. `grep` will then search these for instances of the string "1999" and print the matching lines. ::::::::::::::::::::::::::::::::::::::::: callout @@ -490,8 +412,7 @@ A string is a sequence of characters, or "a piece of text". :::::::::::::::::::::::::::::::::::::::::::::::::: -Press the up arrow once in order to cycle back to your most recent action. -Amend `grep 1999 *.tsv` to `grep -c 1999 *.tsv` and press enter. +Press the up arrow once in order to cycle back to your most recent action. Amend `grep 1999 *.tsv` to `grep -c 1999 *.tsv` and press enter. ```bash $ grep -c 1999 *.tsv @@ -504,9 +425,7 @@ $ grep -c 1999 *.tsv 2014-02-02_JA-britain.tsv:284 ``` -The shell now prints the number of times the string 1999 appeared in each file. -If you look at the output from the previous command, this tends to refer to the -date field for each journal article. +The shell now prints the number of times the string 1999 appeared in each file. If you look at the output from the previous command, this tends to refer to the date field for each journal article. We will try another search: @@ -521,8 +440,7 @@ $ grep -c revolution *.tsv 2014-02-02_JA-britain.tsv:9 ``` -We got back the counts of the instances of the string `revolution` within the files. 
-Now, amend the above command to the below and observe how the output of each is different: +We got back the counts of the instances of the string `revolution` within the files. Now, amend the above command to the below and observe how the output of each is different: ```bash $ grep -ci revolution *.tsv @@ -535,24 +453,15 @@ $ grep -ci revolution *.tsv 2014-02-02_JA-britain.tsv:122 ``` -This repeats the query, but prints a case -insensitive count (including instances of both `revolution` and `Revolution` and other variants). -Note how the count has increased nearly 30 fold for those journal article -titles that contain the keyword 'america'. As before, cycling back and -adding `> results/`, followed by a filename (ideally in .txt format), will save the results to a data file. +This repeats the query, but prints a case insensitive count (including instances of both `revolution` and `Revolution` and other variants). Note how the count has increased nearly 30 fold for those journal article titles that contain the keyword 'america'. As before, cycling back and adding `> results/`, followed by a filename (ideally in .txt format), will save the results to a data file. -So far we have counted strings in files and printed to the shell or to -file those counts. But the real power of `grep` comes in that you can -also use it to create subsets of tabulated data (or indeed any data) -from one or multiple files. +So far we have counted strings in files and printed to the shell or to file those counts. But the real power of `grep` comes in that you can also use it to create subsets of tabulated data (or indeed any data) from one or multiple files. ```bash $ grep -i revolution *.tsv ``` -This script looks in the defined files and prints any lines containing `revolution` -(without regard to case) to the shell. 
We let the shell add today's date to the -filename: +This script looks in the defined files and prints any lines containing `revolution` (without regard to case) to the shell. We let the shell add today's date to the filename: ```bash $ grep -i revolution *.tsv > results/$(date "+%Y-%m-%d")_JAi-revolution.tsv @@ -564,25 +473,17 @@ This saves the subsetted data to a new file. ## Alternative date commands -This way of writing dates is so common that on most platforms -you can get the same result by typing `$(date -I)` instead of -`$(date "+%Y-%m-%d")`. +This way of writing dates is so common that on most platforms you can get the same result by typing `$(date -I)` instead of `$(date "+%Y-%m-%d")`. :::::::::::::::::::::::::::::::::::::::::::::::::: -However, if we look at this file, it contains every instance of the -string 'revolution' including as a single word and as part of other words -such as 'revolutionary'. This perhaps isn't as useful as we thought... -Thankfully, the `-w` flag instructs `grep` to look for whole words only, -giving us greater precision in our search. +However, if we look at this file, it contains every instance of the string 'revolution' including as a single word and as part of other words such as 'revolutionary'. This perhaps isn't as useful as we thought... Thankfully, the `-w` flag instructs `grep` to look for whole words only, giving us greater precision in our search. ```bash $ grep -iw revolution *.tsv > results/$(date "+%Y-%m-%d")_JAiw-revolution.tsv ``` -This script looks in both of the defined files and -exports any lines containing the whole word `revolution` (without regard to case) -to the specified `.tsv` file. +This script looks in both of the defined files and exports any lines containing the whole word `revolution` (without regard to case) to the specified `.tsv` file. We can show the difference between the files we created. 
@@ -600,23 +501,15 @@ $ wc -l results/*.tsv ## Automatically adding a date prefix -Notice how we didn't type today's date ourselves, but let the -`date` command do that mindless task for us. Find out about the -`"+%Y-%m-%d"` option and alternative options we could have used. +Notice how we didn't type today's date ourselves, but let the `date` command do that mindless task for us. Find out about the `"+%Y-%m-%d"` option and alternative options we could have used. ::::::::::::::: solution ## Solution -Using `date --help` (on Git Bash for Windows or Linux) or `man date` (on macOS or Linux) -will show you that the `+` option introduces a date format, -where `%Y`, `%m` and `%d` are replaced by the year, month, and day respectively. -There are many other percent-codes you could use. +Using `date --help` (on Git Bash for Windows or Linux) or `man date` (on macOS or Linux) will show you that the `+` option introduces a date format, where `%Y`, `%m` and `%d` are replaced by the year, month, and day respectively. There are many other percent-codes you could use. -You might also see that `-I` is short for -[\--iso-8601](https://en.wikipedia.org/wiki/ISO_8601), which -essentially avoids the confusion between the European -and American date formats `DD.MM.YYYY` and `MM/DD/YYYY`. +You might also see that `-I` is short for [\--iso-8601](https://en.wikipedia.org/wiki/ISO_8601), which essentially avoids the confusion between the European and American date formats `DD.MM.YYYY` and `MM/DD/YYYY`. ::::::::::::::::::::::::: @@ -628,22 +521,11 @@ Finally, we'll use the **regular expression syntax** covered earlier to search f ## Basic, extended, and PERL-compatible regular expressions -There are, unfortunately, [different ways of writing regular expressions](https://www.gnu.org/software/grep/manual/html_node/Regular-Expressions.html). -Across its various versions, `grep` supports "basic", at least two types of "extended", -and "PERL-compatible" regular expressions. 
This is a common cause of confusion, since -most tutorials, including ours, teach regular expressions compatible with the PERL -programming language, but `grep` uses basic by default. -Unless you want to remember the details, make your life easy by always using the -most advanced regular expressions your version of `grep` supports (`-E` flag on -macOS X, `-P` on most other platforms) or when doing something more complex -than searching for a plain string. +There are, unfortunately, [different ways of writing regular expressions](https://www.gnu.org/software/grep/manual/html_node/Regular-Expressions.html). Across its various versions, `grep` supports "basic", at least two types of "extended", and "PERL-compatible" regular expressions. This is a common cause of confusion, since most tutorials, including ours, teach regular expressions compatible with the PERL programming language, but `grep` uses basic by default. Unless you want to remember the details, make your life easy by always using the most advanced regular expressions your version of `grep` supports (`-E` flag on macOS X, `-P` on most other platforms) or when doing something more complex than searching for a plain string. :::::::::::::::::::::::::::::::::::::::::::::::::: -The regular expression 'fr[ae]nc[eh]' will match "france", "french", but also "frence" and "franch". -It's generally a good idea to enclose the expression in single quotation marks, since -that ensures the shell sends it directly to grep without any processing (such as trying to -expand the wildcard operator \*). +The regular expression 'fr[ae]nc[eh]' will match "france", "french", but also "frence" and "franch". It's generally a good idea to enclose the expression in single quotation marks, since that ensures the shell sends it directly to grep without any processing (such as trying to expand the wildcard operator \*). 
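
To see why the single quotes matter, the sketch below compares what the shell hands over with and without them. It assumes a scratch directory that happens to contain a file named `france`; that file and `sample.tsv` are made up for illustration and are not part of the lesson data.

```bash
# Scratch directory with an illustrative file whose name matches the pattern.
workdir=$(mktemp -d)
cd "$workdir"
touch france
printf 'French history\nfranch typo\nGermany\n' > sample.tsv

# Unquoted: the shell treats [ae] and [eh] as filename wildcards and
# replaces the pattern with the matching file name before any command runs.
unquoted=$(echo fr[ae]nc[eh])

# Quoted: the pattern reaches the command exactly as typed.
quoted=$(echo 'fr[ae]nc[eh]')

# With the quotes in place, grep receives the real pattern.
matches=$(grep -ciE 'fr[ae]nc[eh]' sample.tsv)

echo "unquoted: $unquoted"
echo "quoted: $quoted"
echo "matching lines: $matches"
```

Without the quotes, `grep` would have been asked to search for the literal word `france` instead of the bracket expression, which is easy to miss because no error is reported.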
```bash $ grep -iwE 'fr[ae]nc[eh]' *.tsv @@ -651,8 +533,7 @@ $ grep -iwE 'fr[ae]nc[eh]' *.tsv The shell will print out each matching line. -We include the `-o` flag to print only the matching part of the lines e.g. -(handy for isolating/checking results): +We include the `-o` flag to print only the matching part of the lines e.g. (handy for isolating/checking results): ```bash $ grep -iwEo 'fr[ae]nc[eh]' *.tsv @@ -664,9 +545,7 @@ Pair up with your neighbor and work on these exercises: ## Case sensitive search -Search for all case sensitive instances of -a whole word you choose in all four derived `.tsv` files in this directory. -Print your results to the shell. +Search for all case sensitive instances of a whole word you choose in all four derived `.tsv` files in this directory. Print your results to the shell. ::::::::::::::: solution @@ -684,9 +563,7 @@ $ grep -w hero *.tsv ## Case sensitive search in select files -Search for all case sensitive instances of a word you choose in -the 'America' and 'Africa' `.tsv` files in this directory. -Print your results to the shell. +Search for all case sensitive instances of a word you choose in the 'America' and 'Africa' `.tsv` files in this directory. Print your results to the shell. ::::::::::::::: solution @@ -704,9 +581,7 @@ $ grep hero *a.tsv ## Count words (case sensitive) -Count all case sensitive instances of a word you choose in -the 'America' and 'Africa' `.tsv` files in this directory. -Print your results to the shell. +Count all case sensitive instances of a word you choose in the 'America' and 'Africa' `.tsv` files in this directory. Print your results to the shell. ::::::::::::::: solution @@ -724,8 +599,7 @@ $ grep -c hero *a.tsv ## Count words (case insensitive) -Count all case insensitive instances of that word in the 'America' and 'Africa' `.tsv` files -in this directory. Print your results to the shell. 
+Count all case insensitive instances of that word in the 'America' and 'Africa' `.tsv` files in this directory. Print your results to the shell. ::::::::::::::: solution @@ -743,8 +617,7 @@ $ grep -ci hero *a.tsv ## Case insensitive search in select files -Search for all case insensitive instances of that -word in the 'America' and 'Africa' `.tsv` files in this directory. Print your results to a file `results/hero.tsv`. +Search for all case insensitive instances of that word in the 'America' and 'Africa' `.tsv` files in this directory. Print your results to a file `results/hero.tsv`. ::::::::::::::: solution @@ -762,8 +635,7 @@ $ grep -i hero *a.tsv > results/hero.tsv ## Case insensitive search in select files (whole word) -Search for all case insensitive instances of that whole word -in the 'America' and 'Africa' `.tsv` files in this directory. Print your results to a file `results/hero-i.tsv`. +Search for all case insensitive instances of that whole word in the 'America' and 'Africa' `.tsv` files in this directory. Print your results to a file `results/hero-i.tsv`. ::::::::::::::: solution @@ -781,11 +653,7 @@ $ grep -iw hero *a.tsv > results/hero-i.tsv ## Searching with regular expressions -Use regular expressions to find all ISSN numbers -(four digits followed by hyphen followed by four digits) -in `2014-01_JA.tsv` and print the results to a file `results/issns.tsv`. -Note that you might have to use the `-E` flag (or `-P` with some versions -of `grep`, e.g. with Git Bash on Windows). +Use regular expressions to find all ISSN numbers (four digits followed by hyphen followed by four digits) in `2014-01_JA.tsv` and print the results to a file `results/issns.tsv`. Note that you might have to use the `-E` flag (or `-P` with some versions of `grep`, e.g. with Git Bash on Windows). 
::::::::::::::: solution @@ -801,14 +669,11 @@ or $ grep -Po '\d{4}-\d{4}' 2014-01_JA.tsv > results/issns.tsv ``` -It is worth checking the file to make sure `grep` has interpreted the pattern -correctly. You could use the `less` command for this. +It is worth checking the file to make sure `grep` has interpreted the pattern correctly. You could use the `less` command for this. -The `-o` flag means that only the ISSN itself is printed out, instead of the -whole line. +The `-o` flag means that only the ISSN itself is printed out, instead of the whole line. -If you came up with something more advanced, perhaps including word boundaries, -please share your result in the collaborative document and give yourself a pat on the shoulder. +If you came up with something more advanced, perhaps including word boundaries, please share your result in the collaborative document and give yourself a pat on the shoulder. ::::::::::::::::::::::::: @@ -818,10 +683,7 @@ please share your result in the collaborative document and give yourself a pat o ## Finding unique values -If you pipe something to the `uniq` command, it will filter out adjacent duplicate lines. -In order for the 'uniq' command to only return unique values though, it needs to be used -with the 'sort' command. Try piping the output from the command in the last exercise -to `sort` and then piping these results to 'uniq' and then `wc -l` to count the number of unique ISSN values. +If you pipe something to the `uniq` command, it will filter out adjacent duplicate lines. In order for the 'uniq' command to only return unique values though, it needs to be used with the 'sort' command. Try piping the output from the command in the last exercise to `sort` and then piping these results to 'uniq' and then `wc -l` to count the number of unique ISSN values. ::::::::::::::: solution @@ -851,8 +713,7 @@ $ mv pg514.txt littlewomen.txt This renames the file to something easier to remember. -Now let's create our loop. 
In the loop, we will ask the computer to go through the text, looking for each girl's name, -and count the number of times it appears. The results will print to the screen. +Now let's create our loop. In the loop, we will ask the computer to go through the text, looking for each girl's name, and count the number of times it appears. The results will print to the screen. ```bash $ for name in "Jo" "Meg" "Beth" "Amy" @@ -884,25 +745,17 @@ What is happening in the loop? ## Why are the variables double-quoted here? -a) In [episode 4](04-loops.md) we learned to -use `"$..."` as a safeguard against white-space being misinterpreted. -Why *could* we omit the `"`\-quotes in the above example? +a) In [episode 4](04-loops.md) we learned to use `"$..."` as a safeguard against white-space being misinterpreted. Why *could* we omit the `"`\-quotes in the above example? -b) What happens if you add `"Louisa May Alcott"` to the first line of -the loop and remove the `"` from `$name` in the loop's code? +b) What happens if you add `"Louisa May Alcott"` to the first line of the loop and remove the `"` from `$name` in the loop's code? ::::::::::::::: solution ## Solutions -a) Because we are explicitly listing the names after `in`, -and those contain no white-space. However, for consistency -it's better to use rather once too often than once too rarely. +a) Because we are explicitly listing the names after `in`, and those contain no white-space. However, for consistency it's better to use rather once too often than once too rarely. -b) Without `"`\-quoting `$name`, the last loop will try to execute -`grep Louisa May Alcott littlewomen.txt`. `grep` interprets only the -first word as the search pattern, but `May` and `Alcott` as filenames. -This produces two errors and a possibly untrustworthy count: +b) Without `"`\-quoting `$name`, the last loop will try to execute `grep Louisa May Alcott littlewomen.txt`. 
`grep` interprets only the first word as the search pattern, but `May` and `Alcott` as filenames. This produces two errors and a possibly untrustworthy count: ```bash ... @@ -939,8 +792,7 @@ Duany, W. 1 NORTHEAST AFRICAN STUDIES NORTHEAST AFRICAN STUDIES 1(2/3), 75-102. Mohamed Ibrahim Khalil 1 NORTHEAST AFRICAN STUDIES NORTHEAST AFRICAN STUDIES 1(2/3), 103-118. (1994) ``` -Above we used `cut` and the `-f` flag to indicate which columns we want to retain. `cut` works on tab delimited files by default. We can use the flag `-d` to change this to a comma, or semicolon or another delimiter. -If you are unsure of your column position and the file has headers on the first line, we can use `head -n 1 ` to print those out. +Above we used `cut` and the `-f` flag to indicate which columns we want to retain. `cut` works on tab delimited files by default. We can use the flag `-d` to change this to a comma, or semicolon or another delimiter. If you are unsure of your column position and the file has headers on the first line, we can use `head -n 1 ` to print those out. ### Now your turn @@ -960,8 +812,7 @@ head -n 1 2014-01_JA.tsv File Creator Issue Volume Journal ISSN ID Citation Title Place Labe Language Publisher Date ``` -Ok, now we know `Issue` is column 3, `Volume` 4, `Language` 11, and `Publisher` 12. -We use these positional column numbers to construct our `cut` command: +Ok, now we know `Issue` is column 3, `Volume` 4, `Language` 11, and `Publisher` 12. We use these positional column numbers to construct our `cut` command: ``` cut -f 3,4,11,12 2014-01_JA.tsv > 2014-01_JA_ivlp.tsv diff --git a/episodes/index.md b/episodes/index.md index fdf5e682..8ddfc8a0 100644 --- a/episodes/index.md +++ b/episodes/index.md @@ -2,11 +2,12 @@ site: sandpaper::sandpaper_site --- -This Library Carpentry lesson introduces librarians to the Unix Shell. 
-At the conclusion of the lesson you will: describe the basics of the Unix shell; -explain why and how to use the command line; -use shell commands to work with directories and files; -use shell commands to find and manipulate data. +This Library Carpentry lesson introduces librarians to the Unix Shell. At the conclusion of the lesson you will: + +- describe the basics of the Unix shell; +- explain why and how to use the command line; +- use shell commands to work with directories and files; +- use shell commands to find and manipulate data. :::::::::::::::::::::::::::::::::::::::::: prereq @@ -15,5 +16,3 @@ use shell commands to find and manipulate data. To complete this lesson, you will need a Unix-like shell environment -see [Setup](learners/setup.md). You will also need to download the file **[shell-lesson.zip](episodes/data/shell-lesson.zip)** from GitHub to your *desktop* and extract it there (once you have unzipped/extracted the file, you should end up with a folder called "shell-lesson"). :::::::::::::::::::::::::::::::::::::::::::::::::: - - diff --git a/instructors/instructor-notes.md b/instructors/instructor-notes.md index bb45f735..f70e216d 100644 --- a/instructors/instructor-notes.md +++ b/instructors/instructor-notes.md @@ -40,207 +40,89 @@ As noted below, you should avoid demonstrating any more options that only work o ### Overall -Many people have questioned whether we should still teach the shell. -After all, -anyone who wants to rename several thousand data files -can easily do so interactively in the Python interpreter, -and anyone who's doing serious data analysis -is probably going to do most of their work inside the iPython Notebook or RStudio. -So why teach the shell? - -The first answer is, -"Because so much else depends on it." -Installing software, -configuring your default editor, -and controlling remote machines frequently assume a basic familiarity with the shell, -and with related ideas like standard input and output. 
-Many tools also use its terminology -(for example, the `%ls` and `%cd` magic commands in iPython). - -The second answer is, -"Because it's an easy way to introduce some fundamental ideas about how to use computers." -As we teach people how to use the Unix shell, -we teach them that they should get the computer to repeat things -(via tab completion, -`!` followed by a command number, -and `for` loops) -rather than repeating things themselves. -We also teach them to take things they've discovered they do frequently -and save them for later re-use -(via shell scripts), -to give things sensible names, -and to write a little bit of documentation -(like comment at the top of shell scripts) -to make their future selves' lives better. - -The third answer is, -"Because it enables use of many domain-specific tools and compute resources researchers cannot access otherwise." -Familiarity with the shell is very useful for remote accessing machines, -using high-performance computing infrastructure, -and running new specialist tools in many disciplines. -We do not teach HPC or domain-specific skills here -but lay the groundwork for further development of these skills. -In particular, understanding the syntax of commands, flags, and help systems is useful for domain specific tools -and understanding the file system (and how to navigate it) is useful for remote access. - -Finally, and perhaps most importantly, -teaching people the shell lets us teach them -to think about programming in terms of function composition. -In the case of the shell, -this takes the form of pipelines rather than nested function calls, -but the core idea of "small pieces, loosely joined" is the same. - -All of this material can be covered in three hours -as long as learners using Windows do not run into roadblocks such as: - -- not being able to figure out where their home directory is - (particularly if they're using Cygwin); +Many people have questioned whether we should still teach the shell. 
After all, anyone who wants to rename several thousand data files can easily do so interactively in the Python interpreter, and anyone who's doing serious data analysis is probably going to do most of their work inside the iPython Notebook or RStudio. So why teach the shell? + +The first answer is, "Because so much else depends on it." Installing software, configuring your default editor, and controlling remote machines frequently assume a basic familiarity with the shell, and with related ideas like standard input and output. Many tools also use its terminology (for example, the `%ls` and `%cd` magic commands in iPython). + +The second answer is, "Because it's an easy way to introduce some fundamental ideas about how to use computers." As we teach people how to use the Unix shell, we teach them that they should get the computer to repeat things (via tab completion,`!` followed by a command number, and `for` loops) rather than repeating things themselves. We also teach them to take things they've discovered they do frequently and save them for later re-use (via shell scripts), to give things sensible names, and to write a little bit of documentation (like comment at the top of shell scripts) to make their future selves' lives better. + +The third answer is, "Because it enables use of many domain-specific tools and compute resources researchers cannot access otherwise." Familiarity with the shell is very useful for remote accessing machines, using high-performance computing infrastructure, and running new specialist tools in many disciplines. We do not teach HPC or domain-specific skills here but lay the groundwork for further development of these skills. In particular, understanding the syntax of commands, flags, and help systems is useful for domain specific tools and understanding the file system (and how to navigate it) is useful for remote access. 
+ +Finally, and perhaps most importantly, teaching people the shell lets us teach them to think about programming in terms of function composition. In the case of the shell, this takes the form of pipelines rather than nested function calls, but the core idea of "small pieces, loosely joined" is the same. + +All of this material can be covered in three hours as long as learners using Windows do not run into roadblocks such as: + +- not being able to figure out where their home directory is (particularly if they're using Cygwin); - not being able to run a plain text editor; and - the shell refusing to run scripts that include DOS line endings. ### Preparing to Teach -- Use the `data` directory for in-workshop exercises and live coding examples. - You can clone the shell-novice directory or use the `Download ZIP` - button on the right to get the entire [repository](https://github.com/librarycarpentry/lc-shell). We also now provide - a zip file of the `data` directory that can be downloaded on its own - from the repository by right-click + save or see the ["setup"](../learners/setup.md) page on the lesson website for more details. +- Use the `data` directory for in-workshop exercises and live coding examples. You can clone the shell-novice directory or use the `Download ZIP` button on the right to get the entire [repository](https://github.com/librarycarpentry/lc-shell). We also now provide a zip file of the `data` directory that can be downloaded on its own from the repository by right-click + save or see the ["setup"](../learners/setup.md) page on the lesson website for more details. - Website: various practices have been used. - - Option 1: Can give links to learners before the lesson so they can follow along, - catch up, and see exercises (particularly if you're following the lesson content without many changes). 
- - Option 2: Don't show the website to the learners during the lesson, as it can be distracting: - students may read instead of listen, and having another window open is an additional cognitive load. + - Option 1: Can give links to learners before the lesson so they can follow along, catch up, and see exercises (particularly if you're following the lesson content without many changes). + - Option 2: Don't show the website to the learners during the lesson, as it can be distracting: students may read instead of listen, and having another window open is an additional cognitive load. - In any case, make sure to point to website as a post-workshop reference. - Content: - - Unless you have a truly generous amount of time (4+ hours), - it is likely that you will not cover ALL the material in this lesson in a single half-day session. - Plan ahead on what you might skip, what you really want to emphasize, etc. + - Unless you have a truly generous amount of time (4+ hours), it is likely that you will not cover ALL the material in this lesson in a single half-day session. Plan ahead on what you might skip, what you really want to emphasize, etc. - Exercises: - - Think in advance about how you might want to handle exercises during the lesson. - How are you assigning them (website, slide, handout)? - Do you want everyone to try it and then you show the solution? - Have a learner show the solution? - Have groups each do a different exercise and present their solutions? + - Think in advance about how you might want to handle exercises during the lesson. How are you assigning them (website, slide, handout)? Do you want everyone to try it and then you show the solution? Have a learner show the solution? Have groups each do a different exercise and present their solutions? - Other preparation: - - Feel free to add your own examples or side comments, - but know that it shouldn't be necessary: - the topics and commands can be taught as given on the lesson pages. 
- If you think there is a place where the lesson is lacking, - feel free to file an issue or submit a pull request. + - Feel free to add your own examples or side comments, but know that it shouldn't be necessary: the topics and commands can be taught as given on the lesson pages. If you think there is a place where the lesson is lacking, feel free to file an issue or submit a pull request. ### Teaching Notes -- Super cool online resource! - [https://explainshell.com/](https://explainshell.com/) will dissect any shell command you type in - and display help text for each piece. Additional nice manual tool could be [https://tldr.sh/](https://tldr.sh/) with short very descriptive manuals for shell commands, useful especially on Windows while using Git BASH where `man` could not work. - -- Another super cool online resource is [https://www.shellcheck.net](https://www.shellcheck.net), - which will check shell scripts (both uploaded and typed in) for common errors. - -- Resources for "splitting" your shell so that recent commands - remain in view: [https://github.com/rgaiacs/swc-shell-split-window](https://github.com/rgaiacs/swc-shell-split-window). - -- Running a text editor from the command line can be - the biggest stumbling block during the entire lesson: - many will try to run the same editor as the instructor - or will not know how to navigate to the right directory to save their file, - or will run a word processor rather than a plain text editor. - The quickest way past these problems is to have more knowledgeable learners - help those who need it. - -- Introducing and navigating the filesystem in the shell (covered in - [Navigating Files and Directories](../episodes/02-navigating-the-filesystem.md) section) can be confusing. You may have both terminal and GUI file explorer open side by side so learners can see the content and file structure while they're using terminal to navigate the system. - -- Tab completion sounds like a small thing: it isn't. 
- Re-running old commands using `!123` or `!wc` - isn't a small thing either, - and neither are wildcard expansion and `for` loops. - Each one is an opportunity to repeat one of the big ideas of Software Carpentry: - if the computer *can* repeat it, - some programmer somewhere will almost certainly have built - some way for the computer *to* repeat it. - -- Building up a pipeline with four or five stages, - then putting it in a shell script for re-use - and calling that script inside a `for` loop, is a great opportunity to show how - "seven plus or minus two" connects to programming. - Once we have figured out how to do something moderately complicated, - we make it re-usable and give it a name - so that it only takes up one slot in working memory rather than several. - It is also a good opportunity to talk about exploratory programming: - rather than designing a program up front, we can do a few useful things - and then retroactively decide which are worth encapsulating for future re-use. - -- If everything is going well, you can drive home the point that file - extensions are essentially there to help computers (and human - readers) understand file content and are not a requirement of files - (covered briefly in [Navigating Files and Directories](../episodes/02-navigating-the-filesystem.md)). - This can be done in the [Pipes and Filters](../episodes/05-counting-mining.md) section by showing that you - can redirect standard output to a file without the .txt extension - (e.g., lengths), and that the resulting file is still a perfectly usable text file. - Make the point that if double-clicked in the GUI, the computer will - probably ask you what you want to do. - -- We have to leave out many important things because of time constraints, - including file permissions, job control, and SSH. - If learners already understand the basic material, - this can be covered instead using the online lessons as guidelines. 
- These limitations also have follow-on consequences: +- Super cool online resource! [https://explainshell.com/](https://explainshell.com/) will dissect any shell command you type in and display help text for each piece. Another handy reference is [https://tldr.sh/](https://tldr.sh/), which offers short, descriptive manual pages for shell commands; it is especially useful on Windows with Git Bash, where `man` may not work. + +- Another super cool online resource is [https://www.shellcheck.net](https://www.shellcheck.net), which will check shell scripts (both uploaded and typed in) for common errors. + +- Resources for "splitting" your shell so that recent commands remain in view: [https://github.com/rgaiacs/swc-shell-split-window](https://github.com/rgaiacs/swc-shell-split-window). + +- Running a text editor from the command line can be the biggest stumbling block during the entire lesson: many will try to run the same editor as the instructor, will not know how to navigate to the right directory to save their file, or will run a word processor rather than a plain text editor. The quickest way past these problems is to have more knowledgeable learners help those who need it. + +- Introducing and navigating the filesystem in the shell (covered in the [Navigating Files and Directories](../episodes/02-navigating-the-filesystem.md) section) can be confusing. You may want to keep both a terminal and a GUI file explorer open side by side so learners can see the content and file structure while they're using the terminal to navigate the system. + +- Tab completion sounds like a small thing: it isn't. Re-running old commands using `!123` or `!wc` isn't a small thing either, and neither are wildcard expansion and `for` loops. Each one is an opportunity to repeat one of the big ideas of Software Carpentry: if the computer *can* repeat it, some programmer somewhere will almost certainly have built some way for the computer *to* repeat it.
+ +- Building up a pipeline with four or five stages, then putting it in a shell script for re-use and calling that script inside a `for` loop, is a great opportunity to show how "seven plus or minus two" connects to programming. Once we have figured out how to do something moderately complicated, we make it re-usable and give it a name so that it only takes up one slot in working memory rather than several. It is also a good opportunity to talk about exploratory programming: rather than designing a program up front, we can do a few useful things and then retroactively decide which are worth encapsulating for future re-use. + +- If everything is going well, you can drive home the point that file extensions are essentially there to help computers (and human readers) understand file content and are not a requirement of files (covered briefly in [Navigating Files and Directories](../episodes/02-navigating-the-filesystem.md)). This can be done in the [Pipes and Filters](../episodes/05-counting-mining.md) section by showing that you can redirect standard output to a file without the .txt extension (e.g., lengths), and that the resulting file is still a perfectly usable text file. Make the point that if double-clicked in the GUI, the computer will probably ask you what you want to do. + +- We have to leave out many important things because of time constraints, including file permissions, job control, and SSH. If learners already understand the basic material, this can be covered instead using the online lessons as guidelines. These limitations also have follow-on consequences: - - It's hard to discuss `#!` (shebang) without first discussing - permissions, which we don't do. `#!` is also [pretty - complicated][shebang], so even if we did discuss permissions, we - probably still wouldn't want to discuss `#!`. + - It's hard to discuss `#!` (shebang) without first discussing permissions, which we don't do. 
`#!` is also [pretty complicated][shebang], so even if we did discuss permissions, we probably still wouldn't want to discuss `#!`. -- Stay within POSIX-compliant commands, as all the teaching materials do. - Your particular shell may have extensions beyond POSIX that are not available - on other machines, especially the default OSX bash and Windows bash emulators. - For example, POSIX `ls` does not have an `--ignore=` or `-I` option, and POSIX - `head` takes `-n 10` or `-10`, but not the long form of `--lines=10`. +- Stay within POSIX-compliant commands, as all the teaching materials do. Your particular shell may have extensions beyond POSIX that are not available on other machines, especially the default OSX bash and Windows bash emulators. For example, POSIX `ls` does not have an `--ignore=` or `-I` option, and POSIX `head` takes `-n 10` or `-10`, but not the long form of `--lines=10`. ### Windows -Installing Bash and a reasonable set of Unix commands on Windows -always involves some fiddling and frustration. -Please see the latest set of installation guidelines for advice, -and try it out yourself *before* teaching a class. -Options we have explored include: +Installing Bash and a reasonable set of Unix commands on Windows always involves some fiddling and frustration. Please see the latest set of installation guidelines for advice, and try it out yourself *before* teaching a class. Options we have explored include: -1. [Git Bash](https://gitforwindows.org/) (previously known as msysGit), +1. [Git Bash](https://gitforwindows.org/), 2. [Cygwin](https://www.cygwin.com/), 3. using a desktop virtual machine, and 4. having learners connect to a remote Unix machine (typically a VM in the cloud). -Cygwin was the preferred option until mid-2013, -but once we started teaching Git, Git Bash proved to work better. 
-Desktop virtual machines and cloud-based VMs work well for technically sophisticated learners, -and can reduce installation and configuration at the start of the workshop, but: +Cygwin was the preferred option until mid-2013, but once we started teaching Git, Git Bash proved to work better. Desktop virtual machines and cloud-based VMs work well for technically sophisticated learners, and can reduce installation and configuration at the start of the workshop, but: 1. they don't work well on underpowered machines, 2. they're confusing for novices (because simple things like copy and paste work differently), 3. learners leave the workshop without a working environment on their operating system of choice, and 4. learners may show up without having downloaded the VM or the wireless will go down (or become congested) during the lesson. -Whatever you use, please *test it yourself* on a Windows machine *before* your workshop: -things may always have changed behind your back since your last workshop. -And please also make use of our [Software Carpentry Windows Installer][windows-installer]. +Whatever you use, please *test it yourself* on a Windows machine *before* your workshop: things may always have changed behind your back since your last workshop. And please also make use of our [Software Carpentry Windows Installer][windows-installer]. #### Windows Notes -- On Windows machines - if `nano` hasn't been properly installed with the - [Software Carpentry Windows Installer][windows-installer] - it is possible to use `notepad` as an alternative. There will be a GUI - interface and line endings are treated differently, but otherwise, for - the purposes of this lesson, `notepad` and `nano` can be used almost interchangeably. +- On Windows machines if `nano` hasn't been properly installed with the [Software Carpentry Windows Installer][windows-installer] it is possible to use `notepad` as an alternative. 
There will be a GUI interface and line endings are treated differently, but otherwise, for the purposes of this lesson, `notepad` and `nano` can be used almost interchangeably. - On Windows, it appears that: @@ -249,9 +131,7 @@ And please also make use of our [Software Carpentry Windows Installer][windows-i $ cd Desktop ``` - ... will always put someone on their desktop. - Have them create the example directory for the shell exercises there - so that they can find it easily and watch it evolve. + ... will always put someone on their desktop. Have them create the example directory for the shell exercises there so that they can find it easily and watch it evolve. - On Windows, Microsoft OneDrive may appear in the home directory list. Desktop is often found inside the OneDrive directory. diff --git a/learners/setup.md b/learners/setup.md index ad888d6c..d9342944 100644 --- a/learners/setup.md +++ b/learners/setup.md @@ -2,16 +2,13 @@ title: Setup --- -To participate in this Library Carpentry lesson, you will need a working Unix-like shell environment. -We will be using Bash ([Bourne Again Shell](https://en.wikipedia.org/wiki/Bash_\(Unix_shell\))) which is standard on Linux and macOS. Some macOS users (Catalina or later) will have zsh (Z shell) as their default version. -Even if you are a Windows user, learning Bash will open up a powerful set of tools on your personal machine, and familiarize you with the standard remote interface used on most servers and supercomputers. +To participate in this Library Carpentry lesson, you will need a working Unix-like shell environment. We will be using Bash ([Bourne Again Shell](https://en.wikipedia.org/wiki/Bash_\(Unix_shell\))) which is standard on Linux and macOS. Some macOS users (Catalina or later) will have zsh (Z shell) as their default version. 
Even if you are a Windows user, learning Bash will open up a powerful set of tools on your personal machine, and familiarize you with the standard remote interface used on most servers and supercomputers. :::::::::::::::::::::::::::::::::::::::::: prereq ## Terminal Setup -Bash is the default shell on most Linux distributions and older versions of macOS. -Windows users will need to install Git Bash to provide a Unix-like environment. +Bash is the default shell on most Linux distributions and older versions of macOS. Windows users will need to install Git Bash to provide a Unix-like environment. - **Linux:** The default shell is usually Bash, but if your machine is set up differently you can run it by opening a terminal and typing `bash` followed by the enter key. There is no need to install anything. Look for Terminal in your applications to start the Bash shell.