From b708627eb1ba6668630136f4b3a933805b0f32d7 Mon Sep 17 00:00:00 2001
From: Ben Companjen
Date: Tue, 23 Nov 2021 11:09:48 +0100
Subject: [PATCH] Fix typos and normalise punctuation

These typos were mentioned in #95 and #79.
---
 episodes/01-introduction.md            | 10 +++++-----
 episodes/02-working-with-openrefine.md |  6 +++---
 episodes/03-filter-sort.md             | 26 +++++++++++++-------------
 episodes/04-numbers.md                 |  2 +-
 episodes/07-resources.md               |  2 +-
 5 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/episodes/01-introduction.md b/episodes/01-introduction.md
index 4b97b8ac..d59e6b33 100644
--- a/episodes/01-introduction.md
+++ b/episodes/01-introduction.md
@@ -11,7 +11,7 @@ objectives:
 - "Locate helpful resources to learn more about OpenRefine."
 keypoints:
 - "OpenRefine is a powerful, free, and open source tool that can be used for data cleaning."
-- "OpenRefine will automatically track any steps allowing you to backtrack as needed and providing a record of all work done"
+- "OpenRefine will automatically track any steps allowing you to backtrack as needed and providing a record of all work done."
 ---
 
 # Lesson
@@ -46,10 +46,10 @@ If after installation and running OpenRefine, it does not automatically open for
 
 ## Getting help for OpenRefine
 
-You can find out a lot more about OpenRefine at [http://openrefine.org](http://openrefine.org) and check out some great introductory videos.
-These videos and others on OpenRefine can also be found on YouTube by searching under 'OpenRefine'. There is a [Google Group](https://groups.google.com/g/openrefine)
-that can answer a lot of beginner questions and problems. Information can also be found on [StackOverflow](https://stackoverflow.com/questions/tagged/openrefine)
-where you can find a lot of help.
+You can find out a lot more about OpenRefine at [http://openrefine.org](http://openrefine.org) and check out some great introductory videos.
+These videos and others on OpenRefine can also be found on YouTube by searching under 'OpenRefine'. There is a [Google Group](https://groups.google.com/g/openrefine)
+that can answer a lot of beginner questions and problems. Information can also be found on [StackOverflow](https://stackoverflow.com/questions/tagged/openrefine)
+where you can find a lot of help.
 As with other programs of this type, OpenRefine libraries are available too, where you can find a script you need and copy it
 into your OpenRefine instance to run it on your dataset.
 
diff --git a/episodes/02-working-with-openrefine.md b/episodes/02-working-with-openrefine.md
index f6900eb0..6bce984a 100644
--- a/episodes/02-working-with-openrefine.md
+++ b/episodes/02-working-with-openrefine.md
@@ -127,7 +127,7 @@ In OpenRefine, clustering means "finding groups of different values that might b
 3. Select the `key collision` method and `metaphone3` keying function. It should identify two clusters.
 4. Click the `Merge?` box beside each cluster, then click `Merge Selected and Recluster` to apply the corrections to the dataset.
 4. Try selecting different `Methods` and `Keying Functions` again, to see what new merges are suggested.
-5. You should find that using the default settings, no more clusters are found, for example to merge `Ruaca-Nhamuenda` with `Ruaca` or `Chirdozo` with `Chirodzo`. (Note that the `nearest neighbor` method with `ppm` distance, `radius` ≥ 4, and `block chars` ≤ 4 will find these clusters, as well as other settings with `levenshtein` distance)
+5. You should find that using the default settings, no more clusters are found, for example to merge `Ruaca-Nhamuenda` with `Ruaca` or `Chirdozo` with `Chirodzo`. (Note that the `nearest neighbor` method with `ppm` distance, `radius` ≥ 4, and `block chars` ≤ 4 will find these clusters, as well as other settings with `levenshtein` distance)
 6. To merge these values we will hover over them in the village text facet, select edit, and manually change the names. Change `Chirdozo` to `Chirodzo` and `Ruaca-Nhamuenda` to `Ruaca`. You should now have four clusters: `Chirodzo`, `God`, `Ruaca` and `49`.
 
 Important: If you `Merge` using a different method or keying function, or more times than described in the instructions above,
@@ -206,7 +206,7 @@ You should now see a new text facet box in the left-hand pane.
 > column.
 {: .challenge}
 
-## Using undo and redo.
+## Using undo and redo
 
 It's common while exploring and cleaning a dataset to discover after you've made a change that you really should have done something else first. OpenRefine provides `Undo` and `Redo` operations to make this easy.
 
@@ -221,7 +221,7 @@ It's common while exploring and cleaning a dataset to discover after you've made
 
 ## Trim Leading and Trailing Whitespace
 
-Words with spaces at the beginning or end are particularly hard for we humans to tell from strings without, but the blank characters will make a difference to the computer. We usually want to remove these. As of version 3.4 of OpenRefine, the option to trim leading and trailing whitespaces is present at the moment of importing the data (see image at the top of this page).
+Words with spaces at the beginning or end are particularly hard for we humans to tell from strings without, but the blank characters will make a difference to the computer. We usually want to remove these. As of version 3.4 of OpenRefine, the option to trim leading and trailing whitespaces is present at the moment of importing the data (see image at the top of this page).
 If you unchecked that box when importing data, or if leading or trailing whitespaces were introduced while splitting columns, or other operations, OpenRefine also provides a tool to remove blank characters from the beginning and end of any entries that have them.
 
 
diff --git a/episodes/03-filter-sort.md b/episodes/03-filter-sort.md
index cd09342e..cec8e002 100644
--- a/episodes/03-filter-sort.md
+++ b/episodes/03-filter-sort.md
@@ -25,12 +25,12 @@ There are many entries in our data table. We can filter it to work on a subset o
 
 > ## Exercise
 >
-> 1. What roof types are selected by this procedure?
-> 2. How would you restrict this to only one of the roof types?
+> 1. What roof types are selected by this procedure?
+> 2. How would you restrict this to only one of the roof types?
 >
 > > ## Solution
 > > 1. Do `Facet` > `Text facet` on the `respondent_roof_type` column after filtering. This will show that
-> > two names match your filter criteria. They are `mabatipitched` and `mabatisloping`.
+> > two names match your filter criteria. They are `mabatipitched` and `mabatisloping`.
 > > 2. To restrict to only one of these two roof types, you could include more letters in your filter.
 > >
 > {: .solution}
@@ -74,10 +74,10 @@ If this is your first time sorting this table, then the drop-down menu for the s
 
 > ## Exercise
 >
-> Sort the data by `gps_Altitude`. Do you think the first few entries may have incorrect altitudes?.
+> Sort the data by `gps_Altitude`. Do you think the first few entries may have incorrect altitudes?
 >
 > > ## Solution
-> > In the `gps:Altitude` column, select `Sort...` > `numbers` and select `smallest first`. The first few values are all 0. The altitudes are more likely 'missing' than incorrect. The survey is delivered by Smartphone with the gps information added automatically by the app. The lack of an altitude value suggests that the smartphone was unable to provide it and it defaulted to 0.
+> > In the `gps_Altitude` column, select `Sort...` > `numbers` and select `smallest first`. The first few values are all 0. The altitudes are more likely 'missing' than incorrect. The survey is delivered by Smartphone with the gps information added automatically by the app. The lack of an altitude value suggests that the smartphone was unable to provide it and it defaulted to 0.
 > {: .solution}
 {: .challenge}
 
@@ -88,7 +88,7 @@ If you try to re-sort a column that you have already used, the drop-down menu ch
 * `Sort` > `Reverse` - This option allows you to reverse the order of the sort.
 * `Sort` > `Remove sort` - This option allows you to undo your sort.
 
-### Sorting by multiple columns.
+### Sorting by multiple columns
 
 You can sort by multiple columns by performing sort on additional columns. The sort will depend on the order in which you select columns to sort. To restart the sorting process with a particular column, check the `sort by this column alone` box in the `Sort` pop-up menu.
 
@@ -96,19 +96,19 @@ If you go back to one of the already sorted columns and select > `Sort` > `Remov
 
 > ## Exercise
 >
-> We discovered in an earlier lesson that the value for one of the `village` entries was given as 49. This is clearly wrong. By looking at the GPS coordinates for the entries of the other villages can we decide what village the data in that column was collected from?
-> 1. Sort on `gps_Latitude` as a number with the smallest first.
-> 2. Add a sort on `gps_Longitude` as a number with the smallest first.
-> 3. Using the drop down arrow on the `village` column, select `Edit column` > `Move column to end`. This will allow you to compare village names with GPS coordinates.
+> We discovered in an earlier lesson that the value for one of the `village` entries was given as 49. This is clearly wrong. By looking at the GPS coordinates for the entries of the other villages can we decide what village the data in that column was collected from?
+> 1. Sort on `gps_Latitude` as a number with the smallest first.
+> 2. Add a sort on `gps_Longitude` as a number with the smallest first.
+> 3. Using the drop down arrow on the `village` column, select `Edit column` > `Move column to end`. This will allow you to compare village names with GPS coordinates.
 > 4. Scroll through the entries until you find village `49`. Can you tell from it's GPS coordinates which village it belong to?
 > 5. Now sort only by `interview_date` as date. Move the `village` column to the start of the table. Does the row where village is `49` group with one particular village? Is it the same village as when comparing GPS coordinates?
 >
 > > ## Solution
 > >
-> > The interview data for that row is in a small cluster of Chirodzo interviews when sorting by GPS coordinates. When sorting by interview date, it is also with Chirodzo interviews. In fact, only Chirodzo had interviews conducted on that date.
-> {: .solution}
+> > The interview data for that row is in a small cluster of Chirodzo interviews when sorting by GPS coordinates. When sorting by interview date, it is also with Chirodzo interviews. In fact, only Chirodzo had interviews conducted on that date.
+> {: .solution}
 {: .challenge}
 
-Perform a text facet on the `village` column and change `49` to the village name that was determined in the previous exercise. You should now have only three village names.
+Perform a text facet on the `village` column and change `49` to the village name that was determined in the previous exercise. You should now have only three village names.
 
 {% include links.md %}
diff --git a/episodes/04-numbers.md b/episodes/04-numbers.md
index 1c0f8e52..1f9c0d54 100644
--- a/episodes/04-numbers.md
+++ b/episodes/04-numbers.md
@@ -24,7 +24,7 @@ To transform cells in the `years_farm` column to numbers, click the down arrow f
 
 > ## Exercise
 >
-> Transform three more columns, `no_members`, `years_liv`, and `buildings_in_compound`, from text to numbers. Can all columns be transformed to numbers? - Try it with `village` for example.
+> Transform three more columns, `no_membrs`, `years_liv`, and `buildings_in_compound`, from text to numbers. Can all columns be transformed to numbers? - Try it with `village` for example.
 >
 > > ## Solution
 > >
diff --git a/episodes/07-resources.md b/episodes/07-resources.md
index e7de44f7..a772d2aa 100644
--- a/episodes/07-resources.md
+++ b/episodes/07-resources.md
@@ -8,7 +8,7 @@ objectives:
 - "Understand that there are many online resources available for more information on OpenRefine."
 - "Identify other resources about OpenRefine."
 keypoints:
-- "Other examples and resources online are good for learning more about OpenRefine"
+- "Other examples and resources online are good for learning more about OpenRefine."
 ---
 
 # Lesson