From 854b55b901a82eae83e8bf63d09cb5ccba752d1e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jose=20Ni=C3=B1o?=
Date: Tue, 20 Aug 2024 17:16:43 -0500
Subject: [PATCH 1/6] Closes #73. Updated objectives, questions, key points in ep1 intro

---
 episodes/00-intro.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/episodes/00-intro.md b/episodes/00-intro.md
index 3d4b8c2f..7f2c2d9c 100644
--- a/episodes/00-intro.md
+++ b/episodes/00-intro.md
@@ -6,19 +6,20 @@ exercises: 3
 
 ::::::::::::::::::::::::::::::::::::::: objectives
 
-- Understand how to organize data so computers can make the best use of the data
+- Define the scope of this lesson
+- Describe some drawbacks and advantages of using spreadsheet programs
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 
 :::::::::::::::::::::::::::::::::::::::: questions
 
-- What are basic principles for using spreadsheets for good data organization?
+- What are spreadsheets useful for in a research project?
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 
 :::::::::::::::::::::::::::::::::::::::::: prereq
 
-## Things You'll Need To Complete This Tutorial
+## Things You'll Need To Complete This Lesson
 
 #### Spreadsheet Software
 
@@ -131,7 +132,8 @@ In this lesson we're going to talk about:
 :::::::::::::::::::::::::::::::::::::::: keypoints
 
 
-- Organizing your data tables according to tidy data principles will make them easier for you and others to use for analysis.
+- Good data organization is the foundation of any research project.
+- Spreadsheets are good for data entry, but when doing data cleaning or analysis, it's not easy to show or replicate what you did.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 

From eb267c17463576bef957a1c53f027ab180842499 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jose=20Ni=C3=B1o?=
Date: Tue, 20 Aug 2024 17:52:32 -0500
Subject: [PATCH 2/6] Closes #110.
 Modified setup and intro to not repeat information

---
 .gitignore           |  1 +
 episodes/00-intro.md | 23 ---------------
 learners/setup.md    | 70 +++++++++++++++++++++-----------------------
 3 files changed, 35 insertions(+), 59 deletions(-)

diff --git a/.gitignore b/.gitignore
index b8ab7062..6b9c04ff 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,6 +2,7 @@
 episodes/*html
 site/*
 !site/README.md
+*.Rproj
 
 # History files
 .Rhistory
diff --git a/episodes/00-intro.md b/episodes/00-intro.md
index 7f2c2d9c..f1e1747e 100644
--- a/episodes/00-intro.md
+++ b/episodes/00-intro.md
@@ -17,21 +17,6 @@ exercises: 3
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 
-:::::::::::::::::::::::::::::::::::::::::: prereq
-
-## Things You'll Need To Complete This Lesson
-
-#### Spreadsheet Software
-
-To work through this tutorial you will need access to a spreadsheet program.
-Many computers come with a pre-installed spreadsheet program like Excel. macOS users who use Apple's Numbers application should note that it does not contain some of the features (particularly data validation) that we will be using. Please use LibreOffice or Microsoft Excel instead.
-
-If you do not have a spreadsheet program, install one using the instructions
-in the link below.
-
-- [Instructions to install a spreadsheet program.](../learners/setup.md)
-
-::::::::::::::::::::::::::::::::::::::::::::::::::
 
 Good data organization is the foundation of your research project. Most
 researchers have data or do data entry in
@@ -120,14 +105,6 @@ In this lesson, we will assume that you are most likely using Excel as your
 primary spreadsheet program - there are other programs with similar
 functionality but Excel seems to be the most commonly used.
 
-In this lesson we're going to talk about:
-
-1. [Formatting data tables in spreadsheets](01-format-data.md)
-2. [Formatting problems](02-common-mistakes.md)
-3. [Dates as data](03-dates-as-data.md)
-4. [Quality control](04-quality-assurance.md)
-5. [Exporting data](05-exporting-data.md)
-
 
 :::::::::::::::::::::::::::::::::::::::: keypoints
 
diff --git a/learners/setup.md b/learners/setup.md
index d7787f29..026b84d8 100644
--- a/learners/setup.md
+++ b/learners/setup.md
@@ -31,55 +31,53 @@ page](https://www.datacarpentry.org/socialsci-workshop/data).
 
 ## Software
 
-To interact with spreadsheets, we can use [LibreOffice](https://www.libreoffice.org/),
-Microsoft Excel, [Gnumeric](https://www.gnumeric.org/),
-[Onlyoffice](https://www.onlyoffice.com/), [WPS office](https://www.wps.com/)
-or other programs. Commands may differ a bit between programs, but
+To work through this tutorial you will need access to a spreadsheet program. For this you have many options: [Microsoft Excel](https://www.microsoft.com/en-us/microsoft-365/excel), [LibreOffice](https://www.libreoffice.org/), [Apple Numbers](https://support.apple.com/numbers), [Gnumeric](http://www.gnumeric.org/), [Onlyoffice](https://www.onlyoffice.com/), [WPS office](https://www.wps.com/), among others. Commands may differ a bit between programs, but
 the general ideas for thinking about spreadsheets are the same.
 
-For this lesson, if you don't have a spreadsheet program already, you can use
-LibreOffice. It's a free, open source spreadsheet program.
+For this lesson, we encourage you to use LibreOffice or Microsoft Excel, as the tasks we will
+be doing have been tested in these programs. If you don't have Microsoft Excel, you can use
+LibreOffice. It's a free, open source spreadsheet program. Here are the instructions to install it:
+
 
-macOS users who use Apple's Numbers application should note that it does not
-contain some of the features (particularly data validation) that we will
-be using. Please use LibreOffice or Microsoft Excel instead.
 
 #### Windows
 
-- Download the Installer
-- Install LibreOffice by going to [the installation
-  page](https://www.libreoffice.org/download/libreoffice-fresh/). The version
-  for Windows should automatically be selected. Click Download Version X.X.X
-  (whichever is the most recent version).
-- Install LibreOffice
-- Once the installer is downloaded, double click on it and LibreOffice should
+- **Download the Installer**
+  Install LibreOffice by going to the [installation
+  page](https://www.libreoffice.org/download/download-libreoffice/). The
+  version for Windows should automatically be selected. Click
+  **Download**. You will go to a page that asks about a
+  donation, but you don't need to make one. Your download should begin
+  automatically.
+- **Install LibreOffice**
+  Once the installer is downloaded, double click on it and it should
   install.
 
-#### macOS
+#### Mac OS X
 
-- Download the Installer
-- Install LibreOffice by going to [the installation
-  page](https://www.libreoffice.org/download/libreoffice-fresh/). The version
-  for Mac should automatically be selected. Click Download Version X.X.X
-  (whichever is the most recent version).
-- Install LibreOffice
-- Once the installer is downloaded, double click on it and LibreOffice should
-  install.
+- **Download the Installer**
+  Install LibreOffice by going to the [installation
+  page](https://www.libreoffice.org/download/download-libreoffice/). The
+  version for macOS should automatically be selected. Click
+  **Download**. You will go to a page that asks about a
+  donation, but you don't need to make one. Your download should begin
+  automatically.
+- **Install LibreOffice**
+  The file *LibreOffice\_X.X.X\_MacOS\_x86-64* (whichever version of LibreOffice you have selected) should have been
+  downloaded. Double click on this file, and LibreOffice will be
+  installed.
 
 #### Linux
 
-- Download the Installer
-- Install LibreOffice by going to [the installation
-  page](https://www.libreoffice.org/download/libreoffice-fresh/). The version
-  for Linux should automatically be selected. Click Download Version X.X.X
-  (whichever is the most recent version).
-- Install LibreOffice
-- Once the installer is downloaded, double click on it and LibreOffice should
+- **Download the Installer**
+  Install LibreOffice by going to the [installation
+  page](https://www.libreoffice.org/download/download-libreoffice/). The
+  version for Linux should automatically be selected. Click **Download**. You will go to a page that asks about a donation,
+  but you don't need to make one. Your download should begin
+  automatically.
+- **Install LibreOffice**
+  Once the installer is downloaded, double click on it and it should
   install.
-- package manager option:
-  - pacman (Arch): `pacman -S libreoffice`
-  - yum (Fedora, CentOS): `yum install libreoffice`
-  - apt (Debian, Ubuntu): `apt install libreoffice`
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 

From 851f2f1cfdee3cbdef6bf78dc48dab490a1817be Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jose=20Ni=C3=B1o?=
Date: Tue, 20 Aug 2024 18:01:54 -0500
Subject: [PATCH 3/6] Closes #122. Update questions in ep2 and ep3

---
 episodes/01-format-data.md     | 2 +-
 episodes/02-common-mistakes.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/episodes/01-format-data.md b/episodes/01-format-data.md
index b78b5df5..03d0326a 100644
--- a/episodes/01-format-data.md
+++ b/episodes/01-format-data.md
@@ -14,7 +14,7 @@ exercises: 15
 
 :::::::::::::::::::::::::::::::::::::::: questions
 
-- What are some common challenges with formatting data in spreadsheets and how can we avoid them?
+- How do we format data in spreadsheets for effective data use?
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 
diff --git a/episodes/02-common-mistakes.md b/episodes/02-common-mistakes.md
index 9706211a..9d529e53 100644
--- a/episodes/02-common-mistakes.md
+++ b/episodes/02-common-mistakes.md
@@ -12,7 +12,7 @@ exercises: 0
 
 :::::::::::::::::::::::::::::::::::::::: questions
 
-- What are some common challenges with formatting data in spreadsheets and how can we avoid them?
+- What common mistakes are made when formatting spreadsheets?
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 

From 8efa7d78e50a353ea4893764297ecff6244109de Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jose=20Ni=C3=B1o?=
Date: Tue, 20 Aug 2024 18:23:48 -0500
Subject: [PATCH 4/6] Closes #124. Explicitly mention tidy data principles

---
 episodes/01-format-data.md | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/episodes/01-format-data.md b/episodes/01-format-data.md
index 03d0326a..d7c706fb 100644
--- a/episodes/01-format-data.md
+++ b/episodes/01-format-data.md
@@ -72,9 +72,9 @@ what you did when Reviewer #3 asks for a different analysis, you should
 
 Put these principles in to practice today during the exercises.
 
-### Structuring data in spreadsheets
+### Tidy data in spreadsheets
 
-The cardinal rules of using spreadsheet programs for data:
+The tidy data principles when structuring data in spreadsheets are:
 
 1. Put all your variables in columns - the thing you're measuring, like 'weight'
    or 'temperature'.
@@ -87,6 +87,8 @@ The cardinal rules of using spreadsheet programs for data:
    ensures that anyone can use the data, and is required by most data
    repositories.
 
+You can understand more easily these principles with the illustrations in the [Tidy Data Series by Lowndes & Horst](https://allisonhorst.com/other-r-fun).
+
 For instance, we're going to be working with data from a study of agricultural
 practices among farmers in two countries in eastern sub-Saharan Africa
 (Mozambique and Tanzania). Researchers conducted
@@ -198,15 +200,17 @@ with this data and how you would fix it.
 
 ## Handy References
 
-Two excellent references on spreadsheet organization are:
+Three excellent references on spreadsheet organization are:
+
+- Hadley Wickham, *Tidy Data*, Vol. 59, Issue 10, Sep 2014, Journal of
+  Statistical Software. [http://www.jstatsoft.org/v59/i10](https://www.jstatsoft.org/v59/i10)
+
+- Julia Lowndes \& Allison Horst, *Tidy Data Series by Lowndes & Horst*. [https://allisonhorst.com/other-r-fun](https://allisonhorst.com/other-r-fun)
 
 - Karl W. Broman \& Kara H. Woo, *Data Organization in Spreadsheets*, Vol. 72,
   Issue 1, 2018, The American Statistician.
   [https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989](https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989)
 
-- Hadley Wickham, *Tidy Data*, Vol. 59, Issue 10, Sep 2014, Journal of
-  Statistical Software. [http://www.jstatsoft.org/v59/i10](https://www.jstatsoft.org/v59/i10)
-
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 
 

From 192ae44429f320e711bed476005e60de0aa00fd5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jose=20Ni=C3=B1o?=
Date: Tue, 20 Aug 2024 18:33:53 -0500
Subject: [PATCH 5/6] Closes #168. Updated reference materials for advanced Excel in ep1 intro

---
 episodes/00-intro.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/episodes/00-intro.md b/episodes/00-intro.md
index f1e1747e..2cb8621c 100644
--- a/episodes/00-intro.md
+++ b/episodes/00-intro.md
@@ -75,8 +75,8 @@ Nevertheless it is important to be aware of the limitations these data may prese
 - How to do *plotting* in a spreadsheet
 - How to *write code* in spreadsheet programs
 
-If you're looking to do this, a good reference is
-[Head First Excel](https://www.amazon.com/Head-First-Excel-learners-spreadsheets/dp/0596807694/ref=sr_1_1?ie=UTF8&qid=1491594584&sr=8-1&keywords=head+first+excel), published by O'Reilly.
+If you're looking to do this, a couple of good references are the
+[Excel Cookbook](https://search.worldcat.org/title/1419271899), published by O'Reilly, and the [Microsoft Excel 365 bible](https://search.worldcat.org/en/title/1263023438).
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 

From 814d887ef7071ba131aed0c59a7f909c62a2449d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jose=20Ni=C3=B1o?=
Date: Tue, 20 Aug 2024 18:36:01 -0500
Subject: [PATCH 6/6] Closes #174. Delete outdated info about Excel's date system

---
 episodes/03-dates-as-data.md | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/episodes/03-dates-as-data.md b/episodes/03-dates-as-data.md
index d040ee30..8cc27531 100644
--- a/episodes/03-dates-as-data.md
+++ b/episodes/03-dates-as-data.md
@@ -37,17 +37,6 @@ One of the other reasons dates can be tricky is that most spreadsheet programs h
 
 The first thing you need to know is that Excel stores dates as numbers - see the last column in the above figure. This serial number represents the number of days from December 31, 1899. In the example, July 2, 2014 is stored as the serial number 41822.
 
-::::::::::::::::::::::::::::::::::::::::: callout
-
-## Excel's date systems
-
-Excel also entertains a second date system, the 1904 date system, as the default in Excel for Macintosh. This system will assign a
-different serial number than the [1900 date system](https://support.microsoft.com/en-us/help/214330/differences-between-the-1900-and-the-1904-date-system-in-excel). Because of this,
-[dates must be checked for accuracy when exporting data from Excel](https://uc3.cdlib.org/2014/04/09/abandon-all-hope-ye-who-enter-dates-in-excel/) (look for dates that are ~4 years off).
-
-
-::::::::::::::::::::::::::::::::::::::::::::::::::
-
 Using functions we can add days, months or years to a given date.
 Say you had a research plan where you needed to conduct interviews with a set of informants every ninety days for a year.
 