
Data Carpentry's aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop uses a tabular ecology dataset and teaches data cleaning, management, analysis and visualization.

Prerequisites

There are no prerequisites, and the materials assume no prior knowledge of the tools. {: .prereq}

Data

The data for this workshop are the Portal Project Teaching Database, available on FigShare under a CC-BY license for reuse.

The Portal Project Teaching Database is a simplified version of the Portal Project Database designed for teaching. It is a tabular dataset of observations of small mammals in a desert ecosystem in Arizona, USA, collected over more than 40 years. It provides a real-world example of life-history, population, and ecological data, with enough complexity to teach many aspects of data analysis and management, but with many complexities removed so that students can focus on the core ideas and skills being taught.

More information on this dataset {: .prereq}

The workshop can be taught using R or Python as the base language.

Overview of the lessons:

  • Data organization in spreadsheets
  • Data cleaning with OpenRefine
  • Introduction to R or Python
  • Data analysis and visualization in R or Python
  • SQL for data management

Detailed structure

Day 1 morning: Data organization & cleaning

There are two lessons in this section. The first is a spreadsheet lesson that teaches good data organization, and some data cleaning and quality control checking in a spreadsheet program.

The second lesson uses a program called OpenRefine to teach data cleaning and filtering, and to introduce the idea of scripting (application programming interfaces).

Day 1 afternoon and Day 2 morning: Data analysis & visualization

These lessons include a basic introduction to R or Python syntax, importing CSV data, and subsetting and merging data, and finish with plotting.
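As a rough idea of what importing, subsetting, and merging look like in practice, here is a minimal Python sketch that uses only the standard library. The column names and the handful of rows are hypothetical stand-ins for the workshop data, and the lessons themselves may use other libraries for these steps.

```python
import csv
import io

# Hypothetical sample rows standing in for CSV files on disk;
# the column names are assumptions made for illustration.
SURVEYS_CSV = """record_id,year,species_id,weight
1,1977,DM,40
2,1977,PF,8
3,1978,DM,48
4,1978,DO,52
"""

SPECIES_CSV = """species_id,genus
DM,Dipodomys
DO,Dipodomys
PF,Perognathus
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row (values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


# Importing: each row becomes a dict keyed by the header names.
surveys = read_rows(SURVEYS_CSV)
species = read_rows(SPECIES_CSV)

# Subsetting: keep only the records heavier than 45 (weights parsed as integers).
heavy = [row for row in surveys if int(row["weight"]) > 45]

# Merging: attach the genus to each survey record via the shared species_id key.
genus_by_id = {row["species_id"]: row["genus"] for row in species}
merged = [{**row, "genus": genus_by_id[row["species_id"]]} for row in surveys]
```

Building a lookup dictionary on the shared key is the simplest way to express a merge without extra dependencies; a data-frame library would typically do the same thing in one call.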

Day 2 afternoon: Data management with SQL

This lesson introduces the concept of a database using SQLite, how to structure data for easy database import, and how to import tabular data into SQLite. It then teaches basic queries, combining results, and querying across multiple tables.
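The ideas in this lesson can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table layout, column names, and rows below are hypothetical, and the lesson itself may work with SQLite through a different interface.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Hypothetical table layout, assumed for illustration only.
conn.execute("CREATE TABLE species (species_id TEXT PRIMARY KEY, genus TEXT)")
conn.execute(
    "CREATE TABLE surveys ("
    "record_id INTEGER PRIMARY KEY, year INTEGER, species_id TEXT, weight REAL)"
)

# Importing tabular data: one tuple per row, inserted with placeholders.
conn.executemany(
    "INSERT INTO species VALUES (?, ?)",
    [("DM", "Dipodomys"), ("DO", "Dipodomys"), ("PF", "Perognathus")],
)
conn.executemany(
    "INSERT INTO surveys VALUES (?, ?, ?, ?)",
    [(1, 1977, "DM", 40), (2, 1977, "PF", 8), (3, 1978, "DM", 48), (4, 1978, "DO", 52)],
)

# A basic query: filter rows and sort the result.
heavy = conn.execute(
    "SELECT record_id, weight FROM surveys WHERE weight > 45 ORDER BY record_id"
).fetchall()

# A query across two tables: join on species_id, then count records per genus.
per_genus = conn.execute(
    """
    SELECT species.genus, COUNT(*)
    FROM surveys
    JOIN species ON surveys.species_id = species.species_id
    GROUP BY species.genus
    ORDER BY species.genus
    """
).fetchall()

conn.close()
```

Keeping species details in their own table and linking them by `species_id` is the kind of structure that makes a join like the one above possible.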

Other lessons

There are a number of other ecology lessons that are not part of the base workshop. Some of these are no longer taught, and some are only taught at extended workshops.