When introducing R, we tend to get bogged down in dealing with missing values (`NA`s) and with factors, and then in dealing with both in `read.csv()` via the arguments `stringsAsFactors` and `na.strings`.
For the "portal" data set, both hit you hard in the `sex` column, which has values `"M"`, `"F"`, or `""` (blank).
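To make the problem concrete, here is a minimal sketch of how the two arguments interact. The three-row `csv_text` is a made-up stand-in for the portal file, not the real data, and `stringsAsFactors` is set explicitly because its default changed from `TRUE` to `FALSE` in R 4.0.0.

```r
# Hypothetical three-row excerpt standing in for the portal survey file.
csv_text <- 'record_id,sex,weight
1,M,40
2,F,
3,,48'

# With factors on and the default na.strings = "NA", the blank in `sex`
# is kept as an empty-string factor level rather than a missing value.
surveys_default <- read.csv(text = csv_text, stringsAsFactors = TRUE)
levels(surveys_default$sex)

# Treating "" as missing and keeping strings as character vectors
# turns the blank `sex` entry into NA.
surveys_clean <- read.csv(text = csv_text,
                          stringsAsFactors = FALSE,
                          na.strings = "")
is.na(surveys_clean$sex)
```

Note that the blank in the numeric `weight` column is read as `NA` either way; it is only in the text column that `na.strings` makes a difference, which is part of what makes this confusing for beginners.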
Christina and I recently taught a one-day Data Carpentry workshop, and I taught dplyr and ggplot2 using the gapminder data. Those data were considerably easier because there are no missing values at all, and while there are factors, you don't need to do anything special with them.
We might consider using a reduced portal data set that has only the complete records, deferring discussion of NAs and factors to later or just skipping it entirely.
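One way such a reduced data set might be produced is sketched below, assuming the blanks have already been read in as `NA`. The small `surveys` data frame and the output file name `portal_complete.csv` are hypothetical, used only for illustration.

```r
# Hypothetical miniature of the portal data, with blanks already read as NA.
surveys <- data.frame(
  record_id = 1:4,
  sex       = c("M", "F", NA, "F"),
  weight    = c(40, NA, 48, 37),
  stringsAsFactors = FALSE
)

# complete.cases() returns TRUE for rows that have no NA in any column.
surveys_complete <- surveys[complete.cases(surveys), ]
nrow(surveys_complete)

# The reduced table could then be saved for use in the lesson, e.g.:
# write.csv(surveys_complete, "portal_complete.csv", row.names = FALSE)
```

Learners would then load the saved file directly and never encounter `NA`s or blank factor levels in the first session.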
I like this idea of a simplified portal data set to get into R quickly. I also see value in the gapminder data, as there is an audience that largely uses categorical data. I'm not sure we are ready for two versions, but I like what I see at https://github.com/kbroman/Workshop_DataCarpNSBE.