<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: R for reproducible scientific analysis</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-responsive.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="stylesheet" type="text/css" href="css/swc-workshop-and-lesson.css" />
<link rel="stylesheet" type="text/css" href="css/lesson.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<meta charset="UTF-8" />
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container container-full-width card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<div class="row-fluid">
<div class="span10 offset1">
<h1 class="title">R for reproducible scientific analysis</h1>
<h2 class="subtitle">Writing data</h2>
<div id="learning-objectives" class="objectives">
<h2>Learning Objectives</h2>
<ul>
<li>To be able to write out data from R</li>
</ul>
</div>
<h3 id="writing-data">Writing data</h3>
<p>At some point, you'll want to write out data from R.</p>
<p>We can use the <code>write.table</code> function for this, which is very similar to <code>read.table</code> from before.</p>
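<p>As a minimal sketch of how the two functions mirror each other, the example below writes a small made-up data frame (<code>toy</code>, which is not part of the gapminder data) to a temporary file and reads it back. The file name comes from <code>tempfile()</code>, so nothing in your project directory is touched.</p>
<pre class="sourceCode r"><code class="sourceCode r"># A small, made-up data frame used only for illustration
toy &lt;- data.frame(
  country = c("Australia", "Australia"),
  year = c(1952, 1957),
  stringsAsFactors = FALSE
)

# Write to a temporary file, then read it back with matching arguments
tmp &lt;- tempfile(fileext = ".csv")
write.table(toy, file = tmp, sep = ",", row.names = FALSE)
roundtrip &lt;- read.table(tmp, sep = ",", header = TRUE, stringsAsFactors = FALSE)

# The column names and values survive the round trip
print(roundtrip)</code></pre>
<p>Note that the <code>sep</code> argument has to match on both sides, and that <code>header = TRUE</code> tells <code>read.table</code> to treat the first line as column names.</p>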
<p>Let's create a data-cleaning script. For this analysis, we only want to focus on the gapminder data for Australia:</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">write.table</span>(
gapminder[gapminder$country ==<span class="st"> "Australia"</span>,],
<span class="dt">file=</span><span class="st">"cleaned-data/gapminder-aus.csv"</span>,
<span class="dt">sep=</span><span class="st">","</span>
)</code></pre>
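<p><code>write.table</code> does not create missing directories, so the call above assumes that <code>cleaned-data/</code> already exists. A minimal sketch of guarding against that is shown here, using a hypothetical folder under <code>tempdir()</code> and a made-up file name (<code>example.csv</code>) in place of your project files:</p>
<pre class="sourceCode r"><code class="sourceCode r"># Hypothetical output folder; in the lesson this would be "cleaned-data"
out_dir &lt;- file.path(tempdir(), "cleaned-data")

# Create the folder only if it is missing
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}

# Writing into the folder now succeeds
out_file &lt;- file.path(out_dir, "example.csv")
write.table(data.frame(x = 1:3), file = out_file, sep = ",")
file.exists(out_file)</code></pre>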
<p>Let's switch back to the shell and take a look at the data to make sure it looks OK:</p>
<pre class="shell"><code>cd ~/gapminder/cleaned-data/
head gapminder-aus.csv</code></pre>
<pre class="output"><code>"country","year","pop","continent","lifeExp","gdpPercap"
"61","Australia",1952,8691212,"Oceania",69.12,10039.59564
"62","Australia",1957,9712569,"Oceania",70.33,10949.64959
"63","Australia",1962,10794968,"Oceania",70.93,12217.22686
"64","Australia",1967,11872264,"Oceania",71.1,14526.12465
"65","Australia",1972,13177000,"Oceania",71.93,16788.62948
"66","Australia",1977,14074100,"Oceania",73.49,18334.19751
"67","Australia",1982,15184200,"Oceania",74.74,19477.00928
"68","Australia",1987,16257249,"Oceania",76.32,21888.88903
"69","Australia",1992,17481977,"Oceania",77.56,23424.76683</code></pre>
<p>Hmm, that's not quite what we wanted. Where did all these quotation marks come from? Also, the row numbers in the first column are meaningless.</p>
<p>Let's look at the help file to work out how to change this behaviour.</p>
<pre class="sourceCode r"><code class="sourceCode r">?write.table</code></pre>
<p>By default, R wraps character and factor values in quotation marks when writing to a file (<code>quote=TRUE</code>). It also writes out the row and column names (<code>row.names=TRUE</code> and <code>col.names=TRUE</code>).</p>
<p>Let's fix this:</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">write.table</span>(
gapminder[gapminder$country ==<span class="st"> "Australia"</span>,],
<span class="dt">file=</span><span class="st">"cleaned-data/gapminder-aus.csv"</span>,
<span class="dt">sep=</span><span class="st">","</span>, <span class="dt">quote=</span><span class="ot">FALSE</span>, <span class="dt">row.names=</span><span class="ot">FALSE</span>
)</code></pre>
<p>Now let's look at the data again using our shell skills:</p>
<pre class="shell"><code>head gapminder-aus.csv</code></pre>
<pre class="output"><code>country,year,pop,continent,lifeExp,gdpPercap
Australia,1952,8691212,Oceania,69.12,10039.59564
Australia,1957,9712569,Oceania,70.33,10949.64959
Australia,1962,10794968,Oceania,70.93,12217.22686
Australia,1967,11872264,Oceania,71.1,14526.12465
Australia,1972,13177000,Oceania,71.93,16788.62948
Australia,1977,14074100,Oceania,73.49,18334.19751
Australia,1982,15184200,Oceania,74.74,19477.00928
Australia,1987,16257249,Oceania,76.32,21888.88903
Australia,1992,17481977,Oceania,77.56,23424.76683</code></pre>
<p>That looks better!</p>
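<p>You can also check a file without leaving R. The sketch below uses a made-up two-row data frame (<code>toy</code>) and a temporary file to compare the default output with the output after setting <code>quote=FALSE</code> and <code>row.names=FALSE</code>. <code>readLines</code> returns the raw lines of the file, much like <code>head</code> does in the shell.</p>
<pre class="sourceCode r"><code class="sourceCode r"># A small, made-up data frame used only for illustration
toy &lt;- data.frame(
  country = c("Australia", "Australia"),
  year = c(1952, 1957),
  stringsAsFactors = FALSE
)
tmp &lt;- tempfile(fileext = ".csv")

# Defaults: quoted strings plus a leading column of row names
write.table(toy, file = tmp, sep = ",")
default_lines &lt;- readLines(tmp)

# Quotes and row names switched off
write.table(toy, file = tmp, sep = ",", quote = FALSE, row.names = FALSE)
clean_lines &lt;- readLines(tmp)

print(default_lines)
print(clean_lines)</code></pre>
<p>One caveat with <code>quote=FALSE</code>: a value that itself contains a comma can no longer be told apart from a field separator, so it is worth keeping the quotes for data like that.</p>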
<div id="challenge" class="challenge">
<h4>Challenge</h4>
<p>Write a data-cleaning script file that subsets the gapminder data to include only data points collected since 1990.</p>
<p>Use this script to write out the new subset to a file in the <code>cleaned-data/</code> directory.</p>
</div>
</div>
</div>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/swcarpentry/lesson-template">Source</a>
<a class="label swc-blue-bg" href="mailto:[email protected]">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="http://software-carpentry.org/v5/js/bootstrap/bootstrap.min.js"></script>
</body>
</html>