
Commit 1f7e0fa

Committed May 19, 2014
Quiz 3
0 parents (root commit)


54 files changed (+194477, -0 lines)
 

Getting-and-Cleaning-Data.Rproj

+13
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX
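The project file above is plain text in a Key: Value layout. Assuming that layout matches the Debian Control File (DCF) format parsed by base R's read.dcf() (an assumption about RStudio's file format, not something this commit states), the settings can be inspected programmatically. This minimal sketch writes the same contents to a temporary file; rproj.lines, records and settings are illustrative names only.

```r
# Same contents as the .Rproj file above, written to a temporary file
rproj.lines <- c(
  "Version: 1.0", "",
  "RestoreWorkspace: Default", "SaveWorkspace: Default", "AlwaysSaveHistory: Default", "",
  "EnableCodeIndexing: Yes", "UseSpacesForTab: Yes", "NumSpacesForTab: 2", "Encoding: UTF-8", "",
  "RnwWeave: Sweave", "LaTeX: pdfLaTeX"
)
rproj.path <- tempfile(fileext = ".Rproj")
writeLines(rproj.lines, rproj.path)

# read.dcf() returns one row per blank-line-separated record, with NA for fields a record lacks
records <- read.dcf(rproj.path)

# Collapse the records into a single named character vector of settings
settings <- apply(records, 2, function(col) col[!is.na(col)][1])
print(settings[c("NumSpacesForTab", "Encoding")])
```

Because the blank lines split the file into four DCF records, the apply() step is needed to merge them back into one lookup vector.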

Old Testament.R

+47
library(XML)
library(stringr)

# URL of the page to scrape
connection.url <- 'http://www.gutenberg.org/files/10/10-h/10-h.htm'

# Parse the HTML into an internal document so that XPath queries work
testament.html <- htmlTreeParse(connection.url, useInternalNodes = TRUE)
xpathSApply(testament.html, '//title', xmlValue)  # sanity check against a known node, e.g. //title

# Access the top (root) node
testament.xml <- xmlRoot(testament.html)
class(testament.xml)

# Extract the text of every child of each top-level node with nested xmlSApply() calls
testament.doc <- xmlSApply(testament.xml, function(x) xmlSApply(x, xmlValue))
class(testament.doc)

# Flatten the nested result into one character vector and remove duplicate rows
testament.list <- rapply(testament.doc, c)
testament.frame <- data.frame(testament.list, stringsAsFactors = FALSE)
# drop = FALSE keeps a one-column data frame instead of collapsing it to a vector
testament.frame.dup <- testament.frame[!duplicated(testament.frame), , drop = FALSE]
unique(testament.frame)  # returns the same rows as testament.frame.dup

# write.csv() always uses ',' as the separator; passing sep would only trigger a warning
write.csv(testament.frame, file = 'Old Testament.csv')

# Extract references of the form [0-9][0-9]:[0-9][0-9], i.e. two digits, a colon, two digits
b1 <- str_extract_all(string = testament.list, pattern = '[0-9][0-9]:[0-9][0-9]')

# Test: flatten the matches into a new vector so that testament.list is not overwritten
verse.list <- rapply(b1, c)
b1.frame <- data.frame(verse = verse.list, stringsAsFactors = FALSE)
class(verse.list)

# Convert the extracted text into a data frame
testament.df <- data.frame(testament.list, stringsAsFactors = FALSE)
# data.frame(testament.doc) may fail because the nested results can differ in length
class(testament.df)

# Extract text in parentheses; the lazy .*? stops at the first closing parenthesis
str_extract_all(string = testament.list, pattern = "\\(.*?\\)")
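The flattening, de-duplication and regex steps in the script can be tried offline. This is a minimal sketch under stated assumptions: a small hypothetical nested list (sample.doc) stands in for the parsed Gutenberg page, and base R's regmatches()/gregexpr() replace stringr so that no extra packages are needed. It shows what rapply(x, c) produces, how duplicate rows are dropped, and how the lazy parenthesis pattern differs from a greedy one.

```r
# Hypothetical stand-in for testament.doc: a nested list of text values
sample.doc <- list(
  head = c(title = "Sample Title"),
  body = list(
    p1 = "01:01 A sample line with two notes (one) and (two).",
    p2 = "01:02 A repeated line (three).",
    p3 = "01:02 A repeated line (three)."
  )
)

# rapply(x, c) visits every leaf and unlists the result into one named character vector
sample.list <- rapply(sample.doc, c)

# Keep a one-column data frame and drop rows whose text is duplicated
sample.frame <- data.frame(text = sample.list, stringsAsFactors = FALSE)
sample.dedup <- sample.frame[!duplicated(sample.frame), , drop = FALSE]

# Two digits, a colon, two digits, e.g. "01:01"
verses <- unlist(regmatches(sample.list, gregexpr("[0-9][0-9]:[0-9][0-9]", sample.list)),
                 use.names = FALSE)

# Lazy .*? stops at the first ")", so each bracketed group is a separate match
parens <- unlist(regmatches(sample.list, gregexpr("\\(.*?\\)", sample.list, perl = TRUE)),
                 use.names = FALSE)

# Greedy .* runs to the last ")" on the line instead
greedy <- unlist(regmatches(sample.list, gregexpr("\\(.*\\)", sample.list)),
                 use.names = FALSE)

print(verses)
print(parens)
print(greedy)
```

On the first sample line the lazy pattern yields "(one)" and "(two)" separately, while the greedy pattern returns the single span "(one) and (two)".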
