---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
[![Travis build status](https://travis-ci.org/USCCANA/socnet.svg?branch=master)](https://travis-ci.org/USCCANA/socnet)
[![Coverage status](https://codecov.io/gh/USCCANA/socnet/branch/master/graph/badge.svg)](https://codecov.io/github/USCCANA/socnet?branch=master)
# socnet
This R package provides access to the data available on the
SOCNET website <https://lists.ufl.edu/cgi-bin/wa?A0=SOCNET>, which is hosted
by the University of Florida on its Listserv website.
## Installation
This package is currently under development and is only available as the bleeding-edge version on GitHub. You can use `devtools` to install it:
```r
devtools::install_github("USCCANA/socnet")
```
## Example
Before we start, let's load the package.
```{r}
library(socnet)
```
Suppose that you want to look at the SOCNET archives, but you don't know where to start. You can use the function `socnet_list_archives` to get a list of the archives that are available in the Listserv.
```{r example-archives}
# Getting the URLs to the archives per month
archives <- socnet_list_archives(cached = TRUE)
head(archives)
```
Now that we have the list of archives, we can pick one of them and list the subjects (emails) that appear under that archive with the `socnet_list_subjects` function.
```{r example-subjects}
# What was discussed during Oct 17?: Getting the subjects during that time
subjects <- socnet_list_subjects(archives$url[1], cached = TRUE)
```
Let's take a look at the output:
```{r}
str(subjects)
head(subjects[,-1])
```
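Once the subjects are in a data frame, ordinary base R subsetting is enough to narrow them down by keyword. The sketch below uses a small made-up data frame as a stand-in for the output of `socnet_list_subjects`; the column names `subject`, `from`, and `url` are assumed from how they are used elsewhere in this README, and the values are invented for illustration.

```r
# Toy stand-in for the output of socnet_list_subjects(); all values are made up
toy <- data.frame(
  subject = c("Call for papers", "Re: ERGM question", "ERGM question"),
  from    = c("A", "B", "C"),
  url     = c("u1", "u2", "u3"),
  stringsAsFactors = FALSE
)

# Keep the rows whose subject mentions "ergm", ignoring case
hits <- toy[grepl("ergm", toy$subject, ignore.case = TRUE), ]

# URLs of the matching subjects
hits$url
```

The resulting URLs could then be passed one by one to a function that downloads each message.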
Now we can use the function `socnet_parse_subject` to get the data of a particular subject. Let's try it with the subject titled ``r subjects$subject[1]``:
```{r example-fetch-subject}
socnet_parse_subject(subjects$url[1])
```
As you can see, the function returned a list with two elements: a vector of meta information and the actual email.
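To download several subjects in one go, it may help to guard each call so that a single failed download does not stop the loop. The following is a minimal sketch, not part of the package: `fetch_all` and `fake_fetch` are hypothetical names, and the element names `meta` and `body` are invented for the stand-in. In practice one would presumably pass `socnet_parse_subject` as the `fetch` argument.

```r
# Hypothetical helper (not part of socnet): apply `fetch` to each URL and
# return NULL for any URL whose download raises an error
fetch_all <- function(urls, fetch) {
  out <- lapply(urls, function(u) {
    tryCatch(fetch(u), error = function(e) NULL)
  })
  names(out) <- urls
  out
}

# Stand-in fetcher for illustration only; element names are made up
fake_fetch <- function(u) {
  if (u == "bad") stop("could not download")
  list(meta = c(url = u), body = "text")
}

res <- fetch_all(c("a", "bad"), fake_fetch)

# Which downloads failed?
vapply(res, is.null, logical(1))
```

Returning `NULL` on failure keeps the result aligned with the input URLs, so failed entries are easy to spot and retry.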
# Most active user (compose side)
```{r}
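rankfun <- function(x, colnames, maxn = 100) {
  # Tabulate x and sort by decreasing frequency
  x <- as.data.frame(table(x))
  x <- x[order(-x$Freq), ]
  dimnames(x) <- list(seq_len(nrow(x)), colnames)
  # head() avoids NA rows when there are fewer than maxn entries
  knitr::kable(head(x, maxn), row.names = TRUE)
}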
# Getting the from column and removing weird characters
data("subjects")
from <- subjects$from
from <- iconv(from, to="ASCII//TRANSLIT")
# Removing <[log in to unmask]> message
from <- tolower(gsub("[<].+", "", from))
# Fixing some names...
regexp <- "Th?om(as)?( W)?\\.? Valente"
from[grepl(regexp, from, ignore.case = TRUE)] <- "Thomas W. Valente"
regexp <- "Valdis( Krebs)?"
from[grepl(regexp, from, ignore.case = TRUE)] <- "Valdis Krebs"
regexp <- "Steve Borgatti|Borgatti, Steve"
from[grepl(regexp, from, ignore.case = TRUE)] <- "Steve Borgatti"
regexp <- "Snijders, T\\.A\\.B\\.|Tom A\\.B\\. Snijders|T\\.A\\.B\\.Snijders"
from[grepl(regexp, from, ignore.case = TRUE)] <- "Tom Snijders"
regexp <- "Kathleen( M\\.)? Carley"
from[grepl(regexp, from, ignore.case = TRUE)] <- "Kathleen M. Carley"
# Capitalizing the first letter
# I learned (copied) this from stackoverflow!
# https://stackoverflow.com/questions/6364783/capitalize-the-first-letter-of-both-words-in-a-two-word-string
# from <- gsub("(^|[[:space:]])([[:alpha:]])", "\\1\\U\\2", from, perl=TRUE)
from <- stringr::str_to_title(from)
# Creating the table
rankfun(from, colnames=c("User", "Count"))
```
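The repeated `regexp` assignments above could also be written as a single lookup table that maps each pattern to its canonical name. This is only a sketch of that idea: `normalize_names` is a hypothetical helper, not a function of the package, and the input vector is made up for illustration. The patterns are copied from the chunk above.

```r
# Hypothetical helper (not part of socnet): apply pattern -> canonical-name
# rules in order; a later rule overrides an earlier one if both match
normalize_names <- function(x, rules) {
  for (i in seq_along(rules)) {
    hits <- grepl(names(rules)[i], x, ignore.case = TRUE)
    x[hits] <- rules[[i]]
  }
  x
}

# Names are the regular expressions, values are the canonical spellings
rules <- c(
  "Th?om(as)?( W)?\\.? Valente"    = "Thomas W. Valente",
  "Valdis( Krebs)?"                = "Valdis Krebs",
  "Steve Borgatti|Borgatti, Steve" = "Steve Borgatti"
)

# Made-up input for illustration
senders <- c("tom valente", "Valdis", "borgatti, steve", "someone else")
normalize_names(senders, rules)
```

Keeping the rules in one named vector makes it easier to add a new spelling without copying the assignment line each time.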
# Latest version of the cache data
```{r, results='asis', echo=FALSE}
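# cat() writes the raw markdown lines; printing the vector would add [1] "..." indices
cat(readLines("inst/cache/readme.md", warn = FALSE), sep = "\n")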
```