---
title: "Analyzing Politician Instagram Accounts Using clarifai"
author: "Gaurav Sood"
date: "2015-11-10"
vignette: >
  %\VignetteIndexEntry{Analyzing Politician Instagram Accounts Using clarifai}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
## Analyzing Politician Instagram Accounts Using clarifai
```{r load_instagram, eval=FALSE}
library(instaR)
```
To use the Instagram API, go to [https://instagram.com/developer/](https://instagram.com/developer/), open the client management page, and register a new client. Give it a name and, for both the website and the redirect URL, enter `localhost:1410`. Registering gives you a client ID and a client secret. Plug these into `instaOAuth()` as follows:
```{r insta_auth, eval=FALSE}
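# Replace the placeholders with your own client ID and secret
my_oauth <- instaOAuth(app_id = "YOUR_CLIENT_ID", app_secret = "YOUR_CLIENT_SECRET")
# Save the token so later sessions can restore it with load("my_oauth")
save(my_oauth, file = "my_oauth")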
```
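Hard-coding the client ID and secret in a script exposes them to anyone who reads it. A minimal sketch, assuming you keep them in environment variables (the names `INSTAGRAM_APP_ID` and `INSTAGRAM_APP_SECRET` are hypothetical, e.g. defined in `~/.Renviron`), reads them at run time and stops if one is missing:

```r
# Hypothetical helper: fetch a required credential from an environment
# variable (e.g. one defined in ~/.Renviron) and stop if it is unset.
get_credential <- function(var) {
  value <- Sys.getenv(var, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  value
}

# Assumed usage with instaR (variable names are hypothetical):
# my_oauth <- instaOAuth(app_id = get_credential("INSTAGRAM_APP_ID"),
#                        app_secret = get_credential("INSTAGRAM_APP_SECRET"))
```

Failing early with a clear message is easier to debug than an authentication error from the API later on.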
Now it is time to load clarifai:
```{r load_clarifai, eval=FALSE}
library(clarifai)
```
The clarifai package ships with the Instagram handles of politicians in `congress.csv`. Load the file using:
```{r get_data, eval=FALSE}
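# Files in inst/extdata are installed to extdata/, so omit the inst/ prefix
filepath <- system.file("extdata", "congress.csv", package = "clarifai")
pols <- read.csv(filepath, stringsAsFactors = FALSE)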
```
Next, download each politician's posts from Instagram:
```{r download_data, eval=FALSE}
# Example for a single account: getUserMedia(pols$instagram[1], token = my_oauth)
res <- lapply(pols$instagram, function(handle) {
  # Not all politicians have Instagram accounts
  if (is.na(handle) || handle == "") return(NULL)
  # Not all accounts have public posts; skip those whose request fails
  tryCatch(getUserMedia(handle, token = my_oauth), error = function(err) NULL)
})
# Stack the per-account data frames; NULL entries are dropped
res2 <- do.call(rbind, res) # nrow = 8088 in the original run (may change for runs in the future)
```
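When the per-account results are stacked, `NULL` is a safer placeholder than `NA`: `rbind()` on data frames drops `NULL` arguments but treats a length-one `NA` as an extra row and fills every column of it with `NA`. The toy data frames below are hypothetical stand-ins for `getUserMedia()` results:

```r
# Two toy results standing in for getUserMedia() output (hypothetical data)
posts_a <- data.frame(username = "rep_a", image_url = "https://example.com/a.jpg",
                      stringsAsFactors = FALSE)
posts_b <- data.frame(username = "rep_b", image_url = "https://example.com/b.jpg",
                      stringsAsFactors = FALSE)

# NA placeholder: becomes an all-NA row in the stacked data frame
with_na <- do.call(rbind, list(posts_a, NA, posts_b))
# NULL placeholder: silently dropped
with_null <- do.call(rbind, list(posts_a, NULL, posts_b))

nrow(with_na)    # 3
nrow(with_null)  # 2
```

An all-NA row would later be passed to the tagging and merging steps as if it were a real post.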
Merge the posts with selected columns from the politician data:
```{r merge_write, eval=FALSE}
# Get pols data ready
small_pols <- pols[,c("first_name", "last_name", "party", "instagram", "dw_nominate")]
small_pols_2 <- subset(small_pols, instagram!="") # take out no username/NA
# Merge
res2[, c("first_name", "last_name", "party", "instagram", "dw_nominate")] <-
small_pols_2[match(res2$username, small_pols_2$instagram),]
# write.csv(res2, file = "res2.csv", row.names = FALSE)
```
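The merge relies on `match()`, which returns, for each post's `username`, the row of the first politician whose `instagram` handle equals it, or `NA` when there is none; indexing with that vector copies the politician's attributes onto every post. A toy illustration with hypothetical handles and parties:

```r
# Toy posts and politician table (hypothetical handles and parties)
posts <- data.frame(username = c("rep_a", "rep_b", "rep_a", "rep_c"),
                    stringsAsFactors = FALSE)
pols <- data.frame(instagram = c("rep_b", "rep_a"),
                   party = c("R", "D"),
                   stringsAsFactors = FALSE)

# Row in `pols` for each post; NA where the handle is not found
idx <- match(posts$username, pols$instagram)
idx  # 2 1 2 NA

# Copy party onto each post; unmatched posts get NA
posts$party <- pols$party[idx]
posts$party  # "D" "R" "D" NA
```

`match()` is case-sensitive, so a handle stored with different capitalization than the username returned by the API would not match; wrapping both sides in `tolower()` is one option.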
Now, get image labels from clarifai:
```{r get_clarifai_labels, eval=FALSE}
# Not implemented optimally: this sends one request per image (~8k requests).
# You can push all images at once, which is better than making 8k requests.
labs <- lapply(res2$image_url, function(url) {
  # Skip images the API fails to tag; NULL entries are dropped by rbind
  tryCatch(tag_image_urls(url), error = function(err) NULL)
})
labs_df <- do.call(rbind, labs)
```
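The code comment above suggests sending images in bulk rather than making about 8k separate requests. A minimal sketch of splitting the URLs into fixed-size batches follows. It assumes that `tag_image_urls()` accepts a vector of URLs and that a batch size of 100 is acceptable to the API; neither is confirmed here, so the API call is left commented out:

```r
# Split a vector of URLs into consecutive batches of at most `size` elements
batch_urls <- function(urls, size) {
  split(urls, ceiling(seq_along(urls) / size))
}

urls <- sprintf("https://example.com/img_%03d.jpg", 1:250)  # hypothetical URLs
batches <- batch_urls(urls, size = 100)
lengths(batches)  # batch sizes: 100, 100, 50

# Assumed usage: one request per batch instead of one per image
# labs <- lapply(batches, function(b)
#   tryCatch(tag_image_urls(b), error = function(err) NULL))
# labs_df <- do.call(rbind, labs)
```

Batching keeps a single failed request from discarding every image, while still cutting the number of round trips.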
Next, merge the post and politician data into the label data:
```{r merge_and_save, eval=FALSE}
# Merge
labs_df[,names(res2)] <- res2[match(labs_df$img_url, res2$image_url),]
# write.csv(labs_df, file = "labs_df.csv", row.names = FALSE)
# This data frame is available in the extdata folder
```
Now let us analyze the data, starting with the most popular tags (only the first 20 are shown in the output below):
```{r pop_tags, eval=FALSE}
head(sort(table(labs_df$tags), decreasing = TRUE), 40)
```
```{r out_pop, eval=FALSE}
##      people    politics       adult         men       group  government    business       women    portrait      leader
##        1592        1137        1132         999         910         795         793         773         763         670
##    clothing  politician   education      speech    election     indoors     meeting        room competition        many
##         554         472         456         435         433         426         360         352         347         345
```
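The same ranking can be computed separately for each party by splitting the tags on `party` first. A sketch on toy vectors (hypothetical data standing in for `labs_df$tags` and `labs_df$party`):

```r
# Rank tags within each party; returns a list of sorted count tables
top_tags_by_party <- function(tags, party, n = 10) {
  lapply(split(tags, party), function(x) head(sort(table(x), decreasing = TRUE), n))
}

tags  <- c("people", "people", "politics", "men",
           "people", "military", "military", "military", "people")
party <- c(rep("D", 4), rep("R", 5))

top <- top_tags_by_party(tags, party, n = 1)
top$D  # people: 2
top$R  # military: 3
```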
Do Republicans post more photos tagged "military" on Instagram than Democrats do?
```{r military, eval=FALSE}
table(grepl("military", labs_df$tags), labs_df$party)
```
```{r out_mil, eval=FALSE}
##             D     R
##   FALSE 19806 16030
##   TRUE     94    90
```
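Two caveats apply to these raw counts. Assuming, as the `tags` column suggests, that each row of `labs_df` is one tag on one image, the `FALSE` cells count other tags rather than images. Democrats also contribute more rows overall (19,900 vs. 16,120 in the table above), so similar `TRUE` counts do not imply similar rates. One way to compare like with like is the share of each party's images that carry the tag. The sketch below uses hypothetical toy data whose column names follow `labs_df`:

```r
# Toy long-format label data: one row per (image, tag); hypothetical values
labs <- data.frame(
  img_url = c("a", "a", "b", "c", "c", "d", "e"),
  tags    = c("military", "people", "people", "women", "military", "men", "people"),
  party   = c("R", "R", "R", "D", "D", "D", "D"),
  stringsAsFactors = FALSE)

# Share of each party's distinct images that carry `tag` (exact match)
tag_share <- function(labs, tag) {
  images <- unique(labs[, c("img_url", "party")])
  tagged <- unique(labs$img_url[labs$tags == tag])
  tapply(images$img_url %in% tagged, images$party, mean)
}

tag_share(labs, "military")  # D: 1/3, R: 1/2
```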
How about women?
```{r women, eval=FALSE}
table(grepl("women", labs_df$tags), labs_df$party)
```
```{r out_women, eval=FALSE}
##             D     R
##   FALSE 19458 15853
##   TRUE    442   267
```
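And men? Note that `grepl("men", ...)` matches any tag that contains "men", including "women" and "government", so the `TRUE` counts below overstate the number of images tagged "men":
```{r men, eval=FALSE}
table(grepl("men", labs_df$tags), labs_df$party)
```
```{r out_men, eval=FALSE}
##             D     R
##   FALSE 18265 14978
##   TRUE   1635  1142
```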
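Because `grepl()` matches substrings, comparing tags exactly, with `==` or `%in%`, counts only images tagged "men". A small illustration on a hypothetical tag vector:

```r
tags <- c("men", "women", "government", "people", "men")

# Substring match: also hits "women" and "government"
sum(grepl("men", tags))    # 4
# Exact match: only the tag "men"
sum(tags == "men")         # 2
# Anchored regular expression, equivalent to the exact match here
sum(grepl("^men$", tags))  # 2
```

On the real data, `table(labs_df$tags == "men", labs_df$party)` would give the exact-match counts; its output is not reproduced here.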
Protest?
```{r protest, eval=FALSE}
table(grepl("protest", labs_df$tags), labs_df$party)
```
```{r out_protest, eval=FALSE}
##             D     R
##   FALSE 19734 16024
##   TRUE    166    96
```
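The tag questions above can also be answered with a single exact-match table restricted to the tags of interest. A sketch on hypothetical toy vectors; with the real data, pass `labs_df$tags` and `labs_df$party`:

```r
# Count exact occurrences of selected tags by party
tag_party_table <- function(tags, party, of_interest) {
  keep <- tags %in% of_interest
  table(tag = factor(tags[keep], levels = of_interest), party = party[keep])
}

tags  <- c("military", "women", "men", "protest", "government", "men", "women")
party <- c("R", "D", "D", "D", "R", "R", "D")

tab <- tag_party_table(tags, party, c("military", "women", "men", "protest"))
tab
```

Using `factor(..., levels = of_interest)` keeps every requested tag as a row, even one that never occurs, so tables from different runs line up.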