This repository will soon hold an R client package for the CrowdFlower crowdsourcing platform.
To use the package, you will also need to obtain a CrowdFlower API key. This can be passed to any function using the key
argument or set globally using the CROWDFLOWER_API_KEY
environment variable. To set this from within R, simply do the following at the beginning of your R session:
Sys.setenv("CROWDFLOWER_API_KEY" = "example12345apikeystring")
The package provides two basic interfaces to Crowdflower, one that uses a familiar functional programming style and another that relies on R6 classes with reference semantics. Both are equivalent but each might be more useful for a particular application.
The standard, functional approach provided by the package uses R functions to call Crowdflower API operations to create, modify, and launch Crowdflower jobs, and retrieve the results. To initialize a job, create a set of instructions and a "CML" file, and create the job using job_create()
. Note that you do not need to import the actual HTML or XML files. Instead, simply copy the path to the files:
# load "instructions" file
f1 <- system.file("templates", "instructions1.html", package = "crowdflower")
# with custom file
# f1_custom <- "path/to/htmlfile.html"
# load "cml" file
f2 <- system.file("templates", "cml1.xml", package = "crowdflower")
# with custom file
# f1_custom <- "path/to/xmlfile.xml"
j1 <- job_create(title = "Job Title",
instructions = readChar(f1, nchars = 1e8L),
cml = readChar(f2, nchars = 1e8L))
The instructions file is an HTML document that describes a task to a Crowdflower worker. The CML (Crowdflower Markup Language) is an XML document that serves as a template for the content shown to workers, including any data fields and questions for workers to answer. A simple example CML file is included in the package:
<div class='html-element-wrapper'><div>
<p><span style='font-weight: normal;'>
Read the text below paying close attention to detail:
</span></p>
<p><strong>{{content}}</strong></p>
</div>
</div><cml:radios
label='What is the sentiment of the opinion expressed in the tweet?'
name='sentiment' aggregation='agg' validates='required' gold='true'>
<cml:radio label='Very Positive' value='5' />
<cml:radio label='Slightly Positive' value='4' />
<cml:radio label='Neutral' value='3' />
<cml:radio label='Slightly Negative' value='2' />
<cml:radio label='Very Negative' value='1' />
<cml:radio label='This post is not related to political candidates'
value='not_relevant' />
</cml:radios>
Once the job is created, you can modify its features using job_update()
and add data using job_add_data()
:
d <- data.frame(content = c("hello", "goodbye", "world")
job_add_data(j1, data = d)
From there, you can launch the job, wait for results (checking progress using job_status()
), and retrieve the results:
# launch job
job_launch(id = j1)
# check progress
job_status(id = j1)
# get results for job
report_regenerate(id = j1, report_type = "full")
report_get(id = j1, report_type = "full")
Once you no longer have a need for a job, it can be deleted using job_delete()
.
The R6 approach provides exactly the same functionality but can be particularly useful when working interactively because it uses "reference semantics" to simplify the code needed to modify a job. To work in this style, simply initialize a "Job"-class object using the Crowdflower job ID (for a job created as above, or via the Crowdflower web interface) or using a title and instructions file (as above:
# initialize using an existing job
j2 <- Job$new(j1)
# create a new job from scratch
j2 <- Job$new(title = "Job Title",
instructions = readChar(f1, nchars = 1e8L),
cml = readChar(f2, nchars = 1e8L))
Once the Job object is initialized, all the same operations as above can be performed using $
notation without needing to specify the job ID:
# modify
j2$update(title = "New Title")
# launch
j2$launch()
# pause
j2$pause()
# resume
j2$resume()
# get results
j2$get_report()
The Job class also has a set of "active binding" fields making it very easy to change job options using a familiar, list-like assignment style:
j2$title
j2$title <- "Better Title"
j2$instructions
j2$tags <- c("sentiment", "tweets", "coding")
j2$tags
The package will soon be available on CRAN and can be installed directly in R using:
install.packages("crowdflower")
To install the latest development version of crowdflower from GitHub:
# latest stable version
install.packages("crowdflower", repos = c(getOption("repos"), "http://cloudyr.github.io/drat"))
# latest (unstable) version from GitHub
if(!require("ghit")){
install.packages("ghit")
}
ghit::install_github("cloudyr/crowdflower")