Note: Please read this Confluence page, which explains the complete architecture of RDocumentation.
RPackageParser is an R package that uses the pkgdown package to parse R package documentation and pass it on to the next Lambda worker, which uploads the documentation to the RDocumentation database.
We use our own fork of pkgdown: https://github.com/datacamp/pkgdown
- Read messages from the rdoc-r-worker SQS queue. These messages contain the packages that need to be processed. The message types are documented in the /docs folder.
- Process each message into a JSON file, which we dump to S3 for logging.
- If the message is processed successfully, add the JSON to the rdoc-app-worker SQS queue (which is then handled by the rdocs app API).
- If the processing fails, add an error job to the rdoc-r-worker-deadletter queue.
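The four steps above boil down to a per-message routing decision: forward on success, dead-letter on failure. The sketch below shows only that decision, under stated assumptions: route_message() and its process, send_dest and send_deadletter arguments are hypothetical names, not RPackageParser's API, and the S3 logging step is left out. Passing the side effects in as functions keeps the logic testable without AWS credentials.

```r
# Hypothetical sketch of the per-message flow (not RPackageParser's API).
# process, send_dest and send_deadletter are injected so the routing can be
# exercised without AWS; in a real worker they would wrap the parser and SQS.
route_message <- function(body, process, send_dest, send_deadletter) {
  tryCatch({
    result <- process(body)  # parse the package into JSON-ready output
    send_dest(result)        # success: forward to the destination queue
    "dest"
  }, error = function(e) {
    # Any error (from process or send_dest) becomes an error job
    # on the dead-letter queue, carrying the original body.
    send_deadletter(list(body = body, error = conditionMessage(e)))
    "deadletter"
  })
}
```

In this sketch an error raised while forwarding to the destination queue is also dead-lettered; the real worker may treat that case differently.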
- Ensure you have devtools installed to ease local development.
- Set the environment variable GITHUB_PAT to a GitHub personal access token (remotes::install_github uses it to authenticate with GitHub).
- Install the package's dependencies:
remotes::install_github("datacamp/pkgdown", ref = "master")
install.packages("aws.sqs", repos = c(getOption("repos"), "http://cloudyr.github.io/drat"))
- Open RPackageParser.RProj in RStudio.
- Select Build > Load All; this makes all exported and unexported functions of the package available.
- To verify that it works, try the following command in your R console:
res <- process_package("https://cran.r-project.org/src/contrib/Archive/R6/R6_2.5.0.tar.gz", "R6", "cran")
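If process_package() fails, a missing prerequisite from the list above is a likely cause. The helper below is hypothetical and not part of RPackageParser; it only checks that the packages are installed and that GITHUB_PAT is non-empty, and it does not verify that the installed pkgdown is the datacamp fork.

```r
# Hypothetical helper: TRUE/FALSE per prerequisite from the setup list.
check_dev_setup <- function() {
  c(
    devtools   = requireNamespace("devtools", quietly = TRUE),
    remotes    = requireNamespace("remotes", quietly = TRUE),
    pkgdown    = requireNamespace("pkgdown", quietly = TRUE),  # fork not checked
    aws.sqs    = requireNamespace("aws.sqs", quietly = TRUE),
    GITHUB_PAT = nzchar(Sys.getenv("GITHUB_PAT"))              # set and non-empty
  )
}

setup <- check_dev_setup()
if (!all(setup)) {
  message("Missing prerequisites: ", paste(names(setup)[!setup], collapse = ", "))
}
```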
First, add a file named .env.R in the package root folder with the settings that AWS needs:
Sys.setenv(AWS_ACCESS_KEY_ID = "ACCESS_KEY_ID",
AWS_SECRET_ACCESS_KEY = "SECRET_ACCESS_KEY",
AWS_DEFAULT_REGION = "us-east-1",
DEST_QUEUE = "rdoc-app-worker",
SOURCE_QUEUE = "rdoc-r-worker",
DEADLETTER_QUEUE = "rdoc-r-worker-deadletter")
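A variable missing from .env.R tends to surface later as an opaque AWS error, so a quick check before calling main() can save time. This is a hypothetical sketch: it assumes the six variable names from the example above and that sourcing .env.R from the package root is how they get set; it does not claim to reflect how main() itself reads its configuration.

```r
# Variable names taken from the .env.R example; the helper itself is hypothetical.
required_env_vars <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
                       "AWS_DEFAULT_REGION", "DEST_QUEUE",
                       "SOURCE_QUEUE", "DEADLETTER_QUEUE")

# Returns the names of variables that are unset or empty.
missing_env_vars <- function(vars = required_env_vars) {
  vars[!nzchar(Sys.getenv(vars))]
}

# Load .env.R if present, then report anything still unset.
if (file.exists(".env.R")) source(".env.R")
unset <- missing_env_vars()
if (length(unset) > 0) {
  message("Unset environment variables: ", paste(unset, collapse = ", "))
}
```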
You need to add AWS keys that have write access to the SQS queues so that you can post messages to the queue.
You can find AWS_ACCESS_KEY_ID in the AWS Parameter Store, but AWS_SECRET_ACCESS_KEY is encrypted there, so you will need to request that value from the infra team.
After that, you can run main(); this will poll the SQS queues and do all the processing:
RPackageParser::main()
If you want to add messages to the queue for local testing, set up the AWS CLI and then run:
aws sqs send-message --queue-url https://queue.amazonaws.com/301258414863/rdoc-r-worker --message-body '{"name":"ReorderCluster","version":"1.0","path":"ftp://cran.r-project.org/pub/R/src/contrib/ReorderCluster_1.0.tar.gz"}'
Replace the message body with the package that you want to test.
Note that this is the production queue, so messages are consumed by both your local parser and the production parser, and whichever picks up a message first processes it. You may therefore need to send the message a few times before your local parser picks it up.
After you add your message to the rdoc-r-worker queue, you should briefly see it in AWS while it is being processed. Once processing is done, you should see new messages in the rdoc-app-worker queue (click the "Poll for messages" button in the AWS console).
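Editing the JSON inside the aws sqs send-message command by hand is error-prone. The hypothetical helpers below build the message body from the three fields used in the example (name, version, path) and assemble the full command. They assume the values contain no double quotes or backslashes, since no JSON escaping is done; jsonlite::toJSON with auto_unbox = TRUE would be the more robust choice.

```r
# Hypothetical helpers for local testing; no JSON escaping is performed.
make_message_body <- function(name, version, path) {
  sprintf('{"name":"%s","version":"%s","path":"%s"}', name, version, path)
}

# Assembles the CLI call; shQuote(type = "sh") single-quotes the body for a POSIX shell.
make_send_command <- function(body,
    queue_url = "https://queue.amazonaws.com/301258414863/rdoc-r-worker") {
  paste("aws sqs send-message",
        "--queue-url", queue_url,
        "--message-body", shQuote(body, type = "sh"))
}

body <- make_message_body(
  "ReorderCluster", "1.0",
  "ftp://cran.r-project.org/pub/R/src/contrib/ReorderCluster_1.0.tar.gz")
cat(make_send_command(body), sep = "\n")
# system(make_send_command(body))  # posts to the PRODUCTION queue; run deliberately
```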
If you only want to test pulling a package and generating the output that will be added to the destination queue, open this project in RStudio and run these commands in the console:
devtools::load_all(".")
library("RPackageParser")
res <- process_package("https://cran.r-project.org/src/contrib/REdaS_0.9.4.tar.gz", "REdaS", "cran")
Replace these arguments with the ones for the package you want to test.
write(jsonlite::toJSON(res$topics[[1]], auto_unbox = TRUE), file = "topic.json")
This creates a topic.json file in the root of the project containing the JSON that will be added to the queue. This is what the API processes before adding the topic to the MySQL database.
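The write() call above dumps only the first topic. To inspect every topic a package produces, a small loop is enough. write_topics() is a hypothetical helper, and it assumes res$topics is a list whose elements jsonlite::toJSON can serialize, as in the single-topic example; the serializer is a parameter so it can be swapped out.

```r
# Hypothetical helper: writes topic_1.json, topic_2.json, ... into dir.
# The default serializer matches the single-topic jsonlite call; because R
# evaluates default arguments lazily, jsonlite is only needed if it is used.
write_topics <- function(topics, dir = ".",
                         serialize = function(x) jsonlite::toJSON(x, auto_unbox = TRUE)) {
  paths <- file.path(dir, sprintf("topic_%d.json", seq_along(topics)))
  for (i in seq_along(topics)) {
    write(serialize(topics[[i]]), file = paths[[i]])
  }
  invisible(paths)  # file paths, returned invisibly
}

# Usage, after res <- process_package(...):
# write_topics(res$topics)
```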
- Commits to master are deployed to staging.
- Tags of the form vx.y.z are deployed to production.
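To check locally whether a tag name will count as a production release, a regular expression for vx.y.z is enough. This sketch assumes x, y and z are plain non-negative integers with no suffix such as -rc1; the exact pattern used by the deployment pipeline is not documented here, and is_production_tag() is a hypothetical name.

```r
# Hypothetical check: "v" followed by three dot-separated integers, nothing else.
is_production_tag <- function(tag) {
  grepl("^v[0-9]+\\.[0-9]+\\.[0-9]+$", tag)
}

is_production_tag(c("v1.2.3", "v1.2", "1.2.3", "v1.2.3-rc1"))
# [1]  TRUE FALSE FALSE FALSE
```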