This repository contains several one-off or infrequently used scripts for Dataverse-related work.
- **Github Issues to CSV** - Pull selected github issues into a CSV file
- **EZID DOI update/verify** - Update EZID target URLs for migrated datasets; verify that the DOIs point to the correct URL
- **Basic Stress Test** - Run basic browsing scenarios
## Github Issues to CSV

Use the github API to pull issues into a CSV file.

### Setup
- Requires virtualenvwrapper
  - OS X install: `sudo pip install virtualenvwrapper`
- Open a Terminal and cd into `src/github_issue_scraper`
- Make a virtualenv: `mkvirtualenv github_issue_scraper`
- Install packages (fast): `pip install -r requirements/base.txt`
- Within `src/github_issue_scraper`, copy `creds-template.json` to `creds.json` (in the same folder)
- Change the `creds.json` settings appropriately (see the notes below)
### Running the script

- Open a Terminal and cd into `src/github_issue_scraper`
- Type `workon github_issue_scraper` (and press Return)
- Set your repository, token information, output file name, and filters in `creds.json`
- Run the program from the Terminal: `python pull_issues.py`
- An output file will be written to `src/github_issue_scraper/output/[file specified in creds.json]`
### creds.json settings

Sample file:

```json
{
    "REPOSITORY_NAME" : "iqss/dataverse",
    "API_USERNAME" : "jsmith",
    "API_ACCESS_TOKEN" : "access-token-for-your-repo",
    "OUTPUT_FILE_NAME" : "github-issues.csv",
    "GITHUB_ISSUE_FILTERS" : {
        "labels" : "Component: API",
        "assignee" : "",
        "creator" : "",
        "labels_to_exclude" : "Status: QA"
    }
}
```
- `API_USERNAME` - your github username, without the `@`
- `API_ACCESS_TOKEN` - see: https://github.com/blog/1509-personal-api-tokens
- `OUTPUT_FILE_NAME` - always written to `src/github_issue_scraper/output/(file name)`
- `GITHUB_ISSUE_FILTERS`
  - Leave a filter blank to exclude it from the query. For example, the JSON below would include all `assignee` values:
    - `"assignee" : "",`
  - Comma-separate multiple `labels` and `labels_to_exclude` values. Example of issues matching 3 labels, `Component: API`, `Priority: Medium`, and `Status: Design` (spaces after the commas are stripped before the labels are attached to the api url):
    - `"labels" : "Component: API, Priority: Medium, Status: Design",`
## EZID DOI update/verify

- Location: `src/ezid_helper`

Scripts for two basic tasks:

- Update EZID target URLs for migrated datasets.
- Quality check: verify that the DOIs point to the correct URL.
### Input file

Pipe (`|`) delimited .csv file with the following data:

- Dataset id (pk from the 4.0 db table `dataset`)
- Protocol
- Authority
- Identifier

Sample rows:

```
66319|doi|10.7910/DVN|29379
66318|doi|10.7910/DVN|29117
66317|doi|10.7910/DVN|28746
66316|doi|10.7910/DVN|29559
```
The input file is the result of a query from the postgres psql shell:

- Basic query:

  ```sql
  select id, protocol, authority, identifier from dataset where protocol='doi' and authority='10.7910/DVN' order by id desc;
  ```

- Basic query written to a pipe (`|`) delimited text file:

  ```sql
  COPY (select id, protocol, authority, identifier from dataset where protocol='doi' and authority='10.7910/DVN' order by id desc)
  TO '/tmp/file-name-with-dataset-ids.csv' (format csv, delimiter '|');
  ```
### Update target URLs

(to do)
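Until the documentation above is filled in, the update task boils down to POSTing a new `_target` to the EZID API for each row of the input file. A rough sketch of that loop; the EZID account name, password, and the Dataverse target-URL pattern are placeholders, not the values the real script would use:

```python
import urllib.request

# Rough sketch; credentials and the target pattern are placeholder assumptions.
EZID_API = "https://ezid.cdlib.org/id/doi:"
TARGET = "https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:%s/%s"

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://ezid.cdlib.org/", "ezid-user", "ezid-password")
opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(password_mgr))

with open("/tmp/file-name-with-dataset-ids.csv") as infile:
    for line in infile:
        dataset_id, protocol, authority, identifier = line.strip().split("|")
        # EZID modifies an identifier via POST /id/{identifier} with ANVL pairs.
        body = ("_target: " + TARGET % (authority, identifier)).encode("utf-8")
        request = urllib.request.Request(
            EZID_API + authority + "/" + identifier, data=body, method="POST",
            headers={"Content-Type": "text/plain; charset=UTF-8"})
        print(dataset_id, opener.open(request).read().decode().strip())
```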
### Verify DOI targets

(to do)
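The quality check is simpler: resolve each DOI and compare where it actually points against where it should point. A sketch using the public doi.org handle API, with the same placeholder expected-URL pattern as above:

```python
import json
import urllib.request

# Sketch only; the expected target pattern is an assumption.
EXPECTED = "https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:%s/%s"

with open("/tmp/file-name-with-dataset-ids.csv") as infile:
    for line in infile:
        dataset_id, protocol, authority, identifier = line.strip().split("|")
        doi = authority + "/" + identifier  # e.g. 10.7910/DVN/29379
        # The doi.org handle proxy reports the DOI's current target URL(s).
        record = json.load(urllib.request.urlopen(
            "https://doi.org/api/handles/" + doi))
        targets = [v["data"]["value"] for v in record["values"] if v["type"] == "URL"]
        expected = EXPECTED % (authority, identifier)
        print(dataset_id, doi, "OK" if expected in targets else "MISMATCH", targets)
```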
## Basic Stress Test

These are basic tests using locustio.

### Setup
- Requires virtualenvwrapper
  - OS X install: `sudo pip install virtualenvwrapper`
  - Don't forget the Shell Startup File: https://virtualenvwrapper.readthedocs.org/en/latest/install.html#shell-startup-file
- Open a Terminal and cd into `src/stress_tests`
- Make a virtualenv: `mkvirtualenv stress_tests`
- Install locustio: `pip install -r requirements/base.txt`
  - This takes a couple of minutes
- Within `src/stress_tests`, copy `creds-template.json` to `creds.json` (in the same folder)
- Change the `creds.json` settings appropriately
### Running a test

- Open a Terminal and cd into `src/stress_tests`
- Type `workon stress_tests` (and press Return)
- Set your server and other information in `creds.json`
- Run a test script from the Terminal. In this example, run basic_test_02.py: `locust -f basic_test_02.py`
- Open a browser and go to: http://127.0.0.1:8089/
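For reference, a browsing-scenario locustfile looks roughly like the sketch below, written against the pre-1.0 locustio API that `requirements/base.txt` would install. The paths, task weights, and wait times are illustrative, not copied from basic_test_02.py (which presumably pulls the server details from `creds.json`):

```python
from locust import HttpLocust, TaskSet, task

class BrowseDataverse(TaskSet):
    """Basic browsing scenario: mostly homepage hits, some dataverse views."""

    @task(3)
    def homepage(self):
        self.client.get("/")

    @task(1)
    def view_root_dataverse(self):
        self.client.get("/dataverse.xhtml")  # illustrative path

class WebsiteUser(HttpLocust):
    task_set = BrowseDataverse
    min_wait = 1000  # simulated users pause 1-5 seconds between tasks
    max_wait = 5000
```

If the script doesn't set a host itself, the server to test can be supplied with locust's `--host` flag; user count and hatch rate are then set in the web UI at http://127.0.0.1:8089/.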