This repository contains several one-off or infrequently used scripts for Dataverse-related work.
- **Github Issues to CSV** - Pull selected github issues into a CSV file
- **EZID DOI update/verify** - Update EZID target URLs for migrated datasets; verify that the DOIs point to the correct URL
- **Basic Stress Test** - Run basic browsing scenarios
## Github Issues to CSV

Use the github API to pull issues into a CSV file.

### Setup
- Requires virtualenvwrapper
  - OS X install: `sudo pip install virtualenvwrapper`
- Open a Terminal and cd into `src/github_issue_scraper`
- Make a virtualenv: `mkvirtualenv github_issue_scraper`
- Install packages (fast): `pip install -r requirements/base.txt`
- Within `src/github_issue_scraper`, copy `creds-template.json` to `creds.json` (in the same folder)
- Change the `creds.json` settings appropriately (see the notes below)
### Running the script

- Open a Terminal and cd into `src/github_issue_scraper`
- Type `workon github_issue_scraper` (and press Return)
- Set your repository, token information, output file name, and filters in `creds.json`
- Run the program from the Terminal: `python pull_issues.py`
- An output file will be written to `src/github_issue_scraper/output/[file specified in creds.json]`
### creds.json settings

Sample file:

```json
{
    "REPOSITORY_NAME" : "iqss/dataverse",
    "API_USERNAME" : "jsmith",
    "API_ACCESS_TOKEN" : "access-token-for-your-repo",
    "OUTPUT_FILE_NAME" : "github-issues.csv",
    "GITHUB_ISSUE_FILTERS" : {
        "labels" : "Component: API",
        "assignee" : "",
        "creator" : "",
        "labels_to_exclude" : "Status: QA"
    }
}
```
- `API_USERNAME` - your github username, without the `@`
- `API_ACCESS_TOKEN` - see: https://github.com/blog/1509-personal-api-tokens
- `OUTPUT_FILE_NAME` - always written to `src/github_issue_scraper/output/(file name)`
- `GITHUB_ISSUE_FILTERS`
  - Leave a filter blank to exclude it from the query. For example, the JSON below would include all `assignee` values:
    - `"assignee" : "",`
  - Comma-separate multiple `labels` and `labels_to_exclude` values. Example of issues matching 3 labels, `Component: API`, `Priority: Medium`, and `Status: Design` (spaces after the commas are stripped before the labels are attached to the api url):
    - `"labels" : "Component: API, Priority: Medium, Status: Design",`
## EZID DOI update/verify

- Location: `src/ezid_helper`

Scripts for two basic tasks:

- Update EZID target URLs for migrated datasets.
- Quality check: verify that the DOIs point to the correct URL.
### Input file

Pipe (`|`) delimited .csv file with the following data:

- Dataset id (pk from the 4.0 db table `dataset`)
- Protocol
- Authority
- Identifier

Sample rows:

```
66319|doi|10.7910/DVN|29379
66318|doi|10.7910/DVN|29117
66317|doi|10.7910/DVN|28746
66316|doi|10.7910/DVN|29559
```
The input file is the result of a query from the postgres psql shell:

- Basic query:

  ```sql
  select id, protocol, authority, identifier from dataset where protocol='doi' and authority='10.7910/DVN' order by id desc;
  ```

- Basic query written to a pipe (`|`) delimited text file:

  ```sql
  COPY (select id, protocol, authority, identifier from dataset where protocol='doi' and authority='10.7910/DVN' order by id desc)
  TO '/tmp/file-name-with-dataset-ids.csv' (format csv, delimiter '|');
  ```
### Update target URLs

(to do)
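Until the documentation above is filled in, the update task boils down to POSTing a new `_target` to the EZID API for each row of the input file. A rough sketch of that loop; the EZID account name, password, and the Dataverse target-URL pattern are placeholders, not the values the real script would use:

```python
import urllib.request

# Rough sketch; credentials and the target pattern are placeholder assumptions.
EZID_API = "https://ezid.cdlib.org/id/doi:"
TARGET = "https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:%s/%s"

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://ezid.cdlib.org/", "ezid-user", "ezid-password")
opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(password_mgr))

with open("/tmp/file-name-with-dataset-ids.csv") as infile:
    for line in infile:
        dataset_id, protocol, authority, identifier = line.strip().split("|")
        # EZID modifies an identifier via POST /id/{identifier} with ANVL pairs.
        body = ("_target: " + TARGET % (authority, identifier)).encode("utf-8")
        request = urllib.request.Request(
            EZID_API + authority + "/" + identifier, data=body, method="POST",
            headers={"Content-Type": "text/plain; charset=UTF-8"})
        print(dataset_id, opener.open(request).read().decode().strip())
```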
### Verify DOI targets

(to do)
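The quality check is simpler: resolve each DOI and compare where it actually points against where it should point. A sketch using the public doi.org handle API, with the same placeholder expected-URL pattern as above:

```python
import json
import urllib.request

# Sketch only; the expected target pattern is an assumption.
EXPECTED = "https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:%s/%s"

with open("/tmp/file-name-with-dataset-ids.csv") as infile:
    for line in infile:
        dataset_id, protocol, authority, identifier = line.strip().split("|")
        doi = authority + "/" + identifier  # e.g. 10.7910/DVN/29379
        # The doi.org handle proxy reports the DOI's current target URL(s).
        record = json.load(urllib.request.urlopen(
            "https://doi.org/api/handles/" + doi))
        targets = [v["data"]["value"] for v in record["values"] if v["type"] == "URL"]
        expected = EXPECTED % (authority, identifier)
        print(dataset_id, doi, "OK" if expected in targets else "MISMATCH", targets)
```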
## Basic Stress Test

These are basic tests using locustio.

### Setup
- Requires virtualenvwrapper
  - OS X install: `sudo pip install virtualenvwrapper`
  - Don't forget the Shell Startup File: https://virtualenvwrapper.readthedocs.org/en/latest/install.html#shell-startup-file
- Open a Terminal and cd into `src/stress_tests`
- Make a virtualenv: `mkvirtualenv stress_tests`
- Install locustio: `pip install -r requirements/base.txt`
  - This takes a couple of minutes
- Within `src/stress_tests`, copy `creds-template.json` to `creds.json` (in the same folder)
- Change the `creds.json` settings appropriately
### Running a test

- Open a Terminal and cd into `src/stress_tests`
- Type `workon stress_tests` (and press Return)
- Set your server and other information in `creds.json`
- Run a test script from the Terminal. In this example, run basic_test_02.py: `locust -f basic_test_02.py`
- Open a browser and go to: http://127.0.0.1:8089/
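For reference, a browsing-scenario locustfile looks roughly like the sketch below, written against the pre-1.0 locustio API that `requirements/base.txt` would install. The paths, task weights, and wait times are illustrative, not copied from basic_test_02.py (which presumably pulls the server details from `creds.json`):

```python
from locust import HttpLocust, TaskSet, task

class BrowseDataverse(TaskSet):
    """Basic browsing scenario: mostly homepage hits, some dataverse views."""

    @task(3)
    def homepage(self):
        self.client.get("/")

    @task(1)
    def view_root_dataverse(self):
        self.client.get("/dataverse.xhtml")  # illustrative path

class WebsiteUser(HttpLocust):
    task_set = BrowseDataverse
    min_wait = 1000  # simulated users pause 1-5 seconds between tasks
    max_wait = 5000
```

If the script doesn't set a host itself, the server to test can be supplied with locust's `--host` flag; user count and hatch rate are then set in the web UI at http://127.0.0.1:8089/.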