The Metrics dashboard contains three different parts:
- The backend. This part aggregates the data.
- The frontend. This part uses the aggregated data and displays the chart.
- The IQSS website. The front end is embedded in an iframe on iq.harvard.edu/metrics
In most cases the scripts has two steps:
- Harvesting the data. Which means accessing the data source and saving the result.
- Aggregating the data. Which means preparing the data in the desired format for the front end. Data is filtered, summarized, grouped, counted etc.
Scripts are written in Python and uses Pandas as the library for data manipulation. There are some custom functions in myfunctions.py. All data is written as a TSV in the front end repository.
The following sources are used:
- Google sheets.
- Aggregated Dataverse data
- Dataverse API
- Github
- RT ticketing system
Uses a Google spreadsheet.
Updates: The department edits the number on a monthly basis.
Contact: Pat.
Uses a serie of google Forms. The data from the spreadsheets is aggregated.
Contact: Wendy
Update: continuous
For now, it uses google spreadsheet. In the future the RT-connection will be used.
Updates: Every FY
Contacts: Soner and Randy
Uses a couple of sources. The most important source is the aggregated data from all dataverses installation. This aggregation is done on https://dataversemetrics.odum.unc.edu/dataverse-metrics/ and copied.
Dataverse TV. Spreadsheet with videos about dataverse.
Github: Number of installation and also some infor about contributors
RT-API to harvest the number of requests for dataverse.
Contact: Sonia for the Support part and Danny for the rest.
Data is mainly stored in RT, but since DSS want to preparation before it is published, with receive the prepared CSVs. Contact: Steve Worthington
Updates: Yearly
Uses a google sheet with the aggregated data.
Updates: Yearly
Contact: Chase
Uses a Google sheet for now, with aggregated data. After the migration of RCE we will look at the at this again.
Contact: Len
pyenv brew install pyenv
pipenv brew install pipenv
an .env file in this directory with credentials and secret/links (or ENV variables set on command line)
Example.
$ cat .env
SHEET_URL_BUSINESS_OPERATIONS="XXSECRET"
SHEET_URL_CSS="XXSECRET"
SHEET_URL_PSR="XXSECRET"
SHEET_URL_MAIN_AND_RESEARCH_COMPUTING="XXSECRET"
SHEET_URL_DATAVERSE="XXSECRET"
SHEET_URL_CGA_LICENSE_REQUEST="XXSECRET"
SHEET_URL_CGA_TRAINING_REGISTRATION="XXSECRET"
SHEET_URL_CGA_EVENT_REGISTRATION="XXSECRET"
SHEET_URL_CGA_GIS_APPLICATION="XXSECRET"
SHEET_URL_CGA_CONTACT="XXSECRET"
SHEET_URL_CGA_WORKSHOP_EVALUATIONS="XXSECRET"
RT_USER="XXSECRET"
RT_PASSWORD="XXSECRET"
make build run
build
: Installs pre-requisite libraries and git pull iqss-metrics-dashboard and
run
: Runs the processing. Files updated in ./out which is symlink into iqss-metrics-dashboard repo
At this point look around at out or do a git diff in ./iqss-metrics-dashboard to see if it's good... This part will be automatic after Erik confirms.
clean
: Removes all above.
Use the following method
harvest_sheet_tsv_http(output_path, sheet_name, os.getenv("SHEET_URL_ENV"), "A:E", [0..n], gid=0)
All URLs are presumed secrets and should live in the .env
file above. The sheet url also defines which sheet tab
to pull from
I needed to add the spreadsheet, selecting A through D, but only returning the columns with index 0, 2, and 3, on the second tab sheet entitled "Z"
https://docs.google.com/spreadsheets/d/1F7fcSJoPzUloWMkigLlzJW6r9zGuurtKiAcQirFS8kU
If I click on "Z" tab, the window location becomes
The gid
is the id of the tab.
Therefore, to harvest this sheet, you would first add the URL to .env, and in your harvester, run the following:
harvest_sheet_tsv_http(output_path, "test_sheet", os.getenv("SHEET_URL_ENV"), "A:D", [0,2,3], gid=831621854)
Note that by default gid=0 meaning first tab on the sheet. gid is optional when a sheet has 1 tab.
To schedule the above job, you will also need to set the same environment variable in GitHub actions. Following the above example
SHEET_URL_ENV="https://docs.google.com/spreadsheets/d/1F7fcSJoPzUloWMkigLlzJW6r9zGuurtKiAcQirFS8kU"
Set this variable from the GitHub actions secret page. See setting encrypted secrets
The IQSS_DASHBOARD_SSH_DEPLOY_KEY
secret is an SSH key used to commit/push to the dashboard.
It is a deploy key
configured in the dashboard repo. You don't need this for local development as it is assumed you have
your own GH key capale of committing/pushing to both repos
The cron specification lives in .github/workflows/build.yml
See https://jasonet.co/posts/scheduled-actions/ for more info.
Follow your original workflow.
- Create a
.py
file for each harvester. - Add sheet URLs to .env
- Use the
harvest_sheet_tsv_http
function in your harvester to download andd optionally truncate the data from google sheets. - You can now test. Use
make run
. - You can (manually) deploy to the production dash by running
make deploy
. - Add the same env files to GH actions encrypted secrets store.
- Push to GH.