v0.3.0 - Ruth Wodak
This release changes many parts of the package. It adds new APIs, re-factored models and a lot of new documentation.
Overview of the most important changes:
- Re-factored data models: setters, getters, data validation and JSON export and import
- Export and import of metadata to/from pre-formatted CSV templates
- Add User Guides, Use-Cases, Contributor Guide and much more to the documentation
- Add SWORD, Search, Metrics and Data Access API
- Collect the complete data tree of a Dataverse with get_children()
- Use JSON schemas for metadata validation (jsonschema required)
- Updated Python requirements: Python>=3.6 (no Python 2 support anymore)
- Curl required, only for update_datafile()
- Transfer pyDataverse to GDCC - the Global Dataverse Community Consortium (#52)
Version 0.3.0 is named in honor of Ruth Wodak (Wikipedia), an Austrian linguist. Her work is mainly located in discourse studies, more specifically in critical discourse analysis, which looks at discourse as a form of social practice. She was awarded the Wittgenstein-Preis, the highest Austrian science award.
For help or general questions, please have a look at our Docs or email [email protected].
Use-Cases
The new functionality was developed with the following use-cases in mind. See our Documentation for more detail.
Retrieve data structure and metadata from Dataverse instance (DevOps)
Collect all Dataverses, Datasets and Datafiles of a Dataverse instance, or just a part of it. The results can then be stored in JSON files, which can be used for testing purposes, such as checking the completeness of data after a Dataverse upgrade or migration.
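As a rough illustration of this use-case, the sketch below walks a nested tree, sorts its nodes into Dataverses, Datasets and Datafiles, and writes each list to its own JSON file. The tree layout ("type" and "children" keys) and the helper names walk_tree and save_lists are assumptions made for this example. They are not the exact output of get_children() or the signatures of the pyDataverse utilities.

```python
import json
from pathlib import Path


def walk_tree(nodes):
    """Split a nested tree into flat lists of dataverses, datasets and datafiles.

    Assumes every node is a dict with a "type" key and an optional
    "children" list (a hypothetical layout chosen for this sketch).
    """
    found = {"dataverse": [], "dataset": [], "datafile": []}
    stack = list(nodes)
    while stack:
        node = stack.pop()
        # Store the node without its children so each entry stays flat.
        flat = {key: value for key, value in node.items() if key != "children"}
        found[node["type"]].append(flat)
        stack.extend(node.get("children", []))
    return found["dataverse"], found["dataset"], found["datafile"]


def save_lists(directory, dataverses, datasets, datafiles):
    """Write each list to its own JSON file and return the written paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, items in (
        ("dataverses", dataverses),
        ("datasets", datasets),
        ("datafiles", datafiles),
    ):
        path = directory / f"{name}.json"
        path.write_text(json.dumps(items, indent=2), encoding="utf-8")
        paths.append(path)
    return paths


# Hypothetical tree with one Dataverse, one Dataset and one Datafile.
example_tree = [
    {
        "type": "dataverse",
        "alias": "demo",
        "children": [
            {
                "type": "dataset",
                "pid": "doi:10.0000/EXAMPLE",
                "children": [{"type": "datafile", "filename": "data.csv"}],
            }
        ],
    }
]
```

Snapshots written this way before and after an upgrade could then be compared file by file to spot missing entries.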
Upload and removal of test data (DevOps)
For testing, you often have to upload a collection of data and metadata, which should be removed once the test is finished. pyDataverse offers easy-to-use functions for both steps.
Import data from CSV templates (Data Scientist)
Importing lots of data from data sources outside Dataverse can be done with the CSV templates as a bridge. Fill the CSV templates with your data, by machine or by hand, and import them into pyDataverse for an easy mass upload via the Dataverse API.
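The CSV-as-bridge idea might look roughly like the sketch below. It reads CSV text into a list of dicts and applies the conversions these notes mention for read_csv_to_dict(): dropping a dv. column prefix, loading JSON cells and converting boolean strings. The function name rows_from_csv, the "TRUE"/"FALSE" spelling and the rule for detecting JSON cells are assumptions for illustration only, and well-formed rows are assumed.

```python
import csv
import io
import json


def rows_from_csv(text, prefix="dv."):
    """Turn CSV text into a list of dicts ready for further processing.

    Assumptions for this sketch: column names may carry a "dv." prefix,
    cells starting with "[" or "{" hold JSON, and "TRUE"/"FALSE" mark booleans.
    """
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for column, cell in raw.items():
            # Strip the template prefix from the column name, if present.
            key = column[len(prefix):] if column.startswith(prefix) else column
            if cell in ("TRUE", "FALSE"):
                row[key] = cell == "TRUE"
            elif cell.startswith(("[", "{")):
                row[key] = json.loads(cell)
            else:
                row[key] = cell
        rows.append(row)
    return rows


# Hypothetical template content with one data row.
sample = (
    'dv.title,dv.restricted,dv.keywords\n'
    'Survey 2020,FALSE,"[""health"", ""survey""]"\n'
)
records = rows_from_csv(sample)
```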
Bugs
Features & Enhancements
API
Summary: Add other APIs next to the Native API and update the Native API.
- add Data Access API:
  - get datafile(s) (get_datafile(), get_datafiles(), get_datafile_bundle())
  - request datafile access (request_access(), allow_access_request(), grant_file_access(), list_file_access_requests())
- add Metrics API: total(), past_days(), get_dataverses_by_subject(), get_dataverses_by_category(), get_datasets_by_subject(), get_datasets_by_data_location()
- add SWORD API: get_service_document()
- add Search API: search()
- Native API:
  - Get all children data-types of a Dataverse or a Dataset in a tree structure (get_children())
  - Convert a Dataverse ID to its alias (dataverse_id2alias())
  - Get contents of a Dataverse (Datasets, Dataverses) (get_dataverse_contents())
  - Get Dataverse assignments (get_dataverse_assignments())
  - Get Dataverse facets (get_dataverse_facets())
  - Edit Dataset metadata (edit_dataset_metadata()) (#19)
  - Destroy Dataset (destroy_dataset())
  - Dataset private URL functionalities (create_dataset_private_url(), get_dataset_private_url(), delete_dataset_private_url())
  - Get Dataset version(s) (get_dataset_versions(), get_dataset_version())
  - Get Dataset assignments (get_dataset_assignments())
  - Check if Dataset is locked (get_dataset_lock())
  - Get Datafiles metadata (get_datafiles_metadata())
  - Update Datafile metadata (update_datafile_metadata())
  - Redetect Datafile file type (redetect_file_type())
  - Restrict Datafile (restrict_datafile())
  - Reingest and uningest Datafiles (reingest_datafile(), uningest_datafile())
  - Datafile upload in native Python (no curl dependency anymore) (upload_datafile())
  - Replace existing Datafile (replace_datafile())
  - Roles functionalities (get_dataverse_roles(), create_role(), show_role(), delete_role())
  - Add API token functionalities (get_user_api_token_expiration_date(), recreate_user_api_token(), delete_user_api_token())
  - Get current user data (get_user()) (#59)
  - Get API ToU (get_info_api_terms_of_use())
  - Add import of existing Dataset in create_dataset() (#3)
- Api
  - Set User-Agent for requests to pydataverse
  - Change authentication during request functions (get, post, delete, put): if an API token is passed, use it; if not, don't set it. The auth parameter is no longer used.
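The authentication rule described for the request functions can be sketched as follows: the User-Agent is always set, and the API token is attached only when one was passed. The helper name build_request_kwargs is hypothetical, and sending the token in the X-Dataverse-key header is an assumption of this sketch rather than a statement about how pyDataverse transmits it.

```python
def build_request_kwargs(api_token=None, user_agent="pydataverse"):
    """Build keyword arguments for an HTTP request.

    The User-Agent is always set. The API token is only attached when one
    was passed. Sending it in the X-Dataverse-key header is an assumption
    of this sketch, not necessarily how pyDataverse transmits it.
    """
    headers = {"User-Agent": user_agent}
    if api_token:
        headers["X-Dataverse-key"] = api_token
    return {"headers": headers}


# Placeholder token, not a real credential.
anonymous = build_request_kwargs()
authenticated = build_request_kwargs("xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx")
```

The returned dict could be unpacked into an HTTP client call, so the same code path serves both anonymous and authenticated requests.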
Models
Summary: Re-factoring of all models (Dataverse, Dataset, Datafile).
New methods:
- from_json() imports JSON (like Dataverse's own JSON format) into a pyDataverse model object
- get() returns a dict of the pyDataverse model object
- json() returns a JSON string (like Dataverse's own JSON format) of the pyDataverse model object. Mostly used for API uploads.
- validate_data() validates a pyDataverse object against a JSON schema
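To make the four methods listed above more concrete, here is a toy class with the same method names. It is a simplified stand-in, not the pyDataverse implementation: the class name MiniModel is hypothetical, the JSON is stored as-is rather than mapped to Dataverse's format, and validate_data() only checks a schema's "required" keys instead of running a full JSON schema validator.

```python
import json


class MiniModel:
    """Toy stand-in for a data model with the four methods listed above."""

    def __init__(self, data=None):
        self._data = dict(data or {})

    def from_json(self, json_str):
        """Load attributes from a JSON string."""
        self._data.update(json.loads(json_str))

    def get(self):
        """Return the attributes as a plain dict."""
        return dict(self._data)

    def json(self):
        """Return the attributes as a JSON string."""
        return json.dumps(self._data, sort_keys=True)

    def validate_data(self, schema):
        """Check required keys only; a real JSON schema validator does far more."""
        return all(key in self._data for key in schema.get("required", []))


model = MiniModel()
model.from_json('{"alias": "demo", "name": "Demo Dataverse"}')
```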
Utils
- Save list of metadata (Dataverses, Datasets or Datafiles) to a CSV file (write_dicts_as_csv()) (#11)
- Walk through the data tree from get_children() and extract Dataverses, Datasets and Datafiles (dataverse_tree_walker())
- Store the results from dataverse_tree_walker() in separate JSON files (save_tree_data())
- Validate any data model dictionary (Dataverse, Dataset, Datafile) against a JSON schema (validate_data())
- Clean strings (trim whitespace) (clean_string())
- Create URLs from identifier (create_dataverse_url(), create_dataset_url(), create_datafile_url())
- Update read_csv_to_dict(): replace dv. prefix, load JSON cells and convert boolean cell strings
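Saving a list of metadata dicts to CSV, as the first item in this list describes, could be approached along these lines. This is a minimal sketch using only the standard library. The function name dicts_to_csv and the choice to build the header from the union of all keys are assumptions, and the actual write_dicts_as_csv() may behave differently.

```python
import csv
import io


def dicts_to_csv(rows):
    """Serialise a list of dicts to CSV text, using the union of all keys as header."""
    fieldnames = []
    for row in rows:
        for key in row:
            # Keep first-seen order so the header is stable.
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Keys missing from a row are written as empty cells.
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = dicts_to_csv(
    [{"title": "A", "pid": "1"}, {"title": "B", "subject": "x"}]
)
```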
Docs
Many new pages and tutorials:
- Add User Guide - Basic
- Add User Guide - Advanced
- Add User Guide - Use-Cases
- Add Contributor Guide
- Add Installation
- Add CSV templates
- Add FAQ
- Add Resources
- Improve docstrings
- Fix typo (#40)
- Update Homepage
Tests
- Add tests for new functions
- Re-factor existing tests
- Create fixtures
- Create test data
Miscellaneous
- Add Python 3.8 support; remove Python 2.7, 3.4 and 3.5 (Python>=3.6 now required)
- Add jsonschema as requirement
- Add JSON schemas for Dataverse upload, Dataset upload, Datafile upload and DSpace to package
- Add CSV templates for Dataverses, Datasets and Datafiles from pyDataverse_templates
- Transfer pyDataverse to GDCC - the Global Dataverse Community Consortium (#52)
- Improve code formatting: black, isort, pylint, mypy, pre-commit
- Add pylint linter
- Add mypy type checker
- Add pre-commit for managing pre-commit hooks
- Add radon code metrics
- Add GitHub templates (PR, issues, commit) (#57)
- Re-structure requirements
- Get DOI:10.5281/zenodo.4470151 for GitHub repository
Other
Thanks to Daniel Melichar (@dmelichar), Vyacheslav Tykhonov (Slava), GDCC, @ecowan, @BPeuch, @j-n-c and @ambhudia for their support for this release. Special thanks to the Pandas project for their great blueprint for the Contributor Guide.
pyDataverse is supported by funding as part of the Horizon2020 project SSHOC.