
Automate publishing matminer and deepchem datasets #188

Open
@kjschmidt913

Description


Write a script that you can run on your local machine (for example, a .py script you can run in PyCharm) that automates the publication of the matminer and deepchem datasets.

The following are some general points to consider. I suggest breaking them out into your own tasks in this story.

Some steps to investigate:

  • Whether you can pull the download links to the datasets programmatically, e.g. by web scraping with BeautifulSoup or a similar Python package
  • How to write the metadata to .json files (use the json package)
  • How to read in the metadata files and data files from your local machine programmatically (I suggest keeping them all in one folder that you walk through using os or something similar)
  • How to map each metadata file to its data files
  • How to pass arguments to the script so you can run it easily from the command line (I suggest argparse); for example, one argument you might want to pass in is the path to the directory containing the data
  • The ultimate goal is to be able to just run the script and publish these datasets with as little human labor as possible. So if you can code something that reduces human work in the future, do it! :)
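For the first bullet, here is a minimal sketch of pulling download links out of a page's HTML. It uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra installs. The set of file extensions that count as "dataset files" is an assumption and would need to match whatever the matminer and deepchem pages actually link to.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical: extensions we treat as downloadable dataset files.
DATA_EXTENSIONS = (".json", ".json.gz", ".csv", ".csv.gz", ".zip", ".tar.gz")


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_download_links(html, base_url):
    """Return absolute URLs (in page order, no duplicates) that look like data files."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        # Check only the URL path, so query strings like ?dl=1 are ignored.
        if urlparse(absolute).path.lower().endswith(DATA_EXTENSIONS):
            if absolute not in links:
                links.append(absolute)
    return links
```

The page itself could be fetched with `urllib.request.urlopen(url).read().decode()`. With BeautifulSoup installed, `LinkCollector` reduces to `[a["href"] for a in BeautifulSoup(html, "html.parser").find_all("a", href=True)]`.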
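For writing the metadata, a sketch that uses only the `json` package. The helper names `write_metadata`/`read_metadata`, the `<name>.metadata.json` file naming, and the example keys (`title`, `authors`) are all hypothetical placeholders, not an existing convention.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(metadata, out_dir, name):
    """Write one dataset's metadata dict to <out_dir>/<name>.metadata.json."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.metadata.json"
    with open(path, "w", encoding="utf-8") as f:
        # indent and sort_keys keep the files readable and easy to diff
        json.dump(metadata, f, indent=2, sort_keys=True)
        f.write("\n")
    return path


def read_metadata(path):
    """Load a metadata file back into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Hypothetical example record; the keys are placeholders.
    example = {"title": "Example dataset", "authors": ["A. Author"]}
    with tempfile.TemporaryDirectory() as tmp:
        path = write_metadata(example, tmp, "example")
        print(path.name, read_metadata(path) == example)
```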
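For reading everything in from one local folder, a sketch built on `os.walk`. It assumes a hypothetical naming convention where metadata files end in `.metadata.json` and every other non-hidden file is a data file; a different convention would only change the classification check.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical naming convention: metadata files end in ".metadata.json";
# every other non-hidden file is treated as a data file.
METADATA_SUFFIX = ".metadata.json"


def collect_files(root):
    """Walk `root` recursively; return (metadata_paths, data_paths)."""
    metadata_files, data_files = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories and sort in place for a deterministic order.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for filename in sorted(filenames):
            if filename.startswith("."):
                continue  # skip hidden files such as .DS_Store
            path = Path(dirpath) / filename
            if filename.endswith(METADATA_SUFFIX):
                metadata_files.append(path)
            else:
                data_files.append(path)
    return metadata_files, data_files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("example_a.metadata.json", "example_a.csv.gz", ".DS_Store"):
            (Path(tmp) / name).write_text("")
        metadata, data = collect_files(tmp)
        print([p.name for p in metadata], [p.name for p in data])
```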
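One way to map metadata files to data files is to pair them by a shared dataset name. This sketch assumes the same hypothetical `.metadata.json` convention and that dataset names contain no dots, so `example_a.metadata.json` pairs with `example_a.csv.gz`. Unmatched files are returned rather than dropped, so they can be reported instead of silently skipped.

```python
from pathlib import Path

METADATA_SUFFIX = ".metadata.json"  # hypothetical naming convention


def dataset_key(path):
    """Dataset name a file belongs to: 'example_a' for both
    'example_a.metadata.json' and 'example_a.csv.gz'."""
    name = Path(path).name
    if name.endswith(METADATA_SUFFIX):
        return name[: -len(METADATA_SUFFIX)]
    return name.split(".", 1)[0]  # assumes no dots in dataset names


def pair_files(metadata_files, data_files):
    """Return (pairs, unmatched_metadata, unmatched_data), where pairs maps
    each metadata path to the list of data paths with the same dataset name."""
    data_by_key = {}
    for path in data_files:
        data_by_key.setdefault(dataset_key(path), []).append(Path(path))
    pairs, unmatched_metadata = {}, []
    for meta in metadata_files:
        key = dataset_key(meta)
        if key in data_by_key:
            pairs[Path(meta)] = data_by_key.pop(key)
        else:
            unmatched_metadata.append(Path(meta))
    # Whatever is left in data_by_key had no metadata file.
    unmatched_data = [p for paths in data_by_key.values() for p in paths]
    return pairs, unmatched_metadata, unmatched_data
```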
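For the command-line interface, a minimal `argparse` sketch. The argument names (`data_dir`, `--dry-run`, `--only`) are hypothetical; `data_dir` is the directory argument suggested in the list, and `--dry-run` is one way to check what the script would do before anything is actually published.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments (argv=None means use sys.argv)."""
    parser = argparse.ArgumentParser(
        description="Publish matminer and deepchem datasets from a local folder."
    )
    parser.add_argument(
        "data_dir",
        help="directory containing the data files and their metadata files",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="list what would be published without publishing anything",
    )
    parser.add_argument(
        "--only",
        nargs="*",
        default=None,
        metavar="NAME",
        help="publish only these dataset names",
    )
    return parser.parse_args(argv)
```

It could then be run as, e.g., `python publish_datasets.py ./datasets --dry-run` (the script name is a placeholder). Passing an explicit list to `parse_args` also makes the argument handling easy to test without touching `sys.argv`.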

If there does not appear to be a way to read in the title and related information from the metadata, create a new issue to add that capability to foundry and amend publish(). But first, see whether you can include it in the metadata.
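Before deciding whether foundry needs a new feature, it may help to check every metadata file for the fields that publishing would need. This sketch assumes those fields are `title`, `authors`, and `description`; that list is a placeholder and should be replaced with whatever `publish()` actually takes. Datasets that keep showing up in the report are candidates for the new issue described above.

```python
# Hypothetical list of required fields; replace with what publish() actually needs.
REQUIRED_FIELDS = ("title", "authors", "description")


def missing_fields(metadata, required=REQUIRED_FIELDS):
    """Return the required keys that are absent or empty in `metadata`."""
    return [key for key in required if not metadata.get(key)]


def check_all(metadata_by_name, required=REQUIRED_FIELDS):
    """Map dataset name -> missing fields, keeping only datasets with gaps."""
    problems = {}
    for name, metadata in metadata_by_name.items():
        missing = missing_fields(metadata, required)
        if missing:
            problems[name] = missing
    return problems
```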
