This repository contains example notebooks demonstrating the Open Datasets Python SDK, which lets you retrieve open datasets from Azure and use them to enrich your own data. The SDK lets you choose between local and cloud compute resources, while the datasets themselves are managed and maintained in the cloud.
pip install azureml-opendatasets
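Once the package is installed, loading a dataset might look like the minimal sketch below. It assumes the `NycTlcYellow` dataset class and its `to_pandas_dataframe()` method from `azureml.opendatasets`; the helper names `date_window` and `load_yellow_taxi` are hypothetical and exist only for this illustration. The SDK import is kept inside the function so the date helper can be used without the SDK installed.

```python
from datetime import datetime, timedelta


def date_window(start: str, days: int):
    """Return (start_date, end_date) datetimes spanning `days` days from `start` (YYYY-MM-DD)."""
    start_date = datetime.strptime(start, "%Y-%m-%d")
    return start_date, start_date + timedelta(days=days)


def load_yellow_taxi(start: str, days: int = 1):
    """Load NYC yellow taxi trips for the given window into a pandas DataFrame.

    Hypothetical helper: assumes azureml-opendatasets is installed and that
    the environment can reach the dataset storage.
    """
    # Imported lazily so this module can be imported without the SDK.
    from azureml.opendatasets import NycTlcYellow

    start_date, end_date = date_window(start, days)
    dataset = NycTlcYellow(start_date=start_date, end_date=end_date)
    return dataset.to_pandas_dataframe()
```

Calling `load_yellow_taxi("2018-05-01")` would then request one day of trips. Keeping the window small is a sensible default when experimenting on a local machine, since the full dataset is large.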
- To learn more about Azure Open Datasets: https://docs.microsoft.com/azure/open-datasets/
- To load open datasets into a pandas or Spark DataFrame: see the notebooks under tutorials/data-access.
- To join your own data with open datasets: see the notebooks under tutorials/data-join.
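As a rough illustration of the join idea, the sketch below enriches a small table of your own with a date-keyed open dataset using plain pandas. It is not the approach taken in the tutorials/data-join notebooks; the function `join_on_date` and the column names in the example are assumptions made for this illustration.

```python
import pandas as pd


def join_on_date(own: pd.DataFrame, open_data: pd.DataFrame,
                 own_time_col: str, open_time_col: str) -> pd.DataFrame:
    """Left-join `open_data` onto `own` by calendar day.

    Hypothetical helper: both time columns are parsed as datetimes and
    truncated to midnight, so rows match when they fall on the same day.
    """
    left = own.copy()
    right = open_data.copy()
    left["join_date"] = pd.to_datetime(left[own_time_col]).dt.normalize()
    right["join_date"] = pd.to_datetime(right[open_time_col]).dt.normalize()
    # Drop the open dataset's original time column; join_date replaces it.
    right = right.drop(columns=[open_time_col])
    return left.merge(right, on="join_date", how="left")


# Example with made-up data standing in for your own records and an
# already-loaded public holidays table.
sales = pd.DataFrame({
    "sold_at": ["2018-01-01 09:30", "2018-01-02 14:00"],
    "amount": [10, 20],
})
holidays = pd.DataFrame({
    "date": ["2018-01-01"],
    "holidayName": ["New Year's Day"],
})
enriched = join_on_date(sales, holidays, "sold_at", "date")
```

A left join keeps every row of your own data and leaves the added columns empty where the open dataset has no matching day, which is usually what you want when enriching.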
- For the pandas version, use either your own Azure Notebooks library or your own Jupyter server: upload the notebook there and run it.
- For the Spark version, create an Azure Databricks workspace in your Azure subscription, upload the notebook there, and click 'Run'. Alternatively, set up your own Spark cluster and run the notebook there.
Detailed API references are available here.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.