Documentation: https://dyvenia.github.io/viadot/
Source Code: https://github.com/dyvenia/viadot
A simple data ingestion library to guide data flows from some places to other places.
Viadot supports several API and RDBMS sources, private and public. Currently, we support the UK Carbon Intensity public API and base the examples on it.
```python
from viadot.sources.uk_carbon_intensity import UKCarbonIntensity

ukci = UKCarbonIntensity()
ukci.query("/intensity")
df = ukci.to_df()

print(df)
```
Output:

|   | from | to | forecast | actual | index |
|---|------|----|----------|--------|-------|
| 0 | 2021-08-10T11:00Z | 2021-08-10T11:30Z | 211 | 216 | moderate |
The above `df` is a pandas `DataFrame` object. It contains data downloaded by `viadot` from the Carbon Intensity UK API.

Depending on the source, `viadot` provides different methods of uploading data. For instance, for SQL sources, this would be bulk inserts. For data lake sources, it would be a file upload. For ready-made pipelines, including data validation steps using `dbt`, see prefect-viadot.
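To illustrate the bulk-insert idea in general terms (this is not viadot's own API), the sketch below loads a DataFrame shaped like the output above into a SQLite table using pandas; the table name and the in-memory connection are assumptions made for the example:

```python
import sqlite3

import pandas as pd

# Sample data shaped like the carbon-intensity output above.
df = pd.DataFrame(
    {
        "from": ["2021-08-10T11:00Z"],
        "to": ["2021-08-10T11:30Z"],
        "forecast": [211],
        "actual": [216],
        "index": ["moderate"],
    }
)

# Insert all rows in a single call; pandas issues the INSERTs in batches.
with sqlite3.connect(":memory:") as conn:
    df.to_sql("carbon_intensity", conn, index=False)
    rows = conn.execute("SELECT COUNT(*) FROM carbon_intensity").fetchone()[0]

print(rows)  # number of rows inserted
```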
We assume that you have Docker installed.
Clone the `2.0` branch, and set up and run the environment:

```bash
git clone https://github.com/dyvenia/viadot.git -b 2.0 && \
  cd viadot/docker && \
  sh update.sh && \
  sh run.sh && \
  cd ../
```
In order to start using sources, you must configure them with the required credentials. Credentials can be specified either in viadot's config file (by default, `$HOME/.config/viadot/config.yaml`) or passed directly to each source's `credentials` parameter.
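A config file sketch is shown below. Note that this is only an illustration: the source name, class, and credential keys here are assumptions, as each source defines its own required keys (see the documentation for the exact schema):

```yaml
# $HOME/.config/viadot/config.yaml (sketch; keys are source-specific)
sources:
  - my_sql_source:
      class: SQLServer
      credentials:
        user: my_user
        password: my_password
```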
You can find specific information about each source's credentials in the documentation.