From 76426c6bb30aec354e0778077b901237362732aa Mon Sep 17 00:00:00 2001
From: 23rdPro
Date: Mon, 23 Sep 2024 13:42:25 +0100
Subject: [PATCH] Add demo Colab notebook using magic CLI commands for setup and execution

---
 docs/reference/colab-demo.ipynb | 263 ++++++++++++++++++++++++++++++++
 1 file changed, 263 insertions(+)
 create mode 100644 docs/reference/colab-demo.ipynb

diff --git a/docs/reference/colab-demo.ipynb b/docs/reference/colab-demo.ipynb
new file mode 100644
index 0000000000..04404a93ec
--- /dev/null
+++ b/docs/reference/colab-demo.ipynb
@@ -0,0 +1,263 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "8065978e",
+   "metadata": {},
+   "source": [
+    "# [Demo]: Google Colab Notebook 🚀\n",
+    "\n",
+    "This notebook demonstrates how to manage and deploy data pipelines with the dlt (data load tool) CLI directly from an IPython notebook via magic commands. We’ll recreate the pipeline setup from [Colab Demo](https://colab.research.google.com/drive/1NfSB1DpwbbHX9_t5vlalBTf13utwpMGx?usp=sharing#scrollTo=GYREioraz1m6) using `%pipeline`, `%init`, `%schema`, and other magic commands.\n",
+    "\n",
+    "### **What you'll learn:**\n",
+    "\n",
+    "1. How to list, sync, and manage pipelines using `%pipeline`.\n",
+    "2. How to initialize a new dlt pipeline with `%init`.\n",
+    "3. How to manage schemas using `%schema`.\n",
+    "4. How to check the dlt version with `%dlt_version`.\n",
+    "\n",
+    "Let's dive in!\n",
+    "\n",
+    "# 1. **Set Up the Environment**\n",
+    "\n",
+    "First, install dlt if it is not already installed."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "33e54cbc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Install the dlt package with duckdb dependency\n",
+    "!pip install \"dlt[duckdb]\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3bc07810",
+   "metadata": {},
+   "source": [
+    "# 2. **Initialize a Pipeline**\n",
+    "\n",
+    "You can initialize a new dlt pipeline by specifying the source and destination. This generates the scripts needed for data loading.\n",
+    "\n",
+    "### Initialize a New Pipeline\n",
+    "\n",
+    "In this example, we’ll initialize a pipeline from a `pokemon` source to a `duckdb` destination."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "057d4613",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Initialize a pipeline with a source and destination\n",
+    "%init --source_name pokemon --destination_name duckdb"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d31dde78",
+   "metadata": {},
+   "source": [
+    "# 3. **Sync a Pipeline**\n",
+    "\n",
+    "After initializing a pipeline, you can run the `sync` operation. In the dlt CLI, `sync` drops the pipeline's local state and restores it from the destination.\n",
+    "\n",
+    "### Sync the Pipeline\n",
+    "\n",
+    "Use the `sync` operation to resynchronize the local pipeline state with the destination."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "de982c19",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pipeline --operation sync --pipeline_name pokemon_duckdb"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b765e8eb",
+   "metadata": {},
+   "source": [
+    "# 4. **Manage Pipelines**\n",
+    "\n",
+    "Use the `%pipeline` magic command to list and inspect pipelines.\n",
+    "\n",
+    "### **List Available Pipelines**\n",
+    "\n",
+    "To see all available pipelines, run the `list-pipelines` operation:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8e8c5c51",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Magic command to list pipelines\n",
+    "%pipeline --operation list-pipelines"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8ca56adc",
+   "metadata": {},
+   "source": [
+    "### **Pipeline Information**\n",
+    "\n",
+    "To get detailed information on a specific pipeline, use the `info` operation and specify the pipeline name."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4d3b9611",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pipeline --operation info --pipeline_name pokemon"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7a6262bb",
+   "metadata": {},
+   "source": [
+    "# 5. **Manage Schemas**\n",
+    "\n",
+    "You can inspect, convert, or upgrade the schema used in the pipeline by specifying a schema file path.\n",
+    "\n",
+    "### Manage Schema\n",
+    "\n",
+    "To show the schema in JSON format:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4a681652",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Pass the actual schema file path after --file_path\n",
+    "%schema --file_path --format json"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6c0e564a",
+   "metadata": {},
+   "source": [
+    "# 6. **Check dlt Version**\n",
+    "\n",
+    "It's good practice to check which version of dlt is in use.\n",
+    "\n",
+    "### Check dlt Version\n",
+    "\n",
+    "Ensure that you’re using the latest version of dlt."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c03a47e2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Check dlt version\n",
+    "%dlt_version"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6674c53b",
+   "metadata": {},
+   "source": [
+    "# 7. **Enable/Disable Telemetry**\n",
+    "\n",
+    "Control telemetry settings for your dlt operations.\n",
+    "\n",
+    "### Manage Telemetry\n",
+    "\n",
+    "You can enable or disable telemetry globally."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b6ea6a2a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Enable telemetry\n",
+    "%settings --enable-telemetry\n",
+    "\n",
+    "# Disable telemetry\n",
+    "%settings --disable-telemetry"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a3ac8da7",
+   "metadata": {},
+   "source": [
+    "# 8. **Additional Operations**\n",
+    "\n",
+    "You can explore other dlt pipeline operations such as `trace`, `failed-jobs`, and `drop-pending-packages`.\n",
+    "\n",
+    "### Explore More Pipeline Operations\n",
+    "\n",
+    "Try these additional operations for pipeline management."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8f8ea828",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Trace pipeline execution\n",
+    "%pipeline --operation trace --pipeline_name pokemon\n",
+    "\n",
+    "# Check for failed jobs\n",
+    "%pipeline --operation failed-jobs --pipeline_name pokemon\n",
+    "\n",
+    "# Drop pending packages\n",
+    "%pipeline --operation drop-pending-packages --pipeline_name pokemon"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fff4dbee",
+   "metadata": {},
+   "source": [
+    "## 🎉 **Finish!** 🎉\n",
+    "\n",
+    "By using the magic commands `%pipeline`, `%init`, `%schema`, and others, we've streamlined dlt pipeline management within a Colab notebook."
+   ]
+  }
+ ],
+ "metadata": {
+  "jupytext": {
+   "cell_metadata_filter": "-all",
+   "main_language": "python",
+   "notebook_metadata_filter": "-all"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}