diff --git a/Makefile b/Makefile
index 4a786ed528..faf970708e 100644
--- a/Makefile
+++ b/Makefile
@@ -65,6 +65,7 @@ lint-and-test-snippets:
 	poetry run mypy --config-file mypy.ini docs/website docs/examples docs/tools --exclude docs/tools/lint_setup --exclude docs/website/docs_processed
 	poetry run flake8 --max-line-length=200 docs/website docs/examples docs/tools
 	cd docs/website/docs && poetry run pytest --ignore=node_modules
+	modal run docs/website/docs/walkthroughs/deploy_a_pipeline/deploy-with-modal-snippets.py

 lint-and-test-examples:
 	cd docs/tools && poetry run python prepare_examples_tests.py
@@ -72,7 +73,6 @@ lint-and-test-examples:
 	poetry run mypy --config-file mypy.ini docs/examples
 	cd docs/examples && poetry run pytest

-
 test-examples:
 	cd docs/examples && poetry run pytest

@@ -107,7 +107,7 @@ test-build-images: build-library
 	docker build -f deploy/dlt/Dockerfile.airflow --build-arg=COMMIT_SHA="$(shell git log -1 --pretty=%h)" --build-arg=IMAGE_VERSION="$(shell poetry version -s)" .
 	# docker build -f deploy/dlt/Dockerfile --build-arg=COMMIT_SHA="$(shell git log -1 --pretty=%h)" --build-arg=IMAGE_VERSION="$(shell poetry version -s)" .

-preprocess-docs:
+preprocess-docs: # run docs preprocessing to run a few checks and ensure examples can be parsed
 	cd docs/website && npm i && npm run preprocess-docs

diff --git a/deploy/dlt/README.md b/deploy/dlt/README.md
index 010a2d0f67..88b8f71a79 100644
--- a/deploy/dlt/README.md
+++ b/deploy/dlt/README.md
@@ -1 +1 @@
-Example `Dockerfile` that installs `dlt` package on an alpine linux image. For actual pipeline deployment please refer to [deploy a pipeline walkthrough](https://dlthub.com/docs/walkthroughs/deploy-a-pipeline/deploy-with-github-actions)
+Example `Dockerfile` that installs `dlt` package on an alpine linux image. For actual pipeline deployment please refer to [deploy a pipeline walkthrough](https://dlthub.com/docs/walkthroughs/deploy_a_pipeline/deploy-with-github-actions)

diff --git a/dlt/cli/deploy_command.py b/dlt/cli/deploy_command.py
index 88c132f5e2..36895c923c 100644
--- a/dlt/cli/deploy_command.py
+++ b/dlt/cli/deploy_command.py
@@ -26,9 +26,9 @@ from dlt.common.destination.reference import Destination

 REQUIREMENTS_GITHUB_ACTION = "requirements_github_action.txt"
-DLT_DEPLOY_DOCS_URL = "https://dlthub.com/docs/walkthroughs/deploy-a-pipeline"
+DLT_DEPLOY_DOCS_URL = "https://dlthub.com/docs/walkthroughs/deploy_a_pipeline"
 DLT_AIRFLOW_GCP_DOCS_URL = (
-    "https://dlthub.com/docs/walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer"
+    "https://dlthub.com/docs/walkthroughs/deploy_a_pipeline/deploy-with-airflow-composer"
 )
 AIRFLOW_GETTING_STARTED = "https://airflow.apache.org/docs/apache-airflow/stable/start.html"
 AIRFLOW_DAG_TEMPLATE_SCRIPT = "dag_template.py"

diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/usage.md b/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/usage.md
index bdc440630d..d216fbd1f4 100644
--- a/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/usage.md
+++ b/docs/website/docs/dlt-ecosystem/verified-sources/sql_database/usage.md
@@ -98,11 +98,11 @@ Examples:

 ## Deploying the sql_database pipeline

-You can deploy the `sql_database` pipeline with any of the `dlt` deployment methods, such as [GitHub Actions](../../../walkthroughs/deploy-a-pipeline/deploy-with-github-actions), [Airflow](../../../walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer), [Dagster](../../../walkthroughs/deploy-a-pipeline/deploy-with-dagster), etc. See [here](../../../walkthroughs/deploy-a-pipeline) for a full list of deployment methods.
+You can deploy the `sql_database` pipeline with any of the `dlt` deployment methods, such as [GitHub Actions](../../../walkthroughs/deploy_a_pipeline/deploy-with-github-actions), [Airflow](../../../walkthroughs/deploy_a_pipeline/deploy-with-airflow-composer), [Dagster](../../../walkthroughs/deploy_a_pipeline/deploy-with-dagster), etc. See [here](../../../walkthroughs/deploy_a_pipeline) for a full list of deployment methods.

 ### Running on Airflow
 When running on Airflow:
-1. Use the `dlt` [Airflow Helper](../../../walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer.md#2-modify-dag-file) to create tasks from the `sql_database` source. (If you want to run table extraction in parallel, you can do this by setting `decompose = "parallel-isolated"` when doing the source->DAG conversion. See [here](../../../walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer#2-modify-dag-file) for a code example.)
+1. Use the `dlt` [Airflow Helper](../../../walkthroughs/deploy_a_pipeline/deploy-with-airflow-composer.md#2-modify-dag-file) to create tasks from the `sql_database` source. (If you want to run table extraction in parallel, you can do this by setting `decompose = "parallel-isolated"` when doing the source->DAG conversion. See [here](../../../walkthroughs/deploy_a_pipeline/deploy-with-airflow-composer#2-modify-dag-file) for a code example.)
 2. Reflect tables at runtime with the `defer_table_reflect` argument.
 3. Set `allow_external_schedulers` to load data using [Airflow intervals](../../../general-usage/incremental-loading.md#using-airflow-schedule-for-backfill-and-incremental-loading).

diff --git a/docs/website/docs/general-usage/incremental-loading.md b/docs/website/docs/general-usage/incremental-loading.md
index c8f92cf154..598b75be59 100644
--- a/docs/website/docs/general-usage/incremental-loading.md
+++ b/docs/website/docs/general-usage/incremental-loading.md
@@ -854,7 +854,7 @@ Here we call **get_resource(endpoint)** and that creates an un-evaluated generat

 ### Using Airflow schedule for backfill and incremental loading

-When [running an Airflow task](../walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer.md#2-modify-dag-file), you can opt-in your resource to get the `initial_value`/`start_value` and `end_value` from the Airflow schedule associated with your DAG. Let's assume that the **Zendesk tickets** resource contains a year of data with thousands of tickets. We want to backfill the last year of data week by week and then continue with incremental loading daily.
+When [running an Airflow task](../walkthroughs/deploy_a_pipeline/deploy-with-airflow-composer.md#2-modify-dag-file), you can opt-in your resource to get the `initial_value`/`start_value` and `end_value` from the Airflow schedule associated with your DAG. Let's assume that the **Zendesk tickets** resource contains a year of data with thousands of tickets. We want to backfill the last year of data week by week and then continue with incremental loading daily.

 ```py
 @dlt.resource(primary_key="id")

diff --git a/docs/website/docs/general-usage/source.md b/docs/website/docs/general-usage/source.md
index f91eca58de..5cc342f28e 100644
--- a/docs/website/docs/general-usage/source.md
+++ b/docs/website/docs/general-usage/source.md
@@ -50,7 +50,7 @@ used when loading the source.

 Do not extract data in the source function. Leave that task to your resources if possible.
 The source function is executed immediately when called (contrary to resources which delay execution - like Python generators). There are several benefits (error handling, execution metrics, parallelization) you get when you extract data in `pipeline.run` or `pipeline.extract`.
-If this is impractical (for example, you want to reflect a database to create resources for tables), make sure you do not call the source function too often. [See this note if you plan to deploy on Airflow](../walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer.md#2-modify-dag-file)
+If this is impractical (for example, you want to reflect a database to create resources for tables), make sure you do not call the source function too often. [See this note if you plan to deploy on Airflow](../walkthroughs/deploy_a_pipeline/deploy-with-airflow-composer.md#2-modify-dag-file)

 ## Customize sources

diff --git a/docs/website/docs/intro.md b/docs/website/docs/intro.md
index 650c47920b..a7200f49d4 100644
--- a/docs/website/docs/intro.md
+++ b/docs/website/docs/intro.md
@@ -18,7 +18,7 @@ dlt is designed to be easy to use, flexible, and scalable:

 - dlt infers [schemas](./general-usage/schema) and [data types](./general-usage/schema/#data-types), [normalizes the data](./general-usage/schema/#data-normalizer), and handles nested data structures.
 - dlt supports a variety of [popular destinations](./dlt-ecosystem/destinations/) and has an interface to add [custom destinations](./dlt-ecosystem/destinations/destination) to create reverse ETL pipelines.
-- dlt can be deployed anywhere Python runs, be it on [Airflow](./walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer), [serverless functions](./walkthroughs/deploy-a-pipeline/deploy-with-google-cloud-functions), or any other cloud deployment of your choice.
+- dlt can be deployed anywhere Python runs, be it on [Airflow](./walkthroughs/deploy_a_pipeline/deploy-with-airflow-composer), [serverless functions](./walkthroughs/deploy_a_pipeline/deploy-with-google-cloud-functions), or any other cloud deployment of your choice.
 - dlt automates pipeline maintenance with [schema evolution](./general-usage/schema-evolution) and [schema and data contracts](./general-usage/schema-contracts).

 To get started with dlt, install the library using pip:

diff --git a/docs/website/docs/reference/command-line-interface.md b/docs/website/docs/reference/command-line-interface.md
index e29b43bcba..6c8c12670e 100644
--- a/docs/website/docs/reference/command-line-interface.md
+++ b/docs/website/docs/reference/command-line-interface.md
@@ -56,7 +56,7 @@ schedule into quotation marks as in the example above.

 For the chess.com API example above, you could deploy it with `dlt deploy chess.py github-action --schedule "*/30 * * * *"`.

-Follow the guide on [how to deploy a pipeline with GitHub Actions](../walkthroughs/deploy-a-pipeline/deploy-with-github-actions) to learn more.
+Follow the guide on [how to deploy a pipeline with GitHub Actions](../walkthroughs/deploy_a_pipeline/deploy-with-github-actions) to learn more.

 ### `airflow-composer`

@@ -66,7 +66,7 @@ dlt deploy