Merge pull request #123 from dbt-labs/feature/allow-glob-file-paths
Add ability to read from multiple files
b-per authored Feb 5, 2025
2 parents fb9ad4a + 4073b4a commit d588486
Showing 13 changed files with 670 additions and 317 deletions.
18 changes: 9 additions & 9 deletions README.md
@@ -48,7 +48,7 @@ The CLI comes with a few different commands

#### `validate`

Command: `dbt-jobs-as-code validate <config_file.yml>`
Command: `dbt-jobs-as-code validate <config_file_or_pattern.yml>`

Validates that the YAML file has the correct structure

@@ -58,48 +58,48 @@ Validates that the YAML file has the correct structure

#### `plan`

Command: `dbt-jobs-as-code plan <config_file.yml>`
Command: `dbt-jobs-as-code plan <config_file_or_pattern.yml>`

Returns the list of create/update/delete actions required to make dbt Cloud reflect the configuration file

- this command doesn't modify the dbt Cloud jobs
- this command can be restricted to specific projects and environments
- it accepts a list of project IDs or environment IDs to limit the scope of the command: `dbt-jobs-as-code plan <config_file.yml> -p 1234 -p 2345 -e 4567 -e 5678`
- it accepts a list of project IDs or environment IDs to limit the scope of the command: `dbt-jobs-as-code plan <config_file_or_pattern.yml> -p 1234 -p 2345 -e 4567 -e 5678`
- it is possible to limit the scope to specific projects and/or specific environments
- when both projects and environments are provided, the command runs only for the jobs that belong to both the provided environment ID(s) and project ID(s)
- alternatively, it accepts the flag `--limit-projects-envs-to-yml` to only consider jobs that are in the projects and environments listed in the jobs YAML file
- it supports templating the jobs YAML file (see [templating](#templating-jobs-yaml-file))

#### `sync`

Command: `dbt-jobs-as-code sync <config_file.yml>`
Command: `dbt-jobs-as-code sync <config_file_or_pattern.yml>`

Creates/updates/deletes jobs and env var overwrites in jobs to align dbt Cloud with the configuration file

- ⚠️ this command will modify your dbt Cloud jobs if the current configuration is different from the YAML file
- this command can be restricted to specific projects and environments
- it accepts a list of project IDs or environment IDs to limit the scope of the command: `dbt-jobs-as-code sync <config_file.yml> -p 1234 -p 2345 -e 4567 -e 5678`
- it accepts a list of project IDs or environment IDs to limit the scope of the command: `dbt-jobs-as-code sync <config_file_or_pattern.yml> -p 1234 -p 2345 -e 4567 -e 5678`
- it is possible to limit the scope to specific projects and/or specific environments
- when both projects and environments are provided, the command runs only for the jobs that belong to both the provided environment ID(s) and project ID(s)
- alternatively, it accepts the flag `--limit-projects-envs-to-yml` to only consider jobs that are in the projects and environments listed in the jobs YAML file
- it supports templating the jobs YAML file (see [templating](#templating-jobs-yaml-file))

#### `import-jobs`

Command: `dbt-jobs-as-code import-jobs --config <config_file.yml>` or `dbt-jobs-as-code import-jobs --account-id <account-id>`
Command: `dbt-jobs-as-code import-jobs --config <config_file_or_pattern.yml>` or `dbt-jobs-as-code import-jobs --account-id <account-id>`

Queries dbt Cloud and provides the YAML definition for those jobs, including the job-level env var overwrites if some have been defined

- it is possible to restrict the list of dbt Cloud Job IDs by adding `... -j 101 -j 123 -j 234`
- this command also accepts a list of project IDs or environment IDs to limit the scope of the command: `dbt-jobs-as-code import-jobs --config <config_file.yml> -p 1234 -p 2345 -e 4567 -e 5678`
- this command also accepts a list of project IDs or environment IDs to limit the scope of the command: `dbt-jobs-as-code import-jobs --config <config_file_or_pattern.yml> -p 1234 -p 2345 -e 4567 -e 5678`
- this command accepts a `--include-linked-id` parameter to allow linking the jobs in the YAML to existing jobs in dbt Cloud, by renaming those
- once the YAML has been retrieved, it is possible to copy/paste it into a local YAML file to create/update the local jobs definition.

Once the configuration is imported, it is possible to "link" existing jobs by using the `link` command explained below.

#### `link`

Command: `dbt-jobs-as-code link <config_file.yml>`
Command: `dbt-jobs-as-code link <config_file_or_pattern.yml>`

Links dbt Cloud jobs with the corresponding identifier from the YAML file by renaming the jobs, adding the `[[ ... ]]` part to the job name.

@@ -110,7 +110,7 @@ Accepts a `--dry-run` flag to see what jobs would be changed, without actually c

#### `unlink`

Command: `dbt-jobs-as-code unlink --config <config_file.yml>` or `dbt-jobs-as-code unlink --account-id <account-id>`
Command: `dbt-jobs-as-code unlink --config <config_file_or_pattern.yml>` or `dbt-jobs-as-code unlink --account-id <account-id>`

Unlinking jobs removes the `[[ ... ]]` part of the job name in dbt Cloud.

2 changes: 2 additions & 0 deletions example_cicd/prod_plan_on_pr.yml
@@ -25,6 +25,8 @@ jobs:

      - name: Run dbt-jobs-as-code
        run: dbt-jobs-as-code plan jobs/jobs.yml --vars-yml jobs/vars_prod.yml --limit-projects-envs-to-yml
        # or using a file pattern
        # run: dbt-jobs-as-code plan jobs/**.yml --vars-yml jobs/*_prod.yml --limit-projects-envs-to-yml
        env:
          DBT_API_KEY: "${{secrets.DBT_API_KEY}}"
          # DBT_BASE_URL is optional
23 changes: 18 additions & 5 deletions src/dbt_jobs_as_code/cloud_yaml_mapping/change_set.py
@@ -1,15 +1,16 @@
import glob
import os
import string
from collections import Counter

from beartype import BeartypeConf, BeartypeStrategy, beartype
from beartype.typing import Callable, List
from loguru import logger
from pydantic import BaseModel, RootModel
from pydantic import BaseModel
from rich.table import Table

from dbt_jobs_as_code.client import DBTCloud, DBTCloudException
from dbt_jobs_as_code.loader.load import load_job_configuration
from dbt_jobs_as_code.loader.load import LoadingJobsYAMLError, load_job_configuration
from dbt_jobs_as_code.schemas import check_env_var_same, check_job_mapping_same
from dbt_jobs_as_code.schemas.job import JobDefinition

@@ -137,8 +138,8 @@ def _check_single_account_id(defined_jobs: List[JobDefinition]):


def build_change_set(
    config,
    yml_vars,
    config: str,
    yml_vars: str,
    disable_ssl_verification: bool,
    project_ids: List[int],
    environment_ids: List[int],
@@ -149,7 +150,19 @@
    CONFIG is the path to your jobs.yml config file.
    """
    configuration = load_job_configuration(config, yml_vars)
    # Get list of files matching the glob pattern
    config_files = glob.glob(config)
    if not config_files:
        logger.error(f"No files found matching pattern: {config}")
        return ChangeSet()

    yml_vars_files = glob.glob(yml_vars) if yml_vars else None

    try:
        configuration = load_job_configuration(config_files, yml_vars_files)
    except (LoadingJobsYAMLError, KeyError) as e:
        logger.error(f"Error loading jobs YAML file ({type(e).__name__}): {e}")
        exit(1)

    if limit_projects_envs_to_yml:
        # if limit_projects_envs_to_yml is True, we keep all the YML jobs
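
For reference, a minimal sketch (not part of the commit) of the glob resolution that `build_change_set` now performs before loading the configuration; the `jobs/*.yml` and `jobs/vars_*.yml` patterns are illustrative:

```python
import glob
import sys

from loguru import logger

# Illustrative patterns; any pattern accepted by Python's glob module should work.
config_pattern = "jobs/*.yml"
vars_pattern = "jobs/vars_*.yml"

config_files = glob.glob(config_pattern)
if not config_files:
    logger.error(f"No files found matching pattern: {config_pattern}")
    sys.exit(1)

# vars files stay optional: None disables templating, mirroring the logic above
vars_files = glob.glob(vars_pattern) or None
```
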
8 changes: 4 additions & 4 deletions src/dbt_jobs_as_code/importer/__init__.py
@@ -1,17 +1,17 @@
from beartype.typing import List, Optional, TextIO
from beartype.typing import List, Optional
from loguru import logger

from dbt_jobs_as_code.client import DBTCloud
from dbt_jobs_as_code.loader.load import load_job_configuration
from dbt_jobs_as_code.schemas.job import JobDefinition


def get_account_id(config_file: Optional[TextIO], account_id: Optional[int]) -> int:
def get_account_id(config_files: Optional[List[str]], account_id: Optional[int]) -> int:
"""Get account ID from either config file or direct input"""
if account_id:
return account_id
elif config_file:
defined_jobs = load_job_configuration(config_file, None).jobs.values()
elif config_files:
defined_jobs = load_job_configuration(config_files, None).jobs.values()
return list(defined_jobs)[0].account_id
else:
raise ValueError("Either config or account_id must be provided")
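
A possible usage sketch (not part of the commit) for the updated `get_account_id`; it assumes the package is installed, and the account ID and `jobs/jobs.yml` path are placeholders:

```python
from dbt_jobs_as_code.importer import get_account_id

# An explicit account ID takes precedence over the config files.
print(get_account_id(config_files=None, account_id=43791))  # placeholder ID

# Otherwise the account ID is read from the first job defined in the YAML files
# (requires jobs/jobs.yml to exist locally and define at least one job).
print(get_account_id(config_files=["jobs/jobs.yml"], account_id=None))
```
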
134 changes: 107 additions & 27 deletions src/dbt_jobs_as_code/loader/load.py
@@ -1,5 +1,7 @@
import glob

import yaml
from beartype.typing import Optional, Set, TextIO
from beartype.typing import List, Optional, Set
from jinja2 import Environment, StrictUndefined, meta
from jinja2.exceptions import UndefinedError
from loguru import logger
@@ -11,15 +13,15 @@ class LoadingJobsYAMLError(Exception):
    pass


def load_job_configuration(config_file: TextIO, vars_file: Optional[TextIO]) -> Config:
def load_job_configuration(config_files: List[str], vars_file: Optional[List[str]]) -> Config:
"""Load the job configuration set in a YAML file into a Config object
Can be a non-templated YAML or a templated one for which we need to replace Jinja values
"""
if vars_file:
config = _load_yaml_with_template(config_file, vars_file)
config = _load_yaml_with_template(config_files, vars_file)
else:
config = _load_yaml_no_template(config_file)
config = _load_yaml_no_template(config_files)

if not config["jobs"]:
return Config(jobs={})
@@ -42,38 +44,116 @@ def load_job_configuration(config_file: TextIO, vars_file: Optional[TextIO]) ->
    return Config(**config)


def _load_yaml_no_template(config_file: TextIO) -> dict:
def _load_yaml_no_template(config_files: List[str]) -> dict:
"""Load a job YAML file into a Config object"""
config_string = config_file.read()

jinja_vars = _get_jinja_variables(config_string)
if jinja_vars:
raise LoadingJobsYAMLError(
f"This is a templated YAML file. Please remove the variables {jinja_vars} or provide the variables values."
)

return yaml.safe_load(config_string)


def _load_yaml_with_template(config_file: TextIO, vars_file: TextIO) -> dict:
    combined_config = {}
    for config_file in config_files:
        with open(config_file) as f:
            config_string = f.read()

        jinja_vars = _get_jinja_variables(config_string)
        if jinja_vars:
            raise LoadingJobsYAMLError(
                f"{config_file} is a templated YAML file. Please remove the variables {jinja_vars} or provide the variables values."
            )

        config = yaml.safe_load(config_string)
        if config:
            # Merge the jobs from each file into combined_config
            if "jobs" in config:
                if "jobs" not in combined_config:
                    combined_config["jobs"] = {}
                combined_config["jobs"].update(config["jobs"])
            # Merge any other top-level keys
            for key, value in config.items():
                if key != "jobs":
                    combined_config[key] = value

    return combined_config
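
A toy illustration (not part of the commit) of the merge semantics implemented above, using plain dicts in place of parsed YAML files with made-up job names: job entries are unioned across files, while any other top-level key is taken from the last file that defines it.

```python
combined_config: dict = {}
for config in (
    {"jobs": {"job1": {"name": "daily run"}}, "account_id": 1234},
    {"jobs": {"job2": {"name": "weekly run"}}, "account_id": 5678},
):
    if config:
        if "jobs" in config:
            if "jobs" not in combined_config:
                combined_config["jobs"] = {}
            combined_config["jobs"].update(config["jobs"])
        for key, value in config.items():
            if key != "jobs":
                combined_config[key] = value  # last file wins for non-job keys

print(combined_config)
# -> {'jobs': {'job1': ..., 'job2': ...}, 'account_id': 5678}
```
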


def _load_yaml_with_template(config_files: List[str], vars_file: List[str]) -> dict:
"""Load a job YAML file into a Config object"""
template_vars_values = yaml.safe_load(vars_file)
config_string_unrendered = config_file.read()

# Load and merge vars files
template_vars_values = {}
for vars_path in vars_file:
with open(vars_path) as f:
# we load the vars. if there is no data in them we set it to an empty dict
vars_data = yaml.safe_load(f) or {}
# Check for duplicate variables
for key in vars_data:
if key in template_vars_values:
raise LoadingJobsYAMLError(
f"Variable '{key}' is defined multiple times in vars files"
)
template_vars_values.update(vars_data)

# Load and combine config files
combined_config = {}
env = Environment(undefined=StrictUndefined)
template = env.from_string(config_string_unrendered)

try:
config_string_rendered = template.render(template_vars_values)
except UndefinedError as e:
print(f"Error: {e}") # This will raise an error
raise LoadingJobsYAMLError(f"Some variables didn't have a value: {e.message}.") from e

return yaml.safe_load(config_string_rendered)
for config_path in config_files:
with open(config_path) as f:
config_string_unrendered = f.read()
template = env.from_string(config_string_unrendered)

try:
config_string_rendered = template.render(template_vars_values)
except UndefinedError as e:
raise LoadingJobsYAMLError(
f"Some variables didn't have a value: {e.message}."
) from e

config = yaml.safe_load(config_string_rendered)
if config:
# Merge the jobs from each file
if "jobs" in config:
if "jobs" not in combined_config:
combined_config["jobs"] = {}
combined_config["jobs"].update(config["jobs"])
# Merge any other top-level keys
for key, value in config.items():
if key != "jobs":
combined_config[key] = value

return combined_config
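
The templated path relies on Jinja2's `StrictUndefined` so that a missing variable fails loudly instead of rendering as an empty string. A standalone sketch (not part of the commit; the template string is invented):

```python
import yaml
from jinja2 import Environment, StrictUndefined
from jinja2.exceptions import UndefinedError

env = Environment(undefined=StrictUndefined)
template = env.from_string("jobs:\n  job1:\n    name: {{ job_name }}\n")

try:
    template.render({})  # no value provided for job_name
except UndefinedError as e:
    print(f"Some variables didn't have a value: {e.message}.")

print(yaml.safe_load(template.render({"job_name": "daily run"})))
# -> {'jobs': {'job1': {'name': 'daily run'}}}
```
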


def _get_jinja_variables(input: str) -> Set[str]:
"""Get the variables from a Jinja template"""
env = Environment()
parsed_input = env.parse(input)
return meta.find_undeclared_variables(parsed_input)
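
`_get_jinja_variables` uses `jinja2.meta.find_undeclared_variables` to detect templating in files loaded without a vars file; for example (the template string is illustrative):

```python
from jinja2 import Environment, meta

env = Environment()
parsed = env.parse("jobs:\n  job1:\n    name: {{ job_name }} ({{ env_name }})\n")
print(meta.find_undeclared_variables(parsed))  # e.g. {'job_name', 'env_name'}
```
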


def resolve_file_paths(
    config_pattern: Optional[str], vars_pattern: Optional[str] = None
) -> tuple[List[str], List[str]]:
    """
    Resolve glob patterns to lists of file paths.
    Args:
        config_pattern: Glob pattern for config files
        vars_pattern: Optional glob pattern for vars files
    Returns:
        Tuple of (config_files, vars_files)
    Raises:
        LoadingJobsYAMLError: If no files match a pattern when pattern is provided
    """
    if not config_pattern:
        return [], []

    config_files = glob.glob(config_pattern)
    if not config_files:
        raise LoadingJobsYAMLError(f"No files found matching pattern: {config_pattern}")

    vars_files = []
    if vars_pattern:
        vars_files = glob.glob(vars_pattern)
        if not vars_files:
            raise LoadingJobsYAMLError(f"No files found matching pattern: {vars_pattern}")

    return config_files, vars_files
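
A possible way to call the new `resolve_file_paths` helper (not part of the commit; the patterns are illustrative, and a pattern with no matches raises `LoadingJobsYAMLError` rather than returning empty lists):

```python
from dbt_jobs_as_code.loader.load import LoadingJobsYAMLError, resolve_file_paths

try:
    config_files, vars_files = resolve_file_paths("jobs/*.yml", "jobs/vars_*.yml")
    print(config_files, vars_files)
except LoadingJobsYAMLError as e:
    print(e)
```
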
