Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Airbyte-ci: Add local cdk support #30461

Merged
merged 8 commits into from
Sep 19, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 4 additions & 10 deletions airbyte-cdk/python/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -107,27 +107,21 @@ pip install -e .
You should see that `pip` has uninstalled the version of `airbyte-cdk` defined by your connector's `setup.py` and installed your local CDK. Any changes you make will be immediately reflected in your editor, so long as your editor's interpreter is set to your connector's virtual environment.

##### Building a Python connector in Docker with your local CDK installed
_Pre-requisite: Install the [`airbyte-ci` CLI](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)_

You can build your connector image with the local CDK using
```bash
# from the airbytehq/airbyte base directory
CONNECTOR_TAG=<TAG_NAME> CONNECTOR_NAME=<CONNECTOR_NAME> sh airbyte-integrations/scripts/build-connector-image-with-local-cdk.sh
airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> build
```
Note that the local CDK is injected at build time, so if you make changes, you will have to run the build command again to see them reflected.

##### Running Connector Acceptance Tests for a single connector in Docker with your local CDK installed
_Pre-requisite: Install the [`airbyte-ci` CLI](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)_

To run acceptance tests for a single connectors using the local CDK, from the connector directory, run
```bash
LOCAL_CDK=1 sh acceptance-test-docker.sh
```
To additionally fetch secrets required by CATs, set the `FETCH_SECRETS` environment variable. This requires you to have a Google Service Account, and the GCP_GSM_CREDENTIALS environment variable to be set, per the instructions [here](https://github.com/airbytehq/airbyte/tree/master/airbyte-ci/connectors/ci_credentials).

##### Running Connector Acceptance Tests for multiple connectors in Docker with your local CDK installed

To run acceptance tests for multiple connectors using the local CDK, from the root of the `airbyte` repo, run
```bash
./airbyte-cdk/python/bin/run-cats-with-local-cdk.sh -c <connector1>,<connector2>,...
airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> test
```

#### When you don't have access to the API
Expand Down
6 changes: 4 additions & 2 deletions airbyte-ci/connectors/pipelines/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -122,9 +122,10 @@ Available commands:
| `--use-remote-secrets` | False | True | If True, connectors configuration will be pulled from Google Secret Manager. Requires the GCP_GSM_CREDENTIALS environment variable to be set with a service account with permission to read GSM secrets. If False the connector configuration will be read from the local connector `secrets` folder. |
| `--name` | True | | Select a specific connector for which the pipeline will run. Can be used multiple time to select multiple connectors. The expected name is the connector technical name. e.g. `source-pokeapi` |
| `--support-level` | True | | Select connectors with a specific support level: `community`, `certified`. Can be used multiple times to select multiple support levels. |
| `--metadata-query` | False | | Filter connectors by the `data` field in the metadata file using a [simpleeval](https://github.com/danthedeckie/simpleeval) query. e.g. 'data.ab_internal.ql == 200' |
| `--metadata-query` | False | | Filter connectors by the `data` field in the metadata file using a [simpleeval](https://github.com/danthedeckie/simpleeval) query. e.g. 'data.ab_internal.ql == 200' |
| `--use-local-cdk` | False | False | Build with the airbyte-cdk from the local repository. " "This is useful for testing changes to the CDK. |
| `--language` | True | | Select connectors with a specific language: `python`, `low-code`, `java`. Can be used multiple times to select multiple languages. |
| `--modified` | False | False | Run the pipeline on only the modified connectors on the branch or previous commit (depends on the pipeline implementation). |
| `--modified` | False | False | Run the pipeline on only the modified connectors on the branch or previous commit (depends on the pipeline implementation). |
| `--concurrency` | False | 5 | Control the number of connector pipelines that can run in parallel. Useful to speed up pipelines or control their resource usage. |
| `--metadata-change-only/--not-metadata-change-only` | False | `--not-metadata-change-only` | Only run the pipeline on connectors with changes on their metadata.yaml file. |
| `--enable-dependency-scanning / --disable-dependency-scanning` | False | ` --disable-dependency-scanning` | When enabled the dependency scanning will be performed to detect the connectors to select according to a dependency change. |
Expand Down Expand Up @@ -406,6 +407,7 @@ This command runs the Python tests for a airbyte-ci poetry package.
## Changelog
| Version | PR | Description |
|---------| --------------------------------------------------------- |-----------------------------------------------------------------------------------------------------------|
| 1.3.0 | [#30461](https://github.com/airbytehq/airbyte/pull/30461) | Add `--use-local-cdk` flag to all connectors commands |
| 1.2.3 | [#30477](https://github.com/airbytehq/airbyte/pull/30477) | Fix a test regression introduced the previous version. |
| 1.2.2 | [#30438](https://github.com/airbytehq/airbyte/pull/30438) | Add workaround to always stream logs properly with --is-local. |
| 1.2.1 | [#30384](https://github.com/airbytehq/airbyte/pull/30384) | Java connector test performance fixes. |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -365,6 +365,25 @@ def with_python_connector_source(context: ConnectorContext) -> Container:
return with_python_package(context, testing_environment, connector_source_path)


async def apply_python_development_overrides(context: ConnectorContext, connector_container: Container) -> Container:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to see an unit test on this. I'll implement one when I'll port it on top of #30474

# Run the connector using the local cdk if flag is set
if context.use_local_cdk:
context.logger.info("Using local CDK")
# mount the local cdk
path_to_cdk = "airbyte-cdk/python/"
directory_to_mount = context.get_repo_dir(path_to_cdk)

context.logger.info(f"Mounting {directory_to_mount}")

# Install the airbyte-cdk package from the local directory
# We use --no-deps to avoid conflicts with the airbyte-cdk version required by the connector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this prevent us from being able to test the local CDK if it has any new/updated dependencies?

For new dependencies, this will throw an exception, but for updates it'll be undefined (i.e. we won't necessarily be testing the code that'll be used in production).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clnoll great question! and you are exactly right it will ignore sub-dependency conflicts that may crop up.

e.g. if airbyte-cdk needs pandas v2.0.0 but source-s3 need pandas v1.2.3 you wont see the error until you introduce the PR that explicitly bumps source-s3's airbytec-dk version.

The alternative is that we do not include --no-deps and to test local cdk changes a cdk developer would have to

  1. bump the library version number in the pyproject.toml for airbyte-cdk
  2. Ensure/bump the airbyte-cdk library version in the pyproject.toml for a connector you would like to test with to match
  3. Run airbyte-ci --use-local-cdk --name=source-s3 test

The reason I did not choose this route was because

  1. Its an extra step / commit to test different connectors (you have to update their dependency file)
  2. I assumed that the PR that released a new cdk version would not be the same PR that bumped a connector to use the new version. That assumption let me believe the dependency conflicts would be found later in the second PR since you would be running test without --use-local-cdk.

@clnoll more than happy to change the approach, what are your thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks! I want to take a closer look at the code and understand why we’re stuck with this tradeoff between --no-deps and extra steps, given that the original implementation of the local cdk flag didn’t have that tradeoff. This isn’t a super common scenario but it’s common enough that it doesn’t feel quite right to me to skip the deps.

I hear you re the fact that conflicts would be caught before release, but I think that rooting out the issue early is important. Correct me if I'm wrong, but with the current implementation, to fix a dependency conflict I believe we’d have to toggle back and forth between updating the CDK and updating the connector until we settle on a version that agrees with both. Some conflicts could be resolved by doing that locally, but sometimes there are issues that will only crop up in certain development environments. i.e. you’ll see them in CI but not during local development, especially if it involves something like numpy or pandas that has a C dependency - and conflicts related to those libraries are the biggest sources of these problems that I've seen.

Copy link
Contributor Author

@bnchrch bnchrch Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to take a closer look at the code and understand why we’re stuck with this tradeoff between --no-deps and extra steps

Ah whoops forgot to put a link to part of the why:
jazzband/pip-tools#215

the original implementation of the local cdk flag didn’t have that tradeoff

I can answer that!

So the original implementation used sed to edit the setup.py file using find and replace.

This approach is both brittle and a victim of the python ecosystem where there are approximately 5 "standard" ways in 2023 to define your dependencies

  1. Using requirements.txt
  2. Using setup.py
  3. using setup.py + requirements.txt
  4. Using pyproject.toml
  5. Using pyproject.toml + poetry.lock

Were currently planning on forcing python connectors to all move to the "pyproject.toml + poetry.lock" method but thats still loose plans and today were stuck with 5 possible ways that the airbyte-cdk dependency is declared.

To continue to use the sed approach we would have to have parsing and local override code for all 4 of these files.

However, because they all end up using pip at the end of the day we chose to

  1. Let the connector install using any of the above methods (for now)
  2. Call pip install path/to/airbyte-cdk/python after the fact.

As the trade off is to either

  1. Ignore connector vs airbyte-cdk dependencies conflict during cdk testing
  2. Or, require you to update a connectors dependency file when testing the cdk

Correct me if I'm wrong, but with the current implementation, to fix a dependency conflict I believe we’d have to toggle back and forth between updating the CDK and updating the connector until we settle on a version that agrees with both

🤔 I dont think you'd have to toggle back and forth.

Let me explain

Lets assume when your updating a connector to use a new version of the cdk you have this goal in mind

  • Upgrade to the latest possible version of cdk that wont require a massive refactor

WIth this you'll likely target latest cdk.

  • If theres a dependency conflict:
    • you will look at upgrading your own depenencies first
  • If you cant upgrade your dependencies (You need an old version of pandas or something):
    • you will likely walk back a version of the airbyte-cdk and try again.

I dont think a connector dev would be messing with cdk dependency versions
and vice versa, I dont think a cdk dev would be messing with a connector version (if theres a major conflict, try testing with another connector)

@clnoll Is that a fair way to think about it?

but sometimes there are issues that will only crop up in certain development environments. i.e. you’ll see them in CI but not during local development, especially if it involves something like numpy or pandas that has a C dependency - and conflicts related to those libraries are the biggest sources of these problems that I've seen.

Were hoping this method solves this very issue where airbyte-ci connectors build, airbyte-ci connectors test, and airbyte-ci connectors publish both use the same, repeatable (dockerized) environment both on your machine and in CI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the details, @bnchrch!

I agree with your description of how a connector dev would be thinking about this situation. Re the CDK dev perspective - we do test changes on existing connectors and I could see us wanting to make sure that a package upgrade was appropriate for specific connectors, particularly if it's a popular connector, or if a CDK update was made with solving problems specific to those connectors.

But, if we're confident that airbyte-ci is reproducing the prod environment I think we're good to just deal with this scenario locally.

connector_container = connector_container.with_mounted_directory(f"/{path_to_cdk}", directory_to_mount).with_exec(
["pip", "install", "--no-deps", f"/{path_to_cdk}"], skip_entrypoint=True
)

return connector_container


async def with_python_connector_installed(context: ConnectorContext) -> Container:
"""Install an airbyte connector python package in a testing environment.

Expand All @@ -390,10 +409,14 @@ async def with_python_connector_installed(context: ConnectorContext) -> Containe
".dockerignore",
]
]
return await with_installed_python_package(
container = await with_installed_python_package(
context, testing_environment, connector_source_path, additional_dependency_groups=["dev", "tests", "main"], exclude=exclude
)

container = await apply_python_development_overrides(context, container)

return container


async def with_ci_credentials(context: PipelineContext, gsm_secret: Secret) -> Container:
"""Install the ci_credentials package in a python environment.
Expand Down Expand Up @@ -858,11 +881,15 @@ async def with_airbyte_java_connector(context: ConnectorContext, connector_java_

async def get_cdk_version_from_python_connector(python_connector: Container) -> Optional[str]:
pip_freeze_stdout = await python_connector.with_entrypoint("pip").with_exec(["freeze"]).stdout()
pip_dependencies = [dep.split("==") for dep in pip_freeze_stdout.split("\n")]
for package_name, package_version in pip_dependencies:
if package_name == "airbyte-cdk":
return package_version
return None
cdk_dependency_line = next((line for line in pip_freeze_stdout.split("\n") if "airbyte-cdk" in line), None)
if not cdk_dependency_line:
return None

if "file://" in cdk_dependency_line:
return "LOCAL"

_, cdk_version = cdk_dependency_line.split("==")
return cdk_version


async def with_airbyte_python_connector(context: ConnectorContext, build_platform: Platform) -> Container:
Expand All @@ -876,8 +903,12 @@ async def with_airbyte_python_connector(context: ConnectorContext, build_platfor
.build(await context.get_connector_dir())
.with_label("io.airbyte.name", context.metadata["dockerRepository"])
)

connector_container = await apply_python_development_overrides(context, connector_container)

cdk_version = await get_cdk_version_from_python_connector(connector_container)
if cdk_version:
context.logger.info(f"Connector has a cdk dependency, using cdk version {cdk_version}")
connector_container = connector_container.with_label("io.airbyte.cdk_version", cdk_version)
context.cdk_version = cdk_version
if not await connector_container.label("io.airbyte.version") == context.metadata["dockerImageTag"]:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -158,6 +158,13 @@ def get_selected_connectors_with_modified_files(
default=False,
type=bool,
)
@click.option(
"--use-local-cdk",
is_flag=True,
help=("Build with the airbyte-cdk from the local repository. " "This is useful for testing changes to the CDK."),
default=False,
type=bool,
)
@click.pass_context
def connectors(
ctx: click.Context,
Expand All @@ -171,6 +178,7 @@ def connectors(
concurrency: int,
execute_timeout: int,
enable_dependency_scanning: bool,
use_local_cdk: bool,
):
"""Group all the connectors-ci command."""
validate_environment(ctx.obj["is_local"], use_remote_secrets)
Expand All @@ -179,6 +187,7 @@ def connectors(
ctx.obj["use_remote_secrets"] = use_remote_secrets
ctx.obj["concurrency"] = concurrency
ctx.obj["execute_timeout"] = execute_timeout
ctx.obj["use_local_cdk"] = use_local_cdk
ctx.obj["selected_connectors_with_modified_files"] = get_selected_connectors_with_modified_files(
names,
support_levels,
Expand Down Expand Up @@ -256,6 +265,7 @@ def test(
fail_fast=fail_fast,
fast_tests_only=fast_tests_only,
code_tests_only=code_tests_only,
use_local_cdk=ctx.obj.get("use_local_cdk"),
)
for connector in ctx.obj["selected_connectors_with_modified_files"]
]
Expand Down Expand Up @@ -304,6 +314,7 @@ def build(ctx: click.Context) -> bool:
pipeline_start_timestamp=ctx.obj.get("pipeline_start_timestamp"),
ci_context=ctx.obj.get("ci_context"),
ci_gcs_credentials=ctx.obj["ci_gcs_credentials"],
use_local_cdk=ctx.obj.get("use_local_cdk"),
)
for connector in ctx.obj["selected_connectors_with_modified_files"]
]
Expand Down
2 changes: 2 additions & 0 deletions airbyte-ci/connectors/pipelines/pipelines/contexts.py
Original file line number Diff line number Diff line change
Expand Up @@ -320,6 +320,7 @@ def __init__(
fail_fast: bool = False,
fast_tests_only: bool = False,
code_tests_only: bool = False,
use_local_cdk: bool = False,
):
"""Initialize a connector context.

Expand Down Expand Up @@ -355,6 +356,7 @@ def __init__(
self.fail_fast = fail_fast
self.fast_tests_only = fast_tests_only
self.code_tests_only = code_tests_only
self.use_local_cdk = use_local_cdk

super().__init__(
pipeline_name=pipeline_name,
Expand Down
1 change: 0 additions & 1 deletion airbyte-ci/connectors/pipelines/pipelines/gradle.py
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,6 @@ def _get_gradle_command(self, task: str) -> List[str]:
)

async def _run(self) -> StepResult:

include = [
".root",
".env",
Expand Down
2 changes: 1 addition & 1 deletion airbyte-ci/connectors/pipelines/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "pipelines"
version = "1.2.3"
version = "1.3.0"
description = "Packaged maintained by the connector operations team to perform CI for connectors' pipelines"
authors = ["Airbyte <[email protected]>"]

Expand Down
Loading