-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Publish the response to process in the context of the record's transformation and filtering steps #193
base: main
Are you sure you want to change the base?
Conversation
and avoid presenting the empty JSON as an array of an empty object
…lic structures There is an internal Flatten fransformation - let it skip the service fields
processing other than dict types. Fixed the end page parsing, when it is empty, the token read defaults to {}
* local for the filter and transform steps * not visible to the next mapping and store steps (as they should be) * not visible in the tests beyond the test_record_selector (as they should be) This allows the record transformation and filtering logic to define its "local vcariables" in the format of service fields to reuse some interim calculations.
… the record selection (filteriung and transformation), as these fields became local for the steps above.
📝 WalkthroughWalkthroughThis pull request introduces comprehensive enhancements to the Airbyte CDK's declarative source components, focusing on improving error handling, record extraction, and service key management. The changes span multiple files across decoders, extractors, and transformations, introducing new utility functions for handling service keys, refining JSON parsing logic, and updating test suites to ensure robust functionality. Changes
Suggested labels
Suggested reviewers
Hey there! 👋 I noticed some interesting improvements in this PR. Would you like me to elaborate on any specific changes? The service key handling and error logging enhancements look particularly neat. Wdyt? The modifications seem to make the declarative source components more robust and flexible. Is there anything specific you'd like me to dive deeper into? 🕵️♀️ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (31)
airbyte_cdk/sources/declarative/extractors/record_extractor.py (4)
11-16
: Neat introduction of service key convention!
This clearly explains why a prefix is needed to segregate service keys. Would you consider adding a brief usage example in a docstring, wdyt?
22-26
: In-place removal of service keys
This function effectively mutates the original dictionary. Would you consider highlighting the mutating nature in the function name, for instanceremove_service_keys_in_place
, wdyt?
28-30
: Helper functionis_service_key
The prefix check is straightforward. One minor note: usingstartswith(SERVICE_KEY_PREFIX)
might be more expressive. wdyt?
32-33
: Assertion for presence of service keys
This ensures that at least one service key remains in the mapping, which is helpful for validating transformations. Would you consider returning a more descriptive error message to clarify which key is missing, wdyt?airbyte_cdk/sources/declarative/transformations/flatten_fields.py (2)
8-8
: Usage ofis_service_key
Nice import to preserve the logic in a single place. Would you consider re-exportingis_service_key
from a central file so other modules also import it consistently, wdyt?
36-44
: Preserving service keys
By skipping flattening for service keys, you efficiently maintain their structure. Could we add a comment to describe why we early-return for service keys, wdyt?unit_tests/sources/declarative/transformations/test_flatten_fields.py (1)
50-58
: Test preserving service key with a string value
This sequence effectively checks that the prefixed field remains unflattened. Maybe we can add an additional assertion verifying the shape of the resultant dictionary for completeness, wdyt?unit_tests/sources/declarative/extractors/test_record_extractor.py (2)
1-12
: New test file creation
Good to see a dedicated test suite for service key handling. Maybe consider adding docstrings describing the purpose behind each test function, wdyt?
30-44
: Testremove_service_keys
Great approach for ensuring in-place key removal. Possibly add a scenario with nested dictionaries to confirm behavior, wdyt?airbyte_cdk/sources/declarative/decoders/json_decoder.py (2)
38-41
: Graceful handling of JSON decode failure
CatchingJSONDecodeError
and yielding from an empty list is a smooth approach to avoid breaking the flow. Would you be open to adding a short comment clarifying that we skip all records if parsing fails, wdyt?
47-50
: Flexible handling of list vs. single object
Yielding directly from the object if it’s a list, or wrapping it into a list otherwise, streamlines the logic. Looks good. Would you consider adding a small docstring example clarifying the single-object vs. multiple-object scenario, wdyt?airbyte_cdk/sources/declarative/requesters/paginators/strategies/cursor_pagination_strategy.py (1)
65-65
: Fallback for decoder results
Usingnext(self.decoder.decode(response), {})
preventsStopIteration
if nothing is yielded, which is handy. Do we want to log a warning if this default kicks in, wdyt?airbyte_cdk/sources/declarative/extractors/dpath_extractor.py (3)
19-21
: Establishing a consistent root key
DefiningRECORD_ROOT_KEY
withSERVICE_KEY_PREFIX + "root"
is a neat pattern. Any chance we might rename it to__root__
or something similarly distinctive, wdyt?
88-88
: Preserving the complete body as root
Storingbody
inroot_response
is a convenient handle for later usage. Perhaps we might add a comment explaining its role in subsequent transformations, wdyt?
98-100
: Adding root context to each record
Usingupdate_record(record, root_response)
for items in the extracted list is great for easy transformations. Would you consider a quick doc mention that each record now includes the$root
data, wdyt?unit_tests/sources/declarative/decoders/test_json_decoder.py (2)
15-18
: Model and factory imports
ImportingDeclarativeStream
andModelToComponentFactory
might be part of a broader test suite. Should we add a small integration test showcasing these, wdyt?
22-27
: Testing empty and non-empty JSON
The parameterization ensures coverage for empty and valid JSON, which is great. Maybe add a test case for malformed JSON to confirm we yield an empty list, wdyt?airbyte_cdk/sources/declarative/extractors/record_selector.py (1)
114-115
: Great approach filtering service keys before normalization!
By removing these keys early, you avoid persisting unnecessary or sensitive fields in subsequent transformations. Would you consider adding a small inline comment explaining that this step is purely for service key removal, so new contributors understand the rationale? Wdyt?unit_tests/sources/declarative/extractors/test_dpath_extractor.py (5)
45-46
: Well-structured nested array test coverage!
Testing with a single record containing nested entries is helpful. Wdyt about adding checks for uniquely typed nested fields?
47-51
: Extended nested structure test ensures correctness.
It’s great for verifying deeper nesting. Should we also test a scenario with partial missing nested fields? Wdyt?
58-63
: Valid approach for repeated testing of deeply nested arrays!
Everything is consistent with previous param sets. Would you like to verify any type conversions here, or is that out of scope? Wdyt?
66-71
: Check for single-field nested extraction is well-defined!
No issues here. Possibly consider negative tests (e.g., absent ‘id’ key). Wdyt?
169-170
: Ensuring nested extraction with JSONL lines is validated.
Are you comfortable with the coverage? Possibly add an invalid JSON line test? Wdyt?unit_tests/sources/declarative/extractors/test_record_selector.py (4)
13-16
: Addingexclude_service_keys
andverify_service_keys_exist
is consistent with the rest of the suite.
This promotes uniform logic. Wdyt about also testing default behavior (no service keys found)?
47-60
: New test for returning all records with nested data.
It’s well-defined. Would you consider adding a partial-nesting scenario for completeness? Wdyt?
62-75
: Testing{{True}}
filter with nested data.
Covers a universal pass scenario effectively. Possibly try a trivial fail scenario too?
166-168
: Applyingexclude_service_keys
after record selection is correct.
This ensures the final output is free of service fields. Maybe mention that in code comments, wdyt?unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1)
242-242
: New assertion checking final state for full refresh sync completion.
This is excellent for clarity. Consider adding a note about possible partial state merges? Wdyt?unit_tests/sources/declarative/test_manifest_declarative_source.py (1)
1764-1764
: Add a descriptive message in the assertion? Since this test checks if the discovered records match the expected ones, do you want to include a clarifying message in theassert
to help diagnose failures more quickly? For example,assert output_data == expected_records, f"Mismatch in {test_name}..."
. wdyt?unit_tests/sources/declarative/partition_routers/test_parent_state_stream.py (2)
325-327
: Use a more descriptive variable name for clarity?
Currently,records_from_state
might benefit from a more explicit name that conveys its content and purpose, such asrecords_read_from_intermediate_state
. Wdyt?
Line range hint
331-333
: Consider an alternative deduplication approach for large data sets?
Here, converting records to a JSON string and using them as dictionary keys might be memory-intensive for large volumes of data. Would you consider using a more straightforward approach (e.g., hashing tuples of normalized content) to reduce overhead? Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
airbyte_cdk/sources/declarative/decoders/json_decoder.py
(1 hunks)airbyte_cdk/sources/declarative/extractors/dpath_extractor.py
(3 hunks)airbyte_cdk/sources/declarative/extractors/record_extractor.py
(1 hunks)airbyte_cdk/sources/declarative/extractors/record_selector.py
(3 hunks)airbyte_cdk/sources/declarative/requesters/paginators/strategies/cursor_pagination_strategy.py
(1 hunks)airbyte_cdk/sources/declarative/transformations/flatten_fields.py
(2 hunks)unit_tests/sources/declarative/decoders/test_json_decoder.py
(1 hunks)unit_tests/sources/declarative/extractors/test_dpath_extractor.py
(5 hunks)unit_tests/sources/declarative/extractors/test_record_extractor.py
(1 hunks)unit_tests/sources/declarative/extractors/test_record_selector.py
(5 hunks)unit_tests/sources/declarative/partition_routers/test_parent_state_stream.py
(1 hunks)unit_tests/sources/declarative/retrievers/test_simple_retriever.py
(2 hunks)unit_tests/sources/declarative/test_manifest_declarative_source.py
(1 hunks)unit_tests/sources/declarative/transformations/test_flatten_fields.py
(2 hunks)
🔇 Additional comments (32)
airbyte_cdk/sources/declarative/extractors/record_extractor.py (1)
18-19
: Function exclude_service_keys
Great approach to returning a filtered mapping that excludes service keys. Would you consider clarifying in the docstring that this function doesn't mutate the original mapping, wdyt?
unit_tests/sources/declarative/transformations/test_flatten_fields.py (2)
7-7
: Import of SERVICE_KEY_PREFIX
Bringing in the constant aids test readability. Looks great!
59-67
: Set usage with service key
Admirable to test sets alongside lists. Everything looks good. Perhaps you could add a case with nested service keys in the future, wdyt?
unit_tests/sources/declarative/extractors/test_record_extractor.py (4)
15-28
: Test exclude_service_keys
Testing multiple scenarios is helpful. This thoroughness is great for coverage!
46-56
: Check presence of service keys
The parameterized approach covers a variety of input shapes. Looks comprehensive!
58-67
: Handling the absence of service keys
Clever usage of try/except to confirm the assertion is raised. No further issues here.
69-74
: Function test_service_field
and test_regular_field
Straightforward tests confirming the logic behind service and non-service fields. This is quite clear and concise!
airbyte_cdk/sources/declarative/extractors/dpath_extractor.py (3)
12-15
: Imports for service keys
These additions ensure we can properly reference the SERVICE_KEY_PREFIX
. Everything looks in order. Feeling confident about it, wdyt?
23-30
: Utility function for record injection
update_record
effectively injects the root
data when record
is a dict. Would you consider logging or handling non-dict cases differently instead of passing them through unchanged, wdyt?
101-101
: Single object extraction
Similarly, returning the updated single object ensures consistency. LGTM. Maybe just confirm there's no performance overhead from the extra dictionary copy, wdyt?
unit_tests/sources/declarative/decoders/test_json_decoder.py (4)
11-12
: New imports for testing
Introducing YamlDeclarativeSource
and SyncMode
could indicate expanded scenarios. Are we using them in these tests, or do we plan to, wdyt?
30-33
: Verifying empty string yields empty list
This aligns with the new JsonDecoder
behavior. Kudos for thorough testing. Good to go!
40-41
: jsonl decoder edge cases
Checking an empty object returns [{}]
and an empty list returns [[]]
is interesting. Is that the desired behavior, or should empty lists remain empty, wdyt?
48-54
: Clearer parameter IDs
The new ids
for parameterized tests clarifies each scenario well. This is helpful in test logs. Good job!
airbyte_cdk/sources/declarative/extractors/record_selector.py (2)
11-14
: Imports from record_extractor
look good!
Nice job importing exclude_service_keys
here. This keeps validation logic centralized. Wdyt about also adding a short docstring to indicate why it’s being imported?
164-169
: Helper method _remove_service_keys
is concise and cohesive!
It clearly separates the logic for removing service keys from the main flow. The name is self-explanatory, which is great. Wdyt about possibly adding a quick docstring snippet describing usage?
unit_tests/sources/declarative/extractors/test_dpath_extractor.py (13)
18-21
: LGTM on the new imports!
Bringing in exclude_service_keys
and verify_service_keys_exist
ensures consistent testing across your modules. Wdyt about clarifying in code comments how each function is used in these tests?
40-41
: Empty string handling test looks good!
It’s valuable to confirm no records are returned for empty input. Everything seems fine here.
52-57
: Multiple-record nested scenario coverage is clear!
No immediate issues. Maybe consider adding a note on expected behavior if partially malformed JSON arises?
64-65
: Single field extraction from root object tested.
This single-case approach is straightforward. Looks good to me!
72-77
: Similar patterned test for multiple nested fields.
Maintaining consistency is a good practice. Thanks for adding clarity.
78-83
: Handling multiple dictionaries at the same depth.
All good. The param definitions are descriptive.
84-94
: Thorough test with an even bigger nested structure.
Thanks for covering that scenario. Seems robust!
126-127
: JSONL extraction tests extended for empty data.
That ensures no unexpected defaults appear. Good coverage!
156-160
: Test names are descriptive!
Naming them like “test_extract_from_empty_string” is helpful. This improves maintainability.
161-164
: Continuing the pattern with param-based test IDs.
All consistent. This is a neat approach.
167-168
: Confirmation that JSONL extraction matches single array.
Looks great; no changes needed.
177-178
: Handles multiline JSONL with text fields.
Yes, this is crucial to ensure newlines are handled properly.
196-200
: Service key management verification is great!
This final step cements correctness. Encouraging thorough coverage.
unit_tests/sources/declarative/extractors/test_record_selector.py (1)
111-123
: Verifies use of $root
in filter conditions.
This is a valuable scenario, capturing references to the parent. Great job!
unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1)
360-360
: Ensuring __ab_full_refresh_sync_complete
is set properly.
This line addresses the correct final state. LGTM!
unit_tests/sources/declarative/partition_routers/test_parent_state_stream.py (1)
Line range hint 335-341
: Double-check your sorting logic for correctness?
Using a string serialization for sorting is perfectly acceptable in tests. However, if the order of keys changes or if floating-point precision is altered, it might trigger unexpected results. Did you consider any possible corner cases with sorting by JSON representation? Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
airbyte_cdk/sources/declarative/extractors/dpath_extractor.py (1)
23-29
: Consider adding type hints for better code clarity, wdyt?The function looks good! Would you consider adding more specific type hints to improve code clarity? For example:
-def update_record(record: Any, root: Any) -> Any: +def update_record(record: Union[dict, Any], root: Union[dict, list, Any]) -> Union[dict, Any]:docs/CONTRIBUTING.md (1)
43-44
: Consider rephrasing to avoid repetition, wdyt?The instructions are clear, but there's repetition of "Run" at the start of consecutive lines. Maybe we could rephrase like this?
-Run `poetry run ruff format` to format your Python code. -Run `poetry run ruff check .` to report the not fixed issues. Fix them manually. +Use `poetry run ruff format` to format your Python code. +Then `poetry run ruff check .` to report any remaining issues. Fix these manually.🧰 Tools
🪛 LanguageTool
[style] ~44-~44: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...f formatto format your Python code. - Run
poetry run ruff check .` to report the...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
docs/RELEASES.md (2)
52-52
: Consider creating a GitHub issue for the TODO.This TODO seems important for improving the release process visibility. Would you like me to create a GitHub issue to track this improvement?
60-64
: Consider improving the formatting of the instructions.The instructions are clear but could be more readable with better formatting. What do you think about this structure?
-Once the publish pipeline has completed, choose a connector to test. Set the base_image in the connector's metadata to your pre-release version in Dockerhub (make sure to update the SHA as well). -Next, build the pre-release image locally using `airbyte-ci connectors —name=<source> build`. -You can now run connector interfaces against the built image using the pattern `docker run airbyte/<source-name>:dev <spec/check/discover/read>`. -The connector's README should include a list of these commands, which can be copy/pasted and run from the connector's directory for quick testing against a local config. -You can also run `airbyte-ci connectors —name=<source> test` to run the CI test suite against the dev image. +1. Choose a connector to test after the publish pipeline completes +2. Update the connector's metadata: + - Set base_image to your pre-release version in Dockerhub + - Update the SHA accordingly +3. Build the pre-release image locally: + ```bash + airbyte-ci connectors —name=<source> build + ``` +4. Run connector interfaces against the built image: + ```bash + docker run airbyte/<source-name>:dev <spec/check/discover/read> + ``` + Note: The connector's README includes these commands for quick testing. +5. Optionally run the CI test suite: + ```bash + airbyte-ci connectors —name=<source> test + ```🧰 Tools
🪛 LanguageTool
[grammar] ~60-~60: Did you mean the noun “publishing”?
Context: ...ting Manifest-Only connectors Once the publish pipeline has completed, choose a connec...(PREPOSITION_VERB)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.json
is excluded by!**/package-lock.json
📒 Files selected for processing (6)
airbyte_cdk/sources/declarative/extractors/dpath_extractor.py
(3 hunks)airbyte_cdk/sources/declarative/extractors/record_extractor.py
(1 hunks)docs/CONTRIBUTING.md
(1 hunks)docs/RELEASES.md
(2 hunks)package.json
(1 hunks)unit_tests/sources/declarative/extractors/test_record_extractor.py
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- package.json
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte_cdk/sources/declarative/extractors/record_extractor.py
- unit_tests/sources/declarative/extractors/test_record_extractor.py
🧰 Additional context used
🪛 LanguageTool
docs/CONTRIBUTING.md
[style] ~44-~44: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...f formatto format your Python code. - Run
poetry run ruff check .` to report the...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
docs/RELEASES.md
[uncategorized] ~56-~56: Possible missing article found.
Context: ...lly test changes against a dev image of SDM before committing to a release, first u...
(AI_HYDRA_LEO_MISSING_THE)
[grammar] ~60-~60: Did you mean the noun “publishing”?
Context: ...ting Manifest-Only connectors Once the publish pipeline has completed, choose a connec...
(PREPOSITION_VERB)
[grammar] ~68-~68: Did you mean the noun “publishing”?
Context: ...ng Low-Code Python connectors Once the publish pipeline has completed, set the version...
(PREPOSITION_VERB)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/extractors/dpath_extractor.py (2)
19-20
: LGTM! Clear and descriptive constant definition.
The constant follows the established naming convention and its purpose is well-documented in the comment.
Line range hint 88-101
: Verify the memory impact of storing the root response.
The changes look good! Just want to ensure we've considered the memory implications:
- The root response is now stored in each record
- For large responses, this could increase memory usage significantly
Also, would you consider adding a comment explaining why we're storing the root response? This would help future maintainers understand the design decision. 🤔
✅ Verification successful
Let me search for test cases and actual usage patterns to better understand the memory implications. The previous command might have failed because the path pattern wasn't specific enough.
Memory impact looks reasonable based on test cases 🌟
Looking at the test cases, I can see that the responses being handled are relatively small (mostly containing simple records with IDs and nested objects). The update_record
function creates a shallow copy of the dictionary and only stores a reference to the root response, not a deep copy.
A few observations that make me feel confident about the memory usage:
- Most test cases work with small JSON payloads
- The root response is stored by reference, not by copying
- The pattern is used consistently across all test cases without any reported issues
Would you consider adding a quick comment explaining that we're storing a reference to the root response for downstream processing? Something like:
# Store reference to root response to allow downstream processors access to the full context
root_response = body
wdyt? 🤔
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for typical response sizes in tests
rg -A 5 "def test.*extract_records" unit_tests/
Length of output: 49
Script:
#!/bin/bash
# Let's check the test files specifically for the DPath extractor
fd "test.*dpath.*\.py" unit_tests/ --exec cat {}
# Also check the actual usage of update_record to understand the memory implications
rg -B 3 -A 3 "update_record"
Length of output: 9434
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
airbyte_cdk/sources/declarative/extractors/record_extractor.py (3)
17-18
: Consider adding type hints for better code clarity, wdyt?The function looks good, but we could make it more explicit about the return type being a new mapping rather than modifying the input.
-def exclude_service_keys(mapping: Mapping[str, Any]) -> Mapping[str, Any]: +def exclude_service_keys(mapping: Mapping[str, Any]) -> dict[str, Any]:
21-24
: Consider adding return type annotation for consistency.The function modifies the dict in place, which is clear from the implementation. Would you consider adding a return type annotation of
None
to make this explicit?-def remove_service_keys(mapping: dict[str, Any]): # type: ignore[no-untyped-def] +def remove_service_keys(mapping: dict[str, Any]) -> None:
31-32
: Fix typo in assertion message.There's a small typo in the assertion message.
- assert mapping != exclude_service_keys(mapping), "Expected service are present" + assert mapping != exclude_service_keys(mapping), "Expected service keys are present"docs/CONTRIBUTING.md (1)
43-46
: Consider restructuring commands for better readability, wdyt?The commands are clear but start with repetitive "Run". Maybe we could make it more concise and readable?
-# Run `poetry run ruff format` to format your Python code. -# Run `poetry run ruff check --fix` to fix the formatting issues. -# Run `poetry run ruff check .` to report the not fixed issues. Fix them manually. -# Run `poetry run mypy --config-file mypy.ini airbyte_cdk` to validate the code. Resolve the reported issues. +Format and validate your code using these commands: + +1. `poetry run ruff format` - Format Python code +2. `poetry run ruff check --fix` - Auto-fix formatting issues +3. `poetry run ruff check .` - Report remaining issues (fix manually) +4. `poetry run mypy --config-file mypy.ini airbyte_cdk` - Validate type annotations🧰 Tools
🪛 LanguageTool
[style] ~44-~44: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...f formatto format your Python code. - Run
poetry run ruff check --fix` to fix th...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~45-~45: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... --fixto fix the formatting issues. - Run
poetry run ruff check .` to report the...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/extractors/record_extractor.py
(1 hunks)docs/CONTRIBUTING.md
(1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/CONTRIBUTING.md
[style] ~44-~44: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...f formatto format your Python code. - Run
poetry run ruff check --fix` to fix th...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~45-~45: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... --fixto fix the formatting issues. - Run
poetry run ruff check .` to report the...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/extractors/record_extractor.py (1)
10-14
: LGTM! Clear documentation of service field conventions.
The documentation clearly explains the convention for service fields and their prefix. This aligns well with the PR objective of binding the response root object to records.
Minor edit to trigger the reevaluation of the integration tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
docs/RELEASES.md (2)
60-61
: How about making the testing instructions more explicit? 🤔The instructions are good, but what do you think about these small improvements to make them even clearer?
- Replace angle brackets with concrete examples:
-airbyte-ci connectors —name=<source> build +airbyte-ci connectors —name=source-example build # Replace 'source-example' with your connector name
- Add a note about configuration requirements:
You can now run connector interfaces against the built image using the pattern `docker run airbyte/<source-name>:dev <spec/check/discover/read>`. +# Make sure you have a valid config.json file in your current directory for check/discover/read commands
wdyt? 🙂
Also applies to: 68-68
52-52
: Would you like me to help track this TODO? 🎯This is a good suggestion about improving version inspection without requiring prod DB access. Would you like me to create a GitHub issue to track this improvement?
I can help create an issue with:
- Title: "Improve SDM version inspection without prod DB access"
- Description detailing the current limitations and desired outcome
- Labels: documentation, enhancement
Let me know if you'd like me to proceed! 😊
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/RELEASES.md
(2 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/RELEASES.md
[uncategorized] ~56-~56: Possible missing article found.
Context: ...lly test changes against a dev image of SDM before committing to a release, first u...
(AI_HYDRA_LEO_MISSING_THE)
[grammar] ~68-~68: Did you mean the noun “publishing”?
Context: ...ng Low-Code Python connectors Once the publish pipeline has completed, set the version...
(PREPOSITION_VERB)
🔇 Additional comments (1)
docs/RELEASES.md (1)
14-14
: LGTM! Nice formatting improvement! 👍
The consistent formatting with italics makes the notes section more readable.
The acceptance test failures are random and unrelated to the actual change. |
Update CONTRIBUTING.md to clarify and make explicit the procedures and steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
docs/CONTRIBUTING.md (4)
17-18
: The Poetry installation instructions could be more specific, wdyt?Consider adding the recommended installation method for different operating systems to help new contributors get started quickly. For example:
1. Install Poetry: - macOS: `curl -sSL https://install.python-poetry.org | python3 -` - Windows: `(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -` - Linux: `curl -sSL https://install.python-poetry.org | python3 -`
26-28
: Consider restructuring the test commands section for better readability, wdyt?The commands are clear but could be organized better to avoid repetitive sentence starts. Here's a suggestion:
### Running Unit Tests Use these commands to run tests: - All tests: `poetry run pytest` - Specific tests: `poetry run pytest -k <suite or test name>` - Fast tests only (<5 min): `poetry run pytest-fast`🧰 Tools
🪛 LanguageTool
[style] ~28-~28: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. -poetry run pytest-fast
to run the subset of PyTest tests which ar...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
59-74
: Unify the list style in the assumptions and steps sectionsThe static analysis tool flagged inconsistent list markers (asterisks vs. dashes). Consider using dashes consistently throughout the document for better maintainability:
**Assumptions:** -* The connector to test with is in the root [Airbyte project](https://github.com/airbytehq/airbyte). -* The [Airbyte project](https://github.com/airbytehq/airbyte) is checked out in `airbyte` directory. +- The connector to test with is in the root [Airbyte project](https://github.com/airbytehq/airbyte). +- The [Airbyte project](https://github.com/airbytehq/airbyte) is checked out in `airbyte` directory.🧰 Tools
🪛 Markdownlint (0.37.0)
59-59: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
60-60: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
61-61: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
65-65: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
66-66: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
72-72: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
73-73: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
74-74: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
174-174
: Consider adding a note about Docker networkingSince the section discusses running tests in Docker, it might be helpful to add a note about potential networking issues and how to resolve them, similar to the MockServer section that mentions using
host.docker.internal
. What do you think?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/CONTRIBUTING.md
(2 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/CONTRIBUTING.md
[style] ~28-~28: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. - poetry run pytest-fast
to run the subset of PyTest tests which ar...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~35-~35: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ting issues. - poetry run ruff format
to format your Python code. - `poetry run ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~36-~36: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...hon code. - poetry run poe format-fix
to auto-fix formatting issues. ### Run Co...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 Markdownlint (0.37.0)
docs/CONTRIBUTING.md
59-59: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
60-60: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
61-61: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
65-65: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
66-66: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
72-72: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
73-73: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
74-74: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
82-82: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
Update CONTRIBUTING.md to state the actual testing steps. The integration tests of the CDK fail due to missing `secrets/config.json` file for the Shopify source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
docs/CONTRIBUTING.md (4)
26-28
: Consider restructuring the test commands for better readability?The test commands are accurate but could be more readable. What do you think about grouping them with bullet points and adding brief descriptions for each command? Something like:
- `poetry run pytest` to run all unit tests. - `poetry run pytest -k <suite or test name>` to run specific unit tests. - `poetry run pytest-fast` to run the subset of PyTest tests which are not flagged as `slow`. (Should take <5 min for fast tests only.) + - Run all tests: `poetry run pytest` + - Run specific tests: `poetry run pytest -k <suite or test name>` + - Run fast tests only (~5 min): `poetry run pytest-fast`wdyt? 🤔
🧰 Tools
🪛 LanguageTool
[style] ~28-~28: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. -poetry run pytest-fast
to run the subset of PyTest tests which ar...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
58-61
: Consider standardizing the bullet points style?The unordered list uses asterisks (*) while other lists in the document use dashes (-). Would you like to standardize this for consistency?
-* The connector to test with is in the root [Airbyte project](https://github.com/airbytehq/airbyte). -* The [Airbyte project](https://github.com/airbytehq/airbyte) is checked out in `airbyte` directory. -* The CDK development project is checked out in the `aibyte/airbyte-python-cdk` subdirectory of the root [Airbyte project](https://github.com/airbytehq/airbyte). +- The connector to test with is in the root [Airbyte project](https://github.com/airbytehq/airbyte). +- The [Airbyte project](https://github.com/airbytehq/airbyte) is checked out in `airbyte` directory. +- The CDK development project is checked out in the `aibyte/airbyte-python-cdk` subdirectory of the root [Airbyte project](https://github.com/airbytehq/airbyte).🧰 Tools
🪛 Markdownlint (0.37.0)
59-59: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
60-60: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
61-61: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
74-79
: Add language specifiers to code blocks?The code blocks would be more readable with syntax highlighting. Should we add language specifiers? For example:
-``` +```bash poetry run <connector name> spec poetry run <connector name> check --config secrets/config.json ...Also applies to: 81-86
🧰 Tools
🪛 Markdownlint (0.37.0)
74-74: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
187-187
: Consider adding a note about Docker prerequisites?The section about running connector acceptance tests assumes Docker knowledge. Would it be helpful to add a note about Docker requirements or link to Docker setup instructions? wdyt? 🤔
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/CONTRIBUTING.md
(2 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/CONTRIBUTING.md
[style] ~28-~28: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. - poetry run pytest-fast
to run the subset of PyTest tests which ar...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~35-~35: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ting issues. - poetry run ruff format
to format your Python code. - `poetry run ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~36-~36: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...hon code. - poetry run poe format-fix
to auto-fix formatting issues. ### Run Co...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 Markdownlint (0.37.0)
docs/CONTRIBUTING.md
59-59: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
60-60: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
61-61: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
65-65: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
66-66: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
72-72: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
73-73: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
87-87: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
95-95: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
102-102: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
74-74: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
81-81: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (2)
docs/CONTRIBUTING.md (2)
17-18
: LGTM! Clear installation instructions.
The setup instructions are clear and concise, properly directing users to install Poetry and its dependencies.
97-100
: Fix the command in the example
The command example shows build
but the context suggests it should be test
:
-airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> build
+airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> test
Trying to collect and organize the knowledge on running the test locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
docs/CONTRIBUTING.md (6)
26-28
: Consider improving command descriptions for clarityThe commands look good, but we could make them more descriptive. What do you think about adding brief descriptions of what each command does? For example:
-`poetry run pytest` to run all unit tests. -`poetry run pytest -k <suite or test name>` to run specific unit tests. -`poetry run pytest-fast` to run the subset of PyTest tests which are not flagged as `slow`. +`poetry run pytest` - Executes the complete test suite +`poetry run pytest -k <suite or test name>` - Runs tests matching the specified pattern +`poetry run pytest-fast` - Executes only fast tests (not marked as `slow`, completes in <5 min)🧰 Tools
🪛 LanguageTool
[style] ~28-
2828: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. -poetry run pytest-fast
to run the subset of PyTest tests which ar...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
58-82
: Consider restructuring the installation instructions for better flowThe installation steps could be more streamlined. What do you think about this structure:
- Move the assumptions section after the preparation steps
- Combine the path example with the directory change instruction
- Add a note about why we're editing
pyproject.toml
Also, the code block at line 68-71 seems out of place - should it be moved to a different section? wdyt?
🧰 Tools
🪛 Markdownlint (0.37.0)
59-5959: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
60-6060: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
61-6161: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
64-6464: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
65-6565: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
66-6666: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
73-7373: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
75-7575: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
76-7676: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
82-8282: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
63-6363: null
Emphasis used instead of a heading(MD036, no-emphasis-as-heading)
68-6868: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
Line range hint
213-224
: Consolidate duplicate test instructionsThis section appears to duplicate the information from lines 115-134. Should we combine these sections to avoid confusion? We could keep one comprehensive section about running tests with the local CDK. wdyt?
59-76
: Fix markdown list formattingThe unordered lists are using mixed styles (asterisks and dashes). For consistency with the rest of the document, should we standardize on dashes? Example:
-* The connector to test with is in the root [Airbyte project](https://github.com/airbytehq/airbyte). -* The [Airbyte project](https://github.com/airbytehq/airbyte) is checked out in `airbyte` directory. +- The connector to test with is in the root [Airbyte project](https://github.com/airbytehq/airbyte). +- The [Airbyte project](https://github.com/airbytehq/airbyte) is checked out in `airbyte` directory.🧰 Tools
🪛 Markdownlint (0.37.0)
59-5959: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
60-6060: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
61-6161: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
64-6464: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
65-6565: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
66-6666: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
73-7373: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
75-7575: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
76-7676: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
63-6363: null
Emphasis used instead of a heading(MD036, no-emphasis-as-heading)
68-6868: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
68-68
: Add language specifiers to code blocksThe fenced code blocks are missing language specifiers. Should we add them for better syntax highlighting? For example:
-``` +```bashAlso applies to: 87-87, 97-97, 107-107
🧰 Tools
🪛 Markdownlint (0.37.0)
68-6868: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
140-142
: Fix typo in Poe tasks descriptionThere's a small typo in the Poe tasks section:
-The Airbyte CDK uses [Poe the Poet](https://poethepoet.natn.io/) to define common development task. +The Airbyte CDK uses [Poe the Poet](https://poethepoet.natn.io/) to define common development tasks.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/CONTRIBUTING.md
(2 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/CONTRIBUTING.md
[style] ~28-2828: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. - poetry run pytest-fast
to run the subset of PyTest tests which ar...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~35-3535: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ting issues. - poetry run ruff format
to format your Python code. - `poetry run ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~36-3636: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...hon code. - poetry run poe format-fix
to auto-fix formatting issues. ### Run Co...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 Markdownlint (0.37.0)
docs/CONTRIBUTING.md
59-5959: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
60-6060: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
61-6161: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
64-6464: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
65-6565: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
66-6666: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
73-7373: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
75-7575: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
76-7676: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
82-8282: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
86-8686: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
106-106106: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
113-113113: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
121-121121: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
128-128128: Expected: dash; Actual: asterisk
Unordered list style
(MD004, ul-style)
63-6363: null
Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
68-6868: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
87-8787: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
97-9797: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
107-107107: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (1)
docs/CONTRIBUTING.md (1)
124-126
: Fix the command in the exampleThere seems to be a copy-paste error in the command example. The text mentions running tests, but the command shows
build
:-airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> build +airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> test
* to organize and present the setup steps explicitly * to avoid misunderstandings and wasted eforts
The reported failed checks are due to #197 |
Fixed typos
@airbytehq/dev-python Can someone take a look at this contribution? |
renamed verify_* function to asser_* what it actually does. Commended
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
unit_tests/sources/declarative/extractors/test_dpath_extractor.py (1)
196-200
: Would you consider adding a quick comment about service keys? 📝The service key validation and exclusion logic looks good! To help future maintainers, maybe we could add a brief comment explaining why we're doing these steps? Something like:
actual_records = list(extractor.extract_records(response)) + # Validate and remove service keys (fields starting with '$') that are only used + # during the transformation phase and shouldn't be part of the final output for record in actual_records: assert_service_keys_exist(record) actual_records = [exclude_service_keys(record) for record in actual_records]unit_tests/sources/declarative/extractors/test_record_selector.py (1)
170-170
: Clean handling of service keys! ✨Good addition of
exclude_service_keys
before assertions. Would it be valuable to add a test case that explicitly verifies that service keys are properly excluded? This could help catch any regressions in the service key handling. wdyt?Here's a potential test case to consider adding:
( "test_service_keys_are_excluded", ["data"], None, { "data": [ {"id": 1, "$service_field": "hidden", "visible_field": "shown"} ] }, [{"id": 1, "visible_field": "shown"}] ),Also applies to: 244-244
🧰 Tools
🪛 GitHub Actions: Linters
[warning] File requires formatting. Run 'ruff format' to fix code formatting issues.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/extractors/record_extractor.py
(1 hunks)unit_tests/sources/declarative/extractors/test_dpath_extractor.py
(5 hunks)unit_tests/sources/declarative/extractors/test_record_extractor.py
(1 hunks)unit_tests/sources/declarative/extractors/test_record_selector.py
(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte_cdk/sources/declarative/extractors/record_extractor.py
- unit_tests/sources/declarative/extractors/test_record_extractor.py
🧰 Additional context used
🪛 GitHub Actions: Linters
unit_tests/sources/declarative/extractors/test_dpath_extractor.py
[error] 4-23: Import block is un-sorted or un-formatted
unit_tests/sources/declarative/extractors/test_record_selector.py
[warning] File requires formatting. Run 'ruff format' to fix code formatting issues.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (5)
unit_tests/sources/declarative/extractors/test_dpath_extractor.py (2)
40-94
: Awesome test coverage! 🎯Really nice job on covering various edge cases and nested structures! The test cases are well-organized and comprehensive.
156-179
: Love the descriptive test IDs! 👌The test identifiers are very clear and help understand the purpose of each test case at a glance.
unit_tests/sources/declarative/extractors/test_record_selector.py (3)
50-79
: Great test coverage for nested structures! 🎯The new test cases thoroughly verify the handling of nested data structures. Have you considered adding a test case where the nested structure is invalid or malformed to verify error handling? wdyt?
🧰 Tools
🪛 GitHub Actions: Linters
[warning] File requires formatting. Run 'ruff format' to fix code formatting issues.
114-126
: Nice test for root response access! 💡Good test case demonstrating access to the original response. Would it be helpful to add a test case where we try to access an invalid path in the root response to verify error handling? wdyt?
🧰 Tools
🪛 GitHub Actions: Linters
[warning] File requires formatting. Run 'ruff format' to fix code formatting issues.
12-19
: Run formatter to fix indentation.The pipeline indicates formatting issues. Would you mind running
ruff format
to fix the indentation? 🛠️🧰 Tools
🪛 GitHub Actions: Linters
[warning] File requires formatting. Run 'ruff format' to fix code formatting issues.
unit_tests/sources/declarative/extractors/test_dpath_extractor.py
Outdated
Show resolved
Hide resolved
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* main: fix(airbyte-cdk): unable to create custom retriever (airbytehq#198) feat(low-code): added keys replace transformation (airbytehq#183) feat: add `min` macros (airbytehq#203) fix(low-code cdk pagination): Fix the offset strategy so that it resets back to 0 when a stream is an incremental data feed (airbytehq#202)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (10)
unit_tests/sources/declarative/extractors/test_dpath_extractor.py (2)
40-94
: Add docstrings to explain test scenarios?The new test cases are comprehensive and well-structured! Would you consider adding docstrings to explain the purpose of each test scenario? This could help other developers understand the edge cases being tested, wdyt?
For example:
@pytest.mark.parametrize( "field_path, decoder, body, expected_records", [ # Test empty input handling pytest.param([], decoder_json, b"", [], id="test_extract_from_empty_string"), # Test nested structure handling pytest.param([], decoder_json, [{"id": 1, "nested": {"id2": 2}}], [{"id": 1, "nested": {"id2": 2}}], id="test_extract_from_nonempty_array_with_nested_array"), # ... more test cases ] )Also applies to: 156-179
196-200
: Add explicit test cases for service key handling?The service key verification and exclusion steps are good additions! Would it be helpful to add test cases that explicitly verify:
- The presence of required service keys after extraction
- The successful removal of service keys after exclusion
Here's a possible test case structure:
def test_service_key_handling(): extractor = DpathExtractor(field_path=[], config=config, decoder=decoder_json, parameters=parameters) response = create_response({"id": 1}) records = list(extractor.extract_records(response)) # Verify service keys exist assert "$root" in records[0] # Verify service keys are removed cleaned_record = exclude_service_keys(records[0]) assert "$root" not in cleaned_recorddocs/CONTRIBUTING.md (5)
71-72
: Consider consolidating duplicate references to Issue #197Hey there! 👋 I noticed we're referencing the same issue twice in close proximity. Maybe we could combine these into a single reference under the "See also" section? wdyt?
See also: - [Dager-Podman Integration](https://blog.playgroundtech.io/introduction-to-dagger-6ab55ee28723) - [CDK Issue 197](https://github.com/airbytehq/airbyte-python-cdk/issues/197) - -7. Make sure Docker is installed locally, instead of Podman - See also: - -- [CDK Issue 197](https://github.com/airbytehq/airbyte-python-cdk/issues/197) +7. Make sure Docker is installed locally, instead of PodmanAlso applies to: 77-77
21-23
: Add language specifiers to code blocksHey! 🚀 Would you mind adding language specifiers to the code blocks? This helps with syntax highlighting and makes the documentation more readable. For example:
-``` +```bash sudo dnf install python3.11Also applies to: 29-31, 37-39, 43-45, 49-51, 55-57, 61-67, 81-83, 140-143, 154-157, 189-197, 213-229 <details> <summary>🧰 Tools</summary> <details> <summary>🪛 Markdownlint (0.37.0)</summary> 21-21: null Fenced code blocks should have a language specified (MD040, fenced-code-language) </details> </details> --- `163-163`: **Consider adding a comma for better readability** What do you think about adding a comma after "page" in this sentence? ```diff -- On the **Settings / Domains** page find the subdomain ++ On the **Settings / Domains** page, find the subdomain
🧰 Tools
🪛 LanguageTool
[uncategorized] ~163-~163: Possible missing comma found.
Context: ...file. - On the Settings / Domains page find the subdomain of myshopify.com a...(AI_HYDRA_LEO_MISSING_COMMA)
283-283
: Consider using American English for consistencyHey! 👋 The phrase "have a look" is British English. Since the rest of the documentation uses American English, would you consider changing it to "take a look"? wdyt?
-Please have a look at the [Release Management] +Please take a look at the [Release Management]🧰 Tools
🪛 LanguageTool
[locale-violation] ~283-~283: In American English, “take a look” is more commonly used.
Context: ...Actions. ## Release Management Please have a look at the [Release Management](./RELEASES....(HAVE_A_LOOK)
136-136
: Consider using proper heading syntaxInstead of using bold text for these section headers, would you consider using proper markdown heading syntax? This would improve the document's structure and make it easier to navigate. For example:
-**Preparation steps** +### Preparation stepsAlso applies to: 235-235
🧰 Tools
🪛 Markdownlint (0.37.0)
136-136: null
Emphasis used instead of a heading(MD036, no-emphasis-as-heading)
unit_tests/sources/declarative/extractors/test_record_selector.py (3)
50-79
: Great test coverage for nested structures! 🎯The new test cases thoroughly cover nested data scenarios. What do you think about making the test names more Pythonic by using underscores instead of camelCase? For example:
test_no_record_filter_returns_all_records_with_nested
(already good!)test_true_record_filter_returns_all_records_with_nested
(already good!)
114-126
: Consider renaming this test case for consistencyWhat do you think about renaming
test_the original response is available in filters and transformations
totest_original_response_available_in_filters_and_transformations
to match the Python naming convention used in other test cases? wdyt?
170-170
: Consider adding explicit service key testsI see you've added service key exclusion in the assertions. What do you think about adding specific test cases to verify the behavior of service keys? This could include:
- Verifying that service keys are properly excluded
- Checking edge cases with missing or malformed service keys
Example test case structure:
@pytest.mark.parametrize( "test_name, input_record, expected_output", [ ( "test_exclude_service_keys", {"id": 1, "$root": {"data": []}, "$parameters": {}}, {"id": 1} ), # Add more test cases ] ) def test_service_key_handling(test_name, input_record, expected_output): assert exclude_service_keys(input_record) == expected_outputAlso applies to: 244-244
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (4)
docs/CONTRIBUTING.md
(5 hunks)unit_tests/sources/declarative/extractors/test_dpath_extractor.py
(5 hunks)unit_tests/sources/declarative/extractors/test_record_extractor.py
(1 hunks)unit_tests/sources/declarative/extractors/test_record_selector.py
(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- unit_tests/sources/declarative/extractors/test_record_extractor.py
🧰 Additional context used
🪛 LanguageTool
docs/CONTRIBUTING.md
[style] ~96-~96: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. - poetry run pytest-fast
to run the subset of PyTest tests, which a...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~103-~103: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ting issues. - poetry run ruff format
to format your Python code. - `poetry run ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~104-~104: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...hon code. - poetry run poe format-fix
to auto-fix formatting issues. ### Run Co...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[uncategorized] ~163-~163: Possible missing comma found.
Context: ...file. - On the Settings / Domains page find the subdomain of myshopify.com a...
(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~265-~265: Consider a more expressive alternative.
Context: ...or, you may need to regenerate them. To do that, you can run: ```bash poetry run ...
(DO_ACHIEVE)
[locale-violation] ~283-~283: In American English, “take a look” is more commonly used.
Context: ...Actions. ## Release Management Please have a look at the [Release Management](./RELEASES....
(HAVE_A_LOOK)
🪛 Markdownlint (0.37.0)
docs/CONTRIBUTING.md
136-136: null
Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
235-235: null
Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
21-21: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
29-29: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
37-37: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
43-43: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
49-49: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
55-55: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
61-61: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
81-81: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
140-140: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
154-154: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
170-170: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
189-189: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
201-201: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
213-213: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
221-221: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (3)
unit_tests/sources/declarative/extractors/test_dpath_extractor.py (1)
18-21
: LGTM! Clean import organization.The new imports for service key management functions are well-organized and properly placed.
docs/CONTRIBUTING.md (1)
245-246
: The command example needs updatingThe text mentions testing, but the command shows
build
. This issue was previously identified in a past review.unit_tests/sources/declarative/extractors/test_record_selector.py (1)
12-19
: LGTM! Clean import organization.The new imports are well-organized and properly scoped for the added functionality.
Trigger another another build. The unit tests in this exactly version succeed under Fedora 41, while one of them fails under Oracle Linux 8.7. The integration and acceptance tests fail under specific OS and setup, not related to the contents of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
docs/CONTRIBUTING.md (3)
19-56
: Would you consider improving the formatting consistency of installation instructions?The Fedora-specific installation steps could be better organized. Here's a suggestion, wdyt?
- Add language specifiers to code blocks (e.g.,
bash
orshell
)- Consider using a consistent pattern for OS-specific instructions, like:
2. Make sure your Python version is 3.10 or 3.11 Installation commands: - Generic: ```bash # Your generic installation commands
- Fedora 41:
sudo dnf install python3.11<details> <summary>🧰 Tools</summary> <details> <summary>🪛 Markdownlint (0.37.0)</summary> 21-21: null Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 29-29: null Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 37-37: null Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 43-43: null Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 49-49: null Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 55-55: null Fenced code blocks should have a language specified (MD040, fenced-code-language) </details> </details> --- `99-102`: **Would you consider reorganizing the test commands for better clarity?** The test commands could be organized by purpose. How about something like this, wdyt? ```markdown ### Run Unit Tests Run all tests: ```bash poetry run pytest
Run specific tests:
poetry run pytest -k <suite or test name>Run fast tests only (<5 min):
poetry run pytest-fast<details> <summary>🧰 Tools</summary> <details> <summary>🪛 LanguageTool</summary> [style] ~101-~101: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym. Context: ... unit tests. - `poetry run pytest-fast` to run the subset of PyTest tests, which a... (ENGLISH_WORD_REPEAT_BEGINNING_RULE) </details> </details> --- `175-184`: **Consider adding a note about sensitive information in the example** The example configuration contains placeholders for sensitive information. Would you consider adding a note to remind users not to commit their actual credentials, wdyt? ```diff { "shop": "nw0ipt-vr", "start_date": "2020-11-01", "credentials": { "auth_method": "api_password", "api_password": "shpat_XXXXXXXXXXX" # Replace with your actual API password } } + +Note: Never commit your actual credentials to version control. Make sure config.json is in .gitignore.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/CONTRIBUTING.md
(5 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/CONTRIBUTING.md
[style] ~101-~101: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. - poetry run pytest-fast
to run the subset of PyTest tests, which a...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~108-~108: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ting issues. - poetry run ruff format
to format your Python code. - `poetry run ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~109-~109: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...hon code. - poetry run poe format-fix
to auto-fix formatting issues. ### Run Co...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~270-~270: Consider a more expressive alternative.
Context: ...or, you may need to regenerate them. To do that, you can run: ```bash poetry run ...
(DO_ACHIEVE)
[locale-violation] ~288-~288: In American English, “take a look” is more commonly used.
Context: ...Actions. ## Release Management Please have a look at the [Release Management](./RELEASES....
(HAVE_A_LOOK)
🪛 Markdownlint (0.37.0)
docs/CONTRIBUTING.md
136-136: null
Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
235-235: null
Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
21-21: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
29-29: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
37-37: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
43-43: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
49-49: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
55-55: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
61-61: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
81-81: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
140-140: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
154-154: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
170-170: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
189-189: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
201-201: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
213-213: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
221-221: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (2)
docs/CONTRIBUTING.md (2)
250-251
: The command example needs to be fixedThe text mentions running tests, but the command shows
build
. This issue was previously identified.-airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> build +airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> test
Line range hint
270-294
: Documentation generation instructions look good!The instructions for generating API documentation and managing releases are clear and well-organized.
🧰 Tools
🪛 LanguageTool
[locale-violation] ~288-~288: In American English, “take a look” is more commonly used.
Context: ...Actions. ## Release Management Please have a look at the [Release Management](./RELEASES....(HAVE_A_LOOK)
Trigger another build, hoping the infrastructure will not fail it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
docs/CONTRIBUTING.md (2)
21-23
: Would you consider adding language specifiers to code blocks? 🤔Adding language specifiers (e.g.,
bash,
json) would enable syntax highlighting and improve readability. For example:-``` +```bash sudo dnf install python3.11Also applies to: 29-31, 37-39, 43-45, 49-51, 54-56, 60-62, 66-72, 86-88, 144-147, 158-161, 174-183, 193-201, 205-213, 217-221, 225-233
🧰 Tools
🪛 Markdownlint (0.37.0)
21-21: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
287-287
: Small style suggestions to enhance readability 📝Would you consider these minor improvements?
- Line 287: In American English, "take a look" is more commonly used than "have a look"
- Lines 101, 108: Consider varying the sentence beginnings to avoid repetition
Just nitpicks to make the docs even better, wdyt? 😊
Also applies to: 101-101, 108-108
🧰 Tools
🪛 LanguageTool
[locale-violation] ~287-~287: In American English, “take a look” is more commonly used.
Context: ...Actions. ## Release Management Please have a look at the [Release Management](./RELEASES....(HAVE_A_LOOK)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/CONTRIBUTING.md
(5 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/CONTRIBUTING.md
[style] ~101-~101: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. - poetry run pytest-fast
to run the subset of PyTest tests, which a...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~108-~108: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ting issues. - poetry run ruff format
to format your Python code. ### Run Code ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~269-~269: Consider a more expressive alternative.
Context: ...or, you may need to regenerate them. To do that, you can run: ```bash poetry run ...
(DO_ACHIEVE)
[locale-violation] ~287-~287: In American English, “take a look” is more commonly used.
Context: ...Actions. ## Release Management Please have a look at the [Release Management](./RELEASES....
(HAVE_A_LOOK)
🪛 Markdownlint (0.37.0)
docs/CONTRIBUTING.md
140-140: null
Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
239-239: null
Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
21-21: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
29-29: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
37-37: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
43-43: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
49-49: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
54-54: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
60-60: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
66-66: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
86-86: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
144-144: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
158-158: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
174-174: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
193-193: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
205-205: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
217-217: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
225-225: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (2)
docs/CONTRIBUTING.md (2)
8-11
: Great job on improving the documentation structure! 🎉The installation instructions are now much clearer with OS-specific commands and detailed steps. This will help contributors get started more easily.
Also applies to: 17-23, 25-56
249-250
: The command example needs to be updated.The text mentions running tests, but the command shows
build
.-airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> build +airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> test
Handled nitpick comments by rabbit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
docs/CONTRIBUTING.md (4)
8-8
: Enhance clarity of contribution typesThe line mentions "bug fixes" but doesn't specify the scope. Would it be helpful to clarify the types of bug fixes (e.g., runtime bugs, documentation fixes, test fixes)? wdyt?
-You can contribute by reporting bugs, posting GitHub discussions, opening issues, improving docs, and submitting pull requests with bug fixes and new features. +You can contribute by reporting bugs, posting GitHub discussions, opening issues, improving docs, and submitting pull requests with bug fixes (runtime, documentation, or test fixes) and new features.
19-23
: Consider adding version verification commandWould it be helpful to add a command to verify the Python version after installation? wdyt?
sudo dnf install python3.11 + +# Verify installation +python3.11 --version
99-102
: Consider restructuring test commands for better readabilityThe test commands could be more readable with additional context. What do you think about adding descriptions and examples? wdyt?
-poetry run pytest` to run all unit tests. -poetry run pytest -k <suite or test name>` to run specific unit tests. -poetry run pytest-fast` to run the subset of PyTest tests, which are not flagged as `slow`. (It should take <5 min for fast tests only.) +# Run all unit tests +poetry run pytest + +# Run specific test suite or test (e.g., 'test_record_extractor') +poetry run pytest -k <suite or test name> + +# Run fast tests only (takes <5 min) +poetry run pytest-fast🧰 Tools
🪛 LanguageTool
[style] ~101-~101: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. -poetry run pytest-fast
to run the subset of PyTest tests, which a...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
167-171
: Fix formatting in the configuration exampleThe configuration example has inconsistent formatting. Consider using a code block for better readability:
- domain: nw0ipt-vr.myshopify.com - shop: nw0ipt-vr +```yaml +domain: nw0ipt-vr.myshopify.com +shop: nw0ipt-vr +```🧰 Tools
🪛 LanguageTool
[uncategorized] ~167-~167: Possible missing comma found.
Context: ...file. - On the Settings / Domains page find the subdomain of myshopify.com a...(AI_HYDRA_LEO_MISSING_COMMA)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/CONTRIBUTING.md
(5 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/CONTRIBUTING.md
[style] ~101-~101: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. - poetry run pytest-fast
to run the subset of PyTest tests, which a...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~108-~108: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ting issues. - poetry run ruff format
to format your Python code. ### Run Code ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[uncategorized] ~137-~137: You might be missing the article “the” here.
Context: ...om/airbytehq/airbyte) is checked out in airbyte
directory. - The [CDK developme...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[uncategorized] ~167-~167: Possible missing comma found.
Context: ...file. - On the Settings / Domains page find the subdomain of myshopify.com a...
(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~269-~269: Consider a more expressive alternative.
Context: ...or, you may need to regenerate them. To do that, you can run: ```bash poetry run ...
(DO_ACHIEVE)
[locale-violation] ~287-~287: In American English, “take a look” is more commonly used.
Context: ...Actions. ## Release Management Please have a look at the [Release Management](./RELEASES....
(HAVE_A_LOOK)
🪛 Markdownlint (0.37.0)
docs/CONTRIBUTING.md
140-140: null
Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
239-239: null
Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
86-86: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (1)
docs/CONTRIBUTING.md (1)
249-250
: Fix the command in the exampleThere seems to be a copy-paste error in the command example. The text mentions running tests, but the command shows
build
:-airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> build +airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
unit_tests/sources/declarative/test_manifest_declarative_source.py (2)
1771-1771
: Consider adding a descriptive assertion message?The assertion could be more helpful during test failures if we add a descriptive message. What do you think about:
- assert output_data == expected_records + assert output_data == expected_records, f"Expected {expected_records} but got {output_data}"
Line range hint
1772-2051
: Great test coverage for caching behavior! 🎉The test thoroughly verifies the caching behavior between parent and child streams. The assertions effectively check both the stream creation and caching configuration.
Consider extracting common configurations to fixtures?
The test setup contains repeated stream configurations. What do you think about extracting common parts into fixtures to improve maintainability? For example:
@pytest.fixture def base_stream_config(): return { "paginator": { "type": "DefaultPaginator", "page_size": 10, "page_size_option": { "type": "RequestOption", "inject_into": "request_parameter", "field_name": "per_page", }, # ... other common config } }airbyte_cdk/sources/declarative/extractors/record_selector.py (1)
162-167
: Consider adding a docstring for clarity.The method looks good, but would you consider adding a docstring to explain what service keys are and why we're removing them? Something like:
def _remove_service_keys( self, records: Iterable[Mapping[str, Any]] ) -> Iterable[Mapping[str, Any]]: """ Remove service keys (fields starting with '$') from each record. These keys are used during transformation but should not be part of the final output. Args: records: Iterator of records potentially containing service keys Returns: Iterator of records with service keys removed """wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/extractors/record_selector.py
(3 hunks)airbyte_cdk/sources/declarative/requesters/paginators/strategies/cursor_pagination_strategy.py
(1 hunks)unit_tests/sources/declarative/retrievers/test_simple_retriever.py
(2 hunks)unit_tests/sources/declarative/test_manifest_declarative_source.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte_cdk/sources/declarative/requesters/paginators/strategies/cursor_pagination_strategy.py
- unit_tests/sources/declarative/retrievers/test_simple_retriever.py
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/extractors/record_selector.py (2)
11-14
: LGTM! Clean import addition.The import of
exclude_service_keys
is well-placed alongside related imports from the same module.
112-113
: Verify the sequence of operations.The sequence looks good: filter -> transform -> remove service keys -> normalize. However, would it be more efficient to remove service keys during the transformation phase instead of having a separate pass through the data? wdyt?
Let's verify if any transformations depend on service keys:
✅ Verification successful
The separate service key removal step is necessary
The current sequence is correct. Service keys are actively used during transformations (e.g., in FlattenFields), so removing them earlier would break existing functionality. The separate pass keeps the code clean and maintainable by following single responsibility principle.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for any transformations that might use service keys (fields starting with $) rg -A 5 'transform.*\$' ./airbyte_cdk/sources/declarative/transformations/Length of output: 76
Script:
#!/bin/bash # Find all transformation implementations echo "=== Transformation Implementations ===" ast-grep --pattern 'class $_ implements TransformationInterface' echo -e "\n=== Service Key Patterns ===" # Look for service key patterns more broadly rg -g '*.py' -A 5 'service.*key|SERVICE.*KEY' ./airbyte_cdk/sources/declarative/Length of output: 7629
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
airbyte_cdk/sources/declarative/decoders/decoder.py (1)
29-30
: Great docstring improvement! Would you consider adding more details? 🤔Thanks for clarifying the empty response behavior! This is super helpful. Since we're improving the docs, would you consider also adding examples of what constitutes an "empty response" (e.g., null, empty string, etc.) and maybe mention any special handling for malformed responses? wdyt?
Decodes a requests.Response into a Mapping[str, Any] or an array. -The empty response produces an empty list. +The empty response (e.g., null, empty string, no content) produces an empty list. +Malformed responses will raise a DecodeError with details about the failure. :param response: the response to decode :return: Generator of Mapping describing the responseunit_tests/sources/declarative/decoders/test_json_decoder.py (2)
11-18
: Hey there! 👋 I noticed some unused imports.The following imports aren't being used in this file:
YamlDeclarativeSource
DeclarativeStreamModel
ModelToComponentFactory
Would you like to remove them to keep the imports clean? wdyt? 🤔
-from airbyte_cdk import YamlDeclarativeSource from airbyte_cdk.models import SyncMode from airbyte_cdk.sources.declarative.decoders import GzipJsonDecoder from airbyte_cdk.sources.declarative.decoders.json_decoder import JsonDecoder, JsonlDecoder -from airbyte_cdk.sources.declarative.models import DeclarativeStream as DeclarativeStreamModel -from airbyte_cdk.sources.declarative.parsers.model_to_component_factory import ( - ModelToComponentFactory, -)🧰 Tools
🪛 GitHub Actions: Linters
[warning] File needs formatting. Run 'ruff format' to fix formatting issues.
Line range hint
1-150
: Quick formatting note 🎨The linter reports some formatting issues. Could you run
ruff format
to fix them? This will ensure consistent formatting across the codebase.🧰 Tools
🪛 GitHub Actions: Linters
[warning] File needs formatting. Run 'ruff format' to fix formatting issues.
unit_tests/sources/declarative/extractors/test_dpath_extractor.py (3)
40-42
: Consider adding error scenario tests?The empty input test cases look good! Would you consider adding some error scenarios too? For example:
- Malformed JSON input
- Invalid field paths
- Unexpected data types
What do you think? 🤔
Also applies to: 126-129
40-40
: Could we clarify the JSONL contract comment?The comment
# The JSON contract is irregular, compare with JSONL
might be a bit cryptic for future readers. Would you mind expanding on what makes it irregular and why it's important to compare with JSONL? Maybe something like:# JSON decoder returns [{}] for empty input while JSONL returns [], # this difference is by design due to their distinct parsing contractswdyt? 🤓
196-201
: Enhance empty record handling?The service key handling looks solid! A couple of small suggestions to make it even better:
- The comment
# A valid JSON parsed, see the contract
could be more descriptive. Maybe:# Only validate service keys for non-empty records as per the extractor contract
- Would it make sense to define an
EMPTY_RECORD = {}
constant at the module level? This would make the empty record check more maintainable:if record != EMPTY_RECORD:What are your thoughts on these suggestions? 🎯
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/decoders/decoder.py
(1 hunks)airbyte_cdk/sources/declarative/decoders/json_decoder.py
(1 hunks)airbyte_cdk/sources/declarative/extractors/dpath_extractor.py
(2 hunks)unit_tests/sources/declarative/decoders/test_json_decoder.py
(1 hunks)unit_tests/sources/declarative/extractors/test_dpath_extractor.py
(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte_cdk/sources/declarative/extractors/dpath_extractor.py
- airbyte_cdk/sources/declarative/decoders/json_decoder.py
🧰 Additional context used
🪛 GitHub Actions: Linters
unit_tests/sources/declarative/decoders/test_json_decoder.py
[warning] File needs formatting. Run 'ruff format' to fix formatting issues.
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (3)
unit_tests/sources/declarative/decoders/test_json_decoder.py (2)
22-27
: Could we add more context about the JSON contract? 🤔I see you've added a comment about the JSON contract being irregular, but it might help future maintainers to understand:
- Why empty string returns
[{}]
instead of an empty list- What makes the contract irregular
- Where this behavior is documented
Would you mind adding a more detailed docstring or comment explaining these points? wdyt?
🧰 Tools
🪛 GitHub Actions: Linters
[warning] File needs formatting. Run 'ruff format' to fix formatting issues.
48-54
: Love the improved test case IDs! 🎉The descriptive IDs make the test cases much more readable and maintainable. Nice job on adding edge cases for empty objects and lists too!
🧰 Tools
🪛 GitHub Actions: Linters
[warning] File needs formatting. Run 'ruff format' to fix formatting issues.
unit_tests/sources/declarative/extractors/test_dpath_extractor.py (1)
18-21
: LGTM! Clean import organization.The new imports for service key management functions are well-organized and properly placed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
unit_tests/sources/declarative/test_manifest_declarative_source.py (3)
137-139
: Would you like to enhance the docstring with more details?The current docstring could be more helpful if it included:
- Purpose of the dynamic declarative stream configuration
- Parameters it accepts from
_declarative_stream
- Example usage in tests
wdyt?- """ - Generates a DynamicDeclarativeStream configuration. - """ + """ + Generates a DynamicDeclarativeStream configuration for testing. + + This fixture builds upon the base declarative stream configuration from _declarative_stream + and adds dynamic resolution capabilities through a components resolver. + + Returns: + dict: A configuration dictionary for a DynamicDeclarativeStream with template-based + field resolution. + + Example: + def test_dynamic_stream(self, _dynamic_declarative_stream): + manifest = {"streams": [_dynamic_declarative_stream]} + # Test implementation + """
1773-1773
: Fix indentation for better readability?The line appears to be incorrectly indented. Should we align it with the rest of the function body?
+ # Assert the output matches expected records + assert output_data == expected_records
Line range hint
402-425
: Consider adding type hints to helper functions?Adding type hints to
_create_request
,_create_response
, and_create_page
would improve code maintainability and IDE support. wdyt?-def _create_request(): +def _create_request() -> requests.PreparedRequest: -def _create_response(body): +def _create_response(body: dict) -> requests.Response: -def _create_page(response_body): +def _create_page(response_body: dict) -> requests.Response:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
unit_tests/sources/declarative/test_manifest_declarative_source.py
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (1)
unit_tests/sources/declarative/test_manifest_declarative_source.py (1)
Line range hint
1-2000
: Great test coverage! 👍The test suite is well-structured with comprehensive coverage of:
- Manifest validation
- Pagination scenarios
- Partition routing
- Data transformations
- Error cases
@airbytehq/dev-python can someone take a look at this contribution? |
The squashed commits are in a new PR: #214 |
Feat: Publish the response to process in the context of the record's transformation and filtering steps
See:
In the declarative sources, after the record extraction, in the phase of record transformation and record filtering:
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores