-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingest): Couchbase ingestion source #12345
base: master
Are you sure you want to change the base?
feat(ingest): Couchbase ingestion source #12345
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some initial comments on the code
It feels like some features here were added just because they exist for other sources e.g. profiling/classification/domain mapping. Are those strictly necessary for what we're trying to accomplish with this source? Are the implementations sufficiently performant to work at scale?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just use tenacity
or another library for this?
samples: List[Any] = [] | ||
|
||
|
||
def json_schema( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already have a number of utilities for doing "schema inference" from objects - e.g.
datahub/metadata-ingestion/src/datahub/ingestion/source/schema_inference/object.py
Line 86 in 0361f24
def construct_schema( |
Does this stuff still make sense?
@@ -147,6 +150,7 @@ export const PLATFORM_URN_TO_LOGO = { | |||
[BIGQUERY_URN]: bigqueryLogo, | |||
[CLICKHOUSE_URN]: clickhouseLogo, | |||
[COCKROACHDB_URN]: cockroachdbLogo, | |||
[COUCHBASE_URN]: couchbaseLogo, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general, I prefer to have the UI forms get added in a separate PR.
Small PRs are easier to review - and in this case, I like to have the UI changes reviewed by someone who does more frontend dev than I do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this docker compose and entrypoint file seem more complex than I would have expected
Usually docker containers come with good defaults, which means our docker setups are very simple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like duplicate files here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first source that really uses async
In general, we should almost never be using "bare" asyncio. instead, the anyio
library is preferred. We also should not really have references to the event loop all over the place - that tends to be an anti-pattern. Methods that need an event loop should be async. For cpu-bound operations, we can use the asyncer
library to bridge between asyncio tasks and worker threads.
Checklist