feat(rest_api): custom client for specific resources #2082
dlt/sources/rest_api/__init__.py
Outdated
@@ -263,12 +263,17 @@ def create_resources(
    incremental_cursor_transform,
) = setup_incremental_object(request_params, endpoint_config.get("incremental"))

merged_client_config: ClientConfig = {
Configs get merged here, with the keys defined in the endpoint taking precedence over the global ones.
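The merge described in this comment can be sketched as a shallow dictionary merge. This is a minimal illustration, not the PR's actual code: the helper name `merge_client_config` and the sample values are hypothetical, and it assumes a shallow merge in which a nested value such as `auth` is replaced wholesale rather than merged key by key.

```python
from typing import Any, Dict, Optional


def merge_client_config(
    global_config: Dict[str, Any],
    resource_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Shallow-merge two client configs; resource-level keys win (assumed behavior)."""
    # Later entries in a dict display override earlier ones with the same key.
    return {**global_config, **(resource_config or {})}


# Hypothetical sample values, loosely based on the configs discussed in this PR.
global_client = {
    "base_url": "https://api.mything.com",
    "auth": {"type": "bearer", "token": "hypothetical-token"},
    "paginator": "json_link",
}
resource_client = {
    "auth": {"type": "http_basic", "username": "user", "password": "hypothetical-password"},
}

merged = merge_client_config(global_client, resource_client)
```

Because the merge builds a new dictionary, the global configuration is left untouched and can be reused for resources that do not override anything.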
✅ Deploy Preview for dlt-hub-docs canceled.
|
dlt/sources/rest_api/typing.py
Outdated
    headers: Optional[Dict[str, str]]
    auth: Optional[AuthConfig]
    paginator: Optional[PaginatorConfig]
    session: Optional[Session]


class ClientConfig(BaseClientConfig, total=False):
    base_url: str  # type: ignore[misc]
Typing overrides are not yet supported (see python/mypy#7435), hence the ignore here.
For the optional dict in each endpoint we need to ensure that all keys are optional.
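The typing constraint mentioned here can be illustrated with a small standalone sketch. The field set below is trimmed and partly assumed (in particular, whether `base_url` is declared on the base class is a guess made for illustration); only the class names and the `# type: ignore[misc]` line come from the diff above.

```python
from typing import Dict, Optional, TypedDict


class BaseClientConfig(TypedDict, total=False):
    # total=False makes every key declared in this class optional.
    base_url: Optional[str]  # assumption: declared on the base for this sketch
    headers: Optional[Dict[str, str]]


class ClientConfig(BaseClientConfig, total=False):
    # Redeclaring an inherited key with a different type is flagged by mypy
    # (see python/mypy#7435), which is why the ignore comment is needed.
    base_url: str  # type: ignore[misc]


# A per-endpoint override can then carry only the keys it wants to change.
endpoint_override: BaseClientConfig = {"headers": {"Custom-Header": "value"}}
```

At runtime the redeclaration is harmless; the ignore only silences the static type checker.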
dlt/sources/rest_api/__init__.py
Outdated
resources = c.get("resources")
if resources:
    for resource in resources:
        if isinstance(resource, str) or isinstance(resource, DltResource):
I would go for the opposite check (is it an instance of EndpointResource?). Is there a reason for testing like this?
Yes, it's a TypedDict and the check is not (yet) supported on it :-/
Sure, you could test only for dict, but that should be enough. If it's a dict (even an incorrect one), the code will try to mask the client field. Then the real validation will happen.
It is not super precise (a comment could mention that the actual test should be for the proper type, but python...), but it is a bit simpler to read.
Happy to hear other opinions @burnash
I am happy to swap it for a check for dict if that's preferred. Whilst not amazing to read, the current type is a Union out of three possible types, so it currently is precise, but not super future proof.
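The two checks discussed in this thread can be compared in a short sketch. The names `FakeDltResource`, `is_endpoint_resource`, and `typed_dict_check_fails` are hypothetical stand-ins, and the `EndpointResource` definition is trimmed for illustration; the only library behavior relied on is that `isinstance` against a `TypedDict` class raises `TypeError`.

```python
from typing import Any, Dict, TypedDict, Union


class EndpointResource(TypedDict, total=False):
    # Trimmed, illustrative version of the real type.
    name: str
    client: Dict[str, Any]


class FakeDltResource:
    """Hypothetical stand-in for dlt's DltResource, to keep the sketch self-contained."""


ResourceSpec = Union[str, FakeDltResource, EndpointResource]


def is_endpoint_resource(resource: ResourceSpec) -> bool:
    # The simpler alternative from the review: a plain dict check.
    # Full validation of the dict's shape is assumed to happen later.
    return isinstance(resource, dict)


def typed_dict_check_fails() -> bool:
    # TypedDict classes do not support isinstance(); the call raises TypeError.
    try:
        isinstance({}, EndpointResource)
    except TypeError:
        return True
    return False
```

The dict check accepts any dictionary, including a malformed one, which matches the trade-off described above: less precise, but easier to read and not tied to the current members of the Union.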
4cbf8de to 1bf1fe7
@willi-mueller @burnash would you mind taking a look and giving me feedback on this, please? |
Hi @joscha, thank you for the contribution. I think this is a great idea. As you may have noticed, One thing that I would like to discuss is that the PR extends to the resource configuration with the {
"client": {
"base_url": "https://api.mything.com",
"auth": {
# ...
},
"paginator": "json_link"
},
"resources": [
"resource-using-bearer-auth",
{
"name": "my-resource-with-special-auth",
"client": {
"auth": HttpBasicAuth("user", dlt.secrets["your_basic_auth_password"]),
"paginator": "json_link" # <-- now it could be configured here as well
},
"endpoint": {
"paginator": "json_link"
}
}
]
} I think it would be better to keep the configuration in one place. To achieve this, I see two options:
Although I think the first option is more consistent with the global
Your example would then look like this: {
"client": {
"base_url": "https://api.mything.com",
"auth": {
"type": "bearer",
"token": dlt.secrets["your_api_token"],
}
},
"resources": [
"resource-using-bearer-auth",
{
"name": "my-resource-with-special-auth",
"endpoint": {
"path": "my-resource-with-special-auth",
"auth": HttpBasicAuth("user", dlt.secrets["your_basic_auth_password"]),
"paginator": "json_link",
}
}
]
} What do you think? |
Hi @burnash, I initially had only |
In my experience, changing the paginator is more common than changing the authentication method for different endpoints of the same API. I am in favour of the first option, but I can see how it could cause some confusion. In my opinion we should make clear how things are handled; possible options are:
So it will be like
What do you think? |
I think either of these is okay. Personally I'd most likely change the type of the accepted client config in the resource to NOT accept a paginator (option 1). |
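Thank you @joscha and @francescomucio for your thoughtful input and discussion on this PR. After considering all the points raised, I wanted to share my perspective.
I'm not fully convinced that adding a client configuration directly to the resource is the optimal approach. In practice, setting the paginator explicitly per endpoint is much more common than changing the authentication method within the same API. Introducing a client config at the resource level may lead to unnecessary ambiguity.
Regarding the headers, I believe these should be specified within the endpoint configuration. In fact, there's already a PR adding headers at the endpoint level: #2084.
As for the base_url, I'm not sure it's necessary within the resource configuration since endpoint.path can accept an absolute URL (I'd need to update the documentation to reflect this). Putting headers and auth into the endpoint configuration would make it consistent with how the underlying RESTClient and RESTClient.paginate operate.
Here's my proposal for moving forward:
1. Extend the endpoint configuration to accept auth, which overrides the client's auth configuration for the endpoint.
2. Add headers configuration to the endpoint via #2084.
Here's how the configuration would look: {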
"client": {
"base_url": "https://api.mything.com",
"auth": {
"type": "bearer",
"token": dlt.secrets["your_api_token"],
}
},
"resources": [
"resource-using-bearer-auth",
{
"name": "my-resource-with-special-auth",
"endpoint": {
"path": "my-resource-with-special-auth",
"auth": HttpBasicAuth("user", dlt.secrets["your_basic_auth_password"]),
"paginator": "json_link",
"headers": {"Custom-Header": "value"}
}
}
]
} @joscha, would you be open to updating the PR to reflect this approach? |
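How endpoint-level settings might take effect under this proposal can be sketched as follows. This is an assumption-laden illustration, not dlt's implementation: the function `resolve_request_settings` is hypothetical, plain dictionaries stand in for the real auth objects, and merging headers key by key (rather than replacing them) is a guess.

```python
from typing import Any, Dict


def resolve_request_settings(
    client: Dict[str, Any], endpoint: Dict[str, Any]
) -> Dict[str, Any]:
    """Pick the effective auth and headers for one endpoint (hypothetical helper)."""
    return {
        # Endpoint-level auth, when present, overrides the client's auth.
        "auth": endpoint.get("auth", client.get("auth")),
        # Assumption: headers are merged, with endpoint values winning.
        "headers": {
            **(client.get("headers") or {}),
            **(endpoint.get("headers") or {}),
        },
    }


client_config = {
    "base_url": "https://api.mything.com",
    "auth": {"type": "bearer", "token": "hypothetical-token"},
}
special_endpoint = {
    "path": "my-resource-with-special-auth",
    "auth": {"type": "http_basic", "username": "user", "password": "hypothetical-password"},
    "paginator": "json_link",
    "headers": {"Custom-Header": "value"},
}
plain_endpoint = {"path": "resource-using-bearer-auth"}

special = resolve_request_settings(client_config, special_endpoint)
plain = resolve_request_settings(client_config, plain_endpoint)
```

An endpoint that sets nothing falls back to the client's bearer auth, while the special endpoint gets its own auth and the extra header.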
I can most definitely update the PR that way if that's what you wish. I think possibly spreading the individual properties of |
Purely in regards to mechanics, do you:
* Want me to wait until the open headers PR is decided and merged
* Open the PR anyway and we deal with conflicts
* Want me to base my PR off the headers PR
?
1bf1fe7 to dff62c9
I just saw that
|
Thank you @joscha
Looks good. There are a couple of small improvements needed; please see the comments.
docs/website/docs/dlt-ecosystem/verified-sources/rest_api/basic.md
Outdated
Very good question.
Yes, it's intentional, we moved rest_api, sql_database and filesystem sources to dlt core in 1.0.0. Versions prior to 1.0.0 still reference these sources as being in the verified-sources repository when you execute
There isn’t a formal sync mechanism in place. If someone is working with projects that require or benefit from the latest updates or additional features available in the newer versions of these resources, I recommend upgrading to dlt version 1.0.0 or later.
As this is a feature PR, for now, there's no need to port these changes to verified-sources repo. For bugfixes it's different. |
🎉 all good to go now |
🎉 all good to go now
Yes, looks great, thank you!
Awesome, thank you. Are there instructions on how to use the current devel branch at a given sha? The only mention of nightly builds I could find is here: #997 |
You can use git+ prefix in pip, for example: Install a devel branch:
Install a devel branch with duckdb extra
Replace |
Description
This adds the ability to pass a custom ClientConfig to an EndpointResource. The passed ClientConfig object will be merged with the global one from the containing @dlt.resource.
Example
In the corresponding Slack thread a question came up whether a custom authenticator could be used. The answer is yes: it would be possible to write a custom authenticator which introspects the requested resource (based on the URL, for example). However, the implementation of such an authenticator would be a lot more involved and less obvious than just merging a configuration object, and it would not have an answer for differing base_urls, headers or other client-specific parameters.
Related Issues
Additional Context
See originating Slack thread.
The reason for this feature is that some API endpoints, even though available under the same base URL and returning the same entities, may have different ways of accessing them. For example the Affinity V2 API uses an OpenAPI-based access model with Bearer authentication. This part of the API is available under https://api.affinity.co/v2/. The first version of the API is available under https://api.affinity.co/ but uses HTTP Basic Auth. Entities from both APIs are using the same entity IDs and are related, thus must be specified in the same @dlt.resource in order to define those relations. This is currently (1.4.0) not possible, due to the differing (incompatible) authentication methods.
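For comparison, the custom-authenticator alternative mentioned in the description could look roughly like the sketch below, applied to the two Affinity URL prefixes. Everything here is illustrative: `UrlSwitchingAuth` and `FakeRequest` are hypothetical names, the credentials are placeholders, and the request object is a simple stand-in with `url` and `headers` attributes rather than a real HTTP library type.

```python
import base64
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a prepared HTTP request."""
    url: str
    headers: Dict[str, str] = field(default_factory=dict)


class UrlSwitchingAuth:
    """Chooses Bearer or Basic auth by inspecting the request URL (sketch only)."""

    def __init__(self, bearer_prefix: str, bearer_token: str, user: str, password: str) -> None:
        self.bearer_prefix = bearer_prefix
        self.bearer_token = bearer_token
        self.user = user
        self.password = password

    def __call__(self, request: FakeRequest) -> FakeRequest:
        if request.url.startswith(self.bearer_prefix):
            # URLs under the configured prefix get Bearer authentication.
            request.headers["Authorization"] = f"Bearer {self.bearer_token}"
        else:
            # Everything else falls back to HTTP Basic authentication.
            raw = f"{self.user}:{self.password}".encode("utf-8")
            encoded = base64.b64encode(raw).decode("ascii")
            request.headers["Authorization"] = f"Basic {encoded}"
        return request


auth = UrlSwitchingAuth(
    bearer_prefix="https://api.affinity.co/v2/",
    bearer_token="hypothetical-token",
    user="user",
    password="secret",
)
v2_request = auth(FakeRequest("https://api.affinity.co/v2/companies"))
v1_request = auth(FakeRequest("https://api.affinity.co/lists"))
```

The sketch shows why this route is less obvious than a per-resource configuration: the routing rule lives inside the authenticator, and it does nothing for differing base URLs or headers.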