Configure HttpHook's auth_type from Connection #35591
This looks good in principle (small and simple, yet powerful), but it would require more complete documentation and examples in order to be mergeable.
@potiuk What about a more aggressive PR, most likely with a breaking change? (Sorry for the big chunk of text; here is the important part:)
Would that be okay?
I'd say I am not so thrilled by the other option, and I certainly would not comment on it unless you show the code rather than explain in words what you really mean by changing it. I am not sure you can convey in words what you want to do without actually implementing a POC, where you would show the code and we could assess how "breaking" it is. Airflow is used by tens of thousands of enterprises, and we cannot afford breaking changes that will break everyone's workflows when migrating. We can break a few people's workflows (this is inevitable), but not everyone's. So before even attempting that, you should answer a few questions for yourself, and decide whether you want to go there at all.
.... I think most of those questions are only worth looking at in detail when there is at least a Proof-Of-Concept, so that the discussion can happen over the code rather than over an abstract concept of the change :) . Otherwise it will take too much of the reviewers' time to understand what you really want to do. Having code to look at is pretty much the starting point for anyone proposing a change to this part, which is pretty much the "core" of Airflow and part of the Public Interface of Airflow: https://airflow.apache.org/docs/apache-airflow/stable/public-airflow-interface.html
Thanks for the detailed answer! I won't go for proxy settings and extra parameters in this PR. Just for info, adding a tool like forwarder solves all proxy configuration issues globally. And disabling SSL "verify" can be done via the
For the rest, currently this PR:
There is no breaking change in Airflow. It may be a breaking change for users relying on the previous logic of the property (see the code review below). But as the property was introduced recently, that won't break many users' workflows.
airflow/providers/http/hooks/http.py
```python
@auth_type.setter
def auth_type(self, v):
    self._auth_type = v
```
I replaced the `auth_type` property with a simple class attribute. In a previous PR (#29206 (review), 8 months ago), it was introduced as a mechanism to detect whether the user passed a custom auth class. I replaced that with the `self._is_auth_type_setup` attribute.
Why? Maintaining this property means adding to it all the logic to load and return an `auth_type` potentially defined in the Connection, which is useless given how this property is used in the rest of the codebase (here in Livy, and in the dbt hook).
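A minimal sketch of the idea described above, assuming a hypothetical, stripped-down `SketchHttpHook` (not the real `HttpHook`) and a `DefaultAuth` class standing in for the hook's default basic-auth class: `auth_type` becomes a plain attribute, and `_is_auth_type_setup` only records whether the caller passed a class explicitly, so a Connection-defined auth type is used only as a fallback.

```python
from typing import Any, Callable, Optional


class DefaultAuth:
    """Hypothetical stand-in for the hook's default basic-auth class."""

    def __init__(self, username: str, password: str) -> None:
        self.username = username
        self.password = password


class SketchHttpHook:
    """Hypothetical, stripped-down hook showing auth_type as a plain attribute."""

    def __init__(self, auth_type: Optional[Callable[..., Any]] = None) -> None:
        # Record whether the caller explicitly chose an auth class.
        self._is_auth_type_setup = auth_type is not None
        # Plain attribute: no property getter/setter logic needed.
        self.auth_type = auth_type or DefaultAuth

    def pick_auth_type(self, conn_auth_type: Optional[Callable[..., Any]]) -> Callable[..., Any]:
        # An explicitly passed class wins; otherwise use the Connection's, if any.
        if self._is_auth_type_setup or conn_auth_type is None:
            return self.auth_type
        return conn_auth_type
```

With this split, callers such as the Livy or dbt hooks only need to read the flag, and the loading logic for Connection-defined auth types stays out of the attribute itself.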
I just realised there is likely one big problem here: security. While we cannot prevent it completely for some kinds of connections (this is why a user who can edit Connections should be highly privileged), deliberately introducing RCE is another thing. If I understand correctly, someone who edits a connection can decide which arbitrary class will be instantiated and executed when an HTTP connection is established via the HTTP Hook? If so, that is basically a "no-go": in the past we removed a number of cases like that from a number of providers precisely for that reason. Is there any way we can make the UI connection "declarative" for that? For example, we could limit the choice to a list of predefined auth types. Does it make sense at all?
@potiuk The number of possible auth types is already limited and is exposed as a frozenset in the HttpHook, so there is no way to fiddle with it. You can also configure the allowed auth types through airflow.cfg, which means you would actually need access to the Airflow installation to modify it. So maybe I'm naïve, but I think we're quite safe here, unless I'm missing something important, which could be the case. Even if you fiddled with the HTTP form on the client side, it wouldn't be accepted, as the changed auth type wouldn't be among the allowed auth types, which is checked in this part:
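To make the check described above concrete, here is a minimal sketch of an allowlist validation. The function names, the contents of `DEFAULT_AUTH_TYPES`, and the assumption that extra types arrive as a comma-separated string (for example from the `AIRFLOW__HTTP__EXTRA_AUTH_TYPES` option) are illustrative, not the PR's actual code. The key property is that a dotted path typed into the Connection form is compared against a fixed set before anything is imported or instantiated.

```python
from typing import FrozenSet

# Hypothetical built-in allowlist; the real set lives in the HttpHook.
DEFAULT_AUTH_TYPES: FrozenSet[str] = frozenset(
    {
        "requests.auth.HTTPBasicAuth",
        "requests.auth.HTTPProxyAuth",
        "requests.auth.HTTPDigestAuth",
    }
)


def allowed_auth_types(extra_config: str = "") -> FrozenSet[str]:
    """Merge the built-in allowlist with a comma-separated config value (assumed format)."""
    extra = {item.strip() for item in extra_config.split(",") if item.strip()}
    return DEFAULT_AUTH_TYPES | extra


def validate_auth_type(auth_type: str, extra_config: str = "") -> str:
    """Reject any dotted path that is not explicitly allowed; nothing is imported here."""
    if auth_type not in allowed_auth_types(extra_config):
        raise ValueError(f"auth_type {auth_type!r} is not in the allowed auth types")
    return auth_type
```

Because the extra entries come from deployment configuration rather than from the Connection form, only someone with access to the Airflow installation can widen the set.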
@potiuk @jscheffl I think it would be nice, once we know how the connection forms work in Airflow 3.0, to finish this PR, as it would be a nice feature. For example, when using the LivyHook/Operator, it would then be easy to change the auth type to Kerberos, which is how we use it; at the moment we have to patch the LivyHook to be able to use it that way. Of course, the feature should degrade gracefully if it detects that the provider is used on Airflow 2.x, which I think is feasible until the provider is Airflow 3.x only.
Still not further implemented, but if you want to contribute, we can also define it together. The starting point at the moment is the params UI parts, which according to the spec will then be usable for connection forms... at least that is the plan: #45270
Some static checks and others are failing.
Hello,
This PR makes it possible to set up and parameterize the `auth_type` of HttpHook and HttpAsyncHook from the Connection UI. Concretely, this PR:
- adds an `auth_type` field to define an auth class;
- adds an `auth_kwargs` field to provide a dict of extra parameters to the `auth_type` class;
- validates `auth_type` against a list of Auth classes, to protect against code injection;
- `AIRFLOW__HTTP__EXTRA_AUTH_TYPES` config (to extend the list of allowed auth types);
- `auth_type`, `auth_kwargs`
Side effect of the UI changes: until now, the Extra field was used to pass params to the headers (anything in Extra was passed as a header). But now `auth_kwargs` and `auth_type` are also written there, which I don't find very convenient. Furthermore, this PR adds logic to exclude those keys from the headers (IMO this starts to be a bit of tech debt). Finally, users cannot pass a header named like those keys (it is unlikely, but it could happen).
Thus, I propose to deprecate header parameters passed directly in the 'Extra' field, and to pass them via a dedicated "Headers" field instead.
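The side effect can be illustrated with a small helper. `split_extra` is hypothetical and not part of the provider; it only assumes that Extra is a JSON object in which everything except the reserved keys is treated as a header. Once `auth_type` and `auth_kwargs` share that JSON with headers, the reserved keys have to be popped first, which is exactly why a header literally named `auth_type` could no longer be sent.

```python
import json
from typing import Any, Dict, Tuple

# Reserved Extra keys that must not leak into the request headers.
RESERVED_KEYS = ("auth_type", "auth_kwargs")


def split_extra(extra_json: str) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Return (auth_settings, headers) from a Connection's Extra JSON string."""
    extra: Dict[str, Any] = json.loads(extra_json) if extra_json else {}
    # Pull out the reserved keys first...
    auth_settings = {key: extra.pop(key) for key in RESERVED_KEYS if key in extra}
    # ...everything left over is treated as a header, as before the change.
    headers = {str(k): str(v) for k, v in extra.items()}
    return auth_settings, headers
```

A dedicated "Headers" field would remove the need for this filtering, since auth settings and headers would no longer share one namespace.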
(Screenshots: UI; side effect)
I also tried to add a CodeMirrorField (for Headers and Auth kwargs) and a CollapsibleField (to hide Extra), but it was a bit too much compared to the initial goal of this PR. Maybe in a future one.
Use-case:
The `auth_type` is typically a subclass of `requests.auth.AuthBase`. Many custom Auth classes exist for many different protocols. Sometimes, passing only the two hard-coded `conn.username` and `conn.password` is not enough: the Auth class expects more than two arguments.
Examples:
Right now, to deal with those cases, there are three possibilities:
`functools.partial`, as mentioned in this PR, in the dag file / in the operator declaration.
Opinion: The dag developer should not have to care about handling the connection. They just want a working connection_id to call an endpoint (especially beginner / less experienced devs). Furthermore, some parameters are sensitive and cannot be written in a dag.
Opinion: This is not okay. Other hooks do better. Take the `OdbcHook`, which allows parameterizing every aspect of the connection without subclassing anything (`"connect_kwargs"`).
Opinion: This is definitely a bad workaround. I'm mentioning it because this PR won't entirely solve the issue, and this may (continue to) happen.
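For reference, the `functools.partial` workaround looks roughly like this in a dag file. `TokenAuth` and its `tenant`/`scope` parameters are hypothetical; the sketch only assumes that the hook instantiates `auth_type` with two positional arguments (username and password), so `functools.partial` binds everything else in advance, with any sensitive values ending up in the dag code.

```python
import functools


class TokenAuth:
    """Hypothetical auth class needing more than username and password."""

    def __init__(self, username: str, password: str, tenant: str, scope: str = "default") -> None:
        self.username = username
        self.password = password
        self.tenant = tenant
        self.scope = scope


# In the dag file: bind the extra arguments up front...
auth_type = functools.partial(TokenAuth, tenant="acme", scope="read")

# ...so the hook can still call auth_type(username, password) with two arguments.
auth = auth_type("user", "secret")
```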
Coming to this PR, I propose to add two reserved fields: `auth_type`, which selects the underlying Auth class, and `auth_kwargs`, which are passed to it. No breaking change. This solves most of the issues: a partial is not needed anymore, a subclass is not needed anymore, and there are fewer cases where `conn.username` and `conn.password` will be misused.
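A minimal sketch of how the two reserved fields could be combined, assuming `auth_type` has already been resolved to a class through the allowlist and `auth_kwargs` has been parsed into a dict. `build_auth` is a hypothetical helper, and falling back to keyword-only construction when no username is stored is an assumption of this sketch, not necessarily what the PR does.

```python
from typing import Any, Callable, Dict, Optional


def build_auth(
    auth_type: Callable[..., Any],
    username: Optional[str],
    password: Optional[str],
    auth_kwargs: Optional[Dict[str, Any]] = None,
) -> Any:
    """Instantiate the auth class with credentials plus extra keyword arguments."""
    kwargs = dict(auth_kwargs or {})
    if username is not None:
        # Credentials go first positionally, then the Connection-defined extras.
        return auth_type(username, password, **kwargs)
    # No credentials stored: rely on auth_kwargs alone.
    return auth_type(**kwargs)
```

Compared with the partial-based workaround, the extra arguments live in the Connection rather than in the dag file.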