-
Notifications
You must be signed in to change notification settings - Fork 312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Identity API at /api/me #671
Conversation
Thanks for opening this. I would be super happy to see a minimal version of this merged (the change to {
"notebook:Untitled1": [
USER1, USER3
],
"notebook:example": {
USER2, USER3, USER4
}
} Whenever a user connects or disconnects to a document via the RTC WebSocket, they would add or removes themselves from that dict. |
Hey @minrk, I think that it may be relevant to look at the |
@SylvainCorlay yes, absolutely! IUser currently has these fields:
First question: what fields should come from the server. At first glance, it seems like username, name, and displayName make sense. Should the server also be responsible for picking avatar url and color? What about anonymous? Second question: what should be done when some fields are undefined? In almost all cases, My first impression:
|
Hi @minrk, thank you soo much for working on this! If I understood correctly, the
In my opinion, we should enforce the extensions to specify the |
Thanks for filling in!
Yes, I agree the server should provide this information when available. But in the ~90% of cases where I don't really understand the difference between
I think that makes sense. It makes sense to me to follow the same pattern as |
If you mean the default implementation, we should follow the anonymous pattern implemented at the moment in the frontend, creating random usernames from the list of Jupiter's moons. But this should probably be a JupyterLab server extension. In any other case (Hub, or other solutions), if the IdP doesn't have the information, then those fields will be an empty string (the collaborative experience will be completely broken in those cases, but we can not do anything, that is the responsibility of who is deploying the application).
On the UI, we only use |
I mean JupyterHub - normally, only a username is provided, no other information. So in the vast majority of fully authenticated deployments of Jupyter, a username is all we have to work with, so it's an important case to handle.
Why is it broken? I don't see any reason that more than a username is needed for anything to function, only aesthetic refinement. This is going to be close to all cases in practice, so it probably shouldn't be broken, and it seems to be handled just fine by using
Moving that to the server makes sense, whether as an extension or in the default server itself. Probably for another PR, though, at least. What is the |
What I meant with broken is that, If we leave the
Sure, I can work on this later in another PR. That will help me to get into the server source code.
That field is used to know whether the user is randomly generated on the frontend or is an identified user whose name comes from an IdP. The idea is to allow users to rename themself. The problem is that if the user information comes from an identity provider, we can not rename the user because we need to PUT the info to the IdP, and should not rename the user because the identity is verified by the IdP, if we allow renaming on the frontend anyone can impersonate someone else.
We can remove it from now because we can not rename users yet. |
Codecov Report
@@ Coverage Diff @@
## main #671 +/- ##
==========================================
+ Coverage 69.96% 70.13% +0.16%
==========================================
Files 62 63 +1
Lines 7368 7493 +125
Branches 1223 1251 +28
==========================================
+ Hits 5155 5255 +100
- Misses 1841 1857 +16
- Partials 372 381 +9
Continue to review full report at Codecov.
|
Ah, sure. A value needs to be set for these fields eventually, by the time they are actually used. My question is about the (majority of) cases where some fields aren't available, which piece of code is responsible for filling in the derivative values, in the order in which they occur:
I think 2. is probably the simplest, while 3. allows the UI to make the most informed display decisions because it knows what fields are actually authoritative vs generated from fallbacks. 1. offers no benefits over 2, so I don't think it makes sense.
I don't think that follows. We can't define if not display_name:
display_name = username # or name, etc. without any issues. JupyterLab already has exactly this logic: I've updated this PR to implement 2 to show you what I mean - these are purely fallbacks for when authoritative info for some fields is unavailable; the IdP adapter may return any and all fields it has an answer for (this includes computing its own derivative values, so 2. covers a strict superset of 1.), so this is just filling in the missing values when they are missing. |
This PR builds on top of the authorization branch, which makes perfectly sense. To get the gist of the changes related to identity, I use this link Zsailer/jupyter_server@authorization...minrk:identity |
I can't entirely agree with this part because, in the server or the frontend, we do not know if the IdP is using a UUID, email, or specific combination of the name to create a username (to create a unique identifier). I don't think it is good to display a UUID as the name for users. We are not the most informed to make a display decision. Maybe, the best we can do, is introduce the
As a fallback, this is fine. But still, if the username is a UUID, we will be displaying UUIDs in the UI.
That's not 100% true. The logic you are talking about is here and is a temporary solution to allow anonymous users to rename themself by introducing the name, color, and initials as a parameter in the URL. And for example, we are not extracting the initials from the name, as you can see on this code. Instead, we keep the old initials forcing the user to introduce them. The reason is there is a lot of different cultures where names are entirely different, and we should never assume anything. let name = decodeURIComponent(env.getParam('--username', ''));
let color = env.getParam('--usercolor', '');
let initials = decodeURIComponent(env.getParam('--initials', ''));
.
.
.
this._username = user.username as string;
this._name = name !== '' ? name : (user.name as string);
this._displayName = name !== '' ? name : (user.displayName as string);
this._initials = initials !== '' ? initials : (user.initials as string); |
I understand the concern about showing UUIDs to the user, that makes sense. I think it's reasonable, then, to keep the implementation of fallbacks I have here and put that requirement on the IdP adapter that if all it has for a username is a UUID, it should provide a (generated) real name as well. That can be the responsibility of the IdP and a suggestion to always provide a 'nice' name, since everything works correctly with any username, but the experience is more pleasant for UUID usernames. If users should have the right to edit anonymous names, it makes sense that this would be a server-side feature, as it will have server-side storage components. I think we can work out the details of who generates/tracks these anonymous names in a second round and/or extension. If anonymous providers are common (or we just need it for the default case), an implementation of anonymous utilities may well make sense in this repo.
I agree that's true - the problem is that we are implementing every step of this process, and initials aren't provided anywhere. So it's still up to 'us' to do it, just not JupyterLab. If JupyterLab forces Jupyter Server to do it, and if Server doesn't do it itself, it's forcing JupyterHub to do it. At no point in that process are initials available from a trustworthy source. So what should we do, then? Can we eliminate the requirement for initials, since there's ~never a safe way to generate them? If the user should be able to pick them, then I think it should be JupyterLab's responsibility, as that's the only place they are used, and also the only place UI can feasibly be presented to do so. Maybe Server needs to be involved, too, for storage purposes, but I think this is principally a task for Lab. |
How's this for a general design:
The result is that server can guarantee these fields are present and non-empty: |
That's true. Usually, IdPs don't provide initials so I think we can deal with the Initials in the frontend for now and will see what to do with them. I was thinking maybe to change initials for random images of jupyter's moons a la Google Docs with animals, but we can think about it on a second round.
I like the design you are presenting, just one thing, I think would be better to have initials as null instead of empty string. |
👍 so
If it would help with review, I can remove the permissions field from this PR and rebase without #165. Then I could add the permissions field in a third PR once this and #165 are merged. Configuration/implementation question: do folks have a preference for whether the identity-provider-adapter and Authorizer should be different classes or just one? I initially thought just one (currently reflected in this PR) since I suspect they will ~always go together, but it might be better for them to be separate classes to allow for one identity provider to be mixed with different authorization schemes (e.g. oauth + read-only, etc.).
|
With the link I have shared Zsailer/jupyter_server@authorization...minrk:identity I am fine having this PR as it is and avoid additional work to you. Authorization and Authentication (aka identity) are To me, both can remain in this PR.
They sounds to me like separated classes, but I need to makes sense of what you say in the above comment: |
That's what I'm thinking, too (
I mean, for example, to configure Jupyter Server to run under JupyterHub, it must always set both the authorizer and identity provider, e.g. c.ServerApp.authorizer_class = JupyterHubAuthorizer
c.ServerApp.identity_provider_class = JupyterHubIdentityProvider If one made the mistake of only setting On the other hand, setting only the JupyterHub identity provider would not result in properly applying permissions - it would appear to work because authentication would be correct. But authenticated requests that shouldn't have permission will be allowed, too (it's easy to overlook a configuration error when things that are supposed to work do, but things that aren't supposed to work do, too). This is technically correct because that's what the default authorizer is meant to do, but is likely not the configurer's intention. In the specific case of JupyterHub, I don't expect this to be an issue because it will be taken care of by the |
OK, I see, thx for clarifying. There are so many configuration to setup that I don't see that as an issue to ask the platform administrator to define needed configuration... as long (in capitals)...as long as the system is secured by default. |
Thanks for working on this @minrk ! I would like to make sure I understand the overall model (at least parts of it). It looks like the identity API itself defers to the |
Yes, though I think the latest consensus is that we'll have separate
Yes, that's the goal, so we want to make sure we have all the fields ICurrentUser may want defined, and be clear what to expect/require when certain fields aren't available from the IdentityProvider. |
This is indeed what has been discussed recently.
Looking at the diff, there is indeed a |
Yup, that's the idea. I have done it locally already, hope to have a draft up by the end of the week. |
Rebased now that #671 is merged. Not quite ready for a full review, but close enough that folks can take a look. Overview: IdentityProvider defines the Authentication API to be combined with Authorizer for authorization.
Remaining tasks:
Design decisions:
Things that I think are good to do, but ideally not in this PR:
|
I added some docs and tests. Could still use an example, but I think there's enough to review for now. |
I've run a successful test with this PR and jupyterhub/jupyterhub#3833 With this, I'm able to run a fully authenticated JupyterLab server with read-only access for some users without the JupyterHub wrappers at all, purely an implementation of the IdentityProvider and Authorizer APIs. That tells me we are pretty close to my goal of being able to replace the jupyterhub-singleuser wrapper and its extensive monkeypatching with a public-api-consuming ExtensionApp. The main thing I needed was a Config files: https://gist.github.com/e5f47d6f4253532635e9716076db0fb4 |
We've created a 1.x branch during this week's contributing hour. Thanks for all the great work here @minrk! |
I can make async get_user work, I'm just thinking through the deprecation process. The 'right' way to check the current user is to access Assuming nobody overrides The trickier case is what JupyterHub does currently, which is override So far, I think:
That way, setting a custom IdentityProvider takes precedence, but the deprecated approach still works without issue. Only downside is the somewhat convoluted check for the deprecated case on each request. |
FWIW, I'd also be okay if the JupyterHub approach errored in a simple, reliable way, as long as the failure isn't allowing requests it shouldn't. That just means JupyterHub users need to wait for a release before upgrading to server 2.0. I'm fine saying that, but I want to be very sure that the incompatible-versions failure mode is not ignoring auth altogether. |
Has |
Same issue comes up again in check_xsrf_cookie, called in |
Yes, it was released in v1.15.5 and above. |
Worked out the async get_user. I suspect this may come up as a sticky issue a few times in the future because some of tornado's 'pre-flight' checks run pre-prepare, and many of them now require running post-prepare. |
identity includes fields: - username: str - name: Optional[str] - display_name: Optional[str] - initials: Optional[str] - avatar_url: Optional[str] - color: Optional[str] permissions is sibling to identity, with the form: {"resource": ["action", ],} where permissions are only populated _by request_, because the server cannot know what all resource/action combinations are available. Defines new jupyter_server.auth.IdentityProvider API for implementing authorization - IdP.get_user(Handler) returns User dataclass - IdP.identity_model renders dataclass to standard JSON dict model - IdP.get_handlers returns possible custom RequestHandlers (e.g. OAuth callback) - get user_id from cookie, even when token-authenticated, for stable random user ids in the default implementation, even for token-authenticated requests. Default get_user implementation still resides in LoginHandler.get_user, to be moved and deprecated in subsequent PR. Ensures authorizer, identity_provider are defined in case of custom Application, load default Authorizer/IdentityProvider if none is available in settings For CI: Fix numpy-style docstring format
careful to deprecate overridden get_current_user without ignoring auth Needs some changes due to early steps that are called before prepare, but must now be moved to prepare due to the reliance on auth info. - setting CORS headers (set_default_headers) - check_xsrf_cookie - check_origin now that get_user is async, we have to re-run these bits in prepare after user is authenticated
🎉 Great work, Min!
Do you mind expanding on this a little bit? I'm don't immediately see the issue, but I'm sure your experience from JupyterHub makes you much more keen to see these issues. This looks good to go in my mind! 👍 🚀 |
I'm going to merge this now, since we've had multiple rounds of review. I think it's better to get this merge and start using it. We can iterate in future PRs if issues arise. Thanks so much, @minrk! Super excited for this work! |
It's mostly references to
I think these will be rare, but I don't know exactly how rare. We could make this work far more often if the failures only occurred when get_user is async, rather than always. That's likely to hide bugs, though, as the default auth is always going to be sync and JupyterHub probably shouldn't be (even though it is right now). |
Draft implement of identity API (closes #638). Includes #165.
Key issues:
Jupyter Server has never put any requirements on the return of
get_user
other than truthiness - this poses a challenge in coercing to a user model. I'm aware of two patterns:so these are supported, and others result in
username: "unknown"
.We could take this opportunity to enforce a structure on .current_user, or we could require custom User objects to use a corresponding Authorizer class to define the new Authorizer.user_model method to adapt to the dict format.
checking permissions - the design in Add authorization layer to server request handlers #165 doesn't allow a priori listing of available resource+action combinations, so permissions to check are inputs to the API request. Currently, permissions input is two lists - resources and actions, but it could be a JSON dict of
resource: [action,]
(matching output).TODO: