-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(UserManager): handle concurrent token refresh requests via leader election #434
base: main
Are you sure you want to change the base?
fix(UserManager): handle concurrent token refresh requests via leader election #434
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #434 +/- ##
==========================================
+ Coverage 75.25% 75.40% +0.15%
==========================================
Files 44 44
Lines 1572 1590 +18
Branches 288 292 +4
==========================================
+ Hits 1183 1199 +16
- Misses 363 364 +1
- Partials 26 27 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Great work , thanks for the fast response! Do we really need the
The leader election feature could be taken, simplified and converted into TypeScript from the The main idea behind this is to have as few external dependencies as really needed.... |
I suppose it should be no problem to use the native broadcast-channel. I'll have a look at this. |
@pamapa Just wondering, as this would ease implementation a lot: Is the browser support for the https://developer.mozilla.org/en-US/docs/Web/API/Web_Locks_API |
Good question. We only care about modern browsers (primary chrome, firefox, edge, safari + mobile versions, not IE) anyway. In that list i only see Safari to be a problem for us now. Maybe we could first detect if that API exists and then use it, if not found we keep current way, which was the default for many years in the predecessor library... @kherock What is your opinion here? |
That sounds like a sane way to me to implement it! |
Web locks seem like a more appropriate API to use since we don't have any information to publish between tabs. Also, I have concerns over using the Are you replacing the webstorage backend with // extend with a method for obtaining a lock and running a task
interface StateStore {
async runTask(key: string, task: () => Promise<void>);
}
class WebStorageStateStore implements StateStore {
/** whether tasks run against this store require synchronization from other instances of oidc-client */
private _sync: boolean;
public constructor({ prefix = "oidc.", store = localStorage, sync } = {}) {
this._store = store;
this._prefix = prefix;
this._sync = sync ?? store === localStorage
}
public async runTask(key: string, task: () => void | Promise<void>) {
if (!this._sync) return await task();
await navigator.locks.request(this._prefix + key, async lock => {
// The lock has been acquired.
await task();
// Now the lock will be released.
});
}
} This locking utility probably doesn't need to be in |
I am using localStorage, yes. But Wouldn't sessionStorage suffer from the same issue, two tabs still access the same sessionStorage value with the same key. Only memory storage wouldn't suffer from this issue. I agree that we could limit this to the refresh token function and locking onto the refresh token. I assume we wouldn't need a BroadcastChannel in this case, as after acquiring the lock, we could potentially just verify if the access token was just refreshed recently (e.g. in the last X seconds). |
Page sessions are unique per tab! From MDN.
In my case, I'm writing an enterprise application where sessions don't persist beyond the lifespan of a tab. So for me separate tabs for the same application shouldn't have to ever interact or conflict with each other since they never shared any state to begin with. But it does seem like there is an exception where a browser may duplicate a tab and its associated session storage:
So it does make sense that we need to provide some synchronization mechanism independent of the storage backend. |
Uh, so this becomes kinda tricky. Basically when you duplicate a tab using session storage while using refresh tokens, they become permanently linked to each other. I guess the following method could be implemented (given that
Does this sound right to you? |
Well, I went ahead and implemented this to solve it for localStorage, which was relatively easy. I still have to think how exactly I want to handle that for sessionStorage. Any input is welcome! |
Thanks for working on this, looks much simpler now with locking. For easier review could you please squash your changes into a single commit? |
@DASPRiD thanks for raising this issue as I also faced it a few months ago and was yet to find an ideal solution. In my case, I'm using a simple web app which stores its tokens in memory (no local/session storage of any sort). Opening this same web app in two different tabs, results in the following: Using [email protected]Revoke Refresh Token: ON (ie. activating refresh token rotation)
Using Auth0Refresh Token Rotation: ON
No error whatsoever, both tabs can renew their tokens. ConclusionI'm not familiar enough with the OIDC spec so I don't know if this could relates to a bug in one or another. |
@Badisi thanks for sharing this, I was also seeing a similar error with my Keycloak application and decided to simply disable refresh tokens, but your explanation makes sense. Rather than locking on a refresh token, what happens if we lock onto |
That would work because The only common data I'm thinking of is the |
The more I think about it, the more I realize it should be an issue on Keycloak side. Auth0 reasoning (see Automatic-Reuse-Detection) seems way more logical. Whereas Keycloak is invalidating the RT family if any previously delivered RT (before the rotation) is being reused. |
Yeah, this seems to be a Keycloak issue. The way their implementation works, you could never have two independent clients authenticated at the same time, because they invalidate each others tokens. The way auth0 works makes a lot more sense. |
4ac6086
to
b9d5c01
Compare
I just squashed it all down to a single commit. Regarding linked sessionStorage sessions, I think we should consider that a different issue. |
b9d5c01
to
14436d1
Compare
For some reason, I cannot answer to your review comments: Yes, the check for navigator.locks covers testing for the existence of the BroadcastChannel feature, as the latter is supported much longer already. |
… election Introduces leader election for concurrent token refresh requests. Gracefully falls back when Web Lock API is not available. Closes authts#430
14436d1
to
fb75619
Compare
@Badisi Thanks for helping reviewing. Valid concerns, which must be answered/resolved before we merge... |
@pamapa I thought I answered all his questions :) |
I can not see the answers to #434 (comment) + #434 (comment) |
Oh, stupid me, I had a review running instead of commenting directly, so my comments were pending :D The second one is not really a question to me, is it? Technically @kherock already said that we should lock on the refresh token, if you guys want a slightly different naming though, we can change that. |
if (!user) { | ||
user = await refreshUser(); | ||
broadcastChannel.postMessage(user); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way that this code handles competing requests seems leaky. It looks like when two tabs both ask for a lock on the refresh token state, they can still both send refresh tokens sequentially. It seems to make an assumption that sending a message informing the other tab of the new user via the broadcast channel is a synchronous operation, but if this were the case, no web locks would be necessary at all. I doubt that there will ever be a case where releasing and re-obtaining the lock takes longer than sending a BroadcastChannel message, but I think the possibility is there.
Instead, I think it would be better to use the Web Locks API to perform leader election:
const topic = `refresh_token_${state.refresh_token}`;
const broadcastChannel = new BroadcastChannel(topic);
const bcUserPromise = new Promise(resolve => broadcastChannel.addEventListener(ev => resolve(ev.data)));
const user = await navigator.locks.request(topic, { ifAvailable: false }, async (lock) => {
// a null `lock` implies that this tab lost in leader election
if (lock) {
const user = await refreshUser();
await this.storeUser(user);
broadcastChannel.postMessage(user);
return user;
} else {
return await bcUserPromise;
}
});
broadcastChannel.close();
this._events.load(user);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this implementation best.
@kherock, just a few questions:
- Why is
storeUser(user)
only performed by the leader ? (and not by everyone at the end, like_events.load
) - Is a timeout necessary in case
postMessage
never occurred ? (ex:refreshUser()
taking too long and in the meantime user closes the leader tab) - Still wondering about the proper lock key to use... 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think about the scenario of the tab being interrupted, but more realistically, my code doesn't handle when the leader encounters an error either.
storeUser
writes the user to localStorage
/sessionStorage
which is presumably shared by both tabs in this scenario, so it only needs to be serialized and stored once. The user actually doesn't even need to be sent over the channel at all, the other tab could just as well do return await this._loadUser()
after receiving the notification from the leader (or by waiting for the lock to release, though this doesn't account for the lock releasing because of a failure or interruption).
I think think finding a "proper" lock key to use in a separate issue. My hunch is that we should lock onto session_state
when the IDP implements OIDC sessions (e.g. Keycloak) and fall back to the refresh token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Agreed that some error handling needs to be made.
-
which is presumably shared by both tabs in this scenario
If I'm not mistaken,
sessionStorage
is not shared between tabs - so it won't work in that case. -
I've seen
session_state
being returned by Keycloak (but not Auth0), so yes we could use it if we have it or fall back otherwise. But I'm still not confident in usingrefresh token
as the fall back. Each tab receives a different refresh token, which makes the purpose of the lock completely useless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In instances where sessionStorage isn't shared, how would two tabs obtain the same refresh token in the first place? Also, see this comment: #434 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DASPRiD, I'm using neither LocalStorage nor SessionStorage, I'm keeping everything in memory. But as I see it, this concurrency issue exists no matter the type of storage, so I would opt for a way to solve it once and for all (and not only for LocalStorage). Why do you think your proposal does only cover LocalStorage ? I was convinced that simply using a common lock key would be enough to solved it for SessionStorage/MemoryStorage too. Don't you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually no, this issue does not affect memory storage, as memory storage is always bound to it's own session without sharing it with any other process.
The problem with SessionStorage is the part about session storage duplication when a tab is duplicated. In that instance, they will share the same session (from the perspective of the IDP), but not from their own perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kherock Actually, I do see an issue with using client ID + sub as the lock key: It would block unrelated tabs with different sessions but same client ID + sub from performing their refresh, that's why I was using the refresh token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, then we are talking about two different concurrency issues.
- The one you are facing, ie. when tokens are physically shared between tabs (over a storage). Tabs should wait for the first one to refresh it for the others.
- And the one I'm facing (only seen with Keycloak and might be a bug but didn't get any answers from them yet), where the first tab to do a refresh will cancel any previous generated tokens for that same client_id/session (ie. the ones generated from the other tabs). So here, tabs should also wait for the first one to refresh the tokens and send them to the others. This feels hacky and weird (because they all started with different tokens and ends with identical ones) but at least your PR could solve it, as long as the lock key is identical for each tabs 🙂.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well yeah, the issue is absolutely on their side. Solving it with a shared lock key like that is a hacky way to solve your issue though. Especially since it would only really solve it as long as your are using local storage. If you are using anything else (based on Kheros asked change of not using the BroadcastChannel), this would not help with your issue.
It would, as mentioned above, also introduce the issue that it would block possibly unrelated tabs from performing their refresh. I really only see one way here to address your issue without affecting all other users:
- You would have to opt into local storage (since Keycloak forces a shared session on you anyway).
- Add an option to the Usermanager so you can opt into using client ID + sub as the lock key, although the default would be the refresh token.
@kherock Actually, instead of using the refresh token, we could also add a randomly generated ID to the store which is used as common identifier for the lock, but when session_state
is available, store that as the common identifier. That way it would address @Badisi issue, and make us not rely on the value of the refresh token.
Although to ultimately address their issue, we'd need to use the BroadcastChannel. Thinking about that, we could add a delay of maybe 250ms or so into the leader function. That would address any possible common race condition. We probably might not even need such a long delay.
@kherock Can you have a look at the inline conversation? :) |
|
Thanks for the summary. I have to add a little bit to that, so let me see if I can summarize that in a better way, at least from my point of view :)
|
One last thing though; I had a small epiphany yesterday when going to bed: We could actually solve all problems by moving the leader election up to a higher level, so that for each session (determined by the session state or the randomly generated state key), only one tab is at all responsible for handling any refresh logic, that includes watching for the token to expire. All non-leaders would simply be informed through the BroadcastChannel. This would solve the session duplication issue, as well as the problem with key cloak of having all clients share the same session, independent of their storage mechanism. It would also solve the race condition problem, as a leader would only die when a tab is actually closed, not every time a refresh happened. There'd still be a tiny chance that the user closes a tab exactly in the moment when the leader tries to perform a refresh, but a small delay at the start of the leader election would solve that. Any thoughts about that? |
Hi, sorry for the long delay. I do like the concept of moving leader election up to a higher level. I didn't see a good explanation of how a new leader is elected when the current one dies (I don't think we can rely on the leader's BroadcastChannel message making it out as it's dying). What if, upon starting the Upon being granted the lock, the instance checks that the lock's topic still matches the current the session state or refresh token; if the value doesn't match, it means that the current leader has already changed topics after storing the state. In this case, the instance will start over and attempt to obtain a new lock on the updated value. Once the leader has confirmed the lock, it sets up the expiration event listeners and unregisters its BroadcastChannel listeners. In order for the lock to stay active, it should return a Promise that only resolves when the session state or refresh token changes. |
With slight modifications, that sounds like a good idea to me. What's kinda annoying is that we still have to treat the case where I'm a little busy with another project at the moment though, but I'll come back to this ASAP. I might create a new PR though, as the diff will be wildly different I assume. |
Hello @DASPRiD, When do you think you'll be able to merge this PR and release a new package ? Thanks for the job 👍 |
I am pretty interested in this feature. I was wondering how we could progress this PR? Is there anyway I could help with this?
I have been trying to use the Weblock API inside an IFRAME and appears to work. Only not 100% sure if that's due some configured CORS policy or not. In which case would Lock API not available? Appears to be well supported by the modern browsers. It's not working when there is not a secure context but for an authentication library I can't imagine a case were we wouldn't have this. |
Hi. When will this commits pulled to the main branch or new release? |
@adamtang79 this merge request is not ready and needs a rework, currently it looks stalled. |
@pamapa Sorry, I moved on a while ago to another library and don't really have to time to look further into this. |
In what manner does it need to be reworked? @DASPRiD At first sight that library doesn't appear to have a similar leader election functionality as suggested through this PR? |
Introduces leader election for concurrent token refresh requests. This adds a new dependency (broadcast-channel) which takes care of the election process.
Closes #430