filters/auth: use sync.Map for tokeninfo cache #3267
base: master
AlexanderYastrebov commented on Oct 9, 2024
- use sync.Map for the tokeninfo cache to avoid synchronizing all callers on a single mutex
- evict stale entries periodically instead of evicting the least recently used ones
- store the token expiration time instead of the creation time
Branch updated from 67cd561 to 813bf2a.
info: info,
href: c.history.PushFront(token),
func (c *tokeninfoCache) evictLoop() {
	ticker := time.NewTicker(time.Minute)
Does it make sense to make this `time.Minute` configurable in a follow-up PR?
I am not sure which other value would be good.
mu sync.Mutex
cache map[string]*entry
// least recently used token at the end
history *list.List
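For contrast, a minimal sketch of the mutex-plus-history layout that these removed fields describe. Because even a read must move its key to the front of the list, every `get` and `set` takes the same lock. The `lruCache` type and its methods are hypothetical reconstructions for illustration, not the removed skipper code.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

type lruEntry struct {
	value string
	href  *list.Element // position of the key in history
}

// lruCache serializes all callers on mu, since reads also reorder history.
type lruCache struct {
	mu      sync.Mutex
	size    int
	cache   map[string]*lruEntry
	history *list.List // most recently used at the front, least recently used at the end
}

func newLRUCache(size int) *lruCache {
	return &lruCache{size: size, cache: make(map[string]*lruEntry), history: list.New()}
}

func (c *lruCache) get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.cache[key]
	if !ok {
		return "", false
	}
	c.history.MoveToFront(e.href) // a read is also a write to history
	return e.value, true
}

func (c *lruCache) set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.cache[key]; ok {
		e.value = value
		c.history.MoveToFront(e.href)
		return
	}
	c.cache[key] = &lruEntry{value: value, href: c.history.PushFront(key)}
	// On overflow, evict the least recently used key from the back of the list.
	for len(c.cache) > c.size {
		oldest := c.history.Back()
		c.history.Remove(oldest)
		delete(c.cache, oldest.Value.(string))
	}
}

func main() {
	c := newLRUCache(2)
	c.set("a", "1")
	c.set("b", "2")
	c.get("a")      // "a" becomes most recently used
	c.set("c", "3") // evicts "b", the least recently used
	_, okB := c.get("b")
	_, okA := c.get("a")
	fmt.Println(okA, okB) // true false
}
```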
I see you would like to keep the tokeninfo filter lock-free, which is why you are dropping the access history.
However, there is a data structure that acts like a list and has lock-free implementations (at least in theory): https://en.wikipedia.org/wiki/Skip_list
Do you think it makes sense to try this data structure, to get lock-free history access instead of evicting random items?
The history was used to evict the oldest entries when the cache grows over the limit.
This change simplifies eviction by removing random items whenever the number of cached items exceeds the size limit.
It also adds a metric for monitoring.
In a production setup, the cache size should be set large enough that entries are never evicted due to overflow. This way there is no need to complicate the eviction algorithm, keep an access history, or use clever data structures.
// Evict random entries until the cache size is within limits
if c.count.Load() > int64(c.size) {
	c.cache.Range(func(key, value any) bool {
Just to confirm: the random entries here come from the unspecified iteration order of the map, right?
Yes, the iteration order is not defined for maps, and hence not for sync.Map either, since it uses a map internally.
@@ -111,7 +118,7 @@ func TestTokeninfoCache(t *testing.T) {
assert.Equal(t, int32(1), authRequests, "expected no request to auth server")
assert.Equal(t, token, info["uid"])
assert.Equal(t, float64(595), info["expires_in"], "expected TokenTTLSeconds - truncate(delay)")
It is still not quite obvious why 595 changed to 594.
Is it related to this part of the code?
https://github.com/zalando/skipper/pull/3267/files#diff-a2721f4aa66d24557c036d686d4e3344636871b2de93fb75266b8da83d511891L79
Yes. The previous version adjusted the cached `expires_in` based on the time elapsed since the entry was cached. Since the number must be an integer, it truncated the elapsed time:
`new_expires_in = expires_in - truncateToSeconds(now - cachedAt)`
This version stores the expiration date instead of the cache date and calculates `expires_in` from it:
`new_expires_in = truncateToSeconds(infoExpiresAt - now)`
This test advances the clock by 5.7 seconds, so the previous version resulted in:
`new_expires_in = expires_in - truncateToSeconds(now - cachedAt) = 600 - truncateToSeconds(cachedAt + 5.7 - cachedAt) = 600 - truncateToSeconds(5.7) = 600 - 5 = 595`
and this version results in:
`new_expires_in = truncateToSeconds(infoExpiresAt - now) = truncateToSeconds(cachedAt + 600 - (cachedAt + 5.7)) = truncateToSeconds(594.3) = 594`
* use sync.Map for tokeninfo cache to avoid synchronizing all callers on a single mutex
* evict stale entries periodically instead of least recently used
* store token expiration time instead of creation time

```
                                                  │      master       │               HEAD                │
                                                  │      sec/op       │sec/op       vs base               │
TokeninfoCache/tokens=1,cacheSize=1,p=0-8               275.5n ±  6%   170.1n ± 4%  -38.26% (p=0.000 n=10)
TokeninfoCache/tokens=2,cacheSize=2,p=0-8               492.9n ± 21%   176.8n ± 2%  -64.12% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=100,p=0-8           455.9n ±  7%   165.5n ± 1%  -63.70% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=100,p=10000-8       593.4n ±  4%   179.8n ± 4%  -69.71% (p=0.000 n=10)
TokeninfoCache/tokens=4,cacheSize=2,p=0-8           2571424.0n ±  0%   149.7n ± 3%  -99.99% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=10,p=0-8        2579227.5n ±  0%   139.3n ± 1%  -99.99% (p=0.000 n=10)
geomean                                                 7.903µ         162.9n       -97.94%

                                                  │      master       │               HEAD                │
                                                  │       B/op        │B/op         vs base               │
TokeninfoCache/tokens=1,cacheSize=1,p=0-8                344.0 ±  0%    344.0 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=2,cacheSize=2,p=0-8                344.0 ±  0%    344.0 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=0-8            344.0 ±  0%    344.0 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=10000-8        368.0 ±  1%    350.0 ± 0%  -4.89% (p=0.000 n=10)
TokeninfoCache/tokens=4,cacheSize=2,p=0-8                27.00 ±  0%   344.00 ± 0%  +1174.07% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=10,p=0-8             27.00 ±  7%   344.00 ± 0%  +1174.07% (p=0.000 n=10)
geomean                                                  149.0          345.0       +131.62%
¹ all samples are equal

                                                  │      master       │               HEAD                │
                                                  │     allocs/op     │allocs/op    vs base               │
TokeninfoCache/tokens=1,cacheSize=1,p=0-8                3.000 ±  0%    3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=2,cacheSize=2,p=0-8                3.000 ±  0%    3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=0-8            3.000 ±  0%    3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=10000-8        3.000 ±  0%    3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=4,cacheSize=2,p=0-8                0.000 ±  0%    3.000 ± 0%  ? (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=10,p=0-8             0.000 ±  0%    3.000 ± 0%  ? (p=0.000 n=10)
geomean                                                      ²          3.000       ?
¹ all samples are equal
² summaries must be >0 to compute geomean
```

Signed-off-by: Alexander Yastrebov <[email protected]>
Branch updated from 813bf2a to 818dd4d.