filters/auth: use sync.Map for tokeninfo cache #3267

Open · wants to merge 1 commit into master from filters/auth/tokeninfocache-syncmap

Conversation

AlexanderYastrebov (Member)

  • use sync.Map for tokeninfo cache to avoid synchronizing all callers on a single mutex
  • evict stale entries periodically instead of least recently used
  • store token expiration time instead of creation time
                                                  │      master      │                HEAD                 │
                                                  │      sec/op      │   sec/op     vs base                │
TokeninfoCache/tokens=1,cacheSize=1,p=0-8               275.5n ±  6%   170.1n ± 4%  -38.26% (p=0.000 n=10)
TokeninfoCache/tokens=2,cacheSize=2,p=0-8               492.9n ± 21%   176.8n ± 2%  -64.12% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=100,p=0-8           455.9n ±  7%   165.5n ± 1%  -63.70% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=100,p=10000-8       593.4n ±  4%   179.8n ± 4%  -69.71% (p=0.000 n=10)
TokeninfoCache/tokens=4,cacheSize=2,p=0-8           2571424.0n ±  0%   149.7n ± 3%  -99.99% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=10,p=0-8        2579227.5n ±  0%   139.3n ± 1%  -99.99% (p=0.000 n=10)
geomean                                                 7.903µ         162.9n       -97.94%

                                                  │   master   │                  HEAD                   │
                                                  │    B/op    │    B/op      vs base                    │
TokeninfoCache/tokens=1,cacheSize=1,p=0-8           344.0 ± 0%    344.0 ± 0%          ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=2,cacheSize=2,p=0-8           344.0 ± 0%    344.0 ± 0%          ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=0-8       344.0 ± 0%    344.0 ± 0%          ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=10000-8   368.0 ± 1%    350.0 ± 0%     -4.89% (p=0.000 n=10)
TokeninfoCache/tokens=4,cacheSize=2,p=0-8           27.00 ± 0%   344.00 ± 0%  +1174.07% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=10,p=0-8        27.00 ± 7%   344.00 ± 0%  +1174.07% (p=0.000 n=10)
geomean                                             149.0         345.0        +131.62%
¹ all samples are equal

                                                  │    master    │              HEAD              │
                                                  │  allocs/op   │ allocs/op   vs base            │
TokeninfoCache/tokens=1,cacheSize=1,p=0-8           3.000 ± 0%     3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=2,cacheSize=2,p=0-8           3.000 ± 0%     3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=0-8       3.000 ± 0%     3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=10000-8   3.000 ± 0%     3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=4,cacheSize=2,p=0-8           0.000 ± 0%     3.000 ± 0%  ? (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=10,p=0-8        0.000 ± 0%     3.000 ± 0%  ? (p=0.000 n=10)
geomean                                                        ²   3.000       ?
¹ all samples are equal
² summaries must be >0 to compute geomean
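
For illustration, a minimal sketch of how a lookup could look with sync.Map and a stored expiration time; the type, field, and method names below are assumptions based on this description, not the PR's exact code:

```
package auth

import (
	"sync"
	"sync/atomic"
	"time"
)

type entry struct {
	expiresAt time.Time
	info      map[string]any
}

type tokeninfoCache struct {
	size  int              // maximum number of cached tokens
	now   func() time.Time // injectable clock for tests
	cache sync.Map         // token -> *entry
	count atomic.Int64     // number of cached entries, maintained on add/delete
}

// cached returns a copy of the token info with a recomputed expires_in,
// or nil if the token is not cached or has already expired.
// No mutex is involved: concurrent readers only touch sync.Map.
func (c *tokeninfoCache) cached(token string) map[string]any {
	v, ok := c.cache.Load(token)
	if !ok {
		return nil
	}
	e := v.(*entry)
	now := c.now()
	if !now.Before(e.expiresAt) {
		return nil // stale; the periodic evict loop will remove it
	}
	info := make(map[string]any, len(e.info))
	for k, val := range e.info {
		info[k] = val
	}
	// remaining lifetime derived from the stored expiration time,
	// truncated to whole seconds
	info["expires_in"] = float64(int(e.expiresAt.Sub(now).Seconds()))
	return info
}
```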

@AlexanderYastrebov added the major (moderate risk, for example new API, small filter changes that have no risk like refactoring or logs) label on Oct 9, 2024
@AlexanderYastrebov force-pushed the filters/auth/tokeninfocache-syncmap branch from 67cd561 to 813bf2a on October 9, 2024 11:01
Diff context for the review comment below:

```
		info: info,
		href: c.history.PushFront(token),

func (c *tokeninfoCache) evictLoop() {
	ticker := time.NewTicker(time.Minute)
```
Member:

Does it make sense to make this time.Minute configurable in a follow-up PR?

Member Author:

I am not sure which other value would be good.
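
For illustration, a sketch of what a configurable interval could look like; the function signature and parameter names below are hypothetical, not from this PR:

```
package auth

import "time"

// evictLoop sketch: calls evict on every tick until quit is closed.
// interval illustrates how the hardcoded time.Minute could become
// configurable in a follow-up; evict and quit are placeholders.
func evictLoop(interval time.Duration, quit <-chan struct{}, evict func()) {
	if interval <= 0 {
		interval = time.Minute // value hardcoded in this PR
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-quit:
			return
		case <-ticker.C:
			evict()
		}
	}
}
```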

Diff context for the review comment below (the previous mutex-and-list based cache fields):

```
	mu    sync.Mutex
	cache map[string]*entry
	// least recently used token at the end
	history *list.List
```
Member:

I see you would like to keep the tokeninfo filter lock-free, which is why you are thinking about not keeping the history any more.

However, there is a data structure that acts like a list and has lock-free implementations (at least in theory): https://en.wikipedia.org/wiki/Skip_list

Do you think it makes sense to try this data structure, to get lock-free history access and avoid evicting random items?

Member Author (@AlexanderYastrebov, Oct 9, 2024):

It was used to evict the oldest entries when the cache grew over the limit.
This change simplifies that by removing random items when the number of cached items exceeds the size limit.
It also adds a metric for monitoring.

In a production setup the cache size should be set large enough that entries are never evicted due to overflow. That way there is no need to complicate the eviction algorithm, keep an access history, or use clever data structures.


Diff context for the review comment below:

```
	// Evict random entries until the cache size is within limits
	if c.count.Load() > int64(c.size) {
		c.cache.Range(func(key, value any) bool {
```
Member:

Just to confirm: the randomness of the evicted entries here comes from the random iteration order of the map, right?

Member Author:

Yes, the iteration order is not defined for maps, and hence not for sync.Map either, since it uses a map internally.
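
A sketch of the pattern being discussed, continuing the illustrative cache type from the earlier sketch; the PR's actual method may differ in details:

```
// evictOverflow deletes arbitrary entries until the number of cached
// entries is back within the size limit. Because iteration order of a
// map (and therefore of sync.Map) is not defined, the removed entries
// are effectively random.
func (c *tokeninfoCache) evictOverflow() {
	if c.count.Load() <= int64(c.size) {
		return
	}
	c.cache.Range(func(key, _ any) bool {
		c.cache.Delete(key)
		// keep iterating while still over the limit
		return c.count.Add(-1) > int64(c.size)
	})
}
```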

Diff context for the review comment below (TestTokeninfoCache):

```
@@ -111,7 +118,7 @@ func TestTokeninfoCache(t *testing.T) {

	assert.Equal(t, int32(1), authRequests, "expected no request to auth sever")
	assert.Equal(t, token, info["uid"])
	assert.Equal(t, float64(595), info["expires_in"], "expected TokenTTLSeconds - truncate(delay)")
```
Member:

It is still not quite obvious why 595 changed to 594.

Is it related to this part of the code?
https://github.com/zalando/skipper/pull/3267/files#diff-a2721f4aa66d24557c036d686d4e3344636871b2de93fb75266b8da83d511891L79

Member Author:

Yes, the previous version adjusted the cached expires_in based on the time elapsed since the entry was cached.
Since the number must be an integer, it truncated the elapsed time:

new_expires_in = expires_in - truncateToSeconds(now - cachedAt)

This version stores the expiration time instead of the cache time and calculates expires_in from it:

new_expires_in = truncateToSeconds(infoExpiresAt - now)

The test moves the clock by 5.7 seconds, so the previous version resulted in:

new_expires_in = expires_in - truncateToSeconds(now - cachedAt) = 600 - truncateToSeconds(5.7) = 600 - 5 = 595

and this version results in:

new_expires_in = truncateToSeconds(infoExpiresAt - now) = truncateToSeconds(cachedAt + 600 - (cachedAt + 5.7)) = truncateToSeconds(594.3) = 594
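
A small, self-contained illustration of this arithmetic; the values come from the test above, while the variable names are only for this sketch:

```
package main

import (
	"fmt"
	"time"
)

func main() {
	// A token with expires_in=600 was cached, and the test clock has
	// moved forward by 5.7 seconds since then.
	now := time.Now()
	cachedAt := now.Add(-5700 * time.Millisecond)

	// Previous version: subtract the truncated elapsed time from the
	// expires_in stored at cache time.
	oldExpiresIn := 600 - int(now.Sub(cachedAt).Seconds()) // 600 - 5 = 595

	// This PR: store the expiration time and recompute the remaining
	// lifetime on read, truncated to whole seconds.
	expiresAt := cachedAt.Add(600 * time.Second)
	newExpiresIn := int(expiresAt.Sub(now).Seconds()) // int(594.3) = 594

	fmt.Println(oldExpiresIn, newExpiresIn) // 595 594
}
```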

* use sync.Map for tokeninfo cache to avoid synchronizing all callers
  on a single mutex
* evict stale entries periodically instead of least recently used
* store token expiration time instead of creation time

```
                                                  │      master      │                HEAD                 │
                                                  │      sec/op      │   sec/op     vs base                │
TokeninfoCache/tokens=1,cacheSize=1,p=0-8               275.5n ±  6%   170.1n ± 4%  -38.26% (p=0.000 n=10)
TokeninfoCache/tokens=2,cacheSize=2,p=0-8               492.9n ± 21%   176.8n ± 2%  -64.12% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=100,p=0-8           455.9n ±  7%   165.5n ± 1%  -63.70% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=100,p=10000-8       593.4n ±  4%   179.8n ± 4%  -69.71% (p=0.000 n=10)
TokeninfoCache/tokens=4,cacheSize=2,p=0-8           2571424.0n ±  0%   149.7n ± 3%  -99.99% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=10,p=0-8        2579227.5n ±  0%   139.3n ± 1%  -99.99% (p=0.000 n=10)
geomean                                                 7.903µ         162.9n       -97.94%

                                                  │   master   │                  HEAD                   │
                                                  │    B/op    │    B/op      vs base                    │
TokeninfoCache/tokens=1,cacheSize=1,p=0-8           344.0 ± 0%    344.0 ± 0%          ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=2,cacheSize=2,p=0-8           344.0 ± 0%    344.0 ± 0%          ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=0-8       344.0 ± 0%    344.0 ± 0%          ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=10000-8   368.0 ± 1%    350.0 ± 0%     -4.89% (p=0.000 n=10)
TokeninfoCache/tokens=4,cacheSize=2,p=0-8           27.00 ± 0%   344.00 ± 0%  +1174.07% (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=10,p=0-8        27.00 ± 7%   344.00 ± 0%  +1174.07% (p=0.000 n=10)
geomean                                             149.0         345.0        +131.62%
¹ all samples are equal

                                                  │    master    │              HEAD              │
                                                  │  allocs/op   │ allocs/op   vs base            │
TokeninfoCache/tokens=1,cacheSize=1,p=0-8           3.000 ± 0%     3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=2,cacheSize=2,p=0-8           3.000 ± 0%     3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=0-8       3.000 ± 0%     3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=100,cacheSize=100,p=10000-8   3.000 ± 0%     3.000 ± 0%  ~ (p=1.000 n=10) ¹
TokeninfoCache/tokens=4,cacheSize=2,p=0-8           0.000 ± 0%     3.000 ± 0%  ? (p=0.000 n=10)
TokeninfoCache/tokens=100,cacheSize=10,p=0-8        0.000 ± 0%     3.000 ± 0%  ? (p=0.000 n=10)
geomean                                                        ²   3.000       ?
¹ all samples are equal
² summaries must be >0 to compute geomean
```

Signed-off-by: Alexander Yastrebov <[email protected]>
@AlexanderYastrebov force-pushed the filters/auth/tokeninfocache-syncmap branch from 813bf2a to 818dd4d on October 10, 2024 11:22