This repository has been archived by the owner on Nov 25, 2024. It is now read-only.

Use IsBlacklistedOrBackingOff to determine if we should try to fetch devices #3254

Merged
merged 11 commits into main from s7evink/device-list-updater-again on Nov 9, 2023

Conversation

@S7evinK (Contributor) commented Nov 1, 2023

Use IsBlacklistedOrBackingOff from the federation API to check if we should fetch devices.

To reduce back pressure, we now only queue retrying servers if there's space in the channel.
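
A minimal sketch of the two ideas above (illustrative names only; the stand-in interface simplifies the real federation API's signature): ask whether the remote server is blacklisted or backing off before fetching devices, and only queue retry work when the worker channel has spare capacity, so a full channel never blocks the caller.

```go
package main

import (
	"context"
	"fmt"
)

// federationAPI is a hypothetical stand-in for the slice of the federation API
// used here; only the yes/no answer matters for this sketch.
type federationAPI interface {
	IsBlacklistedOrBackingOff(ctx context.Context, server string) (bool, error)
}

// maybeQueueDeviceFetch skips servers we should not contact right now and never
// blocks on a full worker channel.
func maybeQueueDeviceFetch(ctx context.Context, api federationAPI, server string, ch chan string) {
	if blocked, err := api.IsBlacklistedOrBackingOff(ctx, server); err != nil || blocked {
		return // blacklisted or backing off: don't try to fetch devices
	}
	select {
	case ch <- server: // space in the channel: queue the retry
	default: // channel full: drop it, the next retry pass picks it up again
	}
}

type fakeAPI struct{}

func (fakeAPI) IsBlacklistedOrBackingOff(context.Context, string) (bool, error) { return false, nil }

func main() {
	ch := make(chan string, 1)
	maybeQueueDeviceFetch(context.Background(), fakeAPI{}, "example.org", ch)
	fmt.Println(len(ch)) // 1: the retry was queued
}
```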

@S7evinK S7evinK requested a review from a team as a code owner November 1, 2023 09:39

codecov bot commented Nov 1, 2023

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (ee73a90) 65.51% compared to head (fa3e0b5) 65.51%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3254      +/-   ##
==========================================
- Coverage   65.51%   65.51%   -0.01%     
==========================================
  Files         507      507              
  Lines       57217    57245      +28     
==========================================
+ Hits        37484    37502      +18     
- Misses      15892    15899       +7     
- Partials     3841     3844       +3     
Flag Coverage Δ
unittests 49.50% <88.67%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Files Coverage Δ
cmd/dendrite-demo-pinecone/monolith/monolith.go 77.55% <100.00%> (ø)
cmd/dendrite/main.go 62.23% <ø> (-0.27%) ⬇️
federationapi/federationapi.go 77.14% <100.00%> (ø)
federationapi/internal/api.go 80.50% <100.00%> (ø)
userapi/userapi.go 72.44% <100.00%> (ø)
userapi/internal/device_list_update.go 74.52% <87.50%> (+1.11%) ⬆️

... and 4 files with indirect coverage changes


@S7evinK S7evinK changed the title Reduce DeviceListUpdater timeout to 1s Use IsBlacklistedOrBackingOff to determine if we should try to fetch devices Nov 1, 2023
@kegsay (Member) left a comment


Mostly good, a few things.

federationapi/routing/profile_test.go (resolved)
var federationClientError *fedsenderapi.FederationClientError
if errors.As(err, &federationClientError) {
	if federationClientError.Blacklisted {
		return
@kegsay (Member)

Needs unit tests.

hash := fnv.New32a()
_, _ = hash.Write([]byte(remoteServer))
index := int(int64(hash.Sum32()) % int64(len(u.workerChans)))

ch := u.assignChannel(userID)
deviceListUpdaterBackpressure.With(prometheus.Labels{"worker_id": strconv.Itoa(index)}).Inc()
defer deviceListUpdaterBackpressure.With(prometheus.Labels{"worker_id": strconv.Itoa(index)}).Dec()
@kegsay (Member)

Why remove? Using a defer is much cleaner than what we have now.

@S7evinK (Contributor, Author)

Just because we sent the work to the worker doesn't mean the back pressure is gone; the worker might still be processing the server, so decrementing doesn't make sense here.
We increment right before trying to send the work, and the worker decrements once it is done processing the server (either when skipping it or when actually done).
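
A minimal sketch of the gauge lifecycle being described (illustrative names, using the same prometheus client the diff references): the producer increments the per-worker gauge just before handing over the work, and only the worker decrements it once it is done with the server, so the gauge reflects outstanding work rather than queue depth alone.

```go
package main

import (
	"strconv"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical gauge standing in for deviceListUpdaterBackpressure.
var backpressure = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{Name: "device_list_updater_backpressure", Help: "sketch only"},
	[]string{"worker_id"},
)

// enqueue increments before the send: from this point the work is outstanding,
// whether it is still queued or already being processed.
func enqueue(ch chan string, server string, workerID int) {
	backpressure.With(prometheus.Labels{"worker_id": strconv.Itoa(workerID)}).Inc()
	ch <- server
}

// worker decrements only once it has finished with the server (processed or
// skipped), not when it takes the item off the channel.
func worker(ch chan string, workerID int, process func(string)) {
	for server := range ch {
		process(server)
		backpressure.With(prometheus.Labels{"worker_id": strconv.Itoa(workerID)}).Dec()
	}
}

func main() {
	ch := make(chan string, 1)
	go worker(ch, 0, func(string) {})
	enqueue(ch, "example.org", 0)
}
```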

userapi/internal/device_list_update.go (resolved)
// The channel is at capacity, don't try to send more work
if len(ch) == cap(ch) {
	continue
}
serversToRetry = serversToRetry[:0] // reuse memory
@kegsay (Member)

It's super unclear that this is nuking the serversToRetry slice, and why. Can we add more comments here please?
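
For readers of this thread, a minimal self-contained sketch (not the Dendrite code) of what the two lines above do: the capacity check skips a worker whose channel is already full, and re-slicing to zero length keeps the backing array so later appends reuse that memory.

```go
package main

import "fmt"

func main() {
	ch := make(chan string, 2)
	ch <- "a"
	ch <- "b"

	// The channel is at capacity: a send would block (or be dropped by a
	// non-blocking send), so don't try to queue more work this pass.
	if len(ch) == cap(ch) {
		fmt.Println("worker channel full, skipping")
	}

	serversToRetry := []string{"one.example", "two.example"}
	// Truncate to length 0 but keep the backing array (capacity stays 2),
	// so the next pass can append retry candidates without reallocating.
	serversToRetry = serversToRetry[:0]
	fmt.Println(len(serversToRetry), cap(serversToRetry)) // 0 2
}
```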

@@ -431,6 +460,7 @@ func (u *DeviceListUpdater) worker(ch chan spec.ServerName, workerID int) {
_, exists := retries[serverName]
retriesMu.Unlock()
if exists {
	deviceListUpdaterBackpressure.With(prometheus.Labels{"worker_id": strconv.Itoa(workerID)}).Dec()
@kegsay (Member)

I am unconvinced that this is going to track counts correctly. It is incremented for every call to notifyWorkers(userID) but only decremented under very specific scenarios (when not full, etc.).

@S7evinK (Contributor, Author)

See above.

case "localhost":
delete(expectedServers, serverName)
aliceCh <- true // unblock notifyWorkers
case "notlocalhost": // this should not happen as it is "filtered" away by the blacklist
@kegsay (Member)

Suggested change:
- case "notlocalhost": // this should not happen as it is "filtered" away by the blacklist
+ case unreachableServer: // this should not happen as it is "filtered" away by the blacklist

	userIDToChan:  make(map[string]chan bool),
	userIDToMutex: make(map[string]*sync.Mutex),
}
workerCh := make(chan spec.ServerName)
@kegsay (Member)

defer close(workerCh)?
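
A minimal sketch of what the suggestion amounts to (illustrative, not the actual test): deferring close(workerCh) lets a worker goroutine that ranges over the channel exit when the surrounding function returns instead of leaking.

```go
package main

func main() {
	workerCh := make(chan string, 1)
	// Closed on return, so the worker's range loop below terminates.
	defer close(workerCh)

	go func() {
		for server := range workerCh { // ends once workerCh is closed
			_ = server // process the server
		}
	}()

	workerCh <- "example.org"
}
```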

@S7evinK S7evinK merged commit 7863a40 into main Nov 9, 2023
15 of 18 checks passed
@S7evinK S7evinK deleted the s7evink/device-list-updater-again branch November 9, 2023 07:43