Parallel processing #17

Open
poma opened this issue Sep 18, 2017 · 5 comments · May be fixed by #56

@poma
Member

poma commented Sep 18, 2017

Uploader performance can be greatly increased by introducing the following optimizations:

  • Parse replays on multiple threads to saturate all CPU cores
  • Upload replays on 2-4 separate threads
  • Check for duplicates using the bulk API POST /fingerprints whenever the upload threads exhaust the pool of already-checked replays
@poma
Member Author

poma commented Nov 23, 2017

Using the bulk fingerprint check increases complexity a lot (at least I don't know an easy way to do it), so for now I have only implemented naive multithreading with a single queue.
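
For readers following along, "naive multithreading with a single queue" could look roughly like the sketch below. It is written in Python purely for illustration and is not the uploader's actual code; `process_all`, `handle`, and `worker_count` are hypothetical names, and `handle` stands in for whatever parse-and-upload step is applied to each replay.

```python
import queue
import threading


def process_all(replays, handle, worker_count=4):
    """Run handle(replay) for every replay using threads fed by one shared queue."""
    work = queue.Queue()
    for replay in replays:
        work.put(replay)

    results = []
    lock = threading.Lock()  # guards the shared results list

    def worker():
        while True:
            try:
                replay = work.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            outcome = handle(replay)
            with lock:
                results.append((replay, outcome))

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Every worker pulls from the same queue, so no replay is handled twice, but the order in which replays finish is not deterministic.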

@poma
Member Author

poma commented Nov 23, 2017

It looks like the multithreaded uploader can hit API throttle limits if there is a long list of replays consisting almost entirely of duplicates (for example, when someone lost or deleted their replay cache). Implementing the bulk check can fix that, so I guess I still need to do it. To keep things simple, I can put new replays found on launch in a Dataflow while still processing new ones with a standard loop.
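
A minimal sketch of why a bulk check helps with throttling, in Python for illustration only: fingerprints are sent in batches, so a long list of duplicates costs a handful of requests instead of one per replay. The `post_fingerprints` callable is a hypothetical stand-in for a client of the bulk `POST /fingerprints` endpoint; the assumption that it returns the fingerprints the server already has, and the batch size of 100, are mine and not taken from the API.

```python
from typing import Callable, Iterable, Iterator, List, Set


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of items, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def find_new_replays(
    fingerprints: List[str],
    post_fingerprints: Callable[[List[str]], Iterable[str]],
    batch_size: int = 100,
) -> List[str]:
    """Return the fingerprints the server does not know yet, in input order.

    post_fingerprints is assumed (hypothetically) to take one batch and
    return the subset of that batch that is already uploaded.
    """
    known: Set[str] = set()
    for batch in chunked(fingerprints, batch_size):
        known.update(post_fingerprints(batch))  # one request per batch
    return [fp for fp in fingerprints if fp not in known]
```

With 250 local replays and a batch size of 100, this makes three requests regardless of how many of the replays turn out to be duplicates.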

@martijnhoekstra martijnhoekstra linked a pull request Dec 4, 2019 that will close this issue
@martijnhoekstra
Collaborator

heroesprofile mentions in https://discordapp.com/channels/650747275886198815/651068646025592832 that they would prefer not to change the client at all.
I'm not sure how to link to a specific message on Discord.

@Zemill

Zemill commented Dec 4, 2019

This is where I am coming from.

I would like the replay uploader to upload from Oldest to Newest, and I would like it to do so sequentially.

I am not sure what reason we have to do otherwise, other than to speed up the uploads? The standard user uploads after every game, so it is a non-issue. Even if they have a few games, it really isn't an issue. So we are making an update to resolve an inconvenience for the few who have never uploaded.

I am not opposed to making updates; I am opposed to making updates that make the data harder to use for the developers.

@martijnhoekstra
Collaborator

> I would like the replay uploader to upload from Oldest to Newest, and I would like it to do so sequentially.

This can only ever be best-effort. You can't know what other users have uploaded at the time of upload, and there will be uploads that are older than the newest upload. Disallowing that is infeasible.

On the local machine, it's also best effort, depending on where the files are.

Regardless, the meat of the PR is parallelizing the fingerprint checking and doing it in bulk. I'm happy to do uploads fully sequentially, in order of replay time. That does require first parsing all available replays to even find the replay time, though, which would also be nice to do in parallel.
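
To make that trade-off concrete, here is a rough Python sketch (not the PR's code) of parsing in parallel and then uploading strictly oldest first. `parse_time` and `upload` are hypothetical callables: the first is assumed to return a sortable replay timestamp for a path, the second to upload one replay.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_oldest_first(paths, parse_time, upload, parse_workers=4):
    """Parse all replays concurrently, then upload one at a time by replay time."""
    # Phase 1: find every replay's timestamp in parallel.
    # pool.map returns results in the same order as `paths`.
    with ThreadPoolExecutor(max_workers=parse_workers) as pool:
        times = list(pool.map(parse_time, paths))

    # Phase 2: sort by timestamp (ties broken by path) and upload sequentially.
    ordered = [path for _, path in sorted(zip(times, paths))]
    for path in ordered:
        upload(path)
    return ordered
```

Nothing is uploaded until every replay has been parsed, which is the cost of a guaranteed local ordering. In CPython specifically, CPU-bound parsing would need a process pool rather than threads to use several cores; threads are used here only to keep the sketch short.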


3 participants