GetAllCharts takes too much time when there is a considerable amount of charts in backend storage #40
Wow, that is a lot of charts :) What do you suggest? Perhaps the solution is on the chartmuseum side, for example an option to only refresh the index every X minutes? Or maybe there are some improvements we can make to the Baidu backend.
Firstly, the time complexity of this function is O(N^2), which can be reduced to O(N) by using a more efficient data structure for lookups. Secondly, is it possible to provide a delete policy to reduce the number of charts?
@xiongkun01 - can you comment on efficiency improvements? In terms of a delete policy, please see helm/chartmuseum#316
Hi @jdolitsky, the PR above should fix it. Using maps, I brought the O(N^2) complexity down to O(N). Is there anything I can do to get it merged?
Merged, thank you @Retenodus! @xiongkun01 - would you be able to verify whether the master branch speeds things up for you?
The timing analysis of the /api/:repo/charts API is as follows:
2020-03-11T19:18:55.184+0800 DEBUG [1] Incoming request: /api/datacollect/charts {"reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
2020-03-11T19:19:06.957+0800 DEBUG [1] index-cache.yaml loaded {"repo": "datacollect", "reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
2020-03-11T19:19:06.957+0800 DEBUG [1] Fetching chart list from storage {"repo": "datacollect", "reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
2020-03-11T19:19:14.519+0800 DEBUG [1] start get object slice {"reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
2020-03-11T19:19:14.531+0800 DEBUG [1] objects length {"o1": 57376, "o2": 57370, "reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
2020-03-11T19:19:14.531+0800 DEBUG [1] start get object slice diff {"reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
2020-03-11T19:19:58.470+0800 DEBUG [1] Change detected between cache and storage {"repo": "datacollect", "reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
Note: the "start get object slice", "objects length", and "start get object slice diff" debug lines were added by me to make the analysis easier.
With 57,376 charts in the backend storage (BOS), the timing above shows that fetching the full list of files from the backend takes about 8 seconds, while calculating the difference between the cache and the backend data via cm_storage.GetObjectSliceDiff(objects, fo.objects) takes about 44 seconds.
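For illustration, below is a minimal, hypothetical Go sketch of the map-based approach discussed in this thread: a simplified Object type and a diff function that indexes one listing in a map keyed by path, so the comparison runs in O(N) instead of an O(N^2) nested-loop scan. This is not the actual chartmuseum/storage implementation (the real diff also tracks updated objects, e.g. by modification time); it only shows the technique.

```go
package main

import "fmt"

// Object is a simplified stand-in for a storage object;
// the real storage type carries more fields (content, timestamps, etc.).
type Object struct {
	Path string
}

// ObjectSliceDiff is a simplified diff result between two listings.
type ObjectSliceDiff struct {
	Added   []Object
	Removed []Object
	Change  bool
}

// getObjectSliceDiff computes the diff in O(N) by building map indexes
// keyed by object path, avoiding a nested loop over both slices.
func getObjectSliceDiff(cached, current []Object) ObjectSliceDiff {
	var diff ObjectSliceDiff

	cachedSet := make(map[string]struct{}, len(cached))
	for _, o := range cached {
		cachedSet[o.Path] = struct{}{}
	}

	currentSet := make(map[string]struct{}, len(current))
	for _, o := range current {
		currentSet[o.Path] = struct{}{}
		// Present in storage but not in the cache -> added.
		if _, ok := cachedSet[o.Path]; !ok {
			diff.Added = append(diff.Added, o)
		}
	}

	for _, o := range cached {
		// Present in the cache but not in storage -> removed.
		if _, ok := currentSet[o.Path]; !ok {
			diff.Removed = append(diff.Removed, o)
		}
	}

	diff.Change = len(diff.Added) > 0 || len(diff.Removed) > 0
	return diff
}

func main() {
	cached := []Object{{Path: "a-1.0.0.tgz"}, {Path: "b-1.0.0.tgz"}}
	current := []Object{{Path: "a-1.0.0.tgz"}, {Path: "c-1.0.0.tgz"}}
	fmt.Printf("%+v\n", getObjectSliceDiff(cached, current))
}
```

With tens of thousands of objects, the two map builds and lookups are linear in the number of objects, which is why this kind of change can shrink a 44-second diff to a fraction of a second.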