
GetAllCharts takes too much time when there is a considerable amount of charts in backend storage #40

Open
xiongkun01 opened this issue Mar 12, 2020 · 5 comments · Fixed by #41



xiongkun01 commented Mar 12, 2020

The time analysis of the /api/:repo/charts API is as follows:

2020-03-11T19:18:55.184+0800 DEBUG [1] Incoming request: /api/datacollect/charts {"reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
2020-03-11T19:19:06.957+0800 DEBUG [1] index-cache.yaml loaded {"repo": "datacollect", "reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
2020-03-11T19:19:06.957+0800 DEBUG [1] Fetching chart list from storage {"repo": "datacollect", "reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
2020-03-11T19:19:14.519+0800 DEBUG [1] start get object slice {"reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
2020-03-11T19:19:14.531+0800 DEBUG [1] objects length {"o1": 57376, "o2": 57370, "reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}
2020-03-11T19:19:14.531+0800 DEBUG [1] start get object slice diff {"reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}

2020-03-11T19:19:58.470+0800 DEBUG [1] Change detected between cache and storage {"repo": "datacollect", "reqID": "a543f2fe-53b5-469a-b90d-cc80a7f36657"}

Note: the bold parts were added by me to make the analysis easier.

When the backend storage (BOS) holds 57,376 charts, the timing analysis shows that it takes about 8 seconds to fetch the list of all files from the backend, and about 44 seconds to compute the difference between the cache and the backend data via cm_storage.GetObjectSliceDiff(objects, fo.objects).
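For context, a naive slice diff compares every cached object against every storage object. The sketch below uses illustrative types and names, not the actual chartmuseum/storage code; it only shows why that pattern is quadratic:

```go
// Illustrative type only; the real chartmuseum/storage object differs.
type Object struct {
	Path    string
	Content []byte
}

// naiveDiff returns every object present in storage but missing from cache.
// Each of the N storage objects is compared against up to N cached objects,
// so the work grows as O(N^2): at ~57,000 charts that is on the order of
// three billion string comparisons.
func naiveDiff(cache, storage []Object) []Object {
	var added []Object
	for _, s := range storage {
		found := false
		for _, c := range cache {
			if c.Path == s.Path {
				found = true
				break
			}
		}
		if !found {
			added = append(added, s)
		}
	}
	return added
}
```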


@jdolitsky
Collaborator

Wow, that is a lot of charts :)

What do you suggest? Perhaps the solution is on the chartmuseum side, for example an option to only refresh the index every X minutes? Or maybe there are some improvements we can make to the Baidu backend.

@xiongkun01
Author

Firstly, the time complexity of this function is O(N^2); it could be reduced to O(N) by using a more efficient data structure, such as a map, for lookups. Secondly, would it be possible to provide some delete policy to reduce the number of charts?
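To illustrate the suggestion, a map-based lookup over the same illustrative Object type makes the diff linear. This is a sketch only, not the actual patch that later landed in #41:

```go
// Same diff using a map as an index. Building the index is O(N) and the
// scan over storage is O(N), so the whole diff is linear instead of quadratic.
func mapDiff(cache, storage []Object) []Object {
	index := make(map[string]struct{}, len(cache))
	for _, c := range cache {
		index[c.Path] = struct{}{}
	}
	var added []Object
	for _, s := range storage {
		if _, ok := index[s.Path]; !ok {
			added = append(added, s)
		}
	}
	return added
}
```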

@jdolitsky
Collaborator

@xiongkun01 - can you comment on efficiency improvements?

In terms of delete policy, please see helm/chartmuseum#316

@Retenodus
Contributor

Hi @jdolitsky, the PR above should fix it. Using maps, I brought the O(N^2) complexity down to O(N). Is there anything I can do to get it merged?

jdolitsky reopened this May 5, 2020
@jdolitsky
Collaborator

Merged, thank you @Retenodus!

@xiongkun01 - would you be able to verify whether the master branch speeds things up for you?
