Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize deduplication while processing held tasks #2203

Open
wants to merge 39 commits into
base: master
Choose a base branch
from

Conversation

rahul-privado
Copy link
Contributor

@rahul-privado rahul-privado commented Jan 20, 2023

Description

The list given to deduplicateTableEntries() becomes 30,000 long for snap ( 375 MB CPG ) in many cases and the function invocation then takes 40 milliseconds. 80% of the time is spend in .groupBy() and rest in the .map(). Most of the computation here is repeated since most of the elements are from the old list.

Cached the repeated part in 2 maps to avoid repetition.

Testing done

Verified for 6 different CPGs that the flow counts match with master.
While the flow counts including source and sinks are the same, some of the intermediate calls in the flows are different. This change results in some of the detours not being taken. However, the path is still the same from source to sink.

The difference is due to the union between the earlier stored and he new group sets when there are common keys. The merged group has same key and value hash code as the old group. However, it is not equal to what it would have been had it been fully recomputed.

Performance impact

The held task computation time for OpenCRX comes down from 35 seconds to 8 seconds with this change.

rahul-privado and others added 30 commits January 18, 2023 17:52
2. Removed caching for merge list since the new table entries would need a recomputation
2. Used parallelization
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant