[Gosrc2cpg] download dependencies and caching improvements #4352

Merged (28 commits) on Mar 26, 2024

@pandurangpatil pandurangpatil commented Mar 18, 2024

  1. Removed some unwanted brackets.
  2. Parallelised downloading of the dependencies and processing them.
  3. Unit tests for the optimisation changes, which improve the processing time as well as reduce the memory footprint.
  4. Handling for used dependencies
    - Dependencies are downloaded only if they are used in the main code.
    - While optimising, came across a bug: if more than one package with the same name was created in the code, the package-level TypeDecl and NamespaceBlock were created only once. Introduced a few unit tests that cover these use cases and made the respective handling.
  5. Changes to not cache unwanted imports
    - Made changes to not cache unwanted imports from dependency source code.
    - Made changes to cache only the imports used in source code: non-aliased imports go to the global cache, and aliased ones are kept in the context of the file AstCreator.
    - Caching only those packages whose package name is different from the enclosing folder name inside the global cache.
  6. Changes to not cache unused lambdaTypeInformation.
  7. Optimisations to store method metadata along with struct type metadata
    - Changed the storage structure to minimize the amount of data being stored for the method metadata cache and struct type members' type information.
    - Made respective changes to fix all the breaking unit tests.
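The import caching rules in item 5 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the gosrc2cpg implementation: `GoImport`, `globalPackageNames`, `recordPackage`, `packageNameFor`, and `fileAliases` are hypothetical names invented for the example.

```scala
import scala.collection.concurrent.TrieMap

object ImportCacheSketch {
  // Hypothetical model of a Go import; `alias` is None for a non-aliased import.
  final case class GoImport(path: String, alias: Option[String])

  // Hypothetical global cache: import path -> declared package name.
  val globalPackageNames: TrieMap[String, String] = TrieMap.empty

  // Last path segment, i.e. the enclosing folder name of the package.
  private def folderName(importPath: String): String = importPath.split('/').last

  /** Store a package name globally only when it differs from the folder name. */
  def recordPackage(importPath: String, declaredName: String): Unit =
    if (declaredName != folderName(importPath)) globalPackageNames.put(importPath, declaredName)

  /** Look up the package name, falling back to the folder name when nothing is cached. */
  def packageNameFor(importPath: String): String =
    globalPackageNames.getOrElse(importPath, folderName(importPath))

  /** Alias table kept per file, standing in for state held by a per-file AstCreator. */
  def fileAliases(imports: List[GoImport]): Map[String, String] =
    imports.collect { case GoImport(path, Some(alias)) => alias -> path }.toMap
}
```

Because the folder name is a usable default, only the exceptions need a global entry, which is what keeps the cache small in this sketch.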

@pandurangpatil pandurangpatil changed the title [WIP] Go download dependencies and caching improvements [WIP] [Gosrc2cpg] download dependencies and caching improvements Mar 18, 2024
@pandurangpatil pandurangpatil added the go Relates to gosrc2cpg label Mar 18, 2024
.filter(dep => !dep.beingUsed)
.map(dependency => {
Future {
val dependencyStr = s"${dependency.module}@${dependency.version}"
Collaborator

I suppose go get will handle this, but one issue that comes to mind is rate limiting of the dependency APIs

Collaborator

Would also be good to have a version of this where the dependencies are downloaded directly from the APIs like C# and Ruby so that we don't require go as a dependency

Contributor Author

> I suppose go get will handle this, but one issue that comes to mind is rate limiting of the dependency APIs

go get doesn't parallelise the downloads. I ran a test to check whether parallelising improves the processing time, and I do see improvements.
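A minimal sketch of the parallel download idea, with a bounded thread pool as one possible way to limit pressure on the dependency APIs. It is an assumption-laden illustration rather than the code in this PR: `Dependency` is a hypothetical stand-in for the metadata used in the diff, `downloadAll` and `maxParallel` are invented names, and `fetch` is a placeholder for whatever actually retrieves a module (for example by shelling out to `go get`). It keeps only dependencies flagged as used, following item 4 of the description.

```scala
import java.util.concurrent.Executors
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

object ParallelDownloadSketch {
  // Hypothetical stand-in for the dependency metadata referenced in the diff.
  final case class Dependency(module: String, version: String, beingUsed: Boolean)

  /** Runs `fetch` for every used dependency on at most `maxParallel` threads.
    * Returns a map from "module@version" to the result reported by `fetch`.
    */
  def downloadAll(deps: List[Dependency], maxParallel: Int)(fetch: String => Boolean): Map[String, Boolean] = {
    val pool = Executors.newFixedThreadPool(maxParallel)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = deps
        .filter(_.beingUsed) // skip dependencies the main code never uses
        .map { dep =>
          val dependencyStr = s"${dep.module}@${dep.version}"
          Future(dependencyStr -> fetch(dependencyStr))
        }
      // Block until every download has finished (the timeout is an arbitrary choice).
      Await.result(Future.sequence(futures), 10.minutes).toMap
    } finally pool.shutdown()
  }
}
```

Capping the pool size is one simple lever if rate limiting turns out to be a problem; the right value would have to be found by measurement.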

Contributor Author

@pandurangpatil pandurangpatil Mar 18, 2024

> Would also be good to have a version of this where the dependencies are downloaded directly from the APIs like C# and Ruby so that we don't require go as a dependency

I think that will take time. I will handle it through a separate ticket, which will cover this as well as the ProgramSummary incorporation.

1. Download dependencies only if they are used in the main code.
2. While optimising, came across a bug: if more than one package with
the same name was created in the code, the package-level `TypeDecl` and
`NamespaceBlock` were created only once. Introduced a few unit tests
that cover these use cases and made the respective handling.

1. Made changes to not cache unwanted imports from dependency source
code.
2. Made changes to cache only used imports in source code: non-aliased
imports go to the global cache and aliased ones are kept in the context
of the file `AstCreator`.
3. Caching only those packages whose package name is different from the
enclosing folder name inside the global cache.

Optimisation to cache lambda type info

1. Changed the storage structure to minimize the amount of data being
stored for the method metadata cache and struct type members' type
information.
2. Made respective changes to fix all the breaking unit tests.

1. While making the optimisations, found that when the main source code
package is itself imported and processed, in some cases the `TypeDecl`
for package-level global variables wasn't getting created.
2. Identified the issue and made a fix for the same.
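The same-name package bug mentioned in the commit notes above can be illustrated with a small sketch. It assumes the root cause is a "create once" check keyed on the bare package name, and shows keying on the full import path instead; the actual fix in gosrc2cpg may well differ. `PackageNode`, `Registry`, and `createIfAbsent` are hypothetical names used only here.

```scala
import scala.collection.mutable

object PackageDedupSketch {
  // Hypothetical record of an emitted package-level node.
  final case class PackageNode(fullName: String, name: String)

  final class Registry {
    // Keyed by import path, so two packages that share a short name do not collide.
    private val seen = mutable.LinkedHashMap.empty[String, PackageNode]

    /** Registers the package once per import path; returns true if it was newly created. */
    def createIfAbsent(importPath: String, packageName: String): Boolean = synchronized {
      if (seen.contains(importPath)) false
      else {
        seen.put(importPath, PackageNode(importPath, packageName))
        true
      }
    }

    /** Nodes in creation order. */
    def created: List[PackageNode] = synchronized(seen.values.toList)
  }
}
```

With a check keyed on `packageName` alone, the second `util` package in a project would be skipped; keying on the path lets both receive their own node.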
@pandurangpatil pandurangpatil changed the title [WIP] [Gosrc2cpg] download dependencies and caching improvements [Gosrc2cpg] download dependencies and caching improvements Mar 26, 2024
@pandurangpatil pandurangpatil marked this pull request as ready for review March 26, 2024 10:11
@DavidBakerEffendi DavidBakerEffendi merged commit 5106dd3 into joernio:master Mar 26, 2024
5 checks passed
@pandurangpatil pandurangpatil deleted the go-download-impr branch March 26, 2024 12:47
@pandurangpatil pandurangpatil restored the go-download-impr branch March 26, 2024 12:47

2 participants