[Gosrc2cpg] download dependencies and caching improvements #4352
1. Removed some unwanted brackets. 2. Parallelised the downloading and processing of dependencies.
1. Record which dependencies, and which of their subpackages, are used. 2. Download only those dependencies that are directly imported or used.
.filter(dep => !dep.beingUsed)
.map(dependency => {
  Future {
    val dependencyStr = s"${dependency.module}@${dependency.version}"
I suppose `go get` will handle this, but one issue that comes to mind is rate limiting of the dependency APIs.
It would also be good to have a version of this where the dependencies are downloaded directly from the APIs, as for C# and Ruby, so that we don't require `go` as a dependency.
> I suppose `go get` will handle this, but one issue that comes to mind is rate limiting of the dependency APIs

`go get` doesn't parallelise it. I ran a test to check whether parallelising improves the processing time, and I do see improvements.
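To make the trade-off between parallel downloads and rate limiting concrete, here is a minimal sketch, not the code from this PR. It runs one `Future` per used dependency on a fixed-size thread pool, so `maxParallel` caps how many requests are in flight at once. `Dependency`, `ParallelDownload`, `downloadUsed`, `maxParallel` and `fetch` are hypothetical names; `fetch` is injected so the sketch runs without a Go toolchain, and the filter keeps entries flagged as used, following the commit description.

```scala
import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}

// Hypothetical stand-in for the dependency metadata referenced in the diff.
final case class Dependency(module: String, version: String, beingUsed: Boolean)

object ParallelDownload {

  // Downloads every used dependency, at most `maxParallel` at a time.
  // `fetch` is an assumption: in practice it would shell out to something
  // like `go get module@version` and report success or failure.
  def downloadUsed(deps: Seq[Dependency], maxParallel: Int)(
    fetch: String => Boolean
  ): Map[String, Boolean] = {
    // A fixed pool bounds the number of concurrent requests, which is one
    // simple way to stay clear of registry rate limits.
    val pool = Executors.newFixedThreadPool(maxParallel)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = deps
        .filter(_.beingUsed) // skip dependencies the main code never imports
        .map { dep =>
          val dependencyStr = s"${dep.module}@${dep.version}"
          Future(dependencyStr -> fetch(dependencyStr))
        }
      // Block until all downloads finish; an exception in `fetch` is rethrown here.
      Await.result(Future.sequence(futures), Duration.Inf).toMap
    } finally pool.shutdown()
  }
}
```

Called as `ParallelDownload.downloadUsed(deps, maxParallel = 4)(runGoGet)`, where `runGoGet` is whatever function actually performs the download, this returns a map from `module@version` to the download outcome.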
> Would also be good to have a version of this where the dependencies are downloaded directly from the APIs like C# and Ruby so that we don't require go as a dependency

I think that will take time. I will handle it through a separate ticket, covering this as well as the `ProgramSummary` incorporation.
joern-cli/frontends/gosrc2cpg/src/main/scala/io/joern/gosrc2cpg/GoSrc2Cpg.scala
1. Download dependencies only if they are used in the main code. 2. While optimising, found a bug: when more than one package with the same name existed in the code, the package-level `TypeDecl` and `NamespaceBlock` were created only once. Added a few unit tests covering these cases, along with the corresponding handling.
1. Stopped caching unwanted imports from dependency source code. 2. Cache only the imports used in source code: non-aliased imports go to the global cache, aliased ones are kept in the context of the file's `AstCreator`. 3. In the global cache, store only those packages whose package name differs from the enclosing folder name.
Optimisation to cache lambda type info
1. Changed the storage structure to minimise the data stored for the method metadata cache and for struct member type information. 2. Made the corresponding changes to fix all the breaking unit tests.
1. After these optimisations, when the main source code package was itself imported and processed, the `TypeDecl` for package-level global variables was in some cases not created. 2. Identified the issue and fixed it.
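The same-name package bug mentioned in the commits above is a common pitfall when deduplicating by short name. The commits do not say how it was fixed; the sketch below shows one plausible approach, keying the "already created" check on the fully qualified package path instead of the package name. `PackageDeclRegistry` and `shouldCreate` are hypothetical names, not part of gosrc2cpg.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical registry deciding whether package-level nodes
// (for example a TypeDecl and a NamespaceBlock) still need to be created.
final class PackageDeclRegistry {

  // A concurrent set, because files may be processed in parallel.
  private val seen = ConcurrentHashMap.newKeySet[String]()

  // Returns true only the first time a given fully qualified package path
  // is seen. Keying on the full path keeps two packages that share a short
  // name, such as two different `util` packages, apart.
  def shouldCreate(fullPackagePath: String): Boolean =
    seen.add(fullPackagePath)
}
```

With the short name as the key, the second `util` package would be treated as already handled, which matches the symptom described in the commit message.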
- Download dependencies only if they are used in the main code.
- While optimising, found a bug: when more than one package with the same name existed in the code, the package-level `TypeDecl` and `NamespaceBlock` were created only once. Added a few unit tests covering these cases, along with the corresponding handling.
- Stopped caching unwanted imports from dependency source code.
- Cache only the imports used in source code: non-aliased imports go to the global cache, aliased ones are kept in the context of the file's `AstCreator`.
- In the global cache, store only those packages whose package name differs from the enclosing folder name.
- Changed the storage structure to minimise the data stored for the method metadata cache and for struct member type information.
- Made the corresponding changes to fix all the breaking unit tests.
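The caching rules in the list above can be illustrated with a small sketch. It is written under assumptions and does not reproduce the actual gosrc2cpg data structures: `GoImport`, `PackageNameCache` and `FileImportContext` are hypothetical names. The global cache stores a package name only when it differs from the last segment of the import path, and aliased imports are kept in a per-file context.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical model of a Go import statement; `alias` is set for `import h "net/http"`.
final case class GoImport(path: String, alias: Option[String])

// Global cache: import path -> declared package name, stored only when the
// declared name differs from the enclosing folder name.
object PackageNameCache {
  private val packageNames = new ConcurrentHashMap[String, String]()

  private def folderName(importPath: String): String = importPath.split('/').last

  def record(importPath: String, declaredPackage: String): Unit =
    if (declaredPackage != folderName(importPath))
      packageNames.putIfAbsent(importPath, declaredPackage)

  // When nothing is cached, the folder name is the package name by construction.
  def packageNameFor(importPath: String): String =
    Option(packageNames.get(importPath)).getOrElse(folderName(importPath))

  def size: Int = packageNames.size()
}

// Per-file context: aliases are only meaningful inside the file that declares them.
final class FileImportContext(imports: Seq[GoImport]) {
  private val aliases: Map[String, String] =
    imports.collect { case GoImport(path, Some(alias)) => alias -> path }.toMap

  // Resolves a qualifier such as `h` or `yaml` to an import path: file-local
  // aliases first, then non-aliased imports via the global cache.
  def resolve(qualifier: String): Option[String] =
    aliases.get(qualifier).orElse(imports.collectFirst {
      case GoImport(path, None) if PackageNameCache.packageNameFor(path) == qualifier => path
    })
}
```

In this sketch the common case, where the package name equals the folder name, costs no cache entry at all, which is the kind of saving the list describes.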