-
Notifications
You must be signed in to change notification settings - Fork 296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[gosrc2cpg] Memory cleanup #4407
Conversation
Queued processing of one dependency at one time post parallel proessing of download. 1. Delinked downloading and processing of the dependency. 2. In one thread we are downloading the dependencies and queuing them to be processed in separate writer thread.
protected val declaredPackageName = parserResult.json(ParserKeys.Name)(ParserKeys.Name).str | ||
protected val fullyQualifiedPackage = | ||
goMod.getNameSpace(parserResult.fullPath, declaredPackageName) | ||
protected var aliasToNameSpaceMapping: mutable.Map[String, String] = mutable.Map.empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 49-56 is going to become technical debt and hard to maintain. Can we try to find a solution that avoids so much global mutable state in this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the code below, perhaps you want to look at what happens in AstSummaryVisitor
in RubySrc2Cpg
and CSharpSrc2Cpg
for how to do a pre-parse without all this mutable state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least, we need to move these variables to another trait, as it's becoming very difficult to follow this file.
*/ | ||
private def deserialise(bytes: Array[Byte]): Any = { | ||
val ois = new ObjectInputStream(new ByteArrayInputStream(bytes)) | ||
val value = ois.readObject |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readObject
is an RCE waiting to happen, it is an extremely dangerous function.
We have a whole talk at BSides about exploiting Java deserialization https://www.youtube.com/watch?v=8qo0ZGK0tt0
https://msgpack.org is on the classpath from ODB, rather serialize data using that format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess serialising the alias to namespace mapping is anyway shouldn't make much of the difference. I need to try removing this.
}) | ||
} | ||
|
||
def cleanup(): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather suggest using close
and having AstCreator
extend AutoCloseable
, this can enforce Using.resource(astCreator) =>
which guarantees this global mutable state is at least closed otherwise we may sit with an implicit protocol which may not be followed and result in a memory leak.
This is no more valid |
As we are processing the main source code in two phases. We create the initial list of
AstCreator
first pass and we hold the data like ParserResult, Line no to line code mapping, parserNodeInfo cache etc for each file. Till the time we don't run through the second phase.parserNodeInfo
cache as upiclke map is not serializable. Rebuild the released information back in the second phase when it starts processing.Future
threads (which are created per dependency) were getting blocked waiting for the results. In turn lesser number ofFuture
threads being used or available while processing the dependency files. Made changes and removed the Futures while downloading the dependencies, anyway we had to synchronise the section which executes the command. Processing of individual dependency code is queued in the queue which will be picked up by another writer thread. This will make sure processing is triggered in parallel, however it will also make sure only one dependency code is being processed at time.