[Gosrc2cpg] download dependencies and caching improvements #4352

Merged
merged 28 commits into from
Mar 26, 2024
46ef88b
Some code refactor and optimisations
pandurangpatil Mar 12, 2024
e611fcb
Partial changes to add failing unit tests.
pandurangpatil Mar 13, 2024
7f277c5
Merge branch 'master' into go-download-impr
pandurangpatil Mar 13, 2024
5401c17
Unit tests to cover expected situations to lower the memory footprint
pandurangpatil Mar 14, 2024
fee3296
Merge branch 'master' into go-download-impr
pandurangpatil Mar 14, 2024
bb104b4
minor changes
pandurangpatil Mar 15, 2024
f0d66c5
Merge branch 'master' into go-download-impr
pandurangpatil Mar 15, 2024
19e2c25
Merge branch 'master' into go-download-impr
pandurangpatil Mar 18, 2024
aef54ca
Initial download dependency optimisation
pandurangpatil Mar 18, 2024
bcc6a4a
Merge branch 'master' into go-download-impr
pandurangpatil Mar 18, 2024
d595227
handling for used dependencies
pandurangpatil Mar 18, 2024
3222da6
Merge branch 'master' into go-download-impr
pandurangpatil Mar 18, 2024
5cfe3fa
Fixing one more unit test from first
pandurangpatil Mar 18, 2024
ae4fa1d
changes to not cache unwanted imports
pandurangpatil Mar 18, 2024
cf455e3
Merge branch 'master' into go-download-impr
pandurangpatil Mar 18, 2024
cc7f1d6
few test corrections as per updated changes
pandurangpatil Mar 18, 2024
1ea95ea
minor updates
pandurangpatil Mar 18, 2024
206c06e
Merge branch 'master' into go-download-impr
pandurangpatil Mar 21, 2024
e0cfa7f
Optimisation to cache lamdbda type info
pandurangpatil Mar 21, 2024
4b47630
Merge branch 'master' into go-download-impr
pandurangpatil Mar 21, 2024
fdefb9d
optimisations to store method meta data along with strcut type metata
pandurangpatil Mar 23, 2024
89d843f
Merge branch 'master' into go-download-impr
pandurangpatil Mar 23, 2024
d942fd7
Merge branch 'master' into go-download-impr
pandurangpatil Mar 25, 2024
e2d11e3
not caching namespaces having starting letter in small case
pandurangpatil Mar 25, 2024
967b878
Fix for issue related to package TypeDecl
pandurangpatil Mar 25, 2024
1dfe680
Ignoring few unit tests which needs to be updated with improvements.
pandurangpatil Mar 26, 2024
0506a10
Merge branch 'master' into go-download-impr
pandurangpatil Mar 26, 2024
ebce63b
review comment fixes
pandurangpatil Mar 26, 2024
Original file line number Diff line number Diff line change
@@ -22,31 +22,36 @@ import io.shiftleft.codepropertygraph.generated.Languages
import java.nio.file.Paths
import scala.util.Try

class GoSrc2Cpg extends X2CpgFrontend[Config] {
class GoSrc2Cpg(goGlobalOption: Option[GoGlobal] = Some(GoGlobal())) extends X2CpgFrontend[Config] {
private val report: Report = new Report()

private var goMod: Option[GoModHelper] = None
def createCpg(config: Config): Try[Cpg] = {
withNewEmptyCpg(config.outputPath, config) { (cpg, config) =>
File.usingTemporaryDirectory("gosrc2cpgOut") { tmpDir =>
val goGlobal = GoGlobal()
val goGlobal = goGlobalOption.getOrElse(GoGlobal())
new MetaDataPass(cpg, Languages.GOLANG, config.inputPath).createAndApply()
val astGenResult = new AstGenRunner(config).execute(tmpDir).asInstanceOf[GoAstGenRunnerResult]
val goMod = new GoModHelper(
Some(config),
astGenResult.parsedModFile.flatMap(modFile => GoAstJsonParser.readModFile(Paths.get(modFile)).map(x => x))
goMod = Some(
new GoModHelper(
Some(config),
astGenResult.parsedModFile.flatMap(modFile => GoAstJsonParser.readModFile(Paths.get(modFile)).map(x => x))
)
)
if (config.fetchDependencies) {
goGlobal.processingDependencies = true
new DownloadDependenciesPass(goMod, goGlobal, config).process()
new DownloadDependenciesPass(goMod.get, goGlobal, config).process()
goGlobal.processingDependencies = false
}
val astCreators =
new MethodAndTypeCacheBuilderPass(Some(cpg), astGenResult.parsedFiles, config, goMod, goGlobal).process()
new MethodAndTypeCacheBuilderPass(Some(cpg), astGenResult.parsedFiles, config, goMod.get, goGlobal).process()
new AstCreationPass(cpg, astCreators, report).createAndApply()
if goGlobal.pkgLevelVarAndConstantAstMap.size() > 0 then
new PackageCtorCreationPass(cpg, config, goGlobal).createAndApply()
report.print()
}
}
}

def getGoModHelper: GoModHelper = goMod.get
}
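The constructor change above lets a caller supply its own `GoGlobal` instead of having one created inside `createCpg`. One plausible use, sketched here with hypothetical stand-in classes (`Cache` and `Frontend` are not part of the PR), is letting a test keep a handle on the cache and inspect it after the frontend has run:

```scala
import java.util.concurrent.ConcurrentHashMap

object InjectableCacheSketch {
  // Stand-in for GoGlobal: holds a single alias-to-namespace map.
  final class Cache {
    val aliases: ConcurrentHashMap[String, String] = new ConcurrentHashMap()
  }

  // Stand-in for GoSrc2Cpg: uses the injected cache when one is given.
  final class Frontend(cacheOption: Option[Cache] = Some(new Cache)) {
    def run(imports: Map[String, String]): Unit = {
      val cache = cacheOption.getOrElse(new Cache)
      imports.foreach { case (alias, namespace) => cache.aliases.putIfAbsent(alias, namespace) }
    }
  }
}
```

A caller that passes `Some(cache)` can read `cache.aliases` afterwards; a caller that relies on the default never sees the cache at all.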
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@ import scala.util.{Success, Try}
trait AstForTypeDeclCreator(implicit withSchemaValidation: ValidationMode) { this: AstCreator =>

protected def astForTypeSpec(typeSpecNode: ParserNodeInfo): Seq[Ast] = {
val (name, fullName, memberAsts) = processTypeSepc(typeSpecNode.json)
val (name, fullName, memberAsts) = processTypeSepc(createParserNodeInfo(typeSpecNode.json))
val typeDeclNode_ =
typeDeclNode(typeSpecNode, name, fullName, relPathFileName, typeSpecNode.code)
val modifier = addModifier(typeDeclNode_, name)
Original file line number Diff line number Diff line change
@@ -21,6 +21,8 @@ trait CacheBuilder(implicit withSchemaValidation: ValidationMode) { this: AstCre
cpgOpt.map { _ =>
// We don't want to process this part when third party dependencies are being processed.
val result = goGlobal.recordAliasToNamespaceMapping(declaredPackageName, fullyQualifiedPackage)
// TODO: we need to add this mapping only when declared package name is not matching
// with ending path string of fullyQualifiedPackage
if (result == null) {
// A null result means the item was added for the first time; otherwise it was already present in the global map.
val rootNode = createParserNodeInfo(parserResult.json)
@@ -86,15 +88,14 @@ trait CacheBuilder(implicit withSchemaValidation: ValidationMode) { this: AstCre
ParserKeys.NodeReferenceId
)
) {
processImports(obj)
processImports(obj, true)
} else if (
json.obj
.contains(ParserKeys.NodeType) && obj(ParserKeys.NodeType).str == "ast.TypeSpec" && !json.obj.contains(
ParserKeys.NodeReferenceId
)
) {
createParserNodeInfo(obj)
processTypeSepc(obj)
processTypeSepc(createParserNodeInfo(obj))
} else if (
json.obj
.contains(ParserKeys.NodeType) && obj(ParserKeys.NodeType).str == "ast.FuncDecl" && !json.obj.contains(
@@ -118,14 +119,14 @@ trait CacheBuilder(implicit withSchemaValidation: ValidationMode) { this: AstCre
}
}

protected def processTypeSepc(typeSepc: Value): (String, String, Seq[Ast]) = {
val name = typeSepc(ParserKeys.Name)(ParserKeys.Name).str
protected def processTypeSepc(typeSepc: ParserNodeInfo): (String, String, Seq[Ast]) = {
val name = typeSepc.json(ParserKeys.Name)(ParserKeys.Name).str
if (checkForDependencyFlags(name)) {
// Skip recording the Type details when processing dependency code if the Type name starts with a lower-case letter.
// Types starting with a lower-case letter are only accessible within their own package, which means
// these Types will never be referenced from the main source code.
val fullName = fullyQualifiedPackage + Defines.dot + name
val typeNode = createParserNodeInfo(typeSepc(ParserKeys.Type))
val typeNode = createParserNodeInfo(typeSepc.json(ParserKeys.Type))
val ast = typeNode.node match {
// As of now, we don't see any use case where InterfaceType needs to be handled.
case InterfaceType => Seq.empty
@@ -140,14 +141,25 @@ trait CacheBuilder(implicit withSchemaValidation: ValidationMode) { this: AstCre
("", "", Seq.empty)
}

protected def processImports(importDecl: Value): (String, String) = {
protected def processImports(importDecl: Value, recordFindings: Boolean = false): (String, String) = {
val importedEntity = importDecl(ParserKeys.Path).obj(ParserKeys.Value).str.replaceAll("\"", "")
val importedAs =
if (recordFindings) {
goMod.recordUsedDependencies(importedEntity)
}
val importedAsOption =
Try(importDecl(ParserKeys.Name).obj(ParserKeys.Name).str).toOption
.getOrElse(importedEntity.split("/").last)

aliasToNameSpaceMapping.put(importedAs, importedEntity)
(importedEntity, importedAs)
importedAsOption match {
case Some(importedAs) =>
if (recordFindings)
goGlobal.recordAliasToNamespaceMapping(importedAs, importedEntity)
(importedEntity, importedAs)
case _ =>
// These aliases could be different for each file, hence we maintain the cache at file level.
val derivedImportedAs = importedEntity.split("/").last
if (recordFindings)
aliasToNameSpaceMapping.put(derivedImportedAs, importedEntity)
(importedEntity, derivedImportedAs)
}
}

protected def processFuncDecl(funcDeclVal: Value): MethodMetadata = {
@@ -170,6 +182,7 @@ trait CacheBuilder(implicit withSchemaValidation: ValidationMode) { this: AstCre
val params = funcDeclVal(ParserKeys.Type)(ParserKeys.Params)(ParserKeys.List)
val signature =
s"$methodFullname(${parameterSignature(params, genericTypeMethodMap)})$returnTypeStr"

goGlobal.recordFullNameToReturnType(methodFullname, returnTypeStr, signature)
MethodMetadata(name, methodFullname, signature, params, receiverInfo, genericTypeMethodMap)
} else
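The import handling in `processImports` distinguishes an explicitly declared alias from one derived from the last segment of the import path. A minimal sketch of that resolution rule follows. It works on plain strings rather than the parser's JSON values, and the `resolveImportAlias` helper and sample paths are hypothetical names used only for illustration:

```scala
object ImportAliasSketch {

  /** Returns (importedEntity, alias). An explicit alias wins; otherwise the
    * last path segment of the import path is used as the alias.
    */
  def resolveImportAlias(rawPath: String, explicitAlias: Option[String]): (String, String) = {
    // The parser yields the path with surrounding quotes, so strip them first.
    val importedEntity = rawPath.replaceAll("\"", "")
    val alias          = explicitAlias.getOrElse(importedEntity.split("/").last)
    (importedEntity, alias)
  }

  def main(args: Array[String]): Unit = {
    println(resolveImportAlias("\"github.com/foo/bar\"", None))
    println(resolveImportAlias("\"github.com/foo/bar\"", Some("baz")))
  }
}
```

The sketch leaves out which cache each kind of alias is written to; in the diff that decision is made inside the `match` on `importedAsOption`.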
Original file line number Diff line number Diff line change
@@ -23,11 +23,16 @@ class GoGlobal {
*/
val aliasToNameSpaceMapping: ConcurrentHashMap[String, String] = new ConcurrentHashMap()

/** This map will record the Type FullName of Struct Type defined for Lambda Expression along with return type
* fullname against the lambda signature.
*
* This will help map the Lambda TypeFullName to the respective Struct Type as its super type
*/
val lambdaSignatureToLambdaTypeMap: ConcurrentHashMap[String, Set[(String, String)]] = new ConcurrentHashMap()

val pkgLevelVarAndConstantAstMap: ConcurrentHashMap[String, Set[(Ast, String)]] = new ConcurrentHashMap()

// Mapping method fullname to its return type and signature
// Mapping method fullname to its return type and signature; lambda expression return types are also recorded in this map
val methodFullNameReturnTypeMap: ConcurrentHashMap[String, (String, String)] = new ConcurrentHashMap()

/** Mapping fully qualified name of the member variable of a struct type to its type. It will also maintain the type
@@ -61,30 +66,25 @@ class GoGlobal {
methodFullNameReturnTypeMap.putIfAbsent(methodFullName, (returnType, signature))
}

def recordPkgLevelVarAndConstantAst(pkg: String, ast: Ast, filePath: String): Unit = {
synchronized {
Option(pkgLevelVarAndConstantAstMap.get(pkg)) match {
case Some(existingList) =>
val t = (ast, filePath)
pkgLevelVarAndConstantAstMap.put(pkg, existingList + t)
case None => pkgLevelVarAndConstantAstMap.put(pkg, Set((ast, filePath)))
}
def recordPkgLevelVarAndConstantAst(pkg: String, ast: Ast, filePath: String): Unit = synchronized {
Option(pkgLevelVarAndConstantAstMap.get(pkg)) match {
case Some(existingList) =>
val t = (ast, filePath)
pkgLevelVarAndConstantAstMap.put(pkg, existingList + t)
case None => pkgLevelVarAndConstantAstMap.put(pkg, Set((ast, filePath)))
}
}

def recordLambdaSigntureToLambdaType(
signature: String,
lambdaStructTypeFullName: String,
returnTypeFullname: String
): Unit = {
synchronized {
Option(lambdaSignatureToLambdaTypeMap.get(signature)) match {
case Some(existingList) =>
val t = (lambdaStructTypeFullName, returnTypeFullname)
lambdaSignatureToLambdaTypeMap.put(signature, existingList + t)
case None => lambdaSignatureToLambdaTypeMap.put(signature, Set((lambdaStructTypeFullName, returnTypeFullname)))
}
): Unit = synchronized {
Option(lambdaSignatureToLambdaTypeMap.get(signature)) match {
case Some(existingList) =>
val t = (lambdaStructTypeFullName, returnTypeFullname)
lambdaSignatureToLambdaTypeMap.put(signature, existingList + t)
case None => lambdaSignatureToLambdaTypeMap.put(signature, Set((lambdaStructTypeFullName, returnTypeFullname)))
}
}

}
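Both record methods above use a `synchronized` read-then-put to append to a `Set` stored in a `ConcurrentHashMap`. As an alternative sketch (not what the PR does), the same append can be written with `ConcurrentHashMap.merge`, which applies the update atomically for the given key. The `SetAppendSketch` object and its `record` method are hypothetical names for illustration:

```scala
import java.util.concurrent.ConcurrentHashMap

object SetAppendSketch {
  val lambdaSignatureToLambdaTypeMap: ConcurrentHashMap[String, Set[(String, String)]] =
    new ConcurrentHashMap()

  def record(signature: String, lambdaStructTypeFullName: String, returnTypeFullName: String): Unit = {
    // merge stores the singleton set when the key is absent; otherwise it
    // combines the existing set with the new one atomically for that key.
    lambdaSignatureToLambdaTypeMap.merge(
      signature,
      Set((lambdaStructTypeFullName, returnTypeFullName)),
      (existing: Set[(String, String)], added: Set[(String, String)]) => existing ++ added
    )
  }
}
```

Whether this is preferable to the `synchronized` version depends on how contended the map is; the sketch only shows that the two forms produce the same contents.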
Original file line number Diff line number Diff line change
@@ -5,6 +5,10 @@ import io.circe.{Decoder, HCursor}
import io.joern.gosrc2cpg.Config
import io.joern.gosrc2cpg.utils.UtilityConstants.fileSeparateorPattern

import java.util.Set
import java.util.concurrent.ConcurrentSkipListSet
import scala.util.control.Breaks.*

class GoModHelper(config: Option[Config] = None, meta: Option[GoMod] = None) {

def getModMetaData(): Option[GoMod] = meta
@@ -41,6 +45,22 @@ class GoModHelper(config: Option[Config] = None, meta: Option[GoMod] = None) {
val tokens = meta.get.module.name +: pathTokens.dropRight(1).filterNot(x => x == null || x.trim.isEmpty)
tokens.mkString("/")
}

def recordUsedDependencies(importStmt: String): Unit = {
breakable {
meta.map(mod =>
// TODO: && also add a check for builtin package imports to skip those
if (!importStmt.startsWith(mod.module.name)) {
for (dependency <- mod.dependencies) {
if (importStmt.startsWith(dependency.module)) {
dependency.beingUsed = true
dependency.usedPackages.add(importStmt)
}
}
}
)
}
}
}

case class GoMod(fileFullPath: String, module: GoModModule, dependencies: List[GoModDependency])
@@ -55,10 +75,12 @@ case class GoModDependency(
module: String,
version: String,
indirect: Boolean,
var beingUsed: Boolean,
lineNo: Option[Int] = None,
colNo: Option[Int] = None,
endLineNo: Option[Int] = None,
endColNo: Option[Int] = None
endColNo: Option[Int] = None,
usedPackages: Set[String] = new ConcurrentSkipListSet[String]()
)

object CirceEnDe {
@@ -94,6 +116,7 @@ object CirceEnDe {
module = module.getOrElse(""),
version = version.getOrElse(""),
indirect = indirect.getOrElse(false),
beingUsed = false,
lineNo = lineNo.toOption,
colNo = colNo.toOption,
endLineNo = endLineNo.toOption,
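The `recordUsedDependencies` method in this file marks a `go.mod` dependency as used when an import path starts with the dependency's module path. A simplified, self-contained sketch of that matching follows. `Dep`, `markUsed`, and the sample module paths are hypothetical stand-ins for `GoModDependency` and real data. Unlike the plain `startsWith` in the diff, the sketch also requires a `/` boundary; that is an assumption added here so that `github.com/a/b` does not match an import of `github.com/a/bc`.

```scala
import java.util.concurrent.ConcurrentSkipListSet

object UsedDependencySketch {
  // Stand-in for GoModDependency, reduced to the fields the matching needs.
  final case class Dep(
    module: String,
    var beingUsed: Boolean = false,
    usedPackages: java.util.Set[String] = new ConcurrentSkipListSet[String]()
  )

  def markUsed(ownModule: String, deps: List[Dep], importPath: String): Unit = {
    // Imports of the project's own packages are never third-party dependencies.
    if (!importPath.startsWith(ownModule)) {
      deps.foreach { dep =>
        // Match the module itself or any package underneath it.
        val matches = importPath == dep.module || importPath.startsWith(dep.module + "/")
        if (matches) {
          dep.beingUsed = true
          dep.usedPackages.add(importPath)
        }
      }
    }
  }
}
```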
Original file line number Diff line number Diff line change
@@ -12,6 +12,9 @@ import org.slf4j.LoggerFactory

import java.io.File as JFile
import java.nio.file.Paths
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}
import scala.util.{Failure, Success, Try}

class DownloadDependenciesPass(parentGoMod: GoModHelper, goGlobal: GoGlobal, config: Config) {
@@ -25,21 +28,25 @@ class DownloadDependenciesPass(parentGoMod: GoModHelper, goGlobal: GoGlobal, con
private def setupDummyProjectAndDownload(prjDir: String): Unit = {
parentGoMod
.getModMetaData()
.map(mod => {
.foreach(mod => {
ExternalCommand.run("go mod init joern.io/temp", prjDir) match
case Success(_) =>
mod.dependencies
.filter(dep => config.includeIndirectDependencies || !dep.indirect)
.foreach(dependency => {
val dependencyStr = s"${dependency.module}@${dependency.version}"
val cmd = s"go get $dependencyStr"
ExternalCommand.run(cmd, prjDir) match
case Success(_) =>
print(". ")
processDependency(dependencyStr)
case Failure(f) =>
logger.error(s"\t- command '${cmd}' failed", f)
val futures = mod.dependencies
.filter(dep => !dep.beingUsed)
.map(dependency => {
Future {
val dependencyStr = s"${dependency.module}@${dependency.version}"
Review comment (Collaborator):
I suppose go get will handle this, but one issue that comes to mind is rate limiting of the dependency APIs.

Review comment (Collaborator):
It would also be good to have a version of this where the dependencies are downloaded directly from the APIs, like C# and Ruby, so that we don't require go as a dependency.

Reply (Contributor Author):
> I suppose go get will handle this, but one issue that comes to mind is rate limiting of the dependency APIs

go get doesn't parallelise it. I ran a test to check whether parallelising improves the processing time, and I do see improvements.

Reply (Contributor Author, pandurangpatil, Mar 18, 2024):
> Would also be good to have a version of this where the dependencies are downloaded directly from the APIs like C# and Ruby so that we don't require go as a dependency

I think that will take time. I will handle that in a separate ticket, for this as well as the ProgramSummary incorporation.

val cmd = s"go get $dependencyStr"
synchronized(ExternalCommand.run(cmd, prjDir)) match
case Success(_) =>
print(". ")
processDependency(dependencyStr)
case Failure(f) =>
logger.error(s"\t- command '$cmd' failed", f)
}
})
val allResults: Future[List[Unit]] = Future.sequence(futures)
Await.result(allResults, Duration.Inf)
case Failure(f) =>
logger.error("\t- command 'go mod init joern.io/temp' failed", f)
})
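The review thread in this file raises rate limiting as a concern when `go get` runs in parallel. One way to cap concurrency, sketched here under assumptions, is to run the futures on a fixed-size thread pool instead of the global execution context. The `runAll` helper, the `parallelism` parameter, and the `fetch` function (a stand-in for invoking `go get`) are hypothetical and not part of the PR:

```scala
import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.util.Try

object BoundedDownloadSketch {

  /** Runs `fetch` for every dependency with at most `parallelism` running at once. */
  def runAll[A](deps: List[String], parallelism: Int)(fetch: String => A): List[Try[A]] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Wrap each result in Try so one failed download does not fail the whole batch.
      val futures = deps.map(dep => Future(Try(fetch(dep))))
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally pool.shutdown()
  }
}
```

A caller could pass the real command invocation as `fetch` and pick a small pool size; results come back in the same order as the input list.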
Original file line number Diff line number Diff line change
@@ -23,15 +23,15 @@ class MethodAndTypeCacheBuilderPass(
) {
def process(): Seq[AstCreator] = {
val futures = astFiles
.map(file => {
.map(file =>
Future {
val parserResult = GoAstJsonParser.readFile(Paths.get(file))
val relPathFileName = SourceFiles.toRelativePath(parserResult.fullPath, config.inputPath)
val astCreator = new AstCreator(relPathFileName, parserResult, goMod, goGlobal)(config.schemaValidation)
val diffGraph = astCreator.buildCache(cpgOpt)
(astCreator, diffGraph)
}
})
)
val allResults: Future[List[(AstCreator, DiffGraphBuilder)]] = Future.sequence(futures)
val results = Await.result(allResults, Duration.Inf)
val (astCreators, diffGraphs) = results.unzip