All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Support for Java source files via Soot's source parser. This can be disabled via `Jimple2Cpg.createCpg(parseJavaSource)`.
- Upgraded Joern with latest data flow engine fix and `TypeDecl` modifiers.
- Upgraded Joern with the latest support for annotations and standardized alloc sites.
- Upgraded Joern with the latest bug fixes in JVM bytecode frontend.
- Added ability to disable cache sharing on `flowsBetween`.
- Simplified `EngineContext` and `Semantics` on `OverflowDbDriver` startup.
- `OverflowDbDriver::methodSemantics` now public.
- Upgraded Joern version to include configurations that disable cache sharing.
- `OverflowDbDriver::flowsBetween` performance improvement on initial cache preparation.
- `OverflowDbDriver` now takes a `DataFlowCacheConfig` argument that specifies data flow engine specific configurations.
- `OverflowDbDriver::nodesReachableBy` renamed to `flowsBetween` and now takes functions as `sources` and `sinks` parameters.
- Improved data-flow caching performance by holding the same pointer as the initial cache and only converting to a serializable form later.
- New `PlumeStatistics` entries related to measuring result re-use and fetching speeds.
- Data flow cache now writes to `.cbor` instead of `.json` for improved I/O performance.
- `OverflowDbDriver::nodesReachableBy` does not initialize the data-flow engine with the last query's cache if results are not being re-used.
- Made data flow cache compression optional.
- `OverflowDbDriver::nodesReachableBy` now filters out paths whose first and last nodes refer to the same immediate parent node, as well as paths no longer than 1 path element.
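
The path-filtering rule in the entry above can be sketched as follows. This is only an illustration under assumptions, not Plume's implementation: `PathNode`, `parentId`, `PathFilter`, `keep`, and `filter` are hypothetical names, and a path is modelled as a plain list of nodes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one element of a data-flow path; not a Plume or Joern type. */
final class PathNode {
    final long id;
    final long parentId; // ID of the node's immediate parent (assumed to be known)

    PathNode(long id, long parentId) {
        this.id = id;
        this.parentId = parentId;
    }
}

final class PathFilter {
    /** Returns true if the path survives both filters described in the entry. */
    static boolean keep(List<PathNode> path) {
        if (path.size() <= 1) {
            return false; // paths no longer than one element are dropped
        }
        PathNode first = path.get(0);
        PathNode last = path.get(path.size() - 1);
        // Dropped when both endpoints sit under the same immediate parent.
        return first.parentId != last.parentId;
    }

    /** Applies keep() to every path and returns the survivors in their original order. */
    static List<List<PathNode>> filter(List<List<PathNode>> paths) {
        List<List<PathNode>> kept = new ArrayList<>();
        for (List<PathNode> path : paths) {
            if (keep(path)) {
                kept.add(path);
            }
        }
        return kept;
    }
}
```

Under this sketch, a two-node path whose endpoints share a parent is discarded, while a path that crosses from one parent to another is kept.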
- Added statistic to measure the cost of removing outdated graph information.
- Data flow cache serialization is now buffered and uses LZ4 instead of GZIP.
- Upgraded Joern version and adjusted to `x2cpg` changes.
- Do not write a data-flow cache file if no results were saved.
- Data flow engine context regenerated with latest cache after each query.
- Change detection bug where the temporary directory that classes were moved to was being accounted for.
- `OverflowDbDriver::nodesReachableBy` now has a `sanitizers` parameter for method calls to filter paths out with.
- Upgraded Joern and moved to using the `ProgramHandlingUtil` on that `jimple2cpg`.
- Reworked data flow deserialization to address Jackson unable to deserialize `Option[Long]`.
- Fixed major bug where expired node keys were being kept in data flow storage.
- `domain::isNodeUnderTypes` not handling potential NPE.
- Upgraded Joern to 1.1.637 and CPG to 1.3.519.
- `ProgramHandlingUtil::clean` no longer refers to a transitive dependency.
- Upgraded Joern to 1.1.628 and CPG to 1.3.517.
- Misplaced where `createCpg` would abort on the `sootOnly` config. This has been fixed.
- `OverflowDbDriver::exportAsGraphML` for writing the graph instance as an XML file.
- Only adding changed files to Soot to improve performance.
- Simplified `PlumeStatistics` file related changes to only those class/methods changed.
- `PlumeDynamicCallLinker` now extends `SimpleCpgPass`.
- Mention that Kryo should only run in Java < 17 in README.
- Upgraded Soot to 4.3.0, TinkerGraph/Gremlin to 3.4.11, Joern to 1.1.622.
- Making the reference DB `None` by default to avoid accidentally generating large amounts of stored graphs.
- Delete the unpacking dir after clean and generating a new one as needed if a successive analysis occurs.
- Making the reference DB `None` by default to avoid accidentally generating large amounts of stored graphs.
- Two new metrics that are tracked: `PROGRAM_CLASSES` and `PROGRAM_METHODS`.
- Two new metrics that are tracked: `CHANGED_CLASSES` and `CHANGED_METHODS`.
- Logger classpath issues by removing TigerGraph GSQL client from dependencies and only using SLF4J-API.
- Now calling `gsql_client` directly from command line as a process.
- Upgraded TigerGraph version to 3.5.0.
- No longer set `deleteOnExit` for files in temp dir since they get cleaned via try-finally call already.
- Set Soot to non-application mode and non-whole program mode for efficiency.
- `OverflowDB` from throwing `unable to calculate occurrenceCount` exceptions.
- `OverflowDb::safeRemove` to handle exceptions when deleting nodes.
- Updated Joern, CPG, and Logback versions.
- Importing `jimple2cpg` instead of duplicating work here.
- Removed parsing Jimple `IdentityStmt` since they end up duplicating parameters as locals.
- Swapped out the deprecated `ParallelCpgPass` for `ConcurrentWriterCpgPass`
- Virtual calls now get passed the correct `this` object as the object the method is actually invoked with.
- Virtual call `code` properties now include the object invoking the method.
- Instance where `Local` nodes were duplicated for each time they were referenced
- Issue where `NewCall(Operators.assignment)` did not have `methodFullName = Operators.assignment` and thus messing up data flow paths
- Overloaded `bulkTx` to handle new `overflowdb.BatchedUpdate` objects.
- Instance where dynamic `InvokeExpr::getMethod` would fail by using `getMethodRef` instead.
- Updated passes to handle new `ForkJoinParallel` passes.
- Upgraded CPG and Joern versions to latest.
- More logging to the `clear` method in `NeptuneDriver`.
- Issue in `TigerGraphDriver` where HTTP client would timeout.
- Removed `slf4j.simple` from build that goes into JAR.
- Using an explicit `GRAPHBINARY_V1D0` serializer for `NeptuneDriver` queries.
- Catching HTTP request failure when checking for node status in `NeptuneDriver`.
- Methods now have modifier nodes.
- Using `<operator>.alloc` for initial `new` assignments to objects and this has been added to `default.semantics`.
- Instances where dynamic calls were failing because of `soot.dummy.InvokeDynamic` not being loaded at `SIGNATURES` level.
- Made constructors a bit friendlier to work with and they now have `this` parameters.
- Made some code optimizations on passes.
- Fixed instances where `.fieldRef.getField` would return a `null` and crash method body parsing.
- Method parameters now have correct evaluation strategies.
- Fixed performance issues in Gremlin drivers related to not re-using traversal objects.
- Fixed instance in methods where `this` parameter was not passed through on dynamic calls.
- Fixed performance issues in Neo4j's driver by using more parameterized and re-usable queries.
- Generate `<operator>.assignment` call node's `code` property from child argument `code`.
- `TigerGraphDriver` default transaction limit was 3 instead of 30 seconds.
- AST linking in `TigerGraphDriver` did not escape `[]` or `_` but now does.
- Diversified error handling on exceptions on `TigerGraphDriver` HTTP requests.
- `Jimple2Cpg::createCpg` can now enable an experimental "Soot only" build.
- `PlumeStatistics::reset` now actually sets all values for keys to `0L` instead of just clearing.
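
The difference between clearing and zeroing can be illustrated with a small sketch. It assumes, purely for illustration, that the statistics are held in a mutable map from key to `Long`; `StatsReset`, `resetByClearing`, and `resetToZero` are hypothetical names and are not part of Plume.

```java
import java.util.Map;

final class StatsReset {
    /** Clearing removes every key, so a later lookup returns null. */
    static void resetByClearing(Map<String, Long> stats) {
        stats.clear();
    }

    /** Zeroing keeps every key and overwrites its value with 0L. */
    static void resetToZero(Map<String, Long> stats) {
        stats.replaceAll((key, value) -> 0L);
    }
}
```

With the zeroing variant, code that reads a statistic after a reset still finds the key and gets `0L` rather than a missing entry.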
- `PlumeDynamicCallLinker` more generous with trying static call linking as a fallback before reporting an issue.
- Now wrapped `OverflowDb::clear` in a `Try` to prevent a `unable to calculate occurrenceCount` runtime exception.
- Created `util` package to contain `HashUtil` and `ProgramHandlingUtil`
- All input files are unpacked to a temporary directory.
- Methods are checked to be concrete before retrieving method body.
- Interfaces are now recognized and treated as types with implementation represented by `INHERITS_FROM`.
- `PlumeStatistics` now captures library performance.
- `OverflowDbDriver(dataflowCachePath)` property is now `Option[Path]`.
- If `dataFlowCachePath` is `None` then data-flow results are not saved.
- Moved HTTP response case classes to `domain.HttpResponse`.
- Data-flow paths are saved to a GZIP compressed JSON and are re-used on future runs. Only available on `OverflowDbDriver`.
- Unchanged methods no longer have REACHABLE_BY edges regenerated/duplicated
- Support for multi-array creation added
- Array tests derived from JavaSrc2Cpg included
- Fixed access path issue where array index accesses were reported to be invalid ASTs. This was just a change in AST children's order from `(0, 1)` to `(1, 2)`
- Fixed bug where if a single file was specified then all files in the directory were loaded
- Updated frontend to leverage `Call(<operator>.arrayInitializer)` instead of `Unknown(new)` vertices
- Issue where if a single file was given, all surrounding files are checked to be included too
- Warnings related to matching generics susceptible to type erasure issues
- Performance and anti-patterns reported by DeepSource
- `GremlinDriver` now handles defaults for all calls to `by` steps
- `GremlinDriver` now handles nodes that do not include properties specified under `GremlinDriver::propertyFromNodes`
- Whole project migrated to Scala
- Every transaction as far as possible is a bulk transaction
- Processing follows closely to layers used in other Joern frontends
- Package structure changed from io.github to com.github
- JanusGraph support
- Removed unnecessary use of Log4j2
- Modified class loading to handle exceptions when CG methods cannot be extracted
- Extractor main process wrapped in try-finally to ensure resource release
- Soot only configuration for the metrics
- Upgraded CPG to latest version before IDs were removed again
- Removed unused methods in ODB `Traversals`
- Plume going into maintenance mode
- `TableSwitchStmt` jump targets now have the correct order
- `Extractor::project` now generates overloaded methods for Java
- OverflowDB graphs now generated from `io.shiftleft.codepropertygraph.generated.Cpg`
- Upgrade ShiftLeft dependencies to 1.3.314
- Upgrade Gradle to 7.2
- Removed `Binding` vertices
- Can now handle new type arguments API of domain classes
- `Extractor::project` now takes an optional boolean to disable reaching defs calculation
- `Extractor::projectReachingDefs` now calculates reaching defs separately
- Removed `IDriver::getProgramStructure` as it's not used by the core extractor
- `VertexMapper` now handles new default property system
- SPARK is now the default call graph as it is more precise and pays off later when performing data-flow analysis
- All methods are now accepted as entry-points, no need for parachute code to catch the case where no call edges generated by Soot
- Upgraded `codepropertygraph` version to 1.3.151 and made respective changes.
- Re-added `TYPE_FULL_NAME` to `Call` vertices.
- Corrected Identifier's code for Static Field Access
- Downgraded `codepropertygraph` version to 1.3.120.
- Fixed issue where JAR files were only being identified by their suffix. They are now checked for being zip files first which will include WARs.
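
A content-based check of this kind can be sketched by looking at the first bytes of a file instead of its name. JAR and WAR files are ZIP archives, and a ZIP local file header begins with the bytes `50 4B 03 04`. The class and method names below (`ArchiveSniffer`, `looksLikeZip`) are hypothetical, and the sketch deliberately ignores empty and spanned archives, which start with other signatures; it is not how Plume performs the check.

```java
final class ArchiveSniffer {
    /** Signature that starts a ZIP local file header: 'P', 'K', 0x03, 0x04. */
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /**
     * Returns true if the given leading bytes of a file start with the ZIP signature.
     * The caller is expected to pass at least the first four bytes of the file.
     */
    static boolean looksLikeZip(byte[] header) {
        if (header == null || header.length < ZIP_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < ZIP_MAGIC.length; i++) {
            if (header[i] != ZIP_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A `.war` file passes this check just like a `.jar`, whereas a Java class file, which starts with `CA FE BA BE`, does not.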
- Improved logging for `NeptuneDriver`.
- Bug where cluster builder details cause empty re-connections.
- The `id` was not being set from the deserialized IDs in `NeptuneDriver`.
- `NeptuneDriver::idStorageLocation` specifies a storage location where ID mapper values can be written.
- Increased wait time after `clearGraph` call in `NeptuneDriver`
- External methods getting marked for rebuild on disconnected updates.
- External classes getting marked for rebuild on disconnected updates.
- `NeptuneDriver::clearGraph` now uses the HTTP system database reset if the graph has over 10 000 vertices.
- Added new timer measurements `CONNECT_DESERIALIZE` and `DISCONNECT_SERIALIZE`.
- `DataFlowPass` now converts `DiffGraph`s into `DeltaGraphs` in order to use `bulkTransaction`s.
- `DataFlowPass` now shows progress bar.
- Separated Extractor and Driver measurements under `PlumeTimer`.
- `TigerGraphDriver` now uses a TigerGraph v3.0 feature that allows edges to be defined between different vertex types. This means that now the CPG schema can be properly defined with unique vertex names.
- If a property is unused it is stripped from being added.
- `PlumeKeyProvider` was getting stuck on the `currentMax` variable - this is now fixed.
- `Neo4jDriver`'s `bulkTransaction` was too tightly coupled on the vertex and edge add which led to bugs. This is now separated and the bug is no longer present.
- `Neo4jDriver` bulk transactions are now chunked in that they insert by chunks of 50.
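
Chunked insertion of this kind can be sketched with a generic helper that splits a list into fixed-size batches. `Chunker` and `chunk` are hypothetical names; the chunk size of 50 comes from the entry above, and how each batch is then sent to the database is left out because it is not described here.

```java
import java.util.ArrayList;
import java.util.List;

final class Chunker {
    /** Splits items into consecutive sub-lists of at most `size` elements, preserving order. */
    static <T> List<List<T>> chunk(List<T> items, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            int end = Math.min(start + size, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }
}
```

Splitting 120 pending vertices with `Chunker.chunk(vertices, 50)` gives three batches of 50, 50, and 20 elements, each of which could be written in its own transaction.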
- `PlumeKeyProvider` was providing duplicates - this is now fixed.
- Duplicates are now handled the same in `TigerGraphDriver` as the rest of the drivers.
- Latest SCPG schema applied with deprecated properties removed.
- Removed deprecated `DeltaGraph.apply`
- `GremlinDriver` bulk transactions properly implemented now.
- Duplicate edges in `bulkTransaction` filtered out
- Duplicates in bulk transactions of the `GremlinDriver` are more thoroughly removed.
- `NeptuneDriver` now clears the graph in chunks to avoid timing out on larger graphs.
- Grouped field construction into chunks and execute chunks in bulk transactions.
- `OverflowDbDriver`'s existence checking now also makes sure that the ID returned matches the ID given.
- Indicating in the logging which number of each member being reported belongs to the application or an external library.
- Making a progress bar is now done via `ProgressBarUtil`.
- `TigerGraphDriver` now has timeout as a configurable parameter.
- Neptune driver by mapping `Long` IDs to Neptune's native `String` IDs
- Removed `GremlinOverriddenIdDriver` as it is no longer used.
- `AST` and `CONTAINS` edges for external method stubs
- Fixed a bug where `Call` vertices which were removed were being recreated under `CGPass`
- Associated `NamespaceBlock` also removed from cache during class removal.
- New fields are not checked for rebuild and are immediately added. Only updated class fields are now checked.
- Removed driver classes that were deprecated and due for removal
- Removed unused constants
- Uncaught exceptions are sometimes thrown when looking for all methods that the program references. These are now caught appropriately.
- Artifact hash under `MetaData`. If artifact has no change then project will end early.
- In instances where classes are removed, their respective cached data is removed now too.
- Method bodies are now hashed and stored on the `Method` node.
- Finer updates on:
  - class modifier level,
  - field type, value, modifier level, and
  - method level
- The latest copyright on all class headers.
- `order` of methods in the `MethodStubPass`.
- `DynamicInvoke` bootstrap arguments are now projected.
- External methods referenced in calls are now added too.
- Separated hash functions into a new util class called `HashUtil`.
- Plume now expects the whole artifact to be loaded in order to detect class removals.
- `ArrayRef` now gets projected as an `Operators.indexAccess` call with index and base identifier as the arguments.
- `InstanceOfExpr` now gets projected as an `Operators.instanceOf` call.
- `LengthExpr` now gets projected as an `Operators.lengthOf` call. This is a custom operator
- `MonitorStmt` now gets projected as an `Unknown` vertex.
- `NegExpr` now gets projected as an `Operators.minus` call.
- Crashing passes from making the program hang. Exceptions are caught, logged, and the build is saved as far as it got.
- `ThrowStmt` is now an `Unknown` vertex where control flow ends at.
- `NewArrayExpr` is now an `Unknown` vertex.
- `IDriver.bulkTransaction` to replace `DeltaGraph.apply` and make database specific bulk changes.
- `IdentityStmt` is now handled as part of the `LOCAL` and `IDENTIFIER` cycle
- `IdentityRef` is now handled under `projectOp`
- `ThrowStmt` is now handled as a special kind of `Return`
- External method <-`AST`- External type is now fixed.
- `EVAL_TYPE` links for `MethodReturn` and `BlockVertex` on the method stubs.
- `CacheOptions.cacheSize` is mutable via setters now.
- `DataFlowPass` now gets method head along with method body so the passes no longer throw exceptions.
- `parseBinopExpr` had some incorrect mappings which are now fixed.
- `PlumeTimer` is simplified and now only uses `measure` function.
- Disabled cache2k from collecting its own statistics.
- Early stopping enabled when no classes needed to update is detected.
- Feedback regarding files to update now moved from `INFO` to `DEBUG` logging.
- Marked `DeltaGraph.apply` as deprecated.
- `DeltaGraph::toOverflowDb` now only writes to an existing OverflowDB instance.
- Increased `CacheOptions.cacheSize` and the cache is now partitioned among the 4 caches based on average allocation from the benchmarks. Cache expiry is now removed as an option.
- `GotoStmt` is now added as a `CONTROL_STRUCTURE_VERTEX` with `JUMP_TARGET`s removed.
- CFG now connects nodes within each expression and follows the stack pointer like Joern/Ocular
- `StaticFieldRef` has moved from using `TypeRef` to `Identifier` otherwise data flow passes throw errors.
- Progress bar causing call graph pass to freeze on large graphs. This has been removed.
- Resource clearing was accidentally commented out in 0.3.8 - this has been addressed.
- Progress bar when logging level is `>= Level.INFO` for method related operations
- Added cache2k to handle caching
- `CacheMetrics` to track hits and misses
- `METHOD_PARAMETER_IN` -`PARAMETER_LINK`-> `METHOD_PARAMETER_OUT` edge was included
- Improved the node caching and centralized `tryGet` and `getOrMake`-style operations to `DriverCache.kt`
- Separated the cache and storage into `storage._Cache` classes and `storage.PlumeStorage`
- Method/Local/MethodParameterIn have been created more closely to Ocular's output.
- `TigerGraphDriver` bug where empty strings for intentional properties would be unintentionally excluded.
- `Member.name` and `FieldIdentifier.code` properly handled
- Fixed temp dir resolution issue on macOS and Windows
- `CONTAINS` edges are generated for `METHOD` to body vertices.
- `ListMapper` to process Scala lists to a serialized string and back. More formally processing Scala lists to and from OverflowDB node objects.
- Handle inheritance edges i.e. `TYPE_DECL -INHERITS_FROM-> TYPE`
- `BaseCpgPass` now uses a local cache for method body nodes instead of relying solely on `GlobalCache`
- `SCPGPass` now known as `DataFlowPass` as all passes now come from `dataflowengineoss`.
- Added `PROGRAM_STRUCTURE` to timer keys.
- `IDriver::getVerticesOfType` to aid in caching from existing database vertices.
- External methods signatures are parsed to figure out their method parameters.
- `MethodStubPass` and `BaseCPGPass` now includes `METHOD_PARAM_IN` and `METHOD_PARAM_OUT` and connects them to their type.
- Field accesses are now constructed as a `Call` vertex.
- Plume now has a new logo and branding.
- Better logging for loaded files.
- Many of the `nodeCache` uses in `IProgramPass` passes were converted to using the `GlobalCache` instead.
- `MethodStubPass` now runs in parallel if possible.
- Upped the default chunk size
- `DeltaGraph::toOverflowDb` can now take in an optional `overflowdb.Graph` object to write to
- Memory leak where thread pools weren't getting shutdown
- `DeltaGraph` as a `NewNodeBuilder` variant of ShiftLeft's `DiffGraph`.
- `BaseCpgPass` which is a combination of the `ASTPass`, `CFGPass`, and `PDGPass` and returns a `DeltaGraph` instead of directly applying changes to the driver.
- `methodBodies` was added to `GlobalCache` to save on database requests when moving to `SCPGPass` after `BaseCpgPass`
- Chunk size can now be configured via `ExtractorOptions::methodChunkSize`
- Replaced `ASTPass`, `CFGPass`, and `PDGPass` with `BaseCpgPass`.
- Spawns a thread pool to run base CPG building in parallel and apply `DeltaGraph`s in serial.
- SCPG flows are only run on new/updated method bodies since the analysis is independent of other methods.
- Types for global primitives
- Return types are now added to all types built in the CPG
- Moved the maps in `Extractor` to a dedicated `GlobalCache` object that uses `ConcurrentHashMap`s.
- SCPG pass now concurrently pulls all methods and merges it into an input graph. This code has been moved to `passes.SCPGPass.kt`
- External method stubs have call-to-returns generated i.e. (METHOD)-CFG->(RETURN)-CFG->(METHOD_RETURN)
- Better `INFO` threshold logging within `Extractor::project`.
- Combined `Extractor::project` and `Extractor::postProject` into `project`.
- Deprecated `getProgramTypeData`
- Changed `UNIT_GRAPH_BUILDING` to `SOOT` and added the time taken on loading files into Soot, calling FastHierarchy, and using Soot's call graph.
- Method pass `MethodStubPass`
- Structure pass `ExternalTypePass`, `FileAndPackagePass`, `MarkForRebuildPass`, and `TypePass`
- Type pass `GlobalTypePass`
- Added `getVerticesByProperty` and `getPropertyFromVertices` to `IDriver`
- Graph builders are now known as "passes" to conform to how SCPG builds graphs. Each has an interface under `IGraphPass`.
- `graph/[AST|CFG|PDG|CallGraph]Builder` to `passes/graph/[AST|CFG|PDG|CallGraph]Pass`
- Deprecated `getMethodNames`
- Added timer probes regarding database closer to database methods
- Duplication of files, types, namespace vertices on updates
- `ContainsEdgePass` added before `ReachingDefPass`
- `PlumeTimer` to measure various intervals of the projection process
- Added a filter step before `constructStructure` call in `Extractor::project` as not to duplicate types
- Fixed `PlumeKeyProvider` infinite loop and added proper tests for `getNewId`
- Added a check in the setter for `keyPoolSize` to not allow anything less than 1
- Added `getMethodNames` and `getProgramTypeData` to `IDriver`
- Used `getMethodNames` and `getProgramTypeData` to reduce the sub-graphs in `Extractor::postProject`
- Changed subgraph-style results to list of edge results in order to improve performance in `GremlinDriver`
- Switched to using `SLF4J` as the logging API
- Fixed issue where `${sys:LOG_DIR}` is generated when there is no `log4j2` config file
- `Call` vertices not containing consistent full names and signatures as `Method` vertices. Resolves #76.
- Log4j-Core is now only added as a `testImplementation` since this is used as a library and not an application
- `ExtractorConst::getPlumeVersion` now used to get package version
- `VERSION.md` is now where the build obtains version details
- `code`, `lineNumber`, `columnNumber` to `ArrayInitializer`
- Escape " (quotes) to fix Neo4j bug where strings containing quotes fail vertex insertion
- `TypeDecl` to `ArrayInitializer` edge warning
- `TigerGraphDriver::authKey` never null and now just blank if not set
- Removed `log4j2.properties` under the main artifact
- Made the visibility of driver constructors module specific so that users are forced to use the `DriverFactory`
- `connect` methods on drivers now return the driver instead of nothing.
- `ISchemeSafeDriver` interface for drivers who can install schemas on the database
- `JanusGraphDriver::buildSchema` to dynamically build and install JanusGraph schema
- Dependency `com.tigergraph.client:gsql_client`
- `TigerGraphDriver::buildSchema` to dynamically build and install GSQL schema
- Assigned all operator calls to `io.shiftleft.codepropertygraph.generated.Operators` constants
- Assigned values to `ControlStructure::controlStructureType`
- Improved logging
- `Extractor::postProject` to add additional `io.shiftleft.semanticcpg.passes` and `io.shiftleft.dataflowengineoss.passes`
- Added `IDriver::getMetaData` to get the `NewMetaData` vertex from the database if present
- `Extractor::load` and `Extractor::project` now return `Extractor` instance to allow call chaining
- Graph updates would add duplicate program structure information and fail to link prior `CALL` edges
- Handle the case where `NewFileBuilder#hash` is null
- Where `TypeDecl`s were attempted to be duplicated in `getProgramStructure`
- Fixed case where `Node` types were not handled in `DiffGraphUtil::processDiffGraph`
- `IDriver::getProgramStructure` would not return vertices with degree 0
- `deleteEdge` to `IDriver`
- `updateVertexProperty` to `IDriver`
- `DiffGraphUtil::processDiffGraph` to accept `DiffGraph`s and apply changes to a given `IDriver`
- Modified `deleteVertex` signature to take ID and optional label
- Lifted compilation directory to $TEMP/plume/build. This is then deleted recursively after project.
- Module not found bug introduced by improper class cleanup in temp dir.
- Fixed instances where CallGraphBuilder would connect non-NewCallBuilder source nodes to methods.
- Fixed GraphML not escaping ampersands
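
GraphML is an XML format, so a raw `&` in a property value must be written as `&amp;` for the file to stay well-formed. A minimal escaping routine could look like the sketch below; `XmlEscaper` and `escape` are hypothetical names, and the sketch covers the five predefined XML entities rather than reproducing Plume's exporter.

```java
final class XmlEscaper {
    /** Replaces the five characters that have predefined XML entities. */
    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                case '\'':
                    out.append("&apos;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }
}
```

Working character by character avoids the double-escaping that chained string replacements can cause when `&` is not handled first.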
- Support for loading JAR files via `load` function
- `AST` edges between `TypeDecl` and their `Modifier`s
- `SOURCE_FILE` edges between `TypeDecl` and their `File`s
- A `File` vertex to represent unknown files
- When Soot cannot get method data, it will log this as a warning instead of throwing a `RuntimeException`
- `TypeDecl` are now properly generated for external types
- Replaced Plume enums with `codepropertygraph` constants
- `CALL` edges not created if no `static void main` present
- Performance issues with `getProgramStructure` in `OverflowDbDriver`
- Replaced `PlumeGraph` with `overflowdb.Graph`.
- Removed Gremlin driver transaction logic being present by default.
- Fixed `cmp` bug by adding this to `ExtractorConst#BIN_OPS`.
- Neo4j driver now also connects in the extractor if given to extractor disconnected
- Upgraded ASM5 -> ASM8 to fix some JAR support
- Migrated to ShiftLeft's codepropertygraph domain classes
- Migrated from Neo4j Gremlin Bolt to Neo4j Java Driver (Official Driver)
- Fixed order property and got rid of old implementation
- Removed use of reflection to improve performance of serializing and deserializing
- Extractor no longer halts process if a schema violation occurs
- ShiftLeft dependencies upgraded
- Argument index was not being implemented properly, this has been fixed.
- The following additional configuration options for OverflowDB
  - overflow
  - heapPercentageThreshold
  - serializationStatsEnabled
- The configuration option `dbfilename` changed to `storageLocation` to match OverflowDB's respective config's name.
- Removed polyglot support
- All analyzed files are sent to a temp directory so there is no longer a need to specify class path in the Extractor
- Replaced REF edges between calls and methods with CALL edges.
- Broken jCenter link in README
- Support for 6 graph databases
  - TinkerGraph
  - OverflowDB
  - JanusGraph
  - TigerGraph
  - Amazon Neptune
  - Neo4j
- Can extract code property graphs using Soot for:
  - Java class and source code
  - JavaScript 170 (1.7)
  - Python 2.72
- Can construct call graphs using Soot with the following algorithms:
  - CHA
  - SPARK