Releases: neo4j/graph-data-science
Graph Data Science 1.8.1
GDS 1.8.1 is compatible with Neo4j 4.1, 4.2, 4.3 and 4.4 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5.
Bug fixes
- Fixed a bug where ForkJoin pools were not properly closed, which could lead to out-of-memory errors (OOMs) when using Pregel-based algorithms, e.g. PageRank.
- Fixed a bug where gds.beta.graphSage could produce incorrect results for small graphs.
- Fixed a bug where gds.beta.graphSage could produce incorrect results for the pool aggregator.
- Fixed a bug where gds.graph.create.cypher would not accept list properties for nodes.
- Fixed a bug in gds.beta.graph.create.subgraph where long values greater than 2^53 were not properly handled during expression evaluation.
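The 2^53 bound in the gds.beta.graph.create.subgraph fix is where 64-bit doubles stop representing every integer exactly. Assuming (the note does not say so) that the problem came from long values passing through a double during expression evaluation, this Python sketch shows the precision loss; `survives_double_round_trip` is a hypothetical helper, not GDS code.

```python
# Doubles have a 53-bit significand, so not every integer above 2**53
# can be represented exactly once it is converted to a float.
LIMIT = 2 ** 53


def survives_double_round_trip(value: int) -> bool:
    """Return True if converting `value` to a float and back is lossless."""
    return int(float(value)) == value


for v in (LIMIT - 1, LIMIT, LIMIT + 1):
    print(v, survives_double_round_trip(v))
# 9007199254740991 True
# 9007199254740992 True
# 9007199254740993 False
```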
Graph Data Science 1.7.3
GDS 1.7.3 is compatible with Neo4j 4.1, 4.2, and 4.3 but not Neo4j 3.5.x, 4.0, or 4.4. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.4 compatible release, please see GDS 1.8.0.
Bug fixes
- Fixed a bug where Node2Vec would produce an ArrayIndexOutOfBoundsException on sufficiently large graphs.
- Fixed a bug where ForkJoin pools were not properly closed, which could lead to out-of-memory errors (OOMs) when using Pregel-based algorithms, e.g. PageRank.
GDS 1.8.0
GDS 1.8 is compatible with Neo4j 4.1, 4.2, 4.3, and 4.4 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5.
Breaking changes
- GDS now throws an error on identifiers with trailing whitespace to avoid input errors. This affects graphName, modelName, and several property parameters such as nodeWeightProperty or seedProperty.
- We have removed the separate concurrency parameter from the model parameter space in gds.alpha.ml.nodeClassification.train, gds.alpha.ml.linkPrediction.train and gds.alpha.ml.pipeline.linkPrediction.configureParams. The concurrency value in the configuration of the train procedure will be used instead.
- The procedure gds.alpha.randomWalk.stream has graduated to the beta tier, as gds.beta.randomWalk.stream.
- Random Walk has been improved and aligned with the Node2Vec implementation. Please consult the documentation to find out about the new configuration options.
- gds.alpha.randomWalk.stream has been removed.
- A memory estimation procedure, gds.beta.randomWalk.estimate, has been added.
- The procedure gds.beta.fastRPExtended has been merged with gds.fastRP.
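Random Walk is now aligned with Node2Vec, whose walks are second order: the next step depends on the node the walk just came from. The Node2Vec paper controls this with a return parameter p and an in-out parameter q. The sketch below is a minimal pure-Python illustration of that bias on an unweighted, undirected graph; `transition_weight` and `biased_walk` are hypothetical names, and GDS's exact weighting and configuration options may differ, so consult the GDS documentation for those.

```python
import random
from typing import Dict, List, Set

# Adjacency sets of an undirected, unweighted graph (an assumption for brevity).
Graph = Dict[int, Set[int]]


def transition_weight(graph: Graph, prev: int, candidate: int, p: float, q: float) -> float:
    """Unnormalised Node2Vec bias for stepping to `candidate` after arriving from `prev`."""
    if candidate == prev:
        return 1.0 / p  # return to the previous node
    if candidate in graph[prev]:
        return 1.0  # stay at distance 1 from the previous node
    return 1.0 / q  # move outward, away from the previous node


def biased_walk(graph: Graph, start: int, walk_length: int,
                p: float, q: float, rng: random.Random) -> List[int]:
    """Generate one second-order random walk of at most `walk_length` nodes."""
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbors = sorted(graph[current])  # sorted so a seeded rng is reproducible
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # The first step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = [transition_weight(graph, prev, n, p, q) for n in neighbors]
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


graph: Graph = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(biased_walk(graph, start=0, walk_length=6, p=1.0, q=0.5, rng=random.Random(42)))
```

A small q (here 0.5) makes outward steps twice as likely as with q = 1, pushing walks to explore; a small p does the same for backtracking.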
New features
- Link Prediction
- Added a new link prediction stream procedure, gds.alpha.ml.pipeline.linkPrediction.predict.stream.
- Added probabilityDistribution and samplingStats to the result of gds.alpha.ml.pipeline.linkPrediction.predict.mutate.
- To improve prediction performance, we’ve added a kNN-based approximate search strategy option to the link prediction procedures gds.alpha.ml.pipeline.linkPrediction.predict.stream|mutate.
- Node property steps in Link Prediction pipelines can use a relationship property.
- Node Classification pipelines: similar to link prediction pipelines, we’ve added a pipeline procedure for node classification, where users can define the features, splitting strategy, and model training options. We’ve added:
- gds.alpha.ml.pipeline.nodeClassification.create
- gds.alpha.ml.pipeline.nodeClassification.addNodeProperty
- gds.alpha.ml.pipeline.nodeClassification.selectFeatures
- gds.alpha.ml.pipeline.nodeClassification.configureParams
- gds.alpha.ml.pipeline.nodeClassification.configureSplit
- gds.alpha.ml.pipeline.nodeClassification.train
- gds.alpha.ml.pipeline.nodeClassification.predict.mutate|stream|write
- New algorithm: Conductance, gds.alpha.conductance.stream, computes a metric for evaluating the quality of communities identified by community detection algorithms.
- Added support for preserving a relationship property in gds.alpha.ml.splitRelationships.mutate.
- The procedure gds.fastRP has received additional configuration parameters:
- featureProperties: to configure which node properties are used as part of the embedding.
- propertyRatio: to control how much of the embedding is computed from properties.
- nodeSelfInfluence: allows using each node's initial random vector as a contribution to the node's embedding. This is especially useful for graphs with disconnected nodes.
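Conductance, as we understand the GDS documentation, is computed per community as the ratio of relationships leaving the community to all relationships of its nodes; lower values indicate a more tightly knit community. The pure-Python sketch below follows that reading, ignores relationship weights, and uses a hypothetical `conductance` function rather than the GDS procedure itself.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def conductance(relationships: Iterable[Tuple[int, int]],
                community: Dict[int, Hashable]) -> Dict[Hashable, float]:
    """Per-community ratio of relationships leaving the community to all of its relationships.

    `relationships` are directed (source, target) pairs; pass both directions
    for an undirected graph. Relationship weights are ignored in this sketch.
    """
    internal: Dict[Hashable, int] = defaultdict(int)
    external: Dict[Hashable, int] = defaultdict(int)
    for source, target in relationships:
        c = community[source]
        if community[target] == c:
            internal[c] += 1
        else:
            external[c] += 1
    result: Dict[Hashable, float] = {}
    for c in set(internal) | set(external):
        result[c] = external[c] / (internal[c] + external[c])
    return result


# Two triangles joined by a single bridge (2 - 3), listed in both directions.
undirected = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
both_directions = undirected + [(b, a) for a, b in undirected]
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(conductance(both_directions, labels))  # each community: 1 / 7
```

Each triangle has six internal relationship endpoints and one leaving through the bridge, hence 1/7 for both.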
Bug fixes
- Added a check that concurrency meets the determinism constraints for K-Nearest Neighbors whenever randomSeed is overridden.
- Fixed an ArrayIndexOutOfBounds error that could happen in triangle count on some graphs with multiple relationship types.
- Fixed an issue where seeded algorithms (such as WCC) on graphs with multiple node labels could assign seeded communities to new nodes.
- Fixed an issue where KNN did not add candidates to the topK result.
- Fixed an issue where running an algorithm could return incorrect results on graphs filtered with the configuration parameter nodeLabels.
- Fixed an issue where running gds.alpha.ml.pipeline.linkPrediction.train could result in an error on graphs filtered with the configuration parameter nodeLabels.
- Fixed an issue with unmapped Neo4j node ids throwing ArrayIndexOutOfBoundsException.
- Fixed a bug where the in-memory storage engine would not find the correct graph store if the database name was not lowercase.
- Fixed a bug where the graph store would be released when storing the CypherGraphStore in the catalog.
- Fixed a bug where Node2Vec would produce an ArrayIndexOutOfBounds error on sufficiently large graphs.
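The first bug fix above adds a determinism check for K-Nearest Neighbors. Our understanding of the GDS documentation is that a fixed randomSeed only yields reproducible KNN results when the computation runs single-threaded, i.e. with concurrency set to 1. The sketch below shows such a validation in Python; `KnnConfig` and `validate_determinism` are hypothetical stand-ins, not GDS types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnnConfig:
    """Hypothetical stand-in for a KNN procedure configuration."""
    concurrency: int = 4
    random_seed: Optional[int] = None


def validate_determinism(config: KnnConfig) -> None:
    """Reject seeded runs that could still be non-deterministic.

    Assumption: a fixed seed only yields reproducible results when the
    computation runs on a single thread, so concurrency must be 1.
    """
    if config.random_seed is not None and config.concurrency != 1:
        raise ValueError(
            f"randomSeed requires concurrency = 1, but concurrency = {config.concurrency}"
        )


validate_determinism(KnnConfig(concurrency=1, random_seed=42))  # accepted
validate_determinism(KnnConfig(concurrency=8))  # unseeded: any concurrency is fine
```

Failing fast here is preferable to silently returning results that differ between runs despite a fixed seed.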
Improvements
- Added context information to log entries in debug and warning.
- Log training loss as part of general progress logging.
- Running transactions while projecting a graph now has less chance of breaking the projected graph.
- Improved runtime performance of FastRP.
- FastRP now uses the Neo4j node id instead of the internal GDS node id when seeding the generation of initial random vectors.
- The in-memory Cypher database can now query relationship ids, types and properties.
- The procedure gds.alpha.randomWalk.stream has been improved and should now run faster and be more stable.
Graph Data Science 1.8.0-Preview
GDS 1.8 is compatible with Neo4j 4.1, 4.2, 4.3, and 4.4 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5.
Breaking changes
- GDS now throws an error on identifiers with trailing whitespace to avoid input errors. This affects graphName, modelName, and several property parameters such as nodeWeightProperty or seedProperty.
- We have removed the separate concurrency parameter from the model parameter space in gds.alpha.ml.nodeClassification.train, gds.alpha.ml.linkPrediction.train and gds.alpha.ml.pipeline.linkPrediction.configureParams. The concurrency value in the configuration of the train procedure will be used instead.
- The procedure gds.alpha.randomWalk.stream has been improved and aligned with the Node2Vec implementation. Please consult the documentation to find out about the new configuration options.
- The procedure gds.beta.fastRPExtended has been merged with gds.fastRP.
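The trailing-whitespace breaking change means a name such as "myGraph " is now rejected instead of silently creating or looking up a different graph than the one intended. A minimal sketch of that kind of check, in Python; `validate_identifier` is a hypothetical helper and the exact GDS error message will differ.

```python
def validate_identifier(parameter: str, value: str) -> str:
    """Return `value` unchanged, or raise if it ends with whitespace.

    `validate_identifier` is a hypothetical helper, not part of GDS.
    """
    if value != value.rstrip():
        raise ValueError(
            f"`{parameter}` must not end with whitespace, got {value!r}"
        )
    return value


validate_identifier("graphName", "myGraph")  # accepted
try:
    validate_identifier("graphName", "myGraph ")
except ValueError as error:
    print(error)
```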
New features
- Link Prediction
- Added a new link prediction stream procedure, gds.alpha.ml.pipeline.linkPrediction.predict.stream.
- Added probabilityDistribution and samplingStats to the result of gds.alpha.ml.pipeline.linkPrediction.predict.mutate.
- To improve prediction performance, we’ve added a kNN-based approximate search strategy option to the link prediction procedures gds.alpha.ml.pipeline.linkPrediction.predict.stream|mutate.
- Node property steps in Link Prediction pipelines can use a relationship property.
- Node Classification pipelines: similar to link prediction pipelines, we’ve added a pipeline procedure for node classification, where users can define the features, splitting strategy, and model training options. We’ve added:
- gds.alpha.ml.pipeline.nodeClassification.create
- gds.alpha.ml.pipeline.nodeClassification.addNodeProperty
- gds.alpha.ml.pipeline.nodeClassification.addFeatures
- gds.alpha.ml.pipeline.nodeClassification.configureParams
- gds.alpha.ml.pipeline.nodeClassification.configureSplit
- gds.alpha.ml.pipeline.nodeClassification.train
- gds.alpha.ml.pipeline.nodeClassification.predict.mutate|stream|write
- New algorithm: Conductance, gds.alpha.conductance.stream, computes a metric for evaluating the quality of communities identified by community detection algorithms.
- Added support for preserving a relationship property in gds.alpha.ml.splitRelationships.mutate.
- The procedure gds.fastRP has received additional configuration parameters:
- featureProperties: to configure which node properties are used as part of the embedding.
- propertyRatio: to control how much of the embedding is computed from properties.
- nodeSelfInfluence: allows using each node's initial random vector as a contribution to the node's embedding. This is especially useful for graphs with disconnected nodes.
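To make propertyRatio concrete: our reading is that it sets the share of embeddingDimension derived from featureProperties, with the remainder derived from the graph topology, and that a positive ratio requires at least one feature property. The arithmetic is sketched below in Python; `split_embedding_dimension` is hypothetical, and rounding the property part down is an assumption, not documented GDS behaviour.

```python
from typing import List, Tuple


def split_embedding_dimension(embedding_dimension: int, property_ratio: float,
                              feature_properties: List[str]) -> Tuple[int, int]:
    """Return (property_dimension, topology_dimension) for a FastRP-style embedding.

    Hypothetical helper: rounding the property part down is our assumption.
    """
    if not 0.0 <= property_ratio <= 1.0:
        raise ValueError("propertyRatio must lie in [0, 1]")
    if property_ratio > 0 and not feature_properties:
        raise ValueError("a positive propertyRatio needs at least one feature property")
    property_dimension = int(embedding_dimension * property_ratio)
    return property_dimension, embedding_dimension - property_dimension


print(split_embedding_dimension(128, 0.25, ["age"]))  # (32, 96)
```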
Bug fixes
- Added a check that concurrency meets the determinism constraints for K-Nearest Neighbors whenever randomSeed is overridden.
- Fixed an ArrayIndexOutOfBounds error that could happen in triangle count on some graphs with multiple relationship types.
- Fixed an issue where seeded algorithms (such as WCC) on graphs with multiple node labels could assign seeded communities to new nodes.
- Fixed an issue where KNN did not add candidates to the topK result.
- Fixed an issue where running an algorithm could return incorrect results on graphs filtered with the configuration parameter nodeLabels.
- Fixed an issue where running gds.alpha.ml.pipeline.linkPrediction.train could result in an error on graphs filtered with the configuration parameter nodeLabels.
- Fixed an issue with unmapped Neo4j node ids throwing ArrayIndexOutOfBoundsException.
- Fixed a bug where the in-memory storage engine would not find the correct graph store if the database name was not lowercase.
- Fixed a bug where the graph store would be released when storing the CypherGraphStore in the catalog.
- Fixed a bug where Node2Vec would produce an ArrayIndexOutOfBounds error on sufficiently large graphs.
Improvements
- Added context information to log entries in debug and warning.
- Log training loss as part of general progress logging.
- Running transactions while projecting a graph now has less chance of breaking the projected graph.
- Improved runtime performance of FastRP.
- FastRP now uses the Neo4j node id instead of the internal GDS node id when seeding the generation of initial random vectors.
- The in-memory Cypher database can now query relationship ids, types and properties.
- The procedure gds.alpha.randomWalk.stream has been improved and should now run faster and be more stable.
Graph Data Science 1.7.2
GDS 1.7.2 is compatible with Neo4j 4.1, 4.2, and 4.3 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5.
Bug fixes
- Fixed an issue where seeded algorithms (such as WCC) on graphs with multiple node labels could assign seeded communities to new nodes.
- Fixed an issue where KNN did not add candidates to the topK result.
- Fixed an issue where running an algorithm could return incorrect results on graphs filtered with the configuration parameter nodeLabels.
- Fixed an issue where running gds.alpha.ml.pipeline.linkPrediction.train could result in an error on graphs filtered with the configuration parameter nodeLabels.
- Fixed an issue with unmapped Neo4j node ids throwing ArrayIndexOutOfBoundsException.
GDS 1.1.7
GDS 1.1.7 is compatible with Neo4j 3.5.x. For a 4.x compatible release, please see GDS 1.7.2.
Bug fixes
- Fixed a bug in Louvain where changes to maxIterations were ignored.
- Fixed a bug which caused gds.graph.list and gds.graph.drop to throw an error when specifying a graph with duplicate property keys; such graphs now fail early.
- Fixed a bug where gds.alpha.scc would sometimes fail with an ArrayIndexOutOfBoundsException.
- Fixed an issue where running an algorithm could return incorrect results on graphs filtered with the configuration parameter nodeLabels.
GDS 1.7.1
Release Date October 12, 2021
GDS 1.7.1 is compatible with Neo4j 4.1, 4.2, and 4.3 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6. For a 4.0 compatible release, please see GDS 1.6.5.
- Fixed a bug in Cypher graph loading and subgraph creation which could lead to ArrayIndexOutOfBounds errors.
- Fixed an ArrayIndexOutOfBounds error caused by running triangleCount on graphs with multiple relationship types.
Graph Data Science 1.7.0
GDS 1.7.0 is compatible with Neo4j 4.1, 4.2, and 4.3 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6. For a 4.0 compatible release, please see GDS 1.6.5.
Breaking changes
- This release does not support Neo4j 4.0.x.
- Aligned the names of the entries returned by gds.alpha.ml.linkPrediction.train and gds.alpha.ml.nodeClassification.train with the model catalog: they are now modelName and modelInfo instead of name and info.
- Removed the sharedUpdater parameter from gds.alpha.ml.linkPrediction and gds.alpha.ml.nodeClassification.
- gds.beta.graph.export.csv now exports into a subdirectory called export. Previously, the exported graphs were written directly into the configured directory.
- Renamed all graphalgo packages to gds.
New features
- New Algorithm: Approximate Maximum K-Cut
- Includes procedures: gds.alpha.maxkcut.[mutate|mutate.estimate|stream|stream.estimate].
- Introduced Link Prediction Pipelines to make it easier to define and calculate features, split your graph, and make predictions.
- Includes procedures: gds.alpha.ml.pipeline.linkPrediction.create|addNodeProperty|addFeature|configureSplit|configureParams|train|predict.mutate.
- Introduced support for exporting additional node properties, including strings, from the underlying database.
- Added the additionalNodeProperties parameter to gds.graph.export.
- Added the additionalNodeProperties parameter to gds.graph.export.csv.
- Introduced experimental support for querying the in-memory graph with Cypher.
- Added gds.alpha.create.cypherdb to allow Neo4j to recognize the in-memory graph as a database for Cypher queries.
- To better support multiple concurrent users, we’ve added a system monitoring procedure, gds.alpha.systemMonitor, which provides an overview of the system's workload and available resources.
- Progress logging is now turned on by default and no longer requires changing your configuration settings. Progress can be accessed with gds.beta.listProgress.
- GraphSAGE now supports deterministic results via the randomSeed configuration parameter of gds.beta.graphSage.train.
- Improved the performance (up to 20x speedup) of weakly connected components, gds.wcc, for undirected graphs by applying a subgraph sampling optimization.
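Approximate Maximum K-Cut assigns each node to one of k communities so that the total weight of relationships crossing communities is as large as possible. The sketch below is a plain single-node local search in Python, not the algorithm GDS uses; it only illustrates the objective being approximated. `cut_weight` and `local_search_k_cut` are hypothetical names, and self-loops are assumed absent.

```python
import random
from typing import List, Tuple

# (source, target, weight); self-loops are assumed absent.
Edge = Tuple[int, int, float]


def cut_weight(edges: List[Edge], assignment: List[int]) -> float:
    """Total weight of edges whose endpoints sit in different communities."""
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])


def local_search_k_cut(num_nodes: int, edges: List[Edge], k: int,
                       seed: int = 0, max_rounds: int = 100) -> List[int]:
    """Greedy single-node moves from a random start until no move raises the cut."""
    rng = random.Random(seed)
    assignment = [rng.randrange(k) for _ in range(num_nodes)]
    adjacency: List[List[Tuple[int, float]]] = [[] for _ in range(num_nodes)]
    for u, v, w in edges:
        adjacency[u].append((v, w))
        adjacency[v].append((u, w))
    for _ in range(max_rounds):
        improved = False
        for node in range(num_nodes):
            # Weight from `node` into each community: that weight stays uncut
            # if `node` joins the community, so the best target minimises it.
            into = [0.0] * k
            for neighbor, w in adjacency[node]:
                into[assignment[neighbor]] += w
            best = min(range(k), key=lambda c: into[c])
            if into[best] < into[assignment[node]]:
                assignment[node] = best
                improved = True
        if not improved:
            break
    return assignment


ring = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
parts = local_search_k_cut(4, ring, k=2, seed=1)
print(parts, cut_weight(ring, parts))
```

Every accepted move strictly increases the cut, so the search terminates, and at a local optimum each node has at most 1/k of its incident weight inside its own community, which guarantees a cut of at least (1 - 1/k) of the total weight.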
Bug fixes
- Fixed a bug regarding weighted graphs with multiple relationship types, which affected gds.beta.graphSage and gds.alpha.spanningTree.
- Supervised Machine Learning (Node Classification & Link Prediction):
- Fixed a NaN issue in NodeClassification where computations with very small probability values could cause the result to flip to infinity.
- Fixed a bug in seeded NodeClassification and LinkPrediction which led to non-deterministic behaviour.
- Corrected the training size used in gds.alpha.ml.linkPrediction.train. This affects the penalty parameter used in logistic regression.
- Progress Logging:
- Fixed a bug in beta progress event tracking where progress events would not be released if computation was abandoned before completion.
- Fixed a bug in beta progress event tracking for Pregel algorithms where progress events would not be released on algorithm completion.
- Node Similarity & KNN:
- Fixed a bug where KNN and NodeSimilarity could write out of bounds on a node-filtered graph with multiple relationship types.
- Fixed a bug which affected gds.nodeSimilarity.write and gds.alpha.knn.write when executed in combination with a nodeLabels filter. The bug led either to an exception or to wrong results due to an incorrect mapping between internal and Neo4j node ids.
- Fixed a bug where gds.nodeSimilarity.[write|mutate] and gds.beta.knn.[write|mutate] wrote duplicate relationships if the input graph was undirected.
- KNN:
- Fixed a bug in gds.beta.knn where negative values in node properties of type float array caused a failure when returning the similarityDistribution.
- Fast RP:
- FastRP stream mode explicitly returns a list of floats rather than a list of numbers. This agrees with the other embeddings, and saves users from having to cast/transform when processing the results further in Cypher.
- GraphSAGE:
- Fixed a bug in weighted GraphSAGE where the relationshipWeightProperty was not loaded.
- Fixed a bug in gds.beta.graphSage where the concurrency parameter was not considered.
- Graph Operations:
- Fixed a bug in gds.graph.removeNodeProperties where removedPropertiesWritten was too large for properties shared across multiple labels.
- Fixed a bug in gds.beta.graph.generate where random graphs with relationship properties could not be generated.
- Fixed a bug in gds.create.subgraph which could lead to undefined behaviour or an ArrayIndexOutOfBounds exception when executed on GDS Enterprise Edition.
- Fixed a bug in gds.graph.create where default values for array properties would throw for convertible types.
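On undirected graphs a similarity found for the pair (a, b) is typically found again as (b, a), which is how duplicate relationships can be written. The notes do not say how the fix works; one simple way to keep a single copy per unordered pair is to canonicalise each pair's orientation, sketched below in Python with a hypothetical `unique_undirected_pairs` helper.

```python
from typing import Dict, List, Tuple

Scored = Tuple[int, int, float]


def unique_undirected_pairs(results: List[Scored]) -> List[Scored]:
    """Keep the first occurrence of each unordered pair {source, target}."""
    seen: Dict[Tuple[int, int], float] = {}
    for source, target, score in results:
        key = (min(source, target), max(source, target))  # canonical orientation
        if key not in seen:
            seen[key] = score
    # Dicts preserve insertion order, so output follows first appearance.
    return [(a, b, score) for (a, b), score in seen.items()]


print(unique_undirected_pairs([(0, 1, 0.5), (1, 0, 0.5), (2, 3, 0.9)]))
# [(0, 1, 0.5), (2, 3, 0.9)]
```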
Improvements
- Pathfinding: Added existence checks for sourceNode and targetNode to all shortest path procedures in the product tier.
- Improved the runtime of gds.fastRP via better workload balancing between threads.
- Lower memory footprint for LinkPrediction and NodeClassification.
- Improved the procedure output of gds.beta.listProgress.
- Scaled down scores computed by gds.articleRank.
Graph Data Science 1.6.5
GDS 1.6.5 is compatible with Neo4j 4.0, 4.1, 4.2, and 4.3 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6.
Bug fixes
- Fixed a bug in gds.beta.graph.generate where random graphs with relationship properties could not be generated.
- Fixed a bug in gds.graph.create where default values for array properties would throw for convertible types.
- Fixed a bug in gds.beta.graphSage where the concurrency parameter was not considered.
- Fixed a bug where the BitIdMap node mapping builder (on by default in GDS Enterprise Edition) would not correctly count all nodes in certain situations.
- Corrected the training size used in gds.alpha.ml.linkPrediction.train. This affects the penalty parameter used in logistic regression.
GDS 1.7.0-Preview
GDS 1.7.0-preview is compatible with Neo4j 4.1, 4.2, and 4.3 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6. For a 4.0 compatible release, please see GDS 1.1.6.
Breaking changes
- This release does not support Neo4j 4.0.x.
- Aligned the names of the entries returned by gds.alpha.ml.linkPrediction.train and gds.alpha.ml.nodeClassification.train with the model catalog: they are now modelName and modelInfo instead of name and info.
- Removed the sharedUpdater parameter from gds.alpha.ml.linkPrediction and gds.alpha.ml.nodeClassification.
- gds.beta.graph.export.csv now exports into a subdirectory called export. Previously, the exported graphs were written directly into the configured directory.
- Renamed all graphalgo packages to gds.
New features
- New Algorithm: Approximate Maximum K-Cut
- Includes procedures: gds.alpha.maxkcut.[mutate|mutate.estimate|stream|stream.estimate].
- Introduced Link Prediction Pipelines to make it easier to define and calculate features, split your graph, and make predictions.
- Includes procedures: gds.alpha.ml.pipeline.linkPrediction.create|addNodeProperty|addFeature|configureSplit|configureParams|train|predict.mutate.
- Introduced support for exporting additional node properties, including strings, from the underlying database.
- Added the additionalNodeProperties parameter to gds.graph.export.
- Added the additionalNodeProperties parameter to gds.graph.export.csv.
- Introduced experimental support for querying the in-memory graph with Cypher.
- Added gds.alpha.create.cypherdb to allow Neo4j to recognize the in-memory graph as a database for Cypher queries.
- To better support multiple concurrent users, we’ve added a system monitoring procedure, gds.alpha.systemMonitor, which provides an overview of the system's workload and available resources.
- Progress logging is now turned on by default and no longer requires changing your configuration settings. Progress can be accessed with gds.beta.listProgress.
- GraphSAGE now supports deterministic results via the randomSeed configuration parameter of gds.beta.graphSage.train.
- Improved the performance (up to 20x speedup) of weakly connected components, gds.wcc, for undirected graphs by applying a subgraph sampling optimization.
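Weakly connected components (gds.wcc) labels every node with the component it belongs to, ignoring relationship direction. The release notes attribute the speedup to a subgraph sampling optimization, which is not shown here; the Python sketch below covers only the baseline union-find computation that WCC results correspond to. `wcc` is a hypothetical function name, and choosing the smallest node id as the component id is our own convention.

```python
from typing import Iterable, List, Tuple


def wcc(num_nodes: int, relationships: Iterable[Tuple[int, int]]) -> List[int]:
    """Component id per node, using union-find; the smallest node id is the root."""
    parent = list(range(num_nodes))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in relationships:  # direction is ignored
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[max(ru, rv)] = min(ru, rv)
    return [find(x) for x in range(num_nodes)]


print(wcc(5, [(0, 1), (1, 2), (3, 4)]))  # [0, 0, 0, 3, 3]
```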
Bug fixes
- Fixed a bug regarding weighted graphs with multiple relationship types, which affected gds.beta.graphSage and gds.alpha.spanningTree.
- Supervised Machine Learning (Node Classification & Link Prediction):
- Fixed a NaN issue in NodeClassification where computations with very small probability values could cause the result to flip to infinity.
- Fixed a bug in seeded NodeClassification and LinkPrediction which led to non-deterministic behaviour.
- Corrected the training size used in gds.alpha.ml.linkPrediction.train. This affects the penalty parameter used in logistic regression.
- Progress Logging:
- Fixed a bug in beta progress event tracking where progress events would not be released if computation was abandoned before completion.
- Fixed a bug in beta progress event tracking for Pregel algorithms where progress events would not be released on algorithm completion.
- Node Similarity & KNN:
- Fixed a bug where KNN and NodeSimilarity could write out of bounds on a node-filtered graph with multiple relationship types.
- Fixed a bug which affected gds.nodeSimilarity.write and gds.alpha.knn.write when executed in combination with a nodeLabels filter. The bug led either to an exception or to wrong results due to an incorrect mapping between internal and Neo4j node ids.
- Fixed a bug where gds.nodeSimilarity.[write|mutate] and gds.beta.knn.[write|mutate] wrote duplicate relationships if the input graph was undirected.
- KNN:
- Fixed a bug in gds.beta.knn where negative values in node properties of type float array caused a failure when returning the similarityDistribution.
- Fast RP:
- FastRP stream mode explicitly returns a list of floats rather than a list of numbers. This agrees with the other embeddings, and saves users from having to cast/transform when processing the results further in Cypher.
- GraphSAGE:
- Fixed a bug in weighted GraphSAGE where the relationshipWeightProperty was not loaded.
- Fixed a bug in gds.beta.graphSage where the concurrency parameter was not considered.
- Graph Operations:
- Fixed a bug in gds.graph.removeNodeProperties where removedPropertiesWritten was too large for properties shared across multiple labels.
- Fixed a bug in gds.beta.graph.generate where random graphs with relationship properties could not be generated.
- Fixed a bug in gds.create.subgraph which could lead to undefined behaviour or an ArrayIndexOutOfBounds exception when executed on GDS Enterprise Edition.
- Fixed a bug in gds.graph.create where default values for array properties would throw for convertible types.
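The NodeClassification NaN fix concerns very small probabilities flipping the result to infinity. The notes do not say how it was fixed; a standard remedy for probabilities that underflow is to work in log space with the log-sum-exp trick. The Python sketch below contrasts the direct computation with the stable one purely to illustrate the failure mode; both functions are hypothetical and not GDS code.

```python
import math
from typing import List


def naive_log_probability(logits: List[float], index: int) -> float:
    """log(softmax(logits)[index]) computed the direct way."""
    exps = [math.exp(x) for x in logits]
    probability = exps[index] / sum(exps)
    # Python's math.log(0.0) raises; IEEE arithmetic (e.g. Java's Math.log) gives -inf.
    return math.log(probability) if probability > 0.0 else -math.inf


def stable_log_softmax(logits: List[float]) -> List[float]:
    """log-softmax via the log-sum-exp trick: subtract the max before exponentiating."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


logits = [0.0, -1000.0]
print(naive_log_probability(logits, 1))  # -inf: exp(-1000.0) underflows to 0.0
print(stable_log_softmax(logits))        # [0.0, -1000.0], both finite
```

Once a single -inf enters a loss or gradient, further arithmetic such as -inf * 0 produces NaN, which is why keeping values finite in log space matters.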
Improvements
- Pathfinding: Added existence checks for sourceNode and targetNode to all shortest path procedures in the product tier.
- Improved the runtime of gds.fastRP via better workload balancing between threads.
- Lower memory footprint for LinkPrediction and NodeClassification.
- Improved the procedure output of gds.beta.listProgress.
- Scaled down scores computed by gds.articleRank.