Releases: neo4j/graph-data-science

Graph Data Science 1.8.1

20 Dec 17:15

GDS 1.8.1 is compatible with Neo4j 4.1, 4.2, 4.3 and 4.4 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5.

Bug fixes

  • Fixed a bug where ForkJoin pools were not properly closed, which could lead to out-of-memory errors (OOMs) when using Pregel-based algorithms, e.g. PageRank.
  • Fixed a bug where gds.beta.graphSage could produce incorrect results for small graphs.
  • Fixed a bug where gds.beta.graphSage could produce incorrect results for the pool aggregator.
  • Fixed a bug where gds.graph.create.cypher would not accept list properties for nodes.
  • Fixed a bug in gds.beta.graph.create.subgraph where long values greater than 2^53 were not properly handled during expression evaluation.
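
The last fix above concerns very large long values. One general reason such values are fragile is that a 64-bit floating-point number cannot represent every integer above 2**53 exactly. The Python sketch below only illustrates that limit; whether GDS expression evaluation converted values to doubles internally is an assumption and is not stated in these notes.

```python
# Illustration only: integers above 2**53 are not all exactly representable
# as IEEE 754 doubles. That GDS hit this exact conversion is an assumption.
LIMIT = 2 ** 53


def survives_double_round_trip(value: int) -> bool:
    """Return True if converting value to a double and back preserves it."""
    return int(float(value)) == value


print(survives_double_round_trip(LIMIT))      # exactly representable
print(survives_double_round_trip(LIMIT + 1))  # rounded to a neighbouring double
```

If subgraph filter expressions compare large ids or property values, checking them against this limit is a quick way to tell whether an older release might have been affected.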

Graph Data Science 1.7.3

03 Dec 13:41

GDS 1.7.3 is compatible with Neo4j 4.1, 4.2, and 4.3 but not Neo4j 3.5.x, 4.0, or 4.4. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.4 compatible release, please see GDS 1.8.0.

Bug fixes

  • Fixed a bug where Node2Vec would produce an ArrayIndexOutOfBoundsException on sufficiently large graphs.
  • Fixed a bug where ForkJoin pools were not properly closed, which could lead to out-of-memory errors (OOMs) when using Pregel-based algorithms, e.g. PageRank.

GDS 1.8.0

01 Dec 19:23

GDS 1.8 is compatible with Neo4j 4.1, 4.2, 4.3, and 4.4 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5.

Breaking changes

  • GDS now throws an error on identifiers with trailing whitespace to avoid input errors. This affects graphName, modelName, and several property parameters such as nodeWeightProperty or seedProperty.
  • We have removed the separate concurrency parameter from the model parameter space in gds.alpha.ml.nodeClassification.train, gds.alpha.ml.linkPrediction.train and gds.alpha.ml.pipeline.linkPrediction.configureParams. The concurrency value in the configuration of the train procedure will be used.
  • The procedure gds.alpha.randomWalk.stream has graduated to the beta tier, as gds.beta.randomWalk.stream.
    • Random Walk has been improved and aligned with the Node2Vec implementation. Please consult the documentation to find out about the new configuration options.
    • gds.alpha.randomWalk.stream has been removed.
    • A memory estimation procedure, gds.beta.randomWalk.estimate, has been added.
  • The procedure gds.beta.fastRPExtended has been merged with gds.fastRP.
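
Because identifiers with trailing whitespace are now rejected, client code that builds graphName or modelName values dynamically may want to check them before calling GDS. The following is a minimal client-side sketch; the helper name validate_identifier and the error wording are hypothetical and are not part of GDS, which performs its own check on the server.

```python
# Hypothetical client-side pre-check; GDS does its own validation server-side.
def validate_identifier(parameter: str, value: str) -> str:
    """Reject identifiers that end in whitespace, as GDS 1.8 does."""
    if value != value.rstrip():
        # The message text is made up for this sketch, not copied from GDS.
        raise ValueError(
            f"`{parameter}` must not end with whitespace, but got `{value}`."
        )
    return value


validate_identifier("graphName", "myGraph")       # accepted unchanged
try:
    validate_identifier("graphName", "myGraph ")  # trailing space is rejected
except ValueError as err:
    print(err)
```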

New features

  • Link Prediction
    • Added a new link prediction stream procedure, gds.alpha.ml.pipeline.linkPrediction.predict.stream.
    • Added probabilityDistribution and samplingStats to the result of gds.alpha.ml.pipeline.linkPrediction.predict.mutate.
    • To improve prediction performance, we’ve added a kNN-based approximate search strategy option to the link prediction procedures gds.alpha.ml.pipeline.linkPrediction.predict.stream|mutate.
    • Node property steps in Link Prediction pipelines can use a relationship property.
  • Node Classification pipelines: similar to link prediction pipelines, we’ve added a pipeline procedure for node classification, where users can define the features, splitting strategy, and model training options. We’ve added:
    • gds.alpha.ml.pipeline.nodeClassification.create
    • gds.alpha.ml.pipeline.nodeClassification.addNodeProperty
    • gds.alpha.ml.pipeline.nodeClassification.selectFeatures
    • gds.alpha.ml.pipeline.nodeClassification.configureParams
    • gds.alpha.ml.pipeline.nodeClassification.configureSplit
    • gds.alpha.ml.pipeline.nodeClassification.train
    • gds.alpha.ml.pipeline.nodeClassification.predict.mutate|stream|write
  • New algorithm: Conductance, gds.alpha.conductance.stream, can be used to compute a metric to evaluate the quality of communities identified by community detection algorithms.
  • Added support for preserving a relationship property in gds.alpha.ml.splitRelationships.mutate.
  • The procedure gds.fastRP has received additional configuration parameters:
    • featureProperties: to configure using node properties as part of the embedding.
    • propertyRatio: to control how much of the embedding is computed from properties.
    • nodeSelfInfluence: allows using each node's initial random vector as a contribution to the node's embedding. Especially useful for graphs with disconnected nodes.
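
To give a feel for the Conductance metric mentioned above, here is a small pure-Python sketch. It assumes one common definition: for each community, the number of relationships that leave the community divided by the number of relationships that start in it, so lower values indicate better-separated communities. The function name and input shapes are illustrative, and the exact definition and weighting used by gds.alpha.conductance.stream should be taken from the GDS documentation rather than from this sketch.

```python
from collections import defaultdict


def conductance(relationships, community_of):
    """Per community: relationships leaving it / relationships starting in it.

    Assumed definition for illustration; unweighted, directed relationships.
    """
    outgoing = defaultdict(int)  # relationships whose target is in another community
    total = defaultdict(int)     # all relationships whose source is in the community
    for source, target in relationships:
        community = community_of[source]
        total[community] += 1
        if community_of[target] != community:
            outgoing[community] += 1
    return {community: outgoing[community] / total[community] for community in total}


# Two communities: {a, b} and {c, d}, with a single relationship crossing over.
rels = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "c")]
communities = {"a": 0, "b": 0, "c": 1, "d": 1}
print(conductance(rels, communities))
```

In this toy graph, community 0 has one of its three relationships pointing outside, while community 1 has none, so community 1 gets the lower (better) score.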

Bug fixes

  • Added a check that concurrency meets the determinism constraints for K-Nearest Neighbors whenever randomSeed is overridden.
  • Fixed an ArrayIndexOutOfBounds error that could happen in triangle count on some graphs with multiple relationship types.
  • Fixed an issue where seeded algorithms (such as WCC) on graphs with multiple node labels could assign seeded communities to new nodes.
  • Fixed an issue where KNN did not add candidates to the topK result.
  • Fixed an issue where running an algorithm could return incorrect results on graphs filtered with the configuration parameter nodeLabels.
  • Fixed an issue where running gds.alpha.ml.pipeline.linkPrediction.train could result in an error on graphs filtered with the configuration parameter nodeLabels.
  • Fixed an issue with unmapped Neo4j node ids throwing ArrayIndexOutOfBoundsException.
  • Fixed a bug where the in-memory storage engine would not find the correct graph store if the database name was not lowercase.
  • Fixed a bug where the graph store would be released when storing the CypherGraphStore in the catalog.
  • Fixed a bug where Node2Vec would produce an ArrayIndexOutOfBounds error on sufficiently large graphs.

Improvements

  • Added context information to log entries in debug and warning.
  • Training loss is now logged as part of general progress logging.
  • Running transactions while a graph is being projected is now less likely to break the projected graph.
  • Improved runtime performance of FastRP.
  • Use Neo4j node id instead of internal GDS node id when seeding generation of initial random vectors in FastRP.
  • The in-memory Cypher database now supports querying relationship ids, types, and properties.
  • The procedure gds.alpha.randomWalk.stream has been improved and should now run faster and be more stable.

Graph Data Science 1.8.0-Preview

26 Nov 15:20

GDS 1.8 is compatible with Neo4j 4.1, 4.2, 4.3, and 4.4 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5.

Breaking changes

  • GDS now throws an error on identifiers with trailing whitespace to avoid input errors. This affects graphName, modelName, and several property parameters such as nodeWeightProperty or seedProperty.
  • We have removed the separate concurrency parameter from the model parameter space in gds.alpha.ml.nodeClassification.train, gds.alpha.ml.linkPrediction.train and gds.alpha.ml.pipeline.linkPrediction.configureParams. The concurrency value in the configuration of the train procedure will be used.
  • The procedure gds.alpha.randomWalk.stream has been improved and aligned with the Node2Vec implementation. Please consult the documentation to find out about the new configuration options.
  • The procedure gds.beta.fastRPExtended has been merged with gds.fastRP.

New features

  • Link Prediction
    • Added a new link prediction stream procedure, gds.alpha.ml.pipeline.linkPrediction.predict.stream.
    • Added probabilityDistribution and samplingStats to the result of gds.alpha.ml.pipeline.linkPrediction.predict.mutate.
    • To improve prediction performance, we’ve added a kNN-based approximate search strategy option to the link prediction procedures gds.alpha.ml.pipeline.linkPrediction.predict.stream|mutate.
    • Node property steps in Link Prediction pipelines can use a relationship property.
  • Node Classification pipelines: similar to link prediction pipelines, we’ve added a pipeline procedure for node classification, where users can define the features, splitting strategy, and model training options. We’ve added:
    • gds.alpha.ml.pipeline.nodeClassification.create
    • gds.alpha.ml.pipeline.nodeClassification.addNodeProperty
    • gds.alpha.ml.pipeline.nodeClassification.addFeatures
    • gds.alpha.ml.pipeline.nodeClassification.configureParams
    • gds.alpha.ml.pipeline.nodeClassification.configureSplit
    • gds.alpha.ml.pipeline.nodeClassification.train
    • gds.alpha.ml.pipeline.nodeClassification.predict.mutate|stream|write
  • New algorithm: Conductance, gds.alpha.conductance.stream, can be used to compute a metric to evaluate the quality of communities identified by community detection algorithms.
  • Added support for preserving a relationship property in gds.alpha.ml.splitRelationships.mutate.
  • The procedure gds.fastRP has received additional configuration parameters:
    • featureProperties: to configure using node properties as part of the embedding.
    • propertyRatio: to control how much of the embedding is computed from properties.
    • nodeSelfInfluence: allows using each node's initial random vector as a contribution to the node's embedding. Especially useful for graphs with disconnected nodes.

Bug fixes

  • Added a check that concurrency meets the determinism constraints for K-Nearest Neighbors whenever randomSeed is overridden.
  • Fixed an ArrayIndexOutOfBounds error that could happen in triangle count on some graphs with multiple relationship types.
  • Fixed an issue where seeded algorithms (such as WCC) on graphs with multiple node labels could assign seeded communities to new nodes.
  • Fixed an issue where KNN did not add candidates to the topK result.
  • Fixed an issue where running an algorithm could return incorrect results on graphs filtered with the configuration parameter nodeLabels.
  • Fixed an issue where running gds.alpha.ml.pipeline.linkPrediction.train could result in an error on graphs filtered with the configuration parameter nodeLabels.
  • Fixed an issue with unmapped Neo4j node ids throwing ArrayIndexOutOfBoundsException.
  • Fixed a bug where the in-memory storage engine would not find the correct graph store if the database name was not lowercase.
  • Fixed a bug where the graph store would be released when storing the CypherGraphStore in the catalog.
  • Fixed a bug where Node2Vec would produce an ArrayIndexOutOfBounds error on sufficiently large graphs.

Improvements

  • Added context information to log entries in debug and warning.
  • Training loss is now logged as part of general progress logging.
  • Running transactions while a graph is being projected is now less likely to break the projected graph.
  • Improved runtime performance of FastRP.
  • Use Neo4j node id instead of internal GDS node id when seeding generation of initial random vectors in FastRP.
  • The in-memory Cypher database now supports querying relationship ids, types, and properties.
  • The procedure gds.alpha.randomWalk.stream has been improved and should now run faster and be more stable.

Graph Data Science 1.7.2

01 Nov 20:35

GDS 1.7.2 is compatible with Neo4j 4.1, 4.2, and 4.3 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5.

Bug fixes

  • Fixed an issue where seeded algorithms (such as WCC) on graphs with multiple node labels could assign seeded communities to new nodes.
  • Fixed an issue where KNN did not add candidates to the topK result.
  • Fixed an issue where running an algorithm could return incorrect results on graphs filtered with the configuration parameter nodeLabels.
  • Fixed an issue where running gds.alpha.ml.pipeline.linkPrediction.train could result in an error on graphs filtered with the configuration parameter nodeLabels.
  • Fixed an issue with unmapped Neo4j node ids throwing ArrayIndexOutOfBoundsException.

GDS 1.1.7

01 Nov 20:31

GDS 1.1.7 is compatible with Neo4j 3.5.x. For a 4.x compatible release, please see GDS 1.7.2.

Bug fixes

  • Fixed a bug in Louvain where changes to maxIterations were ignored.
  • Fixed a bug which caused gds.graph.list and gds.graph.drop to throw an error when specifying a graph with duplicate property keys. GDS now fails early in this case.
  • Fixed a bug where gds.alpha.scc would sometimes fail with an ArrayIndexOutOfBoundsException.
  • Fixed an issue where running an algorithm could return incorrect results on graphs filtered with the configuration parameter nodeLabels.

GDS 1.7.1

13 Oct 13:17

Release Date October 12, 2021

GDS 1.7.1 is compatible with Neo4j 4.1, 4.2, and 4.3 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6. For a 4.0 compatible release, please see GDS 1.6.5.

  • Fixed a bug in Cypher graph loading and subgraph creation which could lead to ArrayIndexOutOfBounds errors.
  • Fixed an ArrayIndexOutOfBounds error caused by running triangleCount on graphs with multiple relationship types.

Graph Data Science 1.7.0

23 Sep 18:32

GDS 1.7.0 is compatible with Neo4j 4.1, 4.2, and 4.3 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6. For a 4.0 compatible release, please see GDS 1.6.5.

Breaking changes

  • This release does not support Neo4j 4.0.x
  • Aligned the returned modelInfo entry names of gds.alpha.ml.linkPrediction.train and gds.alpha.ml.nodeClassification.train with the model catalog. They now contain modelName and modelInfo instead of name and info.
  • Removed the sharedUpdater parameter from gds.alpha.ml.linkPrediction and gds.alpha.ml.nodeClassification.
  • gds.beta.graph.export.csv now exports into a subdirectory called export. Previously, the exported graphs were written directly into the configured directory.
  • Renamed all graphalgo packages to gds.

New features

  • New Algorithm: Approximate Maximum K-Cut
    • Includes procedures: gds.alpha.maxkcut.[mutate|mutate.estimate|stream|stream.estimate].
  • Introduced Link Prediction Pipelines to make it easier to define and calculate features, split your graph, and make predictions.
    • Includes procedures: gds.alpha.ml.pipeline.linkPrediction.create|addNodeProperty|addFeature|configureSplit|configureParams|train|predict.mutate.
  • Introduced support for exporting additional node properties, including strings, from the underlying database.
    • Added the additionalNodeProperties parameter to gds.graph.export.
    • Added the additionalNodeProperties parameter to gds.graph.export.csv.
  • Introduced experimental support for querying the in-memory graph with Cypher.
    • Added gds.alpha.create.cypherdb to allow Neo4j to recognize the in-memory graph as a database for Cypher queries.
  • To help handle multiple concurrent users, we’ve added a system monitoring procedure, gds.alpha.systemMonitor, which provides an overview of the system's workload and available resources.
  • Progress logging is now turned on by default and no longer requires changing your configuration settings. Progress can be accessed with gds.beta.listProgress.
  • GraphSAGE now supports deterministic results with the randomSeed configuration parameter to gds.beta.graphSage.train.
  • Improved performance (up to 20x speedup) of weakly connected components, gds.wcc, for undirected graphs by applying a subgraph sampling optimization.

Bug fixes

  • Fixed a bug regarding weighted graphs with multiple relationship types, which affected gds.beta.graphSage and gds.alpha.spanningTree.
  • Supervised Machine Learning (Node Classification & Link Prediction):
    • Fixed a NaN issue in NodeClassification where computations with very small probability values could cause the result to flip to infinity.
    • Fixed a bug in seeded NodeClassification and LinkPrediction which led to non-deterministic behaviour.
    • Corrected the training size used in gds.alpha.ml.linkPrediction.train. This affects the penalty parameter used in logistic regression.
  • Progress Logging:
    • Fixed a bug in beta progress event tracking where progress events would not be released if computation was abandoned before completion.
    • Fixed a bug in beta progress event tracking for Pregel algorithms where progress events would not be released on algorithm completion.
  • Node Similarity & KNN:
    • Fixed a bug where KNN and NodeSimilarity could write out of bounds on a node-filtered graph with multiple relationship types.
    • Fixed a bug which affected gds.nodeSimilarity.write and gds.alpha.knn.write when being executed in combination with a nodeLabels filter. The bug either led to an exception or to wrong results due to an incorrect mapping between internal and Neo4j node ids.
    • Fixed a bug where gds.nodeSimilarity.[write|mutate] and gds.beta.knn.[write|mutate] wrote duplicate relationships if the input graph was undirected.
  • KNN:
    • Fixed a bug in gds.beta.knn where negative values in node properties of type float arrays failed when returning the similarityDistribution.
  • Fast RP:
    • FastRP stream mode explicitly returns a list of floats rather than a list of numbers. This agrees with the other embeddings, and saves users from having to cast/transform when processing the results further in Cypher.
  • GraphSAGE:
    • Fixed a bug in weighted GraphSAGE where the relationshipWeightProperty was not loaded.
    • Fixed a bug in gds.beta.graphSage, where the concurrency parameter was not considered.
  • Graph Operations:
    • Fixed a bug in gds.graph.removeNodeProperties where removedPropertiesWritten was too large for properties shared across multiple labels.
    • Fixed a bug in gds.beta.graph.generate, where random graphs with relationship properties could not be generated.
    • Fixed a bug in gds.create.subgraph which could lead to undefined behaviour or an ArrayIndexOutOfBoundsException when executed on GDS Enterprise Edition.
    • Fixed a bug in gds.graph.create, where default values for array properties would throw for convertible types.
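
The NodeClassification NaN fix listed under Supervised Machine Learning above is an instance of a general numerical hazard: a probability that underflows to zero makes a log-based loss blow up. The Python sketch below shows the hazard and one common guard, clamping the probability before taking the logarithm. The EPSILON value and the function name are assumptions chosen for illustration; these notes do not say how GDS addressed the issue.

```python
import math

EPSILON = 1e-15  # hypothetical clamp value, not taken from GDS


def safe_log_loss(probability_of_true_class: float) -> float:
    """Negative log-likelihood with the probability clamped into [EPSILON, 1]."""
    clipped = min(max(probability_of_true_class, EPSILON), 1.0)
    return -math.log(clipped)


# A very small probability underflows to exactly 0.0 in double precision,
# so an unguarded log would have no finite result.
tiny = math.exp(-800)
print(tiny)
print(safe_log_loss(tiny))  # finite thanks to the clamp
```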

Improvements

  • Pathfinding: Added existence checks for sourceNode and targetNode to all shortest path procedures in the product tier.
  • Improved runtime of gds.fastRP via better workload balancing between threads.
  • Lower memory footprint for LinkPrediction and NodeClassification.
  • Improved the procedure output of gds.beta.listProgress.
  • Scale down scores computed by gds.articleRank.

Graph Data Science 1.6.5

13 Sep 17:48

GDS 1.6.5 is compatible with Neo4j 4.0, 4.1, 4.2, and 4.3 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6.

Bug fixes

  • Fixed a bug in gds.beta.graph.generate, where random graphs with relationship properties could not be generated.
  • Fixed a bug in gds.graph.create, where default values for array properties would throw for convertible types.
  • Fixed a bug in gds.beta.graphSage, where the concurrency parameter was not considered.
  • Fixed a bug where the BitIdMap node mapping builder (on by default in GDS Enterprise Edition) would not correctly count all nodes in certain situations.
  • Corrected the training size used in gds.alpha.ml.linkPrediction.train. This affects the penalty parameter used in logistic regression.

GDS 1.7.0-Preview

09 Sep 22:10
Pre-release

GDS 1.7.0-preview is compatible with Neo4j 4.1, 4.2, and 4.3 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6. For a 4.0 compatible release, please see GDS 1.1.6

Breaking changes

  • This release does not support Neo4j 4.0.x
  • Aligned the returned modelInfo entry names of gds.alpha.ml.linkPrediction.train and gds.alpha.ml.nodeClassification.train with the model catalog. They now contain modelName and modelInfo instead of name and info.
  • Removed the sharedUpdater parameter from gds.alpha.ml.linkPrediction and gds.alpha.ml.nodeClassification.
  • gds.beta.graph.export.csv now exports into a subdirectory called export. Previously, the exported graphs were written directly into the configured directory.
  • Renamed all graphalgo packages to gds.

New features

  • New Algorithm: Approximate Maximum K-Cut
    • Includes procedures: gds.alpha.maxkcut.[mutate|mutate.estimate|stream|stream.estimate].
  • Introduced Link Prediction Pipelines to make it easier to define and calculate features, split your graph, and make predictions.
    • Includes procedures: gds.alpha.ml.pipeline.linkPrediction.create|addNodeProperty|addFeature|configureSplit|configureParams|train|predict.mutate.
  • Introduced support for exporting additional node properties, including strings, from the underlying database.
    • Added the additionalNodeProperties parameter to gds.graph.export.
    • Added the additionalNodeProperties parameter to gds.graph.export.csv.
  • Introduced experimental support for querying the in-memory graph with Cypher.
    • Added gds.alpha.create.cypherdb to allow Neo4j to recognize the in-memory graph as a database for Cypher queries.
  • To help handle multiple concurrent users, we’ve added a system monitoring procedure, gds.alpha.systemMonitor, which provides an overview of the system's workload and available resources.
  • Progress logging is now turned on by default and no longer requires changing your configuration settings. Progress can be accessed with gds.beta.listProgress.
  • GraphSAGE now supports deterministic results with the randomSeed configuration parameter to gds.beta.graphSage.train.
  • Improved performance (up to 20x speedup) of weakly connected components, gds.wcc, for undirected graphs by applying a subgraph sampling optimization.

Bug fixes

  • Fixed a bug regarding weighted graphs with multiple relationship types, which affected gds.beta.graphSage and gds.alpha.spanningTree.
  • Supervised Machine Learning (Node Classification & Link Prediction):
    • Fixed a NaN issue in NodeClassification where computations with very small probability values could cause the result to flip to infinity.
    • Fixed a bug in seeded NodeClassification and LinkPrediction which led to non-deterministic behaviour.
    • Corrected the training size used in gds.alpha.ml.linkPrediction.train. This affects the penalty parameter used in logistic regression.
  • Progress Logging:
    • Fixed a bug in beta progress event tracking where progress events would not be released if computation was abandoned before completion.
    • Fixed a bug in beta progress event tracking for Pregel algorithms where progress events would not be released on algorithm completion.
  • Node Similarity & KNN:
    • Fixed a bug where KNN and NodeSimilarity could write out of bounds on a node-filtered graph with multiple relationship types.
    • Fixed a bug which affected gds.nodeSimilarity.write and gds.alpha.knn.write when being executed in combination with a nodeLabels filter. The bug either led to an exception or to wrong results due to an incorrect mapping between internal and Neo4j node ids.
    • Fixed a bug where gds.nodeSimilarity.[write|mutate] and gds.beta.knn.[write|mutate] wrote duplicate relationships if the input graph was undirected.
  • KNN:
    • Fixed a bug in gds.beta.knn where negative values in node properties of type float arrays failed when returning the similarityDistribution.
  • Fast RP:
    • FastRP stream mode explicitly returns a list of floats rather than a list of numbers. This agrees with the other embeddings, and saves users from having to cast/transform when processing the results further in Cypher.
  • GraphSAGE:
    • Fixed a bug in weighted GraphSAGE where the relationshipWeightProperty was not loaded.
    • Fixed a bug in gds.beta.graphSage, where the concurrency parameter was not considered.
  • Graph Operations:
    • Fixed a bug in gds.graph.removeNodeProperties where removedPropertiesWritten was too large for properties shared across multiple labels.
    • Fixed a bug in gds.beta.graph.generate, where random graphs with relationship properties could not be generated.
    • Fixed a bug in gds.create.subgraph which could lead to undefined behaviour or an ArrayIndexOutOfBoundsException when executed on GDS Enterprise Edition.
    • Fixed a bug in gds.graph.create, where default values for array properties would throw for convertible types.

Improvements

  • Pathfinding: Added existence checks for sourceNode and targetNode to all shortest path procedures in the product tier.
  • Improved runtime of gds.fastRP via better workload balancing between threads.
  • Lower memory footprint for LinkPrediction and NodeClassification.
  • Improved the procedure output of gds.beta.listProgress.
  • Scale down scores computed by gds.articleRank.