NetworkXum

NetworkXum is a NetworkX-like interface for large persistent graphs stored inside a DBMS. This lets you scale from megabyte- and gigabyte-sized graphs to terabyte- and petabyte-sized graphs that won't fit into RAM, without changing your code. We provide wrappers for the following DBs:

MongoDB - modern (yet mature) distributed document DB with good performance in most workloads,
Neo4J - disturbingly slow and unstable DBMS, positioned as the go-to graph database,
SQLite3 - ubiquitous, compact relational DBMS with basic SQL support,
PostgreSQL - the most feature-rich open-source relational DB,
MySQL - the most commonly used relational DB.
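To make the idea of a NetworkX-like interface over a DBMS more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The class name `SQLiteGraph`, the `edges` table layout and the method set are hypothetical illustrations, not NetworkXum's actual API. The sketch only covers directed edges with a weight, not node attributes or bulk imports.

```python
import sqlite3


class SQLiteGraph:
    """Hypothetical NetworkX-style directed graph whose edges live in SQLite.

    Illustration only: this is not NetworkXum's real API.
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        # The composite primary key doubles as the index for successor lookups.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS edges ("
            " source INTEGER NOT NULL,"
            " target INTEGER NOT NULL,"
            " weight REAL NOT NULL DEFAULT 1.0,"
            " PRIMARY KEY (source, target))"
        )
        # Secondary index so predecessor lookups avoid a full table scan.
        self.conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_edges_target ON edges (target)"
        )
        self.conn.commit()

    def add_edge(self, u: int, v: int, weight: float = 1.0) -> None:
        # Re-adding an existing edge overwrites its weight, as in NetworkX.
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO edges (source, target, weight) VALUES (?, ?, ?)",
                (u, v, weight),
            )

    def has_edge(self, u: int, v: int) -> bool:
        row = self.conn.execute(
            "SELECT 1 FROM edges WHERE source = ? AND target = ?", (u, v)
        ).fetchone()
        return row is not None

    def successors(self, u: int) -> list[int]:
        rows = self.conn.execute(
            "SELECT target FROM edges WHERE source = ? ORDER BY target", (u,)
        )
        return [v for (v,) in rows]

    def predecessors(self, v: int) -> list[int]:
        rows = self.conn.execute(
            "SELECT source FROM edges WHERE target = ? ORDER BY source", (v,)
        )
        return [u for (u,) in rows]

    def number_of_edges(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]


if __name__ == "__main__":
    g = SQLiteGraph()  # pass a file path instead to persist the graph on disk
    g.add_edge(1, 2)
    g.add_edge(1, 3, weight=0.5)
    g.add_edge(2, 3)
    print(g.successors(1))      # [2, 3]
    print(g.predecessors(3))    # [1, 2]
    print(g.number_of_edges())  # 3
```

Because the edges live in the database rather than in Python objects, switching from `":memory:"` to a file path keeps the graph across runs while the calling code stays the same. Unlike NetworkX, `successors` and `predecessors` here return lists rather than iterators, purely to keep the sketch simple.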

Project Structure

networkxum - Python wrappers for Graph (Network) data structures backed by persistent DBs.
benchmarks - benchmarking tools and performance results.
assets - tiny datasets for testing purposes.
regexum - Python wrappers for searchable containers backed by persistent DBs.
benchmarks - benchmarking tools and performance results.
assets - tiny datasets for testing purposes.

Implementation Details & Included DBs

Some common databases have licences that prohibit sharing of benchmark results, so they were excluded from comparisons.

Name	Purpose	Implementation Language	Lines of Code (in `/src/`)
MongoDB	Documents	C++	3'900'000
PostgreSQL	Tables	C	1'300'000
Neo4J	Graphs	Java	800'000
ElasticSearch	Text	Java	730'000
Unum	Graphs, Tables, Text	C++	80'000

MongoDB

A distributed ACID document store.
Internally uses the BSON binary format.
Very popular open-source project backed by MongoDB, Inc. ($MDB), a publicly traded company.
Provides bindings for most programming languages (including PyMongo for Python).

ElasticSearch

Java-based document store built on top of the Lucene text index.
Widely considered a high-performance solution, largely due to the lack of competition.
Lucene has been ported to multiple languages, with projects such as CLucene and LucenePlusPlus.
Very popular open-source project backed by Elastic N.V. ($ESTC), a publicly traded company.
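Lucene's central data structure is an inverted index, which maps each term to the set of documents that contain it (a postings list). The toy Python sketch below shows only that idea; `tokenize` and `InvertedIndex` are hypothetical names, and real Lucene adds configurable text analysis, compressed postings, relevance scoring and segment merging, none of which is modeled here.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive analyzer: lowercase, split on non-word characters, drop empties.
    return [t for t in re.split(r"\W+", text.lower()) if t]


class InvertedIndex:
    """Toy inverted index: term -> set of document ids (hypothetical helper)."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search_all(self, query: str) -> list[int]:
        """Ids of documents containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get() avoids inserting empty postings lists for unknown terms.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add(1, "Graphs stored in a DBMS")
    idx.add(2, "Text search with Lucene")
    idx.add(3, "Graph and text search")
    print(idx.search_all("text search"))  # [2, 3]
    print(idx.search_all("graph"))        # [3]
```

Note that `search_all("graph")` does not match document 1, which contains "graphs": without stemming, different word forms are different terms, which is one of the things a real analyzer takes care of.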

SQLite3

Embedded, single-file, tabular SQL database with extremely wide adoption.
Provides a more direct C API in addition to the SQL interface.
We use SQLAlchemy for Object-Relational Mapping (ORM); it is by far the most common Python ORM tool.
We override the default page size, Write-Ahead Log (WAL) journaling and concurrency settings for higher performance.
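The exact values used by the project are not listed here, so the sketch below only shows how such settings can be applied with Python's standard `sqlite3` module. The function name `open_tuned_sqlite`, the 30-second lock timeout, the page size and the `synchronous = NORMAL` choice are all assumptions for illustration. When going through SQLAlchemy, the same PRAGMAs would have to be issued on every new connection (for example from a connection event hook), which is not shown.

```python
import os
import sqlite3
import tempfile


def open_tuned_sqlite(path: str, page_size: int = 4096) -> sqlite3.Connection:
    """Open an SQLite file with illustrative performance settings.

    The specific values are assumptions, not NetworkXum's actual configuration.
    """
    conn = sqlite3.connect(
        path,
        timeout=30.0,             # wait up to 30 s on a locked DB instead of failing at once
        check_same_thread=False,  # allow use from other threads (caller must serialize writes)
    )
    # page_size only takes effect while the database is still empty
    # (or on VACUUM), and cannot be changed once the DB is in WAL mode.
    conn.execute(f"PRAGMA page_size = {int(page_size)}")
    # Write-Ahead Logging lets readers proceed concurrently with a single writer.
    mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL, got {mode!r}")
    # In WAL mode, NORMAL avoids an fsync on every commit; the DB stays
    # consistent, but the most recent commits may roll back after a power loss.
    conn.execute("PRAGMA synchronous = NORMAL")
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_tuned_sqlite(os.path.join(tmp, "graph.db"), page_size=8192)
        print(conn.execute("PRAGMA journal_mode").fetchone()[0])  # wal
        conn.close()
```

WAL mode is persistent and needs a real file (an in-memory database reports `memory` instead), which is why the usage example opens a temporary file.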

PostgreSQL, MySQL and other SQL databases

The most common open-source SQL databases.
They work well in a single-node environment, but scale poorly out of the box.
They mostly store search indexes as B-Trees, which generally provide good read performance but are slow to update.
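SQLite also keeps its indexes as B-Trees, so it can stand in for PostgreSQL and MySQL in a quick, self-contained illustration of this trade-off. The table and index names and the helper functions below are hypothetical. Because the exact wording of `EXPLAIN QUERY PLAN` output differs between SQLite versions, the sketch does not promise a specific string; it only distinguishes an index search from a full scan.

```python
import sqlite3


def make_edges_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (source INTEGER, target INTEGER, weight REAL)")
    # Each index is a separate B-Tree that must also be updated on every
    # INSERT/UPDATE/DELETE, which is why heavily indexed tables write slowly.
    conn.execute("CREATE INDEX idx_edges_source ON edges (source)")
    conn.execute("CREATE INDEX idx_edges_target ON edges (target)")
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?, ?)",
        [(i, (i * 7) % 100, i / 100) for i in range(100)],
    )
    conn.commit()
    return conn


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


if __name__ == "__main__":
    conn = make_edges_db()
    # Equality on an indexed column: a B-Tree search (logarithmic).
    print(query_plan(conn, "SELECT target FROM edges WHERE source = ?", (42,)))
    # Filter on an unindexed column: a full table scan (linear).
    print(query_plan(conn, "SELECT source FROM edges WHERE weight > ?", (0.5,)))
```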

Neo4J

The best-known graph database, with over 10 years of history.
Instead of SQL, it provides the Cypher query language (DSL), whose queries are transmitted over the Bolt protocol.
Some of the essential indexing capabilities are not available in the free version.
There are some compatibility issues between API versions 3.5 and 4.
In our experience, Neo4J is extremely unstable and doesn't scale beyond tiny datasets. It generally crashes due to Java VM heap-management issues.

TODO

Benchmarks on small & mid-sized graphs.
Session management in SQL and Neo4J.
Duration constraints for benchmarks.
Mixed Multithreaded Read/Write benchmarks.

LucyMaber/NetworkXum

Folders and files

Latest commit

History

Repository files navigation

NetworkXum

Project Structure

Implementation Details & Included DBs

MongoDB

ElasticSearch

SQLite3

Postgre, MySQL and other SQLs

Neo4J

TODO

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages