Add README for PyPI project page
toadharvard committed Nov 28, 2023
1 parent 9eb87ec commit 8cf95a7
Showing 2 changed files with 96 additions and 0 deletions.
95 changes: 95 additions & 0 deletions README_PYPI.md
@@ -0,0 +1,95 @@
<p>
<img src="https://github.com/Mstrutov/Desbordante/assets/88928096/d687809b-5a3b-420e-a192-a1a2b6697b2a"/>
</p>

---

# desbordante: high-performance data profiler

## What is it?

**Desbordante** is a high-performance data profiler oriented towards exploratory data analysis.

Try it online at https://desbordante.unidata-platform.ru/

## Table of Contents

- [Main Features](#main-features)
- [Usage Example](#usage-example)
- [Installation](#installation)
- [Installation from sources](#installation-from-sources)
- [Cite](#cite)

## Main Features

**desbordante** can discover and validate a range of data patterns, such as:

1. Functional dependencies, both exact and approximate (discovery and validation)
2. Metric functional dependencies (validation)
3. Fuzzy algebraic constraints (discovery)
4. Association rules (discovery)

This package provides Python bindings for the original Desbordante library, which is written in C++.
As a result, depending on the algorithm and dataset, it may be **2-10 times faster** than the alternatives.
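
To make the first item in the list above concrete, here is a minimal pure-Python sketch of what validating an exact functional dependency means: the columns on the left-hand side determine the column on the right-hand side if no two rows agree on the former but differ on the latter. The function name `fd_holds` and the sample rows are hypothetical and only serve as an illustration; this is not the algorithm used inside Desbordante.

```python
from typing import Hashable, Sequence


def fd_holds(rows: Sequence[Sequence[Hashable]], lhs: Sequence[int], rhs: int) -> bool:
    """Return True if the columns in `lhs` functionally determine column `rhs`.

    Hypothetical helper for illustration only; not part of the desbordante API.
    """
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        # setdefault stores the first right-hand value seen for this key;
        # any later row with the same key must carry the same value.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


# Made-up sample table with three columns (indices 0, 1, 2).
rows = [
    ("a", 1, "x"),
    ("a", 1, "x"),
    ("b", 2, "y"),
    ("b", 3, "y"),
]

print(fd_holds(rows, lhs=[0], rhs=2))  # True: column 0 determines column 2
print(fd_holds(rows, lhs=[0], rhs=1))  # False: "b" maps to both 2 and 3
```

Columns are referred to by index here, mirroring the `( 0 1 2 ) -> 4` notation in the output of the usage example.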

## Usage example

Discover approximate functional dependencies with different error thresholds:

```python-repl
>>> import desbordante
>>> pyro = desbordante.Pyro()
>>> pyro.load_data('iris.csv', ',', False)
>>> pyro.execute(error=0.0)
>>> pyro.get_fds()
[( 0 1 2 ) -> 4, ( 0 2 3 ) -> 4, ( 0 1 3 ) -> 4, ( 1 2 3 ) -> 4]
>>> pyro.execute(error=0.1)
>>> pyro.get_fds()
[( 2 ) -> 0, ( 2 ) -> 1, ( 0 ) -> 2, ( 2 ) -> 4, ( 2 ) -> 3, ( 3 ) -> 2, ( 3 ) -> 0, ( 0 ) -> 1, ( 0 ) -> 3, ( 1 ) -> 0, ( 1 ) -> 2, ( 3 ) -> 4, ( 3 ) -> 1, ( 1 ) -> 3, ( 0 ) -> 4, ( 1 ) -> 4]
>>> pyro.execute(error=0.2)
>>> pyro.get_fds()
[( 2 ) -> 1, ( 2 ) -> 0, ( 2 ) -> 4, ( 0 ) -> 2, ( 2 ) -> 3, ( 0 ) -> 1, ( 3 ) -> 4, ( 3 ) -> 2, ( 3 ) -> 1, ( 3 ) -> 0, ( 1 ) -> 2, ( 0 ) -> 3, ( 0 ) -> 4, ( 1 ) -> 0, ( 1 ) -> 4, ( 1 ) -> 3]
>>> pyro.execute(error=0.3)
>>> pyro.get_fds()
[( 2 ) -> 1, ( 0 ) -> 2, ( 2 ) -> 0, ( 3 ) -> 0, ( 2 ) -> 3, ( 1 ) -> 0, ( 2 ) -> 4, ( 3 ) -> 2, ( 0 ) -> 1, ( 1 ) -> 2, ( 3 ) -> 1, ( 3 ) -> 4, ( 0 ) -> 3, ( 4 ) -> 2, ( 4 ) -> 1, ( 0 ) -> 4, ( 1 ) -> 3, ( 1 ) -> 4, ( 4 ) -> 3]
```
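
The session above returns more dependencies as the `error` threshold grows. The sketch below illustrates why, using one common error measure for approximate functional dependencies: the fraction of rows that would have to be removed for the dependency to hold exactly. This is an assumption made for illustration; Pyro may define its error differently, and `fd_error` and the sample rows are hypothetical names and data, not part of the desbordante API.

```python
from collections import Counter, defaultdict


def fd_error(rows, lhs, rhs):
    """Fraction of rows to remove so that `lhs -> rhs` holds exactly.

    Illustrative measure only; not necessarily the one Pyro computes.
    """
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for row in rows:
        # Count right-hand values separately for each left-hand key.
        groups[tuple(row[i] for i in lhs)][row[rhs]] += 1
    # Within each group, keep only the most frequent right-hand value.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1 - kept / len(rows)


# Made-up table: three of the four rows are consistent with 0 -> 1.
rows = [("a", 1), ("a", 1), ("a", 2), ("b", 3)]
error = fd_error(rows, lhs=[0], rhs=1)
print(error)  # 0.25

for threshold in (0.0, 0.1, 0.3):
    # The dependency is reported only when its error fits the threshold.
    print(threshold, error <= threshold)
```

With this measure the dependency `( 0 ) -> 1` is rejected at thresholds 0.0 and 0.1 but accepted at 0.3, which is the same effect that makes the lists above grow as the threshold is relaxed.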

More examples can be found in the
[Desbordante repository](https://github.com/Mstrutov/Desbordante/tree/main/examples) on GitHub.

## Installation

The source code is currently hosted on [GitHub](https://github.com/Mstrutov/Desbordante).

Wheels for the latest released version are available at the Python Package Index (PyPI).

**Currently only manylinux2014 (Ubuntu 20.04+, or any other Linux distribution with GCC 10+) is supported.**

```bash
$ pip install desbordante
```

## Installation from sources

Install all dependencies listed in [README.md](https://github.com/Mstrutov/Desbordante/blob/main/README.md).

Then, in the desbordante directory (the same one that contains this file), execute:

```bash
$ ./build.sh
$ pip install .
```

## Cite

If you use this software for research, please cite one of our papers:

1) George Chernishev, et al. "Solving Data Quality Problems with Desbordante: a Demo". CoRR abs/2307.14935 (2023).
2) George Chernishev, et al. "Desbordante: from benchmarking suite to high-performance science-intensive data profiler (preprint)". CoRR abs/2301.05965 (2023).
3) M. Strutovskiy, N. Bobrov, K. Smirnov and G. Chernishev, "Desbordante: a Framework for Exploring Limits of Dependency Discovery Algorithms," 2021 29th Conference of Open Innovations Association (FRUCT), 2021, pp. 344-354, doi: 10.23919/FRUCT52173.2021.9435469.
4) A. Smirnov, A. Chizhov, I. Shchuckin, N. Bobrov and G. Chernishev, "Fast Discovery of Inclusion Dependencies with Desbordante," 2023 33rd Conference of Open Innovations Association (FRUCT), Zilina, Slovakia, 2023, pp. 264-275, doi: 10.23919/FRUCT58615.2023.10143047.
1 change: 1 addition & 0 deletions pyproject.toml
@@ -7,6 +7,7 @@ name = "desbordante"
version = "1.0.0"
description = "Python bindings for Desbordante, a science-intensive high-performance data profiler"
requires-python = ">=3.7"
readme = "README_PYPI.md"
license = { text = "AGPL-3.0-only" }

[tool.scikit-build.cmake.define]