<p>
<img src="https://github.com/Mstrutov/Desbordante/assets/88928096/d687809b-5a3b-420e-a192-a1a2b6697b2a"/>
</p>

---

# desbordante: high-performance data profiler

## What is it?

**Desbordante** is a high-performance data profiler oriented towards exploratory data analysis.

Try it online: https://desbordante.unidata-platform.ru/

## Table of Contents

- [Main Features](#main-features)
- [Usage Example](#usage-example)
- [Installation](#installation)
- [Installation from sources](#installation-from-sources)
- [Cite](#cite)

## Main Features

**desbordante** can discover and validate a range of data patterns, such as:

1. Functional dependencies, both exact and approximate (discovery and validation)
2. Metric functional dependencies (validation)
3. Fuzzy algebraic constraints (discovery)
4. Association rules (discovery)
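
For readers new to the terminology, the following plain-Python sketch illustrates what validating an exact functional dependency means. It does not call desbordante; the helper `fd_holds` and the toy table are hypothetical. An FD `X -> Y` holds when every two rows that agree on the columns in `X` also agree on column `Y`.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Sequence[Hashable]


def fd_holds(rows: List[Row], lhs: Tuple[int, ...], rhs: int) -> bool:
    """Check whether the exact FD lhs -> rhs holds (columns are 0-based indices)."""
    seen: Dict[Tuple[Hashable, ...], Hashable] = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)  # values of the LHS columns
        value = row[rhs]
        if key in seen and seen[key] != value:
            return False  # the same LHS values map to two different RHS values
        seen.setdefault(key, value)
    return True


# Hypothetical toy table: (name, city, country)
rows = [
    ("Alice", "Paris", "France"),
    ("Bob", "Paris", "France"),
    ("Carol", "Lyon", "France"),
    ("Dave", "Berlin", "Germany"),
]

print(fd_holds(rows, (1,), 2))  # city -> country: True
print(fd_holds(rows, (2,), 1))  # country -> city: False (France -> Paris and Lyon)
```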

This package is built on the original Desbordante platform, which is written in C++.
As a result, depending on the algorithm and dataset, it may be **2-10 times faster** than alternative tools.

## Usage Example

Discover approximate functional dependencies with different error thresholds:

```python-repl
>>> import desbordante
>>> pyro = desbordante.Pyro()
>>> pyro.load_data('iris.csv', ',', False)  # path, separator, has_header
>>> pyro.execute(error=0.0)
>>> pyro.get_fds()
[( 0 1 2 ) -> 4, ( 0 2 3 ) -> 4, ( 0 1 3 ) -> 4, ( 1 2 3 ) -> 4]
>>> pyro.execute(error=0.1)
>>> pyro.get_fds()
[( 2 ) -> 0, ( 2 ) -> 1, ( 0 ) -> 2, ( 2 ) -> 4, ( 2 ) -> 3, ( 3 ) -> 2, ( 3 ) -> 0, ( 0 ) -> 1, ( 0 ) -> 3, ( 1 ) -> 0, ( 1 ) -> 2, ( 3 ) -> 4, ( 3 ) -> 1, ( 1 ) -> 3, ( 0 ) -> 4, ( 1 ) -> 4]
>>> pyro.execute(error=0.2)
>>> pyro.get_fds()
[( 2 ) -> 1, ( 2 ) -> 0, ( 2 ) -> 4, ( 0 ) -> 2, ( 2 ) -> 3, ( 0 ) -> 1, ( 3 ) -> 4, ( 3 ) -> 2, ( 3 ) -> 1, ( 3 ) -> 0, ( 1 ) -> 2, ( 0 ) -> 3, ( 0 ) -> 4, ( 1 ) -> 0, ( 1 ) -> 4, ( 1 ) -> 3]
>>> pyro.execute(error=0.3)
>>> pyro.get_fds()
[( 2 ) -> 1, ( 0 ) -> 2, ( 2 ) -> 0, ( 3 ) -> 0, ( 2 ) -> 3, ( 1 ) -> 0, ( 2 ) -> 4, ( 3 ) -> 2, ( 0 ) -> 1, ( 1 ) -> 2, ( 3 ) -> 1, ( 3 ) -> 4, ( 0 ) -> 3, ( 4 ) -> 2, ( 4 ) -> 1, ( 0 ) -> 4, ( 1 ) -> 3, ( 1 ) -> 4, ( 4 ) -> 3]
```
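
The `error` parameter relaxes the exact definition, which is why more dependencies appear as the threshold grows. As a rough illustration, the plain-Python sketch below scores single-column candidates with the g3 measure (the fraction of rows that would have to be deleted for the FD to hold exactly). This is an assumption made for illustration: Pyro in desbordante may define its error differently and also searches multi-column left-hand sides, and the helpers `g3_error` and `single_column_afds` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import DefaultDict, Hashable, List, Sequence, Tuple

Row = Sequence[Hashable]


def g3_error(rows: List[Row], lhs: int, rhs: int) -> float:
    """Fraction of rows to delete so that lhs -> rhs holds exactly (the g3 measure)."""
    groups: DefaultDict[Hashable, Counter] = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1  # count RHS values per LHS value
    # Within each LHS group, keep the rows carrying the most frequent RHS value.
    removed = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return removed / len(rows)


def single_column_afds(rows: List[Row], threshold: float) -> List[Tuple[int, int]]:
    """All (lhs, rhs) column pairs whose g3 error does not exceed threshold."""
    n_cols = len(rows[0])
    return [
        (lhs, rhs)
        for lhs, rhs in permutations(range(n_cols), 2)
        if g3_error(rows, lhs, rhs) <= threshold
    ]


# Hypothetical toy table: (city, country, currency); the last row is deliberately dirty.
rows = [
    ("Paris", "France", "EUR"),
    ("Paris", "France", "EUR"),
    ("Lyon", "France", "EUR"),
    ("Berlin", "Germany", "EUR"),
    ("Berlin", "Germany", "DEM"),
]

print(single_column_afds(rows, 0.0))  # [(0, 1)]
print(single_column_afds(rows, 0.2))  # [(0, 1), (0, 2), (1, 0), (1, 2), (2, 1)]
```

Raising the threshold from 0.0 to 0.2 turns one exact dependency into five approximate ones, mirroring how the iris output above grows as `error` increases.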

More examples can be found in the [Desbordante repository](https://github.com/Mstrutov/Desbordante/tree/main/examples) on GitHub.

## Installation

The source code is currently hosted on GitHub at https://github.com/Mstrutov/Desbordante.

Wheels for the latest released version are available on the Python Package Index (PyPI).

**Currently, only manylinux2014 (Ubuntu 20.04+ or any other Linux distribution with gcc 10+) is supported.**

```bash
$ pip install desbordante
```

## Installation from sources

Install all dependencies listed in [README.md](https://github.com/Mstrutov/Desbordante/blob/main/README.md).

Then, in the desbordante directory (the same one that contains this file), execute:

```bash
$ ./build.sh
$ pip install .
```

## Cite

If you use this software for research, please cite one of our papers:

1) G. Chernishev et al., "Solving Data Quality Problems with Desbordante: a Demo," CoRR abs/2307.14935, 2023.
2) G. Chernishev et al., "Desbordante: from benchmarking suite to high-performance science-intensive data profiler (preprint)," CoRR abs/2301.05965, 2023.
3) M. Strutovskiy, N. Bobrov, K. Smirnov and G. Chernishev, "Desbordante: a Framework for Exploring Limits of Dependency Discovery Algorithms," 2021 29th Conference of Open Innovations Association (FRUCT), 2021, pp. 344-354, doi: 10.23919/FRUCT52173.2021.9435469.
4) A. Smirnov, A. Chizhov, I. Shchuckin, N. Bobrov and G. Chernishev, "Fast Discovery of Inclusion Dependencies with Desbordante," 2023 33rd Conference of Open Innovations Association (FRUCT), Zilina, Slovakia, 2023, pp. 264-275, doi: 10.23919/FRUCT58615.2023.10143047.