Commit
Agglomerative hypergraph clustering algorithm
1 parent e407a14, commit db8544f
Showing 33 changed files with 3,591 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,134 @@ | ||
# Hypergraph Clustering | ||
|
||
## Description | ||
The library provides tools for agglomerative clustering of hypergraphs. It includes methods for converting hypergraphs into incidence and adjacency matrices, automatic selection of the number of clusters, and metrics for evaluating clustering quality. | ||
|
||
Hypergraph clustering is useful in data analysis tasks such as: | ||
- Social networks | ||
- Transportation systems | ||
- Biological networks | ||
- Text and document analysis | ||
|
||
The library relies on `scikit-learn` and `scipy` for its computations. | ||
|
||
--- | ||
|
||
## Installation | ||
1. Install the project and its dependencies in editable mode: | ||
```bash | ||
pip install -e . | ||
``` | ||
|
||
--- | ||
|
||
## Usage | ||
|
||
### Example 1: Clustering a hypergraph with a fixed number of clusters | ||
```python | ||
from hypergraph_clustering.utils.graph_conversion import hypergraph_to_incidence_matrix, incidence_to_adjacency | ||
from hypergraph_clustering.clustering.agglomerative import AgglomerativeHypergraphClustering | ||
|
||
# Example hypergraph: each inner list is one hyperedge | ||
hyperedges = [[0, 1, 2], [1, 2, 3], [3, 4]] | ||
|
||
# Convert the hypergraph to incidence and adjacency matrices | ||
incidence_matrix = hypergraph_to_incidence_matrix(hyperedges) | ||
adjacency_matrix = incidence_to_adjacency(incidence_matrix) | ||
|
||
# Cluster the nodes into two groups | ||
clustering = AgglomerativeHypergraphClustering(n_clusters=2) | ||
labels = clustering.fit(adjacency_matrix) | ||
|
||
print("Clusters:", labels) | ||
``` | ||
|
||
### Example 2: Automatic selection of the number of clusters | ||
```python | ||
from hypergraph_clustering.clustering.auto_clustering import AutoClusterHypergraphClustering | ||
|
||
# adjacency_matrix is the matrix built in Example 1 | ||
clustering = AutoClusterHypergraphClustering(linkage="average", max_clusters=5, scoring="silhouette") | ||
labels = clustering.fit(adjacency_matrix) | ||
|
||
print("Clusters:", labels) | ||
print("Best number of clusters:", clustering.best_n_clusters) | ||
print("Score:", clustering.best_score) | ||
``` | ||
|
||
--- | ||
|
||
## Theoretical background | ||
|
||
A hypergraph generalizes a graph: a single hyperedge may connect any number of nodes. Example hypergraph: | ||
- Nodes: {0, 1, 2, 3, 4} | ||
- Hyperedges: {{0, 1, 2}, {1, 2, 3}, {3, 4}} | ||
|
||
#### Hypergraph transformations | ||
Before clustering algorithms can be applied, a hypergraph is converted into matrix form: | ||
1. **Incidence matrix**: records which nodes belong to which hyperedges. | ||
2. **Adjacency matrix**: computed from the incidence matrix; records the connections between nodes. | ||
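
The two transformations can be sketched with NumPy as follows. This is a minimal illustration, not the library's implementation: the function names `incidence_matrix` and `adjacency_from_incidence` are hypothetical, and the adjacency is assumed to be the clique expansion, where entry (u, v) counts the hyperedges that contain both nodes.

```python
import numpy as np


def incidence_matrix(hyperedges, num_nodes=None):
    """Return H with shape (num_nodes, num_hyperedges); H[v, e] = 1 if node v is in hyperedge e."""
    if num_nodes is None:
        num_nodes = max(max(edge) for edge in hyperedges) + 1
    H = np.zeros((num_nodes, len(hyperedges)), dtype=int)
    for e, edge in enumerate(hyperedges):
        H[list(edge), e] = 1
    return H


def adjacency_from_incidence(H):
    """Assumed clique expansion: A[u, v] = number of hyperedges shared by u and v."""
    A = H @ H.T
    np.fill_diagonal(A, 0)  # a node is not considered adjacent to itself
    return A


hyperedges = [[0, 1, 2], [1, 2, 3], [3, 4]]
H = incidence_matrix(hyperedges)
A = adjacency_from_incidence(H)
print(H)
print(A)
```

For the example hypergraph, nodes 1 and 2 share two hyperedges, so their adjacency entry is 2, while nodes 0 and 3 share none.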
|
||
--- | ||
|
||
### Clustering algorithms | ||
|
||
#### Agglomerative clustering | ||
A bottom-up method that starts with every node as its own cluster and iteratively merges clusters according to the chosen linkage method (`ward`, `complete`, `average`, `single`). | ||
|
||
- **Ward**: merges the pair of clusters that gives the smallest increase in within-cluster variance. | ||
- **Complete**: uses the maximum distance between nodes of the two clusters. | ||
- **Average**: uses the average distance between nodes of the two clusters. | ||
- **Single**: uses the minimum distance between nodes of the two clusters. | ||
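
To see how the linkage choice affects the result, the sketch below runs SciPy's hierarchical clustering on the example hypergraph. It is an illustration under assumptions, not the library's code: the adjacency values are the shared-hyperedge counts of the example, and the conversion from similarity to distance, `1 / (1 + similarity)`, is an arbitrary choice made here. `ward` is left out because it assumes Euclidean distances between feature vectors.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Shared-hyperedge counts for hyperedges {0,1,2}, {1,2,3}, {3,4}.
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 2, 1, 0],
    [1, 2, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Assumption: turn similarity into distance; more shared hyperedges means closer.
D = 1.0 / (1.0 + A)
np.fill_diagonal(D, 0.0)
condensed = squareform(D)  # condensed form expected by linkage()

results = {}
for method in ("single", "complete", "average"):
    Z = linkage(condensed, method=method)
    # Ask for at most two flat clusters.
    results[method] = fcluster(Z, t=2, criterion="maxclust")
    print(method, results[method])
```

Note that `maxclust` guarantees at most the requested number of clusters. When several merges happen at the same height, as can occur with the small integer counts used here, the cut may return fewer clusters than requested.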
|
||
#### Automatic selection of the number of clusters | ||
The number of clusters is chosen automatically using one of the following metrics: | ||
- **Silhouette**: compares how close each point is to its own cluster with how close it is to the nearest other cluster (ranges from -1 to 1, higher is better). | ||
- **Calinski-Harabasz**: ratio of between-cluster to within-cluster dispersion (higher is better). | ||
- **Davies-Bouldin**: average similarity of each cluster to its most similar cluster (lower is better). | ||
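
A selection loop based on the silhouette score might look like the sketch below. The internals of `AutoClusterHypergraphClustering` are not shown in this commit, so this is only an assumed approach: `select_n_clusters` is a hypothetical helper, the toy distance matrix is made up for illustration, and the clustering itself is done with SciPy's average linkage.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.metrics import silhouette_score


def select_n_clusters(D, max_clusters=5, method="average"):
    """Try k = 2..max_clusters and return (n_clusters, score, labels) with the best silhouette."""
    Z = linkage(squareform(D), method=method)
    best = None
    for k in range(2, min(max_clusters, len(D) - 1) + 1):
        labels = fcluster(Z, t=k, criterion="maxclust")
        n_found = len(set(labels.tolist()))
        if n_found < 2 or n_found >= len(D):
            continue  # the silhouette score is undefined for these cases
        score = silhouette_score(D, labels, metric="precomputed")
        if best is None or score > best[1]:
            best = (n_found, score, labels)
    return best


# Toy similarity: two tight groups {0,1,2} and {3,4,5} joined by one weak link (2, 3).
A = np.zeros((6, 6))
A[:3, :3] = 2
A[3:, 3:] = 2
A[2, 3] = A[3, 2] = 1
D = 1.0 / (1.0 + A)       # assumed similarity-to-distance conversion
np.fill_diagonal(D, 0.0)  # precomputed distances need a zero diagonal

best_k, best_score, best_labels = select_n_clusters(D)
print(best_k, round(best_score, 3), best_labels)
```

Swapping in another metric would change only the scoring line and, for Davies-Bouldin, the direction of the comparison, since lower values are better there.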
|
||
--- | ||
|
||
## Project structure | ||
|
||
``` | ||
hypergraph_clustering/ | ||
├── clustering/ | ||
│ ├── agglomerative.py # Agglomerative clustering class | ||
│ ├── auto_clustering.py # Class for automatic selection of the number of clusters | ||
│ └── __init__.py | ||
├── metrics/ | ||
│ ├── evaluation.py # Clustering evaluation metrics | ||
│ └── __init__.py | ||
├── tests/ | ||
│ ├── test_agglomerative.py # Tests for agglomerative clustering | ||
│ ├── test_auto_clustering.py # Tests for automatic cluster selection | ||
│ ├── test_graph_conversion.py # Tests for hypergraph conversion | ||
│ └── __init__.py | ||
├── utils/ | ||
│ ├── graph_conversion.py # Utilities for working with hypergraphs | ||
│ ├── examples.py # Example hypergraphs | ||
│ └── __init__.py | ||
├── README.md # Documentation | ||
├── setup.py # Installation script | ||
├── requirements.txt # Dependencies | ||
└── data/ # Example hypergraphs in JSON format | ||
    ├── social_network.json | ||
    ├── transport_network.json | ||
    └── biological_network.json | ||
``` | ||
|
||
--- | ||
|
||
## Testing the project | ||
Tests are available to verify that the project works. Run them with: | ||
```bash | ||
pytest hypergraph_clustering/tests/ | ||
``` | ||
|
||
--- | ||
|
||
## Sources | ||
- [Scikit-learn Documentation](https://scikit-learn.org/stable/) | ||
- [Silhouette Coefficient](https://en.wikipedia.org/wiki/Silhouette_(clustering)) | ||
- [Agglomerative Clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering) | ||
|
||
--- |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
{ | ||
"hyperedges": [ | ||
[ | ||
0, | ||
1, | ||
2, | ||
3, | ||
4 | ||
], | ||
[ | ||
2, | ||
4, | ||
5, | ||
6 | ||
], | ||
[ | ||
6, | ||
7, | ||
8, | ||
9, | ||
10 | ||
], | ||
[ | ||
3, | ||
8, | ||
11, | ||
12 | ||
], | ||
[ | ||
1, | ||
9, | ||
10, | ||
13, | ||
14 | ||
], | ||
[ | ||
5, | ||
7, | ||
12, | ||
13 | ||
] | ||
], | ||
"num_nodes": 15 | ||
} |
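
The JSON data files in this commit share one shape: a `hyperedges` list of node-index lists plus a `num_nodes` count. A loader for that shape could be sketched as below. `load_hypergraph` is a hypothetical helper, not a function from the repository, and the inline `sample` string is illustrative rather than one of the committed files.

```python
import json


def load_hypergraph(text):
    """Parse JSON with "hyperedges" and "num_nodes" keys and check that node indices are in range."""
    data = json.loads(text)
    hyperedges = [list(edge) for edge in data["hyperedges"]]
    num_nodes = int(data["num_nodes"])
    for edge in hyperedges:
        for node in edge:
            if not 0 <= node < num_nodes:
                raise ValueError(f"node {node} is outside 0..{num_nodes - 1}")
    return hyperedges, num_nodes


# Illustrative input in the same format as the data files.
sample = '{"hyperedges": [[0, 1, 2], [2, 3, 4, 5], [5, 6, 7]], "num_nodes": 8}'
hyperedges, num_nodes = load_hypergraph(sample)
print(len(hyperedges), num_nodes)
```

The check only enforces an upper bound, so `num_nodes` may be larger than the highest index used, which leaves room for isolated nodes.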
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
{ | ||
"hyperedges": [ | ||
[ | ||
0, | ||
1, | ||
2 | ||
], | ||
[ | ||
2, | ||
3, | ||
4, | ||
5 | ||
], | ||
[ | ||
5, | ||
6, | ||
7 | ||
], | ||
[ | ||
3, | ||
6, | ||
8, | ||
9 | ||
], | ||
[ | ||
8, | ||
9, | ||
10 | ||
], | ||
[ | ||
1, | ||
7, | ||
9, | ||
10 | ||
] | ||
], | ||
"num_nodes": 11 | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
{ | ||
"hyperedges": [ | ||
[ | ||
0, | ||
1, | ||
2 | ||
], | ||
[ | ||
2, | ||
3, | ||
4, | ||
5 | ||
], | ||
[ | ||
5, | ||
6, | ||
7 | ||
], | ||
[ | ||
3, | ||
6, | ||
8, | ||
9 | ||
], | ||
[ | ||
8, | ||
9, | ||
10, | ||
11 | ||
], | ||
[ | ||
1, | ||
4, | ||
7, | ||
10 | ||
] | ||
], | ||
"num_nodes": 12 | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
{ | ||
"hyperedges": [ | ||
[ | ||
0, | ||
1, | ||
2 | ||
], | ||
[ | ||
2, | ||
3, | ||
4 | ||
], | ||
[ | ||
4, | ||
5, | ||
6 | ||
], | ||
[ | ||
6, | ||
7, | ||
8 | ||
], | ||
[ | ||
1, | ||
4, | ||
7 | ||
], | ||
[ | ||
0, | ||
3, | ||
6 | ||
] | ||
], | ||
"num_nodes": 9 | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
{ | ||
"hyperedges": [ | ||
[0, 1, 2], | ||
[2, 3, 4, 5], | ||
[5, 6, 7], | ||
[7, 8, 9, 10], | ||
[10, 11, 12], | ||
[1, 3, 8, 12] | ||
], | ||
"num_nodes": 13 | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
{ | ||
"hyperedges": [ | ||
[ | ||
0, | ||
1, | ||
2 | ||
], | ||
[ | ||
2, | ||
3, | ||
4 | ||
], | ||
[ | ||
4, | ||
5, | ||
6 | ||
], | ||
[ | ||
6, | ||
7, | ||
8 | ||
], | ||
[ | ||
1, | ||
4, | ||
7 | ||
], | ||
[ | ||
0, | ||
3, | ||
6 | ||
] | ||
], | ||
"num_nodes": 9 | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
{ | ||
"hyperedges": [ | ||
[ | ||
0, | ||
1, | ||
2, | ||
3 | ||
], | ||
[ | ||
3, | ||
4, | ||
5, | ||
6 | ||
], | ||
[ | ||
6, | ||
7, | ||
8 | ||
], | ||
[ | ||
1, | ||
4, | ||
7 | ||
], | ||
[ | ||
2, | ||
5, | ||
8, | ||
9, | ||
10 | ||
], | ||
[ | ||
0, | ||
3, | ||
7, | ||
10 | ||
] | ||
], | ||
"num_nodes": 14 | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
{ | ||
"hypergraph_1": [[0, 1, 2], [1, 2, 3], [3, 4]], | ||
"hypergraph_2": [[0, 1], [2, 3], [4]], | ||
"hypergraph_3": [[0, 1, 2, 3]] | ||
} |