Added Stochastic Variability in Community Detection Algorithms #820
base: main
This example demonstrates the variability of stochastic community detection methods by analyzing the consistency of multiple partitions using similarity measures (NMI, VI, RI) on both random and structured graphs.
I won't have time to look in detail today, but I checked whether the docs build with this change, and unfortunately they do not. Can you please check if you can fix this? You can build the docs using
Issue Description: AttributeError: module 'igraph' has no attribute '_igraph'. Did you mean: 'Graph'?
I think I've fixed the build issue in the
Thank you! It worked. Screen.Recording.2025-03-27.at.9.33.28.PM.mov
This is a nice illustration!
I left a few comments for improvement.
Please remove changes to the sg_execution_times.rst file. This file was likely committed by accident, and I think we should remove it (but not as part of this PR).
Stochastic Variability in Community Detection Algorithms
=========================================================
This example demonstrates the variability of stochastic community detection methods by analyzing the consistency of multiple partitions using similarity measures (NMI, VI, RI) on both random and structured graphs.
Please spell out the names of similarity measures. If you like, you can add the abbreviations in parentheses.
# %%
# Import Libraries
import igraph as ig
import numpy as np
Do not import libraries you don't use.
"""
# %%
# Import Libraries
Please do not capitalize words without good reason.
Suggested change:
- # Import Libraries
+ # Import libraries:
# First, we generate a graph.
# Generates a random Erdos-Renyi graph (no clear community structure)
def generate_random_graph(n, p):
    return ig.Graph.Erdos_Renyi(n=n, p=p)
Do not omit diacritics. It is Erdős-Rényi.
For clarity, do indicate that it is an Erdős-Rényi
Do we really need to define new functions to generate these graphs? This function just wraps Graph.Erdos_Renyi.
# Generates a clustered graph with clear communities using the Stochastic Block Model (SBM)
def generate_clustered_graph(n, clusters, intra_p, inter_p):
    block_sizes = [n // clusters] * clusters
    prob_matrix = [[intra_p if i == j else inter_p for j in range(clusters)] for i in range(clusters)]
    return ig.Graph.SBM(sum(block_sizes), prob_matrix, block_sizes)
Could we simplify the code while also making it more illustrative by using empirical network data here?
You could try the karate club network, the Les Misérables network (already available in the same directory) or perhaps the famous Jazz musicians network. See which one gives a nicer result.
For the random graph, let's use one that has the same vertex count and density as the empirical one. Measure the density and pass it as the
# %%
# Stochastic Community Detection
# Runs Louvain's method iteratively to generate partitions
This is called the Louvain method, not Louvain's method.
Can you include a short explanation of why the result is different on each run? This is often a point of confusion for empirical researchers who are inexperienced in data analysis.
This is a modularity maximization method. Since the exact maximization of modularity is NP-hard, the Louvain method uses a greedy heuristic, processing vertices in a random order.
for i, (random_scores, clustered_scores, measure) in enumerate(measures):
    axes[i][0].hist(random_scores, bins=20, alpha=0.7, color=colors[i], edgecolor="black")
    axes[i][0].set_title(f"Histogram of {measure} - Random Graph")
    axes[i][0].set_xlabel(f"{measure} Score")
    axes[i][0].set_ylabel("Frequency")

    axes[i][1].hist(clustered_scores, bins=20, alpha=0.7, color=colors[i], edgecolor="black")
    axes[i][1].set_title(f"Histogram of {measure} - Clustered Graph")
    axes[i][1].set_xlabel(f"{measure} Score")
Could you please plot the probability density instead of counts? While it doesn't make a difference here, it is generally good practice, and it becomes relevant when comparing datasets of different sizes.
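A small sketch of what plotting the density could look like, using Matplotlib's `density=True` option of `Axes.hist`. The scores here are synthetic placeholder values generated only for illustration; they are not results from this PR.

```python
import random

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Placeholder scores in [0, 1]; NOT actual similarity scores from the example.
random.seed(0)
scores = [random.betavariate(8, 2) for _ in range(200)]

fig, ax = plt.subplots()
# density=True rescales bar heights so that the total bar area equals 1.
heights, edges, _ = ax.hist(scores, bins=20, density=True, alpha=0.7, edgecolor="black")
ax.set_xlabel("Normalized mutual information")
ax.set_ylabel("Probability density")

# Sanity check: sum of (height * bin width) over all bins.
area = sum(h * (right - left) for h, left, right in zip(heights, edges[:-1], edges[1:]))
print(f"total area under the histogram: {area:.6f}")
```

Because the bar areas sum to 1 regardless of sample size, histograms built from different numbers of runs stay directly comparable.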
Also, please adjust the NMI and RI histograms to span the range
# %%
# The results are plotted as histograms for random vs. clustered graphs, highlighting differences in detected community structures.
#The key reason for the inconsistency in random graphs and higher consistency in structured graphs is due to community structure strength:
#Random Graphs: Lack clear communities, leading to unstable partitions. Stochastic algorithms detect different structures across runs, resulting in low NMI, high VI, and inconsistent RI.
#Structured Graphs: Have well-defined communities, so detected partitions are more stable across multiple runs, leading to high NMI, low VI, and stable RI.
Could you please be explicit about the range and interpretation of the three measures? NMI and RI are in
I have made the changes as per the review.
I have made the suggested changes. Screen.Recording.2025-03-28.at.3.38.52.PM.mov
Should I delete this file from the PR directly through the "Files changed" section?
Yes, that would be fine.
Used ChatGPT to understand the functions and the theory behind the mathematical process.
At first run:

At second run:

This supports the point that a stochastic method may give wildly different answers on a network without any significant community structure, while giving consistent answers on one that has obvious communities.
Since the number of iterations is set to 50, even in structured graphs, slight variations in similarity scores may be observed.