Added Stochastic Variability in Community Detection Algorithms #820

Draft
wants to merge 6 commits into
base: main

@SKG24 SKG24 commented Mar 26, 2025

I used ChatGPT to understand the functions and the theory behind the mathematical process.

  • By submitting this pull request, I assign the copyright of my contribution to The igraph development team.

At first run:
Screenshot 2025-03-26 at 9 25 41 PM

At second run:
Screenshot 2025-03-26 at 9 27 55 PM

This supports the claim that a stochastic method may give wildly different answers on a network without any significant community structure, while giving consistent answers on one that has obvious communities.

Because the number of iterations is set to 50, slight variations in similarity scores may be observed even on structured graphs.

SKG24 added 2 commits March 26, 2025 21:17
This example demonstrates the variability of stochastic community detection methods by analyzing the consistency of multiple partitions using similarity measures (NMI, VI, RI) on both random and structured graphs.
@szhorvat
Member

I won't have time to look in detail today, but I checked whether the docs build with this change, and unfortunately they do not. Can you please check if you can fix this? You can build the docs using scripts/mkdoc.sh -c.

@szhorvat szhorvat marked this pull request as draft March 26, 2025 20:53
@SKG24
Author

SKG24 commented Mar 27, 2025

Screenshot 2025-03-27 at 9 56 39 AM

Issue Description:
The script mkdoc.sh -c fails with the following error:

AttributeError: module 'igraph' has no attribute '_igraph'. Did you mean: 'Graph'?
The issue persists even after:

  • Reinstalling python-igraph
  • Passing clustering objects instead of membership lists in compare_communities()
  • Ensuring the same code works in Google Colab

@ntamas
Member

ntamas commented Mar 27, 2025

I think I've fixed the build issue in the main branch; please try again and let me know if it still doesn't work.

@SKG24
Author

SKG24 commented Mar 27, 2025

Thank you! It worked.

Screen.Recording.2025-03-27.at.9.33.28.PM.mov

Member

@szhorvat szhorvat left a comment

This is a nice illustration!

I left a few comments for improvement.

Please remove changes to the sg_execution_times.rst file. This file was likely committed by accident, and I think we should remove it (but not as part of this PR).

Stochastic Variability in Community Detection Algorithms
=========================================================
This example demonstrates the variability of stochastic community detection methods by analyzing the consistency of multiple partitions using similarity measures (NMI, VI, RI) on both random and structured graphs.
Member

Please spell out the names of similarity measures. If you like, you can add the abbreviations in parentheses.

# %%
# Import Libraries
import igraph as ig
import numpy as np
Member

Do not import libraries you don't use.

"""
# %%
# Import Libraries
Member

Please do not capitalize words without good reason.

Suggested change
- # Import Libraries
+ # Import libraries:

Comment on lines 19 to 22
# First, we generate a graph.
# Generates a random Erdos-Renyi graph (no clear community structure)
def generate_random_graph(n, p):
    return ig.Graph.Erdos_Renyi(n=n, p=p)
Member

Do not omit diacritics. It is Erdős-Rényi.

For clarity, do indicate that it is an Erdős-Rényi $G(n,p)$ graph (i.e. not $G(n,m)$).

Do we really need to define new functions to generate these graphs? This function just wraps Graph.Erdos_Renyi.

Comment on lines 25 to 29
# Generates a clustered graph with clear communities using the Stochastic Block Model (SBM)
def generate_clustered_graph(n, clusters, intra_p, inter_p):
    block_sizes = [n // clusters] * clusters
    prob_matrix = [[intra_p if i == j else inter_p for j in range(clusters)] for i in range(clusters)]
    return ig.Graph.SBM(sum(block_sizes), prob_matrix, block_sizes)
Member

Could we simplify the code while also making it more illustrative by using empirical network data here?

You could try the karate club network, the Les Misérables network (already available in the same directory) or perhaps the famous Jazz musicians network. See which one gives a nicer result.

For the random graph, let's use one that has the same vertex count and density as the empirical one. Measure the density and pass it as the $p$ parameter of the $G(n,p)$ model. Alternatively, measure the edge count and pass it as the $m$ parameter of the $G(n,m)$ model.
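A minimal, standard-library-only sketch of the matching idea described above. The helper names (`density`, `gnp_edges`, `gnm_edges`) are hypothetical and written only for illustration. In python-igraph the same steps would presumably go through `Graph.density()`, `Graph.vcount()`, `Graph.ecount()` and `Graph.Erdos_Renyi(n=..., p=...)` or `Graph.Erdos_Renyi(n=..., m=...)`. The karate club figures (34 vertices, 78 edges) are used as the empirical example.

```python
import random

def density(n, m):
    """Fraction of the n*(n-1)/2 possible undirected edges that are present."""
    return m / (n * (n - 1) / 2)

def gnp_edges(n, p, rng):
    """Sample G(n,p): keep each vertex pair independently with probability p."""
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]

def gnm_edges(n, m, rng):
    """Sample G(n,m): choose exactly m distinct vertex pairs uniformly at random."""
    pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
    return rng.sample(pairs, m)

# Zachary's karate club network: 34 vertices, 78 edges.
n, m = 34, 78
p = density(n, m)

rng = random.Random(42)  # fixed seed only so the illustration is reproducible
random_gnp = gnp_edges(n, p, rng)  # edge count varies around m
random_gnm = gnm_edges(n, m, rng)  # edge count is exactly m

print(f"density = {p:.4f}")
print(f"G(n,p) edges = {len(random_gnp)}, G(n,m) edges = {len(random_gnm)}")
```

The two models differ only in what is held fixed: $G(n,p)$ matches the density in expectation, while $G(n,m)$ matches the edge count exactly.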


# %%
# Stochastic Community Detection
# Runs Louvain's method iteratively to generate partitions
Member

This is called the Louvain method, not Louvain's method.

Can you include a short explanation of why the result is different on each run? This is often a point of confusion for empirical researchers who are inexperienced in data analysis.

This is a modularity maximization method. Since the exact maximization of modularity is NP-hard, the Louvain method uses a greedy heuristic, processing vertices in a random order.
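To make the point about random processing order concrete, here is a rough pure-Python sketch of only the first ("local moving") phase of a Louvain-style heuristic. It is not igraph's implementation (which is exposed in python-igraph as `Graph.community_multilevel()`), and the function names are hypothetical. Each vertex is visited in a shuffled order and greedily moved to the neighbouring community that most increases modularity, so a different seed can lead to a different local optimum.

```python
import random
from collections import defaultdict

def modularity(adj, membership):
    """Modularity of a partition of an undirected, unweighted graph."""
    m2 = sum(len(nbrs) for nbrs in adj.values())  # 2 * edge count
    deg_sum = defaultdict(int)    # total degree per community
    internal = defaultdict(int)   # internal edge endpoints per community
    for u, nbrs in adj.items():
        deg_sum[membership[u]] += len(nbrs)
        for v in nbrs:
            if membership[u] == membership[v]:
                internal[membership[u]] += 1
    return sum(internal[c] / m2 - (deg_sum[c] / m2) ** 2 for c in deg_sum)

def local_moving(adj, seed):
    """Greedy local moving with a seed-dependent random vertex order."""
    rng = random.Random(seed)
    membership = {v: v for v in adj}      # start from singleton communities
    m2 = sum(len(nbrs) for nbrs in adj.values())
    tot = {v: len(adj[v]) for v in adj}   # total degree of each community
    improved = True
    while improved:
        improved = False
        order = list(adj)
        rng.shuffle(order)                # the source of run-to-run variation
        for v in order:
            k, old = len(adj[v]), membership[v]
            links = defaultdict(int)      # edges from v to each neighbouring community
            for w in adj[v]:
                links[membership[w]] += 1
            tot[old] -= k                 # temporarily remove v from its community
            # Modularity gain of joining c is proportional to links[c] - tot[c] * k / (2m).
            best, best_gain = old, links[old] - tot[old] * k / m2
            for c, l in links.items():
                gain = l - tot[c] * k / m2
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            tot[best] += k
            if best != old:
                membership[v] = best
                improved = True
    return membership

# A 12-vertex cycle has no clear community structure.
cycle = {v: [(v - 1) % 12, (v + 1) % 12] for v in range(12)}
partitions = set()
for seed in range(10):
    mem = local_moving(cycle, seed)
    groups = defaultdict(set)
    for v, c in mem.items():
        groups[c].add(v)
    partitions.add(frozenset(frozenset(g) for g in groups.values()))
print("distinct partitions over 10 seeds:", len(partitions))
```

Every accepted move strictly increases modularity, so each run stops at some local optimum, but which optimum is reached depends on the order in which vertices were processed.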

Comment on lines 76 to 84
for i, (random_scores, clustered_scores, measure) in enumerate(measures):
    axes[i][0].hist(random_scores, bins=20, alpha=0.7, color=colors[i], edgecolor="black")
    axes[i][0].set_title(f"Histogram of {measure} - Random Graph")
    axes[i][0].set_xlabel(f"{measure} Score")
    axes[i][0].set_ylabel("Frequency")

    axes[i][1].hist(clustered_scores, bins=20, alpha=0.7, color=colors[i], edgecolor="black")
    axes[i][1].set_title(f"Histogram of {measure} - Clustered Graph")
    axes[i][1].set_xlabel(f"{measure} Score")
Member

Could you please plot the probability density instead of counts? While it doesn't make a difference here, it is generally good practice, and it becomes relevant when comparing datasets of different sizes.

Member

Also, please adjust the NMI and RI histograms to span the range $[0,1]$, and adjust the VI histogram to have a lower bound of 0.
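As an illustration of the two histogram requests above, the following standard-library sketch shows what density normalization over a fixed range computes. The helper `density_histogram` is hypothetical. With Matplotlib the equivalent would presumably be `hist(..., density=True, range=(0, 1))` for the measures bounded by $[0,1]$, and `set_xlim(left=0)` for the one that is only bounded below.

```python
def density_histogram(values, bins, lo, hi):
    """Histogram over [lo, hi] normalized so that the bar areas sum to 1."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        if lo <= x <= hi:
            # Clamp so that x == hi falls into the last bin.
            i = min(int((x - lo) / width), bins - 1)
            counts[i] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the requested range")
    return [c / (total * width) for c in counts], width

# Two made-up samples with the same shape but different sizes.
small = [0.1, 0.2, 0.2, 0.9]
large = small * 25

d_small, w = density_histogram(small, bins=20, lo=0.0, hi=1.0)
d_large, _ = density_histogram(large, bins=20, lo=0.0, hi=1.0)

# Raw counts would differ by a factor of 25; the densities are directly comparable.
print(max(abs(a - b) for a, b in zip(d_small, d_large)))
print(sum(d * w for d in d_small))
```

Fixing the range also keeps the bin edges identical across panels, which makes the random-graph and empirical-graph histograms comparable at a glance.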

Comment on lines 89 to 93
# %%
# The results are plotted as histograms for random vs. clustered graphs, highlighting differences in detected community structures.
#The key reason for the inconsistency in random graphs and higher consistency in structured graphs is due to community structure strength:
#Random Graphs: Lack clear communities, leading to unstable partitions. Stochastic algorithms detect different structures across runs, resulting in low NMI, high VI, and inconsistent RI.
#Structured Graphs: Have well-defined communities, so detected partitions are more stable across multiple runs, leading to high NMI, low VI, and stable RI.
Member

Could you please be explicit about the range and interpretation of the three measures? NMI and RI are in $[0,1]$, and larger values indicate higher similarity. VI is a distance metric, thus lower values indicate higher similarity.
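For reference, a small pure-Python sketch of the three measures computed from membership lists, to make their ranges and direction concrete. It assumes one common normalization for NMI, $2I/(H_1+H_2)$, and natural logarithms for VI; the function names are hypothetical. In python-igraph the corresponding call would presumably be `compare_communities(a, b, method=...)` with `"nmi"`, `"vi"` or `"rand"`.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (natural log) of the community size distribution."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two membership lists of equal length."""
    n = len(a)
    ca, cb, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * log(c * n / (ca[x] * cb[y])) for (x, y), c in joint.items())

def nmi(a, b):
    """Normalized mutual information in [0, 1]; larger means more similar."""
    h = entropy(a) + entropy(b)
    return 1.0 if h == 0 else 2 * mutual_information(a, b) / h

def variation_of_information(a, b):
    """Distance metric, lower bound 0; smaller means more similar."""
    return entropy(a) + entropy(b) - 2 * mutual_information(a, b)

def rand_index(a, b):
    """Fraction of vertex pairs treated consistently, in [0, 1]; larger means more similar."""
    n = len(a)
    agree = sum((a[i] == a[j]) == (b[i] == b[j])
                for i in range(n) for j in range(i + 1, n))
    return agree / (n * (n - 1) / 2)

same = ([0, 0, 1, 1], [1, 1, 0, 0])       # identical up to relabelling
unrelated = ([0, 0, 1, 1], [0, 1, 0, 1])  # statistically independent groupings

for name, (a, b) in (("same", same), ("unrelated", unrelated)):
    print(name, nmi(a, b), variation_of_information(a, b), rand_index(a, b))
```

Identical partitions sit at the "most similar" end of each scale (NMI and RI at 1, VI at 0), while the independent pair gives an NMI of 0 and a strictly positive VI.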

I have made the changes as per the review.
@SKG24
Author

SKG24 commented Mar 28, 2025

I have made the suggested changes.

Screen.Recording.2025-03-28.at.3.38.52.PM.mov

@SKG24
Author

SKG24 commented Apr 1, 2025

> This is a nice illustration!
>
> I left a few comments for improvement.
>
> Please remove changes to the sg_execution_times.rst file. This file was likely committed by accident, and I think we should remove it (but not as part of this PR).

Should I delete this file from the PR directly, through the "Files changed" section?

@szhorvat
Member

szhorvat commented Apr 1, 2025

> Should I delete this file from the PR directly, through the "Files changed" section?

Yes, that would be fine.

3 participants