feat ✨ : Add closeness centrality and floyd warshall feature #72

Open
wants to merge 23 commits into base: main

Conversation

Fre0Grella

Hi, I'm Marco, a student from the University of Bologna.
For my final project I want to contribute to this repository by adding the closeness centrality and Floyd-Warshall algorithms for numpy.
The closeness centrality uses the newly implemented Floyd-Warshall.
The parallelized Floyd-Warshall is a version of the tiled FW referenced in the code.

@Schefflera-Arboricola added the type: Enhancement (New feature or request) label on Jul 31, 2024
@Schefflera-Arboricola
Member

@Fre0Grella nx-parallel algorithms receive a ParallelGraph object, so you need to convert it back to an nx.Graph object to run the usual graph methods on it. That's why you'll see the following lines in all the nx-parallel algorithms:

if hasattr(G, "graph_object"):
        G = G.graph_object

Hopefully this will make the tests pass.

Thanks!

@Fre0Grella
Author

@Schefflera-Arboricola Hi, thanks for the earlier advice; I hope you can help with this one too.
Running the tests I get an error I don't understand:
in closeness.py, when I use joblib's Parallel to call the _closeness_measure function (which, from a row of the adjacency matrix, computes the closeness value for the node represented by that row), joblib raises TypeError: 'numpy.float64' object is not callable.
I tried changing the data format, thinking that joblib doesn't support numpy data structures, but then it keeps saying TypeError: 'float' object is not callable.
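
For context, this exact TypeError usually appears when the worker function is not wrapped in joblib's delayed(): calling the worker directly evaluates it eagerly and hands its float return value to Parallel, which then tries to call it. A minimal, self-contained sketch of the expected pattern (the worker name _closeness_row and the small matrix below are placeholders, not the code in this PR):

from joblib import Parallel, delayed
import numpy as np

def _closeness_row(k, row):
    # Placeholder worker: sum the finite distances in one row of the matrix.
    finite = row[np.isfinite(row)]
    return k, finite.sum()

dist = np.array([[0.0, 1.0, np.inf], [1.0, 0.0, 2.0], [np.inf, 2.0, 0.0]])

# delayed() is what keeps the worker callable inside Parallel; passing
# _closeness_row(k, row) directly would evaluate it eagerly and hand Parallel
# a float, which reproduces "TypeError: ... object is not callable".
results = Parallel(n_jobs=2)(
    delayed(_closeness_row)(k, row) for k, row in enumerate(dist)
)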

Fix weight problem for optional unweighted graphs but it's painfully slow
@Schefflera-Arboricola
Member

Hi @Fre0Grella, thank you for your contribution :)

Is this PR ready for review? I will aim to get to it as soon as possible, though I cannot provide a specific timeline at the moment. I'm also tagging @dschult in case he would like to review it.

Additionally, could you let us know if there is a deadline for your final project and if this PR needs to be merged before then?

Thank you :)

@Fre0Grella
Author

Hi @Schefflera-Arboricola, the PR is ready for review, although I aim to improve the performance further. The deadline for the project is the 18th of September.
I don't need the PR to be merged before then.

@Schefflera-Arboricola
Member

Thanks @Fre0Grella! I think using nxp.get_n_jobs instead of nxp.cpu_count would have fixed the failing CI tests, because we renamed it recently. If you still have the bandwidth, please feel free to merge the main branch again. Really sorry for the delayed review!

@Fre0Grella
Author

> Thanks @Fre0Grella! I think using nxp.get_n_jobs instead of nxp.cpu_count would have fixed the failing CI tests, because we renamed it recently. If you still have the bandwidth, please feel free to merge the main branch again. Really sorry for the delayed review!

Not sure if I'm doing something wrong, but replacing nxp.cpu_count with nxp.get_n_jobs isn't working. In the tests I get AttributeError: module 'nx_parallel' has no attribute 'get_n_jobs'

if block_j == no_of_primary - 1:
    j_range = (block_j * blocking_factor, n - 1)
params.append((i_range, j_range))
Parallel(n_jobs=(no_of_primary - 1) ** 2, require="sharedmem")(
Member

@Schefflera-Arboricola Sep 27, 2024

Recently we have made it so that all the parameters of the Parallel call are controlled by the user, and we don't pass anything to it. I would highly encourage you (@Fre0Grella) to join our weekly nx-parallel meetings: https://scientific-python.org/calendars/networkx.ics , if you can. I think your insights and feedback would really help us shape the nx-parallel project in a better way. Thank you :)
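
To illustrate what "controlled by the user" means in practice, here is a minimal sketch using joblib's parallel_config context manager (joblib >= 1.3); the process function is a stand-in for illustration, not nx-parallel's internals:

from joblib import Parallel, delayed, parallel_config

def process(chunks):
    # Inside the algorithm no n_jobs, backend, or other settings are
    # hard-coded; the bare Parallel() call picks them up from the caller.
    return Parallel()(delayed(sum)(chunk) for chunk in chunks)

# The caller decides how the work is parallelized.
with parallel_config(backend="loky", n_jobs=4):
    print(process([[1, 2], [3, 4], [5, 6]]))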

@Schefflera-Arboricola
Member

All the tests are passing now and the style test should also pass once you update this PR branch with the recent fix that got merged. Thank you :)

@Fre0Grella
Author

It would be awesome if we could merge this PR before the 1st of October so I can present the project directly from the NetworkX repository to the exam commission. Thanks a lot for the support throughout the project. I'm also interested in further contributions to the NetworkX project.

@Fre0Grella
Author

@Schefflera-Arboricola hey, the PR is ready to be merged; it only needs a review.

Comment on lines +150 to +180
def _find_nearest_divisor(x, y):
    """
    Find the optimal value for the blocking factor parameter.

    Parameters
    ----------
    x : int
        number of nodes

    y : int
        number of CPU cores available
    """
    if x < y:
        return 1, False
    # Find the square root of x
    sqrt_x = int(math.sqrt(x)) + 1

    # Execute the calculation in parallel
    results = Parallel(n_jobs=-1)(
        delayed(_calculate_divisor)(i, x, y) for i in range(2, sqrt_x)
    )

    # Filter out None results
    results = [r for r in results if r is not None]

    if len(results) <= 0:
        # No divisor was found, so x is prime; repeat the search with the
        # non-prime value x - 1.
        best_divisor, _ = _find_nearest_divisor(x - 1, y)
        return best_divisor, True
    # Find the best divisor
    best_divisor, _, _ = min(results, key=lambda x: x[2])

    return best_divisor, False
Contributor

Curious-- for large graphs with prime numbers of nodes, it seems like the final block could be disproportionately larger, leading to load imbalance across cores?

Author

@Fre0Grella Oct 18, 2024

Yes, it could lead to load imbalance, but I think the disproportion is somewhat minimized.
I don't know if there's a better way of doing it.

Contributor

Hm, what about just swapping:

best_divisor, _ = _find_nearest_divisor(x - 1, y)

with something as simple as:

best_divisor = max(1, x // y)

to distribute load approximately evenly if no divisor is found?

Comment on lines +46 to +73
def _closeness_measure(k, v, wf_improved, len_G):
    """Calculate the closeness centrality measure of one node using its row of distances.

    Parameters
    ----------
    v : mapping
        the distances from node k to every other node

    Returns
    -------
    (k, closeness_value) : tuple
        the node key and its closeness value
    """
    n = v.values()
    # print(n)
    n_reachable = [x for x in n if x != float("inf")]
    # print(n_reachable, len(n_reachable))
    totsp = sum(n_reachable)
    # print(totsp)
    closeness_value = 0.0
    if totsp > 0.0 and len_G > 1:
        closeness_value = (len(n_reachable) - 1.0) / totsp
        # normalize to number of nodes-1 in connected part
        if wf_improved:
            s = (len(n_reachable) - 1.0) / (len_G - 1)
            closeness_value *= s

    return k, closeness_value
Contributor

I wonder if it could make sense to include an initial check in this function for graph disconnectedness, the goal being to "fail fast" to prevent edge-case scenarios where disconnected nodes lead to infinite geodesic distances. WDYT?

Author

Yeah, I think a transitive closure test could be added in the future.
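
For reference, a minimal sketch of such a fail-fast check, assuming the per-node distances are available as a numpy row (the names below are illustrative, not the PR's code):

import numpy as np

def _all_reachable(row):
    # A single infinite distance means this node cannot reach every other
    # node, i.e. the graph is disconnected from this node's point of view.
    return bool(np.all(np.isfinite(row)))

print(_all_reachable(np.array([0.0, 1.0, 2.0])))     # True
print(_all_reachable(np.array([0.0, 1.0, np.inf])))  # False -> bail out early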

Comment on lines +139 to +143
difference1 = abs(result1 - y)
# difference2 = abs((result2 - 1) ** 2 - y)
difference2 = abs(result2 - y)

if difference1 < difference2:
Contributor

Could be missing something obvious, but aren't divisor1 and divisor2 actually going to be the same thing here (i)? i.e. do we even need to compare difference1 with difference2?

I wonder whether this could be simplified to something like:

def _calculate_divisor(i, x, y):
    if x % i == 0:
        quotient = x // i
        return (i, quotient, abs(i - y))
    return None

?

Author

Not really the same thing; the idea is to find the divisor of x (the size of the matrix) most similar to y (the cores available).
To find it, I try dividing x by every number from 1 to sqrt(x), so when I do x // i, divisor1 is i and the result of x // i is divisor2.

An example to clarify:
x=10
y=4
i=2
x // i = 5
So divisor1 = 2, divisor2 = 5

divisor2 is the closest to 4, so I choose 5 as the divisor.
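
To make the comparison concrete, here is a small sketch of the logic described above (an illustrative helper, not the PR's exact _calculate_divisor): for each i that divides x, both i and x // i are divisors, and whichever is closer to y is kept.

def _closest_divisor_pair(i, x, y):
    # Illustrative version of the described comparison, not the PR's function.
    if x % i != 0:
        return None
    divisor1, divisor2 = i, x // i
    difference1, difference2 = abs(divisor1 - y), abs(divisor2 - y)
    if difference1 < difference2:
        return (divisor1, divisor2, difference1)
    return (divisor2, divisor1, difference2)

print(_closest_divisor_pair(2, 10, 4))  # (5, 2, 1): 5 is the divisor closest to 4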

Contributor

Thanks for the clarification on this @Fre0Grella

I think what I'm still struggling with is the need for two separate comparisons. Since divisor1 is i and divisor2 is x // i, aren't you always comparing i and x // i? The current implementation is functionally correct, but this seems like it could be greatly simplified without losing the intent of matching divisor2 more closely to y.

For example, maybe we could directly compute the abs difference of both i and x // i with y, then just return whichever is closest? And this would avoid the extra complexity in checking two sets of variables. So, building on my (incomplete) alternative suggestion previously, this might look something like:

def _calculate_divisor(i, x, y):
    if x % i == 0:
        divisor = x // i
        return (i, divisor, min(abs(i - y), abs(divisor - y)))
    return None

This keeps the logic intact, but then we don't need to assign and compare divisor1 and divisor2 explicitly. Let me know what you think / if I'm missing something!

@dPys
Contributor

dPys commented Oct 23, 2024

@Fre0Grella -- I should mention (since I haven't yet) that this is excellent work. @Schefflera-Arboricola -- hopefully after the remaining polishing we can get this thing merged?

Labels
type: Enhancement New feature or request