feat ✨ : Add closeness centrality and floyd warshall feature #72
base: main
Conversation
@Fre0Grella nx-parallel algorithms receive a graph object that may be wrapped, so unwrap it first:

if hasattr(G, "graph_object"):
    G = G.graph_object

Hopefully this will make the tests pass. Thanks!
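A minimal sketch of where that unwrapping check would typically sit, assuming a backend function that accepts either a plain networkx graph or a wrapper exposing graph_object (the function signature below is illustrative, not necessarily the PR's actual one):

def closeness_centrality(G, wf_improved=True):
    # Unwrap the nx-parallel wrapper, if present, so the rest of the
    # function works on the underlying networkx graph directly.
    if hasattr(G, "graph_object"):
        G = G.graph_object
    # ... proceed with the usual computation on G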
Still gives "numpy.float64 object is not callable" on closeness.py
Hi @Schefflera-Arboricola, thanks for the first piece of advice; I hope you can help with this one too.
assertion error on closeness centrality test
les miserables test closeness
Fix weight problem for optional unweighted graphs but it's painfully slow
Hi @Fre0Grella, Thank you for your contribution :) Is this PR ready for review? I will aim to get to it as soon as possible, though I cannot provide a specific timeline at the moment. I'm also tagging @dschult in case he would like to review it. Additionally, could you let us know if there is a deadline for your final project and if this PR needs to be merged before then? Thank you :)
Hi @Schefflera-Arboricola, the PR is ready for review, although I aim to improve the performance further. The deadline for the project is the 18th of September.
Thanks @Fre0Grella! I think instead of |
Not sure if I'm doing something wrong, but replacing |
if block_j == no_of_primary - 1:
    j_range = (block_j * blocking_factor, n - 1)
params.append((i_range, j_range))
Parallel(n_jobs=(no_of_primary - 1) ** 2, require="sharedmem")(
Recently we have made it so that all the parameters of the Parallel call are controlled by the user and we don't pass anything in it. I would highly encourage you (@Fre0Grella) to join our weekly nx-parallel meetings: https://scientific-python.org/calendars/networkx.ics, if you can. I think your insights and feedback would really help us shape the nx-parallel project in a better way. Thank you :)
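For context, here is a hedged sketch of what user-controlled Parallel parameters can look like with plain joblib: the algorithm creates a bare Parallel() and the caller chooses n_jobs, require, etc. through joblib.parallel_config (nx-parallel's actual configuration hooks may differ; the helper names below are placeholders):

from joblib import Parallel, delayed, parallel_config

def _process_block(i_range, j_range):
    # placeholder for the real per-block Floyd-Warshall update
    return i_range, j_range

def run_blocks(params):
    # no hard-coded n_jobs / require here: Parallel() picks up whatever
    # the surrounding parallel_config context specifies
    return Parallel()(delayed(_process_block)(i, j) for i, j in params)

# the caller (or a configuration layer) decides the backend knobs:
with parallel_config(n_jobs=-1, require="sharedmem"):
    print(run_blocks([((0, 2), (0, 2)), ((2, 4), (2, 4))]))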
All the tests are passing now and the style test should also pass once you update this PR branch with the recent fix that got merged. Thank you :)
It would be awesome if we could merge this PR before the 1st of October so I can present the project directly from the NetworkX repository to the exam commission. Thanks a lot for the support throughout the project. I'm also interested in further contributions to the NetworkX project.
Hey @Schefflera-Arboricola, the PR is ready to be merged; it only needs a review.
def _find_nearest_divisor(x, y):
    """
    Find the optimal value for the blocking factor parameter.

    Parameters
    ----------
    x : number of nodes

    y : number of available CPU cores
    """
    if x < y:
        return 1, False
    # Find the square root of x
    sqrt_x = int(math.sqrt(x)) + 1

    # Execute the calculation in parallel
    results = Parallel(n_jobs=-1)(
        delayed(_calculate_divisor)(i, x, y) for i in range(2, sqrt_x)
    )

    # Filter out None results
    results = [r for r in results if r is not None]

    if len(results) <= 0:
        # x is prime, so repeat the process with a non-prime number
        best_divisor, _ = _find_nearest_divisor(x - 1, y)
        return best_divisor, True
    # Find the best divisor
    best_divisor, _, _ = min(results, key=lambda x: x[2])

    return best_divisor, False
Curious-- for large graphs with prime numbers of nodes, it seems like the final block could be disproportionately larger, leading to load imbalance across cores?
Yes, it could lead to load imbalance, but I think the disproportion is more or less minimized.
I don't know if there's a better way of doing it.
Hm, what about just swapping:
best_divisor, _ = _find_nearest_divisor(x - 1, y)
with something as simple as:
best_divisor = max(1, x // y)
to distribute load approximately evenly if no divisor is found?
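To make that fallback concrete (hypothetical numbers, assuming the returned value is used as the rows-per-block blocking factor and the last primary block absorbs the remainder, as in the diff above):

x, y = 10007, 4                     # 10007 is prime, so the divisor search finds nothing
blocking_factor = max(1, x // y)    # 2501
# row counts per primary block come out nearly balanced:
sizes = [blocking_factor] * (y - 1) + [x - blocking_factor * (y - 1)]
print(blocking_factor, sizes)       # 2501 [2501, 2501, 2501, 2504]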
def _closeness_measure(k, v, wf_improved, len_G):
    """Calculate the closeness centrality of one node from its row of shortest-path distances.

    Parameters
    ----------
    k : node
        the node whose closeness is being computed
    v : dict
        the shortest-path distances from k to every other node
    wf_improved : bool
        scale by the fraction of nodes reachable from k
        (Wasserman and Faust improvement)
    len_G : int
        the number of nodes in the graph

    Returns
    -------
    (k, closeness_value) : tuple
        the node and its closeness centrality value
    """
    n = v.values()
    n_reachable = [x for x in n if x != float("inf")]
    totsp = sum(n_reachable)
    closeness_value = 0.0
    if totsp > 0.0 and len_G > 1:
        closeness_value = (len(n_reachable) - 1.0) / totsp
        # normalize to number of nodes-1 in connected part
        if wf_improved:
            s = (len(n_reachable) - 1.0) / (len_G - 1)
            closeness_value *= s

    return k, closeness_value
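As a quick sanity check on the formula above, here is a hypothetical call for a 3-node path graph 0-1-2, assuming v is the row of Floyd-Warshall distances from node 0 (including the zero distance to itself):

# distances from node 0 in the path 0-1-2
k, c = _closeness_measure(0, {0: 0.0, 1: 1.0, 2: 2.0}, wf_improved=True, len_G=3)
# two other nodes are reachable and the distances sum to 3, so
# c = (3 - 1) / 3 * (3 - 1) / (3 - 1) = 2 / 3, matching nx.closeness_centrality
print(k, c)  # 0 0.666...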
I wonder if it could make sense to include an initial check in this function for graph disconnectedness, the goal being to "fail fast" to prevent edge-case scenarios where disconnected nodes lead to infinite geodesic distances. WDYT?
Yeah, I think a transitive closure test could be added in the future.
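If such a check is added, one cheap option (a sketch, assuming the Floyd-Warshall result is available as a dense numpy distance matrix, which may not match how this PR stores it) is to look for infinite entries before computing the per-node measures:

import numpy as np

def _has_unreachable_pairs(dist_matrix):
    # np.inf entries mean some pair of nodes has no connecting path,
    # i.e. the graph is disconnected; callers could fail fast here
    # or fall back to per-component handling.
    return bool(np.isinf(dist_matrix).any())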
difference1 = abs(result1 - y)
# difference2 = abs((result2 - 1) ** 2 - y)
difference2 = abs(result2 - y)

if difference1 < difference2:
Could be missing something obvious, but aren't divisor1 and divisor2 actually going to be the same thing here (i)? i.e. do we even need to compare difference1 with difference2?
I wonder whether this could be simplified to something like:

def _calculate_divisor(i, x, y):
    if x % i == 0:
        quotient = x // i
        return (i, quotient, abs(i - y))
    return None

?
Not really the same thing; the idea is to find the divisor of x (the size of the matrix) that is most similar to y (the available cores).
To find it, I try dividing x by every number from 1 to sqrt(x); when I do x // i, divisor1 is i and divisor2 is the result of x // i.
An example to clarify:
x = 10
y = 4
i = 2
x // i = 5
So divisor1 = 2 and divisor2 = 5.
divisor2 is the closest to 4, so I choose 5 as the divisor.
Thanks for the clarification on this @Fre0Grella
I think what I'm still struggling with is the need for two separate comparisons. Since divisor1 is i and divisor2 is x // i, aren't you always comparing i and x // i? The current implementation is functionally correct, but this seems like it could be greatly simplified without losing the intent of matching divisor2 more closely to y.
For example, maybe we could directly compute the abs difference of both i and x // i with y, then just return whichever is closest? And this would avoid the extra complexity in checking two sets of variables. So, building on my (incomplete) alternative suggestion previously, this might look something like:

def _calculate_divisor(i, x, y):
    if x % i == 0:
        divisor = x // i
        return (i, divisor, min(abs(i - y), abs(divisor - y)))
    return None

This keeps the logic intact, but then we don't need to assign and compare divisor1 and divisor2 explicitly. Let me know what you think / if I'm missing something!
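A quick hypothetical check of how that simplified helper would interact with the existing min(results, key=lambda x: x[2]) selection in _find_nearest_divisor, using x = 12 nodes and y = 4 cores:

import math

def _calculate_divisor(i, x, y):
    if x % i == 0:
        divisor = x // i
        return (i, divisor, min(abs(i - y), abs(divisor - y)))
    return None

x, y = 12, 4
results = [_calculate_divisor(i, x, y) for i in range(2, int(math.sqrt(x)) + 1)]
results = [r for r in results if r is not None]
print(min(results, key=lambda t: t[2]))  # (3, 4, 0): the pair containing 4 wins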
@Fre0Grella -- I should mention (since I haven't yet) that this is excellent work. @Schefflera-Arboricola -- hopefully after the remaining polishing we can get this thing merged?
Hi, I'm Marco, a student from the University of Bologna.
For my final project I want to contribute to this repository by adding the closeness centrality and Floyd-Warshall algorithms for numpy.
The closeness centrality uses the newly implemented Floyd-Warshall.
The parallelized Floyd-Warshall is a version of the tiled FW referenced in the code.
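For readers unfamiliar with the baseline, here is a minimal serial Floyd-Warshall in numpy for reference (purely illustrative; the PR's parallel version follows the tiled/blocked formulation referenced in the code):

import numpy as np

def floyd_warshall_reference(A):
    # A: dense adjacency matrix with np.inf for missing edges and 0 on the diagonal
    D = A.astype(float).copy()
    for k in range(D.shape[0]):
        # relax every pair (i, j) through the intermediate node k
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])
    return D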