Seed and random results for embeddings #144

sebkaz · 2023-07-26T14:46:47Z

Hi!

I want to ask about the seed parameter used by most node embeddings.
The documentation says that seed=42 is the default, but when you run, for example, Node2Vec twice, you get different embedding vectors.

Do you plan to change this so that the default seed also comes with workers=1 by default?

best regards
S.

tomlincr · 2023-08-17T10:55:39Z

@sebkaz I've also noticed this issue (different embedding vectors per iteration of the same algorithm/params/seed).

I think it's a hard one to solve at the karateclub level, across all algorithms, given the reliance on other packages under the hood.

E.g. NetMF uses sklearn's TruncatedSVD which defaults to a randomised solver and seems to acknowledge this issue in the documentation:

SVD suffers from a problem called “sign indeterminacy”, which means the sign of the components_ and the output from transform depend on the algorithm and random state. To work around this, fit instances of this class to data once, then keep the instance around to do transformations.

It seems to me that any workaround (e.g. setting workers=1 or using other solvers) would increase compute time and, on balance, isn't worth it? If a specific use case needs reproducible embeddings, the user can address that on a case-by-case basis.
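
For the case-by-case route, here is a minimal sketch of the sign indeterminacy quoted above, using only NumPy rather than karateclub or sklearn. The helper `fix_signs` is hypothetical (it is not part of either library). It flips each embedding column so that its largest-magnitude entry is positive, which makes two SVD-based embeddings comparable when they differ only by sign. It does not address differences caused by multi-threaded training (e.g. workers > 1 in Node2Vec).

```python
import numpy as np


def fix_signs(embedding):
    """Hypothetical helper: flip each column so that its
    largest-magnitude entry is positive."""
    embedding = np.asarray(embedding, dtype=float)
    rows = np.argmax(np.abs(embedding), axis=0)
    signs = np.sign(embedding[rows, np.arange(embedding.shape[1])])
    signs[signs == 0] = 1.0  # leave all-zero columns untouched
    return embedding * signs


rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Flip the sign of the first component in both factors.
# The result is an equally valid SVD of the same matrix.
flip = np.array([-1.0, 1.0, 1.0, 1.0])
U2 = U * flip
Vt2 = Vt * flip[:, None]
same_reconstruction = np.allclose(U @ np.diag(S) @ Vt, U2 @ np.diag(S) @ Vt2)

# Two-dimensional "embeddings" built from each factorisation.
emb_a = U[:, :2] * S[:2]
emb_b = U2[:, :2] * S[:2]

raw_equal = np.allclose(emb_a, emb_b)
aligned_equal = np.allclose(fix_signs(emb_a), fix_signs(emb_b))

print(same_reconstruction, raw_equal, aligned_equal)
```

Both factorisations reconstruct the same matrix, yet the raw embeddings differ in the first column. After sign normalisation they match, so a check like this can separate harmless sign flips from genuine run-to-run differences.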
