Improve computation of ranking related metrics #530
Comments
One way to speed up evaluation is to sample negative items. This usually makes sense when the item set is huge and we can't afford to rank all of them. The slowness is probably caused by cornac/cornac/models/recommender.py (Line 285 in 5747077).
The sketch of the idea is to let the user define how many items they want to use as negatives (e.g., 100), then do the sampling somewhere here: cornac/cornac/eval_methods/base_method.py (Line 195 in 5747077).
After that, there are two ways to continue:
A smarter way is adaptive: analyze both and set a flag to pick one or the other for subsequent iterations. Just something off the top of my head at the moment. Let's discuss if anyone has more clever ideas.
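A minimal sketch of the negative-sampling idea above, assuming hypothetical names (`sample_ranking_candidates`, `num_negatives`) and that the evaluation loop already knows each user's positive test items; the actual hook point in base_method.py may differ:

```python
import numpy as np

def sample_ranking_candidates(pos_items, num_items, num_negatives=100, rng=None):
    """Return one user's positive test items plus a random subset of negatives.

    pos_items     : item indices the user interacted with in the test set
    num_items     : total number of items in the catalogue
    num_negatives : how many non-interacted items to rank against (user-defined)
    rng           : numpy Generator, passed in so the caller controls the seed
    """
    rng = rng or np.random.default_rng()
    # Every item the user has not interacted with is an eligible negative.
    eligible = np.setdiff1d(np.arange(num_items), pos_items)
    sampled = rng.choice(eligible, size=min(num_negatives, eligible.size), replace=False)
    # Ranking metrics would then be computed over this reduced candidate set
    # instead of the full item catalogue.
    return np.concatenate([pos_items, sampled])
```

Ranking the full item set would stay as the default; the sampled path would only kick in when the user opts in with something like `num_negatives=100`.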
My thought on this is that it's because of cornac/cornac/models/recommender.py (Line 192 in 5747077), where we only evaluate users one by one. Would doing batch user evaluation be faster for GPU-based models?
Most of the GPU-based models only utilize the GPU during training, not evaluation. Their predictive functions tend to be a dot product between user and item embeddings, which is quite efficient with NumPy. It wouldn't hurt to have, let's say ...
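For models whose score is a dot product, a batch scorer amounts to a single matrix multiplication. A rough sketch with hypothetical `user_factors`/`item_factors` matrices and a `score_batch` name that does not exist in cornac:

```python
import numpy as np

def score_batch(user_factors, item_factors, user_indices):
    """Score all items for a batch of users in one shot.

    user_factors : (num_users, k) user embedding matrix
    user_indices : indices of the users in the current evaluation batch
    item_factors : (num_items, k) item embedding matrix
    Returns a (len(user_indices), num_items) matrix of scores.
    """
    # One matrix product replaces the per-user Python loop in the evaluation code.
    return user_factors[user_indices] @ item_factors.T
```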
Some models with a large number of parameters cannot generate scores for a large number of items (using a batch scoring function) due to insufficient memory. In particular, when the scoring function has to compute a score for every pair of user and item, ranking the full set of items becomes a scalability issue. I prefer adding an optional parameter to specify the number of negative samples for ranking evaluation. This approach has been used in a lot of published research. Of course, we need to ensure reproducibility by specifying a random seed for whatever pseudo-random generator is used.
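On the reproducibility point, fixing the seed of whatever generator does the sampling is enough; a tiny illustration (all names here are made up):

```python
import numpy as np

# With a fixed seed, the sampled negatives, and hence the reported
# metrics, are identical across runs.
rng = np.random.default_rng(seed=123)
eligible_negatives = np.arange(10_000)  # items the user never interacted with
sampled = rng.choice(eligible_negatives, size=100, replace=False)
```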
@lthoang @hieuddo @darrylong it seems that we can have an option for negative sampling during evaluation. Does anyone want to take the lead on this improvement? 😃
Description
Certain ranking metrics currently take a considerably long time to produce a single result.
Other Comments
It could be due to:
- Some slow calculation in the `ranking_eval` function (cornac/cornac/eval_methods/base_method.py, Line 174 in 5747077).
- How the `score` function calculates scores during evaluation, on a user-by-user basis (cornac/cornac/models/recommender.py, Line 192 in 5747077).

More testing is required to determine which part is to be optimised (see the rough profiling sketch below).
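One rough way to find out which part dominates is to profile a run and filter the report to the two suspects; this sketch assumes an already configured `cornac.Experiment` instance named `experiment` (illustrative only):

```python
import cProfile
import pstats

# `experiment` is assumed to be a configured cornac.Experiment
# (models, metrics, and eval method already set up).
with cProfile.Profile() as profiler:
    experiment.run()

# Filter the report to the suspected hot spots: ranking_eval in
# base_method.py and the per-user score() calls in recommender.py.
stats = pstats.Stats(profiler).sort_stats("cumtime")
stats.print_stats("ranking_eval|recommender")
```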
Further discussion on how we could improve performance would be great. :)
Thank you all!