Author: Soon Hyeok Park, Kyoungok Kim (corresponding author)
Paper Link : https://link.springer.com/article/10.1007/s12652-023-04647-0
The first author uploaded the experimental code for this study through this repository.
- We proposed strategies to address each limitation of Rating_Jaccard to obtain better extensions of Jaccard similarity.
- We verified the effectiveness of each proposed strategy in detail using several datasets and determined the best similarity among the proposed Jaccard similarity extensions, considering both prediction performance and computation time.
- RJAC_DUB, determined to be the best of the proposed similarities in this study, incorporated rating information and, inherently, users' rating preference behavior into the similarity calculation.
- Through extensive experiments based on various datasets, we demonstrated the superiority of the proposed similarity over existing variations of Jaccard similarity and other similarity measures.
The similarity measures proposed in this study are inspired by Rating_Jaccard (Ayub et al. 2020a). This study aims to address the following limitations of Rating_Jaccard.
- In general, the similarity between two users increases with the number of co-rated items. However, Rating_Jaccard is inversely proportional to this number.
- Rating_Jaccard only counts co-rated items with identical ratings; thus, the similarity between users is zero in more cases than for Jaccard similarity. In particular, if two users have few co-rated items, their similarity is likely to be zero, making them indistinguishable from a pair of users with no co-rated items at all (see the sketch after this list). Moreover, the overabundance of zero similarity values complicates the identification of a sufficient number of nearest neighbors, which increases the number of items whose ratings cannot be predicted when predicting the ratings of unrated items.
- The rating behaviors of users vary widely; however, Rating_Jaccard does not consider this variability.
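To make the first two limitations concrete, the toy sketch below contrasts plain Jaccard similarity with a simplified Rating_Jaccard-style count that only credits co-rated items rated identically. The `rating_jaccard_like` helper and the toy ratings are illustrative assumptions for this README only; they are not the exact Rating_Jaccard definition of Ayub et al. (2020a) nor the code in this repository.

```python
# Illustrative sketch only: `rating_jaccard_like` approximates the idea of
# counting co-rated items with identical ratings; it is NOT the exact
# Rating_Jaccard of Ayub et al. (2020a) nor the experimental code of the paper.

def jaccard(u_ratings, v_ratings):
    """Plain Jaccard: overlap of the rated-item sets over their union."""
    items_u, items_v = set(u_ratings), set(v_ratings)
    union = items_u | items_v
    return len(items_u & items_v) / len(union) if union else 0.0

def rating_jaccard_like(u_ratings, v_ratings):
    """Credit only co-rated items whose ratings are identical."""
    items_u, items_v = set(u_ratings), set(v_ratings)
    union = items_u | items_v
    same = sum(1 for i in items_u & items_v if u_ratings[i] == v_ratings[i])
    return same / len(union) if union else 0.0

# Toy example: two users with similar taste who never give identical ratings.
u = {"i1": 4, "i2": 5, "i3": 3}
v = {"i1": 5, "i2": 4, "i3": 2, "i4": 1}

print(jaccard(u, v))              # 3/4 = 0.75 -> overlap is rewarded
print(rating_jaccard_like(u, v))  # 0/4 = 0.0  -> indistinguishable from users with no overlap
```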
In terms of the MAE, F1-score, and computation time, RJAC_DUB was determined to be the best of the proposed similarity measures.
RJAC_DUB generally outperformed the other similarity measures in terms of MAE and F1-score.
Moreover, although RJAC_DUB was slower than the traditional similarity measures, its computation time did not increase significantly compared with the other improved similarities.
Furthermore, in terms of calculation speed, RJAC_DUB was superior to JacLMH on all datasets and to RJaccard on the large datasets.