This week, in our favourite PhD student’s rabbit hole, there was an article on the use of similarity measures. But choosing one is not as easy as it sounds.
Today, recommender systems, and collaborative filtering systems in particular, rely on similarity measures to find the neighbours whose behaviour drives a recommendation. The goal of a collaborative filtering system is to infer how a user might interact with some item based on how other users with similar tastes interacted with this item.
There are many ways to calculate similarity. Common choices include the Jaccard index, the Pearson correlation coefficient and cosine similarity. These measures are not interchangeable: each captures a different notion of closeness, so they can select different neighbours. Even so, because so much information is available for recommendation, many similarity measures may show good performance.
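To make the three measures concrete, here is a minimal sketch in plain Python. It assumes each user is represented as a dictionary mapping item names to ratings, and that Pearson and cosine are computed over co-rated items only, which is one common convention among several. The users `alice` and `bob` and their ratings are made up for illustration.

```python
from math import sqrt


def jaccard(a, b):
    """Jaccard index of the sets of items two users rated (ratings ignored)."""
    items_a, items_b = set(a), set(b)
    union = items_a | items_b
    return len(items_a & items_b) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity over co-rated items (an assumed convention)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pearson(a, b):
    """Pearson correlation over co-rated items; 0.0 when it is undefined."""
    common = set(a) & set(b)
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


# Hypothetical ratings on a 1-5 scale.
alice = {"A": 5, "B": 3, "C": 4}
bob = {"A": 4, "B": 2, "C": 3, "D": 5}

print("jaccard:", jaccard(alice, bob))
print("pearson:", pearson(alice, bob))
print("cosine: ", cosine(alice, bob))
```

On this pair the three measures disagree: Jaccard only looks at which items were rated, Pearson rewards the matching rating pattern after mean-centring, and cosine works on the raw rating vectors.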
This article is a pre-print, but the researchers seem to focus on some neighbour selection problems that arise with the Pearson correlation coefficient and cosine similarity. They examine the measures at three levels: a toy example, synthetic datasets, and real-world datasets. Their results show that the Pearson correlation coefficient and cosine similarity sometimes exclude similar neighbours in favour of less valuable ones. As a result, they propose a new similarity measure, called the normalized sum of multiplications (NSM), that doesn’t suffer from these drawbacks.
Even if the measure shows some robustness, we can wonder whether it will catch on, or whether it will just be another name added to the list of potential similarity measures to consider when designing a recommender system.
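One well-known way such a neighbour selection problem can show up is sketched below. This is not the toy example from the pre-print, and NSM is not implemented here because its definition is not given in this post. The sketch assumes cosine similarity is computed over co-rated items only; the users `target`, `close` and `thin` are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity over co-rated items (an assumed convention)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical ratings on a 1-5 scale.
target = {"A": 5, "B": 1, "C": 4, "D": 2}

candidates = {
    # Rated all four items almost exactly like the target.
    "close": {"A": 5, "B": 2, "C": 4, "D": 1},
    # Shares a single item with the target, and rated it differently.
    "thin": {"A": 3},
}

# Rank candidate neighbours by similarity to the target, best first.
ranked = sorted(
    candidates.items(),
    key=lambda kv: cosine(target, kv[1]),
    reverse=True,
)

for name, ratings in ranked:
    print(name, round(cosine(target, ratings), 3))
```

With a single co-rated item, both vectors are one-dimensional and point the same way, so the cosine is exactly 1 regardless of the actual ratings. The `thin` user therefore outranks `close`, even though `close` offers far more evidence of shared taste.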