Item-Based Collaborative Filtering, Explained

Jericho Siahaya
5 min read · Apr 29, 2023

YouTube, Netflix, Spotify, Facebook, and just about every other entertainment app use the same thing (a formula) to serve endless content to your feed, so you can relax on your phone and never bother to leave your bed.

This thing is called a recommender system.

What the heck is a recommender system?

Every seemingly random video, film, song, or picture that pops up on your screen is put there by this system. A recommender system is a tool that helps suggest items to a user based on their past behavior or preferences. It’s like having a personal shopping assistant who recommends products or services to you based on what you’ve liked or purchased in the past.

For example, if you’ve previously enjoyed watching romantic comedies on a streaming platform, a recommender system may suggest similar movies or TV shows to you. The system uses algorithms to analyze patterns in your past behavior and provide personalized recommendations to help you discover new items that you may like.

Now that you understand what a recommender system is, you should know that there are many types of recommender systems. One of the most popular is collaborative filtering.

What is collaborative filtering?

Collaborative filtering (CF) is a type of recommendation algorithm that suggests items to users based on the preferences and behavior of similar users. The CF technique uses ratings or quality scores of items as the parameters for calculating how similar items (or users) are to each other.

There are two main types of collaborative filtering: user-based and item-based. In this article, we’re going to talk about item-based collaborative filtering (IBCF). In IBCF, the similarity that is calculated is the similarity between items.

I think you may still be confused. Let me explain it to you using the example below.

IBCF example

There are four fruit items to be tested in a recommendation system: orange, strawberry, apple, and banana. The data used here is purchase history, i.e., which buyers have successfully purchased any of these four items.

The first buyer purchases oranges, apples, and bananas. The second buyer purchases oranges and apples, while the third buyer only purchases apples. Let’s say the third buyer wants to get a recommendation for other fruits based on the fruit they have purchased, namely apples. Then, the recommendation system will calculate the similarity of other fruits based on the purchase history of the buyer who has purchased apples.

It can be seen that the second buyer bought apples and also oranges, while the first buyer bought oranges, apples, and bananas. What the recommendation system learns from this is that the buyers who purchased apples (the first and second buyers) both also purchased oranges. Therefore, the recommendation system will recommend oranges to the third buyer.
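
Here is a minimal sketch of that logic in Python. The purchase data mirrors the toy example above; the function name and the simple co-purchase counting are just illustrative, not a production recommender.

```python
# Toy purchase history from the example above.
purchases = {
    "buyer_1": {"orange", "apple", "banana"},
    "buyer_2": {"orange", "apple"},
    "buyer_3": {"apple"},
}

def recommend(target_item, purchases):
    """Recommend items that co-occur most often with the target item."""
    co_counts = {}
    for basket in purchases.values():
        if target_item in basket:
            for item in basket - {target_item}:
                co_counts[item] = co_counts.get(item, 0) + 1
    # Sort by how often each item was bought together with the target.
    return sorted(co_counts, key=co_counts.get, reverse=True)

print(recommend("apple", purchases))  # ['orange', 'banana']
```

Orange comes out on top because both of the other buyers who bought apples also bought oranges, which is exactly the reasoning described above.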

In general, the item-based collaborative filtering (IBCF) recommender system is used to determine the similarity of relatively static items, since all users will obtain the same output for a specific recommended item. The upside is that IBCF is easy to optimize: item-to-item similarities can be computed once and reused, which makes the calculation process faster.

More technical stuff…

Let’s take a look at how we can implement IBCF in real life using some numbers and calculations.

Firstly, the data set consisting of items and user ratings is transformed into a matrix format, with items as columns and users as rows. Books are used as the example items in this case.

  • BX = Book X, BY = Book Y, BZ = Book Z
  • P1 = User 1, P2 = User 2, P3 = User 3
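
The actual rating values are not listed in the text here, so the numbers below are hypothetical placeholders; the sketch only illustrates the shape of the data, with users P1 to P3 as rows and books BX, BY, and BZ as columns.

```python
import numpy as np

# Hypothetical ratings (placeholders, not the article's actual values),
# with users P1-P3 as rows and books BX, BY, BZ as columns.
ratings = np.array([
    # BX  BY  BZ
    [5,   4,  1],  # P1
    [4,   5,  0],  # P2
    [1,   2,  5],  # P3
])

books = ["BX", "BY", "BZ"]
bx_column = ratings[:, books.index("BX")]  # every user's rating of book X
print(bx_column)  # [5 4 1]
```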

Secondly, after the matrix is formed, a target recommendation is determined. In this case, BX or book X is selected as the target recommendation.

The recommendation system then determines which books are similar to book X.

Thirdly, after the target recommendation is determined, the similarity between the BX column and the BY (book Y) and BZ (book Z) columns is calculated using a predetermined algorithm. I will use cosine similarity to calculate the similarity between the columns.

The cosine similarity algorithm is used in this case because it measures the angle between two rating vectors rather than the distance between them, regardless of the magnitude of the ratings. In other words, cosine similarity focuses on the pattern of ratings given by users rather than how high or low those ratings are.

Cosine Similarity formula
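
For two rating vectors A and B (here, two book columns), the standard cosine similarity is:

$$\text{sim}(A, B) = \cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\;\sqrt{\sum_{i=1}^{n} B_i^2}}$$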

To obtain an effective similarity, the result of the cosine similarity calculation should be as close to 1 as possible. If the result is 1, the two items are considered perfectly similar.
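
As a quick illustration of that step, here is the calculation in Python. The rating columns are the same hypothetical placeholders as above, so the numbers will not match the article's 0.85 and 0.41 exactly, but the mechanics and the resulting ordering are the same.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical rating columns (rows = users P1-P3).
bx = np.array([5, 4, 1])
by = np.array([4, 5, 2])
bz = np.array([1, 0, 5])

print(cosine_similarity(bx, by))  # ~0.97 -> BY's rating pattern is close to BX's
print(cosine_similarity(bx, bz))  # ~0.30 -> BZ's rating pattern diverges from BX's
```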

Lastly, after the cosine similarity algorithm is used to calculate the similarities between the BX column and the BY and BZ columns, the final results are as follows:

It can be seen that book Y has a final calculation result of 0.85, while book Z has a final calculation result of 0.41.

Therefore, it can be concluded that book Y has a higher similarity level to book X because its calculation result is closer to 1.

After making recommendations, it is good practice to validate the output, for example by measuring prediction error with metrics such as RMSE or MAE.
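
As a minimal sketch, assuming you have predicted ratings and the ratings users actually gave as NumPy arrays, the two metrics look like this:

```python
import numpy as np

# Hypothetical actual vs. predicted ratings for a handful of user-item pairs.
actual = np.array([5.0, 3.0, 4.0, 2.0])
predicted = np.array([4.5, 3.5, 4.0, 1.0])

errors = actual - predicted
rmse = np.sqrt(np.mean(errors ** 2))   # penalizes large errors more heavily
mae = np.mean(np.abs(errors))          # average absolute error

print(f"RMSE: {rmse:.3f}, MAE: {mae:.3f}")
```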

Conclusion

  • CF uses user ratings or behavior as the parameters for calculating similarity.
  • IBCF, specifically, focuses on the similarity between items.
  • IBCF is static, so the output for a single item used as a recommendation target will be the same for all users.
  • IBCF is easy to optimize, since item-to-item similarities can be precomputed, and its calculation process is efficient.
  • Cosine similarity can be used on data with a lot of sparsity because it does not consider the magnitude of ratings for an item.
