Discover, Recommend, Engage: The Art & Science of Recommender Systems.

9 min read · Jul 6, 2023

A special type of machine learning algorithm that plays a key role on most online platforms and keeps people glued to their screens, scrolling through the platform’s products, videos and more…

I hope you got it!

Yes! We are gonna explore the topic of “Recommender Systems”.

Before we jump in, what actually is a “Recommender System”? Let us look at some examples from platforms like Amazon, Netflix and YouTube to get motivated about this concept.

YouTube:

Here is a screenshot from YouTube of a video about “PCA (Principal Component Analysis)”, a dimensionality reduction technique. [Randomly searched video]

When we started watching a video on a specific topic, we got recommendations related to the video we were watching.

We can see that the 1st, 2nd, 4th and 5th recommendations are about “PCA” and the other two are on closely related topics. The videos YouTube shows are either related or similar to the video we are watching.

And this is the core idea of “Recommender Systems”.

Mathematically, a recommender system can be described as follows:

=> Suppose there is a user “Ui” who has watched some videos on YouTube, so he has a history of items “Ij”. We need to recommend an item that this user is highly likely to watch, based on his list of previously watched videos.

Users: Ui & Items: Ij

As the name suggests, a “Recommender System” suggests (recommends) items that are relevant to a given user, based on historical data about the user and the items he/she liked or watched.

Now, let us look at some more examples:

Netflix:

Recommendations Based on Prior Watch List.

Similarly, on Netflix we get these recommendations.

One fascinating thing about Netflix is that it explains its recommendations with a statement like “Because you watched <some title>”.

In my case, it recommended some more titles with the line “Because you watched Black Mirror”.

Amazon:

Let’s look at a book to buy: “The Alchemist”.

At the bottom of the product page, we often get some recommendations.

These recommendations are related to the product we searched for or are viewing on the current page. As we spend a fair amount of time viewing the product and scrolling, Amazon recommends products that are relevant to it.

Recommended Products related to the Current Search.

Similarly, at the end of the page we have product recommendations based on our shopping history.

Products Recommended based on shopping history.

Time to dive into the concept and learn how to solve this problem.

Looking at this from a mathematical point of view, we are given a dataset D = {Xi}, which can be represented as a matrix “M”.
Now, let’s explore what this matrix “M” is:

Stars are non-empty cells & “—” are empty cells.

Let’s assume:

n = number of users; m = number of items (these could be anything: movies, products, songs, videos).

Each row in the matrix “M” corresponds to a user “Ui” (i = 1, …, n).
Each column corresponds to an item “Ij” (j = 1, …, m).

=> For user “Ui” and item “Ij”, the cell “Mij” could hold anything, such as the rating given by user “Ui” to item “Ij” on Netflix, YouTube or Amazon.
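A minimal sketch of this representation, assuming NumPy and hypothetical (user, item, rating) triples with 0-based indices: empty cells are stored as NaN rather than being filled with a number.

```python
import numpy as np

def build_rating_matrix(triples, n_users, n_items):
    """Build the n x m user-item matrix M from (user, item, rating) triples.

    Cells with no interaction are left as NaN (empty) instead of being imputed.
    """
    M = np.full((n_users, n_items), np.nan)
    for u, i, r in triples:
        M[u, i] = r
    return M

# Hypothetical toy data: 3 users, 4 items, ratings on a 1-5 scale.
triples = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0), (2, 0, 1.0), (2, 3, 2.0)]
M = build_rating_matrix(triples, n_users=3, n_items=4)
print(M)
```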

Imagine that the user “Ui” has not watched (or interacted with) item “I1”, assuming the items are videos. Then the cell “Mi1” has no value. Such missing values should either be imputed (filled) with values that do not influence the result or, even better, left empty. Imputing an arbitrary value like 0 or 1 can bias the algorithm, because the filled-in value would be treated as a real rating when making predictions.
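To see why imputing 0 is risky, here is a small illustration on one hypothetical user row (NumPy assumed): treating the unwatched items as ratings of 0 drags the user's average far below what he/she actually rated.

```python
import numpy as np

# Hypothetical row of M for one user: NaN marks items the user never rated.
row = np.array([5.0, np.nan, 4.0, np.nan, np.nan])

# Leaving empty cells empty: average only over ratings actually given.
mean_ignoring_empty = np.nanmean(row)                      # (5 + 4) / 2 = 4.5

# Imputing empty cells with 0 treats "not watched" as "hated it".
mean_with_zero_fill = np.nan_to_num(row, nan=0.0).mean()   # 9 / 5 = 1.8

print(mean_ignoring_empty, mean_with_zero_fill)
```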

This is how the data is represented here; it is not a set of “(Xi, Yi)” pairs as in classification and regression.
The idea of the data is as follows: the matrix “M” describes the behavior of each user “Ui” with respect to each item “Ij”.

There are a few properties of this matrix; let’s understand them:

The matrix “M” of ratings is very sparse.

Take an example:

Assume the number of users n = 10 million and the number of items m = 10k.
Now, the size of the matrix is n × m = 10⁷ × 10⁴ = 10¹¹ = 100 billion cells.
The matrix is extremely large.

But most of these cell values will not be available. From the perspective of a user “Ui”, he/she may rate 10–15 movies at most. Even though there are 10k movies, each user rates only a very small fraction of them. Even if we record watched movies instead of ratings, the user might watch only 100–150 movies on average (not counting the binge watchers 😂).

So, for most users, their rows are very sparse vectors; “sparse” here means that only a few values are non-empty.

Example:

Say, on average, a user rates only 5 movies. Then the matrix M will have only 5 × 10M non-empty entries, i.e. only 50 million non-empty cells.

Here comes the metric called “Sparsity of the Matrix”.

Sparsity of the matrix is defined as the number of empty cells divided by the total number of cells.

S = (100B - 50M) / 100B = 1 - (5 × 10⁷ / 10¹¹) = 1 - 0.0005
=> Sparsity of M = 0.9995 ≈ 1. In other words, only 0.05% of the cells are non-empty.
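The same sparsity formula as a short Python sketch (NumPy assumed, NaN marking empty cells); the second part just redoes the arithmetic above from the counts, without building the 100-billion-cell matrix.

```python
import numpy as np

def sparsity(M):
    """Fraction of empty (NaN) cells: (#empty cells) / (#total cells)."""
    return np.isnan(M).sum() / M.size

# Toy matrix: 2 of 6 cells are filled, so sparsity = 4/6.
M = np.array([[5.0, np.nan, np.nan],
              [np.nan, 3.0, np.nan]])
print(sparsity(M))

# The example from the text, computed from counts only.
n_users, n_items, ratings_per_user = 10**7, 10**4, 5
total_cells = n_users * n_items              # 10^11 cells
non_empty = n_users * ratings_per_user       # 5 * 10^7 cells
print((total_cells - non_empty) / total_cells)  # 0.9995
```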

Now we have learned about sparsity, which is extreme in this case. Let’s jump into the next big task: building the recommender system.

Given a user “Ui”, we already know the history of items that this user liked or rated. The task is to recommend a new item that he/she is likely to rate or watch.

This is the task of a recommender system.

Can we pose Recommender Systems as a Classification (Or) Regression Problem?

Regression/Classification Point of View:

We may often think of “Recommender Systems” as a regression or classification problem, because in those we either predict a real value or a class label. Of course, we can pose the problem as either of them if the “Mij” values are integers (e.g. ratings) or booleans (e.g. watched / not watched).

So, let’s look at how to pose it that way:

Recommender Systems From the Classification & Regression Point of View.

Here each data point Xi has some features representing the user “Ui” and others representing the item “Ij”, and the corresponding behavior “Mij” is the target.

Now comes the big issue, and it concerns our dataset: we need to come up with some feature engineering techniques to build this feature representation.

Feature Representation to be done for the Classification & Regression task by feature engineering techniques.

This feature representation is not given to us explicitly; we somehow need to construct it.
One thing to remember is that all we are given is the data matrix “M” and nothing else.

Data D = Matrix M.

Given the matrix “M”, this is how we represent the problem as a classification or regression problem.

The new training dataset, “D Train”, consists only of the non-empty cells:

=> Mij != NULL

The non-empty cells “Mij” form D Train, and the test data is made of the empty cells “Mij”, whose values we need to predict.
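A minimal sketch of this split, assuming M is a NumPy array with NaN in the empty cells; the function name split_train_test is hypothetical.

```python
import numpy as np

def split_train_test(M):
    """Split the cells of M into D Train (observed) and test (empty) cells.

    d_train: (i, j, M[i, j]) for every non-empty cell.
    d_test:  (i, j) for every empty cell whose value we must predict.
    """
    observed = ~np.isnan(M)
    rows, cols = np.nonzero(observed)
    d_train = [(int(i), int(j), float(M[i, j])) for i, j in zip(rows, cols)]
    e_rows, e_cols = np.nonzero(~observed)
    d_test = [(int(i), int(j)) for i, j in zip(e_rows, e_cols)]
    return d_train, d_test

M = np.array([[5.0, np.nan, 3.0],
              [np.nan, 4.0, np.nan]])
d_train, d_test = split_train_test(M)
print(d_train)  # [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0)]
print(d_test)   # [(0, 1), (1, 0), (1, 2)]
```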

The hard part of this framing is how to arrive at a feature representation for “Ui” and “Ij”.

This represents what type of data we have and what the outcome is.

So, how do we represent user “Ui” and item “Ij” as vectors, given that the dataset is only the matrix “M”? Answering this solves the framing part of the classification or regression problem, and this is one way to solve the recommender systems problem.
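The article leaves the feature representation open. As one hypothetical option (an assumption for illustration, not the author's method), each pair (Ui, Ij) could be described by a bias term, the user's mean rating and the item's mean rating, all computed from M alone, and a linear regression fitted on the non-empty cells with NumPy's least squares:

```python
import numpy as np

def cell_features(M, i, j):
    """Hypothetical hand-crafted feature vector for the pair (Ui, Ij).

    Uses only M itself: a bias term, the user's mean rating (row mean)
    and the item's mean rating (column mean), ignoring empty cells.
    """
    return np.array([1.0, np.nanmean(M[i, :]), np.nanmean(M[:, j])])

def fit_and_predict(M):
    """Fit least-squares regression on non-empty cells, predict empty ones."""
    observed = ~np.isnan(M)
    train = list(zip(*np.nonzero(observed)))
    X = np.array([cell_features(M, i, j) for i, j in train])
    y = np.array([M[i, j] for i, j in train])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)   # regression weights
    M_hat = M.copy()
    for i, j in zip(*np.nonzero(~observed)):
        M_hat[i, j] = cell_features(M, i, j) @ w
    return M_hat

# Toy matrix; every row and column has at least one rating.
M = np.array([[5.0, 4.0, np.nan, 1.0],
              [4.0, np.nan, 5.0, 2.0],
              [1.0, 2.0, 1.0, np.nan]])
M_hat = fit_and_predict(M)
print(np.round(M_hat, 2))
```

These features are deliberately crude (for instance, the row and column means include the very cell being trained on), which illustrates why the feature-engineering step is the hard part of this framing.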

Now let us look at recommender systems in a slightly different way: as a “Matrix Completion Problem”.

Note: Matrix completion is a well-studied problem from an applied math standpoint.

Matrix Completion Problem:

Given Matrix, “M”.

Let us now look at what matrix completion is!

In the matrix completion problem, we are given a matrix in which only a few values “Mij” are known and many cells are empty.

Our task is to fill the empty cells of “M” with reasonable, relevant values that are inferred from the non-empty cells.

This is why it is called the matrix completion problem: we complete the matrix by filling in its empty cells based on the values of the non-empty cells.

Assume the matrix “M” with users “Ui” and items “Ij”.

Imagine filling the empty cells based on the history of non-empty cells in both the rows and the columns.
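One simple way to use both the rows and the columns, shown here as a baseline sketch rather than a full matrix completion method, is to predict each empty cell as the global mean rating plus a user offset (from the row) plus an item offset (from the column). NumPy is assumed, and every row and column is assumed to have at least one rating.

```python
import numpy as np

def baseline_complete(M):
    """Fill the empty cells of M using both its rows and its columns.

    prediction(i, j) = mu + b_user[i] + b_item[j], where mu is the mean of
    all observed ratings, b_user[i] is how far user i's average rating is
    from mu, and b_item[j] is how far item j's average rating is from mu.
    Observed cells are kept as they are.
    """
    mu = np.nanmean(M)
    b_user = np.nanmean(M, axis=1) - mu   # row (user) offsets
    b_item = np.nanmean(M, axis=0) - mu   # column (item) offsets
    predicted = mu + b_user[:, None] + b_item[None, :]
    return np.where(np.isnan(M), predicted, M)

M = np.array([[5.0, 4.0, np.nan],
              [4.0, np.nan, 2.0],
              [np.nan, 1.0, 1.0]])
completed = baseline_complete(M)
print(np.round(completed, 2))
```

Such predictions are not clipped, so in general they can fall outside the rating scale.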

Suppose we predict the value 4.7 for one of the empty cells, say the cell for user “Ui” and item “I2”. This value comes from working on the non-empty cells of “M”, so the predicted rating of user “Ui” for item “I2” is 4.7.
Since the predicted value Mi2 = 4.7 is high, I2 can be recommended to Ui.
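A common way to attack matrix completion, not described in this article and used here only as an illustration, is low-rank matrix factorization: learn a short vector for every user and every item so that their dot products reproduce the known cells, then read predictions for the empty cells off the product. The sketch below trains it with plain full-batch gradient descent in NumPy; the rank, learning rate, regularization and epoch count are arbitrary assumptions, and the toy ratings are made up.

```python
import numpy as np

def factorize(M, rank=2, lr=0.01, reg=0.01, epochs=3000, seed=0):
    """Approximate M ~= U @ V.T using only the non-empty cells of M.

    U (n x rank) holds user vectors, V (m x rank) holds item vectors.
    Full-batch gradient descent on squared error over observed cells,
    with L2 regularization on U and V.
    """
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    mask = ~np.isnan(M)
    R = np.nan_to_num(M)                  # zeros in empty cells are masked out below
    for _ in range(epochs):
        E = mask * (U @ V.T - R)          # error only where M is observed
        grad_U = E @ V + reg * U
        grad_V = E.T @ U + reg * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U @ V.T

def recommend(M, M_hat, user):
    """Return the unrated item with the highest predicted rating for `user`."""
    scores = np.where(np.isnan(M[user]), M_hat[user], -np.inf)
    return int(np.argmax(scores))

M = np.array([[5.0, 4.0, np.nan, 1.0, np.nan],
              [4.0, np.nan, 4.0, 1.0, 2.0],
              [1.0, 1.0, np.nan, 5.0, 4.0],
              [np.nan, 2.0, 1.0, 4.0, np.nan]])
M_hat = factorize(M)
user = 0
item = recommend(M, M_hat, user)
print(f"Predicted M[{user}, {item}] = {M_hat[user, item]:.2f}")
```

Mirroring the 4.7 example, the item with the highest predicted value among the user's empty cells is the one that would be recommended.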

This is how we pose the recommender system problem as a matrix completion problem.

This is one way of formulating the problem and planning our attack. Of course, we can also use classification or regression techniques, but they have their own limitations; posing the task as a matrix completion problem is an alternative.

So, this is the big formulation for our recommender systems task!

In the coming articles, let’s discuss the types of recommender systems and how to deal with them.
