Recommendation systems 2

In the previous article, we learned about recommender systems, which give users recommendations based on a range of techniques. We distinguished the two main families of recommendation systems: model-based and memory-based.

In this article, we shall look at collaborative filtering, a type of memory-based recommender system. There are two types of collaborative filtering: item-based and user-based. Below, we discuss in detail how they work, how to implement them in Python, and the techniques used to measure similarity, such as correlation, the alternating least squares method, matrix factorization with SVD, and more.

Memory-based models come in three types: collaborative filtering, content-based filtering, and hybrid methods. Let's have a look at collaborative filtering.

Collaborative filtering

When it comes to developing intelligent recommender systems that can learn to provide better recommendations as more knowledge about users is collected, collaborative filtering is the most commonly used technique.

Collaborative filtering is used by many large websites, including Amazon, YouTube, and Netflix, as part of their sophisticated recommendation systems. This technique creates recommenders that make recommendations to a user based on other users’ likes and dislikes.

It works by sifting through many people to find a smaller group of users with similar tastes to a specific person. It analyzes their favorite products and compiles a ranked list of recommendations. There are various forms of collaborative filtering strategies discussed below.

[Figure: Types of memory/similarity-based models]

Item-Based collaborative filtering

[Figure: How item-based collaborative filtering works]

Item-based collaborative filtering first builds a model of item-to-item similarities and then uses it to make predictions. The algorithm calculates the similarities between different items in the dataset using one of several similarity measures. It then uses these similarity values to predict ratings for user-item pairs that aren’t in the dataset.

Calculate the similarity among the items:

The similarity between two items is calculated from the ratings given by users who have rated both of them.

Many different mathematical formulations can be used to measure the similarity of items. Each formula includes terms summed over the set of common users U, that is, the users who have rated both items, as shown in the formulas below.

Cosine-based similarity

This formulation, also known as vector-based similarity, treats two items and their ratings as vectors and defines similarity as the cosine of the angle between them:

[Figure: Cosine-based similarity formula]
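In its standard form, cosine similarity between items i and j can be written as follows. The notation is an assumption made here for illustration: R_{u,i} is the rating user u gave to item i, and U is the set of users who rated both items.

```latex
\[
\mathrm{sim}(i, j) = \cos(\vec{i}, \vec{j})
= \frac{\sum_{u \in U} R_{u,i}\, R_{u,j}}
       {\sqrt{\sum_{u \in U} R_{u,i}^{2}}\;\sqrt{\sum_{u \in U} R_{u,j}^{2}}}
\]
```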

Pearson (correlation)-based similarity

This similarity measures how strongly the ratings given by common users to a pair of items deviate together from each item's average rating:

[Figure: Pearson correlation formula]
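Written out, the Pearson correlation between items i and j usually takes the form below. The symbols are assumptions chosen for this sketch: R_{u,i} is the rating of user u for item i, \bar{R}_i is the average rating of item i, and U is the set of users who rated both items.

```latex
\[
\mathrm{sim}(i, j) =
\frac{\sum_{u \in U} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}
     {\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_i)^{2}}\;
      \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_j)^{2}}}
\]
```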

Adjusted cosine similarity

Adjusted cosine similarity is a modified version of vector-based similarity. It accounts for the fact that different users use the rating scale differently; in other words, some users tend to score items highly in general, while others tend to rate items lower. To overcome this limitation of vector-based similarity, we subtract each user’s average rating from their ratings for the pair of items in question.
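The three measures above can be compared side by side in a minimal NumPy sketch. The rating matrix is made up, the function names are hypothetical, and the choice to compute item means over co-rating users only is an assumption of this example rather than something the article specifies.

```python
import numpy as np

# Toy user-item rating matrix (rows = users, columns = items); 0 means "not rated".
# The values are invented purely for illustration.
ratings = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 2.0],
    [1.0, 2.0, 5.0],
    [2.0, 0.0, 4.0],
])

def _co_rated(r, i, j):
    """Boolean mask of users who rated both item i and item j."""
    return (r[:, i] > 0) & (r[:, j] > 0)

def cosine_sim(r, i, j):
    """Cosine of the angle between the two items' rating vectors."""
    m = _co_rated(r, i, j)
    a, b = r[m, i], r[m, j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson_sim(r, i, j):
    """Pearson correlation: centre each item on its own mean rating."""
    m = _co_rated(r, i, j)
    a = r[m, i] - r[m, i].mean()
    b = r[m, j] - r[m, j].mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def adjusted_cosine_sim(r, i, j):
    """Adjusted cosine: centre each rating on the rating user's mean."""
    m = _co_rated(r, i, j)
    rated = r > 0
    user_mean = r.sum(axis=1) / rated.sum(axis=1)  # mean over rated items only
    a = r[m, i] - user_mean[m]
    b = r[m, j] - user_mean[m]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_sim(ratings, 0, 1))
print(pearson_sim(ratings, 0, 2))
print(adjusted_cosine_sim(ratings, 0, 2))
```

Items 0 and 1 are rated similarly by the same users, while items 0 and 2 are rated in opposite directions, so the first score is high and the other two are negative.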

From model to predictions

Once we have built a model using one of the similarity measures above, we can predict the rating for any user-item pair with a weighted sum. We start by collecting the items most similar to our target item, then keep those that the active user has rated. Each of the user's ratings is weighted by the similarity between that item and the target item. Finally, to keep the predicted rating within the rating range, we scale the prediction by the sum of the similarities:

[Figure: Weighted-sum prediction formula]
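The weighted-sum step can be sketched in a few lines of Python. This is an illustration under assumptions: the function name `predict_rating` and the example numbers are hypothetical, and a rating of 0 is taken to mean "not rated".

```python
import numpy as np

def predict_rating(user_ratings, sims_to_target):
    """Weighted-sum prediction for one user and one target item.

    user_ratings   : ratings the active user gave to the neighbouring items
                     (0 = not rated)
    sims_to_target : similarity of each neighbouring item to the target item
    """
    user_ratings = np.asarray(user_ratings, dtype=float)
    sims_to_target = np.asarray(sims_to_target, dtype=float)
    rated = user_ratings > 0                      # keep only items the user rated
    denom = np.abs(sims_to_target[rated]).sum()   # scale by the sum of similarities
    if denom == 0:
        return float("nan")                       # no usable neighbours
    return float(sims_to_target[rated] @ user_ratings[rated] / denom)

# Made-up example: the user rated two of the three neighbouring items.
print(predict_rating([4.0, 0.0, 2.0], [0.9, 0.8, 0.3]))  # approximately 3.5
```

The unrated neighbour is skipped entirely, so its similarity contributes to neither the numerator nor the denominator.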

Alternating least squares method

A dataset that records an explicit rating, count, or category for a particular item or event is known as an explicit dataset. A rating of 4 out of 5 for a film is an explicit data point. In contrast, with an implicit dataset we need to interpret users’ interactions and events before we can determine a rating or category. Consider a person who only watches one genre of film: that behaviour expresses a preference even though no rating was ever given. Data of this kind is called an implicit dataset. We would miss out on a lot of hidden insights if we ignored implicit datasets.

An implicit dataset consists solely of interactions between users and items.

Alternating least squares (ALS) is a matrix factorization algorithm. As seen in the diagram below, a matrix is factorized into two smaller matrices. The original matrix holds the interactions between users and items. The factorized matrices hold the latent user and item features.

[Figure: Matrix factorization diagram]

Each value in the interaction matrix represents events, and each carries a preference and a confidence. Take an e-commerce dataset, for example, with three events: view, add-to-cart, and transact. A positive preference is assumed whenever there is an interaction between the user and the item.

[Figure: Preference calculation formula]

Confidence is the weight, or strength, of the interaction. When a user purchases item X (a transaction event), we increase the interaction weight, while a user viewing item Z is weighted less than a purchase interaction.

[Figure: Confidence calculation formula]

Confidence: r is the interaction between user u and item i. More interaction means more confidence, scaled by the value of α. An item bought five times has more confidence than an item bought twice. We add 1 so that the confidence is nonzero even when r is 0 for an interaction. Typically, the paper recommends a value of 40 for α.
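As a small sketch of the two definitions above, assume the usual formulation in which the preference is 1 when r is positive and 0 otherwise, and the confidence is 1 + α · r. The interaction counts below are invented and the variable names are hypothetical.

```python
import numpy as np

ALPHA = 40.0  # value suggested in the text; treat it as a tunable hyperparameter

# Made-up interaction counts r_ui (rows = users, columns = items).
r = np.array([
    [5, 0, 2],
    [0, 1, 0],
])

preference = (r > 0).astype(float)   # p_ui = 1 if r_ui > 0, else 0
confidence = 1.0 + ALPHA * r         # c_ui = 1 + alpha * r_ui

print(preference)
print(confidence)
```

The item interacted with five times gets a confidence of 201, the one interacted with twice gets 81, and pairs with no interaction keep the minimum confidence of 1.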

The paper describes the following cost function for learning the user-factor and item-factor matrices:

[Figure: Cost function of the alternating least squares method]

Here, λ regularizes the model; its value is chosen using cross-validation.
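For reference, the cost function is commonly written as below. This assumes the paper in question is the implicit-feedback formulation by Hu, Koren, and Volinsky; x_u and y_i denote the user and item factor vectors, p_{ui} the preference, and c_{ui} the confidence.

```latex
\[
\min_{x_*,\, y_*} \; \sum_{u,i} c_{ui} \left( p_{ui} - x_u^{\top} y_i \right)^{2}
\;+\; \lambda \left( \sum_{u} \lVert x_u \rVert^{2} + \sum_{i} \lVert y_i \rVert^{2} \right)
\]
```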

The Essence of Alternating Least Squares

The cost function includes m · n terms, where m is the number of users and n is the number of items. For typical datasets, m · n can easily reach a few billion. At that scale, direct optimization methods such as stochastic gradient descent become impractical. The paper therefore introduces an alternative optimization technique.

Note that when either the user-factors or the item-factors are held fixed, the cost function becomes quadratic, so its global minimum can be computed directly. This leads to an alternating least squares optimization process, where we alternate between recomputing the user-factors and the item-factors, and each step is guaranteed to lower the value of the cost function.

The user vector (x) and the item vector (y) are obtained by differentiating the above cost function with respect to x and y and setting the result to zero.

[Figure: Cost function of X and Y; user and item vectors]

So, to find the preference score for a user-item pair, we use the following:

[Figure: Preference score formula]

The items with the highest predicted preference score are the ones recommended to the user.
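To make the alternating updates concrete, here is a deliberately tiny, dense NumPy sketch. It assumes the usual closed-form solutions x_u = (YᵀCᵘY + λI)⁻¹ YᵀCᵘp(u) and y_i = (XᵀCⁱX + λI)⁻¹ XᵀCⁱp(i). The function name `als_implicit`, the default hyperparameters, and the toy data are all hypothetical, and the dense loops would not scale to real datasets.

```python
import numpy as np

def als_implicit(r, factors=2, alpha=40.0, lam=0.1, iterations=10, seed=0):
    """Tiny dense ALS for implicit feedback (illustrative only, not scalable)."""
    rng = np.random.default_rng(seed)
    m, n = r.shape
    p = (r > 0).astype(float)                      # preferences p_ui
    c = 1.0 + alpha * r                            # confidences c_ui
    x = rng.normal(scale=0.1, size=(m, factors))   # user factors
    y = rng.normal(scale=0.1, size=(n, factors))   # item factors
    reg = lam * np.eye(factors)

    def cost():
        err = (c * (p - x @ y.T) ** 2).sum()
        return float(err + lam * ((x ** 2).sum() + (y ** 2).sum()))

    history = [cost()]
    for _ in range(iterations):
        # Hold item factors fixed and solve the quadratic problem for each user.
        for u in range(m):
            cu = np.diag(c[u])
            x[u] = np.linalg.solve(y.T @ cu @ y + reg, y.T @ cu @ p[u])
        # Hold user factors fixed and solve for each item.
        for i in range(n):
            ci = np.diag(c[:, i])
            y[i] = np.linalg.solve(x.T @ ci @ x + reg, x.T @ ci @ p[:, i])
        history.append(cost())
    return x, y, history

# Made-up interaction counts (rows = users, columns = items).
r = np.array([[3, 0, 1, 0],
              [0, 2, 0, 1],
              [2, 0, 0, 0]], dtype=float)
x, y, history = als_implicit(r)

scores = x @ y.T                 # predicted preference for every user-item pair
scores[r > 0] = -np.inf          # do not recommend items already interacted with
best_item_per_user = scores.argmax(axis=1)
print(best_item_per_user)
```

Because each half-step solves its subproblem exactly, the recorded `history` of cost values should never increase, which is a handy sanity check when experimenting.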


Implementation of alternating least squares method

Import Libraries

Data Preprocessing

Creating Interaction Matrices

As the data is sparse, we create a sparse item-user matrix as the input to the model. A user-item matrix is then used when making the recommendations.
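A minimal sketch of this step with SciPy is shown below. The event log, the ID arrays, and the event weights (for example view = 1, add-to-cart = 3, transact = 5) are hypothetical; real data would first need user and item IDs mapped to consecutive integer indices.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical event log: one entry per (user, item, weight) interaction.
user_ids = np.array([0, 0, 1, 2, 2])
item_ids = np.array([0, 2, 1, 0, 3])
weights = np.array([1.0, 3.0, 1.0, 5.0, 3.0])   # assumed event weights

n_users = int(user_ids.max()) + 1
n_items = int(item_ids.max()) + 1

# Rows = users, columns = items; only the observed interactions are stored.
user_item = csr_matrix((weights, (user_ids, item_ids)), shape=(n_users, n_items))

# The transposed layout (rows = items, columns = users).
item_user = user_item.T.tocsr()

print(user_item.shape, item_user.shape, user_item.nnz)
```

Which of the two layouts a given ALS library expects for fitting depends on the library and its version, so it is worth checking its documentation before passing either matrix in.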

ALS Model

Using the Model

Getting the recommendations using the inbuilt library function

We can also use the following function to have a list of similar items

Implementation of item-based collaborative filtering

For this example, we use a movie dataset to make recommendations with item-based collaborative filtering.


user_id | object_id | voting | timevoted | Film title
178 | 210 | 4 | 781240949 | Terminator (1996)
45 | 210 | 4 | 775787190 | Terminator (1996)
206 | 210 | 4 | 783988671 | Terminator (1996)
134 | 210 | 4 | 779238235 | Terminator (1996)
150 | 210 | 4 | 776403793 | Terminator (1996)


Using scikit-learn, we can run the SVD efficiently.

We create a matrix of 1564 rows (as many as there are distinct movies) and twelve columns, which are the latent variables.


We can use various similarity measures, such as Pearson correlation or cosine similarity. We are going to work with the Pearson correlation here. Let’s create a correlation matrix:
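The decomposition and correlation steps can be sketched as follows. To keep the example dependency-free, it uses `numpy.linalg.svd` as a stand-in for scikit-learn's truncated SVD, and it runs on a small random matrix with placeholder titles rather than the article's movie data. The helper `similar_movies` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_movies, n_users, n_latent = 6, 20, 3   # toy sizes; the article uses 12 latent variables

# Made-up movie x user rating matrix (0 = not rated).
movie_user = rng.integers(0, 6, size=(n_movies, n_users)).astype(float)
titles = [f"Movie {k}" for k in range(n_movies)]   # placeholder titles

# Truncated SVD: keep only the first n_latent singular directions.
u, s, _ = np.linalg.svd(movie_user, full_matrices=False)
latent = u[:, :n_latent] * s[:n_latent]            # shape (n_movies, n_latent)

# Pearson correlation between every pair of movies in the latent space.
corr = np.corrcoef(latent)                          # shape (n_movies, n_movies)

def similar_movies(title, top_n=3):
    """Return the top_n movies most correlated with `title`."""
    idx = titles.index(title)
    order = np.argsort(-corr[idx])                  # most correlated first
    return [(titles[k], float(corr[idx, k])) for k in order if k != idx][:top_n]

print(similar_movies("Movie 0"))
```

Looking up a film then amounts to reading one row of the correlation matrix and sorting it, which is why the matrix is computed once up front.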

Find Similar Movies

Let’s search for films similar to Star Wars (1977).

Similar Movies to Star Wars (1977)

Correlation | Films
0.975497 | Avengers (2009)
1.988090 | The hulk (1990)
0.947979 | Spiderman (2002)
0.974499 | Iron man (2010)
0.799799 | Justice League (2005)

User-based collaborative filtering

User-based collaborative filtering is a method of predicting which items a user would enjoy based on the ratings given to those items by other users who have similar tastes to the target user.

[Figure: How user-based collaborative filtering works]

Steps for User-Based Collaborative Filtering:

Step 1: Find the similarity of each user to the target user U.

The similarity between any two users, A and B, can be calculated using the formula below.

[Figure: Formula to find similarity]

Step 2: Prediction of an item’s missing rating

The target user may be very similar to some users and not very similar to others. Therefore, the ratings given to a particular item by more similar users should carry more weight than those given by less similar users. This problem is solved with a weighted average approach: you multiply each user’s rating by the similarity factor calculated using the formula mentioned above.

The missing rating can be calculated as:

[Figure: Formula to find the missing rating]
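Both steps can be sketched together in Python. This example makes several assumptions that the article does not confirm: user similarity is taken to be the Pearson correlation over co-rated items, users with non-positive similarity are ignored, and the ratings and function names are invented for illustration.

```python
import numpy as np

# Made-up ratings (rows = users, columns = items); 0 means "not rated".
ratings = np.array([
    [5.0, 3.0, 4.0, 0.0],   # target user: the rating for item 3 is missing
    [4.0, 2.0, 5.0, 4.0],
    [1.0, 5.0, 2.0, 1.0],
    [5.0, 4.0, 4.0, 5.0],
])

def user_similarity(r, a, b):
    """Step 1: Pearson correlation of users a and b over items both have rated."""
    both = (r[a] > 0) & (r[b] > 0)
    if both.sum() < 2:
        return 0.0
    x = r[a, both] - r[a, both].mean()
    y = r[b, both] - r[b, both].mean()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def predict_missing(r, user, item):
    """Step 2: similarity-weighted average of other users' ratings for `item`."""
    num, den = 0.0, 0.0
    for other in range(r.shape[0]):
        if other == user or r[other, item] == 0:
            continue
        s = user_similarity(r, user, other)
        if s <= 0:                      # assumption: skip dissimilar users
            continue
        num += s * r[other, item]
        den += s
    return num / den if den > 0 else float("nan")

print(predict_missing(ratings, user=0, item=3))
```

In this toy data, the third user rates in the opposite direction to the target user and is excluded, so the prediction is pulled between the ratings of the two similar users, with the more similar one counting for more.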




In this article, we have looked at how collaborative filtering can recommend products to a user, based on how similar other products are to a given product and on what the user likes according to their ratings.

In the next article, we shall look at how we can use additional information such as content and context to build more robust recommender systems. We shall also look at recommender systems that use both content and collaborative features.


Next topic: Recommender systems: context-based & hybrid recommender systems

