Building a Recommender system from scratch — Customers who bought this item also bought (2)

An approach to building a recommender solution based on “customers who bought this item also bought” for any e-commerce business

Rhydham Gupta
7 min read · May 26, 2023

As a data scientist in a top e-commerce company, you’re tasked with improving the recommendation engine. To help you design an efficient solution, your manager arranges a 1-hour call with the warehouse manager so that you can get insights from real experience. The warehouse manager shares an interesting fact: when there’s high demand for competitive exam books, there’s also a surge in demand for earphones. You quickly realize that this insight aligns with collaborative filtering.

Here’s a challenge: without knowing collaborative filtering, can you design a recommendation engine based on the warehouse manager’s insights?

Let’s design the solution using item-item collaborative filtering. We have two options for implementing it:

  1. Using cosine similarity
  2. Using Matrix factorization

I personally find approach 1 easier to explain, especially to business people, but approach 2 is also very powerful because it captures the impact of latent variables. (Don’t worry, we will cover the concept of latent variables later in this article.)

Approach 1: Using Cosine Similarity

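Let’s look at a very simple example. Below is the user-item matrix. If the customer has purchased the item, the value is 1; otherwise it is 0.

Items (columns, in order): Fitness Tracker, Earphones, Smartwatch, Yoga Mat, Running Shoes
User 1: 1, 1, 0, 1, 0
User 2: 0, 1, 1, 1, 1
User 3: 1, 0, 0, 1, 1
User 4: 1, 0, 1, 1, 0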

Now, which item is most related to the Fitness Tracker? In other words, if a customer has purchased the Fitness Tracker, which of the other 4 products is that customer most likely to buy, given the above matrix? What would be your answer, and why?

At first glance, the Yoga Mat seems most related to the Fitness Tracker. Can you guess why? Because User1, User3, and User4 have purchased the Fitness Tracker, and the same customers have bought the Yoga Mat as well. Let’s check whether our hypothesis is correct.
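As a quick check by hand, here is a small worked calculation. It assumes each item is represented as a vector with one 0/1 entry per user, so the Fitness Tracker is (1, 0, 1, 1) and the Yoga Mat is (1, 1, 1, 1). The symbols a and b are just placeholders for two such item vectors.

```latex
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}

\cos(\text{Fitness Tracker}, \text{Yoga Mat}) = \frac{3}{\sqrt{3}\,\sqrt{4}} \approx 0.87

\cos(\text{Fitness Tracker}, \text{Earphones}) = \frac{1}{\sqrt{3}\,\sqrt{2}} \approx 0.41
```

The numerator counts the users who bought both items, and the denominator grows with how many users bought each item overall.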

We will implement it in Python to find the similarity between the different items.

Step 1: We will first create the above table as a data frame.

import pandas as pd

# Create the purchase history matrix (one column per user, one row per item)
data = {
    'User 1': [1, 1, 0, 1, 0],
    'User 2': [0, 1, 1, 1, 1],
    'User 3': [1, 0, 0, 1, 1],
    'User 4': [1, 0, 1, 1, 0]
}
df = pd.DataFrame(data, index=['Fitness Tracker', 'Earphones', 'Smartwatch', 'Yoga Mat', 'Running Shoes'])

Step 2: Next, let’s compute the cosine similarity:

from sklearn.metrics.pairwise import cosine_similarity

# Compute the item-item similarity matrix (cosine_similarity compares the rows of df, i.e. the items)
similarity_matrix = cosine_similarity(df)

Below is the output:

As we guessed, the Yoga Mat is the most similar item to the Fitness Tracker out of all the given products. Interestingly, all the other items have the same similarity score, and of course the similarity of the Fitness Tracker with itself is 1.
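Since cosine_similarity returns a plain NumPy array without item names, one way to read the scores is to wrap the result in a labeled DataFrame. The snippet below is a minimal sketch that repeats the setup from the steps above; the names similarity_df and fitness_scores are illustrative and not part of the original code.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Same purchase data as above: rows are items, columns are users
items = ['Fitness Tracker', 'Earphones', 'Smartwatch', 'Yoga Mat', 'Running Shoes']
df = pd.DataFrame({
    'User 1': [1, 1, 0, 1, 0],
    'User 2': [0, 1, 1, 1, 1],
    'User 3': [1, 0, 0, 1, 1],
    'User 4': [1, 0, 1, 1, 0],
}, index=items)

# Re-attach the item names to the rows and columns of the similarity matrix
similarity_df = pd.DataFrame(cosine_similarity(df), index=items, columns=items)

# Scores of every other item against the Fitness Tracker, highest first
fitness_scores = (
    similarity_df['Fitness Tracker']
    .drop('Fitness Tracker')
    .sort_values(ascending=False)
)
print(fitness_scores.round(2))
```

With this data, the Yoga Mat scores about 0.87 and the other three items about 0.41 each.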

In practical scenarios, this matrix will be much larger and sparser, and each item will show high similarity to only a few other items.

Food for thought: Instead of computing the cosine similarity score, we could simply have counted, for each pair of items, how many customers bought both, and assigned the highest similarity to the pair with the highest count. Can you figure out the benefit of using cosine similarity over this simple approach?

Let’s take the example of the Fitness Tracker and the Yoga Mat, and also assume a third item, a Water Bottle, that was purchased by customers 1, 3, and 4. All the customers who bought the Fitness Tracker also bought both the Yoga Mat and the Water Bottle. So which one is better to recommend to a user buying a Fitness Tracker? You might say the Yoga Mat seems the more promising option, as more users have bought it overall. However, the cosine similarity score of the Yoga Mat is 0.87, whereas the Water Bottle would score 1. Why is that? Because cosine similarity penalizes mismatches as well. Customer 2 bought a Yoga Mat but didn’t buy a Fitness Tracker, so the Yoga Mat’s score goes down, whereas this is not the case with the Water Bottle.
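The comparison above can be sketched in a few lines of NumPy. This is only an illustration: the Water Bottle is the hypothetical item from the paragraph above, and the cosine helper is written here for the example rather than taken from a library.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D purchase vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# One entry per user (User 1 to User 4); 1 = purchased, 0 = not purchased
fitness_tracker = np.array([1, 0, 1, 1])
yoga_mat = np.array([1, 1, 1, 1])
water_bottle = np.array([1, 0, 1, 1])  # hypothetical item, bought by customers 1, 3, 4

for name, vec in [('Yoga Mat', yoga_mat), ('Water Bottle', water_bottle)]:
    # The dot product of two 0/1 vectors counts the users who bought both items
    co_purchases = int(np.dot(fitness_tracker, vec))
    score = cosine(fitness_tracker, vec)
    print(f"{name}: co-purchases={co_purchases}, cosine={score:.2f}")
```

Both items tie on the raw co-purchase count (3 each), so the count alone cannot separate them, while cosine similarity ranks the Water Bottle higher.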

Approach 2: Using Matrix Factorization

While trying to understand matrix factorization, I used to get lost in the mathematics and, after getting frustrated, give up. But then I found that starting with the implementation and thinking about the practical applications makes it more intuitive. That’s exactly what we will do in this article. We will continue with the same example that we used for cosine similarity.

Step 1: Let’s load this table into a NumPy array:

import numpy as np
import pandas as pd

# Example data
products = ['Fitness Tracker', 'Earphones', 'Smartwatch', 'Yoga Mat', 'Running Shoes']
users = ['User1', 'User2', 'User3', 'User4']

# User-item purchase matrix (rows = users, columns = products)
purchases = np.array([[1, 1, 0, 1, 0],
                      [0, 1, 1, 1, 1],
                      [1, 0, 0, 1, 1],
                      [1, 0, 1, 1, 0]])

Step 2: Next, we will use the singular value decomposition method to break the user-item purchase matrix into the user-latent variables matrix and item-latent variables matrix.

What are latent variables?

These are the factors/features that we do not define explicitly but that are inferred automatically from the customers’ purchasing patterns. Note that a latent feature doesn’t necessarily correspond to one clear driver; it can be a mix of multiple factors. But for the sake of understanding, we can try to give it some meaning. Let’s take our current example:

products = ['Fitness Tracker', 'Earphones', 'Smartwatch', 'Yoga Mat', 'Running Shoes']

When we say that, based on customer purchase patterns, the Fitness Tracker and the Yoga Mat are highly related, it could be due to multiple factors: both products may fall into the premium segment, both may be compact and durable, or both may have very high ratings. Matrix factorization is basically trying to decipher the underlying drivers of similarity that are not explicitly defined. Let’s see it in action.

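# Compute matrix factorization
k = 2  # Number of latent factors

# Note: np.linalg.svd returns the right singular vectors already transposed,
# so V has one row per latent factor and one column per product
U, S, V = np.linalg.svd(purchases)

# Keep only the top-k latent factors
U = U[:, :k]            # users x k
S = np.diag(S)[:k, :k]  # k x k diagonal matrix of singular values
V = V[:k, :]            # k x products

# Compute predicted purchases (rank-k approximation of the purchase matrix)
predicted_purchases = np.dot(np.dot(U, S), V)

# Convert to DataFrame
df_predicted_purchases = pd.DataFrame(predicted_purchases, index=users, columns=products)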

Below are our decomposed matrices:
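One way to inspect the decomposed matrices by name is to wrap them in labeled DataFrames. The snippet below is a minimal sketch that repeats the setup from the steps above; user_factors, item_factors, and the 'Factor 1' / 'Factor 2' column labels are illustrative names, not part of the original code. Keep in mind that the sign of each singular vector is arbitrary, so the printed values may appear with flipped signs on another machine.

```python
import numpy as np
import pandas as pd

products = ['Fitness Tracker', 'Earphones', 'Smartwatch', 'Yoga Mat', 'Running Shoes']
users = ['User1', 'User2', 'User3', 'User4']

# User-item purchase matrix (rows = users, columns = products)
purchases = np.array([[1, 1, 0, 1, 0],
                      [0, 1, 1, 1, 1],
                      [1, 0, 0, 1, 1],
                      [1, 0, 1, 1, 0]])

k = 2  # number of latent factors to keep
U, S, V = np.linalg.svd(purchases)  # V is returned already transposed

factor_names = ['Factor 1', 'Factor 2']
# User-latent matrix: one row per user, one column per latent factor
user_factors = pd.DataFrame(U[:, :k], index=users, columns=factor_names)
# Item-latent matrix: transpose so that each row is one product
item_factors = pd.DataFrame(V[:k, :].T, index=products, columns=factor_names)

print(user_factors.round(2))
print(item_factors.round(2))
print("Top singular values:", S[:k].round(2))
```

Each row of item_factors is the latent representation of one product, which is what the next step compares across products.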

Step 3: Now, we can simply take the dot product of the Fitness Tracker’s latent variables with the other items’ latent variables, and the item with the highest score will be recommended to a user who has purchased the Fitness Tracker. Let’s compute the most similar product for the Fitness Tracker:

# Select a target item
target_item = 'Fitness Tracker'

# Get similar items based on item latent factors
# (V has shape k x products, so column item_index holds the target item's latent factors)
item_index = products.index(target_item)
item_latent_factors = V[:, item_index]
similar_items = pd.DataFrame({'Product': products, 'Similarity': np.dot(V.T, item_latent_factors)})

# Exclude the target item itself
similar_items = similar_items[similar_items['Product'] != target_item]

# Sort by similarity in descending order
similar_items = similar_items.sort_values('Similarity', ascending=False)

print(f"Recommended items for {target_item}:")
print(similar_items)

Below is the output:

Recommended items for Fitness Tracker:
         Product  Similarity
3       Yoga Mat       0.375
1      Earphones      -0.125
2     Smartwatch      -0.125
4  Running Shoes      -0.125

Above, you can see the ranking of similar items for the Fitness Tracker. If you remember the results from the cosine similarity method, the Yoga Mat was most similar to the Fitness Tracker and all the remaining items had the same score. We observe the same pattern with this method as well. Great :)

You might be wondering when to use cosine similarity vs matrix factorization.

The cosine similarity approach is simple and easy to explain to business people, but it cannot capture the complex relationships that exist within the data. Matrix factorization performs well at identifying these relationships, but it is much harder to explain to business people. Moreover, it needs more data to give accurate results.

Now you know how to design a recommendation engine based on “Customers who bought this item also bought …”

This is the second article in our series about building the different components of a recommendation engine for an e-commerce company. You can read the introductory article here, which lists all the components that we will cover in this series. You can also take a shot at designing your own approach for each component.
