Django

Question

I have a list of products with tags & categories, like this

class Product(models.Model):
    tags = TaggableManager() #using django-taggit
    categories = models.ManyToManyField(Category)

I am looking for a way to effectively implement a method such as

p = Product.objects.get(...)
p.similar_products() # -> should return a list sorted by similarity

How is similarity computed: the similarity score between two products should be the number of tags & categories they have in common.

The challenge is that this method needs to be computed hundreds of times per second, so it's important to do it efficiently.

I might speed it up with caching but the question remains - is there a django-native way to calculate and score similar products based on tags and categories? (I am aware of django-recommends but it seems to use users and ratings)

Thanks :)

miraculixx · Accepted Answer

Disclaimer: The following is a start on how I would approach the problem. Provided as is, not fit for purpose and no warranty included.

is there a django-native way to calculate and score similar products based on tags and categories?

The short answer is no -- Django is a web application framework, not a recommender system.

I am looking for a way to effectively implement a method (...)

Please realise that this is a non-trivial task at its core. There are two parts that you need to solve:

1. Calculating the similarity between products
2. Retrieving the set of similar products given a product, possibly ranked by similarity

Once 1. is done, 2. becomes trivial. There are many ways in which to calculate similarity, and you may want to vary the method over time as you gain experience.

Hence, I would start with 2. and then work backwards to solve 1. This will give you a method to store and retrieve similarities that is not bound to any particular method to calculate similarity.

Retrieval of similar products

One way to solve this natively in Django is a ManyToMany Relationship:

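from django.db import models
from taggit.managers import TaggableManager

class Product(models.Model):
    tags = TaggableManager()  # using django-taggit
    categories = models.ManyToManyField(Category)
    # 'self' is required here: Product is not yet defined inside its own class body
    similars = models.ManyToManyField('self', blank=True)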

Note the key idea here is to store, for each product, the list of primary keys of all similar products. Then the similar_products method is simply:

def similar_products(self):
    return self.similars.all()

The challenge is that this method needs to be computed hundreds of times per second

Depending on the size of the product catalog and the list of categories, this approach may not scale well. There are more efficient implementations of the same concept though: for example, you could cache or store the list of similar products' keys outside of the database, using an in-memory store like Redis.
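To illustrate the idea of keeping the similar products' keys outside the database, here is a minimal sketch. The class name SimilarityStore and its methods are hypothetical, and a plain dict stands in for an external store such as Redis, so this only shows the shape of the lookup, not a production setup.

```python
class SimilarityStore:
    """Keeps precomputed lists of similar product keys, keyed by product key.

    Assumption: a plain dict stands in for an external in-memory store.
    """

    def __init__(self):
        self._data = {}

    def set_similars(self, product_pk, similar_pks):
        # Called by the offline job; the list is assumed to be ranked already.
        self._data[product_pk] = list(similar_pks)

    def get_similars(self, product_pk):
        # A product with no stored entry simply has no known similar products.
        return list(self._data.get(product_pk, []))


store = SimilarityStore()
store.set_similars(1, [7, 3, 9])
top = store.get_similars(1)
missing = store.get_similars(42)
```

With this split, a request only performs a key lookup; all the expensive work happens when the lists are written.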

Calculating similarity

Calculating similarity is a computationally complex task. Essentially you want to compare each product with all the others, which by nature is in O(n^2). There has been quite a bit of research on the topic.
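As a rough sketch of what the all-pairs comparison looks like, the following uses the question's definition of the score (the number of shared tags and categories). The function name pairwise_scores and the sample data are made up for illustration, and each product is assumed to be reduced to a set of label strings beforehand.

```python
from itertools import combinations


def pairwise_scores(labels_by_product):
    """Return {(a, b): shared_count} for every unordered pair of products."""
    scores = {}
    # combinations() yields n * (n - 1) / 2 pairs, hence the quadratic cost.
    for a, b in combinations(sorted(labels_by_product), 2):
        scores[(a, b)] = len(labels_by_product[a] & labels_by_product[b])
    return scores


# Hypothetical sample data: product id -> set of tag and category labels
catalog = {
    1: {"tag:red", "tag:cotton", "cat:shirts"},
    2: {"tag:red", "cat:shirts"},
    3: {"tag:blue", "cat:shoes"},
}
scores = pairwise_scores(catalog)
```

For three products this produces three pairs; for n products the number of pairs grows as n * (n - 1) / 2.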

the similarity score between two products should be the number of tags & categories they have in common

One naive approach is as follows.

For each product,

1. Retrieve the list of categories, ordered by the category's primary key.
2. Build a matrix of products x categories, where each row represents the categories of one product, and each column represents the category (column 1 represents category 1, column 2 represents category 2, etc.). In the matrix, each column is a category variable (0,1) which is 1 if the product is in the respective category, else 0.
3. For each product, calculate the category_score, which is the binary representation of the category indicators (essentially a bit string).
4. Build a product x product matrix that for each product calculates the similarity as a distance to all the other products, e.g. similarity = abs(product1.category_score - product2.category_score).
5. Given some cut-off maximum distance, for each product retrieve all the other products that are within this maximum distance, and fill the Product.similars relation in the Django model.
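The steps above can be sketched in plain Python roughly as follows. This is only an illustration under assumptions: products are given as a dict of category id sets, the function names are hypothetical, and the result is a dict of keys rather than an actual write to Product.similars.

```python
def category_score(category_ids, all_category_ids):
    """Encode a product's categories as an integer bit string."""
    score = 0
    for position, category_id in enumerate(sorted(all_category_ids)):
        if category_id in category_ids:
            score |= 1 << position
    return score


def similar_within(products, all_category_ids, max_distance):
    """Map each product to the products whose score is within max_distance."""
    scores = {pk: category_score(cats, all_category_ids)
              for pk, cats in products.items()}
    return {
        pk: sorted(other for other, other_score in scores.items()
                   if other != pk and abs(score - other_score) <= max_distance)
        for pk, score in scores.items()
    }


def shared_categories(score_a, score_b):
    """Count categories present in both bit strings (set bits of the AND)."""
    return bin(score_a & score_b).count("1")


# Hypothetical sample data: product id -> set of category ids
products = {1: {1, 2}, 2: {1, 2, 3}, 3: {4}}
all_categories = {1, 2, 3, 4}
similars = similar_within(products, all_categories, max_distance=4)
```

One thing to watch: the numeric distance between two bit strings does not necessarily track how many categories they share. In the sample data, products 2 and 3 have scores 7 and 8 and so end up within the cut-off, although they have no category in common. Counting the set bits of the bitwise AND, as in shared_categories, is closer to the score described in the question.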

Obviously this is a task that needs to be run offline in some sort of batch environment. Note there are more sophisticated methods applying machine learning techniques, in particular some that work online and scale much better than the above. Depending on your particular requirements (e.g. number of products, number of transactions, need for user preference matching etc.), it may or may not be worth looking into these methods.

Django - How to recommend similar products

Tags:

python

categories

recommendation-engine

Nimo
