I have a list of products with tags & categories, like this:
from django.db import models
from taggit.managers import TaggableManager

class Product(models.Model):
    tags = TaggableManager()  # using django-taggit
    categories = models.ManyToManyField(Category)
I am looking for a way to effectively implement a method such as
p = Product.objects.get(...)
p.similar_products() # -> should return a list sorted by similarity
How is similarity computed: the similarity score between two products should be the number of tags & categories they have in common.
The challenge is that this method needs to be computed hundreds of times per second, so it's important to do it efficiently.
I might speed it up with caching but the question remains - is there a django-native way to calculate and score similar products based on tags and categories? (I am aware of django-recommends but it seems to use users and ratings)
Thanks :)
Disclaimer: The following is a start on how I would approach the problem. Provided as is, not fit for purpose and no warranty included.
"is there a django-native way to calculate and score similar products based on tags and categories?"
The short answer is no: Django is a web application framework, not a recommender system.
"I am looking for a way to effectively implement a method (...)"
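Please realise that this is a non-trivial task at its core. There are two parts that you need to solve:
1. calculating the similarity between products
2. retrieving the similar products for a given product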
Once 1. is done, 2. becomes trivial. There are many ways in which to calculate similarity, and you may want to vary the method over time as you gain experience.
Hence, I would start with 2. and then work backwards to solve 1. This will give you a method to store and retrieve similarities that is not bound to any particular method to calculate similarity.
Retrieval of similar products
One way to solve this natively in Django is a ManyToMany relationship:
class Product(models.Model):
    tags = TaggableManager()  # using django-taggit
    categories = models.ManyToManyField(Category)
    # Use 'self' here: the name Product is not yet defined inside its own class body.
    similars = models.ManyToManyField('self')
Note the key idea here is to store, for each product, the list of primary keys of all similar products. Then the similar_products method is simply:
def similar_products(self):
    return self.similars.all()
"The challenge is that this method needs to be computed hundreds of times per second"
Depending on the size of the product catalog and the list of categories, this approach may not scale well. There are more efficient implementations of the same concept though: you could cache or store the list of similar products' keys outside of the database, for example in an in-memory store like Redis.
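To illustrate the caching idea, here is a minimal sketch in which a plain dict stands in for an external store such as Redis. The names SimilarityCache and load_from_db are hypothetical and not part of Django or any library; load_from_db is assumed to be a callable that returns the similar products' primary keys for a given product id.

```python
class SimilarityCache:
    """Maps a product id to the ids of its similar products.

    A plain dict stands in for an external store such as Redis.
    """

    def __init__(self, load_from_db):
        self._store = {}
        # Hypothetical callable: product id -> iterable of similar product ids.
        self._load_from_db = load_from_db

    def similar_ids(self, product_id):
        # On a cache miss, fall back to the database once and remember the result.
        if product_id not in self._store:
            self._store[product_id] = list(self._load_from_db(product_id))
        return self._store[product_id]

    def invalidate(self, product_id):
        # Call this whenever the similarities of a product are recomputed.
        self._store.pop(product_id, None)
```

With something along these lines, repeated lookups for the same product do not touch the database until the entry is invalidated.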
Calculating similarity
Calculating similarity is a computationally complex task. Essentially you want to compare each product with all the others, which by nature is in O(n^2). There has been quite a bit of research on the topic.
"the similarity score between two products should be the number of tags & categories they have in common"
One naive approach is as follows.
1. For each product, calculate a category_score, which is the binary representation of the category indicators (essentially a bit string).
2. Compare each product with every other product and compute similarity = abs(product1.category_score - product2.category_score).
3. Store the most similar products in the Product.similars relation in the Django model.
Obviously this is a task that needs to be run offline in some sort of batch environment. Note there are more sophisticated methods applying machine learning techniques, in particular some that work online and scale much better than the above. Depending on your particular requirements (e.g. #products, #transactions, need for user preference matching etc.), it may or may not be worth looking into these methods.
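The batch approach above could be sketched in plain Python roughly as follows. This is only a sketch under assumptions: the input is a hypothetical dict mapping each product id to the set of its tag and category names, and the function names encode and top_similar are made up for illustration. Note that it counts shared labels with a bitwise AND of the bit strings rather than taking the absolute difference, so that the score matches the question's definition (the number of tags & categories in common).

```python
from itertools import combinations
import heapq


def encode(labels, bit_of):
    """Pack a set of labels into an int with one bit per known label."""
    score = 0
    for label in labels:
        score |= 1 << bit_of[label]
    return score


def top_similar(labels_by_product, n=3):
    """Return {product_id: [ids of up to n most similar products, best first]}."""
    # Assign every distinct tag/category a fixed bit position.
    all_labels = sorted(set().union(*labels_by_product.values()))
    bit_of = {label: i for i, label in enumerate(all_labels)}
    masks = {pid: encode(labels, bit_of) for pid, labels in labels_by_product.items()}

    neighbours = {pid: [] for pid in masks}
    # Compare every unordered pair once: O(n^2) in the number of products.
    for a, b in combinations(masks, 2):
        shared = bin(masks[a] & masks[b]).count("1")  # labels in common
        if shared:
            neighbours[a].append((shared, b))
            neighbours[b].append((shared, a))

    # Keep only the n highest-scoring neighbours per product.
    return {
        pid: [other for _, other in heapq.nlargest(n, pairs, key=lambda p: p[0])]
        for pid, pairs in neighbours.items()
    }


# Hypothetical catalog: product id -> tag and category names.
catalog = {
    1: {"tag:red", "tag:cotton", "cat:shirts"},
    2: {"tag:red", "cat:shirts"},
    3: {"tag:cotton", "cat:trousers"},
    4: {"tag:wool", "cat:hats"},
}
similar = top_similar(catalog)
```

The resulting id lists are what a batch job would then write into the Product.similars relation (or into a cache), so that similar_products only has to read them back.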