Crobox relies on the open source community for its product development. We want to give back to that community by offering a transparent look into the inner workings of our company through our Crobox Lab series. Throughout this series you will get a peek into the algorithms, theories, and processes we use in our framework.
Blog post by Axel Hirschel - former research intern at Crobox
Most recommender systems (RSs) base their recommendations on the feedback of users. But imagine that a new user enters your site and has not yet given explicit or implicit feedback on any products.
Of course, one could just recommend the best reviewed items. But that may not always be the best choice. In this blog post, I will explain how a context-aware demographic RS is better equipped to deal with the cold-start problem of collaborative filtering RSs.
Collaborative Filtering RS
The main assumption behind a collaborative filtering RS is that if many people like both apples and oranges, and I like apples, there is a good chance that I like oranges too. That idea can be extrapolated to a large database of films (Netflix), songs (Spotify), and products (Amazon).
With this method, companies are able to produce personalized recommendations for their users under one condition: the users need to have given feedback on products. Otherwise, there is no data available to compare this user to other users.
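The apples-and-oranges idea can be sketched in a few lines of Python. This is only a minimal illustration, assuming binary "liked" feedback and cosine similarity between items; the user names, the `likes` data, and the function names are hypothetical and not taken from any production system.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical feedback data: user -> set of items that user liked.
likes = {
    "ann": {"apples", "oranges"},
    "bob": {"apples", "oranges", "pears"},
    "cat": {"apples", "oranges"},
    "dan": {"pears"},
}

def item_similarity(likes, a, b):
    """Cosine similarity between two items over binary 'liked' feedback."""
    fans_a = {user for user, items in likes.items() if a in items}
    fans_b = {user for user, items in likes.items() if b in items}
    if not fans_a or not fans_b:
        return 0.0
    return len(fans_a & fans_b) / sqrt(len(fans_a) * len(fans_b))

def recommend(likes, liked_items, top_n=3):
    """Rank unseen items by their summed similarity to the items already liked."""
    catalog = set().union(*likes.values())
    scores = defaultdict(float)
    for candidate in catalog - set(liked_items):
        for item in liked_items:
            scores[candidate] += item_similarity(likes, candidate, item)
    # Sort by score (high to low), then by name to keep the order deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, score in ranked[:top_n] if score > 0]

print(recommend(likes, {"apples"}))  # ['oranges', 'pears']
print(recommend(likes, set()))       # [] because a new user has no feedback to compare
```

The second call shows the cold-start problem in miniature: without any feedback, there is nothing to score against, so the list stays empty.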
Demographic RSs are equipped to deal with users who have not given feedback. This is because data is already available the moment a user enters a website. For example, their location can be identified from their IP address, their browser type is available, and their referrer site is indicated.
Using this information, it is possible to deduce the preferences of a new user from individuals with similar demographic data. For example, if people on an iPhone, coming from Facebook, who live in Amsterdam have previously clicked on blue step-in bikes on a website, new users with the same data profile are likely to be interested in this kind of bike too.
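As a rough sketch of this idea, the snippet below groups past clicks by demographic profile and recommends the most clicked products for a matching profile. The click log, the choice of (device, referrer, city) as the profile, and the fallback to overall popularity are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (device, referrer, city, clicked product).
click_log = [
    ("iphone", "facebook", "amsterdam", "blue step-in bike"),
    ("iphone", "facebook", "amsterdam", "blue step-in bike"),
    ("iphone", "facebook", "amsterdam", "red racing bike"),
    ("desktop", "google", "utrecht", "red racing bike"),
    ("desktop", "google", "utrecht", "red racing bike"),
    ("desktop", "google", "utrecht", "bike lock"),
]

def build_profiles(click_log):
    """Count clicks per product, both per demographic profile and overall."""
    per_profile = defaultdict(Counter)
    overall = Counter()
    for device, referrer, city, product in click_log:
        per_profile[(device, referrer, city)][product] += 1
        overall[product] += 1
    return per_profile, overall

def recommend_for_profile(profile, per_profile, overall, top_n=2):
    """Use clicks from visitors with the same profile; fall back to overall popularity."""
    counts = per_profile.get(profile) or overall
    return [product for product, _ in counts.most_common(top_n)]

per_profile, overall = build_profiles(click_log)
print(recommend_for_profile(("iphone", "facebook", "amsterdam"), per_profile, overall))
print(recommend_for_profile(("android", "direct", "rotterdam"), per_profile, overall))
```

An exact profile match is the simplest possible choice. A real system would more likely cluster similar profiles, since exact matches become rare as more attributes are added.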
In addition to the demographic data, contextual data can also be used to boost the performance of demographic RSs. Contextual data is information that changes during a session, such as the time, how long users are browsing, and how many products they have clicked.
Often, the context in which users visit a website is important for their preferences. For example, in the winter users are more likely to buy warmer clothes, and people who browse longer are more interested in higher specifications for their PCs.
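The contextual signals mentioned above could be collected per session roughly as follows. The feature names and the northern-hemisphere season mapping are assumptions for the sake of the example.

```python
from datetime import datetime

def season_of(moment):
    """Map a timestamp to a season (assumes northern-hemisphere meteorological seasons)."""
    by_month = {12: "winter", 1: "winter", 2: "winter",
                3: "spring", 4: "spring", 5: "spring",
                6: "summer", 7: "summer", 8: "summer"}
    return by_month.get(moment.month, "autumn")

def context_features(session_start, now, clicked_products):
    """Collect contextual data that changes while the session is running."""
    return {
        "season": season_of(now),
        "hour": now.hour,
        "seconds_browsing": (now - session_start).total_seconds(),
        "products_clicked": len(clicked_products),
    }

start = datetime(2024, 1, 15, 20, 0, 0)
now = datetime(2024, 1, 15, 20, 12, 30)
print(context_features(start, now, ["coat", "scarf", "gloves"]))
```

Features like these can be appended to the demographic profile, so that the same visitor gets different recommendations in January than in July.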
Although context-aware RSs predict more relevant items for new users, their long-term performance is not nearly as fine-tuned as that of collaborative filtering. To exploit the strengths of both algorithms, they need to be combined into one hybrid RS.
Burke (2002) identified seven methods for doing this. Based on his research, the three most promising types were tested:
- Cascade Hybrid: Creates clusters based on demographic and contextual data, then employs collaborative filtering to create a recommendation
- Feature Combination Hybrid: Merges the data of both recommendation systems and calculates the similarity scores with the combined data
- Switching Hybrid: Uses the recommendations of the context-aware demographic RS for short session lengths and switches to collaborative filtering for longer session lengths
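Of the three, the Switching Hybrid is the easiest to express in code. The sketch below assumes that session length is measured in clicks and that both recommenders are available as callables; the `switch_at` value of 5 and the two stand-in recommenders are placeholders, not values from the research.

```python
def switching_hybrid(session_length, demographic_rs, collaborative_rs, switch_at=5):
    """Use the demographic RS for short sessions and collaborative filtering afterwards.

    session_length: number of clicks in the current session (an assumed measure).
    switch_at: hypothetical threshold; it should come from your own evaluation.
    """
    if session_length < switch_at:
        return demographic_rs()
    return collaborative_rs()

# Stand-in recommenders that return fixed lists, for illustration only.
def demographic_rs():
    return ["blue step-in bike"]

def collaborative_rs():
    return ["red racing bike"]

print(switching_hybrid(1, demographic_rs, collaborative_rs))  # ['blue step-in bike']
print(switching_hybrid(8, demographic_rs, collaborative_rs))  # ['red racing bike']
```

Because only one recommender runs per request, each can be developed and evaluated separately, and the switch itself is a single comparison.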
Most effective combination
When tested, the Cascade and the Feature Combination hybrids obtained lower performance scores than the best-performing algorithm on all session lengths. This means that the feedback data and the contextual/demographic data did not improve performance when combined in these ways.
The Switching Hybrid, by contrast, matched the best-performing algorithm on every session length. This is a major success, as it shows that it is possible to combine the context-aware demographic RS with a collaborative filtering RS, and that doing so improves performance for new users.
Many users leave a website after only a few clicks, and the results described in this post indicate that constructing a context-aware demographic RS can contribute to better recommendations for them.
This means that if you use a collaborative filtering RS, then you should create a demographic RS as well and analyze on which session lengths it performs best.
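One way to run that analysis is to score both recommenders offline per session length and look for the point from which collaborative filtering stays ahead. The scores below are made-up placeholder numbers, not results from this research, and the function names are hypothetical.

```python
# Placeholder evaluation scores per session length (NOT measured results).
scores = {
    1: {"demographic": 0.30, "collaborative": 0.10},
    2: {"demographic": 0.31, "collaborative": 0.22},
    3: {"demographic": 0.31, "collaborative": 0.33},
    4: {"demographic": 0.32, "collaborative": 0.40},
}

def best_by_session_length(scores):
    """Name the best-scoring algorithm for each session length."""
    return {length: max(by_algo, key=by_algo.get)
            for length, by_algo in sorted(scores.items())}

def switch_point(scores, first="demographic", second="collaborative"):
    """Smallest session length from which `second` is never worse than `first`."""
    lengths = sorted(scores)
    for i, length in enumerate(lengths):
        if all(scores[l][second] >= scores[l][first] for l in lengths[i:]):
            return length
    return None  # `second` never takes over for good

print(best_by_session_length(scores))
print(switch_point(scores))  # 3 for the placeholder numbers above
```

With your own evaluation data in `scores`, the returned session length is a candidate for the point at which the Switching Hybrid hands over to collaborative filtering.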
Using the demographic RS on short session lengths and collaborative filtering on longer session lengths can increase the performance of your recommendations. And thus, your RS becomes better equipped to deal with new users!