How to Avoid Some of the Problems in Writing a Working Recommendation Engine
Introduction
When you think about it, having a computer analyze your interests and tell you about other things you might be interested in sounds seriously cool. But many recommendation engines rarely surface anything that human users actually find interesting. This post covers some of the issues software developers face when writing a recommendation engine, and how to reduce their impact.
Developers building recommendation engines tend to get annoyed by the following problems.
1. Speed and scalability
Many recommendation engines run on websites such as Amazon, where the recommendations have to be served up quickly. One way to achieve this is to avoid generating the recommendations at the moment the page that shows them is requested. Instead, the recommendations can be generated ahead of time and then fetched with a database query (or a similar lookup) when the page loads.
Another way is to use the client's processing power to generate the recommendations. In a web application, JavaScript running in the browser can do the work, as a Delicious recommendation engine does.
There is also the issue of scalability: the system has to generate recommendations quickly for large numbers of users. It would be sad for a recommendation engine to fail, Twitter style.
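To make the precompute-then-fetch idea concrete, here is a minimal Python sketch. It assumes a simple item-to-item co-occurrence scheme ("people who bought X also bought Y"), which is only one possible way to generate the recommendations, and it uses an in-memory dict as a stand-in for the database table. The purchase data and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical purchase history: user -> set of item IDs.
PURCHASES = {
    "alice": {"book", "lamp", "pen"},
    "bob": {"book", "pen"},
    "carol": {"lamp", "desk"},
    "dave": {"book", "lamp"},
}


def precompute_recommendations(purchases, top_n=3):
    """Offline batch step: for each item, rank other items by co-occurrence count."""
    co_counts = defaultdict(Counter)
    for items in purchases.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    # This dict stands in for a database table keyed by item ID.
    return {
        item: [other for other, _ in counts.most_common(top_n)]
        for item, counts in co_counts.items()
    }


# In production this would run ahead of time (e.g. a scheduled batch job);
# here it simply runs once at import.
RECOMMENDATION_TABLE = precompute_recommendations(PURCHASES)


def handle_page_load(item_id):
    """Request-time step: a cheap lookup, no recommendation math at all."""
    return RECOMMENDATION_TABLE.get(item_id, [])
```

The expensive loop over every user's history runs once, offline; serving a page only costs a dictionary lookup, which is the whole point of the approach.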
2. Malicious folk
This is an issue for any recommendation engine that deals with user-generated content. It doesn't take long to find a YouTube video with literally hundreds of unrelated tags, added in the hope of tricking users into watching it.
For some recommendation engines this isn't a problem, but for many it is a real difficulty. In the YouTube example, the recommendation engine could ignore videos with excessive numbers of tags, because it is highly doubtful that nearly every one of those hundreds of tags is relevant.
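A minimal sketch of the tag-count filter, in Python. The `MAX_TAGS` cutoff of 30 and the video record format are hypothetical assumptions; a real system would pick the threshold by looking at the distribution of tag counts on its own content.

```python
# Hypothetical cutoff: an assumption, to be tuned on real data.
MAX_TAGS = 30


def is_suspicious(tags, max_tags=MAX_TAGS):
    """Flag items whose number of distinct tags exceeds a plausible limit."""
    # Normalize so "Cats" and "cats " count as the same tag; skip blank tags.
    distinct = {t.strip().lower() for t in tags if t.strip()}
    return len(distinct) > max_tags


def candidate_items(videos, max_tags=MAX_TAGS):
    """Drop tag-stuffed videos before they ever reach the recommender."""
    return [v for v in videos if not is_suspicious(v["tags"], max_tags)]
```

Filtering before recommendation keeps the rest of the engine simple: the tag-stuffed video never enters the candidate pool, so it cannot match on any of its hundreds of tags.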
3. Little data
It would be snap-easy if every user your recommendation engine analyzes came with a large pool of data clearly describing what they do. In practice, it is very difficult to give good recommendations based on little data.
One way to attack the little-data issue is to withhold recommendations until enough data about the user has been collected. This means the user has to do some work and wait for recommendations to appear, which usually isn't a good thing.
Another way is to base the recommendations on other users' data: for example, recommend items that are popular among the vast majority of users (think of the Digg front page).
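The two approaches can be combined: personalize only once a user has enough history, and fall back to globally popular items until then. This Python sketch assumes a hypothetical `MIN_HISTORY` threshold of 5 items and a caller-supplied `personalized_fn` standing in for whatever real recommender is in use.

```python
from collections import Counter

MIN_HISTORY = 5  # hypothetical: items needed before we trust personalization


def popular_items(all_histories, n):
    """Items liked by the most users (each user counted once per item)."""
    counts = Counter(
        item for history in all_histories.values() for item in set(history)
    )
    return [item for item, _ in counts.most_common(n)]


def recommend(user, all_histories, personalized_fn, n=3, min_history=MIN_HISTORY):
    history = all_histories.get(user, [])
    if len(history) < min_history:
        # Not enough data about this user: fall back to what others like,
        # skipping anything the user has already seen.
        seen = set(history)
        ranked = popular_items(all_histories, n + len(seen))
        return [item for item in ranked if item not in seen][:n]
    # Enough data: hand off to the (hypothetical) personalized recommender.
    return personalized_fn(user, all_histories)[:n]
```

A brand-new user gets a "front page" of popular items immediately instead of an empty box, and the switch to personalized results happens automatically once enough history accumulates.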
4. Nothing good to recommend
Recommendation engines work really well for big sites such as Amazon, YouTube, Digg, and Delicious because those sites have a lot to recommend. It hardly matters what the user's interests are, because those sites are almost certain to have something that would interest the user.
Smaller applications typically do not have as much to recommend, unless they are drawing upon the content of larger applications.
To solve this, the application needs more content, so that at least something interesting can be recommended to the user. If the content is user-generated, encourage users to create more of it by giving them some kind of benefit for doing so.
5. Duplication
YouTube is a great example of content duplication. Multiple people frequently upload the same copyrighted content, creating duplicates on the YouTube servers. How can a recommendation engine give relevant recommendations without making them so relevant that the user sees duplicate content?
In the case of YouTube, having the recommendation engine ignore videos with suspiciously similar titles might help reduce duplicate recommendations.
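One way to implement the "suspiciously similar titles" check is to normalize titles and compare them with a string-similarity ratio, here Python's standard `difflib.SequenceMatcher`. The noise-word list and the 0.85 threshold are hypothetical assumptions, and this is a sketch rather than a robust duplicate detector (it will not catch re-uploads with entirely different titles).

```python
import re
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.85  # hypothetical; tune on real data
NOISE_WORDS = {"hd", "hq", "official", "video", "full"}  # hypothetical list


def normalize(title):
    """Lowercase, strip punctuation, and drop common re-upload noise words."""
    title = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(w for w in title.split() if w not in NOISE_WORDS)


def too_similar(a, b, threshold=SIMILARITY_THRESHOLD):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def dedupe(recommendations, threshold=SIMILARITY_THRESHOLD):
    """Keep the first of each group of near-identical titles, preserving order."""
    kept = []
    for title in recommendations:
        if not any(too_similar(title, k, threshold) for k in kept):
            kept.append(title)
    return kept
```

For example, `dedupe(["Funny Cat Compilation", "FUNNY CAT COMPILATION (HD)", "Guitar lesson 1"])` keeps only the first cat video and the guitar lesson. The pairwise comparison is quadratic in the number of recommendations, which is fine for a short list shown on one page.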
Posted in Programming