Recommendation Engines: A Brief Introduction

What should I watch next? What book do I want to read next? More often than not, these decisions are influenced by ‘recommended’ or ‘suggested’ feeds on social media. Companies like Netflix, Amazon, and YouTube depend heavily on these recommendation systems to attract consumers and to encourage current consumers to keep coming back. In fact, about 70% of what people watch on YouTube reportedly comes from its recommendations. How does the backbone of these technology empires operate? How have they become so good at what they do? The answer is data.

Many companies track your clicks on their websites to learn what you like and what people similar to you like. One technique for doing this uses a clever bit of linear algebra.

Let’s start by making a list of every movie you (and everyone else) watched more than half of on Netflix, and assigning each movie a unique ID.

Example: Alex, a 17-year-old high school senior: [‘Stranger Things’, ‘13 Reasons Why’, ‘Orange Is the New Black’]

Brace yourselves: here comes the math. 

Once each title is swapped for its ID, this list is what is called a vector: basically a list of numbers.
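One common way to turn a watch list into numbers that can be compared is sketched below. It assumes (purely for illustration, not as a description of what Netflix does) a tiny catalog in which a movie's ID is its position in the list, and gives each viewer a 1 in the slot of every movie they watched and a 0 everywhere else. The catalog contents and the `to_vector` helper are hypothetical.

```python
# Hypothetical catalog: a movie's ID is its index in this list.
CATALOG = [
    "Stranger Things",
    "13 Reasons Why",
    "Orange Is the New Black",
    "The Crown",
]

def to_vector(watched):
    """Return a 0/1 list with one slot per catalog movie (1 = watched)."""
    watched = set(watched)
    return [1 if title in watched else 0 for title in CATALOG]

alex = to_vector(["Stranger Things", "13 Reasons Why", "Orange Is the New Black"])
print(alex)  # [1, 1, 1, 0]
```

Every viewer's vector has the same length this way, which is what makes it possible to compare two viewers slot by slot.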


Now what we want to do is see what people with similar taste to yours have enjoyed. We do this by calculating the distance between these vectors (using the Euclidean distance).
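For readers who want to see the calculation, here is a minimal sketch of the Euclidean distance: subtract the two vectors slot by slot, square each difference, add the squares up, and take the square root. The two viewers and their 0/1 vectors (1 = watched, 0 = not watched) are made up for the example, and `euclidean_distance` is a helper name chosen here.

```python
import math

def euclidean_distance(u, v):
    """Square root of the summed squared differences of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical viewers over a four-movie catalog.
alex = [1, 1, 1, 0]
sam = [1, 1, 0, 1]

# The vectors differ in two slots, so the distance is sqrt(1 + 1).
print(euclidean_distance(alex, sam))
```

Two viewers with identical watch lists are at distance 0, and every movie that only one of them watched pushes them further apart.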

The closer the vectors, the more similar your tastes are to that person’s. From there we can see what movies they watched that you didn’t and suggest them to you.
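Putting the pieces together, the idea can be sketched in a few lines. This is a toy version under stated assumptions, not a real recommendation engine: the catalog, the viewers, and the `recommend` helper are all hypothetical, each vector holds a 1 for a watched movie and a 0 otherwise, and only the single nearest viewer is consulted.

```python
import math

# Hypothetical catalog and viewers; 1 = watched, 0 = not watched.
CATALOG = [
    "Stranger Things",
    "13 Reasons Why",
    "Orange Is the New Black",
    "The Crown",
    "Black Mirror",
]
VIEWERS = {
    "Sam": [1, 1, 1, 1, 0],
    "Jo": [0, 0, 0, 0, 1],
}

def recommend(you, others):
    """Suggest titles the nearest viewer watched that you have not."""
    # math.dist computes the Euclidean distance (Python 3.8+).
    nearest = min(others, key=lambda name: math.dist(you, others[name]))
    return [
        title
        for title, mine, theirs in zip(CATALOG, you, others[nearest])
        if theirs == 1 and mine == 0
    ]

alex = [1, 1, 1, 0, 0]
print(recommend(alex, VIEWERS))  # ['The Crown']
```

Sam's vector is closer to Alex's than Jo's is, so Alex is offered the one movie Sam watched that Alex has not, and Jo's pick is ignored.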