Design a Video Recommendation System | IG | TikTok | Netflix

Two Towers architecture and other types of recommendation systems

Aug 16, 2025

Welcome to the 64 new subscribers who have joined us since last week.

If you aren’t subscribed yet, join 1000+ engineers and technical managers dedicated to learning real-world system design.

Video recommendation systems are a core component of modern streaming platforms such as YouTube and Netflix, responsible for delivering engaging, personalized suggestions that keep users watching. These systems come in multiple forms, each applying a different strategy to identify the content a user is most likely to enjoy. Three widely used approaches are Content-Based Filtering, Collaborative Filtering, and the Two-Tower Architecture.

1. Content-Based Recommendation Systems

Content-based methods focus on the attributes of items and a user’s past preferences. They rely on features like genres, keywords, textual descriptions, or other metadata to determine similarity between items. Recommendations are generated by matching these item features with the characteristics of content the user has already consumed. For instance, a movie recommender using this approach might suggest films with similar genres or themes to those the user has previously rated highly. This method excels at providing highly relevant, interest-aligned suggestions, but it can be limited when trying to introduce entirely new content outside the user’s established tastes.
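
To make the matching step concrete, here is a minimal sketch of content-based scoring. It assumes each video is described by a small dictionary of genre weights and that a user's profile is the rating-weighted sum of the features of videos they rated. The catalog, the feature names, and the helper functions are all made up for illustration; a production system would use much richer features.

```python
import math

# Hypothetical catalog: each video is described by genre weights (made-up data).
CATALOG = {
    "space_documentary": {"documentary": 1.0, "science": 1.0},
    "mars_mission_drama": {"drama": 1.0, "science": 1.0},
    "cooking_show": {"food": 1.0, "reality": 1.0},
}

def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_profile(ratings):
    """Sum the features of rated items, weighted by the user's rating."""
    profile = {}
    for item_id, rating in ratings.items():
        for feature, weight in CATALOG[item_id].items():
            profile[feature] = profile.get(feature, 0.0) + rating * weight
    return profile

def recommend(ratings, k=2):
    """Rank videos the user has not rated by similarity to their profile."""
    profile = build_profile(ratings)
    candidates = [i for i in CATALOG if i not in ratings]
    return sorted(candidates, key=lambda i: cosine(profile, CATALOG[i]), reverse=True)[:k]

print(recommend({"space_documentary": 5.0}))
```

A user who rated only the space documentary gets the science-themed drama ranked ahead of the cooking show, because the drama shares a feature with their profile. Note that nothing in this sketch can surface a video whose features the user has never touched, which is the limitation described above.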

2. Collaborative Filtering Recommendation Systems

Collaborative filtering leverages the behavior and preferences of a large community of users. The underlying assumption is that people with similar tastes in the past will continue to share similar preferences in the future. This technique comes in two primary variants:

User-Based Collaborative Filtering: Finds users with preferences similar to the target user and recommends items those similar users enjoyed.
Item-Based Collaborative Filtering: Identifies items that are frequently liked together and recommends those related items to the target user.

This approach is especially effective at surfacing popular or trending content among like-minded audiences, but it can struggle when there’s little historical data for a new user or item, a limitation known as the cold start problem.
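
The item-based variant can be sketched in a few lines. The example below assumes interactions are stored as a set of liked video ids per user and measures item similarity as the cosine of co-likes. The user names, video ids, and function names are hypothetical, and real systems typically work on sparse matrices rather than Python dictionaries.

```python
import math

# Hypothetical interaction data: user -> set of liked video ids.
LIKES = {
    "alice": {"v1", "v2"},
    "bob": {"v1", "v2", "v3"},
    "carol": {"v2", "v3"},
    "dave": {"v4"},
}

def item_similarity(likes):
    """Cosine similarity between items from co-likes: |A & B| / sqrt(|A| * |B|)."""
    fans = {}
    for user, items in likes.items():
        for item in items:
            fans.setdefault(item, set()).add(user)
    sims = {}
    for a in fans:
        for b in fans:
            if a != b:
                overlap = len(fans[a] & fans[b])
                if overlap:
                    sims[(a, b)] = overlap / math.sqrt(len(fans[a]) * len(fans[b]))
    return sims

def recommend(user, likes, k=2):
    """Score unseen items by their similarity to items the user already liked."""
    sims = item_similarity(likes)
    seen = likes.get(user, set())
    scores = {}
    for (a, b), s in sims.items():
        if a in seen and b not in seen:
            scores[b] = scores.get(b, 0.0) + s
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("alice", LIKES))   # alice's likes overlap with fans of v3
print(recommend("erin", LIKES))    # unknown user: no history, no recommendations
```

The second call shows the weakness mentioned above: a user with no history gets an empty list, and a video nobody has liked alongside anything else (v4 here) is never recommended to anyone.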

3. Two-Tower Architecture

The Two-Tower Architecture is a modern, neural network–driven approach designed to produce scalable and high-quality recommendations. It employs two separate deep learning “towers” — one dedicated to encoding user features (such as profiles and interaction histories) and the other to encoding item features (such as metadata and descriptors).

By learning separate embeddings for users and items in a shared vector space, the model can score any user-item pair with a cheap similarity function (typically a dot product or cosine similarity), enabling fast retrieval and ranking of personalized recommendations. Because the towers consume features rather than relying only on interaction history, the architecture can also produce embeddings for new users or newly added content from their profile data or metadata, which eases the cold start problem.
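
Here is a toy sketch of the two towers and the similarity score. To keep it dependency-free, each "tower" is a single random, untrained linear layer followed by L2 normalization. Real towers are deep networks with learned weights. The feature vectors, the dimensions, and the function names are assumptions chosen only for illustration.

```python
import math
import random

EMBED_DIM = 4  # assumed embedding size, chosen only for illustration

def make_tower(input_dim, embed_dim, seed):
    """Return a random (untrained) weight matrix standing in for a tower."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(input_dim)] for _ in range(embed_dim)]

def encode(tower, features):
    """One linear layer followed by L2 normalization."""
    raw = [sum(w * x for w, x in zip(row, features)) for row in tower]
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0
    return [v / norm for v in raw]

def score(user_vec, item_vec):
    """Dot product of the two embeddings (cosine similarity, since both are unit length)."""
    return sum(u * i for u, i in zip(user_vec, item_vec))

# The towers take different inputs but emit vectors of the same size.
user_tower = make_tower(input_dim=3, embed_dim=EMBED_DIM, seed=0)
item_tower = make_tower(input_dim=5, embed_dim=EMBED_DIM, seed=1)

user_features = [0.2, 1.0, 0.0]            # hypothetical user profile features
item_features = [1.0, 0.0, 0.0, 0.5, 0.3]  # hypothetical video metadata features

user_vec = encode(user_tower, user_features)
item_vec = encode(item_tower, item_features)
print(score(user_vec, item_vec))
```

The point of the sketch is the shape of the computation: the user tower never sees item features and the item tower never sees user features, yet both land in the same 4-dimensional space where a dot product compares them.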

Two-Tower Neural Networks for Retrieval

In retrieval systems, Two-Tower Neural Networks extend ideas from embedding-based methods like Word2Vec. Here’s how they work:

Separate Encoding: The user tower processes only user-specific features, while the item tower processes only item-specific features, producing embeddings for each.
Similarity Matching: The training objective aligns user embeddings closely with item embeddings for content the user engages with (e.g., likes, clicks, views).
Independent Operation: After training, both towers operate independently — item embeddings can be precomputed offline and cached, while user embeddings are generated dynamically during requests.
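
The similarity-matching objective can take several forms, and this post does not commit to one. As one common option, the sketch below uses an in-batch softmax loss: for each user in a batch, the item they engaged with is the positive and the other items in the batch act as negatives. The function name and the tiny hand-written embeddings are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def in_batch_softmax_loss(user_vecs, item_vecs):
    """Mean cross-entropy where user i's positive is item i and the
    remaining items in the batch serve as negatives."""
    total = 0.0
    for i, u in enumerate(user_vecs):
        logits = [dot(u, v) for v in item_vecs]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(user_vecs)

items = [[1.0, 0.0], [0.0, 1.0]]
aligned_users = [[1.0, 0.0], [0.0, 1.0]]     # each user points at their own item
misaligned_users = [[0.0, 1.0], [1.0, 0.0]]  # each user points at the wrong item

print(in_batch_softmax_loss(aligned_users, items))
print(in_batch_softmax_loss(misaligned_users, items))
```

The loss is lower when user embeddings sit close to the items those users engaged with, so minimizing it by gradient descent pulls matching pairs together and pushes mismatched pairs apart.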

This separation enables scalable, low-latency retrieval through approximate nearest neighbor (ANN) search, using a library such as FAISS or a graph-based index such as HNSW, so the system never has to score every item in the catalog for each request.

For example:

Offline: Precompute item embeddings daily and store them in the ANN index.
Online: Generate a fresh user embedding at request time, query the ANN index, and return the closest item embeddings as recommendations.

The key advantage of this approach lies in embedding caching: item embeddings are computed once and reused across requests, so serving a recommendation requires only one pass through the user tower plus an index lookup. That makes it well suited to large-scale, real-time recommendation environments.
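
The offline/online split can be sketched as follows. To stay self-contained, the example uses exact brute-force search over a plain list as a stand-in for the ANN index. A real deployment would swap that step for an ANN library. The item ids, the embeddings, and the function names are all hypothetical.

```python
import heapq

def build_index(item_embeddings):
    """Offline step: snapshot precomputed item embeddings. A real system
    would load these into an ANN index instead of a plain list."""
    return list(item_embeddings.items())

def retrieve(index, user_vec, k=2):
    """Online step: score cached items against a fresh user embedding.
    Exact brute-force search stands in for the ANN query here."""
    scored = (
        (sum(u * v for u, v in zip(user_vec, vec)), item_id)
        for item_id, vec in index
    )
    return [item_id for _, item_id in heapq.nlargest(k, scored)]

# Hypothetical embeddings the item tower produced in a daily batch job.
ITEM_EMBEDDINGS = {
    "cat_video": [0.9, 0.1],
    "dog_video": [0.8, 0.3],
    "tax_tutorial": [-0.7, 0.6],
}
index = build_index(ITEM_EMBEDDINGS)

user_vec = [1.0, 0.0]  # would come from the user tower at request time
print(retrieve(index, user_vec))
```

Only `retrieve` runs on the request path, and it never touches the item tower. The brute-force loop here is linear in catalog size, which is exactly the cost an ANN index is meant to avoid at scale.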

Check out the full coverage of recommendation system design on YouTube.

Thank you for your continued support of my newsletter and the growth 🙏

Join a community of 1000+ members across YouTube and Substack

You can also hit the like ❤️ button at the bottom of this email or share this post with a friend. It really helps!

System Design Pal
