---
title: Deep Learning in Recommendation Systems
sidebar_label: Recommendation Systems
description: "How CNNs and deep neural networks power modern discovery engines like Netflix, YouTube, and Pinterest."
tags: [deep-learning, cnn, recommendation-systems, embeddings, computer-vision]
---

Traditional recommendation systems relied on **Collaborative Filtering** (finding similar users) or **Content-Based Filtering** (matching tags). Modern systems, however, use **Deep Learning** to understand the actual content of the items—images, text, and video—to make highly personalized "visual" or "semantic" recommendations.

## 1. The Role of CNNs in Recommendations

CNNs have revolutionized recommendation engines in industries where the visual aspect is the primary driver of user interest (e.g., fashion, home decor, or social media).

### A. Visual Search and Similarity
In apps like Pinterest or Instagram, CNNs extract feature vectors (embeddings) from images. If a user likes a photo of a "mid-century modern chair," the system finds other images whose feature vectors are mathematically close in vector space.
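This retrieval step can be sketched in plain NumPy. The catalog names and their 4-dimensional embeddings below are invented for illustration; real systems use CNN outputs with hundreds of dimensions and an approximate nearest-neighbor index rather than a brute-force sort.

```python
import numpy as np

# Hypothetical 4-dimensional embeddings for a tiny catalog (illustrative values)
catalog = {
    "mid_century_chair": np.array([0.9, 0.1, 0.8, 0.2]),
    "modern_sofa":       np.array([0.8, 0.2, 0.7, 0.3]),
    "garden_hose":       np.array([0.1, 0.9, 0.1, 0.8]),
}

def cosine(a, b):
    """Cosine similarity: values near 1.0 mean the vectors point the same way."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = catalog["mid_century_chair"]

# Rank every other item by its similarity to the liked item
ranked = sorted(
    (name for name in catalog if name != "mid_century_chair"),
    key=lambda n: cosine(query, catalog[n]),
    reverse=True,
)
print(ranked)  # most visually similar item first
```

The sofa shares the chair's "furniture-like" directions in the embedding space, so it ranks above the hose even though no tags or user history were consulted.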

### B. Extracting Latent Features
Traditional systems might only know a product is "Blue" and "Large." A CNN can detect latent features that aren't in the metadata, such as "minimalist aesthetic," "high-waisted cut," or "warm lighting."

## 2. Hybrid Architectures

Modern recommenders rarely use just one model. They often combine multiple neural networks in a "Wide & Deep" architecture:

1. **The Deep Component (CNN/RNN):** Processes unstructured data like product images or video thumbnails to learn high-level abstractions.
2. **The Wide Component (Linear):** Handles structured categorical data like user ID, location, or past purchase history.
3. **The Ranking Head:** Combines these signals to predict the probability that a user will click or buy.

```mermaid
graph TD
    User_Data[User Profile & History] --> Wide[Wide Linear Model]
    Product_Img[Product Image] --> CNN[CNN Feature Extractor]
    CNN --> Embed[Visual Embedding]
    Embed --> Deep[Deep Neural Network]
    Wide --> Fusion[Feature Fusion Layer]
    Deep --> Fusion
    Fusion --> Output[Click Probability]

    style CNN fill:#e1f5fe,stroke:#01579b,color:#333
    style Wide fill:#fff3e0,stroke:#ef6c00,color:#333
    style Output fill:#e8f5e9,stroke:#2e7d32,color:#333
```
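A minimal numeric sketch of this fusion, with all weights and inputs invented (randomly generated) for illustration: the wide part contributes a linear logit over sparse categorical features, the deep part an MLP logit over the CNN embedding, and a sigmoid turns their sum into a click probability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: a sparse one-hot "wide" vector (e.g. country, device)
# and a dense visual embedding produced upstream by a CNN
wide_x = np.zeros(10)
wide_x[[2, 7]] = 1.0
deep_x = rng.normal(size=16)

# Wide component: a single linear layer (memorizes feature co-occurrences)
w_wide = rng.normal(size=10)

# Deep component: a tiny ReLU MLP over the embedding (generalizes)
w1 = rng.normal(size=(16, 8))
w2 = rng.normal(size=8)
deep_logit = np.maximum(deep_x @ w1, 0.0) @ w2

# Fusion / ranking head: sum the two logits, squash to a probability
logit = wide_x @ w_wide + deep_logit
p_click = 1.0 / (1.0 + np.exp(-logit))
print(f"P(click) = {p_click:.3f}")
```

In a trained system the weights are learned jointly, so the wide part memorizes exact feature combinations while the deep part generalizes to items it has never seen.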

## 3. Collaborative Deep Learning (CDL)

**Collaborative Deep Learning** integrates deep learning for content features with a ratings matrix.

* The CNN learns a representation of the item (e.g., a movie poster or a song's spectrogram).
* The system then uses these "deep features" to fill in the gaps in the user-item matrix where data is missing (the **Cold Start** problem).
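A toy version of that gap-filling, with a 2-user × 3-item ratings matrix and made-up 2-dimensional "deep features" standing in for CNN outputs: a missing rating is predicted as a similarity-weighted average of the user's observed ratings.

```python
import numpy as np

# Ratings matrix (users x items); np.nan marks unobserved entries
R = np.array([
    [5.0, 4.0, np.nan],
    [1.0, np.nan, 2.0],
])

# Hypothetical CNN-derived item features (e.g. from movie posters)
item_emb = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(user, item):
    """Similarity-weighted average of the user's observed ratings."""
    sims, ratings = [], []
    for j, r in enumerate(R[user]):
        if j != item and not np.isnan(r):
            sims.append(cosine(item_emb[item], item_emb[j]))
            ratings.append(r)
    sims = np.array(sims)
    return float(sims @ np.array(ratings) / sims.sum())

print(round(predict(0, 2), 2))  # fill user 0's missing rating for item 2
```

Real CDL learns the item representation and the matrix factorization jointly; this sketch only shows how content features let us score an entry the ratings matrix knows nothing about.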

## 4. Solving the "Cold Start" Problem

The **Cold Start** problem occurs when a new item is added to the platform and has no ratings yet.

* **Without CNNs:** The item won't be recommended because no one has interacted with it.
* **With CNNs:** The model "sees" the item, recognizes that it is visually similar to other popular items, and can start recommending it immediately based on content alone.
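A content-only relevance score for a brand-new item can be sketched like this. All embeddings are invented for illustration; the key point is that the score requires no interaction history for the new item at all.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Embedding of a brand-new item with zero interactions (hypothetical values)
new_item = np.array([0.85, 0.15, 0.75, 0.25])

# Embeddings of items this user has already engaged with
liked = np.array([
    [0.9, 0.1, 0.8, 0.2],
    [0.7, 0.3, 0.6, 0.4],
])

# Relevance = mean similarity to the user's liked items; no ratings for
# the new item are needed, which sidesteps the cold start entirely
score = float(np.mean([cosine(new_item, v) for v in liked]))
print(f"Content-based relevance: {score:.2f}")
```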

## 5. Use Case: Pinterest's "Visual Pin" Recommender

Pinterest uses a massive architecture called **PinSage**, a Graph Convolutional Network (GCN) that combines:

1. **Visual features** (what the pin looks like).
2. **Graph features** (what other pins it is frequently "saved" with).

This allows the system to recommend a "rustic dining table" even if the user just started browsing "wooden cabins."
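The core idea, one hop of neighbor aggregation, can be caricatured in a few lines. The pins, embeddings, and "saved together" graph below are all invented, and real PinSage uses importance-sampled neighborhoods and learned transformations rather than a fixed average:

```python
import numpy as np

# Hypothetical 2-dimensional visual embeddings for four pins
emb = {
    "dining_table": np.array([0.9, 0.2]),
    "wooden_cabin": np.array([0.6, 0.7]),
    "log_stool":    np.array([0.7, 0.6]),
    "neon_sign":    np.array([0.1, 0.9]),
}

# "Saved together" graph: pins that co-occur on users' boards
neighbors = {"dining_table": ["wooden_cabin", "log_stool"]}

def aggregate(pin, alpha=0.5):
    """Mix a pin's own visual embedding with the mean of its graph neighbors'."""
    own = emb[pin]
    if pin not in neighbors:
        return own
    neigh_mean = np.mean([emb[n] for n in neighbors[pin]], axis=0)
    return alpha * own + (1 - alpha) * neigh_mean

z = aggregate("dining_table")
print(z)  # pulled toward the pins it is saved with
```

The aggregated embedding encodes both what the pin looks like and the taste context it appears in, which is what lets cabin browsing surface dining tables.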

## 6. Implementation Sketch (Feature Extraction)

To build a visual recommender, we often use a pre-trained CNN just to get the "embeddings" (the output of the last pooling layer before classification).

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.metrics.pairwise import cosine_similarity

# 1. Load ResNet50 without the classification head
model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

# 2. Extract a feature vector (embedding) from a product image
def get_embedding(img_path):
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)  # ImageNet normalization expected by ResNet50
    return model.predict(x)  # shape (1, 2048)

feat1 = get_embedding('product_A.jpg')
feat2 = get_embedding('product_B.jpg')

# 3. Calculate similarity score (0 to 1)
similarity = cosine_similarity(feat1, feat2)
print(f"Product Similarity: {similarity[0][0]}")
```

## References

* **Google Research:** [Wide & Deep Learning for Recommender Systems](https://arxiv.org/abs/1606.07792)

---

**Visual recommendations are powerful, but they are only part of the story. To understand how a user's interests change over time, we need models that can remember the sequence of their actions.**