Case Study:

Killing the Facebook Gaming Feeds

Taking a design approach to solve hard-to-identify technical problems – Reverse engineering ML recommendations to prove they aren’t working.


 
 

Problem

Users reported low relevance on Gaming content found in the Facebook Gaming apps as well as in the Gaming Tab on the core Facebook app.

Hypothesis

Introducing more direct input methods for users to select content they are interested in will increase recommendations for preferred content, increasing content-interaction engagement.

Outcomes

After 6 months and 8 feature additions, watch time and content interaction saw no statistically significant change.

⚠️ There was also no meaningful change in the frequency of followed/liked content being served, but this metric was not originally tracked or observed.

 
 
 

The Problem

Personalization feature usage made no impact… but it should have

Even poorly implemented features would create a measurable change, positive or negative – but after 6 months of running both gentle and extremely aggressive personalization features, the gamer experience showed no meaningful, measurable difference.

This aligned directly with 2 years of anecdotal user feedback that the content in their gaming feed was not content they were interested in. Users, including many on the FB Gaming team, reported mostly seeing content from PUBG, Fortnite, and Fall Guys regardless of their signaled interests (games they followed and content they liked) and viewing behavior.

 

Common Complaints

  • Following a Game seemed to have no impact on whether or not the user would receive content from it

  • Users would be served only a handful of the top creators, with increasing frequency of the same creators over time.

  • Users would signal disinterest in a Game category (option in 3 dot menu) but would continue to see content from the game, regardless of how many times they signaled disinterest.

 

Turning point

Design was prompted by the ML team to continue adding personalization levers even though they didn’t appear to make a meaningful difference to the gamer experience – with no explanation of why these features might not be working, and despite the overall Gaming feed experience getting worse.

After doing some digging, I realized that with access to my own affinity data, I’d be able to reverse engineer my own content experience and break down what may be happening in the ML black box.

 

Identifying the Problem

Seeing into the “Black Box”

The reactive way ML-based systems turn inputs into outputs is often described as a “Black Box” – we can’t directly see what is being used to define what is generated. There isn’t a single code set or weighting system to review, but that doesn’t mean designers can’t actively measure or influence the predictive models being produced – as long as we can access and view both the parameters that make up the experience and its output.

 

Actions

1. Input/Signal

Measurable user interactions in an experience to approximate interest

  • Watch Time

  • Opens/Taps

  • Up/Down Votes

  • Follows


2. Affinity

The stored data per-user that is used to influence content ranking. Often acts as a multiplier or cohort segmentation.

  • Content source: Games, Genres, Creators, Content Types, etc.

  • Measured 0–1 in 0.01 increments


3. Ranked Content

The ranked content output after candidates are processed from hundreds of thousands down to hundreds.

  • Displayed in order of likelihood the user will engage in the desired way

  • impacted by affinity and other heuristic inputs


4. Compare

Compare outcomes (inputs vs. outputs) across multiple users to identify patterns in prediction process.

  • While content sources and subjects may be personalized, the relationship between affinity and output content is likely predictable across multiple users
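The input → affinity → ranking flow above can be sketched in a few lines of Python. Every name, weight, and score below is an illustrative assumption for the sake of the sketch – not the actual system:

```python
# Hypothetical sketch: signals update per-source affinity (0-1, in 0.01
# increments), and affinity then acts as a multiplier on a base relevance
# score when ranking candidates. All weights are invented for illustration.

SIGNAL_WEIGHTS = {"watch_time": 0.05, "open": 0.02, "upvote": 0.03, "follow": 0.10}

def update_affinity(affinity: dict, source: str, signal: str) -> None:
    """Nudge a source's affinity up by the signal's weight, capped at 1.0."""
    current = affinity.get(source, 0.0)
    affinity[source] = round(min(1.0, current + SIGNAL_WEIGHTS[signal]), 2)

def rank_feed(candidates: list, affinity: dict, top_n: int = 20) -> list:
    """Score each candidate as base_score * affinity multiplier, keep top N."""
    def score(item):
        # Unknown sources get a tiny default multiplier, so they rarely surface.
        return item["base_score"] * affinity.get(item["source"], 0.01)
    return sorted(candidates, key=score, reverse=True)[:top_n]

affinity = {}
update_affinity(affinity, "Fortnite", "watch_time")
update_affinity(affinity, "Fortnite", "follow")   # affinity["Fortnite"] == 0.15

candidates = [
    {"source": "Fortnite", "base_score": 0.5},
    {"source": "Fall Guys", "base_score": 0.9},
]
feed = rank_feed(candidates, affinity, top_n=2)   # Fortnite ranks first
```

Note how the multiplier dominates: Fall Guys has a higher base score, but with near-zero affinity it still ranks below Fortnite – which is exactly why comparing inputs against outputs across users can expose the model’s rules.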

 

Seeing the Problem at Scale

30 Users, 60 days

Low content relevance was such a prevalent issue for years that FB Gaming employees jumped at the chance to troubleshoot the feed at scale. Without research support, I put together a guerrilla research plan to observe user controls at scale:

  • 30 users (20 employees, 10 non-employee gaming users)

  • 60 days, check-ins every 2 weeks

Program

  1. Select 3 games and 3 creators, ensure the creators don’t overlap with the games

  2. Follow the selected games and creators

  3. View the feed 3 days a week, only the top 20 items

  4. Interact with only the items served of the selected games and creators

  5. Report matches and any other patterns that appear.

 

Observations

Collectively, we reverse engineered the general model rules by the second check-in, and then began to predictably control our feed experiences by only engaging with specific content in specific ways:

  1. Following a Streamer had a strong and immediate impact. Any game the streamer played, regardless of whether the user watched the streamer play it, received an immediate affinity boost. This continued for as long as the streamer was followed, even with zero viewership.

  2. Following a Game had no impact on the Game’s affinity (0–1 multiplier)

  3. Because viewers rarely followed Streamers who played the games they followed, users were rarely served content for followed games and could not signal positive interest in those Games.

  4. The Gaming feed’s unique mix of unlike content types (livestreams, long-form VODs, short-form clips, group posts) made measuring interest extremely convoluted. The model imposed a clear hierarchy for which content types influenced affinity – heavily prioritizing users who watched livestreams and ignoring users interested in any form of non-video content.
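The first two observed rules can be reproduced as a toy model. All mechanics and constants here are assumptions inferred from the study, not the actual implementation:

```python
# Toy reproduction of observed rules 1 and 2:
#   (1) following a Streamer boosts affinity for EVERY game they play,
#   (2) following a Game directly changes nothing.
# Streamer rosters and the boost constant are invented for illustration.

STREAMER_GAMES = {"StreamerA": ["PUBG", "Fortnite"], "StreamerB": ["Fall Guys"]}
STREAMER_FOLLOW_BOOST = 0.2  # illustrative constant

def follow_streamer(affinity: dict, streamer: str) -> None:
    # Rule 1: every game on the streamer's roster gets boosted,
    # whether or not the user ever watches it.
    for game in STREAMER_GAMES[streamer]:
        affinity[game] = min(1.0, affinity.get(game, 0.0) + STREAMER_FOLLOW_BOOST)

def follow_game(affinity: dict, game: str) -> None:
    # Rule 2: a direct Game follow never touches affinity.
    pass

affinity = {}
follow_game(affinity, "Minecraft")      # no effect at all
follow_streamer(affinity, "StreamerA")  # boosts PUBG and Fortnite
```

Under these rules, the only path from a user to a Game’s affinity runs through a Streamer – which is precisely the loop described in the takeaways below.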


Takeaways

1. The only way for a user to reliably increase affinity for a Game is to Follow a streamer who plays it

2. This loop heavily devalued Followed games in the recommendation system: because users were rarely served content from games they followed, they could not confirm interest by watching it. Over time, the system proved to itself that Following a game was not an indication that a viewer would watch its content – because it never served that content to the user.
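This self-defeating loop can be simulated in a few lines. The decay dynamics and thresholds below are assumptions chosen to illustrate the shape of the failure, not measured values:

```python
# Minimal simulation of the feedback loop in takeaway 2: a followed game
# starts with low affinity, so it is never served, so it generates no
# watch signal, so its affinity decays further. Constants are illustrative.

DECAY = 0.9  # assumed per-cycle decay when a source produces no signal

def simulate(initial_affinity: float, cycles: int, serve_threshold: float = 0.3) -> float:
    affinity = initial_affinity
    for _ in range(cycles):
        served = affinity >= serve_threshold  # low affinity -> never served
        if not served:
            affinity *= DECAY  # no serve -> no watch signal -> decay
    return affinity

# A followed game starting at 0.1 affinity drifts toward zero over 10 cycles.
final = simulate(initial_affinity=0.1, cycles=10)
```

The sign of the loop is what matters: once affinity sits below the serving threshold, nothing a user does with the Game itself can ever pull it back up.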


Outcomes

Gaming Feed and Gaming Apps Deprecated

I presented the findings with two recommendations: rebalance the content in the feed, which would take 18–24 months of costly development – or remove the unified Gaming Feeds and continue surfacing Gaming content on the Watch and Group surfaces, allowing the Gaming tab to better serve the small but growing FB user base who prefer in-app, web-based mini games.


Takeaways for Designers:

  1. Use anecdotes to find problems; use data to test and confirm resolutions

  2. Technical problems don’t always require deep technical understanding to solve

  3. Sometimes, deprecating a feature is the best way to achieve success