Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads
This post first appeared on Engineering at Meta.

Meta uses AI recommendation systems (RecSys) to improve people's experiences and advertisers' results. To push performance further, Meta is scaling its Ads Recommender runtime models to LLM scale and complexity, with the goal of a deeper understanding of people's interests and intent. That scaling creates a challenge described as the "inference trilemma": larger, more complex models demand more compute and memory, yet a global service for billions of people still has to keep latency low and remain cost-efficient.
To address this challenge, Meta developed the Meta Adaptive Ranking Model, which bends the inference scaling curve while delivering high return on investment (ROI) and industry-leading efficiency. It replaces the traditional "one-size-fits-all" inference approach with intelligent request routing: model complexity is matched dynamically to a rich understanding of each person's context and intent, so that every request is served by the most effective and efficient model. This lets Meta Ads hold the strict sub-second latency the platform requires while still providing a high-quality experience for every user.
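The routing idea above can be illustrated with a minimal sketch. It is not Meta's implementation: the tier names, latency and cost figures, the `context_score` signal, and the selection rule are all hypothetical, chosen only to show how a router might pick the most capable model that fits a request's latency budget and context.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelTier:
    """A hypothetical ranking model variant; all numbers are illustrative."""
    name: str
    est_latency_ms: float     # assumed per-request latency estimate
    relative_cost: float      # assumed compute cost relative to the smallest tier
    min_context_score: float  # assumed context richness needed to benefit from this tier


@dataclass(frozen=True)
class Request:
    request_id: str
    context_score: float      # hypothetical 0..1 summary of context/intent richness
    latency_budget_ms: float  # time available for ranking this request


# Tiers from cheapest to most capable; values are made up for illustration.
TIERS = (
    ModelTier("small", est_latency_ms=20.0, relative_cost=1.0, min_context_score=0.0),
    ModelTier("medium", est_latency_ms=80.0, relative_cost=4.0, min_context_score=0.4),
    ModelTier("large", est_latency_ms=300.0, relative_cost=20.0, min_context_score=0.8),
)


def route(request: Request, tiers: Sequence[ModelTier] = TIERS) -> ModelTier:
    """Pick the most capable tier that fits the latency budget and that the
    request's context is rich enough to benefit from; otherwise fall back
    to the cheapest tier."""
    eligible = [
        t for t in tiers
        if t.est_latency_ms <= request.latency_budget_ms
        and request.context_score >= t.min_context_score
    ]
    if not eligible:
        return min(tiers, key=lambda t: t.relative_cost)
    return max(eligible, key=lambda t: t.relative_cost)


if __name__ == "__main__":
    for req in (
        Request("rich-context", context_score=0.9, latency_budget_ms=500.0),
        Request("tight-budget", context_score=0.9, latency_budget_ms=100.0),
        Request("sparse-context", context_score=0.1, latency_budget_ms=500.0),
    ):
        print(req.request_id, "->", route(req).name)
```

In this sketch the expensive tier is used only when the request both has room in its latency budget and carries enough context to justify the cost. That is one simple way to read "aligning model complexity with context and intent"; a production router would presumably use learned signals rather than fixed thresholds.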
Serving LLM-scale models at Meta's scale required a fundamental rethinking of the inference stack. Three key innovations have driven this transformation:
1. **Inference-Efficient Model Scaling**: By shifting to a request-centric architecture, the Adaptive Ranking Model serves an LLM-scale model at sub-second latency. This enables a more sophisticated understanding of a person's interests and intent without compromising the user experience.
2. **Model/System Co-Design**: Hardware-aware model architectures align model design with the capabilities and limitations of the underlying hardware system and silicon, which has significantly improved hardware utilization in heterogeneous hardware environments.
3. **Reimagining the Inference Pipeline**: The Adaptive Ranking Model is designed to optimize the inference pipeline end to end, so that each component works with the others to deliver the best overall performance.
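The text does not spell out what "request-centric" means in the first item. One plausible reading, assumed here purely for illustration, is that the expensive user/context computation runs once per request and is reused across all candidate ads, instead of being repeated for every user-ad pair. The encoder, the dot-product scorer, and all names below are hypothetical stand-ins, not Meta's architecture.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


class CountingEncoder:
    """Stand-in for an expensive user/context encoder; counts its invocations."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, user_features: Vector) -> List[float]:
        self.calls += 1
        return [2.0 * x for x in user_features]  # placeholder transformation


def dot(a: Vector, b: Vector) -> float:
    """Toy scoring function: dot product of user and ad vectors."""
    return sum(x * y for x, y in zip(a, b))


def score_pairwise(
    user_features: Vector, ads: Sequence[Vector], encode: Callable[[Vector], Vector]
) -> List[float]:
    """Baseline: the user is re-encoded for every (user, ad) pair."""
    return [dot(encode(user_features), ad) for ad in ads]


def score_request_centric(
    user_features: Vector, ads: Sequence[Vector], encode: Callable[[Vector], Vector]
) -> List[float]:
    """Assumed request-centric variant: encode the user once, reuse for all ads."""
    user_repr = encode(user_features)
    return [dot(user_repr, ad) for ad in ads]


if __name__ == "__main__":
    user = [1.0, 0.5]
    ads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

    pairwise_encoder = CountingEncoder()
    shared_encoder = CountingEncoder()
    pairwise_scores = score_pairwise(user, ads, pairwise_encoder)
    shared_scores = score_request_centric(user, ads, shared_encoder)

    print(pairwise_scores == shared_scores)           # same ranking scores
    print(pairwise_encoder.calls, shared_encoder.calls)  # encoder calls per request
```

Under this assumption the scores are identical, but the encoder cost grows with the number of requests rather than the number of user-ad pairs, which is the kind of saving that would make a much larger user-side model affordable within a fixed latency budget.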
The Meta Adaptive Ranking Model is a significant advance in AI recommendation systems. By addressing the inference trilemma and serving LLM-scale models efficiently, Meta can deliver personalized, high-quality experiences to billions of people worldwide. The result is better engagement for people and more effective targeting and better results for advertisers.










