Making the LinkedIn experimentation engine 20x faster

January 23, 2020

At LinkedIn, we like to say that experimentation is in our blood because no production release at the company happens without experimentation; by “experimentation,” we typically mean “A/B testing.” The company relies on employees to make decisions by analyzing data. Experimentation is a data-driven foundation of the decision-making process, which helps with measuring the precise impact of every change and release, and evaluating whether expectations meet reality.

LinkedIn’s experimentation platform operates at an extremely large scale: it serves up to 800,000 QPS of network calls; it serves about 35,000 concurrently running A/B experiments; it handles up to 23 trillion experiment evaluations per day; the average latency of an experiment evaluation is 700 ns and the 99th percentile is 3 μs; and it is used in about 500 production services.

Source: linkedin.com

Scaling Beyond a Billion Transactions Per Day with Sub-second Responses

Observability at Scale: Building Uber’s Alerting Ecosystem

Uber’s software architecture consists of thousands of microservices that empower teams to iterate quickly and support our company’s global growth. These microservices support a variety of solutions, such as mobile applications, internal and infrastructure services, and products along with complex configurations that affect these products at city and sub-city levels. To maintain our growth and architecture, Uber’s Observability team built a robust, scalable metrics and alerting pipeline responsible for detecting, mitigating, and notifying engineers of issues with their services as soon as they occur.

Kubernetes Failure Stories

I started to compile a list of public failure/horror stories related to Kubernetes. It should make it easier for people tasked with operations to find outage reports to learn from. Since we started with Kubernetes at Zalando in 2016, we collected many internal postmortems.
