Cross shard transactions at 10 million requests per second

Cross shard transactions at 10 million requests per second

  • November 9, 2018
Table of Contents

Cross shard transactions at 10 million requests per second

Dropbox stores petabytes of metadata to support user-facing features and to power our production infrastructure. The primary system we use to store this metadata is named Edgestore and is described in a previous blog post, (Re)Introducing Edgestore. In simple terms, Edgestore is a service and abstraction over thousands of MySQL nodes that provides users with strongly consistent, transactional reads and writes at low latency.

Edgestore hides details of physical sharding from the application layer to allow developers to scale out their metadata storage needs without thinking about complexities of data placement and distribution. Central to building a distributed database on top of individual MySQL shards in Edgestore is the ability to collocate related data items together on the same shard. Developers express logical collocation of data via the concept of a colo, indicating that two pieces of data are typically accessed together.

In turn, Edgestore provides low-latency, transactional guarantees for reads and writes within a given colo (by placing them on the same physical MySQL shard), but only best-effort support across colos. While the product use-cases at Dropbox are usually a good fit for collocation, over time we found that certain ones just aren’t easily partitionable. As a simple example, an association between a user and the content they share with another user is unlikely to be collocated, since the users likely live on different shards.

Even if we were to attempt to reorganize physical storage such that related colos land on the same physical shards, we would never get a perfect cut of data.

Source: dropbox.com

Share :
comments powered by Disqus

Related Posts

Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads

Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads

Cluster management, a common software infrastructure among technology companies, aggregates compute resources from a collection of physical hosts into a shared resource pool, amplifying compute power and allowing for the flexible use of data center hardware. At Uber, cluster management provides an abstraction layer for various workloads. With the increasing scale of our business, the efficient use of cluster resources becomes very important.

Read More
20 Best YouTube channels for AI and machine learning

20 Best YouTube channels for AI and machine learning

What are the most interesting and informative YouTube channels about artificial intelligence (AI) and machine learning? Subscribe to these 20 high-quality channels today to stay up to date with the latest AI and machine learning breakthroughs. Siraj Raval:

Read More
Why React’s new Hooks API is a game changer

Why React’s new Hooks API is a game changer

I have been developing with React since it’s early days and during that time there have been many attempts by both influencers, as well as the core team to improve the API and patterns developers are using to creating software. One of the biggest challenges we have had was how to share behaviour neatly between components to enable reuse or even just separation of concerns. Every single solution proposed up until this point had some problems associated with it.

Read More