Systems Papers

By Micah Lerner

I write about papers from the worlds of distributed systems, operating systems, and computer security (among other technical topics!).

Previous issues are available on www.micahlerner.com

1.1K subscribers, 16 issues

#16・

Systems Papers - Seven years in the life of Hypergiants' off-nets

Hello new and old subscribers! This week's paper is Seven years in the life of Hypergiants' off-nets. Many large tech organizations (called hypergiants) serve multimedia content to users all around the world. Serving this content with low latency poses difficul…

#15・

Systems Papers - Automatic Reliability Testing For Cluster Management Controllers

Hello new and old subscribers! This week’s paper is Automatic Reliability Testing For Cluster Management Controllers. The paper discusses an open source testing framework that has caught numerous hard-to-find bugs in the components of a Kubernetes cluster resp…

#14・

Systems Papers - Metastable Failures in the Wild

Hello new and old subscribers! This week’s paper is Metastable Failures in the Wild. @AlekseyCharapko (one of the authors) wrote: "Metastable failures feed and strengthen their own failed condition. The main characteristic...is a positive feedback loop that ke…

#13・

Systems Papers - Sundial: Fault-tolerant Clock Synchronization for Datacenters

Hello new and old subscribers! This week’s paper is Sundial: Fault-tolerant Clock Synchronization for Datacenters. Clock synchronization is an important primitive for distributed systems, and the paper discusses the design and implementation of clock synchroni…

#12・

Systems Papers - Data-Parallel Actors: A Programming Model for Scalable Query Serving Systems

Hello new and old subscribers! This week’s paper is Data-Parallel Actors: A Programming Model for Scalable Query Serving Systems. The research describes an actor-based framework for building query-serving systems, a class of database that predominantly respond…

#11・

Systems Papers - Druid: A Real-time Analytical Data Store

Hello new and old subscribers! This week’s paper is Druid: A Real-time Analytical Data Store. Druid is an open-source database designed for low-latency near-real-time and historical data analysis. It is used across industry for a variety of applications - …

#10・

Monarch: Google’s Planet-Scale In-Memory Time Series Database

Hello new and old subscribers, this week’s paper is Monarch: Google’s Planet-Scale In-Memory Time Series Database. While Monarch is neither the first nor the last time series database, the system makes some interesting design decisions to trade consistency for av…

#9・

Systems Papers - The Ties that un-Bind: Decoupling IP from web services and sockets for robust addressing agility at CDN-scale

Hello new and old subscribers, this week's paper is The Ties that un-Bind: Decoupling IP from web services and sockets for robust addressing agility at CDN-scale. It discusses how Cloudflare changed its CDN architecture to dramatically reduce IP address use us…

#8・

Systems Papers - Shard Manager: A Generic Shard Management Framework for Geo-distributed Applications

Hello new and old subscribers! This week’s paper, Shard Manager: A Generic Shard Management Framework for Geo-distributed Applications, describes a system for running sharded applications at scale within Facebook (link to the paper review here). Application sha…

#7・

Systems Papers - ghOSt: Fast & Flexible User-Space Delegation of Linux Scheduling

Hello new and old subscribers! This week’s paper, ghOSt: Fast & Flexible User-Space Delegation of Linux Scheduling, describes a system for implementing Linux scheduling in user-space. The article is best enjoyed on my blog. Discussion on Hacker News. Custom s…

#6・

Systems Papers - Kangaroo: Caching Billions of Tiny Objects on Flash

This week’s paper, Kangaroo: Caching Billions of Tiny Objects on Flash, won a best paper award at SOSP. It describes a system that uses both flash and memory to cheaply and efficiently cache data at scale. Previous academic and industry research demonstrates s…

#5・

Systems Papers - Faster and Cheaper Serverless Computing on Harvested Resources

This week’s paper review, Faster and Cheaper Serverless Computing on Harvested Resources, evaluates running serverless workloads on dynamically resizing virtual machines, called Harvest VMs. Harvest VMs are able to shrink and expand to use unclaimed resources…

#4・

Log-structured protocols in Delos

Hello new and old subscribers! This week’s paper review, Log-structured Protocols in Delos, discusses a critical component of Facebook’s system for storing control plane data, like scheduler metadata and configuration - according to the authors, Delos is replac…

#3・

The Demikernel Datapath OS Architecture for Microsecond-scale Datacenter Systems

Hello new and old subscribers! This week's paper review is The Demikernel Datapath OS Architecture for Microsecond-scale Datacenter Systems. It is one in a series we will be reading from SOSP. Demikernel is an operating systems architecture designed for an age …

#2・

Rudra: Finding Memory Safety Bugs in Rust at the Ecosystem Scale

Hello new and old subscribers! This week's paper is Rudra: Finding Memory Safety Bugs in Rust at the Ecosystem Scale. Rudra is a system for finding memory safety bugs in code written with the Rust programming language. While the language is well known for its …

#1・

RAMP-TAO: Layering atomic transactions on Facebook’s online graph store

Hello new and old subscribers,