<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Micah Learns]]></title><description><![CDATA[ Following my curiosity and writing about Computer Science/Systems research]]></description><link>https://newsletter.micahlerner.com</link><image><url>https://newsletter.micahlerner.com/img/substack.png</url><title>Micah Learns</title><link>https://newsletter.micahlerner.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 11 Apr 2026 19:07:43 GMT</lastBuildDate><atom:link href="https://newsletter.micahlerner.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Micah Lerner]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[csresearch@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[csresearch@substack.com]]></itunes:email><itunes:name><![CDATA[Micah Lerner]]></itunes:name></itunes:owner><itunes:author><![CDATA[Micah Lerner]]></itunes:author><googleplay:owner><![CDATA[csresearch@substack.com]]></googleplay:owner><googleplay:email><![CDATA[csresearch@substack.com]]></googleplay:email><googleplay:author><![CDATA[Micah Lerner]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Resiliency at Scale: Managing Google’s TPUv4 Machine Learning Supercomputer]]></title><description><![CDATA[Resiliency at Scale: Managing Google&#8217;s TPUv4 Machine Learning Supercomputer]]></description><link>https://newsletter.micahlerner.com/p/resiliency-at-scale-managing-googles</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/resiliency-at-scale-managing-googles</guid><dc:creator><![CDATA[Micah Lerner]]></dc:creator><pubDate>Mon, 27 Jan 2025 00:41:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!PDa_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://www.usenix.org/conference/nsdi24/presentation/zu">Resiliency at Scale: Managing Google&#8217;s TPUv4 Machine Learning Supercomputer</a></p><h2>What is the research and why does it matter?</h2><p>This paper shares Google&#8217;s techniques for operating a large cluster of machine learning resources (specifically, <a href="https://blog.google/technology/ai/difference-cpu-gpu-tpu-trillium/">tensor processing units, aka &#8220;TPUs&#8221;</a>) reliably and at scale.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Micah Learns! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>A challenge the research discusses stems from the requirements of modern AI, which need significant computing power to train and serve models - in systems this large, component breakage leads to costly downtime in training and serving, a topic I previously touched on in <a href="https://www.micahlerner.com/2024/01/30/gemini-fast-failure-recovery-in-distributed-training-with-in-memory-checkpoints.html">Gemini: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints</a>. The key insight from the paper is that dynamic network reconfiguration (e.g. to route network traffic from training models around faulty TPUs and to working ones) dramatically improves availability and scalability of the system.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PDa_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PDa_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png 424w, https://substackcdn.com/image/fetch/$s_!PDa_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png 848w, https://substackcdn.com/image/fetch/$s_!PDa_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png 1272w, https://substackcdn.com/image/fetch/$s_!PDa_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PDa_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png" width="527" height="389" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:389,&quot;width&quot;:527,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PDa_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png 424w, https://substackcdn.com/image/fetch/$s_!PDa_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png 848w, https://substackcdn.com/image/fetch/$s_!PDa_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png 1272w, https://substackcdn.com/image/fetch/$s_!PDa_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h2>How does the system work?</h2><p>There are five main technical components of the system:</p><ul><li><p>TPU chips and their groupings into <em>cubes</em> (a &#8220;cube is a hardware unit with 64 TPU chips arranged in a 4x4x4 3D mesh&#8221;) and <em>pods</em> (64 cubes).</p></li><li><p>The <em>inter-chip interconnect (ICI)</em>, that directly interconnects TPUs to allow device-to-device communication (the paper cites <a href="https://blogs.nvidia.com/blog/what-is-rdma/">Remote Direct Memory Access, aka RDMA</a>) without involving the CPUs. The ICI is like the &#8220;highway&#8221; that connects network traffic.</p></li><li><p><em>Optical circuit switches (OCSes)</em> which contain mirrors that actually point the network traffic (in the form of light) on the right &#8220;highway&#8221; (provided by the ICI). This technology is discussed in more detail in <a href="https://research.google/pubs/jupiter-evolving-transforming-googles-datacenter-network-via-optical-circuit-switches-and-software-defined-networking/">Jupiter evolving: Transforming Google&#8217;s datacenter network via optical circuit switches and software-defined networking</a>.</p></li><li><p>The <em>Borg cluster manager</em> (described in <a href="https://research.google/pubs/large-scale-cluster-management-at-google-with-borg/">previous research</a>) combined with a <em>Pod Manager</em> to handle TPU-specific considerations (e.g. ensuring Pod connectivity and health).</p></li><li><p>Code that allows TPUs to connect to one another (<code>liptpunet</code>) and evaluates hardware health (<code>healthd</code>).</p></li></ul><p>The key insight of the paper is that through the combination of existing general technologies (e.g. the Borg cluster scheduler) and TPU-specific adaptations (e.g. dynamically reconfiguring the network if a TPU pod has problems), it is possible to make a more resilient system that recovers from hardware failure/degradation quickly.</p><p>The design of the system is influenced by previous versions where connections between TPUs were static (in contrast with the new system where connections can be easily reconfigured) - &#8220;in a static pod, all resources in a contiguous set of nodes must be simultaneously healthy to be assigned to a user, which becomes combinatorially less likely as the system scales&#8221;.</p><p>Furthermore, static configurations posed three other challenges:</p><ul><li><p><em>Maintenance</em>: updating any part of the stack that connects TPUs mandated downtime.</p></li><li><p><em>Workload defragmentation</em>: a job had to be scheduled across contiguous GPUs, which made it hard to schedule jobs at different priorities (e.g. training jobs vs smaller experiments). With dynamic network reconfiguration, jobs can shift resources and the connections to use them on demand.</p></li><li><p><em>Deployment lead time</em>: a TPU pod wasn&#8217;t available until all of the underlying resources were ready and installed. Now, TPU pods are usable even when only partially deployed.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SMli!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9b79ec-e454-4afa-87d9-4804b4eb9e68_525x240.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SMli!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9b79ec-e454-4afa-87d9-4804b4eb9e68_525x240.png 424w, https://substackcdn.com/image/fetch/$s_!SMli!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9b79ec-e454-4afa-87d9-4804b4eb9e68_525x240.png 848w, https://substackcdn.com/image/fetch/$s_!SMli!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9b79ec-e454-4afa-87d9-4804b4eb9e68_525x240.png 1272w, https://substackcdn.com/image/fetch/$s_!SMli!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9b79ec-e454-4afa-87d9-4804b4eb9e68_525x240.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SMli!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9b79ec-e454-4afa-87d9-4804b4eb9e68_525x240.png" width="525" height="240" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c9b79ec-e454-4afa-87d9-4804b4eb9e68_525x240.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:240,&quot;width&quot;:525,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SMli!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9b79ec-e454-4afa-87d9-4804b4eb9e68_525x240.png 424w, https://substackcdn.com/image/fetch/$s_!SMli!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9b79ec-e454-4afa-87d9-4804b4eb9e68_525x240.png 848w, https://substackcdn.com/image/fetch/$s_!SMli!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9b79ec-e454-4afa-87d9-4804b4eb9e68_525x240.png 1272w, https://substackcdn.com/image/fetch/$s_!SMli!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9b79ec-e454-4afa-87d9-4804b4eb9e68_525x240.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h3>How does reconfigurability work?</h3><p>A key part of reconfigurability is quickly changing <em>connectivity</em> between TPU cubes/pods in response to failure or degradation. Two components work in concert to provide connectivity - the <em>inter-chip interconnect (ICI)</em> connects TPUs within a machine (4 TPU chips) and cube (16 machines), while the <em>Optical circuit switches (OCSes)</em> connect cubes (each containing 96 TPUs).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cjNm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20cd0a3e-be7b-4d88-9ee2-6572aea28fa9_524x449.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cjNm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20cd0a3e-be7b-4d88-9ee2-6572aea28fa9_524x449.png 424w, https://substackcdn.com/image/fetch/$s_!cjNm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20cd0a3e-be7b-4d88-9ee2-6572aea28fa9_524x449.png 848w, https://substackcdn.com/image/fetch/$s_!cjNm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20cd0a3e-be7b-4d88-9ee2-6572aea28fa9_524x449.png 1272w, https://substackcdn.com/image/fetch/$s_!cjNm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20cd0a3e-be7b-4d88-9ee2-6572aea28fa9_524x449.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cjNm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20cd0a3e-be7b-4d88-9ee2-6572aea28fa9_524x449.png" width="524" height="449" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/20cd0a3e-be7b-4d88-9ee2-6572aea28fa9_524x449.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:449,&quot;width&quot;:524,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cjNm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20cd0a3e-be7b-4d88-9ee2-6572aea28fa9_524x449.png 424w, https://substackcdn.com/image/fetch/$s_!cjNm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20cd0a3e-be7b-4d88-9ee2-6572aea28fa9_524x449.png 848w, https://substackcdn.com/image/fetch/$s_!cjNm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20cd0a3e-be7b-4d88-9ee2-6572aea28fa9_524x449.png 1272w, https://substackcdn.com/image/fetch/$s_!cjNm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20cd0a3e-be7b-4d88-9ee2-6572aea28fa9_524x449.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6XXk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ec2c1b-c882-4ac0-9cb1-f54d76d538f8_528x685.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6XXk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ec2c1b-c882-4ac0-9cb1-f54d76d538f8_528x685.png 424w, https://substackcdn.com/image/fetch/$s_!6XXk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ec2c1b-c882-4ac0-9cb1-f54d76d538f8_528x685.png 848w, https://substackcdn.com/image/fetch/$s_!6XXk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ec2c1b-c882-4ac0-9cb1-f54d76d538f8_528x685.png 1272w, https://substackcdn.com/image/fetch/$s_!6XXk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ec2c1b-c882-4ac0-9cb1-f54d76d538f8_528x685.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6XXk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ec2c1b-c882-4ac0-9cb1-f54d76d538f8_528x685.png" width="528" height="685" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8ec2c1b-c882-4ac0-9cb1-f54d76d538f8_528x685.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:685,&quot;width&quot;:528,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6XXk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ec2c1b-c882-4ac0-9cb1-f54d76d538f8_528x685.png 424w, https://substackcdn.com/image/fetch/$s_!6XXk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ec2c1b-c882-4ac0-9cb1-f54d76d538f8_528x685.png 848w, https://substackcdn.com/image/fetch/$s_!6XXk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ec2c1b-c882-4ac0-9cb1-f54d76d538f8_528x685.png 1272w, https://substackcdn.com/image/fetch/$s_!6XXk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ec2c1b-c882-4ac0-9cb1-f54d76d538f8_528x685.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>The <em>ICI</em> handles multiple levels of the networking stack, with technologies like RDMA (which supports <a href="https://tech.preferred.jp/en/blog/technologies-behind-distributed-deep-learning-allreduce/">collectives</a> like &#8220;combine all of the data from other TPUs into a single result&#8221;) at the top-level.</p><p>The <em>libtpunet</em> software is responsible for setting up the <em>inter-chip interconnect (ICI)</em> according to a job&#8217;s needs, as well as discovering (using breadth first search, who said Leetcode isn&#8217;t useful!) and monitoring links (the latter of which becomes a factor in reconfiguration). The data generated by <code>libtpunet</code> also informs scheduling decisions.</p><p>Connectivity changes at three points in the lifecycle of a TPU cube: <em>on job start</em>, <em>on failure</em>, and <em>on migration/preemption</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!96XX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F260200e7-a031-4ccd-8e5e-9eb11c234fd9_1090x515.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!96XX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F260200e7-a031-4ccd-8e5e-9eb11c234fd9_1090x515.png 424w, https://substackcdn.com/image/fetch/$s_!96XX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F260200e7-a031-4ccd-8e5e-9eb11c234fd9_1090x515.png 848w, https://substackcdn.com/image/fetch/$s_!96XX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F260200e7-a031-4ccd-8e5e-9eb11c234fd9_1090x515.png 1272w, https://substackcdn.com/image/fetch/$s_!96XX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F260200e7-a031-4ccd-8e5e-9eb11c234fd9_1090x515.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!96XX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F260200e7-a031-4ccd-8e5e-9eb11c234fd9_1090x515.png" width="1090" height="515" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/260200e7-a031-4ccd-8e5e-9eb11c234fd9_1090x515.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:515,&quot;width&quot;:1090,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!96XX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F260200e7-a031-4ccd-8e5e-9eb11c234fd9_1090x515.png 424w, https://substackcdn.com/image/fetch/$s_!96XX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F260200e7-a031-4ccd-8e5e-9eb11c234fd9_1090x515.png 848w, https://substackcdn.com/image/fetch/$s_!96XX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F260200e7-a031-4ccd-8e5e-9eb11c234fd9_1090x515.png 1272w, https://substackcdn.com/image/fetch/$s_!96XX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F260200e7-a031-4ccd-8e5e-9eb11c234fd9_1090x515.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p><em>On job start</em>, after a user submits a job that requires multiple cubes, the <em>Borg scheduler</em> selects which cubes to use. Then, the <em>Pod Manager</em> configures the <em>Optical circuit switches (OCSes)</em> to connect these cubes together in the right pattern. If that happens successfully, the <em>Pod Manager</em> communicates this fact back to Borg, which handles deploying the binary that will be executed and running the job.</p><p>Second, connectivity changes <em>on failure</em>, as detected by two software components (<code>libtpunet</code> and <code>healthd</code>). For example, when a TPU machine breaks down, an <em>optical circuit switch (OCS)</em> needs maintenance, or a network link starts showing errors. The two aforementioned systems provide signals to Borg/Pod Manager, who update the network topology for running jobs to route around failures. Additionally, these two systems reflect current state in the system&#8217;s source of truth, ensuring that future jobs don&#8217;t run into the same issues.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EUYI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dd9542c-b622-4296-90ea-627417193363_532x509.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EUYI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dd9542c-b622-4296-90ea-627417193363_532x509.png 424w, https://substackcdn.com/image/fetch/$s_!EUYI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dd9542c-b622-4296-90ea-627417193363_532x509.png 848w, https://substackcdn.com/image/fetch/$s_!EUYI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dd9542c-b622-4296-90ea-627417193363_532x509.png 1272w, https://substackcdn.com/image/fetch/$s_!EUYI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dd9542c-b622-4296-90ea-627417193363_532x509.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EUYI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dd9542c-b622-4296-90ea-627417193363_532x509.png" width="532" height="509" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7dd9542c-b622-4296-90ea-627417193363_532x509.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:509,&quot;width&quot;:532,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EUYI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dd9542c-b622-4296-90ea-627417193363_532x509.png 424w, https://substackcdn.com/image/fetch/$s_!EUYI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dd9542c-b622-4296-90ea-627417193363_532x509.png 848w, https://substackcdn.com/image/fetch/$s_!EUYI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dd9542c-b622-4296-90ea-627417193363_532x509.png 1272w, https://substackcdn.com/image/fetch/$s_!EUYI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dd9542c-b622-4296-90ea-627417193363_532x509.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NyP_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e3dcea3-7a1b-4dfb-baa8-60478ee2c918_530x425.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NyP_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e3dcea3-7a1b-4dfb-baa8-60478ee2c918_530x425.png 424w, https://substackcdn.com/image/fetch/$s_!NyP_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e3dcea3-7a1b-4dfb-baa8-60478ee2c918_530x425.png 848w, https://substackcdn.com/image/fetch/$s_!NyP_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e3dcea3-7a1b-4dfb-baa8-60478ee2c918_530x425.png 1272w, https://substackcdn.com/image/fetch/$s_!NyP_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e3dcea3-7a1b-4dfb-baa8-60478ee2c918_530x425.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NyP_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e3dcea3-7a1b-4dfb-baa8-60478ee2c918_530x425.png" width="530" height="425" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e3dcea3-7a1b-4dfb-baa8-60478ee2c918_530x425.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:425,&quot;width&quot;:530,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NyP_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e3dcea3-7a1b-4dfb-baa8-60478ee2c918_530x425.png 424w, https://substackcdn.com/image/fetch/$s_!NyP_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e3dcea3-7a1b-4dfb-baa8-60478ee2c918_530x425.png 848w, https://substackcdn.com/image/fetch/$s_!NyP_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e3dcea3-7a1b-4dfb-baa8-60478ee2c918_530x425.png 1272w, https://substackcdn.com/image/fetch/$s_!NyP_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e3dcea3-7a1b-4dfb-baa8-60478ee2c918_530x425.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Lastly, connectivity changes <em>on migration/preemption</em> - this can happen when the cluster manager chooses to de-frag a job into fewer cubes, or relocate a job to make room for a larger one. In this case, the system needs to ensure that the new resources and networking links are configured correctly.</p><h2>How is the research evaluated?</h2><p>To evaluate the system, the paper includes data on the number of reconfigurations, ability to automate maintenance, and the impact of failure recovery mechanisms on performance.</p><p>To evaluate reconfiguration rate, the authors compare the number of <code>xconnect</code> actions with the number of jobs submitted. <code>xconnect</code> handles the low-level changes to networking connections, and varies with the number of jobs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x4Ss!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72eb601b-20be-4499-8e19-253cb00c7653_523x332.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!x4Ss!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72eb601b-20be-4499-8e19-253cb00c7653_523x332.png 424w, https://substackcdn.com/image/fetch/$s_!x4Ss!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72eb601b-20be-4499-8e19-253cb00c7653_523x332.png 848w, https://substackcdn.com/image/fetch/$s_!x4Ss!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72eb601b-20be-4499-8e19-253cb00c7653_523x332.png 1272w, https://substackcdn.com/image/fetch/$s_!x4Ss!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72eb601b-20be-4499-8e19-253cb00c7653_523x332.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!x4Ss!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72eb601b-20be-4499-8e19-253cb00c7653_523x332.png" width="523" height="332" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/72eb601b-20be-4499-8e19-253cb00c7653_523x332.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:332,&quot;width&quot;:523,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!x4Ss!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72eb601b-20be-4499-8e19-253cb00c7653_523x332.png 424w, https://substackcdn.com/image/fetch/$s_!x4Ss!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72eb601b-20be-4499-8e19-253cb00c7653_523x332.png 848w, https://substackcdn.com/image/fetch/$s_!x4Ss!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72eb601b-20be-4499-8e19-253cb00c7653_523x332.png 1272w, https://substackcdn.com/image/fetch/$s_!x4Ss!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72eb601b-20be-4499-8e19-253cb00c7653_523x332.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Additionally, the paper contains information on the failure rate of different system components:</p><blockquote><p>In an average supercomputer, each day, 0.08% of the TPU machines, 0.005% of the ICI cables, and 0.04% of the OCS experience a failure. While these values are small, the number of jobs that are impacted by hardware outages is non-trivial because each supercomputer has a large number of machines, ICIs, and OCS. Machine and ICI outages are automatically tolerated by reconfiguring jobs to use spare healthy cubes.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a0mf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557e1028-d927-4127-b3db-2871537ec7e8_1092x323.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a0mf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557e1028-d927-4127-b3db-2871537ec7e8_1092x323.png 424w, https://substackcdn.com/image/fetch/$s_!a0mf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557e1028-d927-4127-b3db-2871537ec7e8_1092x323.png 848w, https://substackcdn.com/image/fetch/$s_!a0mf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557e1028-d927-4127-b3db-2871537ec7e8_1092x323.png 1272w, https://substackcdn.com/image/fetch/$s_!a0mf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557e1028-d927-4127-b3db-2871537ec7e8_1092x323.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a0mf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557e1028-d927-4127-b3db-2871537ec7e8_1092x323.png" width="1092" height="323" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/557e1028-d927-4127-b3db-2871537ec7e8_1092x323.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:323,&quot;width&quot;:1092,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a0mf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557e1028-d927-4127-b3db-2871537ec7e8_1092x323.png 424w, https://substackcdn.com/image/fetch/$s_!a0mf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557e1028-d927-4127-b3db-2871537ec7e8_1092x323.png 848w, https://substackcdn.com/image/fetch/$s_!a0mf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557e1028-d927-4127-b3db-2871537ec7e8_1092x323.png 1272w, https://substackcdn.com/image/fetch/$s_!a0mf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557e1028-d927-4127-b3db-2871537ec7e8_1092x323.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>The claim about automatically reconfiguring the system (both in response to failures and to handle changing workloads) is supported by data shared at the beginning of the paper - availability of the system grows, even with significantly larger TPU pods. Interestingly, being able to reconfigure <em>at all</em> has a much higher impact on availability than support for fault tolerant routing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PDa_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PDa_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png 424w, https://substackcdn.com/image/fetch/$s_!PDa_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png 848w, https://substackcdn.com/image/fetch/$s_!PDa_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png 1272w, https://substackcdn.com/image/fetch/$s_!PDa_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PDa_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png" width="527" height="389" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:389,&quot;width&quot;:527,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PDa_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png 424w, https://substackcdn.com/image/fetch/$s_!PDa_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png 848w, https://substackcdn.com/image/fetch/$s_!PDa_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png 1272w, https://substackcdn.com/image/fetch/$s_!PDa_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c9bc749-5b62-48d9-bcca-ba24f396ce58_527x389.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Lastly, the paper notes that there is a non-negligible cost to setting up a job for fault-tolerance, although not all jobs take advantage of the technique.</p><p>Specifically, fault-tolerant routing can slow down jobs due to congested network links, most notably for collective operations. This is visible in results for training recommendation models which use &#8220;embeddings&#8221; to represent data (part of training these models is updating embeddings on other chips, which is performed by collectives), and all-to-all communication to access embeddings on different TPUs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xdnr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbedd3b3d-a106-485b-afdc-76759cdecdb6_522x333.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xdnr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbedd3b3d-a106-485b-afdc-76759cdecdb6_522x333.png 424w, https://substackcdn.com/image/fetch/$s_!xdnr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbedd3b3d-a106-485b-afdc-76759cdecdb6_522x333.png 848w, https://substackcdn.com/image/fetch/$s_!xdnr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbedd3b3d-a106-485b-afdc-76759cdecdb6_522x333.png 1272w, https://substackcdn.com/image/fetch/$s_!xdnr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbedd3b3d-a106-485b-afdc-76759cdecdb6_522x333.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xdnr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbedd3b3d-a106-485b-afdc-76759cdecdb6_522x333.png" width="522" height="333" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bedd3b3d-a106-485b-afdc-76759cdecdb6_522x333.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:333,&quot;width&quot;:522,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xdnr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbedd3b3d-a106-485b-afdc-76759cdecdb6_522x333.png 424w, https://substackcdn.com/image/fetch/$s_!xdnr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbedd3b3d-a106-485b-afdc-76759cdecdb6_522x333.png 848w, https://substackcdn.com/image/fetch/$s_!xdnr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbedd3b3d-a106-485b-afdc-76759cdecdb6_522x333.png 1272w, https://substackcdn.com/image/fetch/$s_!xdnr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbedd3b3d-a106-485b-afdc-76759cdecdb6_522x333.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h2>Conclusion</h2><p>The infrastructure challenges this paper addresses have become even more critical with the growth in AI model sizes and associated computational demands (although there is active discussion on <a href="https://x.com/dwarkesh_sp/status/1805613164610175207">scaling walls</a>). I&#8217;d be interested to know whether the approach and pod-structure should or would scale beyond the current implementation - for example, what happens with even larger TPU pods (or is increasing TPU pod size a non-goal)? Lastly, I found the opportunity to reduce the impact of failures even further through the use of &#8220;hot-standbys&#8221; without the use of checkpointing (potentially migrating state from a problematic node), and I&#8217;m looking to future research on that front.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Micah Learns! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[[Paper Review] ServiceRouter: Hyperscale and Minimal Cost Service Mesh at Meta]]></title><description><![CDATA[Hi everyone,]]></description><link>https://newsletter.micahlerner.com/p/paper-review-servicerouter-hyperscale</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/paper-review-servicerouter-hyperscale</guid><dc:creator><![CDATA[Micah Lerner]]></dc:creator><pubDate>Fri, 29 Mar 2024 00:28:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!l8IH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73aaa06-1f01-4475-8ea4-244effe04925_503x359.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Hi everyone,</em></p><p><em>Micah here. This week&#8217;s paper is one of several notable industry papers from last year that I&#8217;ll be writing about over the next few weeks.</em></p><p><em>Enjoy!</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Just arriving? Join my growing community of engineers to receive 1 email per week with a deep dive into cutting-edge Computer Science research.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h4></h4><p><a href="https://www.usenix.org/conference/osdi23/presentation/saokar">ServiceRouter: Hyperscale and Minimal Cost Service Mesh at Meta</a></p><h2>What is the research and why does it matter?</h2><p>Many tech companies have distributed services deployed in the cloud in regions around the world. The systems often depend on each other, meaning that they need to determine which dependencies are where (<a href="https://developer.hashicorp.com/consul/docs/concepts/service-discovery">service discovery</a>), and route the requests across the network (often performed via a <a href="https://buoyant.io/service-mesh-manifesto&quot;">service mesh</a>). Inter-system communication also needs to be highly reliable and load balanced.</p><p>This paper is about Meta&#8217;s infrastructure that implements these capabilities, called ServiceRouter.</p><p>While there are well known open source systems for routing traffic (e.g. <a href="https://linkerd.io/">Linkerd</a>, <a href="https://www.envoyproxy.io/">Envoy</a>, and <a href="https://istio.io/latest/about/service-mesh/">Istio</a>), there are a few interesting components of ServiceRouter:</p><ul><li><p>ServiceRouter supports embedding inside Meta application code, significantly reducing cost from the common pattern of running separate &#8220;service mesh&#8221; infrastructure - the paper suggests that a separately deployed service mesh at Meta scale would need the equivalent of <a href="https://instances.vantage.sh/?selected=t4g.small">1,750,000 AWS t4g.small VMs</a>.</p></li><li><p>ServiceRouter is one of the first pieces of routing infrastructure discussed in research deployed at hyperscale.</p></li><li><p>ServiceRouter is able to natively handle sharded stateful services, unlike open source alternatives<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p></li><li><p>The technology handles global load balancing across regions using the novel idea of &#8220;Latency Rings&#8221;.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ddgB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271b136c-fa94-4174-bc05-f5c9c472a1cc_519x237.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ddgB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271b136c-fa94-4174-bc05-f5c9c472a1cc_519x237.png 424w, https://substackcdn.com/image/fetch/$s_!ddgB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271b136c-fa94-4174-bc05-f5c9c472a1cc_519x237.png 848w, https://substackcdn.com/image/fetch/$s_!ddgB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271b136c-fa94-4174-bc05-f5c9c472a1cc_519x237.png 1272w, https://substackcdn.com/image/fetch/$s_!ddgB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271b136c-fa94-4174-bc05-f5c9c472a1cc_519x237.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ddgB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271b136c-fa94-4174-bc05-f5c9c472a1cc_519x237.png" width="519" height="237" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/271b136c-fa94-4174-bc05-f5c9c472a1cc_519x237.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:237,&quot;width&quot;:519,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ddgB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271b136c-fa94-4174-bc05-f5c9c472a1cc_519x237.png 424w, https://substackcdn.com/image/fetch/$s_!ddgB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271b136c-fa94-4174-bc05-f5c9c472a1cc_519x237.png 848w, https://substackcdn.com/image/fetch/$s_!ddgB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271b136c-fa94-4174-bc05-f5c9c472a1cc_519x237.png 1272w, https://substackcdn.com/image/fetch/$s_!ddgB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271b136c-fa94-4174-bc05-f5c9c472a1cc_519x237.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h2>How does the system work?</h2><h3>Design</h3><p>There are three main functions of ServiceRouter:</p><ul><li><p>Gathering the data that informs how services talk to each other.</p></li><li><p>Distributing that data reliably around the network.</p></li><li><p>Routing a request from one service to another.</p></li></ul><p>To build a source of truth for routing decisions (which the paper calls the <em>Routing Information Base (RIB)</em>), ServiceRouter gathers information from the cluster manager about which services are running where. Importantly, ServiceRouter can also handle stateful services (discussed in my previous paper review on <a href="https://www.micahlerner.com/2022/01/08/shard-manager-a-generic-shard-management-framework-for-geo-distributed-applications.html">ShardManager</a>) - for example, some stateful services will store a specific subset of data on a specific server, so knowing the server alone is not enough.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OyZH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75c6276f-053e-4fca-8d2a-73abf0463347_553x234.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OyZH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75c6276f-053e-4fca-8d2a-73abf0463347_553x234.png 424w, https://substackcdn.com/image/fetch/$s_!OyZH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75c6276f-053e-4fca-8d2a-73abf0463347_553x234.png 848w, https://substackcdn.com/image/fetch/$s_!OyZH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75c6276f-053e-4fca-8d2a-73abf0463347_553x234.png 1272w, https://substackcdn.com/image/fetch/$s_!OyZH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75c6276f-053e-4fca-8d2a-73abf0463347_553x234.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OyZH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75c6276f-053e-4fca-8d2a-73abf0463347_553x234.png" width="553" height="234" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75c6276f-053e-4fca-8d2a-73abf0463347_553x234.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:234,&quot;width&quot;:553,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OyZH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75c6276f-053e-4fca-8d2a-73abf0463347_553x234.png 424w, https://substackcdn.com/image/fetch/$s_!OyZH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75c6276f-053e-4fca-8d2a-73abf0463347_553x234.png 848w, https://substackcdn.com/image/fetch/$s_!OyZH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75c6276f-053e-4fca-8d2a-73abf0463347_553x234.png 1272w, https://substackcdn.com/image/fetch/$s_!OyZH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75c6276f-053e-4fca-8d2a-73abf0463347_553x234.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>ServiceRouter also gathers information that allows it to make decisions about how services talk to each other across clusters (for example, monitoring the latency of traffic from North America to South America). Using these inputs, Service Router&#8217;s control layer produces a dataset it calls the <em>Routing Information Base</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!l8IH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73aaa06-1f01-4475-8ea4-244effe04925_503x359.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l8IH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73aaa06-1f01-4475-8ea4-244effe04925_503x359.png 424w, https://substackcdn.com/image/fetch/$s_!l8IH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73aaa06-1f01-4475-8ea4-244effe04925_503x359.png 848w, https://substackcdn.com/image/fetch/$s_!l8IH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73aaa06-1f01-4475-8ea4-244effe04925_503x359.png 1272w, https://substackcdn.com/image/fetch/$s_!l8IH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73aaa06-1f01-4475-8ea4-244effe04925_503x359.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!l8IH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73aaa06-1f01-4475-8ea4-244effe04925_503x359.png" width="503" height="359" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a73aaa06-1f01-4475-8ea4-244effe04925_503x359.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:359,&quot;width&quot;:503,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!l8IH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73aaa06-1f01-4475-8ea4-244effe04925_503x359.png 424w, https://substackcdn.com/image/fetch/$s_!l8IH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73aaa06-1f01-4475-8ea4-244effe04925_503x359.png 848w, https://substackcdn.com/image/fetch/$s_!l8IH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73aaa06-1f01-4475-8ea4-244effe04925_503x359.png 1272w, https://substackcdn.com/image/fetch/$s_!l8IH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73aaa06-1f01-4475-8ea4-244effe04925_503x359.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>ServiceRouter then distributes the <em>Routing Information Base</em> across infrastructure in the network to inform RPC routing decisions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a_jU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35eb91e1-b0df-480f-91bc-e2c7779bc708_535x434.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a_jU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35eb91e1-b0df-480f-91bc-e2c7779bc708_535x434.png 424w, https://substackcdn.com/image/fetch/$s_!a_jU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35eb91e1-b0df-480f-91bc-e2c7779bc708_535x434.png 848w, https://substackcdn.com/image/fetch/$s_!a_jU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35eb91e1-b0df-480f-91bc-e2c7779bc708_535x434.png 1272w, https://substackcdn.com/image/fetch/$s_!a_jU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35eb91e1-b0df-480f-91bc-e2c7779bc708_535x434.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a_jU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35eb91e1-b0df-480f-91bc-e2c7779bc708_535x434.png" width="535" height="434" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/35eb91e1-b0df-480f-91bc-e2c7779bc708_535x434.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:434,&quot;width&quot;:535,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a_jU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35eb91e1-b0df-480f-91bc-e2c7779bc708_535x434.png 424w, https://substackcdn.com/image/fetch/$s_!a_jU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35eb91e1-b0df-480f-91bc-e2c7779bc708_535x434.png 848w, https://substackcdn.com/image/fetch/$s_!a_jU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35eb91e1-b0df-480f-91bc-e2c7779bc708_535x434.png 1272w, https://substackcdn.com/image/fetch/$s_!a_jU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35eb91e1-b0df-480f-91bc-e2c7779bc708_535x434.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>To implement the part of the system responsible for making routing decisions, ServiceRouter supports three main types of deployments: SRLib, Remote Proxy, and Sidecar Proxy.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Vd2L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7c2d83d-a250-4a79-9f43-495c35970258_1087x322.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Vd2L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7c2d83d-a250-4a79-9f43-495c35970258_1087x322.png 424w, https://substackcdn.com/image/fetch/$s_!Vd2L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7c2d83d-a250-4a79-9f43-495c35970258_1087x322.png 848w, https://substackcdn.com/image/fetch/$s_!Vd2L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7c2d83d-a250-4a79-9f43-495c35970258_1087x322.png 1272w, https://substackcdn.com/image/fetch/$s_!Vd2L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7c2d83d-a250-4a79-9f43-495c35970258_1087x322.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Vd2L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7c2d83d-a250-4a79-9f43-495c35970258_1087x322.png" width="1087" height="322" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f7c2d83d-a250-4a79-9f43-495c35970258_1087x322.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:322,&quot;width&quot;:1087,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Vd2L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7c2d83d-a250-4a79-9f43-495c35970258_1087x322.png 424w, https://substackcdn.com/image/fetch/$s_!Vd2L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7c2d83d-a250-4a79-9f43-495c35970258_1087x322.png 848w, https://substackcdn.com/image/fetch/$s_!Vd2L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7c2d83d-a250-4a79-9f43-495c35970258_1087x322.png 1272w, https://substackcdn.com/image/fetch/$s_!Vd2L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7c2d83d-a250-4a79-9f43-495c35970258_1087x322.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J-VV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa02cd7a8-fefc-4f2a-87a9-1ca2271c4ac3_521x333.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J-VV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa02cd7a8-fefc-4f2a-87a9-1ca2271c4ac3_521x333.png 424w, https://substackcdn.com/image/fetch/$s_!J-VV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa02cd7a8-fefc-4f2a-87a9-1ca2271c4ac3_521x333.png 848w, https://substackcdn.com/image/fetch/$s_!J-VV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa02cd7a8-fefc-4f2a-87a9-1ca2271c4ac3_521x333.png 1272w, https://substackcdn.com/image/fetch/$s_!J-VV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa02cd7a8-fefc-4f2a-87a9-1ca2271c4ac3_521x333.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J-VV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa02cd7a8-fefc-4f2a-87a9-1ca2271c4ac3_521x333.png" width="521" height="333" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a02cd7a8-fefc-4f2a-87a9-1ca2271c4ac3_521x333.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:333,&quot;width&quot;:521,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!J-VV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa02cd7a8-fefc-4f2a-87a9-1ca2271c4ac3_521x333.png 424w, https://substackcdn.com/image/fetch/$s_!J-VV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa02cd7a8-fefc-4f2a-87a9-1ca2271c4ac3_521x333.png 848w, https://substackcdn.com/image/fetch/$s_!J-VV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa02cd7a8-fefc-4f2a-87a9-1ca2271c4ac3_521x333.png 1272w, https://substackcdn.com/image/fetch/$s_!J-VV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa02cd7a8-fefc-4f2a-87a9-1ca2271c4ac3_521x333.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p><em>SRLib</em> embeds the ServiceRouter functionality actually inside the application binary, deeply integrated with application source code. While this introduces some risk (e.g. if the embedded library had a bug or vulnerability, all applications would need to be re-released), it dramatically reduces hardware cost.</p><p>There are several situations in which SRLib performs suboptimally - for example, with traffic that goes across regions, it is preferable to have a smaller set of proxies that perform the RPC forwarding using long-held connections, lowering the overhead of sending RPCs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ATSw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84902c9-34e5-4598-8257-6a44c4d53b0a_531x323.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ATSw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84902c9-34e5-4598-8257-6a44c4d53b0a_531x323.png 424w, https://substackcdn.com/image/fetch/$s_!ATSw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84902c9-34e5-4598-8257-6a44c4d53b0a_531x323.png 848w, https://substackcdn.com/image/fetch/$s_!ATSw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84902c9-34e5-4598-8257-6a44c4d53b0a_531x323.png 1272w, https://substackcdn.com/image/fetch/$s_!ATSw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84902c9-34e5-4598-8257-6a44c4d53b0a_531x323.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ATSw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84902c9-34e5-4598-8257-6a44c4d53b0a_531x323.png" width="531" height="323" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e84902c9-34e5-4598-8257-6a44c4d53b0a_531x323.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:323,&quot;width&quot;:531,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ATSw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84902c9-34e5-4598-8257-6a44c4d53b0a_531x323.png 424w, https://substackcdn.com/image/fetch/$s_!ATSw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84902c9-34e5-4598-8257-6a44c4d53b0a_531x323.png 848w, https://substackcdn.com/image/fetch/$s_!ATSw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84902c9-34e5-4598-8257-6a44c4d53b0a_531x323.png 1272w, https://substackcdn.com/image/fetch/$s_!ATSw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84902c9-34e5-4598-8257-6a44c4d53b0a_531x323.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>ServiceRouter also supports codebases where it is difficult or impossible to embed SRLib directly. The paper cites one example of internal Erlang applications which didn&#8217;t have builtin support for the library, but still wanted to make use of Meta-internal systems.</p><h3>Load Balancing</h3><p>One of the most novel features of ServiceRouter is its approach to global load balancing traffic across regions.</p><p>The system implements this capability using the idea of <em>locality rings</em> for a service:</p><blockquote><p>An RPC client uses cross-region RTTs to estimate its latency to different servers. Starting from ring1, if the client finds any RPC server whose latency is within the latency bound for ring i, it filters out all servers in ring i+1 and above, and randomly samples two servers from ring i. If the service has no servers in ring i, it considers servers in ring i+1, and so forth. SR&#8217;s default setting maps [ring1 ring2 ring3 ring4] to [same region neighboring regions same continent global].</p></blockquote><p>The paper discusses several downsides to this approach, notably that latency alone doesn&#8217;t reflect how servers in a <em>locality ring</em> are being utilized. To solve this shortcoming, ServiceRouter integrates another input to the Routing Information Base - the load of a &#8220;locality ring&#8221;. This data allows ServiceRouter to support functionality like &#8220;route X% of traffic to this locality ring until the load of that locality ring exceeds X%, then send traffic to the next locality ring.&#8221;</p><p>The paper also discusses alternatives to the locality ring approach, including relying solely on RPC latency and feedback from a service about overload to decide when to send traffic to a different locality ring - the authors decided not to follow this approach as they argue that routing would change only under severe overload.</p><h2>How is the research evaluated?</h2><p>The paper evaluates ServiceRouter on four main aspects: its scalability, the cost-savings of an embedded routing library, performance of global load balancing, and ability to handle sharded services.</p><p>To assess scalability, the paper shares data on the number of servers used by services and the requests per second by service:</p><blockquote><p>A small fraction of services are very large while most are very small. Specifically, while 90% of services each use less than 200 servers, 2% of services each use more than 2,000 servers and the largest service uses about 90&#263; servers&#8230;Similarly, while most services have a low RPS, some hyperscale services process billions of RPS.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UUgg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa8204f-e939-4f5e-a8fb-42ac8009672c_525x338.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UUgg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa8204f-e939-4f5e-a8fb-42ac8009672c_525x338.png 424w, https://substackcdn.com/image/fetch/$s_!UUgg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa8204f-e939-4f5e-a8fb-42ac8009672c_525x338.png 848w, https://substackcdn.com/image/fetch/$s_!UUgg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa8204f-e939-4f5e-a8fb-42ac8009672c_525x338.png 1272w, https://substackcdn.com/image/fetch/$s_!UUgg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa8204f-e939-4f5e-a8fb-42ac8009672c_525x338.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UUgg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa8204f-e939-4f5e-a8fb-42ac8009672c_525x338.png" width="525" height="338" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2fa8204f-e939-4f5e-a8fb-42ac8009672c_525x338.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:338,&quot;width&quot;:525,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UUgg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa8204f-e939-4f5e-a8fb-42ac8009672c_525x338.png 424w, https://substackcdn.com/image/fetch/$s_!UUgg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa8204f-e939-4f5e-a8fb-42ac8009672c_525x338.png 848w, https://substackcdn.com/image/fetch/$s_!UUgg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa8204f-e939-4f5e-a8fb-42ac8009672c_525x338.png 1272w, https://substackcdn.com/image/fetch/$s_!UUgg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa8204f-e939-4f5e-a8fb-42ac8009672c_525x338.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JWAB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537fc3c3-6818-4baf-b810-8f9762613113_531x337.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JWAB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537fc3c3-6818-4baf-b810-8f9762613113_531x337.png 424w, https://substackcdn.com/image/fetch/$s_!JWAB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537fc3c3-6818-4baf-b810-8f9762613113_531x337.png 848w, https://substackcdn.com/image/fetch/$s_!JWAB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537fc3c3-6818-4baf-b810-8f9762613113_531x337.png 1272w, https://substackcdn.com/image/fetch/$s_!JWAB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537fc3c3-6818-4baf-b810-8f9762613113_531x337.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JWAB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537fc3c3-6818-4baf-b810-8f9762613113_531x337.png" width="531" height="337" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/537fc3c3-6818-4baf-b810-8f9762613113_531x337.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:337,&quot;width&quot;:531,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JWAB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537fc3c3-6818-4baf-b810-8f9762613113_531x337.png 424w, https://substackcdn.com/image/fetch/$s_!JWAB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537fc3c3-6818-4baf-b810-8f9762613113_531x337.png 848w, https://substackcdn.com/image/fetch/$s_!JWAB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537fc3c3-6818-4baf-b810-8f9762613113_531x337.png 1272w, https://substackcdn.com/image/fetch/$s_!JWAB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537fc3c3-6818-4baf-b810-8f9762613113_531x337.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>The paper also discusses several scalability challenges, specifically with the <em>Routing Information Base</em>, which must store data on Meta&#8217;s ever-changing services and production infrastructure. Interestingly, the authors say that the RIB is not currently a bottleneck, following their work to migrate off of <a href="https://zookeeper.apache.org/">Zookeeper</a> and onto a custom datastore.</p><p>To evaluate hardware cost, the paper compares RPC latency and CPU overhead for Meta&#8217;s raw RPC library (called Thrift), embedded SRLib and the SRProxy - &#8220;across the RPC client and proxy, the SRProxy setup in total consumes more than twice the amount of CPU cycles as the SRLib setup&#8221;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MDDv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c57e142-ce36-4736-9089-ef0801c3d458_546x817.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MDDv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c57e142-ce36-4736-9089-ef0801c3d458_546x817.png 424w, https://substackcdn.com/image/fetch/$s_!MDDv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c57e142-ce36-4736-9089-ef0801c3d458_546x817.png 848w, https://substackcdn.com/image/fetch/$s_!MDDv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c57e142-ce36-4736-9089-ef0801c3d458_546x817.png 1272w, https://substackcdn.com/image/fetch/$s_!MDDv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c57e142-ce36-4736-9089-ef0801c3d458_546x817.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MDDv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c57e142-ce36-4736-9089-ef0801c3d458_546x817.png" width="546" height="817" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c57e142-ce36-4736-9089-ef0801c3d458_546x817.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:817,&quot;width&quot;:546,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MDDv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c57e142-ce36-4736-9089-ef0801c3d458_546x817.png 424w, https://substackcdn.com/image/fetch/$s_!MDDv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c57e142-ce36-4736-9089-ef0801c3d458_546x817.png 848w, https://substackcdn.com/image/fetch/$s_!MDDv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c57e142-ce36-4736-9089-ef0801c3d458_546x817.png 1272w, https://substackcdn.com/image/fetch/$s_!MDDv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c57e142-ce36-4736-9089-ef0801c3d458_546x817.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>The paper also includes several production use cases of SRProxy. One example was for a sharded system that sends traffic cross-region. SRProxy was able to reduce cross-region latency because it reuses connections. Because ServiceRouter was able to effectively support cross-region load balancing, the system didn&#8217;t need to replicate all the shards to all the regions, significantly reducing capacity usage.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fadr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3ac96b-a15c-4470-a0c4-de1e1cebc3dc_526x335.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fadr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3ac96b-a15c-4470-a0c4-de1e1cebc3dc_526x335.png 424w, https://substackcdn.com/image/fetch/$s_!fadr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3ac96b-a15c-4470-a0c4-de1e1cebc3dc_526x335.png 848w, https://substackcdn.com/image/fetch/$s_!fadr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3ac96b-a15c-4470-a0c4-de1e1cebc3dc_526x335.png 1272w, https://substackcdn.com/image/fetch/$s_!fadr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3ac96b-a15c-4470-a0c4-de1e1cebc3dc_526x335.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fadr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3ac96b-a15c-4470-a0c4-de1e1cebc3dc_526x335.png" width="526" height="335" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e3ac96b-a15c-4470-a0c4-de1e1cebc3dc_526x335.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:335,&quot;width&quot;:526,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fadr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3ac96b-a15c-4470-a0c4-de1e1cebc3dc_526x335.png 424w, https://substackcdn.com/image/fetch/$s_!fadr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3ac96b-a15c-4470-a0c4-de1e1cebc3dc_526x335.png 848w, https://substackcdn.com/image/fetch/$s_!fadr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3ac96b-a15c-4470-a0c4-de1e1cebc3dc_526x335.png 1272w, https://substackcdn.com/image/fetch/$s_!fadr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3ac96b-a15c-4470-a0c4-de1e1cebc3dc_526x335.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>To evaluate load balancing, the paper considers the permutations of same-region and cross-region load balancing for both sharded and unsharded services. For same region traffic of unsharded services, load balancing is low, represented with a low &#8220;coefficient of variation&#8221; for CPU usage and outstanding requests. </p><p>The story for sharded services is more complicated due to inherent shard imbalance - &#8220;some shards are hot (receiving a lot of traffic) while others are cold (receiving little traffic), due to the nature of data stored in the shards.&#8221; In other words, even if ServiceRouter performs perfectly, there will always be some variation of load between shards.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ycEq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58fc9f0-50c7-4f10-9cca-ea766c8da65a_521x485.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ycEq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58fc9f0-50c7-4f10-9cca-ea766c8da65a_521x485.png 424w, https://substackcdn.com/image/fetch/$s_!ycEq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58fc9f0-50c7-4f10-9cca-ea766c8da65a_521x485.png 848w, https://substackcdn.com/image/fetch/$s_!ycEq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58fc9f0-50c7-4f10-9cca-ea766c8da65a_521x485.png 1272w, https://substackcdn.com/image/fetch/$s_!ycEq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58fc9f0-50c7-4f10-9cca-ea766c8da65a_521x485.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ycEq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58fc9f0-50c7-4f10-9cca-ea766c8da65a_521x485.png" width="521" height="485" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c58fc9f0-50c7-4f10-9cca-ea766c8da65a_521x485.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:485,&quot;width&quot;:521,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ycEq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58fc9f0-50c7-4f10-9cca-ea766c8da65a_521x485.png 424w, https://substackcdn.com/image/fetch/$s_!ycEq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58fc9f0-50c7-4f10-9cca-ea766c8da65a_521x485.png 848w, https://substackcdn.com/image/fetch/$s_!ycEq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58fc9f0-50c7-4f10-9cca-ea766c8da65a_521x485.png 1272w, https://substackcdn.com/image/fetch/$s_!ycEq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58fc9f0-50c7-4f10-9cca-ea766c8da65a_521x485.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>To evaluate global load balancing with locality rings, the paper includes an example of an incident where traffic spilled cross-region, and SR was able to balance load below the 75% locality threshold.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JFr4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ca207b-52f7-4614-92f8-a98576cf5b0f_515x444.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JFr4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ca207b-52f7-4614-92f8-a98576cf5b0f_515x444.png 424w, https://substackcdn.com/image/fetch/$s_!JFr4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ca207b-52f7-4614-92f8-a98576cf5b0f_515x444.png 848w, https://substackcdn.com/image/fetch/$s_!JFr4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ca207b-52f7-4614-92f8-a98576cf5b0f_515x444.png 1272w, https://substackcdn.com/image/fetch/$s_!JFr4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ca207b-52f7-4614-92f8-a98576cf5b0f_515x444.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JFr4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ca207b-52f7-4614-92f8-a98576cf5b0f_515x444.png" width="515" height="444" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/36ca207b-52f7-4614-92f8-a98576cf5b0f_515x444.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:444,&quot;width&quot;:515,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JFr4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ca207b-52f7-4614-92f8-a98576cf5b0f_515x444.png 424w, https://substackcdn.com/image/fetch/$s_!JFr4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ca207b-52f7-4614-92f8-a98576cf5b0f_515x444.png 848w, https://substackcdn.com/image/fetch/$s_!JFr4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ca207b-52f7-4614-92f8-a98576cf5b0f_515x444.png 1272w, https://substackcdn.com/image/fetch/$s_!JFr4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ca207b-52f7-4614-92f8-a98576cf5b0f_515x444.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Lastly, the paper shows that traffic to sharded services makes up a significant portion of total traffic, highlighting the requirement that this idea needs to be natively supported in Meta&#8217;s service mesh.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9ACN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb006b748-0846-427d-b1a9-bb638f8d30a7_522x571.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9ACN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb006b748-0846-427d-b1a9-bb638f8d30a7_522x571.png 424w, https://substackcdn.com/image/fetch/$s_!9ACN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb006b748-0846-427d-b1a9-bb638f8d30a7_522x571.png 848w, https://substackcdn.com/image/fetch/$s_!9ACN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb006b748-0846-427d-b1a9-bb638f8d30a7_522x571.png 1272w, https://substackcdn.com/image/fetch/$s_!9ACN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb006b748-0846-427d-b1a9-bb638f8d30a7_522x571.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9ACN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb006b748-0846-427d-b1a9-bb638f8d30a7_522x571.png" width="522" height="571" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b006b748-0846-427d-b1a9-bb638f8d30a7_522x571.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:571,&quot;width&quot;:522,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9ACN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb006b748-0846-427d-b1a9-bb638f8d30a7_522x571.png 424w, https://substackcdn.com/image/fetch/$s_!9ACN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb006b748-0846-427d-b1a9-bb638f8d30a7_522x571.png 848w, https://substackcdn.com/image/fetch/$s_!9ACN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb006b748-0846-427d-b1a9-bb638f8d30a7_522x571.png 1272w, https://substackcdn.com/image/fetch/$s_!9ACN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb006b748-0846-427d-b1a9-bb638f8d30a7_522x571.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h2>Conclusion</h2><p>While service meshes aren&#8217;t necessarily novel, ServiceRouter&#8217;s deployment at scale, along with its implementation of global load balancing and support for sharded services is unique. </p><p>Load balancing cross region at scale, in particular to handle reliability issues, is non-trivial. I&#8217;d be interested in hearing more about how teams formulate locality rings (as from the paper, it seems like some custom tuning is involved). Furthermore, the ideas behind locality rings seems ripe for further development - are latency and CPU usage the only factors that locality rings should be limited to? Relying only on those two metrics seems like a potential source of further instability during an incident (e.g. if a region was <em>quickly</em> returning many errors, its CPU utilization might appear low, meaning that ServiceRouter would send requests there, potentially exacerbating an outage with overload).</p><p>Lastly, embedding SRLib in an application&#8217;s code saves resources, but seems like it would introduce risk. For example, if SRLib had a fleet-wide security vulnerability or performance regression that couldn&#8217;t be turned off, what would the impact to services and developers be?</p><p>In future paper reviews, I&#8217;ll continue diving deeper on sharded services in hyperscale environments - for example, I&#8217;m planning on comparing ServiceRouter and its discussion of sharded services with Google&#8217;s paper from 2016 on <a href="https://www.usenix.org/system/files/conference/osdi16/osdi16-adya.pdf">Slicer</a>.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>The paper makes the claim that open source service meshes &#8216;exclusively focus on un-sharded services&#8217;, which I&#8217;m slightly confused about - one of my friends <a href="https://twitter.com/MaxGurewitz">Max</a> pointed out that people run sharded services like <a href="https://events.istio.io/istiocon-2021/sessions/the-benefits-of-integrating-apache-kafka-with-istio-on-kubernetes/">Kafka</a> on Kubernetes.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>The paper also mentions that Istio <a href="https://istio.io/latest/docs/tasks/traffic-management/locality-load-balancing/">supports a simpler version</a> of this idea. .</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>The paper also mentions a fourth, SRLookaside which is now deprecated.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[[Paper Review] A Cloud-Scale Characterization of Remote Procedure Calls]]></title><description><![CDATA[Hi everyone,]]></description><link>https://newsletter.micahlerner.com/p/paper-review-a-cloud-scale-characterization</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/paper-review-a-cloud-scale-characterization</guid><dc:creator><![CDATA[Micah Lerner]]></dc:creator><pubDate>Tue, 05 Mar 2024 01:00:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Bcex!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3a6534-feb1-4605-aa93-fd25e99f04a7_587x334.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Hi everyone,</em></p><p><em>Micah here. This week&#8217;s paper is one of several papers I&#8217;ll be reading from 2023&#8217;s Symposium on Operating Systems Principles (SOSP). </em></p><p><em>Enjoy!</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Just arriving? Join my growing community of engineers to receive 1 email per week with a deep dive into cutting-edge Computer Science research.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p><a href="https://dl.acm.org/doi/abs/10.1145/3600006.3613156">&#8220;A Cloud-Scale Characterization of Remote Procedure Calls&#8221;</a></p><h2>What is the research and why does it matter?</h2><p>This paper is slightly different from others I&#8217;ve written about recently - rather than a novel system design, it contains a characterization of a production system, with the goals of sharing data with the research community.</p><p>Specifically, the research dives deep on contributors to Remote Procedure Call (RPC) latency and performance in Google&#8217;s hyper-scale systems, including Google Search, Google Maps, Gmail, and YouTube. In some areas, this data matches existing research and points towards the benefits that further investment could provide. Other datapoints don&#8217;t match up with previous thinking, indicating the possibility of new research threads or focused interest in existing ideas!</p><h2>Characteristics of RPCs at Hyperscale</h2><p>The paper focuses on production traffic that uses Google&#8217;s internal RPC library, <a href="https://sre.google/sre-book/production-environment/">Stubby</a>. This traffic powers first-party services and their accesses to other internal systems and databases. The authors use data from Monarch (<a href="https://www.micahlerner.com/2022/04/24/monarch-googles-planet-scale-in-memory-time-series-database.html">the subject of a previous paper review!</a>), Dapper (<a href="https://research.google/pubs/dapper-a-large-scale-distributed-systems-tracing-infrastructure/">a library for distributed tracing</a>) and Google Wide Profiling (<a href="https://research.google/pubs/google-wide-profiling-a-continuous-profiling-infrastructure-for-data-centers/">which continuously gathers performance data</a> from services deployed in Google datacenters).</p><p>An analysis of these datasets exposes five insights about Google-internal RPCs.</p><ul><li><p>RPC performance is growing over time</p></li><li><p>Some RPCs take microseconds, while many take milliseconds</p></li><li><p>The RPC call graph mostly involves fan out, rather than deep call trees.</p></li><li><p>Many RPCs response/request sizes are small, but some are quite large.</p></li><li><p>A significant portion of RPC traffic is associated with access to storage.</p></li></ul><p>First, the authors measure RPC performance improvement over time using &#8220;RPCs per CPU cycle&#8221;. This effect is because of factors like optimizing the RPC library (which reduces the cost of sending RPCs, allowing more RPCs to be sent with fewer resources). In turn, these performance improvements are posing a greater load on <em>other</em> resources, like the network.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!t1la!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68da0ca7-d576-4c5f-bc8f-1f23582e9498_585x217.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!t1la!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68da0ca7-d576-4c5f-bc8f-1f23582e9498_585x217.png 424w, https://substackcdn.com/image/fetch/$s_!t1la!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68da0ca7-d576-4c5f-bc8f-1f23582e9498_585x217.png 848w, https://substackcdn.com/image/fetch/$s_!t1la!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68da0ca7-d576-4c5f-bc8f-1f23582e9498_585x217.png 1272w, https://substackcdn.com/image/fetch/$s_!t1la!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68da0ca7-d576-4c5f-bc8f-1f23582e9498_585x217.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!t1la!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68da0ca7-d576-4c5f-bc8f-1f23582e9498_585x217.png" width="585" height="217" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/68da0ca7-d576-4c5f-bc8f-1f23582e9498_585x217.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:217,&quot;width&quot;:585,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!t1la!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68da0ca7-d576-4c5f-bc8f-1f23582e9498_585x217.png 424w, https://substackcdn.com/image/fetch/$s_!t1la!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68da0ca7-d576-4c5f-bc8f-1f23582e9498_585x217.png 848w, https://substackcdn.com/image/fetch/$s_!t1la!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68da0ca7-d576-4c5f-bc8f-1f23582e9498_585x217.png 1272w, https://substackcdn.com/image/fetch/$s_!t1la!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68da0ca7-d576-4c5f-bc8f-1f23582e9498_585x217.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Second, &#8220;not all RPCs are the same&#8221; in terms of their latency - some take microseconds, while others take milliseconds. Furthermore, a small number of RPC calls make up a majority of traffic, meaning that optimizing them could have outsized impact. Other calls are infrequent, but take up significant computation - &#8220;the slowest 1000 RPC methods account for only 1.1% of all calls, but they take 89% of the total RPC time.&#8221; The authors share data on per-method RPC latency and frequency to demonstrate these trends.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bcex!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3a6534-feb1-4605-aa93-fd25e99f04a7_587x334.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bcex!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3a6534-feb1-4605-aa93-fd25e99f04a7_587x334.png 424w, https://substackcdn.com/image/fetch/$s_!Bcex!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3a6534-feb1-4605-aa93-fd25e99f04a7_587x334.png 848w, https://substackcdn.com/image/fetch/$s_!Bcex!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3a6534-feb1-4605-aa93-fd25e99f04a7_587x334.png 1272w, https://substackcdn.com/image/fetch/$s_!Bcex!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3a6534-feb1-4605-aa93-fd25e99f04a7_587x334.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bcex!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3a6534-feb1-4605-aa93-fd25e99f04a7_587x334.png" width="587" height="334" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da3a6534-feb1-4605-aa93-fd25e99f04a7_587x334.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:334,&quot;width&quot;:587,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bcex!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3a6534-feb1-4605-aa93-fd25e99f04a7_587x334.png 424w, https://substackcdn.com/image/fetch/$s_!Bcex!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3a6534-feb1-4605-aa93-fd25e99f04a7_587x334.png 848w, https://substackcdn.com/image/fetch/$s_!Bcex!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3a6534-feb1-4605-aa93-fd25e99f04a7_587x334.png 1272w, https://substackcdn.com/image/fetch/$s_!Bcex!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3a6534-feb1-4605-aa93-fd25e99f04a7_587x334.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N1iw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba147396-f986-452a-99e5-62a393477efc_598x294.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N1iw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba147396-f986-452a-99e5-62a393477efc_598x294.png 424w, https://substackcdn.com/image/fetch/$s_!N1iw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba147396-f986-452a-99e5-62a393477efc_598x294.png 848w, https://substackcdn.com/image/fetch/$s_!N1iw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba147396-f986-452a-99e5-62a393477efc_598x294.png 1272w, https://substackcdn.com/image/fetch/$s_!N1iw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba147396-f986-452a-99e5-62a393477efc_598x294.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N1iw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba147396-f986-452a-99e5-62a393477efc_598x294.png" width="598" height="294" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba147396-f986-452a-99e5-62a393477efc_598x294.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:294,&quot;width&quot;:598,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N1iw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba147396-f986-452a-99e5-62a393477efc_598x294.png 424w, https://substackcdn.com/image/fetch/$s_!N1iw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba147396-f986-452a-99e5-62a393477efc_598x294.png 848w, https://substackcdn.com/image/fetch/$s_!N1iw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba147396-f986-452a-99e5-62a393477efc_598x294.png 1272w, https://substackcdn.com/image/fetch/$s_!N1iw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba147396-f986-452a-99e5-62a393477efc_598x294.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Third, &#8220;RPCs are Wider than Deep&#8221; - RPCs have significant fan out into other systems Google infrastructure, but don&#8217;t normally result in many services calling each other far down into the stack. The authors note this behavior matches with existing studies from <a href="https://dl.acm.org/doi/abs/10.1145/3472883.3487003">Alibaba</a> and <a href="https://www.usenix.org/conference/atc23/presentation/huye">Meta</a>. The paper visualizes this insight with CDFs of &#8220;descendants&#8221; and &#8220;ancestors&#8221; in the RPC call graph - &#8220;looking at the number of descendants shows the scale of distributed computation performed by an RPC, and the number of ancestors provides insights into how the properties of RPCs change as they get deeper into the call graph of a root RPC.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SH40!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89ec10b1-eec4-4160-b322-1c612f396a7f_585x335.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SH40!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89ec10b1-eec4-4160-b322-1c612f396a7f_585x335.png 424w, https://substackcdn.com/image/fetch/$s_!SH40!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89ec10b1-eec4-4160-b322-1c612f396a7f_585x335.png 848w, https://substackcdn.com/image/fetch/$s_!SH40!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89ec10b1-eec4-4160-b322-1c612f396a7f_585x335.png 1272w, https://substackcdn.com/image/fetch/$s_!SH40!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89ec10b1-eec4-4160-b322-1c612f396a7f_585x335.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SH40!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89ec10b1-eec4-4160-b322-1c612f396a7f_585x335.png" width="585" height="335" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/89ec10b1-eec4-4160-b322-1c612f396a7f_585x335.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:335,&quot;width&quot;:585,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SH40!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89ec10b1-eec4-4160-b322-1c612f396a7f_585x335.png 424w, https://substackcdn.com/image/fetch/$s_!SH40!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89ec10b1-eec4-4160-b322-1c612f396a7f_585x335.png 848w, https://substackcdn.com/image/fetch/$s_!SH40!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89ec10b1-eec4-4160-b322-1c612f396a7f_585x335.png 1272w, https://substackcdn.com/image/fetch/$s_!SH40!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89ec10b1-eec4-4160-b322-1c612f396a7f_585x335.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!20rg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31536b67-798a-48a5-b282-fb457fe0ed31_606x335.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!20rg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31536b67-798a-48a5-b282-fb457fe0ed31_606x335.png 424w, https://substackcdn.com/image/fetch/$s_!20rg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31536b67-798a-48a5-b282-fb457fe0ed31_606x335.png 848w, https://substackcdn.com/image/fetch/$s_!20rg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31536b67-798a-48a5-b282-fb457fe0ed31_606x335.png 1272w, https://substackcdn.com/image/fetch/$s_!20rg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31536b67-798a-48a5-b282-fb457fe0ed31_606x335.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!20rg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31536b67-798a-48a5-b282-fb457fe0ed31_606x335.png" width="606" height="335" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/31536b67-798a-48a5-b282-fb457fe0ed31_606x335.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:335,&quot;width&quot;:606,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!20rg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31536b67-798a-48a5-b282-fb457fe0ed31_606x335.png 424w, https://substackcdn.com/image/fetch/$s_!20rg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31536b67-798a-48a5-b282-fb457fe0ed31_606x335.png 848w, https://substackcdn.com/image/fetch/$s_!20rg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31536b67-798a-48a5-b282-fb457fe0ed31_606x335.png 1272w, https://substackcdn.com/image/fetch/$s_!20rg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31536b67-798a-48a5-b282-fb457fe0ed31_606x335.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Fourth, there is an &#8220;elephant and mice distribution&#8221; of RPC sizes - &#8220;most RPCs are small with the smallest a single cache line (64 B)&#8221;. Others are significantly larger - &#8220;P99 requests and responses are 196 KB and 563 KB&#8221;. This data shows that projects like hardware accelerators would be able to optimize significant parts of the RPC workload, but would not be able to handle others (specifically, the authors reference <a href="https://dl.acm.org/doi/10.1145/3458336.3465283">&#8220;Zerializer: Towards zero-copy serialization&#8221;</a>). The authors present this data using CDFs that show percentiles of request sizes and the ratio between response/request.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VrWn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0ad76d7-3457-4d7d-bc8e-36235ca2f34d_586x338.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VrWn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0ad76d7-3457-4d7d-bc8e-36235ca2f34d_586x338.png 424w, https://substackcdn.com/image/fetch/$s_!VrWn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0ad76d7-3457-4d7d-bc8e-36235ca2f34d_586x338.png 848w, https://substackcdn.com/image/fetch/$s_!VrWn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0ad76d7-3457-4d7d-bc8e-36235ca2f34d_586x338.png 1272w, https://substackcdn.com/image/fetch/$s_!VrWn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0ad76d7-3457-4d7d-bc8e-36235ca2f34d_586x338.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VrWn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0ad76d7-3457-4d7d-bc8e-36235ca2f34d_586x338.png" width="586" height="338" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a0ad76d7-3457-4d7d-bc8e-36235ca2f34d_586x338.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:338,&quot;width&quot;:586,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VrWn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0ad76d7-3457-4d7d-bc8e-36235ca2f34d_586x338.png 424w, https://substackcdn.com/image/fetch/$s_!VrWn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0ad76d7-3457-4d7d-bc8e-36235ca2f34d_586x338.png 848w, https://substackcdn.com/image/fetch/$s_!VrWn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0ad76d7-3457-4d7d-bc8e-36235ca2f34d_586x338.png 1272w, https://substackcdn.com/image/fetch/$s_!VrWn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0ad76d7-3457-4d7d-bc8e-36235ca2f34d_586x338.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Lastly, a significant portion of RPC traffic is associated with accesses to storage - &#8220;these findings motivate application-specific optimizations, especially on storage systems, as storage is by far the largest distributed application in the fleet.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PNRs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6033c8f-fa16-4bcc-a6b4-92e587da41b1_569x304.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PNRs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6033c8f-fa16-4bcc-a6b4-92e587da41b1_569x304.png 424w, https://substackcdn.com/image/fetch/$s_!PNRs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6033c8f-fa16-4bcc-a6b4-92e587da41b1_569x304.png 848w, https://substackcdn.com/image/fetch/$s_!PNRs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6033c8f-fa16-4bcc-a6b4-92e587da41b1_569x304.png 1272w, https://substackcdn.com/image/fetch/$s_!PNRs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6033c8f-fa16-4bcc-a6b4-92e587da41b1_569x304.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PNRs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6033c8f-fa16-4bcc-a6b4-92e587da41b1_569x304.png" width="569" height="304" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f6033c8f-fa16-4bcc-a6b4-92e587da41b1_569x304.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:304,&quot;width&quot;:569,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PNRs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6033c8f-fa16-4bcc-a6b4-92e587da41b1_569x304.png 424w, https://substackcdn.com/image/fetch/$s_!PNRs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6033c8f-fa16-4bcc-a6b4-92e587da41b1_569x304.png 848w, https://substackcdn.com/image/fetch/$s_!PNRs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6033c8f-fa16-4bcc-a6b4-92e587da41b1_569x304.png 1272w, https://substackcdn.com/image/fetch/$s_!PNRs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6033c8f-fa16-4bcc-a6b4-92e587da41b1_569x304.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h2>RPC Latency</h2><p>The papers dive into the sources of RPC latency in a client-server interaction - at a high level, the components boil down to client/server send and receive queues, server processing logic, the networking stack, and networking infrastructure.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4tJZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88e62be-de98-403e-b537-9aea56b3cfd9_585x207.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4tJZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88e62be-de98-403e-b537-9aea56b3cfd9_585x207.png 424w, https://substackcdn.com/image/fetch/$s_!4tJZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88e62be-de98-403e-b537-9aea56b3cfd9_585x207.png 848w, https://substackcdn.com/image/fetch/$s_!4tJZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88e62be-de98-403e-b537-9aea56b3cfd9_585x207.png 1272w, https://substackcdn.com/image/fetch/$s_!4tJZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88e62be-de98-403e-b537-9aea56b3cfd9_585x207.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4tJZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88e62be-de98-403e-b537-9aea56b3cfd9_585x207.png" width="585" height="207" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a88e62be-de98-403e-b537-9aea56b3cfd9_585x207.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:207,&quot;width&quot;:585,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4tJZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88e62be-de98-403e-b537-9aea56b3cfd9_585x207.png 424w, https://substackcdn.com/image/fetch/$s_!4tJZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88e62be-de98-403e-b537-9aea56b3cfd9_585x207.png 848w, https://substackcdn.com/image/fetch/$s_!4tJZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88e62be-de98-403e-b537-9aea56b3cfd9_585x207.png 1272w, https://substackcdn.com/image/fetch/$s_!4tJZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88e62be-de98-403e-b537-9aea56b3cfd9_585x207.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>To describe the cost of sending an RPC to an external service, minus server processing time, the paper uses the term <em>RPC latency tax</em> - the paper focuses on this because while &#8220;application-processing time dominates&#8230;[the] RPC tax can be significant.&#8221; This tax applies no matter how good a server gets at returning a response - for many RPC calls this tax makes up the bulk of their time.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fcbz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f771ce8-71ab-472f-a352-f0eb2c7f0fbd_589x478.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fcbz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f771ce8-71ab-472f-a352-f0eb2c7f0fbd_589x478.png 424w, https://substackcdn.com/image/fetch/$s_!fcbz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f771ce8-71ab-472f-a352-f0eb2c7f0fbd_589x478.png 848w, https://substackcdn.com/image/fetch/$s_!fcbz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f771ce8-71ab-472f-a352-f0eb2c7f0fbd_589x478.png 1272w, https://substackcdn.com/image/fetch/$s_!fcbz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f771ce8-71ab-472f-a352-f0eb2c7f0fbd_589x478.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fcbz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f771ce8-71ab-472f-a352-f0eb2c7f0fbd_589x478.png" width="589" height="478" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1f771ce8-71ab-472f-a352-f0eb2c7f0fbd_589x478.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:478,&quot;width&quot;:589,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fcbz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f771ce8-71ab-472f-a352-f0eb2c7f0fbd_589x478.png 424w, https://substackcdn.com/image/fetch/$s_!fcbz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f771ce8-71ab-472f-a352-f0eb2c7f0fbd_589x478.png 848w, https://substackcdn.com/image/fetch/$s_!fcbz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f771ce8-71ab-472f-a352-f0eb2c7f0fbd_589x478.png 1272w, https://substackcdn.com/image/fetch/$s_!fcbz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f771ce8-71ab-472f-a352-f0eb2c7f0fbd_589x478.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>This tax also varies across different types of services. For example, RPCs to an SSD cache would benefit the most from reducing the time an RPC spends in the server send queue, while RPC calls to the F1 database would benefit the most from reducing time in the client recv queue.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fcsH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb678e306-5798-4981-a55c-b66c33133d69_572x526.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fcsH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb678e306-5798-4981-a55c-b66c33133d69_572x526.png 424w, https://substackcdn.com/image/fetch/$s_!fcsH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb678e306-5798-4981-a55c-b66c33133d69_572x526.png 848w, https://substackcdn.com/image/fetch/$s_!fcsH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb678e306-5798-4981-a55c-b66c33133d69_572x526.png 1272w, https://substackcdn.com/image/fetch/$s_!fcsH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb678e306-5798-4981-a55c-b66c33133d69_572x526.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fcsH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb678e306-5798-4981-a55c-b66c33133d69_572x526.png" width="572" height="526" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b678e306-5798-4981-a55c-b66c33133d69_572x526.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:526,&quot;width&quot;:572,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fcsH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb678e306-5798-4981-a55c-b66c33133d69_572x526.png 424w, https://substackcdn.com/image/fetch/$s_!fcsH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb678e306-5798-4981-a55c-b66c33133d69_572x526.png 848w, https://substackcdn.com/image/fetch/$s_!fcsH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb678e306-5798-4981-a55c-b66c33133d69_572x526.png 1272w, https://substackcdn.com/image/fetch/$s_!fcsH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb678e306-5798-4981-a55c-b66c33133d69_572x526.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M3i8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88710d21-2448-4df6-aec8-b54f410357c2_561x412.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M3i8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88710d21-2448-4df6-aec8-b54f410357c2_561x412.png 424w, https://substackcdn.com/image/fetch/$s_!M3i8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88710d21-2448-4df6-aec8-b54f410357c2_561x412.png 848w, https://substackcdn.com/image/fetch/$s_!M3i8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88710d21-2448-4df6-aec8-b54f410357c2_561x412.png 1272w, https://substackcdn.com/image/fetch/$s_!M3i8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88710d21-2448-4df6-aec8-b54f410357c2_561x412.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M3i8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88710d21-2448-4df6-aec8-b54f410357c2_561x412.png" width="561" height="412" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/88710d21-2448-4df6-aec8-b54f410357c2_561x412.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:412,&quot;width&quot;:561,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M3i8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88710d21-2448-4df6-aec8-b54f410357c2_561x412.png 424w, https://substackcdn.com/image/fetch/$s_!M3i8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88710d21-2448-4df6-aec8-b54f410357c2_561x412.png 848w, https://substackcdn.com/image/fetch/$s_!M3i8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88710d21-2448-4df6-aec8-b54f410357c2_561x412.png 1272w, https://substackcdn.com/image/fetch/$s_!M3i8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88710d21-2448-4df6-aec8-b54f410357c2_561x412.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>The RPC latency tax also varies across clusters - in other words, a service can respond faster to RPCs when it is deployed in cluster A instead of cluster B. This happens because of characteristics of the cluster, like CPU utilization and memory bandwidth - the paper calls these <em>exogenous variables</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LMox!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59db58e6-dc79-4d51-b2ea-eed8809ec558_569x240.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LMox!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59db58e6-dc79-4d51-b2ea-eed8809ec558_569x240.png 424w, https://substackcdn.com/image/fetch/$s_!LMox!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59db58e6-dc79-4d51-b2ea-eed8809ec558_569x240.png 848w, https://substackcdn.com/image/fetch/$s_!LMox!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59db58e6-dc79-4d51-b2ea-eed8809ec558_569x240.png 1272w, https://substackcdn.com/image/fetch/$s_!LMox!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59db58e6-dc79-4d51-b2ea-eed8809ec558_569x240.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LMox!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59db58e6-dc79-4d51-b2ea-eed8809ec558_569x240.png" width="569" height="240" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/59db58e6-dc79-4d51-b2ea-eed8809ec558_569x240.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:240,&quot;width&quot;:569,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LMox!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59db58e6-dc79-4d51-b2ea-eed8809ec558_569x240.png 424w, https://substackcdn.com/image/fetch/$s_!LMox!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59db58e6-dc79-4d51-b2ea-eed8809ec558_569x240.png 848w, https://substackcdn.com/image/fetch/$s_!LMox!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59db58e6-dc79-4d51-b2ea-eed8809ec558_569x240.png 1272w, https://substackcdn.com/image/fetch/$s_!LMox!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59db58e6-dc79-4d51-b2ea-eed8809ec558_569x240.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><blockquote><p>Each application category reacts differently towards these exogenous variables. Bigtable is a server-processing-heavy workload, and its performance is highly dependent on CPU utilization, memory bandwidth, wake-up time, and cycles per instruction. Video Metadata is queuing heavy, which follows a similar trend.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7Nei!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcddc9d07-83c3-4003-8d08-2ab3d004c3c1_588x536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7Nei!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcddc9d07-83c3-4003-8d08-2ab3d004c3c1_588x536.png 424w, https://substackcdn.com/image/fetch/$s_!7Nei!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcddc9d07-83c3-4003-8d08-2ab3d004c3c1_588x536.png 848w, https://substackcdn.com/image/fetch/$s_!7Nei!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcddc9d07-83c3-4003-8d08-2ab3d004c3c1_588x536.png 1272w, https://substackcdn.com/image/fetch/$s_!7Nei!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcddc9d07-83c3-4003-8d08-2ab3d004c3c1_588x536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7Nei!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcddc9d07-83c3-4003-8d08-2ab3d004c3c1_588x536.png" width="588" height="536" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cddc9d07-83c3-4003-8d08-2ab3d004c3c1_588x536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:536,&quot;width&quot;:588,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7Nei!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcddc9d07-83c3-4003-8d08-2ab3d004c3c1_588x536.png 424w, https://substackcdn.com/image/fetch/$s_!7Nei!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcddc9d07-83c3-4003-8d08-2ab3d004c3c1_588x536.png 848w, https://substackcdn.com/image/fetch/$s_!7Nei!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcddc9d07-83c3-4003-8d08-2ab3d004c3c1_588x536.png 1272w, https://substackcdn.com/image/fetch/$s_!7Nei!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcddc9d07-83c3-4003-8d08-2ab3d004c3c1_588x536.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4xpA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52b99431-f7f0-457d-a9e1-5cd2c167f2f8_598x510.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4xpA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52b99431-f7f0-457d-a9e1-5cd2c167f2f8_598x510.png 424w, https://substackcdn.com/image/fetch/$s_!4xpA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52b99431-f7f0-457d-a9e1-5cd2c167f2f8_598x510.png 848w, https://substackcdn.com/image/fetch/$s_!4xpA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52b99431-f7f0-457d-a9e1-5cd2c167f2f8_598x510.png 1272w, https://substackcdn.com/image/fetch/$s_!4xpA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52b99431-f7f0-457d-a9e1-5cd2c167f2f8_598x510.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4xpA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52b99431-f7f0-457d-a9e1-5cd2c167f2f8_598x510.png" width="598" height="510" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/52b99431-f7f0-457d-a9e1-5cd2c167f2f8_598x510.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:510,&quot;width&quot;:598,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4xpA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52b99431-f7f0-457d-a9e1-5cd2c167f2f8_598x510.png 424w, https://substackcdn.com/image/fetch/$s_!4xpA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52b99431-f7f0-457d-a9e1-5cd2c167f2f8_598x510.png 848w, https://substackcdn.com/image/fetch/$s_!4xpA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52b99431-f7f0-457d-a9e1-5cd2c167f2f8_598x510.png 1272w, https://substackcdn.com/image/fetch/$s_!4xpA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52b99431-f7f0-457d-a9e1-5cd2c167f2f8_598x510.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h2>Resource Utilization of RPCs</h2><p>There is also another cost for RPCs, the CPU cost, which the paper calls the <em>cycle tax</em>. Multiple components of the RPC flow contribute, however compression dominates.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1bSv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa2d689-6cff-4c09-99ce-1c9b92883ffb_575x274.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1bSv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa2d689-6cff-4c09-99ce-1c9b92883ffb_575x274.png 424w, https://substackcdn.com/image/fetch/$s_!1bSv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa2d689-6cff-4c09-99ce-1c9b92883ffb_575x274.png 848w, https://substackcdn.com/image/fetch/$s_!1bSv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa2d689-6cff-4c09-99ce-1c9b92883ffb_575x274.png 1272w, https://substackcdn.com/image/fetch/$s_!1bSv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa2d689-6cff-4c09-99ce-1c9b92883ffb_575x274.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1bSv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa2d689-6cff-4c09-99ce-1c9b92883ffb_575x274.png" width="575" height="274" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bfa2d689-6cff-4c09-99ce-1c9b92883ffb_575x274.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:274,&quot;width&quot;:575,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1bSv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa2d689-6cff-4c09-99ce-1c9b92883ffb_575x274.png 424w, https://substackcdn.com/image/fetch/$s_!1bSv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa2d689-6cff-4c09-99ce-1c9b92883ffb_575x274.png 848w, https://substackcdn.com/image/fetch/$s_!1bSv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa2d689-6cff-4c09-99ce-1c9b92883ffb_575x274.png 1272w, https://substackcdn.com/image/fetch/$s_!1bSv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa2d689-6cff-4c09-99ce-1c9b92883ffb_575x274.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>The paper also evalutes the CPU cycle usage from unsuccessful RPCs - the single largest contributor are cancelled requests (likely sent because of <a href="https://blog.acolyer.org/2015/01/15/the-tail-at-scale/">request hedging</a>). Other types of potentially avoidable errors consume a suprising amount of CPU resources (e.g. &#8220;entity not found&#8221; response codes).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hK8D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba66faf-325c-47f7-ac1e-77bf84505b8f_467x330.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hK8D!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba66faf-325c-47f7-ac1e-77bf84505b8f_467x330.png 424w, https://substackcdn.com/image/fetch/$s_!hK8D!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba66faf-325c-47f7-ac1e-77bf84505b8f_467x330.png 848w, https://substackcdn.com/image/fetch/$s_!hK8D!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba66faf-325c-47f7-ac1e-77bf84505b8f_467x330.png 1272w, https://substackcdn.com/image/fetch/$s_!hK8D!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba66faf-325c-47f7-ac1e-77bf84505b8f_467x330.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hK8D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba66faf-325c-47f7-ac1e-77bf84505b8f_467x330.png" width="467" height="330" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aba66faf-325c-47f7-ac1e-77bf84505b8f_467x330.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:330,&quot;width&quot;:467,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hK8D!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba66faf-325c-47f7-ac1e-77bf84505b8f_467x330.png 424w, https://substackcdn.com/image/fetch/$s_!hK8D!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba66faf-325c-47f7-ac1e-77bf84505b8f_467x330.png 848w, https://substackcdn.com/image/fetch/$s_!hK8D!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba66faf-325c-47f7-ac1e-77bf84505b8f_467x330.png 1272w, https://substackcdn.com/image/fetch/$s_!hK8D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba66faf-325c-47f7-ac1e-77bf84505b8f_467x330.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h2>Conclusion</h2><p>I enjoyed this paper because of its focus on providing data on the potential impact of several opens areas of academic research - without this thorough characterization, it would be difficult to understand their expected value.</p><p>While many proposals are focused on <a href="https://research.google/pubs/attack-of-the-killer-microseconds/">Attack of the killer microseconds</a>, these improvements aren&#8217;t required for many RPCs. The research also highlights challenges with solutions to known problems like tail latency - approaches like request hedging have their own downsides in wasted CPU resources. Rather than trying to globally optimize RPCs, focusing on specific operations is likely to the highest impact - &#8220;the 10 most popular RPC methods account for 58% of all calls and the top-100 account for 91% of all calls.&#8221; On the hardware front, accelerators (some of which have already been discussed in research) could yield significant benefits - for example, previous papers evaluated a <a href="https://dl.acm.org/doi/abs/10.1145/3466752.3480051">hardware accelerator for protocol buffers</a>.</p><p>With the insights from this paper, I&#8217;m looking forward to seeing how the authors follow up with future improvements and to see how other groups respond in adjusting the direction of their research (or not).</p><p>Lastly, the research cited a number of other industry studies about microservices architectures and their costs that I&#8217;m hoping to dive into with future paper reviews - specifically <a href="https://www.usenix.org/conference/osdi23/presentation/saokar">ServiceRouter</a>.</p>]]></content:encoded></item><item><title><![CDATA[Gemini: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints]]></title><description><![CDATA[Hi everyone,]]></description><link>https://newsletter.micahlerner.com/p/gemini-fast-failure-recovery-in-distributed</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/gemini-fast-failure-recovery-in-distributed</guid><dc:creator><![CDATA[Micah Lerner]]></dc:creator><pubDate>Wed, 31 Jan 2024 01:44:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gOJh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58f7d633-94e2-4179-ab92-a2175cfebe12_529x370.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Hi everyone,</em></p><p><em>Micah here. This week&#8217;s paper is one of several papers I&#8217;ll be reading from 2023&#8217;s Symposium on Operating Systems Principles (SOSP). </em></p><p><em>Enjoy!</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Just arriving? Join my growing community of engineers to receive 1 email per week with a deep dive into cutting-edge Computer Science research.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p><em>This week&#8217;s paper is one of several papers I&#8217;ll be reading from 2023&#8217;s Symposium on Operating Systems Principles (SOSP). Enjoy!</em></p><p><a href="https://www.micahlerner.com/assets/papers/gemini.pdf">Gemini: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints</a></p><h2>What is the research and why does it matter?</h2><p>Training AI models requires a large amount of compute resources, in particular GPUs. Many large companies purchase their own GPUs, leading to both up front costs of acquisition, as well as ongoing spend to power the clusters where the GPUs are hosted. Furthermore, at an organization with many projects requiring these resources, there is contention for compute time.</p><p>While the first two sources of cost are difficult to minimize, effective usage of compute time is a growing area of research. One way that teams are making gains is by improving the reliability of model training - if a machine involved in training fails, in-progress work may be lost. Motivated by data that existing solutions don&#8217;t handle this case well, the authors propose a framework for limiting the amount of wasted resources - &#8220;According to the report from OPT-175B training&#8230;about 178,000 GPU hours were wasted due to various training failures.&#8221;</p><p>The Gemini paper aims to solves this problem by providing a failure recovery system for training runs. Rather than solely relying on far remote storage which is costly to read/write to, Gemini builds a multi-level cache comprising GPU memory, local and remote CPU memory, with persistent storage as a last resort. This new design and implementation is able to successfully achieve 13x faster failure recovery without significantly impacting training.</p><h2>Motivation</h2><p>A key idea in Gemini is &#8220;checkpointing&#8221; progress so that when a failure happens, the system doesn&#8217;t need to recompute results.</p><p>The paper lays out three factors to &#8220;wasted time&#8221; in existing checkpoint-based systems:</p><ul><li><p><em>Checkpoint time</em>: how long a model takes to build an intermediate checkpoint.</p></li><li><p><em>Checkpoint frequency</em>: how often a training system builds intermediate states.</p></li><li><p><em>Retrieval time</em>: how long it takes to fetch a checkpoint.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gOJh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58f7d633-94e2-4179-ab92-a2175cfebe12_529x370.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gOJh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58f7d633-94e2-4179-ab92-a2175cfebe12_529x370.png 424w, https://substackcdn.com/image/fetch/$s_!gOJh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58f7d633-94e2-4179-ab92-a2175cfebe12_529x370.png 848w, https://substackcdn.com/image/fetch/$s_!gOJh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58f7d633-94e2-4179-ab92-a2175cfebe12_529x370.png 1272w, https://substackcdn.com/image/fetch/$s_!gOJh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58f7d633-94e2-4179-ab92-a2175cfebe12_529x370.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gOJh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58f7d633-94e2-4179-ab92-a2175cfebe12_529x370.png" width="529" height="370" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58f7d633-94e2-4179-ab92-a2175cfebe12_529x370.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:370,&quot;width&quot;:529,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gOJh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58f7d633-94e2-4179-ab92-a2175cfebe12_529x370.png 424w, https://substackcdn.com/image/fetch/$s_!gOJh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58f7d633-94e2-4179-ab92-a2175cfebe12_529x370.png 848w, https://substackcdn.com/image/fetch/$s_!gOJh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58f7d633-94e2-4179-ab92-a2175cfebe12_529x370.png 1272w, https://substackcdn.com/image/fetch/$s_!gOJh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58f7d633-94e2-4179-ab92-a2175cfebe12_529x370.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>These factors manifest in existing systems where checkpoints are resource-intensive to create, limiting how often the system produces them. In turn, this impacts the freshness of checkpoints and increases &#8220;wasted time&#8221; - a system resuming from an old checkpoint will redo more computation.</p><p>Beyond creating the checkpoints, the system must store them and make them available with low-latency - to drive down wasted time, Gemini proposes a system that aims to &#8220;maximize the probability of failure recovery from checkpoints stored in CPU memory&#8221;.</p><h2>How does the system work?</h2><p>There are two parts of the Gemini system: the <em>checkpoint creation module</em> and the <em>failure recovery module</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KyD4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70592bfd-57f5-4a5d-97d8-1d75a75caf37_523x513.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KyD4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70592bfd-57f5-4a5d-97d8-1d75a75caf37_523x513.png 424w, https://substackcdn.com/image/fetch/$s_!KyD4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70592bfd-57f5-4a5d-97d8-1d75a75caf37_523x513.png 848w, https://substackcdn.com/image/fetch/$s_!KyD4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70592bfd-57f5-4a5d-97d8-1d75a75caf37_523x513.png 1272w, https://substackcdn.com/image/fetch/$s_!KyD4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70592bfd-57f5-4a5d-97d8-1d75a75caf37_523x513.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KyD4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70592bfd-57f5-4a5d-97d8-1d75a75caf37_523x513.png" width="523" height="513" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70592bfd-57f5-4a5d-97d8-1d75a75caf37_523x513.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:513,&quot;width&quot;:523,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KyD4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70592bfd-57f5-4a5d-97d8-1d75a75caf37_523x513.png 424w, https://substackcdn.com/image/fetch/$s_!KyD4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70592bfd-57f5-4a5d-97d8-1d75a75caf37_523x513.png 848w, https://substackcdn.com/image/fetch/$s_!KyD4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70592bfd-57f5-4a5d-97d8-1d75a75caf37_523x513.png 1272w, https://substackcdn.com/image/fetch/$s_!KyD4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70592bfd-57f5-4a5d-97d8-1d75a75caf37_523x513.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>The <em>checkpoint creation module</em> creates checkpoints and figures out where to store them. It implements a distributed multi-level cache of checkpoints stored in CPU Memory, GPU Memory, and Remote Storage.</p><p>The <em>failure recovery module</em> is responsible for evaluating whether a component of the system has failed and needs replacement - it contains contains four components:</p><ul><li><p><em>Gemini worker agents</em>: each node participating in training has a process that reports on machine health and provides state updates.</p></li><li><p><em>Root agent</em>: worker agent promoted to leader via distributed consensus algorithm (e.g. <a href="https://www.micahlerner.com/2020/05/08/understanding-raft-consensus.html">Raft</a>) and provides commands to workers (e.g. recover from failure).</p></li><li><p><em>Distributed KV Store</em>: stores state on the machines in the network and assists in electing new root agents on failure.</p></li><li><p><em>Cloud Operator</em>: the root agent communicates with a hosting provider (e.g. a central schedular) to perform actions like requesting more resources (e.g. more machines with GPUs).</p></li></ul><h3>Technical Challenges</h3><p>The system aims to minimize &#8220;wasted time&#8221; by limiting the impact of machine failure. Key to the system&#8217;s approach is distributing checkpoints across machines in the network - the paper investigates different configurations of where to place the checkpoints - it calls them <em>group</em>, <em>ring, and _mixed</em>. The paper also includes pseudocode of the <em>group</em> algorithm.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4ppr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3867f5-5406-4771-9519-28d945428d3f_1051x301.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4ppr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3867f5-5406-4771-9519-28d945428d3f_1051x301.png 424w, https://substackcdn.com/image/fetch/$s_!4ppr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3867f5-5406-4771-9519-28d945428d3f_1051x301.png 848w, https://substackcdn.com/image/fetch/$s_!4ppr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3867f5-5406-4771-9519-28d945428d3f_1051x301.png 1272w, https://substackcdn.com/image/fetch/$s_!4ppr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3867f5-5406-4771-9519-28d945428d3f_1051x301.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4ppr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3867f5-5406-4771-9519-28d945428d3f_1051x301.png" width="1051" height="301" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e3867f5-5406-4771-9519-28d945428d3f_1051x301.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:301,&quot;width&quot;:1051,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4ppr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3867f5-5406-4771-9519-28d945428d3f_1051x301.png 424w, https://substackcdn.com/image/fetch/$s_!4ppr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3867f5-5406-4771-9519-28d945428d3f_1051x301.png 848w, https://substackcdn.com/image/fetch/$s_!4ppr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3867f5-5406-4771-9519-28d945428d3f_1051x301.png 1272w, https://substackcdn.com/image/fetch/$s_!4ppr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e3867f5-5406-4771-9519-28d945428d3f_1051x301.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9PG2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa58f0748-e211-4513-b109-1e104ad6d54a_531x558.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9PG2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa58f0748-e211-4513-b109-1e104ad6d54a_531x558.png 424w, https://substackcdn.com/image/fetch/$s_!9PG2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa58f0748-e211-4513-b109-1e104ad6d54a_531x558.png 848w, https://substackcdn.com/image/fetch/$s_!9PG2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa58f0748-e211-4513-b109-1e104ad6d54a_531x558.png 1272w, https://substackcdn.com/image/fetch/$s_!9PG2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa58f0748-e211-4513-b109-1e104ad6d54a_531x558.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9PG2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa58f0748-e211-4513-b109-1e104ad6d54a_531x558.png" width="531" height="558" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a58f0748-e211-4513-b109-1e104ad6d54a_531x558.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:558,&quot;width&quot;:531,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9PG2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa58f0748-e211-4513-b109-1e104ad6d54a_531x558.png 424w, https://substackcdn.com/image/fetch/$s_!9PG2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa58f0748-e211-4513-b109-1e104ad6d54a_531x558.png 848w, https://substackcdn.com/image/fetch/$s_!9PG2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa58f0748-e211-4513-b109-1e104ad6d54a_531x558.png 1272w, https://substackcdn.com/image/fetch/$s_!9PG2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa58f0748-e211-4513-b109-1e104ad6d54a_531x558.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Distributing checkpoints is not without its cost, particularly to networking resources - as model training also uses the network, competing traffic could impact performance. To address this the authors implement <em>traffic interleaving</em>, which attempts to send the checkpoints over the network in a way that doesn&#8217;t interfere with other network traffic associated with training.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5DAE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d879fcc-7a6e-491f-b033-36b430f125a0_522x420.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5DAE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d879fcc-7a6e-491f-b033-36b430f125a0_522x420.png 424w, https://substackcdn.com/image/fetch/$s_!5DAE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d879fcc-7a6e-491f-b033-36b430f125a0_522x420.png 848w, https://substackcdn.com/image/fetch/$s_!5DAE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d879fcc-7a6e-491f-b033-36b430f125a0_522x420.png 1272w, https://substackcdn.com/image/fetch/$s_!5DAE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d879fcc-7a6e-491f-b033-36b430f125a0_522x420.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5DAE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d879fcc-7a6e-491f-b033-36b430f125a0_522x420.png" width="522" height="420" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d879fcc-7a6e-491f-b033-36b430f125a0_522x420.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:420,&quot;width&quot;:522,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5DAE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d879fcc-7a6e-491f-b033-36b430f125a0_522x420.png 424w, https://substackcdn.com/image/fetch/$s_!5DAE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d879fcc-7a6e-491f-b033-36b430f125a0_522x420.png 848w, https://substackcdn.com/image/fetch/$s_!5DAE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d879fcc-7a6e-491f-b033-36b430f125a0_522x420.png 1272w, https://substackcdn.com/image/fetch/$s_!5DAE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d879fcc-7a6e-491f-b033-36b430f125a0_522x420.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>After creating a checkpoint, Gemini transfers it to remote GPU memory on another machine in the network, then that machine transfers it to CPU memory (which is relatively cheaper and more abundant).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6ZwO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bd502d2-9923-4e22-a0e2-d6dfdbf22ce1_526x528.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6ZwO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bd502d2-9923-4e22-a0e2-d6dfdbf22ce1_526x528.png 424w, https://substackcdn.com/image/fetch/$s_!6ZwO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bd502d2-9923-4e22-a0e2-d6dfdbf22ce1_526x528.png 848w, https://substackcdn.com/image/fetch/$s_!6ZwO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bd502d2-9923-4e22-a0e2-d6dfdbf22ce1_526x528.png 1272w, https://substackcdn.com/image/fetch/$s_!6ZwO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bd502d2-9923-4e22-a0e2-d6dfdbf22ce1_526x528.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6ZwO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bd502d2-9923-4e22-a0e2-d6dfdbf22ce1_526x528.png" width="526" height="528" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7bd502d2-9923-4e22-a0e2-d6dfdbf22ce1_526x528.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:528,&quot;width&quot;:526,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6ZwO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bd502d2-9923-4e22-a0e2-d6dfdbf22ce1_526x528.png 424w, https://substackcdn.com/image/fetch/$s_!6ZwO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bd502d2-9923-4e22-a0e2-d6dfdbf22ce1_526x528.png 848w, https://substackcdn.com/image/fetch/$s_!6ZwO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bd502d2-9923-4e22-a0e2-d6dfdbf22ce1_526x528.png 1272w, https://substackcdn.com/image/fetch/$s_!6ZwO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bd502d2-9923-4e22-a0e2-d6dfdbf22ce1_526x528.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>A simple implementation following this approach transfers the whole checkpoint at once across the network to remote GPU memory. As checkpoints can be quite large, this implementation requires the destination machine to set aside a large amount of GPU memory just for receiving checkpoints (or risk hitting out of memory errors). Instead, the paper proposes splitting up a checkpoint into partitions, then incrementally transferring them over the network.</p><p>The authors also describe an approach to <em>online profiling</em> where the dynamics of network traffic are learned over time and then eventually feed into the decision making for sending traffic over the network. By combining this idea with checkpoint partitioning, Gemini is able to make decisions about when to send buffers over the network.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VbPi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5116f332-4b8c-44b4-a316-7302885eeca3_523x928.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VbPi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5116f332-4b8c-44b4-a316-7302885eeca3_523x928.png 424w, https://substackcdn.com/image/fetch/$s_!VbPi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5116f332-4b8c-44b4-a316-7302885eeca3_523x928.png 848w, https://substackcdn.com/image/fetch/$s_!VbPi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5116f332-4b8c-44b4-a316-7302885eeca3_523x928.png 1272w, https://substackcdn.com/image/fetch/$s_!VbPi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5116f332-4b8c-44b4-a316-7302885eeca3_523x928.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VbPi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5116f332-4b8c-44b4-a316-7302885eeca3_523x928.png" width="523" height="928" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5116f332-4b8c-44b4-a316-7302885eeca3_523x928.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:928,&quot;width&quot;:523,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VbPi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5116f332-4b8c-44b4-a316-7302885eeca3_523x928.png 424w, https://substackcdn.com/image/fetch/$s_!VbPi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5116f332-4b8c-44b4-a316-7302885eeca3_523x928.png 848w, https://substackcdn.com/image/fetch/$s_!VbPi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5116f332-4b8c-44b4-a316-7302885eeca3_523x928.png 1272w, https://substackcdn.com/image/fetch/$s_!VbPi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5116f332-4b8c-44b4-a316-7302885eeca3_523x928.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h3>Resuming from failure</h3><p>The authors describe two different failure types: <em>software failure</em> (e.g. the code running training has a bug) and <em>hardware failure</em> (e.g. the machine loses network connectivity or a hard drive breaks). Critically, GEMINI treats the two types differently because of the impact they have on the in-memory data used to restore from checkpoint - a software failure can likely recover from checkpoint stored in memory, while a hardware failure often requires a combination of machine replacement and fetching checkpoints from a different computer in the network.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yHfM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F391dc3ba-c8ca-4180-8f5c-bcc1d2052879_1091x323.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yHfM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F391dc3ba-c8ca-4180-8f5c-bcc1d2052879_1091x323.png 424w, https://substackcdn.com/image/fetch/$s_!yHfM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F391dc3ba-c8ca-4180-8f5c-bcc1d2052879_1091x323.png 848w, https://substackcdn.com/image/fetch/$s_!yHfM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F391dc3ba-c8ca-4180-8f5c-bcc1d2052879_1091x323.png 1272w, https://substackcdn.com/image/fetch/$s_!yHfM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F391dc3ba-c8ca-4180-8f5c-bcc1d2052879_1091x323.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yHfM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F391dc3ba-c8ca-4180-8f5c-bcc1d2052879_1091x323.png" width="1091" height="323" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/391dc3ba-c8ca-4180-8f5c-bcc1d2052879_1091x323.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:323,&quot;width&quot;:1091,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yHfM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F391dc3ba-c8ca-4180-8f5c-bcc1d2052879_1091x323.png 424w, https://substackcdn.com/image/fetch/$s_!yHfM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F391dc3ba-c8ca-4180-8f5c-bcc1d2052879_1091x323.png 848w, https://substackcdn.com/image/fetch/$s_!yHfM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F391dc3ba-c8ca-4180-8f5c-bcc1d2052879_1091x323.png 1272w, https://substackcdn.com/image/fetch/$s_!yHfM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F391dc3ba-c8ca-4180-8f5c-bcc1d2052879_1091x323.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h2>How is the research evaluated?</h2><p>The research evaluates Gemini&#8217;s impact on training efficiency, and effectiveness in traffic interleaving. Additionally, the authors make projections around the system&#8217;s scalability and impact it could have in training large language models.</p><p>For training efficiency, the paper measures whether Gemini changes training time, wasted time, and checkpoint time. When training three large models, the paper finds that Gemini doesn&#8217;t increase iteration time (time where the model is doing work before it must pause to communicate) while significantly reducing wasted time in the presence of machine failures. Checkpoint time also goes down when compared to existing checkpoint-based training solutions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PPrL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ee30eb-58d0-4ed3-a478-ef474af269eb_357x306.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PPrL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ee30eb-58d0-4ed3-a478-ef474af269eb_357x306.png 424w, https://substackcdn.com/image/fetch/$s_!PPrL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ee30eb-58d0-4ed3-a478-ef474af269eb_357x306.png 848w, https://substackcdn.com/image/fetch/$s_!PPrL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ee30eb-58d0-4ed3-a478-ef474af269eb_357x306.png 1272w, https://substackcdn.com/image/fetch/$s_!PPrL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ee30eb-58d0-4ed3-a478-ef474af269eb_357x306.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PPrL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ee30eb-58d0-4ed3-a478-ef474af269eb_357x306.png" width="357" height="306" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b6ee30eb-58d0-4ed3-a478-ef474af269eb_357x306.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:306,&quot;width&quot;:357,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PPrL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ee30eb-58d0-4ed3-a478-ef474af269eb_357x306.png 424w, https://substackcdn.com/image/fetch/$s_!PPrL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ee30eb-58d0-4ed3-a478-ef474af269eb_357x306.png 848w, https://substackcdn.com/image/fetch/$s_!PPrL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ee30eb-58d0-4ed3-a478-ef474af269eb_357x306.png 1272w, https://substackcdn.com/image/fetch/$s_!PPrL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ee30eb-58d0-4ed3-a478-ef474af269eb_357x306.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8Yr-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a520226-e636-4f33-92c6-f2186c41db35_348x327.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8Yr-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a520226-e636-4f33-92c6-f2186c41db35_348x327.png 424w, https://substackcdn.com/image/fetch/$s_!8Yr-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a520226-e636-4f33-92c6-f2186c41db35_348x327.png 848w, https://substackcdn.com/image/fetch/$s_!8Yr-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a520226-e636-4f33-92c6-f2186c41db35_348x327.png 1272w, https://substackcdn.com/image/fetch/$s_!8Yr-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a520226-e636-4f33-92c6-f2186c41db35_348x327.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8Yr-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a520226-e636-4f33-92c6-f2186c41db35_348x327.png" width="348" height="327" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7a520226-e636-4f33-92c6-f2186c41db35_348x327.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:327,&quot;width&quot;:348,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8Yr-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a520226-e636-4f33-92c6-f2186c41db35_348x327.png 424w, https://substackcdn.com/image/fetch/$s_!8Yr-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a520226-e636-4f33-92c6-f2186c41db35_348x327.png 848w, https://substackcdn.com/image/fetch/$s_!8Yr-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a520226-e636-4f33-92c6-f2186c41db35_348x327.png 1272w, https://substackcdn.com/image/fetch/$s_!8Yr-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a520226-e636-4f33-92c6-f2186c41db35_348x327.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>The paper also measures the effectiveness of traffic interleaving (specifically by tracking iteration time), comparing the Gemini approach against existing baselines and other approaches (e.g. the naive implementation without checkpoint partitioning) - the Gemini solution doesn&#8217;t result in out of memory issues while keeping the iteration time the same and being able to recover from failure.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xm9F!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F533e8853-8e1a-4496-9e53-1ef418a98307_525x343.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xm9F!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F533e8853-8e1a-4496-9e53-1ef418a98307_525x343.png 424w, https://substackcdn.com/image/fetch/$s_!Xm9F!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F533e8853-8e1a-4496-9e53-1ef418a98307_525x343.png 848w, https://substackcdn.com/image/fetch/$s_!Xm9F!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F533e8853-8e1a-4496-9e53-1ef418a98307_525x343.png 1272w, https://substackcdn.com/image/fetch/$s_!Xm9F!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F533e8853-8e1a-4496-9e53-1ef418a98307_525x343.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xm9F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F533e8853-8e1a-4496-9e53-1ef418a98307_525x343.png" width="525" height="343" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/533e8853-8e1a-4496-9e53-1ef418a98307_525x343.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:343,&quot;width&quot;:525,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Xm9F!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F533e8853-8e1a-4496-9e53-1ef418a98307_525x343.png 424w, https://substackcdn.com/image/fetch/$s_!Xm9F!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F533e8853-8e1a-4496-9e53-1ef418a98307_525x343.png 848w, https://substackcdn.com/image/fetch/$s_!Xm9F!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F533e8853-8e1a-4496-9e53-1ef418a98307_525x343.png 1272w, https://substackcdn.com/image/fetch/$s_!Xm9F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F533e8853-8e1a-4496-9e53-1ef418a98307_525x343.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Lastly, the research contains projections about Gemini&#8217;s ability to reduce wasted time if the system was applied to training a large language model - while the results of this projection seem promising, it seems like there is more work to gather the effectiveness of Gemini at scale.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KPo1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf63cd2f-6b16-4e8c-a625-e2bf34223d93_522x270.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KPo1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf63cd2f-6b16-4e8c-a625-e2bf34223d93_522x270.png 424w, https://substackcdn.com/image/fetch/$s_!KPo1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf63cd2f-6b16-4e8c-a625-e2bf34223d93_522x270.png 848w, https://substackcdn.com/image/fetch/$s_!KPo1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf63cd2f-6b16-4e8c-a625-e2bf34223d93_522x270.png 1272w, https://substackcdn.com/image/fetch/$s_!KPo1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf63cd2f-6b16-4e8c-a625-e2bf34223d93_522x270.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KPo1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf63cd2f-6b16-4e8c-a625-e2bf34223d93_522x270.png" width="522" height="270" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf63cd2f-6b16-4e8c-a625-e2bf34223d93_522x270.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:270,&quot;width&quot;:522,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KPo1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf63cd2f-6b16-4e8c-a625-e2bf34223d93_522x270.png 424w, https://substackcdn.com/image/fetch/$s_!KPo1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf63cd2f-6b16-4e8c-a625-e2bf34223d93_522x270.png 848w, https://substackcdn.com/image/fetch/$s_!KPo1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf63cd2f-6b16-4e8c-a625-e2bf34223d93_522x270.png 1272w, https://substackcdn.com/image/fetch/$s_!KPo1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf63cd2f-6b16-4e8c-a625-e2bf34223d93_522x270.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h2>Conclusion</h2><p>Gemini is a system that could potentially dramatically reduce wasted time in training AI models - as models continue to grow and use more resources in a distributed setting, recovering from failure will become even more of a concern than it already is.</p><p>One of my main takeaways from the Gemini paper is around the application of systems ideas to AI models and their training. For example, Gemini takes advantage of common patterns like reliance on a distributed key-value store, leader election, and a multi-tier memory system. The idea that adopting well-known patterns could lead to dramatic performance and reliability improvements in this new type of serving system is quite exciting - it means there is a lot of low hanging fruit!</p><p>I&#8217;m looking forward to further developments in this space, and hope to see a followup paper from the authors soon with more data on training a model at scale (or alternatively a reference to using Gemini-like techniques from other organizations).</p>]]></content:encoded></item><item><title><![CDATA[[Paper Review] XFaaS: Hyperscale and Low Cost Serverless Functions at Meta]]></title><description><![CDATA[Hi everyone,]]></description><link>https://newsletter.micahlerner.com/p/paper-review-xfaas-hyperscale-and</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/paper-review-xfaas-hyperscale-and</guid><dc:creator><![CDATA[Micah Lerner]]></dc:creator><pubDate>Wed, 24 Jan 2024 00:44:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7CY5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6477659f-7256-499d-a43b-7272a0e7a4f4_460x383.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Hi everyone,</em></p><p><em>Micah here. This week&#8217;s paper is one of several papers I&#8217;ll be reading from 2023&#8217;s Symposium on Operating Systems Principles (SOSP). </em></p><p><em>Enjoy!</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Just arriving? Join my growing community of engineers to receive 1 email per week with a deep dive into cutting-edge Computer Science research.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p><a href="https://www.micahlerner.com/assets/papers/xfaas.pdf">&#8220;XFaaS: Hyperscale and Low Cost Serverless Functions at Meta&#8221;</a></p><h2>Background</h2><p>Function-as-a-Service systems (a.k.a. FaaS) allow engineers to run code without setting aside servers to a specific function. Instead, users of FaaS systems run their code on generalized infrastructure (like <a href="https://aws.amazon.com/lambda/">AWS Lambda</a>, <a href="https://azure.microsoft.com/en-us/products/functions">Azure Functions</a>, and <a href="https://cloud.google.com/functions">GCP&#8217;s Cloud Functions</a>), and only pay for the time that they use.</p><h2>Key Takeaways</h2><p>This paper describes Meta&#8217;s internal system for serverless, called XFaaS, which runs &#8220;trillions of function calls per day on more than 100,000 servers&#8221;.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pdPB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a0edb46-426e-4b85-befe-187e8efb6c58_456x235.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pdPB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a0edb46-426e-4b85-befe-187e8efb6c58_456x235.png 424w, https://substackcdn.com/image/fetch/$s_!pdPB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a0edb46-426e-4b85-befe-187e8efb6c58_456x235.png 848w, https://substackcdn.com/image/fetch/$s_!pdPB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a0edb46-426e-4b85-befe-187e8efb6c58_456x235.png 1272w, https://substackcdn.com/image/fetch/$s_!pdPB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a0edb46-426e-4b85-befe-187e8efb6c58_456x235.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pdPB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a0edb46-426e-4b85-befe-187e8efb6c58_456x235.png" width="456" height="235" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0a0edb46-426e-4b85-befe-187e8efb6c58_456x235.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:235,&quot;width&quot;:456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pdPB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a0edb46-426e-4b85-befe-187e8efb6c58_456x235.png 424w, https://substackcdn.com/image/fetch/$s_!pdPB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a0edb46-426e-4b85-befe-187e8efb6c58_456x235.png 848w, https://substackcdn.com/image/fetch/$s_!pdPB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a0edb46-426e-4b85-befe-187e8efb6c58_456x235.png 1272w, https://substackcdn.com/image/fetch/$s_!pdPB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a0edb46-426e-4b85-befe-187e8efb6c58_456x235.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Besides characterization of this unique at-scale serverless system, the paper dives deeper on several challenges that they authors addressed before reaching the current state of the infrastructure:</p><ul><li><p>Handling load spikes from Meta-internal systems scheduling large numbers of function executions.</p></li><li><p>Ensuring fast function startup and execution, which can impact the developer experience and decrease resource utilization.</p></li><li><p>Global load balancing across Meta&#8217;s distributed private cloud, avoiding datacenter overload.</p></li><li><p>Ensuring high-utilization of resources to limit cost increases from running the system.</p></li><li><p>Preventing overload of downstream services, as functions often access or update data via RPC requests when performing computation.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JsSh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450ad1b8-5344-4a9e-af48-bf535532d8ee_463x285.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JsSh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450ad1b8-5344-4a9e-af48-bf535532d8ee_463x285.png 424w, https://substackcdn.com/image/fetch/$s_!JsSh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450ad1b8-5344-4a9e-af48-bf535532d8ee_463x285.png 848w, https://substackcdn.com/image/fetch/$s_!JsSh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450ad1b8-5344-4a9e-af48-bf535532d8ee_463x285.png 1272w, https://substackcdn.com/image/fetch/$s_!JsSh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450ad1b8-5344-4a9e-af48-bf535532d8ee_463x285.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JsSh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450ad1b8-5344-4a9e-af48-bf535532d8ee_463x285.png" width="463" height="285" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/450ad1b8-5344-4a9e-af48-bf535532d8ee_463x285.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:285,&quot;width&quot;:463,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JsSh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450ad1b8-5344-4a9e-af48-bf535532d8ee_463x285.png 424w, https://substackcdn.com/image/fetch/$s_!JsSh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450ad1b8-5344-4a9e-af48-bf535532d8ee_463x285.png 848w, https://substackcdn.com/image/fetch/$s_!JsSh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450ad1b8-5344-4a9e-af48-bf535532d8ee_463x285.png 1272w, https://substackcdn.com/image/fetch/$s_!JsSh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450ad1b8-5344-4a9e-af48-bf535532d8ee_463x285.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hQOr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hQOr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png 424w, https://substackcdn.com/image/fetch/$s_!hQOr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png 848w, https://substackcdn.com/image/fetch/$s_!hQOr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png 1272w, https://substackcdn.com/image/fetch/$s_!hQOr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hQOr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png" width="467" height="281" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fccb9918-f331-4185-ad71-4e894b96e64c_467x281.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:281,&quot;width&quot;:467,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hQOr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png 424w, https://substackcdn.com/image/fetch/$s_!hQOr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png 848w, https://substackcdn.com/image/fetch/$s_!hQOr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png 1272w, https://substackcdn.com/image/fetch/$s_!hQOr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h2>How does the system work?</h2><h3>Architecture</h3><p>The multi-region infrastructure of XFaaS contains five main components: <em>Submitter</em>, <em>load balancers</em>, <em>DurableQ</em>, <em>Scheduler</em>, and <em>Worker Pool</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hQOr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hQOr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png 424w, https://substackcdn.com/image/fetch/$s_!hQOr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png 848w, https://substackcdn.com/image/fetch/$s_!hQOr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png 1272w, https://substackcdn.com/image/fetch/$s_!hQOr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hQOr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png" width="467" height="281" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fccb9918-f331-4185-ad71-4e894b96e64c_467x281.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:281,&quot;width&quot;:467,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hQOr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png 424w, https://substackcdn.com/image/fetch/$s_!hQOr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png 848w, https://substackcdn.com/image/fetch/$s_!hQOr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png 1272w, https://substackcdn.com/image/fetch/$s_!hQOr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffccb9918-f331-4185-ad71-4e894b96e64c_467x281.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Clients of the system schedule function execution by communicating with the <em>Submitter</em>. Functions can take one of three types:</p><blockquote><p>(1) queue-triggered functions, which are submitted via a queue service; (2) event-triggered functions, which are activated by data-change events in our data warehouse and data-stream systems; and (3) timer-triggered functions, which automatically fire based on a pre-set timing.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Qrlz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d264111-60e3-4e6c-b30f-68d2dd4e067d_439x136.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Qrlz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d264111-60e3-4e6c-b30f-68d2dd4e067d_439x136.png 424w, https://substackcdn.com/image/fetch/$s_!Qrlz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d264111-60e3-4e6c-b30f-68d2dd4e067d_439x136.png 848w, https://substackcdn.com/image/fetch/$s_!Qrlz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d264111-60e3-4e6c-b30f-68d2dd4e067d_439x136.png 1272w, https://substackcdn.com/image/fetch/$s_!Qrlz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d264111-60e3-4e6c-b30f-68d2dd4e067d_439x136.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Qrlz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d264111-60e3-4e6c-b30f-68d2dd4e067d_439x136.png" width="439" height="136" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d264111-60e3-4e6c-b30f-68d2dd4e067d_439x136.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:136,&quot;width&quot;:439,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Qrlz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d264111-60e3-4e6c-b30f-68d2dd4e067d_439x136.png 424w, https://substackcdn.com/image/fetch/$s_!Qrlz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d264111-60e3-4e6c-b30f-68d2dd4e067d_439x136.png 848w, https://substackcdn.com/image/fetch/$s_!Qrlz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d264111-60e3-4e6c-b30f-68d2dd4e067d_439x136.png 1272w, https://substackcdn.com/image/fetch/$s_!Qrlz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d264111-60e3-4e6c-b30f-68d2dd4e067d_439x136.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>The <em>submitter</em> is an interesting design choice because it serves as an entry point to downstream parts of the system. Before the pattern was introduced, clients interfaced with downstream components of the system directly, allowing badly behaved services to overload XFaaS - now, clients receive default quota, and the system throttles those that exceed this quota (although there is a process for negotiating higher quota as needed).</p><p>The next stage in the request flow is forwarding the initial function execution request to a load balancer (<em>Queue Load Balancers (QueueLB)</em>) sitting in front of durable storage (called <em>DurableQ</em>) that contains metadata about the function. The QueueLB is one usage of XFaaS&#8217; usage of load balancers, and ensures effective utilization of distributed system resources while preventing overload.</p><p>Once the information about a function is stored in a <em>DurableQ</em>, a <em>scheduler</em> will eventually attempt to run it - given that there are many clients of XFaaS, the scheduler, &#8220;determine(s) the order of function calls based on their criticality, execution deadline, and capacity quota&#8221;. This ordering is represented with in-memory datastructures called the <em>FuncBuffer</em> and the <em>RunQ</em> - &#8220;the inputs to the scheduler are multiple FuncBuffers (function buffers), one for each function, and the output is a single ordered RunQ (run queue) of function calls that will be dispatched for execution.&#8221;</p><p>To assist with load-balancing computation, a scheduler can also choose to run functions from a different region if there aren&#8217;t enough functions to run in the local-region - this decision is based on a &#8220;traffic matrix&#8221; that XFaaS computes to represent how much load a region should externally source (e.g. Region A should source functions from Regions B, C, and D because they&#8217;re under relatively higher load).</p><p>Once the scheduler determines that there is sufficient capacity to run more functions, it assigns the execution to a <em>WorkerPool</em> using a load-balancer approach similar to the <em>QueueLB</em> mentioned earlier.</p><p>Given the large numbers of different functions in the system, one challenge with reaching high worker utilization is reducing the memory and CPU resources that workers spend on loading function data and code. XFaaS addresses this constraint by implementing <em>Locality Groups</em> that limit a function&#8217;s execution to a subset of the larger pool.</p><h3>Performance Optimizations</h3><p>The paper mentions two other optimizations to increase worker utilization: <em>time-shifted computing</em> and <em>cooperative JIT compilation</em>.</p><p><em>Time-shifted computing</em> introduces flexibility to when a function executes - for example, rather than specififying &#8220;this function must execute immediately&#8221;, XFaaS can delay the computation to a time when other functions aren&#8217;t executing, smoothing resource utilization. Importantly, users of the system are incentivized to take advantage of this flexibility as functions have two different quotas, <em>reserved</em> and <em>opportunistic</em> (mapping to more or less rigid timing where <em>opportunistic</em> quota is internally treated as &#8220;cheaper&#8221;).</p><p>Additionally, the code in Meta&#8217;s infrastructure takes advantage of <a href="https://blog.acolyer.org/2018/08/08/hhvm-jit-a-profile-guided-region-based-compiler-for-php-and-hack/">profiling-guided optimization</a>, a technique that can dramatically improve performance. XFaaS ensures that these performance optimizations computed on one worker benefit other workers in the fleet by shipping the optimized code across the network.</p><h3>Preventing Overload</h3><p>It is critical that accessing downstream services don&#8217;t cause or worsen overload - an idea very similar to what was discussed in a previous paper review on <a href="https://www.micahlerner.com/2022/07/11/metastable-failures-in-the-wild.html">Metastable Failures in the Wild</a>. XFaaS implements this by borrowing the idea of <em><a href="https://medium.com/@jayphelps/backpressure-explained-the-flow-of-data-through-software-2350b3e77ce7">backpressure</a></em> from TCP (specifically <a href="https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease">Additive increase/multiplicative decrease</a>) and other distributed systems.</p><h2>How is the research evaluated?</h2><p>The paper evaluates the system&#8217;s ability to achieve high utilization, efficiently execute functions while taking advantage of performance improvements, and prevent overload of downstream services.</p><p>To evaluate XFaaS&#8217;s ability to maintain high utilization and smooth load, the authors compare the rate of incoming requests to the load of the system - &#8220;the peak-to-trough ratio of CPU utilization is only 1.4x, which is a significant improvement over the peak-to-trough ratio of 4.3x depicted&#8230;for the <em>Received</em> curve.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7CY5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6477659f-7256-499d-a43b-7272a0e7a4f4_460x383.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7CY5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6477659f-7256-499d-a43b-7272a0e7a4f4_460x383.png 424w, https://substackcdn.com/image/fetch/$s_!7CY5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6477659f-7256-499d-a43b-7272a0e7a4f4_460x383.png 848w, https://substackcdn.com/image/fetch/$s_!7CY5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6477659f-7256-499d-a43b-7272a0e7a4f4_460x383.png 1272w, https://substackcdn.com/image/fetch/$s_!7CY5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6477659f-7256-499d-a43b-7272a0e7a4f4_460x383.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7CY5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6477659f-7256-499d-a43b-7272a0e7a4f4_460x383.png" width="460" height="383" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6477659f-7256-499d-a43b-7272a0e7a4f4_460x383.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:383,&quot;width&quot;:460,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7CY5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6477659f-7256-499d-a43b-7272a0e7a4f4_460x383.png 424w, https://substackcdn.com/image/fetch/$s_!7CY5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6477659f-7256-499d-a43b-7272a0e7a4f4_460x383.png 848w, https://substackcdn.com/image/fetch/$s_!7CY5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6477659f-7256-499d-a43b-7272a0e7a4f4_460x383.png 1272w, https://substackcdn.com/image/fetch/$s_!7CY5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6477659f-7256-499d-a43b-7272a0e7a4f4_460x383.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>One reason for consistently high load is the incentive to allow flexibility in the execution of their functions, highlighted by usage of the two quota-types described by the paper.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1WfX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07e1ed26-9f0c-4502-8047-d6645e323dbd_469x387.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1WfX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07e1ed26-9f0c-4502-8047-d6645e323dbd_469x387.png 424w, https://substackcdn.com/image/fetch/$s_!1WfX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07e1ed26-9f0c-4502-8047-d6645e323dbd_469x387.png 848w, https://substackcdn.com/image/fetch/$s_!1WfX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07e1ed26-9f0c-4502-8047-d6645e323dbd_469x387.png 1272w, https://substackcdn.com/image/fetch/$s_!1WfX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07e1ed26-9f0c-4502-8047-d6645e323dbd_469x387.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1WfX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07e1ed26-9f0c-4502-8047-d6645e323dbd_469x387.png" width="469" height="387" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/07e1ed26-9f0c-4502-8047-d6645e323dbd_469x387.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:387,&quot;width&quot;:469,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1WfX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07e1ed26-9f0c-4502-8047-d6645e323dbd_469x387.png 424w, https://substackcdn.com/image/fetch/$s_!1WfX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07e1ed26-9f0c-4502-8047-d6645e323dbd_469x387.png 848w, https://substackcdn.com/image/fetch/$s_!1WfX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07e1ed26-9f0c-4502-8047-d6645e323dbd_469x387.png 1272w, https://substackcdn.com/image/fetch/$s_!1WfX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07e1ed26-9f0c-4502-8047-d6645e323dbd_469x387.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>To determine the effectiveness of assigning a subset of functions to a worker using <em>Locality Groups</em>, the authors share time series data on the number of functions executed by workers and memory utiliation across the fleet, finding that both stay relatively constant.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2EEP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754284d-ef32-498e-9486-28e1fe385e8b_467x327.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2EEP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754284d-ef32-498e-9486-28e1fe385e8b_467x327.png 424w, https://substackcdn.com/image/fetch/$s_!2EEP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754284d-ef32-498e-9486-28e1fe385e8b_467x327.png 848w, https://substackcdn.com/image/fetch/$s_!2EEP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754284d-ef32-498e-9486-28e1fe385e8b_467x327.png 1272w, https://substackcdn.com/image/fetch/$s_!2EEP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754284d-ef32-498e-9486-28e1fe385e8b_467x327.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2EEP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754284d-ef32-498e-9486-28e1fe385e8b_467x327.png" width="467" height="327" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0754284d-ef32-498e-9486-28e1fe385e8b_467x327.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:327,&quot;width&quot;:467,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2EEP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754284d-ef32-498e-9486-28e1fe385e8b_467x327.png 424w, https://substackcdn.com/image/fetch/$s_!2EEP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754284d-ef32-498e-9486-28e1fe385e8b_467x327.png 848w, https://substackcdn.com/image/fetch/$s_!2EEP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754284d-ef32-498e-9486-28e1fe385e8b_467x327.png 1272w, https://substackcdn.com/image/fetch/$s_!2EEP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754284d-ef32-498e-9486-28e1fe385e8b_467x327.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!z8n0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fa29a5b-fdc1-43da-83b1-e14e6efc45a3_457x242.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!z8n0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fa29a5b-fdc1-43da-83b1-e14e6efc45a3_457x242.png 424w, https://substackcdn.com/image/fetch/$s_!z8n0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fa29a5b-fdc1-43da-83b1-e14e6efc45a3_457x242.png 848w, https://substackcdn.com/image/fetch/$s_!z8n0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fa29a5b-fdc1-43da-83b1-e14e6efc45a3_457x242.png 1272w, https://substackcdn.com/image/fetch/$s_!z8n0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fa29a5b-fdc1-43da-83b1-e14e6efc45a3_457x242.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!z8n0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fa29a5b-fdc1-43da-83b1-e14e6efc45a3_457x242.png" width="457" height="242" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1fa29a5b-fdc1-43da-83b1-e14e6efc45a3_457x242.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:242,&quot;width&quot;:457,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!z8n0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fa29a5b-fdc1-43da-83b1-e14e6efc45a3_457x242.png 424w, https://substackcdn.com/image/fetch/$s_!z8n0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fa29a5b-fdc1-43da-83b1-e14e6efc45a3_457x242.png 848w, https://substackcdn.com/image/fetch/$s_!z8n0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fa29a5b-fdc1-43da-83b1-e14e6efc45a3_457x242.png 1272w, https://substackcdn.com/image/fetch/$s_!z8n0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fa29a5b-fdc1-43da-83b1-e14e6efc45a3_457x242.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Furthermore, XFaaS&#8217; performance optimizations allow it to maintain a relatively high throughput, visible from contrasting requests per-second with and without profile-guided optimizations in place.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DB0Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6542cc1c-f987-47c4-a738-5c50915824f2_458x363.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DB0Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6542cc1c-f987-47c4-a738-5c50915824f2_458x363.png 424w, https://substackcdn.com/image/fetch/$s_!DB0Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6542cc1c-f987-47c4-a738-5c50915824f2_458x363.png 848w, https://substackcdn.com/image/fetch/$s_!DB0Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6542cc1c-f987-47c4-a738-5c50915824f2_458x363.png 1272w, https://substackcdn.com/image/fetch/$s_!DB0Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6542cc1c-f987-47c4-a738-5c50915824f2_458x363.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DB0Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6542cc1c-f987-47c4-a738-5c50915824f2_458x363.png" width="458" height="363" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6542cc1c-f987-47c4-a738-5c50915824f2_458x363.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:363,&quot;width&quot;:458,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DB0Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6542cc1c-f987-47c4-a738-5c50915824f2_458x363.png 424w, https://substackcdn.com/image/fetch/$s_!DB0Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6542cc1c-f987-47c4-a738-5c50915824f2_458x363.png 848w, https://substackcdn.com/image/fetch/$s_!DB0Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6542cc1c-f987-47c4-a738-5c50915824f2_458x363.png 1272w, https://substackcdn.com/image/fetch/$s_!DB0Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6542cc1c-f987-47c4-a738-5c50915824f2_458x363.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Lastly, the paper presents how XFaaS execution behaves in response to issues with downstream systems (specifically, not exacerabating outages). For example, when there were outage in Meta&#8217;s graph database (TAO, the subject of a previous paper review), or infrastructure related to it, XFaaS reduced the execution of functions accessing these services.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sxfr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab74a5ad-e0c4-4306-8cff-a6a4e846ae63_472x470.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sxfr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab74a5ad-e0c4-4306-8cff-a6a4e846ae63_472x470.png 424w, https://substackcdn.com/image/fetch/$s_!sxfr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab74a5ad-e0c4-4306-8cff-a6a4e846ae63_472x470.png 848w, https://substackcdn.com/image/fetch/$s_!sxfr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab74a5ad-e0c4-4306-8cff-a6a4e846ae63_472x470.png 1272w, https://substackcdn.com/image/fetch/$s_!sxfr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab74a5ad-e0c4-4306-8cff-a6a4e846ae63_472x470.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sxfr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab74a5ad-e0c4-4306-8cff-a6a4e846ae63_472x470.png" width="472" height="470" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ab74a5ad-e0c4-4306-8cff-a6a4e846ae63_472x470.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:470,&quot;width&quot;:472,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sxfr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab74a5ad-e0c4-4306-8cff-a6a4e846ae63_472x470.png 424w, https://substackcdn.com/image/fetch/$s_!sxfr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab74a5ad-e0c4-4306-8cff-a6a4e846ae63_472x470.png 848w, https://substackcdn.com/image/fetch/$s_!sxfr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab74a5ad-e0c4-4306-8cff-a6a4e846ae63_472x470.png 1272w, https://substackcdn.com/image/fetch/$s_!sxfr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab74a5ad-e0c4-4306-8cff-a6a4e846ae63_472x470.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eBAW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30699e63-c5e5-4675-ae51-471a66796254_463x369.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eBAW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30699e63-c5e5-4675-ae51-471a66796254_463x369.png 424w, https://substackcdn.com/image/fetch/$s_!eBAW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30699e63-c5e5-4675-ae51-471a66796254_463x369.png 848w, https://substackcdn.com/image/fetch/$s_!eBAW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30699e63-c5e5-4675-ae51-471a66796254_463x369.png 1272w, https://substackcdn.com/image/fetch/$s_!eBAW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30699e63-c5e5-4675-ae51-471a66796254_463x369.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eBAW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30699e63-c5e5-4675-ae51-471a66796254_463x369.png" width="463" height="369" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30699e63-c5e5-4675-ae51-471a66796254_463x369.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:369,&quot;width&quot;:463,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eBAW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30699e63-c5e5-4675-ae51-471a66796254_463x369.png 424w, https://substackcdn.com/image/fetch/$s_!eBAW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30699e63-c5e5-4675-ae51-471a66796254_463x369.png 848w, https://substackcdn.com/image/fetch/$s_!eBAW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30699e63-c5e5-4675-ae51-471a66796254_463x369.png 1272w, https://substackcdn.com/image/fetch/$s_!eBAW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30699e63-c5e5-4675-ae51-471a66796254_463x369.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h2>Conclusion</h2><p>The XFaaS paper is unique in characterizing a serverless system running at immmense scale. While previous research has touched on this topic, none have provided specific numbers of utilization, likely omitted due to privacy or business concerns (although <a href="https://www.usenix.org/conference/atc20/presentation/shahrad">Serverless in the Wild</a> comes close).</p><p>At the same time, the data on XFaaS comes with caveats, as the system is able to make design choices under a different set of constraints than serverless platforms from public cloud providers. For example, public clouds must guarantee isolation between customers and prioritize security considerations. While XFaaS doesn&#8217;t wholly neglect these concerns (e.g. some jobs must run on separate machines and there are some levels of isolation between jobs with these considerations), it otherwise relaxes this constraint. Furthermore, XFaaS explicitly does not handle functions and the path of a user-interaction (even though the paper discusses executing latency-sensitive functions) - this is in contrast with services like Lambda which use Serverless functions to respond to HTTP requests.</p><p>While XFaaS is a fascinating system, the paper left me with several questions including whether many of the functions the system executes would actually be better served with a batch job. Furthermore, the authors allude to XFaaS utilization being significantly higher based on &#8220;anecdotal knowledge&#8221; - while this might be true, it would be useful to know the source of this data to judge whether any differences are in fact meaningful.</p>]]></content:encoded></item><item><title><![CDATA[[Paper Review] Efficient Memory Management for Large Language Model Serving with PagedAttention]]></title><description><![CDATA[Thanks for reading Micah Learns!]]></description><link>https://newsletter.micahlerner.com/p/paper-review-efficient-memory-management</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/paper-review-efficient-memory-management</guid><dc:creator><![CDATA[Micah Lerner]]></dc:creator><pubDate>Thu, 11 Jan 2024 15:54:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!iJdu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721174b0-9d33-47cc-b7bb-6b351e225e6a_420x392.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Micah Learns! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><em>This is the first of several papers I&#8217;ll be writing about from SOSP&#8217;2023 (Symposium on Operating Systems Principles).</em></p><p><a href="https://dl.acm.org/doi/10.1145/3600006.3613165">Efficient Memory Management for Large Language Model Serving with PagedAttention</a></p><h2>What is the research?</h2><p>Large language models (like OpenAI&#8217;s <a href="https://openai.com/blog/chatgpt">ChatGPT</a>, Google&#8217;s <a href="https://bard.google.com">Bard</a>, Meta&#8217;s <a href="https://ai.meta.com/llama/">Llama</a>, and Mistral&#8217;s <a href="https://mistral.ai/news/mixtral-of-experts/">Mixtral</a>) take in a user prompt and respond with generated text<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. Based on public reports, supporting this functionality is <a href="https://www.businessinsider.com/how-much-chatgpt-costs-openai-to-run-estimate-report-2023-4">expensive</a>, and given the relatively new nature of LLMs deployed at scale, there are opportunities for improving performance.</p><p>To that end, this paper focuses on increasing the queries per second (a.k.a throughput) large language models (LLMs) can serve through two innovations, <em>PagedAttention</em>, and <em>vLLM</em>, both discussed in detail later in this paper review. Improving throughput can significantly decrease the cost of large language model serving by responding to more requests with the same number of GPU resources. The evaluations from the paper show that, &#8220;vLLM improves the LLM serving throughput by 2-4&#215; compared to the state-of-the-art systems&#8230;without affecting the model accuracy at all.&#8221;</p><p>Based on the observation that large language model serving is memory bound, the authors identify several areas of improvement for GPU memory allocation, then design a system that addresses these shortcomings. </p><p>One of the foremost problems they address is static allocation of memory. Existing LLM serving systems (or at least publicly-released ones) set aside fixed, contiguous memory to store the data needed to generate a response. If the response to the user is shorter than this fixed size, the resources are inaccessible to use for serving other requests until the original request is complete. </p><p>Requiring contiguous memory blocks also increases resource waste by &#8220;stranding&#8221; memory between the contiguously allocated areas of memory, causing it become unusable for serving other requests.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iJdu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721174b0-9d33-47cc-b7bb-6b351e225e6a_420x392.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iJdu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721174b0-9d33-47cc-b7bb-6b351e225e6a_420x392.png 424w, https://substackcdn.com/image/fetch/$s_!iJdu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721174b0-9d33-47cc-b7bb-6b351e225e6a_420x392.png 848w, https://substackcdn.com/image/fetch/$s_!iJdu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721174b0-9d33-47cc-b7bb-6b351e225e6a_420x392.png 1272w, https://substackcdn.com/image/fetch/$s_!iJdu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721174b0-9d33-47cc-b7bb-6b351e225e6a_420x392.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iJdu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721174b0-9d33-47cc-b7bb-6b351e225e6a_420x392.png" width="420" height="392" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/721174b0-9d33-47cc-b7bb-6b351e225e6a_420x392.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:392,&quot;width&quot;:420,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iJdu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721174b0-9d33-47cc-b7bb-6b351e225e6a_420x392.png 424w, https://substackcdn.com/image/fetch/$s_!iJdu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721174b0-9d33-47cc-b7bb-6b351e225e6a_420x392.png 848w, https://substackcdn.com/image/fetch/$s_!iJdu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721174b0-9d33-47cc-b7bb-6b351e225e6a_420x392.png 1272w, https://substackcdn.com/image/fetch/$s_!iJdu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721174b0-9d33-47cc-b7bb-6b351e225e6a_420x392.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Borrowing ideas a page from virtual memory, the authors propose a solution, <em>PagedAttention</em>, that can <em>dynamically</em> grow the memory used in LLM serving (in addition to incorporating other optimizations). The paper also describes how <em>PagedAttention</em> is implemented in a new open-source LLM serving library called <a href="https://github.com/vllm-project/vllm">vLLM</a>.</p><h2>How does the system work?</h2><p>Large language models take in a prompt from a user, then generate a text response. The paper focuses specifically on improving the performance of serving for transformers<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, a technology used by predominantly all implementations of large language models to generate the next word in a sequence.</p><p>Generating these sequences requires information on the users prompt, and about previous tokens in the response - this knowledge takes the form of vectors stored in memory in a data structure the authors call the Key Value Cache (aka <em>KV cache</em>). Because the limiting step in the execution of an LLM depends on reading and writing data to/from memory, an LLM process is &#8220;memory bound&#8221; - as a result, improving memory utilization (specifically, of the <em>KV Cache</em>) can increase performance of the system.</p><p>The authors identify three main types of waste in the <em>KV Cache</em>:</p><blockquote><p><em>reserved slots</em> for future tokens, <em>internal fragmentation</em> due to over-provisioning for potential maximum sequence lengths, and <em>external fragmentation</em> from the memory allocator.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0qRk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a95095-a869-4ea6-9b4b-c4802a363f8b_413x279.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0qRk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a95095-a869-4ea6-9b4b-c4802a363f8b_413x279.png 424w, https://substackcdn.com/image/fetch/$s_!0qRk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a95095-a869-4ea6-9b4b-c4802a363f8b_413x279.png 848w, https://substackcdn.com/image/fetch/$s_!0qRk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a95095-a869-4ea6-9b4b-c4802a363f8b_413x279.png 1272w, https://substackcdn.com/image/fetch/$s_!0qRk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a95095-a869-4ea6-9b4b-c4802a363f8b_413x279.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0qRk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a95095-a869-4ea6-9b4b-c4802a363f8b_413x279.png" width="413" height="279" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/50a95095-a869-4ea6-9b4b-c4802a363f8b_413x279.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:279,&quot;width&quot;:413,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0qRk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a95095-a869-4ea6-9b4b-c4802a363f8b_413x279.png 424w, https://substackcdn.com/image/fetch/$s_!0qRk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a95095-a869-4ea6-9b4b-c4802a363f8b_413x279.png 848w, https://substackcdn.com/image/fetch/$s_!0qRk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a95095-a869-4ea6-9b4b-c4802a363f8b_413x279.png 1272w, https://substackcdn.com/image/fetch/$s_!0qRk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a95095-a869-4ea6-9b4b-c4802a363f8b_413x279.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uTCr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d06866-886f-4aec-b010-0f3e3d328e12_880x198.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uTCr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d06866-886f-4aec-b010-0f3e3d328e12_880x198.png 424w, https://substackcdn.com/image/fetch/$s_!uTCr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d06866-886f-4aec-b010-0f3e3d328e12_880x198.png 848w, https://substackcdn.com/image/fetch/$s_!uTCr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d06866-886f-4aec-b010-0f3e3d328e12_880x198.png 1272w, https://substackcdn.com/image/fetch/$s_!uTCr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d06866-886f-4aec-b010-0f3e3d328e12_880x198.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uTCr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d06866-886f-4aec-b010-0f3e3d328e12_880x198.png" width="880" height="198" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b6d06866-886f-4aec-b010-0f3e3d328e12_880x198.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:198,&quot;width&quot;:880,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uTCr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d06866-886f-4aec-b010-0f3e3d328e12_880x198.png 424w, https://substackcdn.com/image/fetch/$s_!uTCr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d06866-886f-4aec-b010-0f3e3d328e12_880x198.png 848w, https://substackcdn.com/image/fetch/$s_!uTCr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d06866-886f-4aec-b010-0f3e3d328e12_880x198.png 1272w, https://substackcdn.com/image/fetch/$s_!uTCr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d06866-886f-4aec-b010-0f3e3d328e12_880x198.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h3>PagedAttention</h3><p>One of the paper&#8217;s key insights is that allowing a model to <em>dynamically</em> scale up its usage of non-contiguous memory can drastically improve memory utilization. The authors propose <em>PagedAttention</em>, which introduces the idea of <em>logical</em> and <em>physical</em> memory blocks for storing data in the <em>KV Cache</em>. This distinction is <a href="https://stackoverflow.com/a/15851473">similar to virtual memory</a> which provides the abstraction of contiguous RAM to a program, even though the data is physically stored in separate areas of RAM.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TYP1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e26cad7-da48-4c5a-9222-fcad9ad063af_409x219.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TYP1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e26cad7-da48-4c5a-9222-fcad9ad063af_409x219.png 424w, https://substackcdn.com/image/fetch/$s_!TYP1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e26cad7-da48-4c5a-9222-fcad9ad063af_409x219.png 848w, https://substackcdn.com/image/fetch/$s_!TYP1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e26cad7-da48-4c5a-9222-fcad9ad063af_409x219.png 1272w, https://substackcdn.com/image/fetch/$s_!TYP1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e26cad7-da48-4c5a-9222-fcad9ad063af_409x219.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TYP1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e26cad7-da48-4c5a-9222-fcad9ad063af_409x219.png" width="409" height="219" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e26cad7-da48-4c5a-9222-fcad9ad063af_409x219.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:219,&quot;width&quot;:409,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TYP1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e26cad7-da48-4c5a-9222-fcad9ad063af_409x219.png 424w, https://substackcdn.com/image/fetch/$s_!TYP1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e26cad7-da48-4c5a-9222-fcad9ad063af_409x219.png 848w, https://substackcdn.com/image/fetch/$s_!TYP1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e26cad7-da48-4c5a-9222-fcad9ad063af_409x219.png 1272w, https://substackcdn.com/image/fetch/$s_!TYP1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e26cad7-da48-4c5a-9222-fcad9ad063af_409x219.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p><em>Blocks</em> contain entries for more than one token, and the system allocates them on demand based on when responding to a user query - for example, the prompt &#8220;Four score and seven years ago our fathers brought forth&#8221; contains ten tokens, causing the allocation of three blocks, each with the space for four entries (the last block allocated because of the prompt is partially filled). Gradually allocating blocks primarily addresses <em>internal fragmentation</em> and <em>reserved</em> memory.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LESM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfe4eee8-b2ea-4e94-afa5-122651c5bc39_587x342.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LESM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfe4eee8-b2ea-4e94-afa5-122651c5bc39_587x342.png 424w, https://substackcdn.com/image/fetch/$s_!LESM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfe4eee8-b2ea-4e94-afa5-122651c5bc39_587x342.png 848w, https://substackcdn.com/image/fetch/$s_!LESM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfe4eee8-b2ea-4e94-afa5-122651c5bc39_587x342.png 1272w, https://substackcdn.com/image/fetch/$s_!LESM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfe4eee8-b2ea-4e94-afa5-122651c5bc39_587x342.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LESM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfe4eee8-b2ea-4e94-afa5-122651c5bc39_587x342.png" width="587" height="342" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dfe4eee8-b2ea-4e94-afa5-122651c5bc39_587x342.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:342,&quot;width&quot;:587,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LESM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfe4eee8-b2ea-4e94-afa5-122651c5bc39_587x342.png 424w, https://substackcdn.com/image/fetch/$s_!LESM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfe4eee8-b2ea-4e94-afa5-122651c5bc39_587x342.png 848w, https://substackcdn.com/image/fetch/$s_!LESM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfe4eee8-b2ea-4e94-afa5-122651c5bc39_587x342.png 1272w, https://substackcdn.com/image/fetch/$s_!LESM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfe4eee8-b2ea-4e94-afa5-122651c5bc39_587x342.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>As the large language model generates tokens, it references data on previous tokens using a <em>block table</em> storing the mapping between logical blocks for a query and physical GPU DRAM. Critically, this approach allows for the GPU to serve multiple requests at the same time while using non-contiguous memory, addressing concerns like <em>external fragmentation</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CtJd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F693d5d1f-340a-44cb-b389-ea83c58c6bcb_594x330.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CtJd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F693d5d1f-340a-44cb-b389-ea83c58c6bcb_594x330.png 424w, https://substackcdn.com/image/fetch/$s_!CtJd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F693d5d1f-340a-44cb-b389-ea83c58c6bcb_594x330.png 848w, https://substackcdn.com/image/fetch/$s_!CtJd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F693d5d1f-340a-44cb-b389-ea83c58c6bcb_594x330.png 1272w, https://substackcdn.com/image/fetch/$s_!CtJd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F693d5d1f-340a-44cb-b389-ea83c58c6bcb_594x330.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CtJd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F693d5d1f-340a-44cb-b389-ea83c58c6bcb_594x330.png" width="594" height="330" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/693d5d1f-340a-44cb-b389-ea83c58c6bcb_594x330.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:330,&quot;width&quot;:594,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CtJd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F693d5d1f-340a-44cb-b389-ea83c58c6bcb_594x330.png 424w, https://substackcdn.com/image/fetch/$s_!CtJd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F693d5d1f-340a-44cb-b389-ea83c58c6bcb_594x330.png 848w, https://substackcdn.com/image/fetch/$s_!CtJd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F693d5d1f-340a-44cb-b389-ea83c58c6bcb_594x330.png 1272w, https://substackcdn.com/image/fetch/$s_!CtJd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F693d5d1f-340a-44cb-b389-ea83c58c6bcb_594x330.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>The paper also describes how <em>PagedAttention</em> reduces memory usage in three other large language model serving request patterns - <em>parallel sampling</em>, <em>beam search</em>, and <em>shared prefix</em> <em>prompting</em>.</p><p><em>Parallel sampling</em> involves generating multiple results for a single prompt - this can occur by having the LLM choose a different token, leading to a different branch of response. The implementation follows a <a href="https://stackoverflow.com/questions/628938/what-is-copy-on-write">&#8220;copy-on-write&#8221;</a> pattern that reuse the same data in GPU memory until the branch in output occurs (at which point, the block with the difference is copied to a new location in memory, and execution completes independently for the different branches).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wVcd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F900e23f4-564a-443c-969f-c13816395803_581x308.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wVcd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F900e23f4-564a-443c-969f-c13816395803_581x308.png 424w, https://substackcdn.com/image/fetch/$s_!wVcd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F900e23f4-564a-443c-969f-c13816395803_581x308.png 848w, https://substackcdn.com/image/fetch/$s_!wVcd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F900e23f4-564a-443c-969f-c13816395803_581x308.png 1272w, https://substackcdn.com/image/fetch/$s_!wVcd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F900e23f4-564a-443c-969f-c13816395803_581x308.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wVcd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F900e23f4-564a-443c-969f-c13816395803_581x308.png" width="581" height="308" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/900e23f4-564a-443c-969f-c13816395803_581x308.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:308,&quot;width&quot;:581,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wVcd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F900e23f4-564a-443c-969f-c13816395803_581x308.png 424w, https://substackcdn.com/image/fetch/$s_!wVcd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F900e23f4-564a-443c-969f-c13816395803_581x308.png 848w, https://substackcdn.com/image/fetch/$s_!wVcd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F900e23f4-564a-443c-969f-c13816395803_581x308.png 1272w, https://substackcdn.com/image/fetch/$s_!wVcd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F900e23f4-564a-443c-969f-c13816395803_581x308.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>The paper also describes <em>PagedAttention</em> in the context of <em>beam search</em><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>, an algorithm for generating possible next states and choosing a &#8220;top-K&#8221; subset to continue with. A <em>beam search</em> implemented with <em>PagedAttention</em> can reuse blocks across multiple search paths, meaning that the process has less memory overhead.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wso5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a1264b-e57d-4b6e-9873-9cbe8a35f6c0_528x238.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wso5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a1264b-e57d-4b6e-9873-9cbe8a35f6c0_528x238.png 424w, https://substackcdn.com/image/fetch/$s_!wso5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a1264b-e57d-4b6e-9873-9cbe8a35f6c0_528x238.png 848w, https://substackcdn.com/image/fetch/$s_!wso5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a1264b-e57d-4b6e-9873-9cbe8a35f6c0_528x238.png 1272w, https://substackcdn.com/image/fetch/$s_!wso5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a1264b-e57d-4b6e-9873-9cbe8a35f6c0_528x238.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wso5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a1264b-e57d-4b6e-9873-9cbe8a35f6c0_528x238.png" width="528" height="238" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a5a1264b-e57d-4b6e-9873-9cbe8a35f6c0_528x238.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:238,&quot;width&quot;:528,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wso5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a1264b-e57d-4b6e-9873-9cbe8a35f6c0_528x238.png 424w, https://substackcdn.com/image/fetch/$s_!wso5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a1264b-e57d-4b6e-9873-9cbe8a35f6c0_528x238.png 848w, https://substackcdn.com/image/fetch/$s_!wso5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a1264b-e57d-4b6e-9873-9cbe8a35f6c0_528x238.png 1272w, https://substackcdn.com/image/fetch/$s_!wso5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a1264b-e57d-4b6e-9873-9cbe8a35f6c0_528x238.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Lastly, the paper discusses PagedAttention&#8217;s impact on prompts with a <em>shared prefix</em> - in many situations, a user of an LLM will provide a separate &#8220;system&#8221; prompt<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> that applies, no matter the details of the task. One example system prompt is, &#8220;you are a helpful agent that only speaks JSON&#8221;. PagedAttention allows the blocks allocated for this part of the prompt to be reused across multiple requests, reducing memory usage.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!42TE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47c9229-5967-4bee-83b4-4988b3ee6800_615x283.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!42TE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47c9229-5967-4bee-83b4-4988b3ee6800_615x283.png 424w, https://substackcdn.com/image/fetch/$s_!42TE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47c9229-5967-4bee-83b4-4988b3ee6800_615x283.png 848w, https://substackcdn.com/image/fetch/$s_!42TE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47c9229-5967-4bee-83b4-4988b3ee6800_615x283.png 1272w, https://substackcdn.com/image/fetch/$s_!42TE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47c9229-5967-4bee-83b4-4988b3ee6800_615x283.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!42TE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47c9229-5967-4bee-83b4-4988b3ee6800_615x283.png" width="615" height="283" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b47c9229-5967-4bee-83b4-4988b3ee6800_615x283.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:283,&quot;width&quot;:615,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!42TE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47c9229-5967-4bee-83b4-4988b3ee6800_615x283.png 424w, https://substackcdn.com/image/fetch/$s_!42TE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47c9229-5967-4bee-83b4-4988b3ee6800_615x283.png 848w, https://substackcdn.com/image/fetch/$s_!42TE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47c9229-5967-4bee-83b4-4988b3ee6800_615x283.png 1272w, https://substackcdn.com/image/fetch/$s_!42TE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47c9229-5967-4bee-83b4-4988b3ee6800_615x283.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h3>vLLM</h3><p>To deploy PagedAttention in a distributed environment, the paper proposes the vLLM system, containing a <em>scheduler</em> (which chooses which work to run where), the <em>KV Cache Manager</em>, <em>Workers</em> (computers containing GPU hardware), and <em>Block Allocators</em>. I elide the details of this section given that vLLM is an <a href="https://github.com/vllm-project/vllm/tree/d0215a58e78572d91dadafe9d832a2db89b09a13/vllm/core">open source project</a>, and the details of the infrastructure are likely to change.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y0mn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1605fee-5f1d-45a9-96dd-1b7551120983_530x347.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y0mn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1605fee-5f1d-45a9-96dd-1b7551120983_530x347.png 424w, https://substackcdn.com/image/fetch/$s_!y0mn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1605fee-5f1d-45a9-96dd-1b7551120983_530x347.png 848w, https://substackcdn.com/image/fetch/$s_!y0mn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1605fee-5f1d-45a9-96dd-1b7551120983_530x347.png 1272w, https://substackcdn.com/image/fetch/$s_!y0mn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1605fee-5f1d-45a9-96dd-1b7551120983_530x347.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!y0mn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1605fee-5f1d-45a9-96dd-1b7551120983_530x347.png" width="530" height="347" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f1605fee-5f1d-45a9-96dd-1b7551120983_530x347.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:347,&quot;width&quot;:530,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!y0mn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1605fee-5f1d-45a9-96dd-1b7551120983_530x347.png 424w, https://substackcdn.com/image/fetch/$s_!y0mn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1605fee-5f1d-45a9-96dd-1b7551120983_530x347.png 848w, https://substackcdn.com/image/fetch/$s_!y0mn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1605fee-5f1d-45a9-96dd-1b7551120983_530x347.png 1272w, https://substackcdn.com/image/fetch/$s_!y0mn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1605fee-5f1d-45a9-96dd-1b7551120983_530x347.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>That said, there were a few interesting design choices that stuck out to me:</p><ul><li><p>vLLM adopts patterns from <a href="https://arxiv.org/pdf/1909.08053.pdf">Megatron-LM</a>, which details how to train and run transformers at scale across many GPUs while minimizing communication.</p></li><li><p>vLLM implements the <a href="https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server">OpenAI API interface</a>, simplifying developer adoption.</p></li><li><p>vLLM supports higher-level abstractions (via <code>fork</code>, <code>append</code>, and <code>free</code> commands) used to implement approaches like <em>beam search</em>, <em>parallel sampling</em>, and <em>shared prefix</em> - luckily <a href="https://github.com/vllm-project/vllm/blob/937e7b7d7c460c00805ac358a4873ec0653ab2f5/vllm/sequence.py#L212">the code is open source</a> which allows for a deeper dive!</p></li></ul><h2>How is the research evaluated?</h2><p>The paper compares performance of models served with vLLM against other serving systems (e.g. a custom implementation of <a href="https://www.usenix.org/conference/osdi22/presentation/yu">Orca</a>, an LLM-serving system described in research from OSDI 2022) emulating workloads based on open source datasets (<a href="https://sharegpt.com/">ShareGPT</a> and <a href="https://github.com/tatsu-lab/stanford_alpaca">Stanford Alpaca</a>).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Aoqt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf481a8a-8f28-4719-9ecb-8aaa63f34ccd_590x312.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Aoqt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf481a8a-8f28-4719-9ecb-8aaa63f34ccd_590x312.png 424w, https://substackcdn.com/image/fetch/$s_!Aoqt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf481a8a-8f28-4719-9ecb-8aaa63f34ccd_590x312.png 848w, https://substackcdn.com/image/fetch/$s_!Aoqt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf481a8a-8f28-4719-9ecb-8aaa63f34ccd_590x312.png 1272w, https://substackcdn.com/image/fetch/$s_!Aoqt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf481a8a-8f28-4719-9ecb-8aaa63f34ccd_590x312.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Aoqt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf481a8a-8f28-4719-9ecb-8aaa63f34ccd_590x312.png" width="590" height="312" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf481a8a-8f28-4719-9ecb-8aaa63f34ccd_590x312.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:312,&quot;width&quot;:590,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Aoqt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf481a8a-8f28-4719-9ecb-8aaa63f34ccd_590x312.png 424w, https://substackcdn.com/image/fetch/$s_!Aoqt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf481a8a-8f28-4719-9ecb-8aaa63f34ccd_590x312.png 848w, https://substackcdn.com/image/fetch/$s_!Aoqt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf481a8a-8f28-4719-9ecb-8aaa63f34ccd_590x312.png 1272w, https://substackcdn.com/image/fetch/$s_!Aoqt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf481a8a-8f28-4719-9ecb-8aaa63f34ccd_590x312.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>The paper compares three different types of tasks - basic sampling (e.g. normal LLM usage), search-based techniques like <em>parallel sampling</em> and <em>beam search</em>, and chatbot-like uses of LLMs (which have longer prompts, along with back and forth between the user and the LLM).</p><p>For <em>basic sampling</em>, <em>parallel sampling</em>, <em>beam search</em>, and chatbot-like workloads, vLLM is able to achieve significantly higher request rates.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XQ7_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60d398f-79e4-4a88-aa4a-a737f743b069_1210x542.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XQ7_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60d398f-79e4-4a88-aa4a-a737f743b069_1210x542.png 424w, https://substackcdn.com/image/fetch/$s_!XQ7_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60d398f-79e4-4a88-aa4a-a737f743b069_1210x542.png 848w, https://substackcdn.com/image/fetch/$s_!XQ7_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60d398f-79e4-4a88-aa4a-a737f743b069_1210x542.png 1272w, https://substackcdn.com/image/fetch/$s_!XQ7_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60d398f-79e4-4a88-aa4a-a737f743b069_1210x542.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XQ7_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60d398f-79e4-4a88-aa4a-a737f743b069_1210x542.png" width="1210" height="542" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d60d398f-79e4-4a88-aa4a-a737f743b069_1210x542.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:542,&quot;width&quot;:1210,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XQ7_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60d398f-79e4-4a88-aa4a-a737f743b069_1210x542.png 424w, https://substackcdn.com/image/fetch/$s_!XQ7_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60d398f-79e4-4a88-aa4a-a737f743b069_1210x542.png 848w, https://substackcdn.com/image/fetch/$s_!XQ7_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60d398f-79e4-4a88-aa4a-a737f743b069_1210x542.png 1272w, https://substackcdn.com/image/fetch/$s_!XQ7_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60d398f-79e4-4a88-aa4a-a737f743b069_1210x542.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Um3A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa564ab97-c763-4397-9c18-cd1b120658db_1216x533.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Um3A!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa564ab97-c763-4397-9c18-cd1b120658db_1216x533.png 424w, https://substackcdn.com/image/fetch/$s_!Um3A!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa564ab97-c763-4397-9c18-cd1b120658db_1216x533.png 848w, https://substackcdn.com/image/fetch/$s_!Um3A!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa564ab97-c763-4397-9c18-cd1b120658db_1216x533.png 1272w, https://substackcdn.com/image/fetch/$s_!Um3A!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa564ab97-c763-4397-9c18-cd1b120658db_1216x533.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Um3A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa564ab97-c763-4397-9c18-cd1b120658db_1216x533.png" width="1216" height="533" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a564ab97-c763-4397-9c18-cd1b120658db_1216x533.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:1216,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Um3A!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa564ab97-c763-4397-9c18-cd1b120658db_1216x533.png 424w, https://substackcdn.com/image/fetch/$s_!Um3A!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa564ab97-c763-4397-9c18-cd1b120658db_1216x533.png 848w, https://substackcdn.com/image/fetch/$s_!Um3A!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa564ab97-c763-4397-9c18-cd1b120658db_1216x533.png 1272w, https://substackcdn.com/image/fetch/$s_!Um3A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa564ab97-c763-4397-9c18-cd1b120658db_1216x533.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IUvG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ffe52a-dfeb-4a9a-bf30-177653ecaa58_490x240.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IUvG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ffe52a-dfeb-4a9a-bf30-177653ecaa58_490x240.png 424w, https://substackcdn.com/image/fetch/$s_!IUvG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ffe52a-dfeb-4a9a-bf30-177653ecaa58_490x240.png 848w, https://substackcdn.com/image/fetch/$s_!IUvG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ffe52a-dfeb-4a9a-bf30-177653ecaa58_490x240.png 1272w, https://substackcdn.com/image/fetch/$s_!IUvG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ffe52a-dfeb-4a9a-bf30-177653ecaa58_490x240.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IUvG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ffe52a-dfeb-4a9a-bf30-177653ecaa58_490x240.png" width="490" height="240" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/79ffe52a-dfeb-4a9a-bf30-177653ecaa58_490x240.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:240,&quot;width&quot;:490,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IUvG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ffe52a-dfeb-4a9a-bf30-177653ecaa58_490x240.png 424w, https://substackcdn.com/image/fetch/$s_!IUvG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ffe52a-dfeb-4a9a-bf30-177653ecaa58_490x240.png 848w, https://substackcdn.com/image/fetch/$s_!IUvG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ffe52a-dfeb-4a9a-bf30-177653ecaa58_490x240.png 1272w, https://substackcdn.com/image/fetch/$s_!IUvG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ffe52a-dfeb-4a9a-bf30-177653ecaa58_490x240.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Additionally, vLLM and PagedAttention are able to save significant amounts of memory on tasks where it is possible to re-use blocks (e.g. parallel sampling and beam search) - these graphs show average amount of memory saving as a percent, but it would be interesting to know in absolute terms.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-Npp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd190ccb-50d9-4622-9e62-838ae8700953_602x269.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-Npp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd190ccb-50d9-4622-9e62-838ae8700953_602x269.png 424w, https://substackcdn.com/image/fetch/$s_!-Npp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd190ccb-50d9-4622-9e62-838ae8700953_602x269.png 848w, https://substackcdn.com/image/fetch/$s_!-Npp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd190ccb-50d9-4622-9e62-838ae8700953_602x269.png 1272w, https://substackcdn.com/image/fetch/$s_!-Npp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd190ccb-50d9-4622-9e62-838ae8700953_602x269.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-Npp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd190ccb-50d9-4622-9e62-838ae8700953_602x269.png" width="602" height="269" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cd190ccb-50d9-4622-9e62-838ae8700953_602x269.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:269,&quot;width&quot;:602,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-Npp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd190ccb-50d9-4622-9e62-838ae8700953_602x269.png 424w, https://substackcdn.com/image/fetch/$s_!-Npp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd190ccb-50d9-4622-9e62-838ae8700953_602x269.png 848w, https://substackcdn.com/image/fetch/$s_!-Npp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd190ccb-50d9-4622-9e62-838ae8700953_602x269.png 1272w, https://substackcdn.com/image/fetch/$s_!-Npp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd190ccb-50d9-4622-9e62-838ae8700953_602x269.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h2>Conclusion</h2><p>PagedAttention and vLLM are at the cutting edge of systems research and its application to AI - something that is becoming more of a topic in research and in practice (e.g. <a href="https://charlesfrye.github.io/programming/2023/11/10/llms-systems.html">Charles Frye&#8217;s post</a>) now that LLMs are beginning to operate at scale. I&#8217;m looking forward to following along on the progress of the vLLM open source project, and from digging in, I discovered it is compatible with <a href="https://skypilot.readthedocs.io/en/latest/">SkyPilot</a> (an open source project for deploying infrastructure cross-cloud discussed in research from <a href="https://www.usenix.org/conference/nsdi23/presentation/yang-zongheng">NSDI 2023</a> and a potential future paper review). As I tinker on <a href="https://twitter.com/micahlerner/status/1741989855041843504">LLM-based side-projects</a>, I&#8217;m looking forward to experimenting with and learning from these promising new tools.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>For the purposes of this paper, the authors don&#8217;t include multi-modal response.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>For more background on transformers, I recommend <a href="http://jalammar.github.io/illustrated-transformer/">The Illustrated Transformer</a> and <a href="https://osanseviero.github.io/hackerllama/blog/posts/random_transformer/">Understand how transformers work by demystifying all the math behind them</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>The paper cites <a href="https://proceedings.neurips.cc/paper/2014/file/a14ac55a4f27472c5d894ec1c3c743d2-Paper.pdf">Sequence to Sequence Learning with Neural Networks</a> when referencing beam search, but I think <a href="https://d2l.ai/chapter_recurrent-modern/beam-search.html#id1">this explanation</a> gets the gist across better.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>System promps are also discussed in <a href="https://platform.openai.com/docs/guides/prompt-engineering">OpenAI&#8217;s documentation on prompt engineering</a>.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[[Paper Review] Blueprint: A Toolchain for Highly-Reconfigurable Microservice Applications]]></title><description><![CDATA[Thanks for reading Micah Learns!]]></description><link>https://newsletter.micahlerner.com/p/paper-review-blueprint-a-toolchain</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/paper-review-blueprint-a-toolchain</guid><dc:creator><![CDATA[Micah Lerner]]></dc:creator><pubDate>Wed, 03 Jan 2024 00:26:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!VTGW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80086480-0f84-4c00-966c-fc900e351336_550x373.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Micah Learns! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>This is the first of several papers I&#8217;ll be writing about from SOSP&#8217;2023 (Symposium on Operating Systems Principles).</p><p><a href="https://dl.acm.org/doi/10.1145/3600006.3613138">Blueprint: A Toolchain for Highly-Reconfigurable Microservice Applications</a></p><h2>What is the research?</h2><p>The Blueprint paper talks about a new framework for configuring, building, and deploying application code. This framework aims to simplify iteration on system design, application development, and configuration.</p><p>The authors argue that these tasks are currently difficult to accomplish because many services have tight coupling between application code, framework-level components (like RPC libraries and their behavior), and the actual deployment of the service (e.g. with Docker, Kubernetes, or other systems like Ansible).</p><p>By explicitly separating concerns of an application, and explicitly defining their interactions in a programmatic configuration, the authors are able to test out new configurations of a system - for example, quickly reconfiguring a set of independently deployed microservices into a single monolithic binary, then measuring the performance impact of the change.</p><h2>How does the system work?</h2><p>Blueprint&#8217;s approach divides a system into three types of components:</p><ul><li><p>Application level workflows: business logic that a developer writes to perform a specific function.</p></li><li><p>Scaffolding: underlying framework-level components like RPC functionality, distributed tracing libraries, and storage backends (like caches and databases).</p></li><li><p>Instantiations: specific configuration for framework-level components (e.g. using a specific RPC library with deadlines set or with novel functionality like <a href="https://martinfowler.com/bliki/CircuitBreaker.html">circuit-breakers</a> enabled.</p></li></ul><p>A system is described in a programmatic configuration called a <em>workflow spec</em> which contains application logic and its external interface.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VTGW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80086480-0f84-4c00-966c-fc900e351336_550x373.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VTGW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80086480-0f84-4c00-966c-fc900e351336_550x373.png 424w, https://substackcdn.com/image/fetch/$s_!VTGW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80086480-0f84-4c00-966c-fc900e351336_550x373.png 848w, https://substackcdn.com/image/fetch/$s_!VTGW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80086480-0f84-4c00-966c-fc900e351336_550x373.png 1272w, https://substackcdn.com/image/fetch/$s_!VTGW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80086480-0f84-4c00-966c-fc900e351336_550x373.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VTGW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80086480-0f84-4c00-966c-fc900e351336_550x373.png" width="550" height="373" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/80086480-0f84-4c00-966c-fc900e351336_550x373.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:373,&quot;width&quot;:550,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VTGW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80086480-0f84-4c00-966c-fc900e351336_550x373.png 424w, https://substackcdn.com/image/fetch/$s_!VTGW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80086480-0f84-4c00-966c-fc900e351336_550x373.png 848w, https://substackcdn.com/image/fetch/$s_!VTGW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80086480-0f84-4c00-966c-fc900e351336_550x373.png 1272w, https://substackcdn.com/image/fetch/$s_!VTGW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80086480-0f84-4c00-966c-fc900e351336_550x373.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_OOo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b336c-21c4-449e-98bc-0536b449cf3a_551x117.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_OOo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b336c-21c4-449e-98bc-0536b449cf3a_551x117.png 424w, https://substackcdn.com/image/fetch/$s_!_OOo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b336c-21c4-449e-98bc-0536b449cf3a_551x117.png 848w, https://substackcdn.com/image/fetch/$s_!_OOo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b336c-21c4-449e-98bc-0536b449cf3a_551x117.png 1272w, https://substackcdn.com/image/fetch/$s_!_OOo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b336c-21c4-449e-98bc-0536b449cf3a_551x117.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_OOo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b336c-21c4-449e-98bc-0536b449cf3a_551x117.png" width="551" height="117" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f6b336c-21c4-449e-98bc-0536b449cf3a_551x117.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:117,&quot;width&quot;:551,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_OOo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b336c-21c4-449e-98bc-0536b449cf3a_551x117.png 424w, https://substackcdn.com/image/fetch/$s_!_OOo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b336c-21c4-449e-98bc-0536b449cf3a_551x117.png 848w, https://substackcdn.com/image/fetch/$s_!_OOo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b336c-21c4-449e-98bc-0536b449cf3a_551x117.png 1272w, https://substackcdn.com/image/fetch/$s_!_OOo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b336c-21c4-449e-98bc-0536b449cf3a_551x117.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Next, a user of Blueprint creates a <em>wiring spec</em> that encode the relationship between pieces of application code and framework-level components. In one example, the authors recreate a simple microservice for posting on a social network, including connection to external caches and databases.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8Dr8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8a2b735-4efa-4871-a4b3-cb3dcc00e468_543x353.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8Dr8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8a2b735-4efa-4871-a4b3-cb3dcc00e468_543x353.png 424w, https://substackcdn.com/image/fetch/$s_!8Dr8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8a2b735-4efa-4871-a4b3-cb3dcc00e468_543x353.png 848w, https://substackcdn.com/image/fetch/$s_!8Dr8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8a2b735-4efa-4871-a4b3-cb3dcc00e468_543x353.png 1272w, https://substackcdn.com/image/fetch/$s_!8Dr8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8a2b735-4efa-4871-a4b3-cb3dcc00e468_543x353.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8Dr8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8a2b735-4efa-4871-a4b3-cb3dcc00e468_543x353.png" width="543" height="353" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a8a2b735-4efa-4871-a4b3-cb3dcc00e468_543x353.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:353,&quot;width&quot;:543,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8Dr8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8a2b735-4efa-4871-a4b3-cb3dcc00e468_543x353.png 424w, https://substackcdn.com/image/fetch/$s_!8Dr8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8a2b735-4efa-4871-a4b3-cb3dcc00e468_543x353.png 848w, https://substackcdn.com/image/fetch/$s_!8Dr8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8a2b735-4efa-4871-a4b3-cb3dcc00e468_543x353.png 1272w, https://substackcdn.com/image/fetch/$s_!8Dr8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8a2b735-4efa-4871-a4b3-cb3dcc00e468_543x353.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Blueprint then uses the <em>wiring spec</em> to compile an <em>intermediate representation</em> (an idea <a href="https://cs.lmu.edu/~ray/notes/ir/">common to many compilers</a>) of the system. The intermediate representation is effectively a graph with nodes describing code and edges describing dependencies (e.g. service A calls service B).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ghsf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43c4bb2-dbd4-4e46-8a79-6134f0117143_1084x294.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ghsf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43c4bb2-dbd4-4e46-8a79-6134f0117143_1084x294.png 424w, https://substackcdn.com/image/fetch/$s_!ghsf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43c4bb2-dbd4-4e46-8a79-6134f0117143_1084x294.png 848w, https://substackcdn.com/image/fetch/$s_!ghsf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43c4bb2-dbd4-4e46-8a79-6134f0117143_1084x294.png 1272w, https://substackcdn.com/image/fetch/$s_!ghsf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43c4bb2-dbd4-4e46-8a79-6134f0117143_1084x294.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ghsf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43c4bb2-dbd4-4e46-8a79-6134f0117143_1084x294.png" width="1084" height="294" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d43c4bb2-dbd4-4e46-8a79-6134f0117143_1084x294.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:294,&quot;width&quot;:1084,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ghsf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43c4bb2-dbd4-4e46-8a79-6134f0117143_1084x294.png 424w, https://substackcdn.com/image/fetch/$s_!ghsf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43c4bb2-dbd4-4e46-8a79-6134f0117143_1084x294.png 848w, https://substackcdn.com/image/fetch/$s_!ghsf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43c4bb2-dbd4-4e46-8a79-6134f0117143_1084x294.png 1272w, https://substackcdn.com/image/fetch/$s_!ghsf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43c4bb2-dbd4-4e46-8a79-6134f0117143_1084x294.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>Lastly, the intermediate representation is used to build concrete artifacts representing the components of the system - for example, the build system can compile the code for a service written in Go and wrap it with a Docker image, enabling later deployments to production.</p><h2>How is the research evaluated?</h2><p>The authors evaluate several research claims about the implementation, but three themes stood out to me:</p><ul><li><p>Does Blueprint make it easier for developers to try new configurations of an system&#8217;s <em>existing</em> components and libraries?</p></li><li><p>Can Blueprint be used to create system configurations that reproduce reliability issues?</p></li><li><p>What are the costs of the abstractions that Blueprint provides?</p></li></ul><p>To evaluate the first question of whether Blueprint makes it easier to try new configurations for a system&#8217;s existing components, the authors considered the lines of code required to enable/disable tracing and to convert a microservice deployment into a monolith.</p><p>They were able to perform the first task of making changes to tracing with 5 lines of code. Similarly, by changing ~10 lines of code in the Blueprint configuration, they were able to generate a monolithic version of an application previously deployed as microservices, then quantify the performance impact of this change.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MorM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ed2c3b-f554-491e-82ff-f0f2e44114de_523x290.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MorM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ed2c3b-f554-491e-82ff-f0f2e44114de_523x290.png 424w, https://substackcdn.com/image/fetch/$s_!MorM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ed2c3b-f554-491e-82ff-f0f2e44114de_523x290.png 848w, https://substackcdn.com/image/fetch/$s_!MorM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ed2c3b-f554-491e-82ff-f0f2e44114de_523x290.png 1272w, https://substackcdn.com/image/fetch/$s_!MorM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ed2c3b-f554-491e-82ff-f0f2e44114de_523x290.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MorM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ed2c3b-f554-491e-82ff-f0f2e44114de_523x290.png" width="523" height="290" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/16ed2c3b-f554-491e-82ff-f0f2e44114de_523x290.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:290,&quot;width&quot;:523,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MorM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ed2c3b-f554-491e-82ff-f0f2e44114de_523x290.png 424w, https://substackcdn.com/image/fetch/$s_!MorM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ed2c3b-f554-491e-82ff-f0f2e44114de_523x290.png 848w, https://substackcdn.com/image/fetch/$s_!MorM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ed2c3b-f554-491e-82ff-f0f2e44114de_523x290.png 1272w, https://substackcdn.com/image/fetch/$s_!MorM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ed2c3b-f554-491e-82ff-f0f2e44114de_523x290.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>The authors also used Blueprint to reproduce or create reliability issues in a service - in particular they focused on <a href="https://www.micahlerner.com/2022/07/11/metastable-failures-in-the-wild.html">Metastable failures</a> described in a previous paper review. While creating specific configurations of a system to enable reliability testing is not necessarily a unique feature of Blueprint (e.g. <a href="https://www.usenix.org/publications/loginonline/metastable-failures-wild">Metastable Failures in the Wild</a> discusses replicating metastability), the ease with which the authors performed this analysis was intriguing.</p><p>Lastly, paper analyzes how long it takes for Blueprint to generate systems of different sizes. While many of the examples are based on prototype systems, the authors also ran Blueprint on a system derived from a <a href="https://dl.acm.org/doi/10.1145/3472883.3487003">microservice dataset published by Alibaba</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7wsH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aba5027-6c1d-40a4-9ca0-aa46c7a36bc8_517x252.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7wsH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aba5027-6c1d-40a4-9ca0-aa46c7a36bc8_517x252.png 424w, https://substackcdn.com/image/fetch/$s_!7wsH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aba5027-6c1d-40a4-9ca0-aa46c7a36bc8_517x252.png 848w, https://substackcdn.com/image/fetch/$s_!7wsH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aba5027-6c1d-40a4-9ca0-aa46c7a36bc8_517x252.png 1272w, https://substackcdn.com/image/fetch/$s_!7wsH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aba5027-6c1d-40a4-9ca0-aa46c7a36bc8_517x252.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7wsH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aba5027-6c1d-40a4-9ca0-aa46c7a36bc8_517x252.png" width="517" height="252" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0aba5027-6c1d-40a4-9ca0-aa46c7a36bc8_517x252.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:252,&quot;width&quot;:517,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7wsH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aba5027-6c1d-40a4-9ca0-aa46c7a36bc8_517x252.png 424w, https://substackcdn.com/image/fetch/$s_!7wsH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aba5027-6c1d-40a4-9ca0-aa46c7a36bc8_517x252.png 848w, https://substackcdn.com/image/fetch/$s_!7wsH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aba5027-6c1d-40a4-9ca0-aa46c7a36bc8_517x252.png 1272w, https://substackcdn.com/image/fetch/$s_!7wsH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aba5027-6c1d-40a4-9ca0-aa46c7a36bc8_517x252.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><h2>Conclusion</h2><p>Blueprint&#8217;s idea of separating the concerns involved in an application seems like a promising approach to dramatically increasing the velocity of the software development lifecycle (at least for microservices). One area that seems particularly exciting about Blueprint is the ability to simplify testing different service configurations across infrastructure - for example, rather than rewriting a large body of application code to test out a new tracing library, a developer can simply swap out the code in the Blueprint definition.</p><p>From reading the paper, there are several areas of further research for Blueprint, particularly around production readiness. For example, Blueprint&#8217;s compilation time for a system of ~3000 microservices described in a paper from Alibaba ran for 12 minutes. For organizations that would have many components in their configuration, the cost to run Blueprint would certainly be non-negligible. To speedup compliation of Blueprint, perhaps it would only recompute parts of the system touched by a developer&#8217;s changes.</p><p>Furthermore, adoption via onboarding new systems to Blueprint also seems like a challenge, as developers would need to perform some implementation in order to create the definition of their system - perhaps the team behind Blueprint will expand on tooling that automates this process by reading metadata source from a running system (e.g. traces).</p>]]></content:encoded></item><item><title><![CDATA[2023 and looking forward to 2024]]></title><description><![CDATA[A tradition of mine is to write a year end reflection, regardless of whether it makes it up onto the blog or not - in this case it did :)]]></description><link>https://newsletter.micahlerner.com/p/2023-and-looking-forward-to-2024</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/2023-and-looking-forward-to-2024</guid><dc:creator><![CDATA[Micah Lerner]]></dc:creator><pubDate>Wed, 27 Dec 2023 21:31:54 GMT</pubDate><content:encoded><![CDATA[<p>A tradition of mine is to write a year end reflection, regardless of whether it makes it up onto the blog or not - in this case it did :)</p><p>2023 year was a great year, and I&#8217;d go so far as to say the best of my life so far.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Micah Learns! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>In my personal life, I celebrated an amazing first year of marriage, and bought a house in San Francisco. While the narrative of SF is that it is in a <a href="https://www.nytimes.com/2023/11/29/us/san-francisco-doom-loop.html">&#8220;doomloop&#8221;</a>, the Bay Area has remained the center of gravity for technology innovation - predominantly all of the innovative AI companies have their headquarters in &#8220;Cerebral Valley&#8221; (notable exception being <a href="https://mistral.ai/">Mistral</a>). There is also a budding (some may even say fully-developed!) community of builders - see <a href="https://cerebralvalley.ai/">Cerebral Valley</a>. I&#8217;m optimistic that on the medium to longer term timescale, San Francisco&#8217;s challenges will be resolved, although I&#8217;ll elide the political details of how&#8230; </p><p>Professionally, I also had amazing opportunities for growth, being a tech lead for a 20+ person team of Googley engineers helping to design and scale innovative new products in Maps (e.g. <a href="https://blog.google/products/maps/google-maps-immersive-view-routes/">immersive route preview</a>, which uses <a href="https://blog.research.google/2023/06/reconstructing-indoor-spaces-with-nerf.html">Neural Radiance Fields</a>). In 2024 I&#8217;ll be presenting about product reliability at industry conferences (e.g. <a href="https://twitter.com/micahlerner/status/1729603161646620797">SRECon</a>), which I&#8217;m quite excited about. While there are critiques of Google and its culture, it is clear that the company 1) continuously deploys serious innovation at scale and 2) users around the world continue to love the company&#8217;s products.</p><p>Balancing my rewarding personal and professional lives impacted my bandwidth for producing content, and I explicitly prioritized the other areas of my life - tradeoffs I embraced enthusiastically. While I wrote fewer paper reviews, I also tried learning-in-public via <a href="https://www.youtube.com/channel/UCPBFHwPU6wuw_tTKmu6TECw">streaming my process</a>. These tweaks seemed to resonate, and the number of subscribers grew quickly.</p><p>I enjoy the process of learning, distilling knowledge, and sharing it with others - wholesale abandoning these activities isn&#8217;t something that I intend to do. That said, given my rewarding personal life and demanding job, I&#8217;ve rethought how I invest my time in my creative pursuits.</p><p>In 2024 and beyond, I&#8217;ll be more intentional about the technical areas that I invest in learning about. I think that some areas of systems research are beginning to provide diminishing returns <em>for me personally -</em> for example, while I&#8217;m far from an expert, I&#8217;m roughly familiar with flavors of distributed databases. As a result, I am not as excited about developments there, meaning I&#8217;m less likely to write a deep dive on a new paper. Furthermore, I don&#8217;t have interest in producing content solely for the goal of maximizing engagement (e.g. &#8220;going full influencer&#8221;).</p><p>Instead, I want to make a concentrated bet by pivoting my focus towards AI and the systems that power these new technologies - AI is here to stay, and will become a larger part of my life. Even without achieving artificial general intelligence, more everyday tools will begin to take advantage of recent rapid innovations. As a software engineer by trade, I can see AI&#8217;s profound ability to impact my work on the horizon (e.g. with the rise of competent large language models capable of answering technical questions and writing code). In the past, going deep on the fundamentals has achieved the twin goals of forwarding my intellectual curiosity while benefiting my career. I&#8217;m very much looking forward to applying this approach, and seeing how this bet pays off down the road.</p><p>Looking forward to a great 2024 and thank you for joining me on this journey!</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Micah Learns! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Systems Papers - Defcon: Preventing Overload with Graceful Feature Degradation]]></title><description><![CDATA[Hello new and old subscribers!]]></description><link>https://newsletter.micahlerner.com/p/systems-papers-defcon-preventing</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/systems-papers-defcon-preventing</guid><pubDate>Tue, 25 Jul 2023 16:28:20 GMT</pubDate><content:encoded><![CDATA[<p><em>Hello new and old subscribers! </em></p><p><em>This is one of my first posts on Substack - if you have any feedback, I would love to hear it! Feel free to respond to this email with what is on your mind or if you have found any interesting papers recently.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Systems Papers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>This week&#8217;s paper is <a href="https://www.micahlerner.com/2023/07/23/defcon-preventing-overload-with-graceful-feature-degradation.html">Defcon: Preventing Overload with Graceful Feature Degradation</a>.</p><p>The Defcon paper talks about how Meta built a system that allows it to turn off features when their products are under heavy load, an approach called &#8220;graceful degradation&#8221;. </p><p>The implementation allows oncallers to respond to near-overload scenarios by gradually disabling non-critical features for the business - for example, it is preferable to disable a &#8220;user status&#8221; feature rather than having all of Instagram go down. </p><p>Defcon is also used to understand how much capacity a feature is using - by periodically turning off a feature, it is possible to visualize differences in CPU and memory usage correlated to the feature being turned off (this also closely connects to the ideas in Flux, another recently published Meta paper on capacity management).</p><p><strong><a href="https://www.micahlerner.com/2023/07/23/defcon-preventing-overload-with-graceful-feature-degradation.html">The paper review is best enjoyed on my blog.</a></strong></p><p>Discussion on:</p><ul><li><p><strong>Hacker News: <a href="https://news.ycombinator.com/item?id=36864764">https://news.ycombinator.com/item?id=36864764</a></strong></p></li></ul><ul><li><p><strong>Twitter</strong>: </p><p><a href="https://twitter.com/micahlerner/status/1683624900349808640?s=46&amp;t=z3snZ-IbsQlkLB6BDxcgDg">https://twitter.com/micahlerner/status/1683624900349808640</a></p></li></ul><p>Until next time,</p><p>Micah</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Systems Papers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Systems Papers - live tomorrow!]]></title><description><![CDATA[Hello new and old subscribers!]]></description><link>https://newsletter.micahlerner.com/p/systems-papers-live-tomorrow</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/systems-papers-live-tomorrow</guid><pubDate>Wed, 05 Jul 2023 22:43:58 GMT</pubDate><content:encoded><![CDATA[<p><em>Hello new and old subscribers! </em></p><p><em>This is one of my first posts on Substack - if you have any feedback, I would love to hear it! Feel free to respond to this email with what is on your mind or if you have found any interesting papers recently.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Systems Papers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Inspired by a discussion on Twitter, I'm going to stream reading papers!</p><p>I&#8217;ll start by catching up on papers from NSDI - to receive notifications when we go live, <a href="https://www.youtube.com/watch?v=IeNt0Zs2Lus">follow this link</a> to the Youtube channel.</p><p>The first stream is tomorrow at 5:00PM PST! See how that maps to your timezone <a href="https://www.worldtimebuddy.com/?qm=1&amp;lid=5391959,5128581,3435910,2643743&amp;h=5391959&amp;date=2023-7-6&amp;sln=17-20&amp;hf=1">here</a>.</p><p>Until next time,</p><p>Micah</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Systems Papers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Systems Papers - Towards an Adaptable Systems Architecture for Memory Tiering at Warehouse-Scale]]></title><description><![CDATA[Hello new and old subscribers!]]></description><link>https://newsletter.micahlerner.com/p/systems-papers-towards-an-adaptable</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/systems-papers-towards-an-adaptable</guid><pubDate>Thu, 29 Jun 2023 17:36:39 GMT</pubDate><content:encoded><![CDATA[<p><em>Hello new and old subscribers! </em></p><p><em>This is one of my first posts on Substack - if you have any feedback, I would love to hear it! Feel free to respond to this email with what is on your mind or if you have found any interesting papers recently.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Systems Papers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>This week&#8217;s paper is <a href="https://www.micahlerner.com/2023/06/29/towards-an-adaptable-systems-architecture-for-memory-tiering-at-warehouse-scale.html">Towards an Adaptable Systems Architecture for Memory Tiering at Warehouse-Scale</a>.</p><p>Applications running in datacenter environments require resources to operate, like dynamic random access memory. DRAM is both expensive and in high demand. As alternatives to DRAM emerge, new approaches to trading off performance for cost became available to datacenter applications.</p><p>To use this new resource type while limiting the impact to application performance, the authors proposed, built, and deployed at scale a new system called <em>Transparent Memory Tiering System (TMTS)</em>.</p><p>When deployed at scale, <em>TMTS</em> replaced 25% of DRAM with lower cost solutions, while incurring little performance impact to the vast majority of applications!</p><p><strong><a href="https://www.micahlerner.com/2023/06/29/towards-an-adaptable-systems-architecture-for-memory-tiering-at-warehouse-scale.html">The paper review is best enjoyed on my blog.</a></strong></p><p>Discussion on <a href="https://news.ycombinator.com/item?id=36523762">Hacker News</a> and <a href="https://twitter.com/micahlerner/status/1674471642695864320">Twitter</a>.</p><p>Until next time,</p><p>Micah</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Systems Papers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Systems Papers - TelaMalloc: Efficient On-Chip Memory Allocation for Production Machine Learning Accelerators]]></title><description><![CDATA[Hello new and old subscribers!]]></description><link>https://newsletter.micahlerner.com/p/systems-papers-telamalloc-efficient</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/systems-papers-telamalloc-efficient</guid><pubDate>Wed, 07 Jun 2023 00:21:29 GMT</pubDate><content:encoded><![CDATA[<p><em>Hello new and old subscribers! </em></p><p><em>This is one of my first posts on Substack - if you have any feedback, I would love to hear it! Feel free to respond to this email with what is on your mind or if you have found any interesting papers recently.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Systems Papers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>This week&#8217;s paper is <a href="https://www.micahlerner.com/2023/04/16/telamalloc-efficient-on-chip-memory-allocation-for-production-machine-learning-accelerators.html">TelaMalloc: Efficient On-Chip Memory Allocation for Production Machine Learning Accelerators</a>.</p><p>Running ML models requires resources like memory and CPU. Efficiently allocating these resources (in particular memory) to run models in datacenters is an extensively researched  area. </p><p>Unfortunately, solving the memory allocation problem for mobile devices is more difficult - there is a wide spectrum of capabilities between the newest Pixel phones and lower powered Android devices, and previous implementations struggle with this variety. </p><p>To handle this challenge, Google researchers designed an algorithmic approach to allocate memory with a combination of heuristics and solver-based mechanisms. When deployed to production, the system achieved dramatic improvements to user-facing latency.</p><p><strong><a href="https://www.micahlerner.com/2023/04/16/telamalloc-efficient-on-chip-memory-allocation-for-production-machine-learning-accelerators.html">The paper review is best enjoyed on my blog.</a></strong></p><p>Discussion on <a href="https://news.ycombinator.com/item?id=36221010">Hacker News</a> and <a href="https://twitter.com/micahlerner/status/1666238270764896258?s=20">Twitter</a>.</p><p>Until next time,</p><p>Micah</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Systems Papers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Systems Papers - Perseus: A Fail-Slow Detection Framework for Cloud Storage Systems]]></title><description><![CDATA[Hello new and old subscribers!]]></description><link>https://newsletter.micahlerner.com/p/systems-papers-perseus-a-fail-slow</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/systems-papers-perseus-a-fail-slow</guid><pubDate>Tue, 18 Apr 2023 18:12:10 GMT</pubDate><content:encoded><![CDATA[<p><em>Hello new and old subscribers! </em></p><p><em>This is one of my first posts on Substack - if you have any feedback, I would love to hear it! Feel free to respond to this email with what is on your mind or if you have found any interesting papers recently.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Systems Papers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p>This week&#8217;s paper is <a href="https://www.micahlerner.com/2023/04/16/perseus-a-fail-slow-detection-framework-for-cloud-storage-systems.html">Perseus: A Fail-Slow Detection Framework for Cloud Storage Systems</a>.</p><p>The Perseus paper won a best paper award at FAST 2023 (File and Storage Technologies) and describes a system for detecting <em>fail-slow</em> instances in Alibaba storage clusters - <em>fail-slow</em> is a failure mode in which hardware fails non-obviously, potentially by consistently degrading performance over time. At scale, this category of problem is extremely difficult to detect and can dramatically impact tail latency.</p><p><strong><a href="https://www.micahlerner.com/2023/04/16/perseus-a-fail-slow-detection-framework-for-cloud-storage-systems.html">The paper review is best enjoyed on my blog.</a></strong></p><p>Until next time,</p><p>Micah</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Systems Papers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Systems Papers - Ambry: LinkedIn’s Scalable Geo-Distributed Object Store]]></title><description><![CDATA[Hello new and old subscribers!]]></description><link>https://newsletter.micahlerner.com/p/systems-papers-ambry-linkedins-scalable</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/systems-papers-ambry-linkedins-scalable</guid><pubDate>Tue, 28 Mar 2023 19:53:26 GMT</pubDate><content:encoded><![CDATA[<p><em>Hello new and old subscribers! </em></p><p><em>This is one of my first posts on Substack - if you have any feedback, I would love to hear it! Feel free to respond to this email with what is on your mind or if you have found any interesting papers recently.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Systems Papers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p>This week&#8217;s paper is <a href="https://www.micahlerner.com/2023/03/28/ambry-linkedins-scalable-geo-distributed-object-store.html">Ambry: LinkedIn&#8217;s Scalable Geo-Distributed Object Store</a>.</p><p>Blob stores are used by companies across industry  to store large objects, like photos and videos. This paper focuses on Ambry from LinkedIn which, unlike other implementations, is open source.</p><p>Ambry aims to provide low latency blob store operations, with high throughput, using globally distributed storage and compute resources. At the time of the paper&#8217;s publication in 2016, LinkedIn had hundreds of millions of users, and served more than 120 TB every day. To reach this scale, the team had to solve several challenges including wide variance in object sizes, rapid growth, and unpredictable read workloads.</p><p><strong><a href="https://www.micahlerner.com/2023/03/28/ambry-linkedins-scalable-geo-distributed-object-store.html">The paper review is best enjoyed on my blog.</a></strong></p><p>Until next time,</p><p>Micah</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Systems Papers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Systems Papers - Meta’s Next-generation Realtime Monitoring and Analytics Platform]]></title><description><![CDATA[Hello new and old subscribers!]]></description><link>https://newsletter.micahlerner.com/p/welcome-post</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/welcome-post</guid><dc:creator><![CDATA[Micah Lerner]]></dc:creator><pubDate>Thu, 02 Mar 2023 15:00:26 GMT</pubDate><content:encoded><![CDATA[<p><em>Hello new and old subscribers! </em></p><p><em>This is one of my first posts on Substack - if you have any feedback, I would love to hear it! Feel free to respond to this email with what is on your mind or if you have found any interesting papers recently.</em></p><p></p><p>This week&#8217;s paper is <a href="https://www.micahlerner.com/2023/02/27/metas-next-generation-realtime-monitoring-and-analytics-platform.html">Meta&#8217;s Next-generation Realtime Monitoring and Analytics Platform</a>.</p><p>Many companies perform real time analytics in order to customize user experiences or alert on anomalous behavior, and Meta performed this function using a system called <em>Scuba</em>. While the existing system scaled to meet initial demand, over time operational and usability issues led to an internal push to evolve.</p><p>Beyond discussing the tradeoffs and challenges to the project, the paper provides extensive technical references to the underlying primitives that Kraken is built on (one of which, <a href="https://www.micahlerner.com/2022/01/08/shard-manager-a-generic-shard-management-framework-for-geo-distributed-applications.html">Shard Manager</a>, was a previous paper review), making this a great reference for learning about distributed systems.</p><p><strong><a href="https://www.micahlerner.com/2023/02/27/metas-next-generation-realtime-monitoring-and-analytics-platform.html">The paper review is best enjoyed on my blog.</a></strong></p><p>Until next time,</p><p>Micah</p>]]></content:encoded></item><item><title><![CDATA[CS Conferences in 2023]]></title><description><![CDATA[Hello new and old subscribers!]]></description><link>https://newsletter.micahlerner.com/p/cs-conferences-in-2023</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/cs-conferences-in-2023</guid><dc:creator><![CDATA[Micah Lerner]]></dc:creator><pubDate>Fri, 03 Feb 2023 17:07:57 GMT</pubDate><content:encoded><![CDATA[<p><em>Hello new and old subscribers!</em></p><p><em>This is one of my first posts on Substack - if you have any feedback, I would love to hear it! Feel free to respond to this email with what is on your mind or if you have any papers you have found interesting.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Systems Papers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>I put together a preliminary list of conferences on my &#8220;watch list&#8221; for 2023, <strong>accessible on my blog <a href="https://www.micahlerner.com/2023/01/16/conferences-2023.html">here</a></strong>. </p><p>It doesn&#8217;t cover everything and I would love suggestions via email (by responding to this email) or <a href="https://twitter.com/micahlerner">Twitter</a>!</p><p>I&#8217;ve added a few new conferences to the list <a href="https://www.micahlerner.com/2021/12/30/conferences-2022.html">from 2022</a>. The new conferences are mostly related to bioinformatics/biotechnology - I want to branch out and read more papers not from &#8220;my area&#8221; of computer science.</p><p>I also certainly won&#8217;t be able to read every paper from the conferences below, but I am striving to keep with a regular posting schedule like the one I had in 2022!</p><p>Until next time,</p><p>Micah</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Systems Papers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Systems Papers - Elastic Cloud Services: Scaling Snowflake’s Control Plane]]></title><description><![CDATA[Hello new and old subscribers!]]></description><link>https://newsletter.micahlerner.com/p/systems-papers-elastic-cloud-services</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/systems-papers-elastic-cloud-services</guid><dc:creator><![CDATA[Micah Lerner]]></dc:creator><pubDate>Thu, 19 Jan 2023 22:59:58 GMT</pubDate><content:encoded><![CDATA[<p><em>Hello new and old subscribers!</em></p><p><em>This is my first post on Substack - if you have any feedback, I would love to hear it! Feel free to respond to this email with what is on your mind or if you have any papers you have found interesting.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Systems Papers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p>This week&#8217;s paper is<a href="https://www.micahlerner.com/2023/01/19/elastic-cloud-services-scaling-snowflakes-control-plane.html"> </a><strong><a href="https://www.micahlerner.com/2023/01/19/elastic-cloud-services-scaling-snowflakes-control-plane.html">Elastic Cloud Services: Scaling Snowflake&#8217;s Control Plane.</a></strong><a href="https://www.micahlerner.com/2023/01/19/elastic-cloud-services-scaling-snowflakes-control-plane.html"> </a></p><p>Snowflake is a company that makes a globally distributed data ware house product. Their database must reliably respond to business-critical, customer-issued queries, while abstracting away scaling challenges associated with rapidly changing load. </p><p>Behind the scenes, Snowflake must also reduce single points of failures by deploying across multiple cloud providers and the regions within them. These constraints make for an interesting set of technical challenges.</p><p><strong><a href="https://www.micahlerner.com/2023/01/19/elastic-cloud-services-scaling-snowflakes-control-plane.html">The paper review is best enjoyed on my blog.</a></strong></p><p>Until next time,</p><p>Micah</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.micahlerner.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Systems Papers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Systems Papers - Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network]]></title><description><![CDATA[Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google&#8217;s Datacenter Network]]></description><link>https://newsletter.micahlerner.com/p/systems-papers-jupiter-rising-a-decade-22-12-15</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/systems-papers-jupiter-rising-a-decade-22-12-15</guid><dc:creator><![CDATA[Micah Lerner]]></dc:creator><pubDate>Thu, 15 Dec 2022 17:12:51 GMT</pubDate><content:encoded><![CDATA[<h1>Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google&#8217;s Datacenter Network</h1><p>Hello new and old subscribers!</p><p>This week&#8217;s paper is <em>Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google&#8217;s Datacenter Network</em>.</p><p>The paper discusses the design and evolution of Google&#8217;s datacenter network with a focus on how the network scaled to provide high-speed connectivity and efficient resource allocation under increasing demand.</p><p>The paper also discusses the challenges and limitations of the design and how Google has addressed them over the past decade. Beyond the scale of their deployment, the networks described by the paper continue to influence the design of many modern data center networks (including those of Meta, LinkedIn, Dropbox, and others).</p><p><strong><a href="https://www.micahlerner.com/2022/12/13/jupiter-rising-a-decade-of-clos-topologies-and-centralized-control-in-googles-datacenter-network.html">The paper review is best enjoyed on the blog.</a></strong></p><p><strong><a href="https://news.ycombinator.com/item?id=34002580">Discussion on Hacker News</a></strong></p><p>Until next time,</p><p>Micah</p>]]></content:encoded></item><item><title><![CDATA[Systems Papers - Design and Evaluation of IPFS: A Storage Layer for the Decentralized Web]]></title><description><![CDATA[Hello! I&#8217;m trying out a new format for paper reviews and it would be great to get your feedback - feel free to respond to this email or reach out on Twitter with your thoughts.]]></description><link>https://newsletter.micahlerner.com/p/systems-papers-design-and-evaluation-22-11-01</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/systems-papers-design-and-evaluation-22-11-01</guid><dc:creator><![CDATA[Micah Lerner]]></dc:creator><pubDate>Tue, 01 Nov 2022 00:46:21 GMT</pubDate><content:encoded><![CDATA[<h1>Design and Evaluation of IPFS: A Storage Layer for the Decentralized Web</h1><p>Hello new and old subscribers!</p><p>This week&#8217;s paper is <em>Design and Evaluation of IPFS: A Storage Layer for the Decentralized Web</em>.</p><p>This paper discusses the open source IPFS project, an implementation of a distributed file store. Originally released in 2015, IPFS is far from a research prototype - the system is deployed at scale, running on hundreds of thousands of machines around the world in order to serve a broad userbase.</p><p>IPFS aims to support the &#8220;decentralized web&#8221;, a growing ecosystem of distributed applications - for example, Elon Musk&#8217;s acquisition of Twitter has created a flock of new users to Mastodon, a federated alternative.</p><p><strong><a href="https://www.micahlerner.com/2022/10/31/design-and-evaluation-of-ipfs-a-storage-layer-for-the-decentralized-web.html">The paper review is best enjoyed on the blog.</a></strong></p><p><strong><a href="https://news.ycombinator.com/item?id=33415496">Discussion on Hacker News</a></strong></p><p>Until next time,</p><p>Micah</p>]]></content:encoded></item><item><title><![CDATA[Systems Papers - SDN in the Stratosphere: Loon’s Aerospace Mesh Network]]></title><description><![CDATA[Hello! I&#8217;m trying out a new format for paper reviews and it would be great to get your feedback - feel free to respond to this email or reach out on Twitter with your thoughts.]]></description><link>https://newsletter.micahlerner.com/p/systems-papers-sdn-in-the-stratosphere-22-10-09</link><guid isPermaLink="false">https://newsletter.micahlerner.com/p/systems-papers-sdn-in-the-stratosphere-22-10-09</guid><dc:creator><![CDATA[Micah Lerner]]></dc:creator><pubDate>Sun, 09 Oct 2022 17:34:56 GMT</pubDate><content:encoded><![CDATA[<h1>SDN in the Stratosphere: Loon&#8217;s Aerospace Mesh Network</h1><p>Hello new and old subscribers!</p><p>This week&#8217;s paper is <em>SDN in the Stratosphere: Loon&#8217;s Aerospace Mesh Network</em>.</p><p>The paper discusses one of the core technologies behind <a href="https://x.company/projects/loon/">Loon</a>, a project that provided internet for remote areas using a balloon-based mesh network. This mesh network faced unique challenges to maintaining connectivity, like ever-changing balloon positions and weather conditions.</p><p><strong><a href="https://www.micahlerner.com/2022/10/08/sdn-in-the-stratosphere-loons-aerospace-mesh-network.html">The paper review is best enjoyed on the blog.</a></strong></p><p>Until next time,</p><p>Micah</p>]]></content:encoded></item></channel></rss>