Systems Papers - TelaMalloc: Efficient On-Chip Memory Allocation for Production Machine Learning Accelerators
Hello new and old subscribers!
This is one of my first posts on Substack - if you have any feedback, I would love to hear it! Feel free to respond to this email with what is on your mind or if you have found any interesting papers recently.
This week’s paper is TelaMalloc: Efficient On-Chip Memory Allocation for Production Machine Learning Accelerators.
Running ML models requires resources like memory and CPU. Efficiently allocating these resources (in particular memory) to run models in datacenters is an extensively researched area.
Unfortunately, solving the memory allocation problem for mobile devices is more difficult: capabilities vary widely between the newest Pixel phones and lower-powered Android devices, and previous implementations struggle with this variety.
To handle this challenge, Google researchers designed an allocator that combines heuristics with a solver-based approach. When deployed to production, the system achieved dramatic improvements to user-facing latency.
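To make the problem a bit more concrete, here is a minimal sketch of the heuristic half of that idea. It assumes the allocation problem is framed as placing buffers, each with a size and a time range during which it is live, into a fixed-size memory so that buffers live at the same time never share addresses. This is not TelaMalloc's algorithm: the `Buffer` type, `greedy_allocate`, and the largest-first ordering are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    name: str
    start: int  # first time step at which the buffer is live (inclusive)
    end: int    # last time step at which the buffer is live (inclusive)
    size: int   # size in bytes


def overlaps_in_time(a: Buffer, b: Buffer) -> bool:
    """Two buffers conflict only if their live ranges intersect."""
    return a.start <= b.end and b.start <= a.end


def greedy_allocate(buffers, capacity):
    """Illustrative heuristic: place the largest buffers first, each at the
    lowest offset that is free for its whole live range.

    Returns a dict mapping buffer name to offset, or None if some buffer
    does not fit within `capacity`.
    """
    placed = []   # (buffer, offset) pairs already assigned
    offsets = {}
    for buf in sorted(buffers, key=lambda b: b.size, reverse=True):
        # Address ranges occupied by buffers that are live at the same time.
        busy = sorted(
            (off, off + other.size)
            for other, off in placed
            if overlaps_in_time(buf, other)
        )
        offset = 0
        for lo, hi in busy:
            if offset + buf.size <= lo:
                break  # the gap before this occupied range is big enough
            offset = max(offset, hi)
        if offset + buf.size > capacity:
            return None  # the heuristic gave up
        placed.append((buf, offset))
        offsets[buf.name] = offset
    return offsets


if __name__ == "__main__":
    bufs = [
        Buffer("a", start=0, end=1, size=4),
        Buffer("b", start=1, end=2, size=4),
        Buffer("c", start=2, end=3, size=4),
    ]
    print(greedy_allocate(bufs, capacity=8))
```

A greedy pass like this is fast, but it can return `None` even when a valid packing exists. That gap is one reason an allocator might pair heuristics with a solver.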
The paper review is best enjoyed on my blog.
Discussion on Hacker News and Twitter.
Until next time,
Micah