Engineers at Meta, the parent company of Facebook, have revealed how they were able to free up memory using a software solution called Transparent Memory Offloading (TMO).
TMO is now part of the Linux kernel and, in a nutshell, automatically offloads data to other storage tiers (e.g. Samsung's CXL memory expander) that are less expensive and more energy efficient than memory.
The savings are significant: TMO has been running on millions of Facebook servers for more than a year, saving up to almost a third of the memory on each server. While this would likely be negligible across dozens or even hundreds of servers, Facebook's immense scale presents a unique challenge.
Analysis: Facebook’s gigantic appetite for RAM
The world's largest social network has nearly three billion monthly active users and millions of servers spread over 21 locations around the globe. If each server has 128 GB of RAM on average, that would equate to 256 million GB (or 256 PB) of RAM which, at an average cost of $4 per GB (DDR4 ECC RAM), amounts to about $1 billion worth of memory. This assumes that Facebook has at least two million servers (the Facebook blog cited "millions of servers" as early as July 2018); the actual number is likely much higher.
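The back-of-the-envelope estimate above can be reproduced in a few lines of Python. This is only a sketch of the article's arithmetic: the server count, the per-server RAM and the price per GB are the article's assumptions rather than figures confirmed by Meta, and the function name is made up for illustration.

```python
def fleet_memory_estimate(servers, gb_per_server, usd_per_gb):
    """Return (total GB, total PB, total cost in USD) for a server fleet.

    Hypothetical helper; uses decimal units (1 PB = 1,000,000 GB).
    """
    total_gb = servers * gb_per_server
    total_pb = total_gb / 1_000_000
    total_cost_usd = total_gb * usd_per_gb
    return total_gb, total_pb, total_cost_usd


# Assumed inputs taken from the article: 2 million servers,
# 128 GB of RAM each, $4 per GB of DDR4 ECC RAM.
total_gb, total_pb, total_cost_usd = fleet_memory_estimate(2_000_000, 128, 4)

print(f"{total_gb:,} GB = {total_pb:,.0f} PB, costing about ${total_cost_usd:,}")
```

Under these assumptions the total comes to 256 million GB, or 256 PB, at a cost of roughly $1.02 billion. Changing any of the three inputs scales the result linearly, so a larger real-world fleet would push the figure up proportionally.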
Figures presented by the team working on TMO showed that memory accounts for a third of the cost of a Meta server's bill of materials (BOM), with compressed RAM and SSD accounting for less than 11%. More worryingly, the cost burden of RAM (as a percentage of total infrastructure cost) has more than doubled since Facebook launched its first generation of servers (it is currently on its fourth).
Adopting TMO has some disadvantages, most notably a degradation in performance. But the gains in power and memory savings far outweigh the downsides, and future iterations, combined with hardware improvements (e.g. SSD or CXL), should offer further mitigation.