DRFM: Directed Refresh Management
With little fanfare, DDR5 has introduced a new DRAM command for Rowhammer protection: Directed Refresh Management (DRFM). For the first time, a memory controller has all it needs to offer decent Rowhammer protection by leveraging DRFM to refresh potential victim rows.
If you need help with Rowhammer terminology, read my Rowhammer terminology cheat sheet. If you need help with Rowhammer itself, watch my background video on DRAM and Rowhammer. Many other videos online also provide background on Rowhammer (for example, Prof. Mutlu's lectures).
The goal of this writeup is to bring some clarity to DRFM: I will describe this new command, its semantics, and its shortcomings.
Background on DRFM
The research community has long proposed adding a new DRAM command that takes as input the address of an aggressor row in a Rowhammer attack. Upon receiving this command, DRAM must refresh all victim rows affected by the aggressor row. Different research papers called this command Neighbor Row Refresh (NRR) [Graphene] or Adjacent Row Refresh (ARR) [TWiCe].
A clearer way to describe this command is to think of it as "Report Aggressor Row": it is a way for the memory controller to report the identity of an aggressor row to the DRAM device. Because internal DRAM topologies are secret, a memory controller cannot identify, and therefore cannot refresh, the victim rows affected by an aggressor row. With this new command, the memory controller can delegate the job of refreshing victim rows to the DRAM device.
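To make the delegation concrete, here is a minimal Python sketch. The DramDevice class, its report_aggressor method, and the 8-row remapping table are all hypothetical (real devices do not expose their mapping, which is the whole point), and the sketch assumes an aggressor's victims are the rows at physical index plus or minus one.

```python
class DramDevice:
    """Toy DRAM bank whose logical-to-physical row mapping is secret (hypothetical)."""

    def __init__(self, secret_map):
        self._to_physical = dict(secret_map)                      # logical -> physical
        self._to_logical = {p: l for l, p in secret_map.items()}  # physical -> logical
        self.refreshed = []                                       # logical rows refreshed so far

    def report_aggressor(self, logical_row):
        # "Report Aggressor Row": the device, not the controller, finds the
        # physically adjacent rows and refreshes them.
        phys = self._to_physical[logical_row]
        for neighbor in (phys - 1, phys + 1):
            if neighbor in self._to_logical:  # skip rows past the edge of the bank
                self.refreshed.append(self._to_logical[neighbor])


def controller_guess(logical_row):
    # What a controller would refresh if it (wrongly) assumed logical == physical.
    return [logical_row - 1, logical_row + 1]


# Hypothetical remapping: rows 2 and 3 (and rows 6 and 7) trade places.
SECRET_MAP = {0: 0, 1: 1, 2: 3, 3: 2, 4: 4, 5: 5, 6: 7, 7: 6}

dev = DramDevice(SECRET_MAP)
dev.report_aggressor(2)
print("controller guess:", controller_guess(2))  # controller guess: [1, 3]
print("actually refreshed:", dev.refreshed)      # actually refreshed: [3, 4]
```

In this toy mapping, a controller acting on its own would waste a refresh on row 1 and miss row 4, the row that physically sits next to the aggressor.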
DRFM is thus a compromise: memory controllers can now refresh victim rows when they detect an aggressor row, and DRAM devices can continue to keep their internal topologies secret. DRFM, however, is far from a perfect solution, and our paper earlier this year goes into a lot of detail on its shortcomings.
How to Use DRFM
DRFM consists of two steps. First, the memory controller "captures" the identity of the aggressor row. I put captures in quotes because that is the term the JEDEC spec uses to mean "report". Confusing, huh?
This step happens when a bank's row buffer is closed with the precharge-per-bank (PREPB) command, which now has an extra special bit. When this bit is set, the DRAM device will "remember" the row being precharged as an aggressor row. Three other commands can also "capture" an aggressor row: read-and-auto-precharge, write-and-auto-precharge, and write-pattern-and-auto-precharge. Like PREPB, these three commands are bank-specific. All other forms of precharge commands (such as precharge-all-banks or precharge-same-bank) cannot "capture" an aggressor row.
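The capture step can be modeled as a little per-bank state. Below is a minimal Python sketch: the command names are descriptive stand-ins rather than JEDEC mnemonics, the `capture` flag stands in for PREPB's extra bit, and letting a newer capture overwrite an older one is an assumption.

```python
from enum import Enum, auto
from typing import Optional


class Cmd(Enum):
    PRECHARGE_PER_BANK = auto()  # PREPB
    READ_AUTO_PRECHARGE = auto()
    WRITE_AUTO_PRECHARGE = auto()
    WRITE_PATTERN_AUTO_PRECHARGE = auto()
    PRECHARGE_ALL_BANKS = auto()
    PRECHARGE_SAME_BANK = auto()


# The four bank-specific commands that are allowed to "capture" the open row.
CAN_CAPTURE = {
    Cmd.PRECHARGE_PER_BANK,
    Cmd.READ_AUTO_PRECHARGE,
    Cmd.WRITE_AUTO_PRECHARGE,
    Cmd.WRITE_PATTERN_AUTO_PRECHARGE,
}


class Bank:
    def __init__(self):
        self.open_row: Optional[int] = None
        self.captured: Optional[int] = None  # aggressor remembered for a later DRFM

    def activate(self, row: int) -> None:
        self.open_row = row

    def close(self, cmd: Cmd, capture: bool = False) -> None:
        """Close the row buffer; `capture` stands in for the extra special bit."""
        if capture:
            if cmd not in CAN_CAPTURE:
                raise ValueError(f"{cmd.name} cannot capture an aggressor row")
            # Assumption: a newer capture replaces any older one in this bank.
            self.captured = self.open_row
        self.open_row = None


bank = Bank()
bank.activate(42)
bank.close(Cmd.PRECHARGE_PER_BANK, capture=True)
print(bank.captured)  # 42
```

Trying the same with `Cmd.PRECHARGE_ALL_BANKS` raises a `ValueError`, mirroring the rule that only bank-specific commands can capture.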
Second, the memory controller issues the DRFM command. The JEDEC specification does not dictate when DRFM must be issued (or whether it must be issued at all). While this increases the flexibility of the DRAM command schedule, it also opens up potential security problems if DRFM is sent too late: the aggressor row can still be activated after it has been reported, and its victims stay unrefreshed until DRFM arrives.
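To see why a late DRFM matters, here is a small Python sketch that walks a single-bank command trace and counts activations of the captured aggressor that occur after it was reported but before a DRFM arrives. The trace format and the command names (ACT, PRE, PRE_CAPTURE, DRFM) are hypothetical simplifications.

```python
def acts_before_drfm(trace):
    """Count ACTs of a captured aggressor issued before the next DRFM.

    `trace` is a list of (command, row) pairs for one bank; row is None for
    commands that do not name a row.
    """
    open_row = None
    pending = None  # captured aggressor still waiting for a DRFM
    exposed = 0
    for cmd, row in trace:
        if cmd == "ACT":
            open_row = row
            if row == pending:
                exposed += 1        # victims hammered again, still unrefreshed
        elif cmd == "PRE_CAPTURE":  # precharge with the capture bit set
            pending = open_row
            open_row = None
        elif cmd == "PRE":
            open_row = None
        elif cmd == "DRFM":
            pending = None          # victims of the captured row get refreshed
    return exposed


trace = [
    ("ACT", 5), ("PRE_CAPTURE", None),  # row 5 is reported as an aggressor
    ("ACT", 5), ("PRE", None),          # ...but keeps being opened
    ("ACT", 5), ("PRE", None),
    ("DRFM", None),                     # victims of row 5 finally refreshed
    ("ACT", 5), ("PRE", None),
]
print(acts_before_drfm(trace))  # 2
```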
There are two types of DRFM commands: all-bank and same-bank. When a DRAM device receives DRFM, it refreshes, in each targeted bank, the victim rows affected by that bank's "captured" aggressor row. If a bank has no "captured" aggressor row, the DRAM device is free to choose which rows to refresh (presumably based on its own internal TRR).
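Here is a minimal Python sketch of the two flavors. It assumes the usual DDR5 meaning of "same bank" (the same bank index in every bank group), a blast radius of 1 with victims at K-1 and K+1, and that a capture is cleared once DRFM acts on it; `vendor_pick` is a hypothetical placeholder for whatever the device does when nothing was captured.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Bank:
    captured: Optional[int] = None                      # aggressor captured by a precharge
    refreshed: List[int] = field(default_factory=list)


BankId = Tuple[int, int]                                # (bank_group, bank_index)


def drfm(banks: Dict[BankId, Bank],
         vendor_pick: Callable[[BankId], List[int]],
         same_bank_index: Optional[int] = None) -> None:
    """All-bank DRFM when same_bank_index is None, else same-bank DRFM."""
    for bank_id, bank in banks.items():
        if same_bank_index is not None and bank_id[1] != same_bank_index:
            continue                                    # not targeted by this same-bank DRFM
        if bank.captured is None:
            bank.refreshed.extend(vendor_pick(bank_id))  # device's own choice (e.g., TRR)
        else:
            k = bank.captured
            bank.refreshed.extend([k - 1, k + 1])       # assumes victims are physical K-1, K+1
            bank.captured = None                        # assumption: the capture is consumed


banks = {(bg, b): Bank() for bg in range(2) for b in range(2)}
banks[(0, 0)].captured = 10
banks[(1, 1)].captured = 5
drfm(banks, vendor_pick=lambda bank_id: [99], same_bank_index=0)
print({k: v.refreshed for k, v in banks.items()})
# {(0, 0): [9, 11], (0, 1): [], (1, 0): [99], (1, 1): []}
```

A following all-bank DRFM (no `same_bank_index`) would then refresh rows 4 and 6 in bank (1, 1) and fall back to `vendor_pick` everywhere else.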
An astute reader might wonder why there is no per-bank DRFM that would pair with the first step (precharge-per-bank). Great question! :-)
The memory controller can choose a single, global blast radius for the entire DRAM channel; upon receiving a DRFM, the DRAM device refreshes all rows within that blast radius of the captured aggressor row.
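Ignoring BRC for the moment, the rows covered by a blast radius can be sketched in a few lines of Python. The function name, the bank size, and the clipping at the bank's edges are illustrative assumptions, and the sketch pretends that physical adjacency matches row index, which only the DRAM device actually knows.

```python
def rows_to_refresh(aggressor, blast_radius, rows_per_bank):
    """Rows within `blast_radius` of `aggressor`, clipped to the bank's edges.

    Assumes physical adjacency matches row index; a real device applies its
    own secret internal mapping instead.
    """
    rows = []
    for distance in range(1, blast_radius + 1):
        for row in (aggressor - distance, aggressor + distance):
            if 0 <= row < rows_per_bank:
                rows.append(row)
    return sorted(rows)


BLAST_RADIUS = 2  # one hypothetical setting shared by the whole channel
print(rows_to_refresh(1000, BLAST_RADIUS, rows_per_bank=65536))  # [998, 999, 1001, 1002]
print(rows_to_refresh(0, BLAST_RADIUS, rows_per_bank=65536))     # [1, 2]
```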
But hold on! It gets a lot more complicated. The JEDEC spec uses the term "BRC". What's BRC? Well, it means either "Bounded Refresh Configuration" or "Blast Radius Configuration", depending on which paragraph you are reading. On a DRFM command, the DRAM device will refresh some of the rows all of the time and some of the rows only some of the time. This can sound daunting, but it makes sense. Let me explain.
The DRFM command itself can be turned into a Rowhammer attack vector. DRFM refreshes behave like row activations, and, unfortunately, the memory controller cannot track these additional activations. A DRAM device can receive a stream of DRFM commands all reporting the same aggressor row. The device will refresh the corresponding victim rows repeatedly, thereby transitively turning them into aggressor rows. These transitive aggressor rows now hammer their own neighbors unchecked and can flip bits. Google has reported re-creating such transitive attacks in a lab using the refreshes issued by TRR. They call these attacks "half-double".
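The transitive effect shows up even in a toy model. In this Python sketch (an illustration, not a device model), every activation disturbs its two neighbors, a DRFM refresh counts as an activation of the refreshed row, and, as a simplifying assumption, any row that is itself activated or refreshed is treated as restored, so only untouched rows accumulate disturbance.

```python
from collections import Counter


def drfm_stream(aggressor, n_drfm, blast_radius=1):
    """Toy model of a DRFM stream that always reports the same aggressor.

    Returns the disturbance accumulated by rows that are never activated or
    refreshed (simplifying assumption: touched rows are fully restored).
    """
    activations = Counter()
    for _ in range(n_drfm):
        activations[aggressor] += 1          # K is opened so that it can be captured
        for d in range(1, blast_radius + 1):
            activations[aggressor - d] += 1  # DRFM refresh acts like an ACT
            activations[aggressor + d] += 1

    disturbance = Counter()
    for row, count in activations.items():
        disturbance[row - 1] += count
        disturbance[row + 1] += count

    # Rows never activated or refreshed: their disturbance is never cleared.
    return {row: n for row, n in disturbance.items() if row not in activations}


print(drfm_stream(aggressor=100, n_drfm=50_000))                  # {98: 50000, 102: 50000}
print(drfm_stream(aggressor=100, n_drfm=50_000, blast_radius=2))  # {97: 50000, 103: 50000}
```

In this toy model, rows K-2 and K+2 absorb one disturbance per DRFM from the refreshed rows K-1 and K+1, and a larger blast radius only shifts the exposed rows one step further out.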
The complicated semantics of BRC are meant to address transitive Rowhammer attacks (aka half-double). The figure above shows a DRFM command with a blast radius of 1, where rows K-1 and K+1 are refreshed. One way to try to address transitive attacks is to sometimes also refresh rows K-2 and K+2. A BRC-2 configuration does exactly that: on a DRFM command, it guarantees that K-1 and K+1 are always refreshed, whereas K-2 and K+2 are refreshed based on a "ratio", in JEDEC's terminology. This means they are refreshed only once in a while, on an unpredictable (random?) schedule.
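A BRC-2-style policy can be sketched in Python as follows. Both the `ratio` value and the random 1-in-`ratio` schedule are assumptions: JEDEC specifies neither, so this only illustrates the "always" versus "some of the time" split.

```python
import random
from collections import Counter


def brc2_rows(aggressor, rng, ratio):
    """Rows refreshed by one DRFM under a BRC-2-style policy (sketch)."""
    rows = [aggressor - 1, aggressor + 1]  # always refreshed
    # Assumption: K-2/K+2 are refreshed on a random 1-in-`ratio` basis.
    # JEDEC does not specify the ratio or how the schedule is chosen.
    if rng.random() < 1 / ratio:
        rows += [aggressor - 2, aggressor + 2]  # refreshed only some of the time
    return rows


def simulate(aggressor, n_drfm, ratio, seed=0):
    rng = random.Random(seed)  # seeded so runs are repeatable
    counts = Counter()
    for _ in range(n_drfm):
        counts.update(brc2_rows(aggressor, rng, ratio))
    return counts


counts = simulate(aggressor=100, n_drfm=1000, ratio=4)
print(counts[99], counts[101])  # 1000 1000
print(counts[98], counts[102])  # about a quarter of n_drfm each; exact value depends on the seed
```

The larger the ratio, the longer K-2 and K+2 can go without a refresh, which is why the unspecified value matters.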
If you are scratching your head about how well DRFM handles transitive Rowhammer attacks, you are not alone. Unfortunately, I believe there is no good solution here. The problem stems from the memory controller's inability to track the identity of the victim rows, due to the secrecy of DRAM's internal topology. You might also wonder what the value of the ratio is. Is it 2? Is it 10? The JEDEC spec is silent about it. This means "DRAM vendors know best!" :-)
Thank you for reading this far. As always, please e-mail me your questions, comments, and feedback.
November 22nd, 2022