Reverse Mapping (rmap) in Linux Kernel
Adrian Huang | May, 2022
* Based on kernel 5.11 (x86_64) – QEMU
* SMP (4 CPUs) and 8GB memory
* Kernel parameter: nokaslr norandmaps
* Userspace: ASLR is disabled
* Legacy BIOS
Agenda
• Mapping & reverse mapping
• rmap: legacy approach vs new approach (performance improvement)
• Implementation Detail
Mapping & Reverse Mapping
[Figure: Mapping. Process 1 ... Process N each reach page frames in physical memory through their own page table (Page Table 1 ... Page Table N).]
[Figure: Reverse mapping. When a page frame is reclaimed (1), RMAP finds every page table that maps the frame and clears the corresponding pte (2).]
rmap – “clear pte”: check ptep_get_and_clear()
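To make the two directions concrete, here is a minimal userspace sketch. It is a toy model, not kernel code: `page_table`, `map_page()` and `reclaim_frame()` are hypothetical names, and the brute-force scan in `reclaim_frame()` only shows the effect of "clear pte". Avoiding that scan is the job of the real rmap data structures.

```c
#include <assert.h>
#include <stddef.h>

#define NPROC   3      /* toy number of processes */
#define NVPAGES 4      /* toy number of virtual pages per process */
#define NO_PFN  (-1)   /* marks a pte that maps nothing */

/* Forward mapping: one toy "page table" per process, indexed by virtual page. */
static int page_table[NPROC][NVPAGES];

/* Mark every pte as not present. */
static void init_tables(void)
{
    for (int p = 0; p < NPROC; p++)
        for (int v = 0; v < NVPAGES; v++)
            page_table[p][v] = NO_PFN;
}

/* Mapping: install a page frame number at (process, virtual page). */
static void map_page(int proc, int vpage, int pfn)
{
    page_table[proc][vpage] = pfn;
}

/*
 * Reverse mapping, naive version: given the pfn being reclaimed, find every
 * pte that maps it and clear it. Returns the number of ptes cleared.
 */
static int reclaim_frame(int pfn)
{
    int cleared = 0;

    for (int p = 0; p < NPROC; p++) {
        for (int v = 0; v < NVPAGES; v++) {
            if (page_table[p][v] == pfn) {
                page_table[p][v] = NO_PFN;   /* "clear pte" */
                cleared++;
            }
        }
    }
    return cleared;
}
```

Calling `init_tables()`, mapping the same pfn into two processes and then calling `reclaim_frame()` on that pfn clears both ptes and leaves unrelated mappings alone.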
rmap: legacy approach vs new approach (performance improvement)
1. Legacy approach: 2.6.33 or earlier kernel
2. New approach: 2.6.34 or later kernel – High-level overview
rmap: 2.6.33 or earlier kernel
[Figure: Parent process. A single anon_vma links Page #0, Page #1 ... Page #999 to the parent's vma and its page table.]
[Figure: Parent process & child processes after forking N children. The same anon_vma links vma: parent, vma: child #1 ... vma: child #N, each with its own page table over Page #0, Page #1 ... Page #999. Some pages may be COWed.]
rmap: 2.6.33 or earlier kernel - Limitation
[Figure: Parent process & child processes. One anon_vma is shared by vma: parent, vma: child #1 ... vma: child #N; each vma has its own page table over Page #0, Page #1 ... Page #999. Panel labels: "Parent process & child processes", "Issue statement".]
2.6.34 or later kernel – High-level overview
[Figure: One process. Page #0, Page #1 ... Page #999 point to an anon_vma; the anon_vma reaches the vma through an anon_vma_chain.
Legend: pointer; doubly linked list; RB-tree: RB node.]
2.6.34 or later kernel – parent/child processes interconnection
[Figure: Parent process and child processes #1, #2 ... #N. Each process has its own anon_vma, anon_vma_chain and vma. The parent's anon_vma holds the shared pages (Page #0, Page #1 ... Page #999) and is also linked, through additional anon_vma_chain nodes, to the children's vmas. Each child's anon_vma holds that child's COW pages (for example COW Page #0, COW Page #1, COW Page #999).
Legend: pointer; doubly linked list; RB-tree: RB node; RB-tree: RB node (possible linked node); shared page; COW page.]
2.6.34 or later kernel: example 1
[Figure: Same parent/child layout and legend as the parent/child interconnection slide. A page is reclaimed (1). Traverse path (2): check whether the pfn in the children's pte equals the pfn of the page being reclaimed ("pfn match?").]
2.6.34 or later kernel: example 1 – more detail
[Figure: Same parent/child layout and legend as the parent/child interconnection slide. A page is reclaimed (1). Traverse path: check whether the pfn in the children's pte equals the pfn of the page being reclaimed. Steps 2, 3 and 4 each ask "pfn match?" for a different child.]
2.6.34 or later kernel: example 2
[Figure: Same parent/child layout and legend as the parent/child interconnection slide. A page is reclaimed (1). Traverse path: check whether the pfn in the children's pte equals the pfn of the page being reclaimed.]
[Figure, continued: the walk does not need to traverse all anon_vma_chain(s).]
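The difference between the two approaches can be put as a rough count of how many vmas one rmap walk has to visit. The sketch below is a simplified model under stated assumptions: a single level of fork (no grandchildren), one vma per process, and hypothetical function names that do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Legacy model (2.6.33 or earlier): the parent and all children share one
 * anon_vma, so every walk visits every vma, even for a page that only one
 * child maps after COW.
 */
static int legacy_vmas_to_check(int nr_children, bool page_is_child_cow)
{
    (void)page_is_child_cow;      /* makes no difference in this model */
    return nr_children + 1;       /* parent + every child */
}

/*
 * New model (2.6.34 or later): a COW page belongs to the child's own
 * anon_vma, which links only that child's vma. A page still owned by the
 * parent's anon_vma can be mapped by any child, so all vmas are checked.
 */
static int new_vmas_to_check(int nr_children, bool page_is_child_cow)
{
    if (page_is_child_cow)
        return 1;
    return nr_children + 1;
}
```

With 1000 children, a walk for a child's COW page visits 1001 vmas in the legacy model and 1 vma in the new model, which is the "do not need to traverse all anon_vma_chain(s)" case.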
Implementation Detail
1. How/when to construct/link anon_vma, anon_vma_chain and vm_area_struct
A. Let’s start from fork()
2. COW - Detail
anon_vma, anon_vma_chain & vm_area_struct - Detail
[Figure: data structures and how RMAP reaches them]
• vm_area_struct: vm_mm, vm_ops, vm_file, anon_vma_chain, anon_vma
• anon_vma_chain: vma, anon_vma, same_vma, struct rb_node rb
• anon_vma: struct anon_vma *root, struct anon_vma *parent, struct rb_root_cached rb_root (interval tree implemented via red-black tree), unsigned degree = 2
• RMAP: page frame (physical memory) → page → mapping, which points to either an address_space (page cache: i_pages (xarray), i_mmap) or an anon_vma (anonymous page)
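One detail worth sketching is how a single `mapping` field can point to either an address_space or an anon_vma. The kernel tags the low bit of the pointer (`PAGE_MAPPING_ANON`, value 0x1) for anonymous pages. The userspace sketch below mirrors that idea only; the `*_sketch` types and helpers are hypothetical, and it ignores the KSM and movable-page encodings that the real kernel also stores in the low bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_MAPPING_ANON 0x1UL   /* low-bit tag, same value as in the kernel */

/* Trimmed-down stand-ins for the kernel structures. */
struct anon_vma_sketch {
    struct anon_vma_sketch *root;
    struct anon_vma_sketch *parent;
    unsigned degree;
};

struct address_space_sketch {
    int placeholder;              /* i_pages, i_mmap, ... omitted */
};

struct page_sketch {
    uintptr_t mapping;            /* tagged pointer */
    unsigned long index;
};

/* Anonymous page: store the anon_vma pointer with the low bit set. */
static void set_anon_mapping(struct page_sketch *page, struct anon_vma_sketch *av)
{
    page->mapping = (uintptr_t)av | PAGE_MAPPING_ANON;
}

/* Page-cache page: store the address_space pointer untagged. */
static void set_file_mapping(struct page_sketch *page, struct address_space_sketch *as)
{
    page->mapping = (uintptr_t)as;
}

static bool page_is_anon(const struct page_sketch *page)
{
    return (page->mapping & PAGE_MAPPING_ANON) != 0;
}

/* Returns the anon_vma for an anonymous page, or NULL for a file page. */
static struct anon_vma_sketch *page_anon_vma_sketch(const struct page_sketch *page)
{
    if (!page_is_anon(page))
        return NULL;
    return (struct anon_vma_sketch *)(page->mapping & ~PAGE_MAPPING_ANON);
}
```

The tag works because both structures are aligned to more than one byte, so the low bit of a valid pointer is always zero.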
fork() – anon_vma_clone()
fork() – anon_vma_fork()
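The linking done at fork time can be sketched in userspace as follows. This is a simplified model, not the kernel implementation: `link_sketch()`, `clone_sketch()` and `fork_sketch()` are hypothetical stand-ins for the roles of anon_vma_chain_link(), anon_vma_clone() and anon_vma_fork(), a fixed array replaces the linked list of anon_vma_chain nodes, a counter replaces the interval tree, and locking, anon_vma reuse and the degree accounting are left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_AVC 8   /* toy limit on chain entries per vma */

struct vma_sketch;

struct anon_vma_sketch {
    struct anon_vma_sketch *root;
    struct anon_vma_sketch *parent;
    int nr_vmas;                    /* stand-in for the rb_root interval tree */
};

struct anon_vma_chain_sketch {      /* one link between a vma and an anon_vma */
    struct vma_sketch *vma;
    struct anon_vma_sketch *anon_vma;
};

struct vma_sketch {
    struct anon_vma_sketch *anon_vma;             /* owner of this vma's new pages */
    struct anon_vma_chain_sketch chain[MAX_AVC];  /* stand-in for the avc list */
    int nr_chain;
};

/* Attach vma to av through a new chain entry. */
static void link_sketch(struct vma_sketch *vma, struct anon_vma_sketch *av)
{
    assert(vma->nr_chain < MAX_AVC);
    struct anon_vma_chain_sketch *avc = &vma->chain[vma->nr_chain++];

    avc->vma = vma;
    avc->anon_vma = av;
    av->nr_vmas++;
}

/* Clone step: attach dst to every anon_vma that src is attached to. */
static void clone_sketch(struct vma_sketch *dst, const struct vma_sketch *src)
{
    for (int i = 0; i < src->nr_chain; i++)
        link_sketch(dst, src->chain[i].anon_vma);
}

/* Fork step: clone the parent's links, then give the child its own anon_vma. */
static void fork_sketch(struct vma_sketch *child, const struct vma_sketch *parent,
                        struct anon_vma_sketch *child_av)
{
    clone_sketch(child, parent);
    child_av->root = parent->anon_vma->root;
    child_av->parent = parent->anon_vma;
    child_av->nr_vmas = 0;
    child->anon_vma = child_av;
    link_sketch(child, child_av);
}
```

After one fork in this model, the parent's anon_vma is linked to two vmas (parent and child), while the child's anon_vma is linked only to the child's vma.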
fork(): COW → write fault (write-protected fault)
do_wp_page
    wp_page_copy: [MAP_PRIVATE] COW: Copy On Write
        new_page = alloc_page_vma(…)
        cow_user_page
            copy_user_highpage
        maybe_mkwrite
        page_add_new_anon_rmap
    wp_page_shared: [MAP_SHARED] vma is (VM_WRITE|VM_SHARED)
fork(): COW - page_add_new_anon_rmap()
Child Process: COW
Write fault
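The effect of a child's write fault on a shared MAP_PRIVATE page can be sketched as below. This is a toy userspace model with hypothetical `*_sketch` names; it assumes the page is still shared (so a copy is needed), takes the new page from the caller instead of allocating it, and collapses the kernel's steps into plain assignments.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

struct anon_vma_sketch { int id; };

struct page_sketch {
    char data[TOY_PAGE_SIZE];
    int mapcount;                       /* number of ptes mapping this page */
    struct anon_vma_sketch *anon_vma;   /* rmap owner of the page */
};

struct pte_sketch {
    struct page_sketch *page;
    bool writable;
};

struct vma_sketch {
    struct anon_vma_sketch *anon_vma;
};

/*
 * Toy write-protect fault on a private page: copy the old page, attach the
 * copy to the faulting vma's own anon_vma, and repoint the pte at the copy.
 */
static void wp_fault_sketch(struct vma_sketch *vma, struct pte_sketch *pte,
                            struct page_sketch *new_page)
{
    struct page_sketch *old = pte->page;

    memcpy(new_page->data, old->data, TOY_PAGE_SIZE);  /* role of cow_user_page */
    new_page->mapcount = 1;
    new_page->anon_vma = vma->anon_vma;   /* role of page_add_new_anon_rmap */
    old->mapcount--;                      /* this pte no longer maps the old page */
    pte->page = new_page;
    pte->writable = true;                 /* role of maybe_mkwrite */
}
```

The point for rmap is the `anon_vma` assignment: the copied page is owned by the child's anon_vma, so a later walk for that page does not need to visit the parent or the siblings.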
rb_root & rb – when are they used, and by whom?
Interval tree traversal (implemented via red-black tree) for reverse mapping
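What the interval tree answers is "which vmas cover this page offset?". A minimal sketch of that query follows, assuming a linear scan over an array in place of the red-black tree; the `*_sketch` names and `find_mappings()` are hypothetical. The interval key (`vm_pgoff` to `vm_pgoff + pages - 1`) and the address formula follow the usual kernel convention.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12   /* 4 KiB pages */

struct vma_sketch {
    unsigned long vm_start;   /* first address of the area */
    unsigned long vm_end;     /* first address after the area */
    unsigned long vm_pgoff;   /* page offset that vm_start corresponds to */
};

static unsigned long vma_pages_sketch(const struct vma_sketch *vma)
{
    return (vma->vm_end - vma->vm_start) >> TOY_PAGE_SHIFT;
}

/* Interval key of a vma: [vm_pgoff, vm_pgoff + pages - 1]. */
static int vma_covers_pgoff(const struct vma_sketch *vma, unsigned long pgoff)
{
    unsigned long first = vma->vm_pgoff;
    unsigned long last = first + vma_pages_sketch(vma) - 1;

    return pgoff >= first && pgoff <= last;
}

/* Virtual address at which this vma maps page offset pgoff. */
static unsigned long vma_address_sketch(const struct vma_sketch *vma,
                                        unsigned long pgoff)
{
    return vma->vm_start + ((pgoff - vma->vm_pgoff) << TOY_PAGE_SHIFT);
}

/*
 * Linear stand-in for the interval-tree walk: store the address of every
 * mapping of pgoff in out[] and return how many were found.
 */
static size_t find_mappings(const struct vma_sketch *vmas, size_t n,
                            unsigned long pgoff, unsigned long *out)
{
    size_t found = 0;

    for (size_t i = 0; i < n; i++)
        if (vma_covers_pgoff(&vmas[i], pgoff))
            out[found++] = vma_address_sketch(&vmas[i], pgoff);
    return found;
}
```

The tree gives the same answer without visiting vmas whose interval cannot contain the offset, which matters when many vmas hang off one anon_vma.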
struct list_head anon_vma_chain – when is it used, and by whom?
Remove VMA: check unlink_anon_vmas()
rmap: page reclaiming – try_to_unmap()
try_to_unmap
    rmap_walk
        rmap_walk_ksm
        rmap_walk_anon
            anon_vma_interval_tree_foreach(…, &anon_vma->rb_root, …)
                invalid_migration_vma
                try_to_unmap_one
                page_mapcount_is_zero
        rmap_walk_file
            vma_interval_tree_foreach(…, &mapping->i_mmap, …)
                try_to_unmap_one
                page_mapcount_is_zero
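The walk above is driven by callbacks: skip a vma, handle one vma, and stop early once the page is fully unmapped. The sketch below models that control flow in userspace. All `*_sketch` names are hypothetical, an array replaces the interval tree, and the callbacks are heavily simplified compared with invalid_migration_vma, try_to_unmap_one and page_mapcount_is_zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page_sketch { int mapcount; };

struct vma_sketch {
    bool maps_page;   /* this vma currently maps the page */
    bool skip;        /* the walk should ignore this vma */
};

/* Callback bundle modeled on the idea of struct rmap_walk_control. */
struct rmap_walk_control_sketch {
    void *arg;
    bool (*rmap_one)(struct page_sketch *, struct vma_sketch *, void *arg);
    bool (*done)(struct page_sketch *);
    bool (*invalid_vma)(struct vma_sketch *, void *arg);
};

static void rmap_walk_sketch(struct page_sketch *page, struct vma_sketch *vmas,
                             size_t n, struct rmap_walk_control_sketch *rwc)
{
    for (size_t i = 0; i < n; i++) {
        if (rwc->invalid_vma && rwc->invalid_vma(&vmas[i], rwc->arg))
            continue;                       /* vma filtered out */
        if (!rwc->rmap_one(page, &vmas[i], rwc->arg))
            break;                          /* callback asked to stop */
        if (rwc->done && rwc->done(page))
            break;                          /* nothing left to unmap */
    }
}

/* Counts the visit in *arg and drops the mapping if this vma has one. */
static bool unmap_one_sketch(struct page_sketch *page, struct vma_sketch *vma,
                             void *arg)
{
    int *visited = arg;

    (*visited)++;
    if (vma->maps_page) {
        vma->maps_page = false;
        page->mapcount--;
    }
    return true;
}

static bool mapcount_is_zero_sketch(struct page_sketch *page)
{
    return page->mapcount == 0;
}

static bool invalid_vma_sketch(struct vma_sketch *vma, void *arg)
{
    (void)arg;
    return vma->skip;
}

/* Returns true if the page ends up with no mappings. */
static bool try_to_unmap_sketch(struct page_sketch *page, struct vma_sketch *vmas,
                                size_t n, int *visited)
{
    struct rmap_walk_control_sketch rwc = {
        .arg = visited,
        .rmap_one = unmap_one_sketch,
        .done = mapcount_is_zero_sketch,
        .invalid_vma = invalid_vma_sketch,
    };

    rmap_walk_sketch(page, vmas, n, &rwc);
    return page->mapcount == 0;
}
```

In this model the `done` callback is what lets the walk stop as soon as the last mapping is gone, instead of visiting every remaining vma.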