Solving PostgreSQL wicked problems
Alexander Korotkov
Oriole DB Inc.
2021
Alexander Korotkov Solving PostgreSQL wicked problems 1 / 40
PostgreSQL has two sides
The bright side of PostgreSQL
PostgreSQL – one of the most popular DBMSes [1]
[1] According to db-engines.com
PostgreSQL – strong trend [2]
[2] https://db-engines.com/en/ranking_trend/system/PostgreSQL
PostgreSQL – most loved RDBMS [3]
[3] According to the Stack Overflow 2020 survey
The dark side of PostgreSQL
Criticism of PostgreSQL (1/2)
https://eng.uber.com/postgres-to-mysql-migration/
Criticism of PostgreSQL (2/2)
https://medium.com/@rbranson/10-things-i-hate-about-postgresql-20dbab8c2791
10 wicked problems of PostgreSQL
Problem name | Known for | Work started | Resolution
1. Wraparound | 20 years | 15 years ago | Still WIP
2. Failover Will Probably Lose Data | 20 years | 16 years ago | Still WIP
3. Inefficient Replication That Spreads Corruption | 10 years | 8 years ago | Still WIP
4. MVCC Garbage Frequently Painful | 20 years | 19 years ago | Abandoned
5. Process-Per-Connection = Pain at Scale | 20 years | 3 years ago | Abandoned
6. Primary Key Index is a Space Hog | 13 years | - | Not started
7. Major Version Upgrades Can Require Downtime | 21 years | 16 years ago | Still WIP
8. Somewhat Cumbersome Replication Setup | 10 years | 9 years ago | Still WIP
9. Ridiculous No-Planner-Hints Dogma | 20 years | 11 years ago | Extension
10. No Block Compression | 12 years | 11 years ago | Still WIP
* Scalability on modern hardware
The exciting moment
▶ The PostgreSQL community has proven to be brilliant at solving
non-design issues, delivering a fantastic product to the market.
▶ As a result, PostgreSQL has had a strong upward trend for many
years.
▶ At the same time, the PostgreSQL community appears to be
dysfunctional at solving design issues, attracting severe criticism.
Nevertheless, the critics have not yet broken the upward trend.
▶ It appears to be a unique moment for a PostgreSQL redesign!
How could we solve the PostgreSQL
wicked problems?
Traditional buffer management
[Figure: B-tree pages 1-7 on disk; copies 1', 2', 3', 5', 6' cached in memory; a buffer mapping structure that translates page numbers into memory buffers]
▶ Each page access requires a lookup in the buffer mapping data structure.
▶ Each B-tree key lookup takes multiple buffer mapping lookups.
▶ Accessing cached data doesn’t scale on modern hardware.
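To make the cost concrete, here is a minimal sketch, assuming a toy three-level B-tree with pages numbered 1 to 7 whose child links are page numbers, and a plain dict standing in for the buffer mapping hash table. The names `BufferPool` and `btree_lookup` are hypothetical and are not PostgreSQL APIs.

```python
from bisect import bisect_right


class BufferPool:
    """Toy buffer manager: pages are reachable only through a mapping table."""

    def __init__(self, disk):
        self.disk = disk          # page number -> page image ("disk")
        self.mapping = {}         # page number -> cached copy ("buffer mapping")
        self.mapping_lookups = 0  # how many times the mapping was consulted

    def read_page(self, page_no):
        self.mapping_lookups += 1
        if page_no not in self.mapping:        # cache miss: load from disk
            self.mapping[page_no] = self.disk[page_no]
        return self.mapping[page_no]


def btree_lookup(pool, root_no, key):
    """Descend from the root; every level costs one mapping lookup."""
    page = pool.read_page(root_no)
    while not page["leaf"]:
        child_no = page["children"][bisect_right(page["keys"], key)]
        page = pool.read_page(child_no)
    return page["rows"].get(key)


# Assumed tree shape: 1 -> (2, 3) -> (4, 5, 6, 7).
disk = {
    1: {"leaf": False, "keys": [40], "children": [2, 3]},
    2: {"leaf": False, "keys": [20], "children": [4, 5]},
    3: {"leaf": False, "keys": [60], "children": [6, 7]},
    4: {"leaf": True, "rows": {10: "a"}},
    5: {"leaf": True, "rows": {30: "b"}},
    6: {"leaf": True, "rows": {50: "c"}},
    7: {"leaf": True, "rows": {70: "d"}},
}
pool = BufferPool(disk)
value = btree_lookup(pool, 1, 50)
lookups = pool.mapping_lookups
```

A single key lookup touches pages 1, 3 and 6, so the mapping is consulted once per tree level even when every page is already cached. In a real server that mapping is a shared structure, which is where contention between many CPU cores can arise.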
Solution: Dual pointers
[Figure: in-memory B-tree pages 1, 2, 3, 5, 7 linked directly to each other and to pages 1-7 on disk]
▶ An in-memory page refers to either an in-memory or an on-disk page.
▶ Cached data is accessed without buffer mapping lookups.
▶ Good scalability!
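The idea can be sketched as follows, assuming each downlink in a loaded page is either a direct reference to another loaded page or a small stub carrying a disk offset. The classes `OnDisk` and `InMemory` and the `load`/`lookup` helpers are hypothetical illustrations, not OrioleDB's actual structures.

```python
from bisect import bisect_right


class OnDisk:
    """Downlink to a page that is not loaded: holds only its disk location."""

    def __init__(self, offset):
        self.offset = offset


class InMemory:
    """Loaded page; each downlink is an InMemory object or an OnDisk stub."""

    def __init__(self, keys=None, children=None, rows=None):
        self.keys, self.children, self.rows = keys, children, rows


def load(disk, offset):
    """Read a page image; all of its downlinks start as on-disk pointers."""
    image = disk[offset]
    if "rows" in image:
        return InMemory(rows=dict(image["rows"]))
    return InMemory(keys=list(image["keys"]),
                    children=[OnDisk(o) for o in image["children"]])


def lookup(disk, root, key, stats):
    page = root
    while page.rows is None:                   # internal page
        i = bisect_right(page.keys, key)
        child = page.children[i]
        if isinstance(child, OnDisk):          # not cached yet: load and swizzle
            stats["disk_reads"] += 1
            child = load(disk, child.offset)
            page.children[i] = child           # downlink is now a memory pointer
        page = child                           # no mapping table involved
    return page.rows.get(key)


disk = {
    1: {"keys": [40], "children": [2, 3]},
    2: {"keys": [20], "children": [4, 5]},
    3: {"keys": [60], "children": [6, 7]},
    4: {"rows": {10: "a"}}, 5: {"rows": {30: "b"}},
    6: {"rows": {50: "c"}}, 7: {"rows": {70: "d"}},
}
stats = {"disk_reads": 0}
root = load(disk, 1)
first = lookup(disk, root, 50, stats)
reads_after_first = stats["disk_reads"]
second = lookup(disk, root, 50, stats)
reads_after_second = stats["disk_reads"]
```

The first lookup loads pages 3 and 6 and replaces the stubs with direct references; the second lookup follows memory pointers only and touches no shared lookup structure.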
PostgreSQL MVCC = bloat + write-amplification
▶ New and old row versions share the same heap.
▶ Non-HOT updates cause index bloat.
Solution: undo log for both pages and rows
[Figure: pages holding rows row1-row4, and an undo log holding old row versions (row1v1, row1v2, row2v1)]
▶ Old row versions form chains in the undo log.
▶ Page-level chains evict deleted rows from primary storage.
▶ Only indexes with changed values are updated.
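A minimal sketch of row-level undo chains follows. It assumes visibility is decided by comparing a commit sequence number (`csn`) with the reader's snapshot, which the slides do not specify; `Version` and `Table` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: str
    csn: int                    # assumed commit sequence number of the writer
    prev: Optional["Version"]   # older version, conceptually in the undo log


class Table:
    def __init__(self):
        self.page = {}          # primary storage: key -> newest version only

    def write(self, key, value, csn):
        # The newest version replaces the old one in place; the old one
        # is pushed onto the undo chain instead of staying in the page.
        self.page[key] = Version(value, csn, self.page.get(key))

    def read(self, key, snapshot_csn):
        # Walk the chain until a version visible to the snapshot is found.
        v = self.page.get(key)
        while v is not None and v.csn > snapshot_csn:
            v = v.prev
        return None if v is None else v.value

    def trim_undo(self, oldest_snapshot_csn):
        """Cut chains below the version the oldest snapshot can still see."""
        for v in self.page.values():
            while v is not None and v.csn > oldest_snapshot_csn:
                v = v.prev
            if v is not None:
                v.prev = None


t = Table()
t.write("row1", "v1", csn=1)
t.write("row1", "v2", csn=5)
t.write("row1", "v3", csn=9)
newest = t.read("row1", snapshot_csn=9)
older = t.read("row1", snapshot_csn=6)
t.trim_undo(oldest_snapshot_csn=6)
after_trim = t.read("row1", snapshot_csn=2)
```

Primary storage always holds exactly one entry per row, so old versions never occupy page space; discarding them means cutting the chain rather than scanning the table.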
Block-level WAL
[Figure: Heap, Index #1, Index #2 and WAL]
▶ Huge WAL traffic.
▶ Problems with parallel apply.
▶ Not suitable for multi-master replication.
Solution: row-level WAL
[Figure: Heap, Index #1, Index #2 and WAL]
▶ Very compact.
▶ Apply can be parallelized.
▶ Suitable for multimaster (row-level conflicts, not block-level).
▶ Recovery needs structurally consistent checkpoints.
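Why row-level records allow parallel apply can be sketched as below. The record shape `(key, op, value)` is an assumption for illustration and is not the OrioleDB WAL format; threads are used only to show the partitioning, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_sequential(records):
    """Replay row-level records one by one into a key -> value state."""
    state = {}
    for key, op, value in records:
        if op == "delete":
            state.pop(key, None)
        else:                        # "insert" or "update"
            state[key] = value
    return state


def apply_parallel(records, workers=4):
    # Route each record by its row key, so all changes to one row land in
    # the same queue and keep their original order.
    queues = [[] for _ in range(workers)]
    for rec in records:
        queues[hash(rec[0]) % workers].append(rec)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(apply_sequential, queues))
    merged = {}
    for part in parts:               # key sets are disjoint across queues
        merged.update(part)
    return merged


records = [
    (1, "insert", "a"),
    (2, "insert", "b"),
    (1, "update", "c"),
    (2, "delete", None),
    (3, "insert", "d"),
]
sequential_state = apply_sequential(records)
parallel_state = apply_parallel(records)
```

Records that touch different rows are independent, so they can be replayed by different workers. A block-level record, by contrast, describes a change to a physical page shared by many rows, which makes this kind of partitioning much harder.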
Row-level WAL based multimaster
[Figure: three OrioleDB instances, each with its own storage and WAL, connected by Raft replication]
Copy-on-write checkpoints (1/4)
[Figure: in-memory pages 1, 2, 3, 5, 7; on-disk pages 1-7]
Copy-on-write checkpoints (2/4)
[Figure: in-memory page 7 is modified (7*); on-disk pages 1-7 are unchanged]
Copy-on-write checkpoints (3/4)
[Figure: in-memory pages 1*, 3*, 7* are modified; the disk holds the old pages 1-7 plus newly written copies 7*, 3*, 1*]
Copy-on-write checkpoints (4/4)
[Figure: the disk holds pages 2, 4, 5, 6 plus the new copies 7*, 3*, 1*; the old copies of 1, 3, 7 are gone]
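The four steps can be sketched as below. This is a simplified model that assumes a flat page-number-to-offset map instead of parent downlinks (in the slides, modifying page 7 also marks its ancestors 3 and 1, presumably because their downlinks must point at the new location). `CowStore` and its methods are hypothetical names.

```python
class CowStore:
    """Toy copy-on-write store: a checkpoint never overwrites a live page."""

    def __init__(self, pages):
        self.disk = {}               # offset -> page image
        self.next_offset = 0
        self.checkpoint = {}         # page number -> offset, last checkpoint
        self.memory = dict(pages)    # page number -> current in-memory image
        self.dirty = set(pages)
        self.take_checkpoint()       # initial checkpoint writes every page

    def modify(self, page_no, image):
        self.memory[page_no] = image
        self.dirty.add(page_no)

    def take_checkpoint(self):
        new_map = dict(self.checkpoint)
        for page_no in sorted(self.dirty):
            # Write each dirty page to a fresh location.
            self.disk[self.next_offset] = self.memory[page_no]
            new_map[page_no] = self.next_offset
            self.next_offset += 1
        # Switch to the new map; only then free the superseded copies.
        old_map, self.checkpoint = self.checkpoint, new_map
        for page_no in self.dirty:
            if page_no in old_map:
                del self.disk[old_map[page_no]]
        self.dirty.clear()

    def recover(self):
        """State seen after a crash: exactly the last completed checkpoint."""
        return {p: self.disk[o] for p, o in self.checkpoint.items()}


store = CowStore({n: f"page{n}" for n in range(1, 8)})
store.modify(7, "page7*")
store.modify(3, "page3*")
store.modify(1, "page1*")
before = store.recover()             # old checkpoint is still intact
store.take_checkpoint()
after = store.recover()
live_offsets = sorted(store.disk)
```

Until the switch happens, the previous checkpoint remains complete on disk, which is what gives recovery a structurally consistent starting point for replaying row-level WAL.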
What do we need from PostgreSQL extendability?
[Figure: PostgreSQL server (backend: connection, parser, rewriter, planner, executor; background processes: autovacuum, background writer, checkpointer, WAL writer) on top of a file system with data files, WAL files and log files; OrioleDB extension with its own OrioleDB data files and OrioleDB undo files on the file system]
▶ Extended table AM.
▶ Custom TOAST handlers.
▶ Custom row identifiers.
▶ Custom error cleanup.
▶ Recovery & checkpointer hooks.
▶ Snapshot hooks.
▶ Some other miscellaneous hooks.
Total: a 1K-line patch to PostgreSQL core.
OrioleDB = PostgreSQL redesign
PostgreSQL | OrioleDB
Block-level WAL | Row-level WAL
Buffer mapping | Direct page links
Buffer locking | Lock-less access
Bloat-prone MVCC | Undo log
Cumbersome block-level WAL replication | Raft-based multimaster replication of row-level WAL
OrioleDB’s answer to 10 wicked problems of PostgreSQL
Problem name | Solution
1. Wraparound | Native 64-bit transaction ids
2. Failover Will Probably Lose Data | Multimaster replication
3. Inefficient Replication That Spreads Corruption | Row-level replication
4. MVCC Garbage Frequently Painful | Non-persistent undo log
5. Process-Per-Connection = Pain at Scale | Migration to multithread model
6. Primary Key Index is a Space Hog | Index-organized tables
7. Major Version Upgrades Can Require Downtime | Multimaster + per-node upgrade
8. Somewhat Cumbersome Replication Setup | Simple setup of raft-based multimaster
9. Ridiculous No-Planner-Hints Dogma | In-core planner hints
10. No Block Compression | Block-level compression
* Scalability on modern hardware
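For the first row of the table, a short worked calculation shows why the width of the transaction id matters. It assumes a steady rate of 100,000 id-consuming transactions per second, an illustrative figure not taken from the slides. In stock PostgreSQL the usable horizon is about half of the 32-bit space, because ids are compared modulo 2^32, so the 32-bit figure here is an upper bound.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def seconds_until_exhausted(id_bits, xids_per_second):
    """Time to consume every value of an id_bits-wide transaction counter."""
    return 2 ** id_bits / xids_per_second


rate = 100_000  # assumed write transactions per second (illustrative)
hours_32 = seconds_until_exhausted(32, rate) / 3600
years_64 = seconds_until_exhausted(64, rate) / SECONDS_PER_YEAR
```

At this rate a 32-bit counter is exhausted in roughly half a day, while a 64-bit counter lasts millions of years, so a wraparound-protection pass over old rows is no longer needed.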
Let’s do some benchmarks! [4]
[4] https://gist.github.com/akorotkov/f5e98ba5805c42ee18bf945b30cc3d67
OrioleDB benchmark: read-only scalability
[Figure: "Read-only scalability test PostgreSQL vs OrioleDB", 1 minute of pgbench script reading 9 random values of 100M; x-axis: # clients (0-250); y-axis: TPS (0-800000); series: PostgreSQL, OrioleDB]
OrioleDB: 4X higher TPS!
OrioleDB benchmark: read-write scalability
in-memory case
[Figure: "Read-write scalability test PostgreSQL vs OrioleDB", 1 minute of pgbench TPC-B like transactions wrapped into a stored procedure; x-axis: # clients (0-250); y-axis: TPS (0-400000); series: PostgreSQL, OrioleDB]
OrioleDB: 3.5X higher TPS!
OrioleDB benchmark: read-write scalability
external storage case
[Figure: pgbench -s 20000 -j $n -c $n -M prepared on odb-node02; mean of 3 3-minute runs with shared_buffers = 32GB(128GB), max_connections = 2500; x-axis: # clients (0-2000); y-axis: TPS (0-120000); series: pgsql-read-write, orioledb-read-write, orioledb-read-write-block-device]
OrioleDB: up to 50X higher TPS!
OrioleDB benchmark: read-write scalability
Intel Optane persistent memory
[Figure: pgbench -s 20000 -j $n -c $n -M prepared -f read-write-proc.sql on node03; 5-minute run with shared_buffers = 32GB, max_connections = 2500; x-axis: # clients (0-2000); y-axis: TPS (0-200000); series: pgsql, orioledb-fsdax, orioledb-devdax]
OrioleDB: up to 50X higher TPS!
OrioleDB benchmark: write-amplification & bloat test: CPU
[Figure, two panels over time (400-1600 seconds). "Throughput": TPS (0-800000). "CPU usage": usage, % (0-100). Series in both panels: PostgreSQL, OrioleDB]
OrioleDB: 5X higher TPS! 2.3X less CPU/TPS!
OrioleDB benchmark: write-amplification & bloat test: IO
[Figure, two panels over time. "Throughput": TPS (0-800000) over 400-1600 seconds. "IO load": IOPS (0-35000) over 0-1750 seconds. Series in both panels: PostgreSQL, OrioleDB]
OrioleDB: 5X higher TPS! 22X less IO/TPS!
OrioleDB benchmark: write-amplification & bloat test: space
[Figure: "Space used"; x-axis: seconds (400-1600); y-axis: GB (0-80); series: PostgreSQL, OrioleDB]
OrioleDB: no bloat!
OrioleDB benchmark: taxi workload (1/3): read
[Figure: "Disk read"; x-axis: seconds (0-3500); y-axis: IOPS (0-175); series: PostgreSQL, OrioleDB]
OrioleDB: 9X less read IOPS!
OrioleDB benchmark: taxi workload (2/3): write
[Figure: "Disk write"; x-axis: seconds (0-3500); y-axis: IOPS (0-350); series: PostgreSQL, OrioleDB]
OrioleDB: 4.5X less write IOPS!
OrioleDB benchmark: taxi workload (3/3): space
[Figure: "Space used"; x-axis: seconds (0-3500); y-axis: GB (0-40); series: PostgreSQL, OrioleDB]
OrioleDB: 8X less space usage!
OrioleDB = Solution of wicked PostgreSQL
problems + extraordinary performance
Roadmap
▶ Basic engine features ✔
▶ Table AM interface implementation ✔
▶ Data compression ✔
▶ Undo log ✔
▶ TOAST support ✔
▶ Parallel row-level replication ✔
▶ Partial and expression indexes ✔
Initial release
▶ GiST/GIN analogues
OrioleDB status
▶ Release is scheduled for December 1st 2021;
▶ https://github.com/orioledb/orioledb;
▶ If you need more explanation, don’t hesitate to make pull requests.