Debugging Kernel OOPs
Neependra Khare
STEC India
November 3, 2011
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
1 / 19
OOPs
Agenda
What is OOPs Typical OOPs Kernel Symbol Table Examples Debugging a running Kernel with gdb
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
2 / 19
OOPs
What is OOPs
Whats the most common Bug in user space program.
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
3 / 19
OOPs
What is OOPs
Whats the most common Bug in user space program.
The segfault.
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
3 / 19
OOPs
What is OOPs
Whats the most common Bug in user space program.
The segfault.
What is the most common bug in the Linux kernel?
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
3 / 19
OOPs
What is OOPs
Whats the most common Bug in user space program.
The segfault.
What is the most common bug in the Linux kernel?
The segfault. Except here:The notion of a segfault is much more complicated When the kernel de-references an invalid pointer, its not called a segfault its called an oops.
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
3 / 19
OOPs
Typical OOPs
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
4 / 19
OOPs
Typical OOPs
Kernel exception handler
Kills oending process Prints registers, stack trace with symbolic info
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
4 / 19
OOPs
Typical OOPs
Kernel exception handler
Kills oending process Prints registers, stack trace with symbolic info
Some exceptions non-recoverable (panic())
Prints message on console, halts kernel Oops in interrupt handler, idle (0) or init (1)
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
4 / 19
OOPs
Typical OOPs
Kernel exception handler
Kills oending process Prints registers, stack trace with symbolic info
Some exceptions non-recoverable (panic())
Prints message on console, halts kernel Oops in interrupt handler, idle (0) or init (1)
Oops generated by macros:
BUG(), BUG ON(condition)
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
4 / 19
OOPs
Typical OOPs
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
5 / 19
OOPs
Tainted kernels
Some oops reports contain the string Tainted: after the program counter. G if all modules loaded have a GPL or compatible license, P any proprietary module has been loaded. F if any module was force loaded by insmod -f, if all modules were loaded normally. M if any processor has reported a Machine Check Exception, if no Machine Check Exceptions have occurred. etc.
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
6 / 19
OOPs
Kernel Symbol Table
c04a1210 c04a1525 c04a7a18 c04ac37a ... c068b262 c068c4a0 c068d01f c068d055 c0698a01 c0853400 T T T T t T T T T r vfs_readv vfs_read vfs_readlink vfs_readdir net_rx_action netif_rx netif_rx_ni netdev_rx_csum_fault __netpoll_rx __ksymtab_netif_rx
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
7 / 19
OOPs
Kernel Symbol Table
c04a1210 c04a1525 c04a7a18 c04ac37a ... c068b262 c068c4a0 c068d01f c068d055 c0698a01 c0853400 T T T T t T T T T r vfs_readv vfs_read vfs_readlink vfs_readdir net_rx_action netif_rx netif_rx_ni netdev_rx_csum_fault __netpoll_rx __ksymtab_netif_rx
[Link]
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
7 / 19
OOPs
Kernel Symbol Table
c04a1210 c04a1525 c04a7a18 c04ac37a ... c068b262 c068c4a0 c068d01f c068d055 c0698a01 c0853400 T T T T t T T T T r vfs_readv vfs_read vfs_readlink vfs_readdir net_rx_action netif_rx netif_rx_ni netdev_rx_csum_fault __netpoll_rx __ksymtab_netif_rx
[Link] /proc/kallsyms
Neependra Khare (STEC India) FUDCon 2011 November 3, 2011 7 / 19
OOPs
[Link] File
[Link] is a phone directory list of function in a particular build of a kernel.
[root@nkhare ~]# ls -l /boot/[Link]* lrwxrwxrwx 1 root root 25 2009-08-26 -rw-r--r--. 1 root root 1274567 2009-05-28 -rw-r--r-- 1 root root 1225104 2009-08-26 -rw-r--r-- 1 root root 1257149 2009-06-17 17:05 03:09 17:05 09:02 /boot/[Link] -> /boot/[Link]-[Link] /boot/[Link] /boot/[Link]-[Link] /boot/[Link]-[Link]-191.fc11.i586
How it is produced?
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
8 / 19
OOPs
[Link] File
[Link] is a phone directory list of function in a particular build of a kernel.
[root@nkhare ~]# ls -l /boot/[Link]* lrwxrwxrwx 1 root root 25 2009-08-26 -rw-r--r--. 1 root root 1274567 2009-05-28 -rw-r--r-- 1 root root 1225104 2009-08-26 -rw-r--r-- 1 root root 1257149 2009-06-17 17:05 03:09 17:05 09:02 /boot/[Link] -> /boot/[Link]-[Link] /boot/[Link] /boot/[Link]-[Link] /boot/[Link]-[Link]-191.fc11.i586
How it is produced? When you compile the kernel.
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
8 / 19
OOPs
[Link] File
[Link] is a phone directory list of function in a particular build of a kernel.
[root@nkhare ~]# ls -l /boot/[Link]* lrwxrwxrwx 1 root root 25 2009-08-26 -rw-r--r--. 1 root root 1274567 2009-05-28 -rw-r--r-- 1 root root 1225104 2009-08-26 -rw-r--r-- 1 root root 1257149 2009-06-17 17:05 03:09 17:05 09:02 /boot/[Link] -> /boot/[Link]-[Link] /boot/[Link] /boot/[Link]-[Link] /boot/[Link]-[Link]-191.fc11.i586
How it is produced? When you compile the kernel. nm vmlinux
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
8 / 19
OOPs
/proc/kallsyms
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
9 / 19
OOPs
/proc/kallsyms
Created on the y when a kernel boots up.
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
9 / 19
OOPs
/proc/kallsyms
Created on the y when a kernel boots up. Kernel data which is given the illusion of being a disk le. # le /proc/kallsyms /proc/kallsyms: empty Symbol for custom modules # grep mycdrv /proc/kallsyms f7d14000 t mycdrv1_read [char_read_write] f7d14014 t mycdrv1_release [char_read_write] f7d1402f t mycdrv1_open [char_read_write] f7d140ec r mycdrv1_fops [char_read_write] f7d1404a t mycdrv1_write [char_read_write]
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
9 / 19
OOPs
OOPs Example 1
Aug Aug Aug Aug
28 28 28 28
21:22:55 21:22:55 21:22:55 21:22:55
localhost localhost localhost localhost
kernel: kernel: BUG: unable to handle kernel NULL pointer dereference at (null) kernel: IP: [<f7d18008>] mycdrv1_read+0x8/0x14 [char_read_write] kernel: *pde = 5f136067
Message from syslogd@localhost at Aug 28 21:22:55 ... kernel:Oops: 0002 [#1] SMP Aug 28 21:22:55 localhost kernel: Oops: 0002 [#1] SMP
Message from syslogd@localhost at Aug 28 21:22:55 ... kernel:last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:01/PNP0C09:00/PNP0C0A:00/power_su Aug 28 21:22:55 localhost kernel: last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:01/PNP0 Aug 28 21:22:55 localhost kernel: Modules linked in: char_read_write ppp_synctty n_hdlc ppp_deflate zlib_deflat Aug 28 21:22:55 localhost kernel: i2c_core video output [last unloaded: scsi_wait_scan] Aug 28 21:22:55 localhost kernel: Aug 28 21:22:55 localhost kernel: Pid: 4716, comm: cat Tainted: G W ([Link]-191.fc11.i586 #1) 7735AE7 Aug 28 21:22:55 localhost kernel: EIP: 0060:[<f7d18008>] EFLAGS: 00010286 CPU: 0 Aug 28 21:22:55 localhost kernel: EIP is at mycdrv1_read+0x8/0x14 [char_read_write] Aug 28 21:22:55 localhost kernel: EAX: f3cb9d80 EBX: f3cb9d80 ECX: 00008000 EDX: 0861a000 Aug 28 21:22:55 localhost kernel: ESI: 0861a000 EDI: f7d18000 EBP: f1a67f5c ESP: f1a67f5c Aug 28 21:22:55 localhost kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
10 / 19
OOPs
OOPs Example 1
Aug Aug Aug Aug
28 28 28 28
21:22:55 21:22:55 21:22:55 21:22:55
localhost localhost localhost localhost
kernel: kernel: BUG: unable to handle kernel NULL pointer dereference at (null) kernel: IP: [<f7d18008>] mycdrv1_read+0x8/0x14 [char_read_write] kernel: *pde = 5f136067
Message from syslogd@localhost at Aug 28 21:22:55 ... kernel:Oops: 0002 [#1] SMP Aug 28 21:22:55 localhost kernel: Oops: 0002 [#1] SMP
Message from syslogd@localhost at Aug 28 21:22:55 ... kernel:last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:01/PNP0C09:00/PNP0C0A:00/power_su Aug 28 21:22:55 localhost kernel: last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:01/PNP0 Aug 28 21:22:55 localhost kernel: Modules linked in: char_read_write ppp_synctty n_hdlc ppp_deflate zlib_deflat Aug 28 21:22:55 localhost kernel: i2c_core video output [last unloaded: scsi_wait_scan] Aug 28 21:22:55 localhost kernel: Aug 28 21:22:55 localhost kernel: Pid: 4716, comm: cat Tainted: G W ([Link]-191.fc11.i586 #1) 7735AE7 Aug 28 21:22:55 localhost kernel: EIP: 0060:[<f7d18008>] EFLAGS: 00010286 CPU: 0 Aug 28 21:22:55 localhost kernel: EIP is at mycdrv1_read+0x8/0x14 [char_read_write] Aug 28 21:22:55 localhost kernel: EAX: f3cb9d80 EBX: f3cb9d80 ECX: 00008000 EDX: 0861a000 Aug 28 21:22:55 localhost kernel: ESI: 0861a000 EDI: f7d18000 EBP: f1a67f5c ESP: f1a67f5c Aug 28 21:22:55 localhost kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
11 / 19
OOPs
OOPs Example 1
Aug Aug Aug Aug
28 28 28 28
21:22:55 21:22:55 21:22:55 21:22:55
localhost localhost localhost localhost
kernel: kernel: BUG: unable to handle kernel NULL pointer dereference at (null) kernel: IP: [<f7d18008>] mycdrv1_read+0x8/0x14 [char_read_write] kernel: *pde = 5f136067
Message from syslogd@localhost at Aug 28 21:22:55 ... kernel:Oops: 0002 [#1] SMP Aug 28 21:22:55 localhost kernel: Oops: 0002 [#1] SMP
Message from syslogd@localhost at Aug 28 21:22:55 ... kernel:last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:01/PNP0C09:00/PNP0C0A:00/power_su Aug 28 21:22:55 localhost kernel: last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:01/PNP0 Aug 28 21:22:55 localhost kernel: Modules linked in: char_read_write ppp_synctty n_hdlc ppp_deflate zlib_deflat Aug 28 21:22:55 localhost kernel: i2c_core video output [last unloaded: scsi_wait_scan] Aug 28 21:22:55 localhost kernel: Aug 28 21:22:55 localhost kernel: Pid: 4716, comm: cat Tainted: G W ([Link]-191.fc11.i586 #1) 7735AE Aug 28 21:22:55 localhost kernel: EIP: 0060:[<f7d18008>] EFLAGS: 00010286 CPU: 0 Aug 28 21:22:55 localhost kernel: EIP is at mycdrv1_read+0x8/0x14 [char_read_write] Aug 28 21:22:55 localhost kernel: EAX: f3cb9d80 EBX: f3cb9d80 ECX: 00008000 EDX: 0861a000 Aug 28 21:22:55 localhost kernel: ESI: 0861a000 EDI: f7d18000 EBP: f1a67f5c ESP: f1a67f5c Aug 28 21:22:55 localhost kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
12 / 19
OOPs
OOPs Example 1
Aug Aug Aug Aug
28 28 28 28
21:22:55 21:22:55 21:22:55 21:22:55
localhost localhost localhost localhost
kernel: kernel: BUG: unable to handle kernel NULL pointer dereference at (null) kernel: IP: [<f7d18008>] mycdrv1_read+0x8/0x14 [char_read_write] kernel: *pde = 5f136067
Message from syslogd@localhost at Aug 28 21:22:55 ... kernel:Oops: 0002 [#1] SMP Aug 28 21:22:55 localhost kernel: Oops: 0002 [#1] SMP
Message from syslogd@localhost at Aug 28 21:22:55 ... kernel:last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:01/PNP0C09:00/PNP0C0A:00/power_su Aug 28 21:22:55 localhost kernel: last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:01/PNP0 Aug 28 21:22:55 localhost kernel: Modules linked in: char_read_write ppp_synctty n_hdlc ppp_deflate zlib_deflat Aug 28 21:22:55 localhost kernel: i2c_core video output [last unloaded: scsi_wait_scan] Aug 28 21:22:55 localhost kernel: Aug 28 21:22:55 localhost kernel: Pid: 4716, comm: cat Tainted: G W ([Link]-191.fc11.i586 #1) 7735AE7 Aug 28 21:22:55 localhost kernel: EIP: 0060:[<f7d18008>] EFLAGS: 00010286 CPU: 0 Aug 28 21:22:55 localhost kernel: EIP is at mycdrv1_read+0x8/0x14 [char_read_write] Aug 28 21:22:55 localhost kernel: EAX: f3cb9d80 EBX: f3cb9d80 ECX: 00008000 EDX: 0861a000 Aug 28 21:22:55 localhost kernel: ESI: 0861a000 EDI: f7d18000 EBP: f1a67f5c ESP: f1a67f5c Aug 28 21:22:55 localhost kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
13 / 19
OOPs
OOPs Example 1
\# objdump
-S [Link] (struct file *file, char __user * buf, size_t lbuf, loff_t * ppos) push mov call movl \%ebp \%esp,\%ebp 4 <mycdrv1_read+0x4> \$0x0,0x0
static ssize_t mycdrv1_read { 0: 55 1: 89 e5 3: e8 fc ff ff ff *(int *)0 = 0; 8: c7 05 00 00 00 00 00 f: 00 00 00 } 12: 5d 13: c3
pop ret
\%ebp
00000014 <mycdrv1_release>: printk (" Opening : \%s:\n\n", MYDEV_NAME); return 0; }
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
14 / 19
OOPs
OOPs Example 2
Aug 28 21:40:59 localhost kernel: BUG: unable to handle kernel paging request at ffffffff Aug 28 21:40:59 localhost kernel: IP: [<ffffffff>] 0xffffffff Aug 28 21:40:59 localhost kernel: *pde = 00957067 *pte = 00000000 Message from syslogd@localhost at Aug 28 21:40:59 ... kernel:Oops: 0000 [#3] SMP Aug 28 21:40:59 localhost kernel: Oops: 0000 [#3] SMP
Message from syslogd@localhost at Aug 28 21:40:59 ... kernel:last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:01/PNP0C09:00/PNP0C0A:00/power_su Aug 28 21:40:59 localhost kernel: last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:01/PNP0 Aug 28 21:40:59 localhost kernel: Modules linked in: char_read_write1 char_read_write ppp_synctty n_hdlc ppp_de Aug 28 21:40:59 localhost kernel: drm i2c_algo_bit i2c_core video output [last unloaded: char_read_write1] Aug 28 21:40:59 localhost kernel: Aug 28 21:40:59 localhost kernel: Pid: 6050, comm: cat Tainted: G D W ([Link]-191.fc11.i586 #1) 7735AE7 Aug 28 21:40:59 localhost kernel: EIP: 0060:[<ffffffff>] EFLAGS: 00010246 CPU: 0 Aug 28 21:40:59 localhost kernel: EIP is at 0xffffffff Aug 28 21:40:59 localhost kernel: EAX: 00000001 EBX: ffffffff ECX: 00000000 EDX: ffffffff Aug 28 21:40:59 localhost kernel: ESI: 09282000 EDI: ffffffff EBP: ffffffff ESP: f3cebf64 Aug 28 21:40:59 localhost kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
15 / 19
OOPs
Debugging a running Kernel with gdb
\# gdb vmlinux /proc/kcore (gdb) p jiffies_64 \$1 = 4295075081 (gdb) core-file /proc/kcore
(gdb) p jiffies_64 \$2 = 4295225679
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
16 / 19
OOPs
Debugging a running Kernel with gdb
(gdb) p mycdrv1_read No symbol "mycdrv1_read" in current context. Why? We need to educate about the module symbols
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
17 / 19
OOPs
Debugging a running Kernel with gdb
cd /sys/module/chra read write/sections cat .bss .data .text 0xf8e1e830 0xf8e1e6f4 0xf8e1e000
(gdb) add-symbol-file /home/nkhare/Training/LinuxKernel/oops/char-dev-read-write-oop/[Link] 0xf8e1 add symbol table from file "/home/nkhare/Training/LinuxKernel/oops/char-dev-read-write-oop/[Link]" .text_addr = 0xf8e1e000 .data_addr = 0xf8e1e6f4 .bss_addr = 0xf8e1e830 (y or n) y Reading symbols from /home/nkhare/Training/LinuxKernel/oops/char-dev-read-write-oop/[Link]...done. (gdb) p mycdrv1_read \$1 = {ssize_t (struct file *, char *, size_t, loff_t *)} 0xf8e1e000 <mycdrv1_read> (gdb) list mycdrv1_read
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
18 / 19
OOPs
References
[Link] [Link] [Link]
Neependra Khare (STEC India)
FUDCon 2011
November 3, 2011
19 / 19