Monday, February 15, 2010

crash core dump analysis setup

When a system crashes due to a kernel panic, the system logs are more often than not silent.
Data from ganglia, when available, provides only overall system information, nothing specific to the crash.
The crash utility extracts useful information from the core file generated at the moment of a system crash.
Proper use of the crash utility is a discipline in and of itself, but quite a few of its functions are useful without in-depth knowledge of the kernel.
For the moment of the crash, one can see:
1. entire process table (ps)
2. tasks on the run queue (runq)
3. mounted filesystems on the system (mount)
4. mounted filesystems by process (mount -n pid)
5. network connections (net)
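
The commands in the list above can also be run as a batch instead of being typed at the prompt. The following is a minimal sketch, assuming crash's -i option (which executes the commands in a file before accepting user input). CMDFILE, VMLINUX, VMCORE and crash-report.txt are illustrative names, not part of the original setup.

```shell
# Sketch: collect the crash commands listed above into a command file.
# "mount -n pid" is left out because it needs a PID taken from the ps output.
# CMDFILE, VMLINUX and VMCORE are illustrative names (assumptions).
CMDFILE="${TMPDIR:-/tmp}/crash-commands.txt"

cat > "$CMDFILE" <<'EOF'
ps
runq
mount
net
exit
EOF

# Run the batch only when crash and both input files are actually present.
if command -v crash >/dev/null 2>&1 && [ -r "${VMLINUX:-}" ] && [ -r "${VMCORE:-}" ]; then
    crash -i "$CMDFILE" "$VMLINUX" "$VMCORE" > crash-report.txt
fi
```

The trailing exit makes crash quit after the last command rather than wait at its prompt.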

To enable a core dump at the moment of a crash, give the kernel the parameter crashkernel=128M@16M, which reserves 128 MB of memory at a 16 MB offset for the capture kernel.
This is set on the kernel line in /boot/grub/grub.conf, like this:

title Red Hat Enterprise Linux Server (2.6.18-92.1.17.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-92.1.17.el5 ro root=LABEL=/1 rhgb quiet crashkernel=128M@16M
initrd /initrd-2.6.18-92.1.17.el5.img
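
After the reboot it is worth confirming that the running kernel really received the parameter. A minimal sketch, assuming the check is done by reading /proc/cmdline; the function name crashkernel_param is hypothetical:

```shell
# Print the crashkernel= setting found in a kernel command line.
# The file argument defaults to /proc/cmdline; the function name is illustrative.
crashkernel_param() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep '^crashkernel=' | head -n 1
}

# Report whether the running kernel was booted with the parameter.
if [ -r /proc/cmdline ]; then
    param=$(crashkernel_param)
    if [ -n "$param" ]; then
        echo "reserved for the capture kernel: $param"
    else
        echo "crashkernel= is not set on the running kernel"
    fi
fi
```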


To read the core dump, a machine (not necessarily the one that produced the dump) needs to have the matching kernel installed (not necessarily booted), as well as the matching kernel-debuginfo and kernel-debuginfo-common packages.
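
The package requirement above can be checked before copying any dump around. A minimal sketch, assuming an RPM-based analysis host; the function name debug_packages is hypothetical, and the release and architecture are the ones used in this post:

```shell
# Print the three package names needed to analyze a dump from a given kernel.
# Arguments: kernel release and architecture. The function name is illustrative.
debug_packages() {
    release=$1
    arch=$2
    for name in kernel kernel-debuginfo kernel-debuginfo-common; do
        echo "${name}-${release}.${arch}"
    done
}

# On an RPM-based analysis host, report which of the packages are missing.
if command -v rpm >/dev/null 2>&1; then
    for pkg in $(debug_packages 2.6.18-92.1.17.el5 x86_64); do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "missing: $pkg"
    done
fi
```
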
My experiment consisted of the following steps:
1. close adnode44
2. enable crashkernel
3. configure core dump location in /etc/kdump.conf:

ext3 /dev/cciss/c0d0p3

4. reboot
5. force panic:

# echo "c" > /proc/sysrq-trigger

This created /var/crash/127.0.0.1-2010-02-09-13:46:12/vmcore
6. install kernel packages on testlab157 (a VM):

kernel-2.6.18-92.1.17.el5.x86_64.rpm
kernel-debuginfo-2.6.18-92.1.17.el5.x86_64.rpm
kernel-debuginfo-common-2.6.18-92.1.17.el5.x86_64.rpm
7. copy adnode44:/var/crash/127.0.0.1-2010-02-09-13:46:12/vmcore to testlab157
8. execute crash:
crash /usr/lib/debug/lib/modules/2.6.18-92.1.17.el5/vmlinux vmcore
KERNEL: /usr/lib/debug/lib/modules/2.6.18-92.1.17.el5/vmlinux
DUMPFILE: vmcore
CPUS: 8
DATE: Tue Feb 9 13:45:28 2010
UPTIME: 00:03:18
LOAD AVERAGE: 0.18, 0.23, 0.09
TASKS: 271
NODENAME: adnode44.ad.my.com
RELEASE: 2.6.18-92.1.17.el5
VERSION: #1 SMP Wed Oct 22 04:19:38 EDT 2008
MACHINE: x86_64 (3000 Mhz)
MEMORY: 15.8 GB
PANIC: "SysRq : Trigger a crashdump"
PID: 9113
COMMAND: "bash"
TASK: ffff810418b1a7e0 [THREAD_INFO: ffff810418a62000]
CPU: 6
STATE: TASK_RUNNING (SYSRQ)
crash>
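
When a machine has panicked more than once, step 7 first needs the path of the most recent dump. A minimal sketch, assuming every dump sits in its own subdirectory named by address, date and time, as in the path shown in step 5. The function name latest_vmcore is hypothetical.

```shell
# Print the path of the newest vmcore under a crash directory.
# Assumption: all subdirectories share the same address prefix, so a plain
# sort of the names orders them by date and time.
latest_vmcore() {
    for f in "${1:-/var/crash}"/*/vmcore; do
        if [ -f "$f" ]; then
            echo "$f"
        fi
    done | sort | tail -n 1
}

core=$(latest_vmcore /var/crash)
if [ -n "$core" ]; then
    echo "newest dump: $core"
fi
```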
