Background
This is just a record for my own notes, because a coredump showed up in our project.
Problem analysis
First, load the core in GDB.
[app@主机A bin]$ gdb PROGRAM core.31018
What follows is the long stream of output GDB prints.
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
The gist of the banner above: this is free software, use it as you like, and there is no warranty.
Reading symbols from /bin/PROGRAM...done.
[New LWP 31018]
[New LWP 31027]
[New LWP 31022]
[New LWP 31036]
[New LWP 31038]
[New LWP 31041]
[New LWP 31044]
[New LWP 31047]
[New LWP 31042]
[New LWP 31032]
[New LWP 31033]
[New LWP 31034]
[New LWP 31035]
[New LWP 31037]
[New LWP 31020]
[New LWP 31026]
[New LWP 31031]
[New LWP 31030]
[New LWP 31040]
[New LWP 31039]
[New LWP 31046]
[New LWP 31045]
[New LWP 31043]
[New LWP 31019]
[New LWP 31025]
[New LWP 31024]
[New LWP 31023]
[New LWP 31021]
[New LWP 31029]
[New LWP 31028]
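The numbers above are LWP IDs, which is what we usually call thread IDs. On Linux, a thread is an LWP. Some will object that an LWP is a process, not a thread: it stands for light-weight process, after all. True, that is what it is called. But on Linux there is no other kind of thread underneath: every thread GDB lists here is backed by exactly one LWP. Here is what Wikipedia says: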
In computer operating systems, a light-weight process (LWP) is a means of achieving multitasking. In the traditional meaning of the term, as used in Unix System V and Solaris, a LWP runs in user space on top of a single kernel thread and shares its address space and system resources with other LWPs within the same process. Multiple user level threads, managed by a thread library, can be placed on top of one or many LWPs - allowing multitasking to be done at the user level, which can have some performance benefits.
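Read all that and still not sure what it comes down to? That's fine, no need to agonize. Say it with me: why fuss over so many concepts, for our purposes this thing is a thread!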
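To see the "an LWP is a thread" point on a live Linux process, here is a minimal sketch. It assumes Linux with /proc mounted and Python 3.8+ (for `threading.get_native_id`); the helper name `native_thread_ids` is made up for this illustration. It starts a few threads and checks that each one appears as its own entry under /proc/self/task, which is the same kind of ID GDB prints as LWP.

```python
import os
import threading


def native_thread_ids(worker_count=3):
    """Start worker threads; return (pid, worker TIDs, entries in /proc/self/task)."""
    ready = threading.Barrier(worker_count + 1)
    done = threading.Event()
    lock = threading.Lock()
    tids = []

    def worker():
        with lock:
            tids.append(threading.get_native_id())  # kernel thread ID (the LWP)
        ready.wait()   # tell the main thread the TID has been recorded
        done.wait()    # stay alive until the main thread has read /proc

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    ready.wait()
    tasks = {int(name) for name in os.listdir("/proc/self/task")}
    done.set()
    for t in threads:
        t.join()
    return os.getpid(), tids, tasks


if __name__ == "__main__":
    pid, tids, tasks = native_thread_ids()
    print("pid:", pid)
    print("worker LWPs:", sorted(tids))
    print("all tasks:", sorted(tasks))
```

The PID itself shows up in the task list as well, which matches the session above: the core file is core.31018 and the first LWP reported is 31018.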
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
The lines above say which library GDB uses for thread debugging (libthread_db).
Core was generated by `PROGRAM -g 1 -i 3006 -u VM_16_46_centos -U /data/app/log/LOG -m 0 -A'.
Program terminated with signal 6, Ab
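This shows how the core was generated: the command line that was running, and the signal that terminated it. Here it is signal 6, abort. How many signals does the system have? Roughly these: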
Signal | Value | Action | Reason | Standard |
---|---|---|---|---|
SIGHUP | 1 | A | Hangup of the terminal or death of the controlling process | POSIX.1 |
SIGINT | 2 | A | Interrupt from keyboard (e.g. the break key) | POSIX.1 |
SIGQUIT | 3 | C | Quit key pressed on keyboard | POSIX.1 |
SIGILL | 4 | C | Illegal instruction | POSIX.1 |
SIGABRT | 6 | C | Abort signal from abort(3) | POSIX.1 |
SIGFPE | 8 | C | Floating-point exception | POSIX.1 |
SIGKILL | 9 | AEF | Kill signal | POSIX.1 |
SIGSEGV | 11 | C | Invalid memory reference | POSIX.1 |
SIGPIPE | 13 | A | Broken pipe: write to a pipe with no reader | POSIX.1 |
SIGALRM | 14 | A | Timer signal from alarm(2) | POSIX.1 |
SIGTERM | 15 | A | Termination signal | POSIX.1 |
SIGUSR1 | 30,10,16 | A | User-defined signal 1 | POSIX.1 |
SIGUSR2 | 31,12,17 | A | User-defined signal 2 | POSIX.1 |
SIGCHLD | 20,17,18 | B | Child process terminated | POSIX.1 |
SIGCONT | 19,18,25 | | Continue (a previously stopped process) | POSIX.1 |
SIGSTOP | 17,19,23 | DEF | Stop process | POSIX.1 |
SIGTSTP | 18,20,24 | D | Stop key pressed on the controlling terminal (tty) | POSIX.1 |
SIGTTIN | 21,21,26 | D | Background process tries to read from the controlling terminal | POSIX.1 |
SIGTTOU | 22,22,27 | D | Background process tries to write to the controlling terminal | POSIX.1 |
SIGBUS | 10,7,10 | C | Bus error (bad memory access) | SUSv2 |
SIGPOLL | | A | Pollable event (Sys V), synonym of SIGIO | SUSv2 |
SIGPROF | 27,27,29 | A | Profiling timer expired | SUSv2 |
SIGSYS | 12,-,12 | C | Invalid system call (SVID) | SUSv2 |
SIGTRAP | 5 | C | Trace/breakpoint trap | SUSv2 |
SIGURG | 16,23,21 | B | Urgent condition on socket (4.2BSD) | SUSv2 |
SIGVTALRM | 26,26,28 | A | Virtual alarm clock (4.2BSD) | SUSv2 |
SIGXCPU | 24,24,30 | C | CPU time limit exceeded (4.2BSD) | SUSv2 |
SIGXFSZ | 25,25,31 | C | File size limit exceeded (4.2BSD) | SUSv2 |
SIGIOT | 6 | C | IOT trap, synonym of SIGABRT | |
SIGEMT | 7,-,7 | | | |
SIGSTKFLT | -,16,- | A | Stack fault on coprocessor | |
SIGIO | 23,29,22 | A | I/O now possible (4.2BSD) | |
SIGCLD | -,-,18 | A | Synonym of SIGCHLD | |
SIGPWR | 29,30,19 | A | Power failure (System V) | |
SIGINFO | 29,-,- | A | Synonym of SIGPWR | |
SIGLOST | -,-,- | A | File lock lost | |
SIGWINCH | 28,28,20 | B | Window size changed (4.3BSD, Sun) | |
SIGUNUSED | -,31,- | A | Unused signal (will be SIGSYS) | |

Where three values are listed, the number depends on the architecture; on x86 it is the middle one.
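Since some of these numbers differ between architectures, it can be handy to ask the machine you are actually on. The following is a small sketch using Python's standard `signal` module; the helper name `signal_table` is made up for this example, and the output depends on the platform it runs on.

```python
import signal


def signal_table():
    """Map each signal number known to this platform to all of its names."""
    table = {}
    # __members__ also includes aliases such as SIGIOT for SIGABRT.
    for name, member in signal.Signals.__members__.items():
        table.setdefault(int(member), []).append(name)
    return {number: sorted(names) for number, names in sorted(table.items())}


if __name__ == "__main__":
    for number, names in signal_table().items():
        print(number, ", ".join(names))
```

On the x86_64 Linux host in this session, number 6 should come back as SIGABRT, in line with the "signal 6" that GDB reported.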
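So what do the letters in the Action column mean?
A: the default action is to terminate the process
B: the default action is to ignore the signal
C: the default action is to terminate the process and dump core
D: the default action is to stop the process
E: the signal cannot be caught
F: the signal cannot be ignored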
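Two of these rules are easy to check from a script. The sketch below is my own illustration, not part of the original debugging session: it assumes a Unix-like host, and `run_child` is a hypothetical helper. The first child calls abort, so the parent sees it die from signal 6; the second child tries to ignore SIGKILL, which the kernel refuses (rules E and F), so it exits with an ordinary error instead.

```python
import subprocess
import sys


def run_child(code):
    """Run a Python snippet in a child process and return its exit status.

    subprocess reports death by signal N as the negative value -N.
    """
    proc = subprocess.run([sys.executable, "-c", code], stderr=subprocess.DEVNULL)
    return proc.returncode


# Child 1: abort(), with the core size limit set to 0 so no core file is left behind.
ABORT_CHILD = (
    "import os, resource;"
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0));"
    "os.abort()"
)

# Child 2: try to ignore SIGKILL; this raises OSError inside the child.
IGNORE_KILL_CHILD = "import signal; signal.signal(signal.SIGKILL, signal.SIG_IGN)"

if __name__ == "__main__":
    print("abort child status:", run_child(ABORT_CHILD))
    print("ignore-SIGKILL child status:", run_child(IGNORE_KILL_CHILD))
```

The first status being the negative of SIGABRT is the scripted counterpart of the "Program terminated with signal 6" line in the GDB output.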
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-19.2.el7.x86_64 elfutils-libelf-0.163-3.el7.x86_64 glibc-2.17-106.el7_2.4.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-10.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64 libcurl-7.29.0-25.el7.centos.x86_64 libgcc-4.8.5-4.el7.x86_64 libidn-1.28-4.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 libssh2-1.4.3-10.el7.x86_64 libstdc++-4.8.5-4.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 nspr-4.10.8-2.el7_1.x86_64 nss-3.19.1-18.el7.x86_64 nss-softokn-freebl-3.16.2.3-13.el7_1.x86_64 nss-util-3.19.1-4.el7_1.x86_64 openldap-2.4.40-8.el7.x86_64 openssl-libs-1.0.1e-42.el7.9.x86_64 pcre-8.32-15.el7.x86_64 readline-6.2-9.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64
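The message above lists the debuginfo packages GDB could not find for the shared libraries this program uses; without them, frames inside those libraries carry less symbol detail. On a different machine you might not be able to read the contents of the core at all (that is my guess; I was not idle enough to actually try it on another machine).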
Look at the call stack.
(gdb) bt
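The command above tells you where the core was dumped; the call chain reads from bottom to top. Most of the frames you see at the problem point are low-level calls, and those are only symptoms. What matters most is how the upper layers passed their variables down.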
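If you would rather capture the backtrace in a file than type into an interactive session, GDB can also run commands non-interactively with its `-batch` and `-ex` options. Below is a small sketch that only builds such a command line; the helper name `gdb_batch_command` is made up for this example, and PROGRAM and core.31018 are the names from the session above.

```python
import shlex


def gdb_batch_command(program, core, commands=("bt",)):
    """Build an argv that makes gdb run the given commands on a core and exit."""
    argv = ["gdb", "-batch"]
    for command in commands:
        argv += ["-ex", command]  # one -ex per GDB command, executed in order
    argv += [program, core]
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so this works without gdb installed.
    print(shlex.join(gdb_batch_command("PROGRAM", "core.31018")))
```

Passing the resulting list to `subprocess.run(..., capture_output=True, text=True)` would give you the backtrace as a string, assuming gdb is installed on the host.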
With nothing better to do, look at where every thread currently is.
(gdb) info threads
Id Target Id Frame
30 Thread 0x7fa1f5365700 (LWP 31028) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
29 Thread 0x7fa1f4b64700 (LWP 31029) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
28 Thread 0x7fa1f8b6c700 (LWP 31021) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
27 Thread 0x7fa1f7b6a700 (LWP 31023) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
26 Thread 0x7fa1f7369700 (LWP 31024) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
25 Thread 0x7fa1f6b68700 (LWP 31025) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
24 Thread 0x7fa1f9b6e700 (LWP 31019) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
23 Thread 0x7fa1edb56700 (LWP 31043) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
22 Thread 0x7fa1ecb54700 (LWP 31045) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
21 Thread 0x7fa1ec353700 (LWP 31046) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
20 Thread 0x7fa1efb5a700 (LWP 31039) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
19 Thread 0x7fa1ef359700 (LWP 31040) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
18 Thread 0x7fa1f4363700 (LWP 31030) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
17 Thread 0x7fa1f3b62700 (LWP 31031) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
16 Thread 0x7fa1f6367700 (LWP 31026) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
15 Thread 0x7fa1f936d700 (LWP 31020) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
14 Thread 0x7fa1f0b5c700 (LWP 31037) 0x00007fa1feff09b3 in select () from /lib64/libc.so.6
13 Thread 0x7fa1f1b5e700 (LWP 31035) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
12 Thread 0x7fa1f235f700 (LWP 31034) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
11 Thread 0x7fa1f2b60700 (LWP 31033) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
10 Thread 0x7fa1f3361700 (LWP 31032) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
9 Thread 0x7fa1ee357700 (LWP 31042) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
8 Thread 0x7fa1ebb52700 (LWP 31047) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
7 Thread 0x7fa1ed355700 (LWP 31044) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
6 Thread 0x7fa1eeb58700 (LWP 31041) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 0x7fa1f035b700 (LWP 31038) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4 Thread 0x7fa1f135d700 (LWP 31036) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
3 Thread 0x7fa1f836b700 (LWP 31022) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2 Thread 0x7fa1f5b66700 (LWP 31027) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x7fa2009b0740 (LWP 31018) 0x00007fa1fef385f7 in raise () from /lib64/libc.so.6
(gdb)
Most of them are sitting in pthread_cond_wait / pthread_cond_timedwait and the like, so nothing wrong there.
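With 30 threads it is easy to miss the odd one out when reading by eye. As a rough sketch, the listing can be tallied by the function in the top frame. The regular expression below is my own guess at the line layout shown above, and the sample holds only four of the thirty lines, copied from the output.

```python
import re
from collections import Counter

SAMPLE = """\
  26 Thread 0x7fa1f7369700 (LWP 31024) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  14 Thread 0x7fa1f0b5c700 (LWP 31037) 0x00007fa1feff09b3 in select () from /lib64/libc.so.6
  2 Thread 0x7fa1f5b66700 (LWP 31027) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x7fa2009b0740 (LWP 31018) 0x00007fa1fef385f7 in raise () from /lib64/libc.so.6
"""

# "(LWP n) 0xADDR in FUNCTION (" as printed by `info threads`.
FRAME = re.compile(r"\(LWP \d+\)\s+0x[0-9a-f]+ in (\S+) \(")


def count_top_frames(text):
    """Count threads by the function name in their current frame."""
    counts = Counter()
    for line in text.splitlines():
        match = FRAME.search(line)
        if match:
            # Drop the symbol version suffix, e.g. "@@GLIBC_2.3.2".
            counts[match.group(1).split("@@")[0]] += 1
    return counts


if __name__ == "__main__":
    for function, count in count_top_frames(SAMPLE).most_common():
        print(count, function)
```

Fed the full listing, the interesting entries are the ones with a count of one: here that would be the thread in select and, above all, the thread in raise, which is the one that aborted.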
Try printing a variable:
(gdb) p req
No symbol "req" in current context.
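Why is there no such symbol? Because the current frame (raise() in libc) has no variable named req. Switch to another frame.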
(gdb) frame 29
(gdb) p req
$1 = (SVCINFO *) 0x7ffeb1c9e340
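Now the variable's type and value are visible. Some will ask: this is just an address, how do I read it? The pointer can be dereferenced with p *req, and with the source code at hand you can see everything else as well. The source just has not been loaded here. GDB searches the current directory by default but found nothing. The source location is recorded at compile time, but the source is not on this host, so it cannot be shown.
If you feel like playing with this, write a small program yourself, keep the source next to it, and see what it looks like.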