Linux/amd64的调用规则
为了方便调试,笔者在PC机上直接调试简单的内存相关的应用;这需要了解x86_64 的ABI,该文档对函数调用制定了一些限定规则,其中重要的有两点,第一点是参数的传参(非浮点参数):
User-level applications use as integer registers
for passing the sequence: %rdi, %rsi, %rdx, %rcx, %r8 and %r9.
The kernel interface uses %rdi, %rsi, %rdx, %r10, %r8 and %r9.
在函数入口处加入gdb 断点,可以访问这些寄存器得到函数的入参。第二点是关于栈指针的对齐要求,在本文后面编写的汇编代码中要注意:
The end of the input argument area shall be aligned on a 16 (32, if __m256 is
passed on stack) byte boundary. In other words, the value (%rsp + 8) is always
a multiple of 16 (32) when control is transferred to the function entry point. The
stack pointer, %rsp, always points to the end of the latest allocated stack frame.
使用GDB获取所有malloc 调用的信息
对于glibc中的ptmalloc 模块,其提供的malloc /calloc /realloc /free 等常用的内存分配释放的函数,实际上以__libc_ 为前缀的函数的别名:
strong_alias (__libc_calloc, __calloc) weak_alias (__libc_calloc, calloc)
strong_alias (__libc_free, __free) strong_alias (__libc_free, free)
strong_alias (__libc_malloc, __malloc) strong_alias (__libc_malloc, malloc)
strong_alias (__libc_memalign, __memalign)
weak_alias (__libc_memalign, memalign)
strong_alias (__libc_realloc, __realloc) strong_alias (__libc_realloc, realloc)
strong_alias (__libc_valloc, __valloc) weak_alias (__libc_valloc, valloc)
注意,动态链接器ld.so 也提供了malloc 等函数,但导出的是弱符号;这些内存分配函数在动态链接器加载libc.so 之前使用(或者在链接器内部使用)。因此笔者常常通过带有__libc_ 前缀的函数名查找这些内存分配的函数,以确定是ptmalloc 模块提供的函数:
$ nm -D --defined-only /usr/lib/x86_64-linux-gnu/ld-2.31.so | grep \
-e malloc -e calloc -e realloc -e free
000000000001d5b0 W calloc
0000000000019250 T _dl_exception_free
000000000001d5f0 W free
000000000001d490 W malloc
000000000001d7e0 W realloc
GDB提供了commands,在触发断点时可以自动执行GDB命令。当该命令列表中包含continue 命令时,GDB会继续将调用进程恢复运行,不需要人工干预。笔者获取所有malloc 调用的操作如下:
$ gdb -q ./multi-thread-memory
Reading symbols from ./multi-thread-memory...
(gdb) break main
Breakpoint 1 at 0x17e7: file multi-thread-memory.c, line 283.
(gdb) run
Starting program: /home/yejq/program/blogs/20210912/multi-thread-memory
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, main (argc=1, argv=0x7fffffffde98) at multi-thread-memory.c:283
283 {
(gdb) info address __libc_malloc
Symbol "__libc_malloc" is at 0x7ffff7e35260 in a file compiled without debugging.
(gdb) break *0x7ffff7e35260
Breakpoint 2 at 0x7ffff7e35260: file malloc.c, line 3023.
(gdb) commands 2
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>info register rdi rsp
>x/1xg $rsp
>bt 4
>continue
>end
(gdb) set pagination off
(gdb) c
Continuing.
注意,上面禁用了pagination ;这是因为GDB会有大量的调试信息自动输出。笔者先在main 函数入口加断点,而未直接在__libc_malloc 函数处加断点,是因为此时libc.so 动态库未加载;当应用运行到main 函数入口时,libc.so 动态库就已加载了。此外,笔者没用使用break __libc_malloc 命令加断点,是因为该命令可能不会在__libc_malloc 函数第一条机器码加断点。下面是调试的结果:
Breakpoint 2, __GI___libc_malloc (bytes=1024) at malloc.c:3023
3023 malloc.c: No such file or directory.
rdi 0x400 1024
rsp 0x7fffffffd458 0x7fffffffd458
0x7fffffffd458: 0x00007ffff7e1ce84
#0 __GI___libc_malloc (bytes=1024) at malloc.c:3023
#1 0x00007ffff7e1ce84 in __GI__IO_file_doallocate (fp=0x7ffff7f846a0 <_IO_2_1_stdout_>) at filedoalloc.c:101
#2 0x00007ffff7e2d050 in __GI__IO_doallocbuf (fp=fp@entry=0x7ffff7f846a0 <_IO_2_1_stdout_>) at libioP.h:948
#3 0x00007ffff7e2c0b0 in _IO_new_file_overflow (f=0x7ffff7f846a0 <_IO_2_1_stdout_>, ch=-1) at fileops.c:745
thread[0] => allocMax: 2 MB, mbMax: 1 KB
thread[1] => allocMax: 4 MB, mbMax: 2 KB
thread[2] => allocMax: 8 MB, mbMax: 8 KB
thread[3] => allocMax: 16 MB, mbMax: 16 KB
thread[4] => allocMax: 32 MB, mbMax: 64 KB
thread[5] => allocMax: 64 MB, mbMax: 128 KB
thread[6] => allocMax: 128 MB, mbMax: 512 KB
thread[7] => allocMax: 256 MB, mbMax: 1024 KB
[New Thread 0x7ffff7d94700 (LWP 3022)]
[New Thread 0x7ffff7593700 (LWP 3023)]
[New Thread 0x7ffff6d92700 (LWP 3024)]
[Switching to Thread 0x7ffff7d94700 (LWP 3022)]
Thread 2 "multi-thread-me" hit Breakpoint 2, __GI___libc_malloc (bytes=352) at malloc.c:3023
3023 in malloc.c
rdi 0x160 352
rsp 0x7ffff7d93e68 0x7ffff7d93e68
0x7ffff7d93e68: 0x00005555555554e9
#0 __GI___libc_malloc (bytes=352) at malloc.c:3023
#1 0x00005555555554e9 in memblock_create (memlen=320, ranfd=3) at multi-thread-memory.c:72
#2 thread_func (tharg=0x7fffffffdc20) at multi-thread-memory.c:235
#3 0x00007ffff7f93609 in start_thread (arg=<optimized out>) at pthread_create.c:477
[New Thread 0x7ffff6591700 (LWP 3025)]
[New Thread 0x7ffff5d90700 (LWP 3026)]
[Switching to Thread 0x7ffff7593700 (LWP 3023)]
Thread 3 "multi-thread-me" hit Breakpoint 2, __GI___libc_malloc (bytes=2032) at malloc.c:3023
3023 in malloc.c
rdi 0x7f0 2032
rsp 0x7ffff7592e68 0x7ffff7592e68
0x7ffff7592e68: 0x00005555555554e9
#0 __GI___libc_malloc (bytes=2032) at malloc.c:3023
#1 0x00005555555554e9 in memblock_create (memlen=2000, ranfd=5) at multi-thread-memory.c:72
#2 thread_func (tharg=0x7fffffffdc48) at multi-thread-memory.c:235
#3 0x00007ffff7f93609 in start_thread (arg=<optimized out>) at pthread_create.c:477
[New Thread 0x7ffff558f700 (LWP 3027)]
[Switching to Thread 0x7ffff6591700 (LWP 3025)]
Thread 5 "multi-thread-me" hit Breakpoint 2, __GI___libc_malloc (bytes=3624) at malloc.c:3023
3023 in malloc.c
rdi 0xe28 3624
rsp 0x7ffff6590e68 0x7ffff6590e68
0x7ffff6590e68: 0x00005555555554e9
#0 __GI___libc_malloc (bytes=3624) at malloc.c:3023
#1 0x00005555555554e9 in memblock_create (memlen=3592, ranfd=6) at multi-thread-memory.c:72
#2 thread_func (tharg=0x7fffffffdc98) at multi-thread-memory.c:235
#3 0x00007ffff7f93609 in start_thread (arg=<optimized out>) at pthread_create.c:477
[New Thread 0x7ffff4d8e700 (LWP 3028)]
[Switching to Thread 0x7ffff6d92700 (LWP 3024)]
Thread 4 "multi-thread-me" hit Breakpoint 2, __GI___libc_malloc (bytes=5200) at malloc.c:3023
3023 in malloc.c
rdi 0x1450 5200
rsp 0x7ffff6d91e68 0x7ffff6d91e68
0x7ffff6d91e68: 0x00005555555554e9
#0 __GI___libc_malloc (bytes=5200) at malloc.c:3023
#1 0x00005555555554e9 in memblock_create (memlen=5168, ranfd=4) at multi-thread-memory.c:72
#2 thread_func (tharg=0x7fffffffdc70) at multi-thread-memory.c:235
#3 0x00007ffff7f93609 in start_thread (arg=<optimized out>) at pthread_create.c:477
[Switching to Thread 0x7ffff5d90700 (LWP 3026)]
Thread 6 "multi-thread-me" hit Breakpoint 2, __GI___libc_malloc (bytes=51216) at malloc.c:3023
3023 in malloc.c
rdi 0xc810 51216
rsp 0x7ffff5d8fe68 0x7ffff5d8fe68
0x7ffff5d8fe68: 0x00005555555554e9
#0 __GI___libc_malloc (bytes=51216) at malloc.c:3023
#1 0x00005555555554e9 in memblock_create (memlen=51184, ranfd=7) at multi-thread-memory.c:72
#2 thread_func (tharg=0x7fffffffdcc0) at multi-thread-memory.c:235
#3 0x00007ffff7f93609 in start_thread (arg=<optimized out>) at pthread_create.c:477
[New Thread 0x7fffeffff700 (LWP 3029)]
All memory threads created and running...
[Switching to Thread 0x7ffff558f700 (LWP 3027)]
Thread 7 "multi-thread-me" hit Breakpoint 2, __GI___libc_malloc (bytes=56816) at malloc.c:3023
3023 in malloc.c
rdi 0xddf0 56816
rsp 0x7ffff558ee68 0x7ffff558ee68
0x7ffff558ee68: 0x00005555555554e9
#0 __GI___libc_malloc (bytes=56816) at malloc.c:3023
#1 0x00005555555554e9 in memblock_create (memlen=56784, ranfd=8) at multi-thread-memory.c:72
#2 thread_func (tharg=0x7fffffffdce8) at multi-thread-memory.c:235
#3 0x00007ffff7f93609 in start_thread (arg=<optimized out>) at pthread_create.c:477
[Switching to Thread 0x7ffff4d8e700 (LWP 3028)]
Thread 8 "multi-thread-me" hit Breakpoint 2, __GI___libc_malloc (bytes=15912) at malloc.c:3023
3023 in malloc.c
rdi 0x3e28 15912
rsp 0x7ffff4d8de68 0x7ffff4d8de68
0x7ffff4d8de68: 0x00005555555554e9
#0 __GI___libc_malloc (bytes=15912) at malloc.c:3023
#1 0x00005555555554e9 in memblock_create (memlen=15880, ranfd=9) at multi-thread-memory.c:72
#2 thread_func (tharg=0x7fffffffdd10) at multi-thread-memory.c:235
#3 0x00007ffff7f93609 in start_thread (arg=<optimized out>) at pthread_create.c:477
[Switching to Thread 0x7fffeffff700 (LWP 3029)]
Thread 9 "multi-thread-me" hit Breakpoint 2, __GI___libc_malloc (bytes=662080) at malloc.c:3023
3023 in malloc.c
rdi 0xa1a40 662080
rsp 0x7fffefffee68 0x7fffefffee68
0x7fffefffee68: 0x00005555555554e9
#0 __GI___libc_malloc (bytes=662080) at malloc.c:3023
#1 0x00005555555554e9 in memblock_create (memlen=662048, ranfd=10) at multi-thread-memory.c:72
#2 thread_func (tharg=0x7fffffffdd38) at multi-thread-memory.c:235
#3 0x00007ffff7f93609 in start_thread (arg=<optimized out>) at pthread_create.c:477
[Switching to Thread 0x7ffff7d94700 (LWP 3022)]
Thread 2 "multi-thread-me" hit Breakpoint 2, __GI___libc_malloc (bytes=496) at malloc.c:3023
3023 in malloc.c
rdi 0x1f0 496
rsp 0x7ffff7d93e68 0x7ffff7d93e68
0x7ffff7d93e68: 0x00005555555554e9
#0 __GI___libc_malloc (bytes=496) at malloc.c:3023
#1 0x00005555555554e9 in memblock_create (memlen=464, ranfd=3) at multi-thread-memory.c:72
#2 thread_func (tharg=0x7fffffffdc20) at multi-thread-memory.c:235
#3 0x00007ffff7f93609 in start_thread (arg=<optimized out>) at pthread_create.c:477
[Switching to Thread 0x7ffff7593700 (LWP 3023)]
遗憾的是,该调试过程不会得到malloc 函数返回的内存指针。函数入口是固定的,但函数的返回之处可能有多个地址;若要得到malloc 的返回值,需要在多个地方加断点,断点的位置也不易确定。这个遗憾在本文后面的调试中仍将持续。一种可行的方案是在返回地址处加入临时断点tbreak ,并查看rax 寄存器的值;不过这种调试方式是不推荐的,不仅会严重影响应用的运行效率,而且不确定其可行性(需要大量地、动态地插入断点)。
为malloc 函数动态添加钩子函数
现有的一些调试工具(如DTrace等)可以实现malloc 等内存分配函数的返回值的跟踪、记录,笔者未曾实践过,本文暂不讨论。上面的GDB调试存在一个缺陷,它会(严重地)降低被调用应用的运行速度。对于一些大型的嵌入应用,应用的运行效率过于低下会导致运行异常。每一个断点的触发,会导致应用暂停,GDB调试器通过ptrace 系统调用读取相关信息,之后修改应用的地址空间(把断点机器指令替换为原来的指令),最后恢复应用的运行。这一系列操作虽是自动化的,但效率极低。
在笔者以往的文章中,使用LD_PRELOAD 环境预加载了钩子函数,替换了malloc /calloc 等函数。其优点是可以获得到内存分配的返回指针;但其要求钩住了函数符号是可见的——如何不使用LD_PRELOAD 预加载动态库的方法,钩住应用使用的一些(内部)函数?
一种可行的方案是在应用运行过程中,直接修改malloc 等函数入口的汇编指令,添加钩子函数。这些钩子函数因添加在函数入口,因此不能获取内存分配的返回指针。一般情况下,钩子函数会完全替代被钩住的函数;但该情况下,钩子函数在执行之后,仍需要跳转回原处继续执行;这给钩子的实现带来很大的难度。首先,笔者编写的钩子注入函数全部代码如下:
#ifndef MALLOC_INJECTION_H
#define MALLOC_INJECTION_H 1
#ifdef __cplusplus
extern "C" {
#endif
enum inj_type {
inj_func_malloc,
inj_func_calloc,
inj_func_realloc,
inj_func_free,
inj_func_end,
};
int malloc_inject(enum inj_type type);
int malloc_deject(enum inj_type type);
#endif
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <dlfcn.h>
#include "injection.h"
#define INJ_CODE_MAX_LEN 256
#define INJ_PAGE_SIZE 4096
struct injection_code {
const unsigned char * origin_code;
const unsigned char * new_code;
size_t origin_len;
size_t new_len;
const char * func_name;
const unsigned char * jmp_func;
};
extern void phony_malloc(void);
extern void phony_mallocp(void);
extern void phony_calloc(void);
extern void phony_callocp(void);
extern void phony_realloc(void);
extern void phony_reallocp(void);
extern void phony_free(void);
extern void phony_freep(void);
extern unsigned long phony_callback(unsigned long,
unsigned long, unsigned long);
extern unsigned long phony_callback(unsigned long arg0,
unsigned long arg1, unsigned long arg2);
static const struct injection_code inj_codes[] = {
[inj_func_malloc] = {
.origin_code = (const unsigned char *)
"\xf3\x0f\x1e\xfa"
"\x48\x8b\x05\x85\xdc\x14\x00"
"\x41\x54"
"\x55"
"\x48\x89\xfd"
"\x53"
"\x48\x8b\x00",
.new_code = (const unsigned char *) phony_malloc,
.origin_len = 21,
.new_len = 12,
.func_name = "__libc_malloc",
.jmp_func = (const unsigned char *) phony_mallocp,
},
[inj_func_calloc] = {
.origin_code = (const unsigned char *)
"\xf3\x0f\x1e\xfa"
"\x41\x55"
"\x48\x89\xf8"
"\x41\x54"
"\x55"
"\x53",
.new_code = (const unsigned char *) phony_calloc,
.origin_len = 13,
.new_len = 12,
.func_name = "__libc_calloc",
.jmp_func = (const unsigned char *) phony_callocp,
},
[inj_func_realloc] = {
.origin_code = (const unsigned char *)
"\xf3\x0f\x1e\xfa"
"\x41\x57"
"\x41\x56"
"\x41\x55"
"\x41\x54",
.new_code = (const unsigned char *) phony_realloc,
.origin_len = 12,
.new_len = 12,
.func_name = "__libc_realloc",
.jmp_func = (const unsigned char *) phony_reallocp,
},
[inj_func_free] = {
.origin_code = (const unsigned char *)
"\xf3\x0f\x1e\xfa"
"\x48\x83\xec\x18"
"\x48\x8b\x05\x99\xd6\x14\x00"
"\x48\x8b\x00",
.new_code = (const unsigned char *) phony_free,
.origin_len = 18,
.new_len = 12,
.func_name = "__libc_free",
.jmp_func = (const unsigned char *) phony_freep,
},
};
int malloc_inject(enum inj_type type)
{
void * glibc;
size_t off_set;
int itype, rval;
unsigned long faddr;
unsigned char * funcaddr;
const struct injection_code * injcode;
unsigned char opcode[INJ_CODE_MAX_LEN];
itype = (int) type;
if (itype < (int) inj_func_malloc ||
itype > (int) inj_func_free)
return 1;
injcode = &inj_codes[itype];
if (injcode->origin_len < injcode->new_len ||
injcode->origin_len >= INJ_CODE_MAX_LEN)
return 2;
glibc = dlopen("libc.so.6", RTLD_LAZY | RTLD_GLOBAL | RTLD_NODELETE);
if (glibc == NULL)
return 3;
funcaddr = (unsigned char *) dlsym(glibc, injcode->func_name);
if (funcaddr == NULL)
return 4;
faddr = (unsigned long) funcaddr;
off_set = (size_t) (faddr & (INJ_PAGE_SIZE - 1));
if (off_set != 0) {
faddr &= ~(INJ_PAGE_SIZE - 1);
funcaddr = (unsigned char *) faddr;
}
rval = mprotect(funcaddr, INJ_PAGE_SIZE * 2, PROT_READ | PROT_WRITE | PROT_EXEC);
if (rval != 0)
return 5;
memset(opcode, 0x90, sizeof(opcode));
memcpy(opcode, injcode->new_code, injcode->new_len);
*((unsigned long *) &(opcode[0x2])) = (unsigned long) injcode->jmp_func;
memcpy(funcaddr + off_set, opcode, injcode->origin_len);
rval = mprotect(funcaddr, INJ_PAGE_SIZE * 2, PROT_READ | PROT_EXEC);
if (rval != 0)
return 6;
return 0;
}
int malloc_deject(enum inj_type type)
{
void * glibc;
size_t off_set;
int itype, rval;
unsigned long faddr;
unsigned char * funcaddr;
const struct injection_code * injcode;
itype = (int) type;
if (itype < (int) inj_func_malloc ||
itype > (int) inj_func_free)
return 1;
injcode = &inj_codes[itype];
if (injcode->origin_len < injcode->new_len ||
injcode->origin_len >= INJ_CODE_MAX_LEN)
return 2;
glibc = dlopen("libc.so.6", RTLD_LAZY | RTLD_GLOBAL | RTLD_NODELETE);
if (glibc == NULL)
return 3;
funcaddr = (unsigned char *) dlsym(glibc, injcode->func_name);
if (funcaddr == NULL)
return 4;
if (memcmp(funcaddr, injcode->origin_code, injcode->origin_len) == 0)
return 0;
faddr = (unsigned long) funcaddr;
off_set = (size_t) (faddr & (INJ_PAGE_SIZE - 1));
if (off_set != 0) {
faddr &= ~(INJ_PAGE_SIZE - 1);
funcaddr = (unsigned char *) faddr;
}
rval = mprotect(funcaddr, INJ_PAGE_SIZE * 2, PROT_READ | PROT_WRITE | PROT_EXEC);
if (rval != 0)
return 5;
memcpy(funcaddr + off_set, injcode->origin_code, injcode->origin_len);
rval = mprotect(funcaddr, INJ_PAGE_SIZE * 2, PROT_READ | PROT_EXEC);
if (rval != 0)
return 6;
return -1;
}
unsigned long phony_callback(unsigned long arg0,
unsigned long arg1, unsigned long retaddr)
{
fprintf(stderr, "In [%s], return address: %p, arg0: %lx, arg1: %lx\n",
__FUNCTION__, (void *) retaddr, arg0, arg1);
fflush(stderr);
return 0;
}
其中,phony_callback 是钩子函数都会调用;通过retaddr 参数可以确定是哪一个钩子函数调用的,判断的代码如下:
if (retaddr == ((unsigned long) __libc_malloc + 0xc)) {
....
} else if (retaddr == ((unsigned long) __libc_calloc + 0xc)) {
....
} else if (retaddr == ((unsigned long) __libc_realloc + 0xc)) {
....
} else if (retaddr == ((unsigned long) __libc_free + 0xc)) {
....
} else {
}
修改phony_callback 函数,可以增加栈指针的获取功能,回溯函数栈上保存的函数返回地址可以得到哪些地址处调用了malloc /calloc 等函数。上面代码的偏移量0xc 是钩子函数的大小,这些钩子函数分别为:
phony_malloc:
mov rax, 0x1234567890 ; phony_mallocp
call rax
phony_calloc:
mov rax, 0x1234567890 ; phony_callocp
call rax
phony_realloc:
mov rax, 0x1234567890 ; phony_reallocp
call rax
phony_free:
mov rax, 0x1234567890 ; phony_freep
call rax
这四个钩子在注入时会被修改,上面的代码中,jmp_func 指定了写入rax 寄存器的跳转地址:
*((unsigned long *) &(opcode[0x2])) = (unsigned long) injcode->jmp_func;
这样做是必须的,因为带有偏移量的call 汇编指令跳转范围是有限制的,必需写入运行时的地址,通过call rax 来实现间接的跳转。这样四个钩子函数的定义是相同的,两条汇编指令的机器码长度为0xc 。完整的汇编代码如下:
BITS 64
GLOBAL phony_malloc:function
GLOBAL phony_mallocp:function
GLOBAL phony_calloc:function
GLOBAL phony_callocp:function
GLOBAL phony_realloc:function
GLOBAL phony_reallocp:function
GLOBAL phony_free:function
GLOBAL phony_freep:function
EXTERN phony_callback
SECTION .text
phony_all:
push rbp
mov rbp, rsp
push rdi
push rsi
push rdx
mov rdx, rcx
call phony_callback wrt ..plt
pop rdx
pop rsi
pop rdi
mov rsp, rbp
pop rbp
ret
phony_mallocp:
endbr64
sub rsp, 0x8
mov rcx, [rsp + 0x8]
call phony_all
mov rcx, [rsp + 0x8]
add rsp, 0x10
push r12
push rbp
mov rbp, rdi
push rbx
xor rax, rax
jmp rcx
phony_callocp:
endbr64
sub rsp, 0x8
mov rcx, [rsp + 0x8]
call phony_all
mov rcx, [rsp + 0x8]
add rsp, 0x10
push r13
mov rax, rdi
push r12
push rbp
push rbx
jmp rcx
phony_reallocp:
endbr64
sub rsp, 0x8
mov rcx, [rsp + 0x8]
call phony_all
mov rcx, [rsp + 0x8]
add rsp, 0x10
push r15
push r14
push r13
push r12
jmp rcx
phony_freep:
endbr64
sub rsp, 0x8
mov rcx, [rsp + 0x8]
call phony_all
mov rcx, [rsp + 0x8]
add rsp, 0x10
sub rsp, 0x18
xor rax, rax
jmp rcx
phony_malloc:
mov rax, 0x1234567890 ; phony_mallocp
call rax
phony_calloc:
mov rax, 0x1234567890 ; phony_callocp
call rax
phony_realloc:
mov rax, 0x1234567890 ; phony_reallocp
call rax
phony_free:
mov rax, 0x1234567890 ; phony_freep
call rax
上面的汇编代码调用了定义于C代码中的函数phony_callback ;因phony_callback 函数被编译为动态库,因此汇编代码为:
call phony_callback wrt ..plt
值得一提的是,钩子函数因替换了malloc /calloc 等函数入口的指令,在返回call rax 之后继续执行前,需要补充被替换的机器指令,且不能用ret 指令返回(上面是通过jmp rcx 来返回的),因为补充的指令会操作函数栈,而ret 指令需要从栈上弹出返回地址并跳转。这些操作类似汇编的杂技,通常编写汇编代码不会这样写。笔者编写了简单的测试应用,可以测试钩子是否可用:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "injection.h"
int main(int argc, char *argv[])
{
int ret;
void * ptrs[3];
ret = setvbuf(stdout, NULL, _IONBF, 0);
if (ret == 0)
ret = setvbuf(stderr, NULL, _IONBF, 0);
fprintf(stdout, "Buffer all disabled: %d\n", ret);
fflush(stdout);
ptrs[0] = NULL;
ptrs[1] = NULL;
ptrs[2] = NULL;
ret = malloc_inject(inj_func_malloc);
fprintf(stdout, "Malloc hooked: %d\n", ret);
ret = malloc_inject(inj_func_calloc);
fprintf(stdout, "calloc hooked: %d\n", ret);
ret = malloc_inject(inj_func_realloc);
fprintf(stdout, "realloc hooked: %d\n", ret);
ret = malloc_inject(inj_func_free);
fprintf(stdout, "free hooked: %d\n", ret);
ptrs[0] = malloc(100);
fprintf(stdout, "malloc(...): %p\n", ptrs[0]);
ptrs[1] = calloc(1, 100);
fprintf(stdout, "calloc(...): %p\n", ptrs[1]);
ptrs[2] = realloc(NULL, 100);
fprintf(stdout, "realloc(...): %p\n", ptrs[2]);
free(ptrs[0]); ptrs[0] = NULL;
free(ptrs[1]); ptrs[1] = NULL;
free(ptrs[2]); ptrs[2] = NULL;
ret = malloc_deject(inj_func_malloc);
fprintf(stdout, "Malloc unhooked: %d\n", ret);
ret = malloc_deject(inj_func_calloc);
fprintf(stdout, "calloc unhooked: %d\n", ret);
ret = malloc_deject(inj_func_realloc);
fprintf(stdout, "realloc unhooked: %d\n", ret);
ret = malloc_deject(inj_func_free);
fprintf(stdout, "free unhooked: %d\n", ret);
return 0;
}
编译和运行结果如下:
$ make
gcc -Wall -D_GNU_SOURCE -I. -fPIC -O1 -ggdb -c -o main.o main.c
gcc -Wall -D_GNU_SOURCE -I. -fPIC -O1 -ggdb -c -o injection.o injection.c
nasm -f elf64 -g -o test.o test.S
gcc -ggdb -shared -o libinjection.so -Wl,-soname=libinjection.so injection.o test.o -ldl
gcc -ggdb -o testinj main.o -L. "-Wl,-rpath=\$ORIGIN" -linjection
$ ./testinj
Buffer all disabled: 0
Malloc hooked: 0
calloc hooked: 0
realloc hooked: 0
free hooked: 0
In [phony_callback], return address: 0x7eff3aed626c, arg0: 64, arg1: 7ffe8d1b1320
malloc(...): 0x55a2c4f80340
In [phony_callback], return address: 0x7eff3aed7c9c, arg0: 1, arg1: 64
calloc(...): 0x55a2c4f803b0
In [phony_callback], return address: 0x7eff3aed626c, arg0: 64, arg1: 7ffe8d1b1320
realloc(...): 0x55a2c4f80420
In [phony_callback], return address: 0x7eff3aed685c, arg0: 55a2c4f80340, arg1: 7ffe8d1b1320
In [phony_callback], return address: 0x7eff3aed685c, arg0: 55a2c4f803b0, arg1: 55a2c4f80340
In [phony_callback], return address: 0x7eff3aed685c, arg0: 55a2c4f80420, arg1: 55a2c4f803b0
Malloc unhooked: -1
calloc unhooked: -1
realloc unhooked: -1
free unhooked: -1
可以用GDB查看被钩住的函数的反汇编:
(gdb) disassemble /r __libc_malloc
Dump of assembler code for function __GI___libc_malloc:
0x00007ffff7e53260 <+0>: 48 b8 c7 55 fc f7 ff 7f 00 00 movabs $0x7ffff7fc55c7,%rax
0x00007ffff7e5326a <+10>: ff d0 callq *%rax
0x00007ffff7e5326c <+12>: 90 nop
0x00007ffff7e5326d <+13>: 90 nop
(gdb) disassemble /r __libc_calloc
Dump of assembler code for function __libc_calloc:
0x00007ffff7e54c90 <+0>: 48 b8 ee 55 fc f7 ff 7f 00 00 movabs $0x7ffff7fc55ee,%rax
0x00007ffff7e54c9a <+10>: ff d0 callq *%rax
0x00007ffff7e54c9c <+12>: 90 nop
0x00007ffff7e54c9d <+13>: 48 83 ec 08 sub $0x8,%rsp
(gdb) disassemble /r __libc_realloc
Dump of assembler code for function __GI___libc_realloc:
0x00007ffff7e54000 <+0>: 48 b8 14 56 fc f7 ff 7f 00 00 movabs $0x7ffff7fc5614,%rax
0x00007ffff7e5400a <+10>: ff d0 callq *%rax
0x00007ffff7e5400c <+12>: 49 89 f4 mov %rsi,%r12
0x00007ffff7e5400f <+15>: 55 push %rbp
(gdb) disassemble /r __libc_free
Dump of assembler code for function __GI___libc_free:
0x00007ffff7e53850 <+0>: 48 b8 39 56 fc f7 ff 7f 00 00 movabs $0x7ffff7fc5639,%rax
0x00007ffff7e5385a <+10>: ff d0 callq *%rax
0x00007ffff7e5385c <+12>: 90 nop
0x00007ffff7e5385d <+13>: 90 nop
总结
本文记录了笔者为malloc /calloc 等函数添加钩子进行内存分配信息的获取的过程。其缺点是不能获取到内存分配的返回指针;相比于GDB调试,其优点是不会影响被调试应用的运行效率。此外,还需要熟悉汇编并编写可用的钩子函数。这种调试方法是不推荐的,建议先尝试DTrace 等调试工具;走投无路时可以考虑该方法。
|