DisCo: Combining Disassemblers for Improved Performance
RAID 2021
Source code: https://github.com/gsrishaila/DisCo-Combining-Disassemblers-for-Improved-Performance/tree/main/SourceCode
abstract
Malware infects thousands of systems globally each day causing millions of dollars in damages.
Which disassembler should a malware analyst choose in order to get the most accurate disassembly and be able to detect, analyze and defuse malware quickly?
There is no clear answer to this question: (a) the performance of disassemblers varies across configurations, and (b) most prior work on disassemblers focuses on benign software and the x86 CPU architecture.
In this work, we take a different approach and ask: why not use all the disassemblers instead of picking one?
We present DisCo, a novel and effective approach that harnesses the collective capability of a group of disassemblers by combining their outputs into an ensemble consensus.
We develop and evaluate our approach using 1760 IoT malware binaries compiled with different compilers and compiler options for the ARM and MIPS architectures.
First, we show that DisCo can combine the collective wisdom of disassemblers effectively.
For example, our approach outperforms the best contributing disassembler by as much as 17.8 in the F1 score for function start identification for MIPS binaries compiled using GCC with the O3 option.
Second, the collective wisdom of the disassemblers can be brought back to improve each disassembler.
As a proof of concept, we show that byte-level signatures identified by DisCo can improve the performance of Ghidra by as much as 13.6 in terms of the F1 score.
Third, we quantify the effect of the architecture, the compiler, and the compiler options on the performance of disassemblers.
Finally, the systematic evaluation within our approach led to the discovery of a bug in Ghidra v9.1, which was acknowledged by the Ghidra team.
introduction
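The paper is about combining multiple disassemblers to improve disassembly accuracy.
Binary disassembly is an essential tool for malware defense: when the WannaCry and Petya ransomware outbreaks swept the globe in 2017, malware analysts had to quickly understand their propagation mechanisms and modes of operation in order to contain them.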
Which disassembler should a malware analyst choose for a rapidly-spreading malware binary to get the most accurate results? This is the question that motivates our work.
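The authors focus on malware for the MIPS and ARM architectures. Disassembler performance varies with the type of binary, and a binary can be produced in many different ways, depending on:
- the compiler
- the compiler optimization flags
- the target CPU architecture
Each of these choices can lead to significant differences in the assembly code of the resulting binary.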
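The authors propose an effective approach for combining disassemblers:
- Evaluate each disassembler's effectiveness and create training data: compile malware source code under various configurations and compare each disassembler's output against the ground truth.
- Create and train a machine learning model that turns the individual outputs into one combined output: a neural network forms a stacked ensemble that takes as input (a) each disassembler's output and (b) data selected from the actual binary.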
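The authors consider two architectures (MIPS and ARM), two compilers (GCC and Clang), and five compiler optimization levels. They focus on function start identification, a key disassembly metric, measured as:
correctly identified function starts (CFS).
Instruction identification and function start identification are considered the two fundamental metrics for evaluating disassemblers, because their outputs feed into other metrics such as the control-flow graph and the call graph.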
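Function start identification is commonly scored with precision, recall and F1 over sets of addresses. A minimal sketch, assuming a reported start counts as correct only on an exact address match; the helper name and the addresses are hypothetical, not taken from the paper.

```python
def function_start_scores(predicted, ground_truth):
    """Precision, recall and F1 for function start identification (hypothetical helper).

    Both arguments are collections of function start addresses; a predicted
    address is a correctly identified function start only if it is in the ground truth.
    """
    predicted, ground_truth = set(predicted), set(ground_truth)
    correct = len(predicted & ground_truth)  # correctly identified function starts
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example with made-up addresses (not data from the paper)
truth = {0x400100, 0x400180, 0x400200, 0x400300}
found = {0x400100, 0x400180, 0x400250}
print(function_start_scores(found, truth))
```

Here two of the three reported starts are correct, so precision is 2/3, recall is 2/4 and F1 is 4/7.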
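1,760 IoT binaries are used for training and evaluation; they are compiled from 88 IoT programs under various configuration options.
The authors consider five baseline disassemblers.
- Optimization levels (5): O0, O1, O2, O3, Os
- Architectures (2): ARM 5, MIPS R3000
- Compilers (2): GCC 5.5.0, Clang 9.0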
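The dataset size is consistent with building every program under every combination of these settings: 2 × 2 × 5 = 20 configurations, and 88 × 20 = 1,760 binaries. A quick check, assuming every program was built in all 20 configurations (the notes imply but do not state this):

```python
from itertools import product

# Build settings as listed in the notes
ARCHITECTURES = ["ARM 5", "MIPS R3000"]
COMPILERS = ["GCC 5.5.0", "Clang 9.0"]
OPT_LEVELS = ["O0", "O1", "O2", "O3", "Os"]
NUM_PROGRAMS = 88  # IoT programs compiled from source

# Every (architecture, compiler, optimization level) combination
configs = list(product(ARCHITECTURES, COMPILERS, OPT_LEVELS))
print(len(configs))                 # build configurations per program
print(len(configs) * NUM_PROGRAMS)  # total number of binaries
```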
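Stripped vs. non-stripped binaries: a compiled binary can contain debugging information that is not needed to execute the program but is used for debugging and for locating problems or bugs. A stripped binary has had these debug symbols removed; it is smaller and harder to disassemble.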
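The figure shows a motivating example: IDA identifies 241 additional true functions and Ghidra identifies 352 additional true functions, so combining the two effectively can improve performance.
The intuition behind this: different disassemblers should have complementary capabilities, because they use different algorithms to recover the structure of a binary.
Disassemblers see different things, so combining the baseline disassemblers is beneficial if each one recovers different parts of the binary.
The outputs must be merged with care: simply taking their union does not necessarily give the best performance, and majority voting leads to low recall, because some function starts are identified by only a few disassemblers.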
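A toy illustration of why fixed merging rules fall short, using three hypothetical disassemblers and made-up addresses (not results from the paper): the union inherits every tool's false positives, while majority voting drops starts that only one tool finds.

```python
from collections import Counter


def union_vote(outputs):
    """An address is a function start if any disassembler reports it."""
    return set().union(*outputs)


def majority_vote(outputs):
    """An address is a function start if more than half of the disassemblers report it."""
    counts = Counter(addr for out in outputs for addr in set(out))
    return {addr for addr, c in counts.items() if c > len(outputs) / 2}


def precision_recall(predicted, truth):
    correct = len(predicted & truth)
    precision = correct / len(predicted) if predicted else 0.0
    return precision, correct / len(truth)


# Hypothetical outputs of three disassemblers (toy addresses)
truth = {0x10, 0x20, 0x30, 0x40}
outputs = [
    {0x10, 0x20, 0x99},  # tool A: one false positive
    {0x10, 0x20, 0x30},  # tool B: the only tool that finds 0x30
    {0x10, 0x40, 0x88},  # tool C: the only tool that finds 0x40, plus a false positive
]
print(precision_recall(union_vote(outputs), truth))     # full recall, lower precision
print(precision_recall(majority_vote(outputs), truth))  # full precision, lower recall
```

The stacked ensemble described above replaces both fixed rules with a model that weighs each vote together with the surrounding bytes.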
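Creating the ground truth: start from the source code and compile it with -g so that richer debugging information is attached to the generated binary, then use a DWARF library to extract the function start addresses, which serve as the ground truth.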
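The ground-truth step can be sketched as follows. The notes only say "a DWARF library", so this assumes pyelftools; the helper names `function_starts_from_dwarf` and `label_candidates` are hypothetical. In a -g build, each function with code has a `DW_TAG_subprogram` entry whose `DW_AT_low_pc` attribute holds its entry address.

```python
def function_starts_from_dwarf(path):
    """Collect function start addresses from the DWARF info of a binary built with -g.

    Assumes the pyelftools package (imported lazily, so the labeling helper
    below works without it) and DWARF 4 style debug info, where DW_AT_low_pc
    is a direct address. Functions described only by DW_AT_ranges are skipped.
    """
    from elftools.elf.elffile import ELFFile

    starts = set()
    with open(path, "rb") as f:
        elf = ELFFile(f)
        if not elf.has_dwarf_info():
            raise ValueError(f"{path} has no DWARF info; was it compiled with -g?")
        dwarf = elf.get_dwarf_info()
        for cu in dwarf.iter_CUs():
            for die in cu.iter_DIEs():
                # Declarations and abstract inline instances have no low_pc and are ignored
                if die.tag == "DW_TAG_subprogram" and "DW_AT_low_pc" in die.attributes:
                    starts.add(die.attributes["DW_AT_low_pc"].value)
    return starts


def label_candidates(candidates, ground_truth):
    """Label each candidate address 1 if it is a true function start, else 0."""
    return {addr: int(addr in ground_truth) for addr in candidates}
```

The labels would then be the training targets for the candidates reported by the disassemblers on the corresponding binary.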
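Input features:
- ARM: four instructions around the function start (before and after it), i.e., 16 bytes
- MIPS: two instructions after the function start, i.e., 8 bytes
In addition, each of the five disassemblers contributes one vote, taking up 5 positions.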
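A minimal sketch of building one input vector, assuming each byte is one-hot encoded into 256 positions (inferred from the `8*256 + 5 = 2053` comment in the model configuration, not stated in the notes). The helper `encode_candidate` and its parameters are hypothetical; the defaults give the 8-byte MIPS window, and since the exact before/after split of the ARM window is not given here, it is left to the caller.

```python
NUM_DISASSEMBLERS = 5  # one vote per baseline disassembler


def encode_candidate(code, offset, votes, before=0, after=8):
    """Build the input vector for one candidate function start (hypothetical helper).

    code:   raw bytes of the code section
    offset: candidate function start, as an offset into `code`
    votes:  one 0/1 value per disassembler (1 = that tool reports a start here)
    before/after: bytes taken before/after the candidate; the defaults give the
                  8-byte MIPS window (two 4-byte instructions after the start)
    """
    if len(votes) != NUM_DISASSEMBLERS:
        raise ValueError("expected one vote per disassembler")
    start = offset - before
    if start < 0:
        raise ValueError("window would start before the code section")
    size = before + after
    window = code[start:offset + after]
    window = window + bytes(size - len(window))  # zero-pad past the end of the section
    features = []
    for b in window:  # iterating over bytes yields ints in 0..255
        one_hot = [0] * 256
        one_hot[b] = 1  # assumed one-hot encoding of each byte
        features.extend(one_hot)
    features.extend(int(v) for v in votes)
    return features


vec = encode_candidate(bytes(range(16)), 4, [1, 0, 1, 0, 0])
print(len(vec))  # 8 * 256 + 5
```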
Model configuration:
```python
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Input vector: 8 bytes x 256 + 5 disassembler votes = 2053 features (8 bytes matches the MIPS window)
model = Sequential()
model.add(Input(shape=(2053,)))
model.add(Dense(2053, activation='relu'))
model.add(Dense(1000, activation='relu'))  # extra hidden layer
model.add(Dense(250, activation='relu'))   # extra hidden layer
model.add(Dense(1, activation='sigmoid'))  # probability that the candidate is a function start
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```
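For reference, a single sigmoid output trained with `binary_crossentropy` minimizes the standard binary cross-entropy over the N candidate addresses in a batch, where y_i ∈ {0, 1} is the ground-truth label of candidate i (from the DWARF step) and \hat{y}_i is the model's output for it:

```latex
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right]
```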
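Results:
Takeaways
The core contribution is small: a neural network combines five different disassemblers, trained and evaluated against a constructed ground truth, and it performs better than any single disassembler. That idea alone is clearly not enough, which is why it takes up only five pages; the authors then devote considerable space to the observations and findings that follow from this work, and that part is very important. One could even argue that without the follow-up analysis, this paper could not have been published at a CCF-B conference.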