EISSEC: Extracting Instruction Semantics Via Symbolic Execution of Code Generators
Introduction
Translating low-level machine instructions into a higher-level intermediate language (IL) is one of the central steps in many binary analysis and instrumentation systems. Existing systems build such translators manually, so supporting a new architecture takes a great deal of effort. Even for widely deployed architectures, the full instruction set may not be modeled: for example, mature systems such as Valgrind still lack support for AVX, FMA4 and SSE4.1 on x86 processors. To overcome these difficulties, we propose a novel approach that leverages the knowledge of instruction set semantics already embedded in modern compilers such as GCC. In particular, we present a learning-based approach for automating the translation of assembly instructions to a compiler's architecture-neutral IL. Our experimental evaluation demonstrates that the approach can easily support many architectures (x86, ARM and AVR), including their advanced instruction sets.
Download
A tarball of the entire system is available, along with a README file. The best way to try the package is via a Docker container, which comes with all required dependencies pre-installed, so users need not install any other package to test the software. Before using the EISSEC Docker image, install Docker by following the instructions at https://docs.docker.com/engine/installation. Once Docker is installed, the steps below pull the EISSEC Docker image, create a container from it, and open a terminal inside the container.
$ docker pull seclab/eissec
$ docker create -it --name eissec seclab/eissec
$ docker start eissec
$ docker exec -it eissec bash

Once inside the container, execute the following commands to build the binaries and test them on the x86 code generator.
# cd eissec
# source env_setup.sh
# make
# cd test/x86
# ./testmodel dummy.c
# ./fullmodel dummy.c
Acknowledgements
This work was supported in part by NSF grants CNS-1319137 and CNS-0831298, AFOSR grant FA9550-09-1-0539, and ONR grant N000140710928.