x86 to LLVM Bitcode Translation Framework: McSema
McSema lifts x86 and amd64 binaries to LLVM bitcode modules. McSema supports both Linux and Windows binaries and most x86 and amd64 instructions, including integer, FPU, and SSE operations.
McSema is separated into two conceptual parts: control flow recovery and instruction translation. Control flow recovery is performed using the mcsema-disass tool, which uses IDA Pro to disassemble a binary file and produces a control flow graph. Instruction translation is performed using the mcsema-lift tool, which converts the control flow graph into LLVM bitcode.
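To make the two-stage split concrete, the Python sketch below shows what the second stage does conceptually: walk a recovered control flow graph and emit LLVM IR for each basic block. It is a toy under stated assumptions, not McSema's implementation: the CFG layout (a dict mapping block labels to instructions and successors), the four-instruction subset, the `lift_cfg` function, and the choice to model each register as a stack slot are all hypothetical simplifications. The emitted text uses the typed-pointer syntax accepted by LLVM 3.8; newer LLVM releases use opaque `ptr` types instead.

```python
# Toy illustration of McSema's two stages. Stage 1 (control flow recovery)
# is represented by a hand-written CFG; stage 2 (instruction translation)
# is lift_cfg(). The CFG layout, instruction subset, and register model
# are hypothetical simplifications, not McSema's actual formats.

REGISTERS = ("eax", "ebx", "ecx", "edx")

# Hypothetical stage-1 output: block label -> (instructions, successor labels).
CFG = {
    "entry": (["mov eax, 5", "mov ecx, 7", "jmp body"], ["body"]),
    "body": (["add eax, ecx", "sub eax, 2", "ret"], []),
}


def lift_cfg(cfg, name="lifted"):
    """Translate a toy CFG into the text of a single LLVM IR function."""
    out = [f"define i32 @{name}() {{", "prologue:"]
    # Model each register as a stack slot, a crude stand-in for CPU state.
    for reg in REGISTERS:
        out.append(f"  %{reg} = alloca i32")
    out.append(f"  br label %{next(iter(cfg))}")  # jump to the first block
    counter = 0

    def fresh():
        """Return a new, unique SSA value name."""
        nonlocal counter
        counter += 1
        return f"%t{counter}"

    def read(operand):
        """Return an IR value for a register or an immediate operand."""
        if operand in REGISTERS:
            tmp = fresh()
            out.append(f"  {tmp} = load i32, i32* %{operand}")
            return tmp
        return str(int(operand, 0))

    for label, (insns, succs) in cfg.items():
        out.append(f"{label}:")
        for insn in insns:
            op, _, rest = insn.partition(" ")
            args = [a.strip() for a in rest.split(",")] if rest else []
            if op == "mov":
                out.append(f"  store i32 {read(args[1])}, i32* %{args[0]}")
            elif op in ("add", "sub"):
                lhs, rhs, res = read(args[0]), read(args[1]), fresh()
                out.append(f"  {res} = {op} i32 {lhs}, {rhs}")
                out.append(f"  store i32 {res}, i32* %{args[0]}")
            elif op == "jmp":
                # The branch target must be an edge recovered in stage 1.
                if args[0] not in succs:
                    raise ValueError(f"{label}: jmp target {args[0]} not in CFG")
                out.append(f"  br label %{args[0]}")
            elif op == "ret":
                # Assume the integer return value lives in eax.
                out.append(f"  ret i32 {read('eax')}")
            else:
                raise NotImplementedError(f"unsupported instruction: {insn}")
    out.append("}")
    return "\n".join(out)


if __name__ == "__main__":
    print(lift_cfg(CFG))
```

Running it prints a function whose `body` block computes 5 + 7 - 2 in `eax` and returns it. Keeping the CFG as a separate input to the lifter mirrors the division between mcsema-disass and mcsema-lift: the translation step never has to rediscover control flow, it only follows the edges it is given.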
- Translates 32- and 64-bit Linux ELF and Windows PE binaries to bitcode, including executables and shared libraries for each platform.
- Supports a large subset of x86 and x86-64 instructions, including most integer, FPU, and SSE operations. Use mcsema-lift --list-supported --arch x86 to see a complete list.
- Runs on both Windows and Linux, and can translate Linux binaries on Windows and Windows binaries on Linux.
- Output bitcode is compatible with the LLVM 3.8 toolchain.
- Translated bitcode can be analyzed or recompiled as a new, working executable with functionality identical to the original.
- McSema has been tested on Windows 7, Windows 10, Ubuntu 14.04, and Ubuntu 16.04.
Why would anyone translate binaries back to bitcode?
- Binary Patching And Modification. Lifting to LLVM IR lets you cleanly modify the target program. You can run obfuscation or hardening passes, add features, remove features, rewrite features, or even fix that pesky typo, grammatical error, or insane logic. When done, your new creation can be recompiled to a new binary sporting all those changes. In the Cyber Grand Challenge, we were able to use McSema to translate challenge binaries to bitcode, insert memory safety checks, and then re-emit working binaries.
- Symbolic Execution with KLEE. KLEE operates on LLVM bitcode, usually generated by providing source to the LLVM toolchain. McSema can lift a binary to LLVM bitcode, permitting KLEE to operate on previously unavailable targets.
- Re-use existing LLVM-based tools. KLEE is not the only tool that becomes available for use on bitcode. It is possible to run LLVM optimization passes and other LLVM-based tools like libFuzzer on lifted bitcode.
- Analyze the binary rather than the source. Source-level analysis is great but not always possible (e.g. you don’t have the source), and even when source is available, it does not reflect compiler transformations, re-ordering, and optimizations. Analyzing the actual binary guarantees that you’re analyzing the true executed behavior.
- Write one set of analysis tools. Lifting to LLVM IR means that one set of analysis tools can work on both the source and the binary. Maintaining a single set of tools saves development time and effort, and lets that effort go into making one set of tools better.
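The Cyber Grand Challenge workflow in the first bullet (lift, insert memory safety checks, re-emit) is at its core an instrumentation pass over lifted code. In practice that would be an LLVM pass run over McSema's bitcode; the Python sketch below only illustrates the idea on a hypothetical tuple-based IR. The `insert_memory_checks` pass, the `check_bounds` instruction, the `run` interpreter, and the flat 16-cell memory are all invented for illustration and do not describe how the checks were actually implemented in the Cyber Grand Challenge.

```python
from typing import Dict, List, Tuple

# Hypothetical toy IR standing in for lifted bitcode (not LLVM's API):
#   ("store", addr, value)   write value to memory[addr]
#   ("load", addr, reg)      copy memory[addr] into register reg
#   ("check_bounds", addr)   added by the instrumentation pass below
Instr = Tuple


def insert_memory_checks(block: List[Instr]) -> List[Instr]:
    """Instrumentation pass: add a bounds check before every load and store."""
    instrumented: List[Instr] = []
    for instr in block:
        if instr[0] in ("load", "store"):
            instrumented.append(("check_bounds", instr[1]))
        instrumented.append(instr)
    return instrumented


def run(block: List[Instr], memory_size: int = 16) -> Dict[str, int]:
    """Interpret the toy IR and return the final register values."""
    memory = [0] * memory_size
    regs: Dict[str, int] = {}
    for op, *args in block:
        if op == "check_bounds":
            if not 0 <= args[0] < memory_size:
                raise MemoryError(f"out-of-bounds access at address {args[0]}")
        elif op == "store":
            # Unchecked: Python's negative indexing lets address -1 silently
            # overwrite the last cell, mimicking an out-of-bounds write.
            memory[args[0]] = args[1]
        elif op == "load":
            regs[args[1]] = memory[args[0]]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


if __name__ == "__main__":
    buggy = [("store", -1, 42), ("load", -1, "r0")]
    print(run(buggy))  # unchecked: the bad write goes unnoticed
    try:
        run(insert_memory_checks(buggy))
    except MemoryError as err:
        print("caught:", err)
```

Running it prints `{'r0': 42}` for the unchecked program, then `caught: out-of-bounds access at address -1` once the pass has added checks. Keeping the pass separate from execution mirrors the bitcode workflow: instrumentation rewrites the program first, and recompilation or execution happens afterward.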
Source: CyberPunk @ May 1, 2017 at 11:31PM