Reverse Engineering


About Reverse Engineering

  • This project has not yet been fulfilled.
  • This project fits in the following Bachelor programs: (none listed)
  • This project fits in the following master areas: Technical Artificial Intelligence, AI and Communication, Internet and Web Technology, High Performance Distributed Computing, Computer Science and Communication, Parallel and Distributed Computer Systems


Description

We are currently initiating a large project on reverse engineering of binary programs. Most of the commercial software industry assumes that compilation (the translation of source code to binary code) is irreversible in practice for real applications. The research question for our project is whether this irreversibility assumption is reasonable. For this larger project, we are looking for students with a background in low-level systems, but we also have projects for students with knowledge of machine learning.

At first sight, the task of reverse engineering complex software is hopeless (which is why companies like Microsoft have been able to build on the irreversibility assumption). Binary code lacks most of the features that enable programmers to understand well-structured programs in a higher-level language. Let us consider some of the concepts that are lacking at the binary level:

  • data structures (e.g., knowledge that a certain structure consists of 2 integers, 4 bytes and a pointer);
  • higher-level code (e.g., that a sequence of instructions corresponds to a tree walk algorithm);
  • semantics (e.g., meaningful symbol names or knowledge of the purpose of functions and variables).
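To make the first point concrete, the following minimal sketch contrasts the source-level view of a structure with what remains at the binary level. The structure `Record` and its field names are hypothetical, chosen only to match the "2 integers, 4 bytes and a pointer" example above; the exact offsets and padding depend on the platform's ABI.

```python
import ctypes

class Record(ctypes.Structure):
    # Hypothetical structure: 2 integers, 4 bytes and a pointer.
    _fields_ = [
        ("count", ctypes.c_int),
        ("flags", ctypes.c_int),
        ("tag", ctypes.c_ubyte * 4),
        ("next", ctypes.c_void_p),
    ]

rec = Record(count=3, flags=1, tag=(ctypes.c_ubyte * 4)(1, 2, 3, 4), next=None)

# Source-level view: every field has a name and a type.
offsets = {name: getattr(Record, name).offset for name, _ in Record._fields_}

# Binary-level view: a flat run of bytes with no names, types or boundaries.
raw = bytes(rec)

print(offsets)
print(raw.hex())
```

A compiled program only ever refers to such an object through a base address plus numeric offsets, so the field names, the field boundaries and the fact that the last field is a pointer are exactly what has to be recovered.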

Our methodology revolves around iteratively recovering data structures, code and semantic information. Specifically, we will recover data structures not (just) by statically inspecting the instructions in the binary program, but mainly by observing how the data is used at run time and by employing methods such as metadata tagging and machine learning. By uncovering the data structures and as much of the semantics as possible, we expect to succeed in reverse engineering complex software where other systems to date have failed. Our first step is to recover the data structures.
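The idea of recovering a data structure from the way its data is used can be illustrated with a toy sketch. It is not the project's actual method: the trace format, the sample values in `TRACE` and the function `infer_layout` are all assumptions made for illustration, and the sketch further assumes that all traced objects share one type.

```python
from collections import defaultdict

# Hypothetical trace records: (base_address, offset, access_size_in_bytes).
# In practice such records would come from instrumenting the running binary.
TRACE = [
    (0x1000, 0, 4),
    (0x1000, 4, 4),
    (0x1000, 8, 1),
    (0x1000, 9, 1),
    (0x1000, 10, 1),
    (0x1000, 11, 1),
    (0x1000, 16, 8),
    (0x2000, 0, 4),
    (0x2000, 16, 8),
]

def infer_layout(trace):
    """Merge accesses from all objects into one candidate field layout."""
    sizes_at_offset = defaultdict(set)
    for _base, offset, size in trace:
        sizes_at_offset[offset].add(size)
    # Guess each field's size as the widest access observed at its offset.
    return [(offset, max(sizes)) for offset, sizes in sorted(sizes_at_offset.items())]

LAYOUT = infer_layout(TRACE)
for offset, size in LAYOUT:
    print(f"field at offset {offset}: {size} byte(s)")
```

Even this crude heuristic separates two 4-byte fields, four single bytes and one 8-byte field. Deciding that the 8-byte field is a pointer rather than a large integer would need further evidence, for example observing that its value is later dereferenced.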

Specific master projects will depend on the student's expertise and the match with the program.