Reverse engineering (2)



About Reverse engineering (2)

  • This project has been fulfilled.
  • This project fits in the following Bachelor programs: (none listed)
  • This project fits in the following master areas: Software Engineering, High Performance Distributed Computing, Internet and Web Technology, Technical Artificial Intelligence, Parallel and Distributed Computer Systems


Description

Reverse Engineering (2)

      • You may want to read this first: Rosetta's way Back to the Source [1]

Reverse engineering of complex software is extremely difficult. State-of-the-art tools generally use advanced static analysis to determine which higher-level code a set of assembly instructions corresponds to. We take a different approach: we try to dig out the data structures first and do the code reversing later. After all, this is also how we are taught to write code in the first place: design the data structures first and the code will follow. Having the data structures should be tremendously helpful in doing full reverse engineering.

Herbert Bos recently received a very prestigious ERC research grant for this idea, and a group of people will be working on it for a few years [2]. The approach has already proven quite successful: our paper on reverse engineering data structures was accepted for NDSS, one of the top venues in security (and one of the authors is a master student!). Paper details:

- Howard: a dynamic excavator for reverse engineering data structures - NDSS'11 [3]

We have many ideas for projects in this area, probably too many to list them all. Some examples:

- Complement our dynamic analysis with static analysis. One idea here is to improve code coverage. But we also work on protecting legacy binaries, given the data structures that we uncover. The idea is as follows: assuming we know all the buffers (arrays) in a program, we could protect them from buffer overflows by instrumenting all accesses to the buffers and ensuring that they do not stray beyond the buffer boundaries.

- Make the analysis tool support multithreading. Our current system can handle only single-threaded applications, and extending it to multithreaded ones would be a challenging project. Once that works, use it to analyze a complex program.

- Deeply analyse challenging programs (Skype?)

- Memory signatures. Target: observe the contents of memory and, based on that, try to recognize types other than just pointers. The idea is to cluster memory locations whose contents have a similar type (what counts as a type is for you to define). As a starting point, we could define many possible types, e.g., zero, ASCII, printable, number >= 1 and <= 12, number >= 1 and <= 7, values never multiplied, values never increased, values only increased, and so on. Next we would try to see which categories dominate for a given memory location. And finally comes the clustering: which memory locations seem to share a type (a hint here: when two values are compared, they can probably be clustered together).
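The buffer-protection idea from the static analysis item can be illustrated with a minimal Python sketch. It is not the project's implementation: the names `Buffer`, `BoundsChecker`, `owner` and `check` are hypothetical, and the sketch assumes that the recovered arrays are available as (base address, size) pairs and that every instrumented access can report the address it touches and the buffer its pointer was derived from.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    """A recovered array (hypothetical representation)."""
    base: int  # start address
    size: int  # length in bytes


class BoundsChecker:
    """Hypothetical checker consulted by every instrumented buffer access."""

    def __init__(self, buffers):
        # Keep the buffers sorted by base address so lookups are O(log n).
        self._buffers = sorted(buffers, key=lambda b: b.base)
        self._bases = [b.base for b in self._buffers]

    def owner(self, addr):
        """Return the buffer that contains addr, or None if there is none."""
        i = bisect_right(self._bases, addr) - 1
        if i >= 0:
            buf = self._buffers[i]
            if addr < buf.base + buf.size:
                return buf
        return None

    def check(self, buf, addr, width=1):
        """True if a width-byte access at addr stays inside buf."""
        return buf.base <= addr and addr + width <= buf.base + buf.size


# Two adjacent buffers: an overflow of the first would corrupt the second.
checker = BoundsChecker([Buffer(0x1000, 16), Buffer(0x1010, 8)])
first = checker.owner(0x1000)
in_bounds = checker.check(first, 0x100C, 4)   # last four bytes of the buffer
overflow = checker.check(first, 0x100E, 4)    # spills into the neighbour
```

Note that the check is made against the buffer the pointer was derived from, not against whichever buffer happens to contain the address. Otherwise an overflow into an adjacent buffer would go unnoticed.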
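The three steps of the memory signatures item (define candidate types, find the dominant categories per location, cluster) could be sketched in Python as follows. This is only an illustration under assumptions: the category names, the 0.9 dominance threshold, the input format (a mapping from address to the list of values observed there over time) and the use of union-find for merging are all choices made for this sketch, not part of the project description.

```python
from collections import defaultdict

# Hypothetical candidate types, each a predicate on one observed value.
PREDICATES = {
    "zero": lambda v: v == 0,
    "ascii": lambda v: 0 <= v <= 127,
    "printable": lambda v: 32 <= v <= 126,
    "in_1_12": lambda v: 1 <= v <= 12,
    "in_1_7": lambda v: 1 <= v <= 7,
}


def only_increased(values):
    """History-based category: the value never went down and did go up."""
    pairs = zip(values, values[1:])
    return len(values) > 1 and all(a <= b for a, b in pairs) and values[0] < values[-1]


def signature(values, threshold=0.9):
    """Set of categories that dominate the observations of one location."""
    names = set()
    for name, pred in PREDICATES.items():
        hits = sum(1 for v in values if pred(v))
        if values and hits / len(values) >= threshold:
            names.add(name)
    if only_increased(values):
        names.add("only_increased")
    return frozenset(names)


def cluster(observations, compared=(), threshold=0.9):
    """Group locations with equal signatures, then merge compared locations."""
    parent = {loc: loc for loc in observations}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    by_signature = defaultdict(list)
    for loc, values in observations.items():
        by_signature[signature(values, threshold)].append(loc)
    for locs in by_signature.values():
        for other in locs[1:]:
            union(locs[0], other)
    # The hint from the text: values that are compared probably share a type.
    for a, b in compared:
        if a in parent and b in parent:
            union(a, b)

    groups = defaultdict(set)
    for loc in observations:
        groups[find(loc)].add(loc)
    return sorted(sorted(g) for g in groups.values())


observations = {
    0x10: [0, 0, 0],
    0x14: [0, 0, 0, 0],
    0x20: [72, 101, 72],
    0x24: [119, 111, 114],
    0x30: [3, 12, 7],
    0x40: [200, 300, 250],
}
clusters = cluster(observations, compared=[(0x30, 0x40)])
```

In this toy input the two all-zero locations end up together, as do the two locations holding printable characters. Location 0x40 matches no category on its own and joins 0x30 only because the two were compared.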