Argos-like taint analysis in Qemu tcg

From Master Projects
Contact person: Herbert Bos
Contact person 2: Andrei Bacs
Master areas: High Performance Distributed Computing, Internet and Web Technology, Parallel and Distributed Computer Systems
Fulfilled: No

Revision as of 10:15, 20 November 2012


About Argos-like taint analysis in Qemu tcg


Description

DTA for an intermediate code representation.

Dynamic Taint Analysis (DTA), also known as dynamic information flow tracking, is usually implemented per architecture (x86, ARM, SPARC, and so on). It instruments the instruction set of that architecture so that information flow can be tracked during the execution of a program. Data (bytes) that enters the program from untrusted sources, such as the network or a file on a memory stick, is marked as tainted, and the taint is tracked through every computation that uses those bytes. The taint information is kept in memory maps that the instrumented instructions update.
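
The memory map described above can be pictured as a shadow copy of guest memory that holds one taint flag per byte. The following is only a minimal sketch under that assumption; the class name ShadowMemory, its methods, and the addresses are hypothetical and are not taken from Argos or Qemu.

```python
class ShadowMemory:
    """Byte-granular taint map: one taint flag per guest memory address."""

    def __init__(self):
        # Only tainted addresses are stored; an absent address is clean.
        self._tainted = set()

    def taint(self, addr, size):
        """Mark size bytes starting at addr as coming from an untrusted source."""
        self._tainted.update(range(addr, addr + size))

    def clear(self, addr, size):
        """Mark a range as clean, e.g. after it is overwritten with a constant."""
        self._tainted.difference_update(range(addr, addr + size))

    def is_tainted(self, addr, size=1):
        """Return True if any byte in the range is tainted."""
        return any(a in self._tainted for a in range(addr, addr + size))

    def copy(self, dst, src, size):
        """Propagate taint for a memory-to-memory copy, byte by byte."""
        # Read all source flags first so overlapping ranges are handled.
        flags = [(src + i) in self._tainted for i in range(size)]
        for i, flag in enumerate(flags):
            if flag:
                self._tainted.add(dst + i)
            else:
                self._tainted.discard(dst + i)


shadow = ShadowMemory()
shadow.taint(0x1000, 4)          # four bytes received from the network
shadow.copy(0x2000, 0x1002, 4)   # the copy straddles tainted and clean bytes
```

Storing only the tainted addresses keeps the map small when most of memory is clean; a real system would more likely use a bitmap or page-based structure for speed.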

The goal is an implementation that is located at an intermediate layer and has as few architecture-dependent parts as possible. An intermediate representation typically has fewer instructions than a real instruction set, so fewer instructions need to be instrumented to achieve information flow tracking.
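
To illustrate why a small instruction set helps, the sketch below expresses a whole propagation policy as one short rule table over a made-up three-address IR. The tuple format, the opcode names, and the names RULES and propagate are assumptions for illustration only and do not correspond to the ops of any real intermediate representation.

```python
# Hypothetical three-address IR: (opcode, dest, *sources).
# Each rule maps the taint flags of the sources to the taint of the destination.
RULES = {
    "movi": lambda srcs: False,      # constant load: the result is clean
    "mov":  lambda srcs: srcs[0],    # copy: taint follows the value
    "add":  lambda srcs: any(srcs),  # arithmetic and logic: union of source taint
    "sub":  lambda srcs: any(srcs),
    "and":  lambda srcs: any(srcs),
    "or":   lambda srcs: any(srcs),
    "xor":  lambda srcs: any(srcs),
}


def propagate(block, taint):
    """Apply the rules to one basic block.

    block: list of (opcode, dest, *sources) tuples.
    taint: dict mapping a temporary's name to its taint flag; updated in place.
    Sources that are not in the dict (including immediates) count as clean.
    """
    for opcode, dest, *sources in block:
        src_taint = [taint.get(s, False) for s in sources]
        taint[dest] = RULES[opcode](src_taint)
    return taint


taint = {"net_byte": True}           # a value read from an untrusted source
block = [
    ("movi", "t0", 4),               # t0 is a constant, so clean
    ("mov", "t1", "net_byte"),       # t1 inherits the taint
    ("add", "t2", "t0", "t1"),       # t2 is tainted through t1
    ("movi", "t1", 0),               # overwriting t1 with a constant cleans it
]
propagate(block, taint)
```

The rules here are deliberately conservative: an idiom such as xor of a register with itself always yields zero, yet this table would still mark the result as tainted. Refining such cases is part of choosing the flow tracking policy.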

An example of an intermediate representation is TCG (Tiny Code Generator), which is included in the Qemu emulator. TCG is a code generator that translates basic blocks (pieces of executable code) from one code representation to another. In Qemu, TCG translates guest code into host code, which is then executed. Along the way, TCG keeps an architecture-independent intermediate representation of the guest instructions. The DTA core is to be implemented at this level; together with a small architecture-dependent part for one supported architecture (such as ARM or x86), it would form the complete DTA system.
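
Placing the DTA core at this level amounts to rewriting the intermediate code of each basic block before host code is generated from it. The sketch below shows the idea on a made-up IR; it does not use TCG's actual ops or API (Qemu itself is written in C), and the names shadow_of and instrument, the tuple format, and the "taint_" prefix are all hypothetical.

```python
def shadow_of(name):
    """Name of the hypothetical shadow temporary that holds the taint of name."""
    return "taint_" + name


def instrument(block):
    """Return a new op list with a taint op emitted before each original op.

    block: list of (opcode, dest, *sources) tuples in a hypothetical IR.
    Sources that are not strings are treated as immediates (always clean).
    """
    out = []
    for op in block:
        opcode, dest, *sources = op
        regs = [s for s in sources if isinstance(s, str)]
        if not regs:
            # No register inputs: the result is a constant, so clear its taint.
            out.append(("movi", shadow_of(dest), 0))
        elif len(regs) == 1:
            # Single input: the taint is copied along with the value.
            out.append(("mov", shadow_of(dest), shadow_of(regs[0])))
        else:
            # Several inputs: OR their taint together, pairwise.
            out.append(("or", shadow_of(dest), shadow_of(regs[0]), shadow_of(regs[1])))
            for extra in regs[2:]:
                out.append(("or", shadow_of(dest), shadow_of(dest), shadow_of(extra)))
        out.append(op)
    return out


guest_block = [("movi", "t0", 8), ("add", "t1", "t0", "eax")]
instrumented = instrument(guest_block)
```

Because the shadow ops only read and write shadow temporaries, they can be emitted before the original op even when the destination is also a source. The architecture-dependent part would then be limited to mapping guest registers and memory accesses onto these shadow locations.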

The steps involved:
- study how TCG generates the internal representation in Qemu
- study the interface of TCG with one Qemu guest architecture (ARM or x86)
- implement the data structures required to keep taint information
- implement the core flow tracking policies on the intermediate representation
- implement the instrumentation code for one guest architecture