Difference between revisions of "Argos-like taint analysis in Qemu tcg"


Revision as of 10:19, 20 November 2012


About Argos-like taint analysis in Qemu tcg


Description

Dynamic taint analysis (DTA) for an intermediate code representation.

Dynamic Taint Analysis (DTA), also called dynamic information flow tracking, is usually implemented separately for each architecture, such as x86, ARM, or SPARC. It instruments the instruction set of that architecture so that information flow can be tracked during the execution of a program. Data (bytes) that reach the program from untrusted sources, such as the network or a file on a memory stick, are marked as tainted, and the taint is tracked through every computation that uses them. The information about untrusted data is kept in memory maps, which the instrumented instructions update. An example DTA implementation is our own Argos (http://few.vu.nl/argos/), which is now used around the world in honeypots and attack analysis engines.
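The propagation idea can be illustrated with a minimal sketch. It assumes byte-granular taint kept in a plain set of addresses; the function names (mark_untrusted, on_copy, on_binary_op, on_constant_store) are hypothetical and do not come from Argos or any other DTA system.

```python
# Minimal byte-granular taint tracking sketch (all names are hypothetical).

tainted = set()  # shadow state: addresses of bytes derived from untrusted input


def mark_untrusted(addr, length):
    """Taint a buffer filled from an untrusted source, e.g. a network read."""
    tainted.update(range(addr, addr + length))


def on_copy(dst, src, length):
    """Instrumented copy: each destination byte inherits its source byte's taint."""
    # Snapshot the source flags first so overlapping ranges behave correctly.
    flags = [(src + i) in tainted for i in range(length)]
    for i, flag in enumerate(flags):
        if flag:
            tainted.add(dst + i)
        else:
            tainted.discard(dst + i)


def on_binary_op(dst, src_a, src_b):
    """Instrumented one-byte arithmetic: the result is tainted if either operand is."""
    if src_a in tainted or src_b in tainted:
        tainted.add(dst)
    else:
        tainted.discard(dst)


def on_constant_store(dst, length):
    """Overwriting bytes with a constant removes their taint."""
    tainted.difference_update(range(dst, dst + length))


mark_untrusted(0x1000, 4)              # four bytes arrive from the network
on_copy(0x2000, 0x1000, 4)             # the program copies them
on_binary_op(0x3000, 0x2001, 0x4000)   # one copied byte is mixed with a clean byte
on_constant_store(0x2000, 2)           # part of the copy is overwritten
```

In a per-architecture implementation, a handler of this kind would have to exist for every instruction of the instruction set that moves or combines data.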

The goal is an implementation that has few architecture-dependent parts and sits at an intermediate layer. An intermediate representation typically has fewer instructions than a real instruction set, so fewer instructions need to be instrumented to achieve information flow tracking.

An example of an intermediate representation is that of TCG (Tiny Code Generator), which is included in the Qemu emulator. TCG is a code generator that translates basic blocks (pieces of executable code) from one code representation to another. In Qemu, TCG translates guest code into host code, which then gets executed. In between, TCG keeps an architecture-independent intermediate representation of the guest instructions. The DTA core is to be implemented at this level; together with a small architecture-dependent part for one supported architecture (such as ARM or x86), it would form a complete DTA system.
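To show why an intermediate layer keeps the instrumentation small, here is a sketch of taint propagation over a toy three-address representation. The op names, the tuple layout, and run_block are assumptions made for illustration only; they are not Qemu's or TCG's actual data structures or API.

```python
# Toy three-address IR with taint propagation.
# Op names and tuple layout are assumptions for illustration, not Qemu's API.

def run_block(ops, regs, mem, reg_taint, mem_taint):
    """Execute IR ops while propagating taint alongside the values."""
    for op, *args in ops:
        if op == "movi":              # dst <- constant: the result is clean
            dst, const = args
            regs[dst] = const
            reg_taint[dst] = False
        elif op == "mov":             # dst <- src: taint is copied
            dst, src = args
            regs[dst] = regs[src]
            reg_taint[dst] = reg_taint.get(src, False)
        elif op == "add":             # dst <- a + b: taint is the union
            dst, a, b = args
            regs[dst] = (regs[a] + regs[b]) & 0xFFFFFFFF
            reg_taint[dst] = reg_taint.get(a, False) or reg_taint.get(b, False)
        elif op == "ld":              # dst <- mem[addr register]
            dst, addr = args
            regs[dst] = mem.get(regs[addr], 0)
            reg_taint[dst] = regs[addr] in mem_taint
        elif op == "st":              # mem[addr register] <- src
            src, addr = args
            mem[regs[addr]] = regs[src]
            if reg_taint.get(src, False):
                mem_taint.add(regs[addr])
            else:
                mem_taint.discard(regs[addr])
        else:
            raise ValueError(f"unknown op: {op}")


regs, reg_taint = {}, {}
mem = {0x100: 7}
mem_taint = {0x100}                   # the value at 0x100 came from untrusted input

block = [
    ("movi", "t0", 0x100),
    ("ld", "t1", "t0"),               # t1 becomes tainted
    ("movi", "t2", 1),
    ("add", "t3", "t1", "t2"),        # taint flows into t3
    ("movi", "t4", 0x200),
    ("st", "t3", "t4"),               # taint reaches memory at 0x200
    ("st", "t2", "t0"),               # a clean store clears the taint at 0x100
]
run_block(block, regs, mem, reg_taint, mem_taint)
```

Five rules cover this toy representation, regardless of which guest architecture produced the ops. The sketch tracks memory per address rather than per byte and ignores the taint of the address register on loads and stores; both are simplifications, and a real policy would have to decide them explicitly.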

The steps involved:

- study how TCG generates the internal representation in Qemu

- study the interface of TCG with one Qemu guest architecture (ARM or x86)

- implement the data structures required to keep taint information

- implement the core flow tracking policies on the intermediate representation

- implement the instrumentation code for one guest architecture
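
For the data-structure and policy steps above, one possible starting point is a sparse bitmap with one taint bit per byte, plus a check applied when control flow depends on stored data. This is a sketch under assumptions: the class name TaintMap, the 4096-byte page size, and the check_jump policy are hypothetical choices, not the design prescribed by this project.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


class TaintMap:
    """Sparse taint map: one bit per byte, pages allocated on first taint."""

    def __init__(self):
        self.pages = {}  # page number -> bytearray bitmap of PAGE_SIZE // 8 bytes

    def set(self, addr, tainted):
        page, off = divmod(addr, PAGE_SIZE)
        bitmap = self.pages.get(page)
        if bitmap is None:
            if not tainted:
                return                # clearing an untracked page is a no-op
            bitmap = self.pages[page] = bytearray(PAGE_SIZE // 8)
        if tainted:
            bitmap[off >> 3] |= 1 << (off & 7)
        else:
            bitmap[off >> 3] &= ~(1 << (off & 7)) & 0xFF

    def get(self, addr):
        page, off = divmod(addr, PAGE_SIZE)
        bitmap = self.pages.get(page)
        return bitmap is not None and bool(bitmap[off >> 3] & (1 << (off & 7)))


def check_jump(taint_map, target_location, width=4):
    """Example policy: report an indirect jump whose stored target is tainted."""
    return any(taint_map.get(target_location + i) for i in range(width))


tm = TaintMap()
tm.set(0x5002, True)                  # one byte of a stored jump target is tainted
alert_before = check_jump(tm, 0x5000)
tm.set(0x5002, False)                 # the byte is later overwritten with clean data
alert_after = check_jump(tm, 0x5000)
```

Allocating pages lazily keeps the map small when only a few regions of guest memory ever hold untrusted data; the trade-off is an extra dictionary lookup on every access.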