Efficient and non-intrusive data distribution for web application clusters



Title: Efficient and non-intrusive data distribution for web application clusters
Status: finished
Master: Parallel and Distributed Computer Systems
Student name: Michal Tekel
Dates
Start: 2010/03/08
End: 2010/10/08
Supervision
Supervisor: Thilo Kielmann
Second reader: Guillaume Pierre
Company: Hyves
Poster: Media:Posternaam.pdf


Abstract

Many large-scale online applications and web sites that run on large clusters or clouds need to distribute data efficiently, usually from a single source to groups of nodes or to all nodes. Typical cases are application content updates, deployment of software and security patches, backups, and similar tasks. Such data distribution must not interfere with application responsiveness and performance, so that it remains unnoticeable to external clients. One way to achieve this is simply to slow the transfers down to some safe limit. This solution does not scale, however: deploying even a small update or security patch can then take several hours, which is unacceptably long. In this thesis we have developed a scalable file transfer framework that is sensitive to the cluster environment and brings such transfers down to just a couple of minutes while remaining non-intrusive. We achieve this by collecting bandwidth and resource consumption feedback from individual nodes as well as from network routers. We have also taken additional care to use the available bandwidth efficiently, further reducing data delivery times. To this end, we have modified a BitTorrent P2P client and extended the protocol itself. We have developed a distributed tracker that creates a dynamic overlay aware of both available bandwidth and network topology, which is then used to determine the data transfer paths. Together with advanced techniques such as work stealing, this achieves global workload distribution among the recipient set of nodes. The framework is fault tolerant and can detect and react to node failures in real time. We evaluate our framework and compare it against other file transfer methods for clusters.
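
To make the idea of feedback-driven, non-intrusive transfers more concrete, the following is a minimal sketch of how a sender might adapt its transfer rate to resource feedback from a node. It is not taken from the thesis: the additive-increase, multiplicative-decrease (AIMD) policy, the function names adjust_rate and simulate, and all thresholds and units are assumptions chosen purely for illustration.

```python
# Hypothetical sketch: AIMD-style rate control driven by node feedback.
# The policy and every parameter below are illustrative assumptions,
# not the mechanism used by the framework described in the abstract.

def adjust_rate(current_rate, utilization, *, target=0.7,
                min_rate=1.0, max_rate=100.0, step=5.0, backoff=0.5):
    """Return a new transfer rate (MB/s) from the latest feedback.

    utilization is the reported load of the busiest resource on the
    node (0.0 to 1.0), for example link or disk usage.
    """
    if utilization > target:
        # The node is under pressure: back off quickly.
        new_rate = current_rate * backoff
    else:
        # Spare capacity is available: probe upward slowly.
        new_rate = current_rate + step
    # Keep the rate inside the configured bounds.
    return max(min_rate, min(max_rate, new_rate))


def simulate(utilizations, start_rate=10.0):
    """Apply adjust_rate to a sequence of feedback samples."""
    rate = start_rate
    history = []
    for u in utilizations:
        rate = adjust_rate(rate, u)
        history.append(rate)
    return history


if __name__ == "__main__":
    print(simulate([0.2, 0.3, 0.9, 0.4]))
```

In this sketch the rate grows while the node reports spare capacity and is halved as soon as the reported utilization exceeds the target, which is one simple way to keep a transfer from competing with the application for resources.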