Improving real time data storing performance in a large scale event driven environment


Title: Improving real time data storing performance in a large scale event driven environment
Status: finished
Master: High Performance Distributed Computing
Student name: Gergely Kovacs
Start date: 2011/02/01
End date: 2012/08/01
Supervisor: Thilo Kielmann
Second reader: Guillaume Pierre
Company: The Widget Company
Thesis: Media:Thesis.pdf
Poster: Media:Media:Posternaam.pdf




The Widget Company is a full-service company that creates, develops and distributes widgets (also called applications) for multiple platforms such as web, mobile, TV and desktop. Together with SoftSenS, it developed an intelligence system that processes online social activity in real time, enabling widgets to act intelligently. To improve the performance of the system, the data must be stored for further processing.

The data to be stored comes from several social media platforms. Over the past few years the amount of information published on social media has grown significantly, so the volume of incoming data can be very large, as many as 1000 messages per second. This calls for a fast and reliable distributed system that can store and analyze the information in real time. Scalability is a key criterion, because the data volume may grow even further.

There are different requirement profiles for storing the data, each with its own expectations. Comparing the data storage architectures therefore requires performance tests. The most relevant performance measure is the running time of each component. The main components are storing the data, building an index, and searching the data.

The master project proposes to study the real-time storing and searching capabilities of diverse data storage architectures, such as relational databases, NoSQL databases and full-text search engine libraries. A benchmark is needed to compare the advantages and disadvantages of these systems. The measurements focus on running time, scalability and reliability. With the benchmark, different prototype solutions can be compared, and the results can guide the design of a new, improved data store.
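As a rough illustration of what such a benchmark measures, the sketch below times the three components named above (storing, indexing, searching) separately. It is only a minimal sketch under stated assumptions: the in-memory dictionary store, the whitespace-tokenized inverted index, and all function names (`store`, `build_index`, `search`, `run_benchmark`) are hypothetical stand-ins, not the systems or the benchmark used in the project.

```python
import time
from collections import defaultdict


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def store(messages):
    # Hypothetical "store" step: keep each message in a dict keyed by its position.
    return {msg_id: text for msg_id, text in enumerate(messages)}


def build_index(stored):
    # Inverted index: lowercase token -> set of ids of messages containing it.
    index = defaultdict(set)
    for msg_id, text in stored.items():
        for token in text.lower().split():
            index[token].add(msg_id)
    return index


def search(index, term):
    # Exact single-term lookup; returns matching message ids in ascending order.
    return sorted(index.get(term.lower(), set()))


def run_benchmark(messages, term):
    # Time each component on its own so the running times can be compared.
    stored, t_store = timed(store, messages)
    index, t_index = timed(build_index, stored)
    hits, t_search = timed(search, index, term)
    return {"store_s": t_store, "index_s": t_index,
            "search_s": t_search, "hits": hits}


if __name__ == "__main__":
    # Made-up sample messages, repeated to give the timers something to measure.
    sample = ["Great new widget", "widget crashed again", "nothing to see"] * 1000
    report = run_benchmark(sample, "widget")
    print(report["store_s"], report["index_s"], report["search_s"],
          len(report["hits"]))
```

Replacing `store`, `build_index` and `search` with calls into a relational database, a NoSQL database or a full-text search library would let the same harness compare them, although a real benchmark would also need repeated runs, concurrent writers, and scalability and reliability measurements.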

A data store with real-time search and fast indexing algorithms is a good basis for further processing, such as sentiment analysis, and for other statistical search problems, such as full-text search.