Adaptive query processing in Vector-on-Hadoop


About Adaptive query processing in Vector-on-Hadoop

Description

Database systems are an important component of many IT architectures, and one of the most pressing challenges in database query execution is query optimization. Query optimizers very often choose a bad plan, typically because the cost models used to estimate query plan cost contain errors, are too simple, or because the data turns out to have correlations.

Vectorwise-on-Hadoop is a parallel database system with very high performance that runs on Hadoop clusters. It has been developed by a CWI spin-off.

The idea of the project is to make query processing adaptive: *during* execution of the query, while the executor can observe the properties (such as the numbers of tuples) of the data passing through the query plan, the plan can be changed or corrected. We seek to do this, for example, in parallel and distributed aggregation and in parallel and distributed joins.
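
As a rough illustration of this idea for parallel aggregation, the Python sketch below shows a local pre-aggregation stage that counts the tuples and groups passing through it and stops pre-aggregating when the observed reduction is too small. It is not Vectorwise code: the function names, the sample size and the threshold are all assumptions made for the example.

```python
from collections import defaultdict


def adaptive_partial_sum(tuples, sample_size=1000, max_group_ratio=0.5):
    """Local stage of a two-phase SUM ... GROUP BY (hypothetical sketch).

    Starts by pre-aggregating. After `sample_size` tuples it compares the
    number of distinct groups with the number of tuples seen. If the ratio
    exceeds `max_group_ratio`, pre-aggregation barely reduces the data, so
    the stage flushes its table and forwards the remaining tuples as-is.

    Returns (output, still_pre_aggregating), where output is a list of
    (key, partial_sum) pairs destined for the global stage.
    """
    partial = defaultdict(int)
    output = []
    seen = 0
    pre_aggregate = True
    for key, value in tuples:
        if pre_aggregate:
            partial[key] += value
            seen += 1
            # Decision point: inspect the statistics observed so far.
            if seen == sample_size and len(partial) / seen > max_group_ratio:
                output.extend(partial.items())  # flush what was aggregated
                partial.clear()
                pre_aggregate = False           # switch to pass-through
        else:
            output.append((key, value))
    output.extend(partial.items())  # empty if the stage switched strategy
    return output, pre_aggregate


def global_sum(partials):
    """Global stage: merges partial sums from all local stages."""
    total = defaultdict(int)
    for key, value in partials:
        total[key] += value
    return dict(total)


# Few groups: pre-aggregation pays off and stays enabled.
few_out, few_pre = adaptive_partial_sum(
    [(i % 3, 1) for i in range(100)], sample_size=10)

# Every key is distinct: the stage switches to pass-through after the sample.
many_out, many_pre = adaptive_partial_sum(
    [(i, 1) for i in range(100)], sample_size=10)
```

Whichever strategy is chosen, the global stage produces the same result; only the amount of work done locally and the number of tuples sent onward differ. A real system would make this decision per thread or per node and would also have to account for network cost, which this sketch ignores.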

The project will be advised by Peter Boncz, and co-advised by an Actian engineer. Actian is located at Science Park in Watergraafsmeer and usually pays an intern salary.

See www.cwi.nl/~boncz for previous examples of MSc theses with Vectorwise. They tend to be pretty good and have often led to a publication. Looking for proficient students only.