Adaptive query processing in Vector-on-Hadoop

Revision as of 22:36, 8 October 2014 by Paboncz (New page: {{Projectproposal |Contact person=Peter Boncz |Master areas=Software Engineering, High Performance Distributed Computing |Project page=www.tpc.org/tpch/results/tpch_perf_results.asp?result...)



About Adaptive query processing in Vector-on-Hadoop

Description

Database systems are an important component of many IT architectures, and one of the most pressing challenges in database query execution is query optimization. Query optimizers often choose a bad plan, typically because the cost models used to estimate query plan cost contain errors or are too simple, or because the data turns out to have correlations that the cost model does not capture.

Vectorwise-on-Hadoop is a parallel database system with very high performance that runs on Hadoop clusters. It has been developed by a CWI spin-off.

The idea of the project is to make query processing adaptive: *during* execution of the query, the executor can observe properties of the data passing through the plan (such as the number of tuples), and the query plan can then be changed or corrected. We seek to do this, for example, in parallel and distributed aggregation and in parallel and distributed joins.
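
To make the idea concrete, here is a minimal sketch of one possible form of adaptivity in distributed aggregation. It is not how Vectorwise works and not the method the project prescribes; the function names, the sample size, and the threshold are all hypothetical. A worker starts by pre-aggregating its input locally before sending it to the final aggregation. After a sample of tuples it checks how many distinct groups it has seen. If local pre-aggregation barely reduces the data, it flushes what it has and forwards the remaining tuples unchanged.

```python
from collections import defaultdict


def adaptive_preaggregate(tuples, sample_size=1000, max_group_ratio=0.5):
    """Yield (key, partial_sum) pairs destined for the final aggregation.

    Hypothetical sketch: sample_size and max_group_ratio are illustrative
    assumptions, not tuned or taken from any real system.
    """
    partial = defaultdict(int)
    seen = 0
    passthrough = False
    for key, value in tuples:
        if passthrough:
            # The plan was changed mid-query: forward tuples without local work.
            yield key, value
            continue
        partial[key] += value
        seen += 1
        # Observe the data once: many distinct groups means little reduction.
        if seen == sample_size and len(partial) / seen > max_group_ratio:
            passthrough = True
            yield from partial.items()  # flush partial results gathered so far
            partial.clear()
    # Emit whatever is still held locally (empty after a switch to passthrough).
    yield from partial.items()


def final_aggregate(partials):
    """Combine partial sums from all workers into the final result."""
    totals = defaultdict(int)
    for key, value in partials:
        totals[key] += value
    return dict(totals)
```

With an input of only a few distinct keys, the worker keeps pre-aggregating and emits one pair per key. With an input where almost every key is distinct, it switches to forwarding after the sample. In both cases `final_aggregate` produces the same sums as a non-adaptive plan; only the amount of local work and transferred data differs.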

The project will be advised by Peter Boncz, and co-advised by an Actian engineer. Actian is located at Science Park in Watergraafsmeer and usually pays an intern salary.

See www.cwi.nl/~boncz for previous examples of MSc theses with Vectorwise. They tend to be pretty good and have often led to a publication. Looking for proficient students only.