Algorithms for mining sparse data sets

From Master Projects

The main purpose of this project is to redesign and reimplement several classical machine learning and data mining algorithms so that they operate on sparse data sets. The algorithms we consider are NaiveBayes, ID3, C4.5, M5Prime, LinearRegression, AdaBoost, Perceptron, and Decision Stumps. We developed a toolbox, written in C and named Sparse Data Miner (SDM), that processes sparse data sets and applies the selected algorithms to them. We ran numerous tests comparing the speed, memory requirements, and accuracy of our implementation against the well-known WEKA system. We also ran some tests in the field of recommender systems.
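To illustrate what operating on sparse data can mean in practice, here is a minimal sketch in C of a sparse instance and a Perceptron update that touches only the stored nonzero attributes. It is not taken from SDM: the names `SparseRow`, `sparse_dot`, and `perceptron_step`, the index/value layout, and the learning-rate parameter are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sparse instance (not SDM's actual structure):
 * only the nonzero attributes are stored. */
typedef struct {
    size_t nnz;           /* number of stored (nonzero) entries */
    const size_t *index;  /* attribute index of each stored entry */
    const double *value;  /* value[i] belongs to attribute index[i] */
} SparseRow;

/* Dot product of a sparse instance with a dense weight vector.
 * The cost is O(nnz), independent of the total attribute count. */
static double sparse_dot(const SparseRow *row, const double *w)
{
    assert(row != NULL && w != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < row->nnz; i++)
        sum += row->value[i] * w[row->index[i]];
    return sum;
}

/* One Perceptron step for a label of +1 or -1.
 * Returns 1 if the instance was misclassified and w was updated,
 * 0 otherwise. Only weights of nonzero attributes are modified. */
static int perceptron_step(const SparseRow *row, int label,
                           double *w, double rate)
{
    double score = sparse_dot(row, w);
    if (label * score > 0.0)
        return 0; /* correctly classified, nothing to do */
    for (size_t i = 0; i < row->nnz; i++)
        w[row->index[i]] += rate * label * row->value[i];
    return 1;
}
```

Under these assumptions, both the prediction and the update cost time proportional to the number of nonzero attributes in an instance rather than to the total number of attributes, which is the kind of saving a sparse representation is meant to provide.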