18845 Group Project Abstract

Title: MapReduce Application and Evaluation
Authors: Jipeng Han (jipengh) and Xia Wu (xiaw)

Handling large datasets is a major, time-consuming part of machine learning. Training requires processing large amounts of data and often takes several days to complete; even the smallest datasets may need several hours. From our research, we found that machine learning algorithms that fit the Statistical Query model can be written in summation form and can therefore be parallelized easily. The aim of this project is to speed up several machine learning algorithms using MapReduce, the programming model proposed by Google for parallel computation over large datasets. We plan to implement the following learning algorithms on Hadoop: K-means, Naive Bayes, SVM, and Linear Regression. We will then compare them against traditional sequential implementations. Specifically, we expect to show that the two approaches produce the same results while the MapReduce versions run substantially faster. We will also measure how performance changes as the number of computing nodes in the MapReduce cluster varies.
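To illustrate the summation form mentioned in the abstract, here is a minimal sketch in plain Python (not Hadoop) for single-variable linear regression, the simplest of the four algorithms. It rests on assumptions: the function names (`map_partial_sums`, `reduce_sums`, `solve`, `fit_sequential`, `fit_mapreduce`) and the toy dataset are hypothetical and are not taken from the project. Each "mapper" computes partial sums over its shard of the data, a "reducer" adds the partial sums together, and the final fit is solved from the totals.

```python
from functools import reduce


def map_partial_sums(shard):
    """Map step: sufficient statistics for one shard of (x, y) pairs."""
    n = len(shard)
    sx = sum(x for x, _ in shard)
    sy = sum(y for _, y in shard)
    sxx = sum(x * x for x, _ in shard)
    sxy = sum(x * y for x, y in shard)
    return (n, sx, sy, sxx, sxy)


def reduce_sums(a, b):
    """Reduce step: element-wise addition of two partial-sum tuples."""
    return tuple(u + v for u, v in zip(a, b))


def solve(stats):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


def fit_sequential(data):
    """Baseline: one pass over the whole dataset."""
    return solve(map_partial_sums(data))


def fit_mapreduce(data, num_shards):
    """Simulated MapReduce: shard the data, map each shard, reduce the sums."""
    shards = [data[i::num_shards] for i in range(num_shards)]
    partials = [map_partial_sums(shard) for shard in shards]  # parallelizable
    return solve(reduce(reduce_sums, partials))


# Toy dataset (hypothetical): points lying exactly on y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]

if __name__ == "__main__":
    print(fit_sequential(data))
    print(fit_mapreduce(data, num_shards=3))
```

Because the partial sums are independent of one another, the map step can run on separate nodes, and only five numbers per shard need to reach the reducer. This is why the sequential and sharded fits are expected to agree, up to floating-point rounding on real data.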