MapReduce: An Investigation of Sorting Dan Granahan and Eric Tang Google has developed a programming model and corresponding distributed system called MapReduce. This system hides many of the complex details of developing applications for distributed systems such as machine failures, load balancing, and machine coordination and communication. This functional programming abstraction allows programmers to easily use the powerful resources of such a system without dealing with the design of complicated distributed systems. We aim to investigate sorting using the MapReduce programming model. At a high-level, it is almost trivial to write a MapReduce program to sort since the MapReduce model sorts by key automatically so an Identity reduce function is all that is needed. Under the hood of the MapReduce distributed architecture, there are many possible bottlenecks and areas of potential improvement, as with any complex system. We intend to experiment with some possible improvements to the MapReduce sorting framework. For instance, we will investigate alternative methods of portioning and sorting methods, such as a more distributed sort than what is currently implemented. We would optimistically like to find an improvement to the sorting speed but that is something we will need to investigate during the course of our project. We hope to gain more knowledge of the implementation and design of distributed computing. To some extent, we also hope to verify and repeat the results found in the MapReduce model, though, due to time limitations, we will not be able to implement all improvements they describe. In addition, we hope to find some improvement that could actually be implemented in a MapReduce system such as Google.s. The MapReduce programming model is an interesting and powerful abstraction for distributed systems and we hope to make some contribution to this area of study.