MapReduce: Efficient Graph Algorithms
Xianzhe Liang, Xuan Zhang
MapReduce is a programming model and distributed computing system
developed by Google. The framework is designed to serve large data set
on cluster of computers. It hides many of the complex details of
distributed system such as machine failure, load balancing and so
on. MapReduce is consisted with map and reduce parts, which make it
easy to scale on large number of nudes.
We aim at investigating algorithms of graph theory on top of
MapReduce. Graph is widely used in large web application. Hyperlink
structure of the web can be viewed as a graph, so that paging ranking
and other usage can be analyzed. Social network is also a good
example, as well as a lot of topics such as breadth first search.
In the scope of this course, we hope to focus on how to design
efficient graph algorithms in MapReduce. d like to investigate a
general graph problem choosing from BFS, shortest paths, random walk,
clustering and so on. We will apply different design patterns to the
algorithm and make evaluation. We will build our system with Hadoop
and learn how number of nodes and configuration affect the result. To
sum up, the problem we are interest is: how to design efficient graph
algorithms on top of MapReduce and how the performance is affected by
the distributed manner.