A P2P Network for Content Distribution
Bruce Kao and Eric Li

The goal of our project is to design a P2P network that makes it easy to distribute high-quality content. While surveying current overlay designs, we came across Pastry and thought of a modification to its underlying structure that would give us a network with the properties we want.

In Pastry, each node in the network has a unique nodeId (typically a hash of the node's IP address). When presented with a (message, key) pair, a node routes the message toward the node whose nodeId is numerically closest to the key, using a prefix-matching routing algorithm that takes O(log N) hops, where N is the number of nodes in the network.

We propose the following changes to this design:

- Instead of each host being a node, each file that a peer wants to share will be its own node in the network.
- The nodeId will be a hash of the file contents followed by a numeric tag.
- Files with the same hash will share the first part of the nodeId but have different tags. For example, the first foo.mp3 to enter the network will have nodeId ABC001, the second will have nodeId ABC002, and so on.
- A new node for a file that already exists in the network will take a tag one greater than the largest tag currently in use for that file's hash.
- Each node will maintain a network routing table identical to Pastry's, as well as a local routing table listing all the nodes that reside on the same host. When routing a message, a node first checks its local table to see whether the message can be routed to a local node, which helps minimize network traffic.

Some of the advantages of our design are described below:

- Using file content hashes directly makes searching much simpler and more direct than keyword search. Lists of hashes would be posted on web sites interested in helping publish certain content.
  Users would use a search engine such as Google to find web sites that post hashes of the files they want. This lets us leverage existing search engines to handle all the keyword searching.
- It also helps maintain the integrity of the content in the network, to a degree, by requiring content publishers to host web sites with content hashes. The popular file-sharing protocol BitTorrent relies on a similar arrangement, and its experience suggests that this is fairly effective.
- Since copies of the same content differ only in their tags, they will be close neighbors in the overlay's structure, so availability and fault tolerance are built in naturally.
- Peers could also download simultaneously from multiple peers holding the same file to increase the transfer rate. This would be quite easy because of the clustering structure.
- Peers that share a lot of files will naturally do more of the routing in the network. This is advantageous because these peers usually also have more bandwidth to spare.

That is the basic idea behind our project. Another project, PAST, is implemented on top of Pastry and derives each file's fileId from a hash of the file name. However, it differs significantly from our design in that it copies files entering the network onto multiple nodes whose nodeIds are close to the file's fileId, which creates unnecessary network traffic in moving the content around. At this time, we are unaware of any other projects with ideas similar to ours.
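
To make the proposed nodeId scheme and the local-table-first routing rule more concrete, here is a minimal Python sketch. It is an illustration under several assumptions, not a finished design: SHA-1 as the content hash, a three-digit decimal tag (mirroring the ABC001 example), and the names content_prefix, next_node_id, shared_prefix_len, and next_hop are all hypothetical. The routing step compares shared prefix length only, which is a simplified stand-in for Pastry's full algorithm, and the sketch assumes the caller already knows which nodeIds exist for a given hash.

```python
import hashlib
from typing import Iterable, Optional, Tuple

TAG_DIGITS = 3  # assumption: fixed-width decimal tag, as in the ABC001 example


def content_prefix(data: bytes) -> str:
    """Hash of the file contents; forms the first part of the nodeId.
    SHA-1 is an assumption; the proposal does not name a hash function."""
    return hashlib.sha1(data).hexdigest().upper()


def next_node_id(data: bytes, existing_ids: Iterable[str]) -> str:
    """Return hash + tag, where the tag is one greater than the largest
    tag already in use for this content hash (or 1 if there is none)."""
    prefix = content_prefix(data)
    tags = [int(i[len(prefix):]) for i in existing_ids if i.startswith(prefix)]
    tag = max(tags, default=0) + 1
    return f"{prefix}{tag:0{TAG_DIGITS}d}"


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that two identifiers have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(key: str, self_id: str,
             local_ids: Iterable[str],
             network_ids: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Check the local table first, then the network table.
    A candidate is chosen only if it shares a longer prefix with the key
    than the current node does (simplified; not full Pastry routing)."""
    here = shared_prefix_len(key, self_id)
    for label, table in (("local", local_ids), ("network", network_ids)):
        best = max(table, key=lambda n: shared_prefix_len(key, n), default=None)
        if best is not None and shared_prefix_len(key, best) > here:
            return (label, best)
    return ("deliver", self_id)  # no better candidate: this node handles it
```

In this sketch a node on the same host wins whenever it makes any progress toward the key, even if a remote node would make more progress. That trades a possibly longer route for fewer network hops, which matches the stated aim of minimizing network traffic.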