18-845 Group Research Project
Important dates
- Thu 2/28: GP Proposal deadline
- Tue 4/9 and Thu 4/11: Oral GP status reports
- Thu 5/2: Poster Presentations (Newell Simon Hall 3305, 12:30pm - 2:00pm)
- Sun 5/5 (11:59pm): GP written reports due
Summary of Group Projects
You are free to propose any idea you want for your group project. To
help you start thinking, here are the group projects from last year:
- Blake Scholl,
Distributed Computation of Performance-Aware Webmaps with HTTP Proxies
- Punitha Manavalan and Michael Wagner,
Robot Telemetry Manager
- Li-Chiou Chen and Xia Chen,
Evaluating Methods of Defending Distributed Denial of Service
Attacks
- Pratish Halady, Rahul Mangharam and Vishal Soni,
Location Based Wireless Network Services
- Nitin Gupta and Sandhya Gupta,
QoS in Web-Servers
- Aravind Pavuluri and Saumitra Das,
An Active Architecture for User-Profile Based Dynamic Web Caching
- Vijay Pandurangan and Mehmet Bakkaloglu,
PASISizing the Web
- Arif Ulaugac and Nawaportn Wisitpongphan,
Micro-Evaluation of the Flash Server
- David Oleszkiewicz and Ed Neto,
Distributed Anonymous Information Retrieval
- Joshua T. Anhalt,
Live Freenet or Die
- Laura Bowser,
Internet-based Damage-Assessment System
- Albert Song,
Advantages and Uses of One-Way Packet Loss Information
Sample topics
There are two basic approaches you can use for your group research
projects.
- Develop a new idea or a new twist on an existing idea, and then do
enough evaluation to serve as a proof of concept.
- Do an extensive evaluation of an existing idea that gives you
some insight into the advantages or disadvantages of that idea.
Here are some other ideas for topics (in no particular order):
- Internet host counting.
Perform your own Internet Domain Survey and compare it to
the published survey at www.isc.org.
- Congestion-aware routing. Develop a scheme that uses BGP to
update the routing tables in routers so as to improve communication
performance between different ASes (ISPs). (Blake
Scholl)
- Peer-to-peer distributed keyword search. Develop a
peer-to-peer distributed search engine. Issues: Tradeoffs between
security and convenience, scaling outside the local area, searching on
metadata as well as contents.
- Peer-to-peer content publishing systems.
There are a number of interesting unresolved issues for
anonymous content publishing peer-to-peer systems such as
Freenet. What are the performance bottlenecks? How anonymous are
Freenet objects really? How can Freenet peers be attacked and
defended? How to delete and update documents? How to invalidate
cached copies of updated documents? How to resolve the fundamental
tension between privacy and accountability? How to name objects? How
to search for objects in systems that are designed to make it
impossible to identify the origin server for any particular document?
Are there better approaches for caching and replicating objects
through the network?
- Monitoring in a non-cooperative environment. Stefan
Savage at UCSD has developed a powerful technique for estimating
end-to-end bandwidth and packet loss between hosts, where the remote
host is not cooperative in the sense that it would be impossible to
get an account on the machine (e.g., the Yahoo server). Savage's
approach is to exploit the behavior of TCP (which all servers must
implement to the specification) to gain information about the
effective bandwidth from the server to the client. For this project,
you might apply this general idea in some new context, or use Savage's
method for estimating packet loss in the context of a larger
application. For example, would it be possible to use Savage's
technique to build a client-side performance monitoring system that,
for a given HTTP transaction, would isolate the network transmission
time from the server processing time and determine which is the
bottleneck?
- Attaching geographical locations to IP addresses in the
context of a world-wide disaster monitoring system. When natural
disasters such as earthquakes occur, it is very difficult to make
accurate estimates of the geographical extent and severity of the
damage because the communication infrastructure fails. However,
hosts that provide Internet services are normally always on, so a lack
of response from those systems carries some information. The idea is
to build a system that would sample hosts in earthquake prone regions
on a continual basis. Each sample is a bit vector, one bit per host.
Some interesting issues are assigning IP addresses to geographical
locations, developing a hierarchical scheme to aggregate response bit
vectors, and developing techniques for analyzing the response bit
vectors to distinguish transients (e.g., localized power failures or
normal host downtime) from real damage.
- Network topology discovery and bandwidth monitoring with
incomplete and incompatible SNMP information. Network monitoring
tools such as the CMU Remos system use
information from SNMP daemons running on routers and hosts to discover
network topologies (bridges, routers, and links) and to predict the
available link bandwidths. However, the SNMP information is sometimes
incomplete (because of misconfigured routers) or unavailable (because
of proprietary and incompatible SNMP daemons or heavy link
traffic). Our current systems assume perfect information, and thus
fail in the presence of incomplete or incompatible information. The
idea here would be to develop some techniques to improve this
situation and then implement them in Remos.
- Scalable search engines. Current search engines are not
scalable because all of the work is done at the remote server site. As
a result, the servers cannot perform much computation per request,
typically just a quick lookup in an inverted index. Single-word
queries, which directly index the database on the server, therefore
typically work pretty well, but multiple-word queries can fail
miserably. The idea here is to investigate the following
question: Can we improve the performance of search engines
such as Google by doing some additional work on the client?
- Performance evaluation of content distribution networks.
Content distribution networks such as Akamai claim to significantly
reduce the latency of Web page downloads. The idea here is to
evaluate and quantify the performance benefits of the Akamai service.
When does it help? When does it not help?
- Defending against DDoS attacks. Distributed denial of
service attacks are somewhat scary and difficult to defend
against. The idea here would be to survey the existing approaches,
identify strengths and weaknesses, and propose and evaluate
some improvement.
- Investigate issues in caching dynamic content. The
unfortunate irony is that the high-volume sites that could benefit the
most from Web caching typically generate dynamic content, which is not
cached by existing Web caches. The idea here is to survey the existing
approaches for dynamic content and develop and evaluate an
alternative.
- Locality and load-balancing tradeoffs in cluster-based servers.
The idea here is to explore the
tradeoffs between locality and load balancing in cluster-based servers,
identify weaknesses in existing approaches (e.g., LARD) and propose
and evaluate an alternative.
- Evaluation of performance issues in high-speed non-blocking servers.
The idea here is to build a high-speed server that never blocks on I/O (such
as the Flash server from Rice) and then do extensive micro-evaluation
of its performance in order to understand the extent of the performance
gain that is possible from such non-blocking servers.
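To make one of the topics above more concrete, here is a minimal sketch for the disaster monitoring idea: each sample is a bit vector with one bit per host, the bits are aggregated per region, and a region is reported only if its response rate stays low for several consecutive samples, which is one simple way to separate transients from sustained outages. The region labels, the thresholds (min_up, min_rounds), and the function names are all assumptions made for illustration; the topic description does not prescribe any of them.

```python
from collections import defaultdict


def aggregate_by_region(sample, host_region):
    """Return {region: fraction of hosts that responded} for one sample.

    sample is a list of 0/1 bits, one per host. host_region[i] is a
    hypothetical region label for host i (assumed to come from some
    IP-to-location mapping that is not shown here).
    """
    up = defaultdict(int)
    total = defaultdict(int)
    for bit, region in zip(sample, host_region):
        total[region] += 1
        up[region] += bit
    return {region: up[region] / total[region] for region in total}


def sustained_outages(samples, host_region, min_up=0.5, min_rounds=3):
    """Return regions whose response rate was below min_up for the most
    recent min_rounds (or more) consecutive samples.

    Both thresholds are illustrative guesses, not tuned values.
    """
    streak = defaultdict(int)
    for sample in samples:
        rates = aggregate_by_region(sample, host_region)
        for region, rate in rates.items():
            # A single good round resets the streak, so short dips
            # (transients) never reach min_rounds.
            streak[region] = streak[region] + 1 if rate < min_up else 0
    return sorted(r for r, n in streak.items() if n >= min_rounds)


if __name__ == "__main__":
    host_region = ["A", "A", "A", "B", "B", "B"]
    samples = [
        [1, 1, 1, 1, 1, 1],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0, 1],
    ]
    print(sustained_outages(samples, host_region))
```

A hierarchical version could apply the same aggregation again with coarser labels (for example, city, then state), passing the per-region rates upward instead of the raw bits.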
Last modified: Wed Jan 8 18:50:12 EST 2003