18-845 Group Project (GP)
Important dates
- Thu Feb 22 (1:30pm): GP abstracts due
- Email your abstract to droh@cs.cmu.edu
- Thu Mar 22 (1:30pm): GP mid-term oral reports due
- Make an appointment to meet with your TA.
- Thu Apr 26 (1:30pm) : GP reports due
- Email your report to droh@cs.cmu.edu
- Tue May 1 (1:30pm) : GP reviews due
- Email your reviews (one file per review) to droh@cs.cmu.edu
- Thu May 3 (1:30pm): GP poster session, Location HH 1112
- Prepare eight or nine letter sized sheets
- Sun May 6 (11:59pm): Final camera-ready GP reports due
- Email your report to droh@cs.cmu.edu
1. Instructions for Preparing Your GP abstracts
-
The abstract must contain the following parts:
- Title
- Authors
- One or two paragraphs describing what you aim to do
- A paragraph describing the expected result (What do you hope to learn? What
conclusions do you hope to draw?)
2. Instructions for Delivering Your GP Mid-Term
Oral Report
- Your TA will meet with you March 22 or March 23 to learn about your
progress on the project and to give you guidance and encouragement.
This is an informal face-to-face meeting. You are not required to
prepare any written material. Both group members must attend.
3. Instructions for Preparing and Submitting Your GP Reports
- Reports are limited to 10 pages.
- Reports must follow the official ACM Proceedings format. Use the
Word or Latex templates provided by ACM.
- Reports must include the following parts:
- Abstract - A paragraph that summarizes the problem and the results.
- Introduction -
Sets the context, describes the problem, and describes your solution.
- Description - One or more sections that
describes the problem and your approach to the solution in detail.
- Evaluation - A section that quantitatively evaluates your
ideas.
- Related work - Describes work related to your work.
- Summary and Conclusions - Summarize what you did and what interesting
things you learned from the project.
- Send your reports to droh@cs.cmu.edu.
4. Instructions for Reviewing Your Classmates' Reports
- Each project report will be reviewed (anonymously) by three of your
classmates. Thus, every student will review three reports.
- Your
instructors will evaluate the quality of your reviews as part of your
overall project score.
- Use the same review template you used for your critiques.
- Send each review as a separate file attachment to
droh@cs.cmu.edu.
5. Instructions for Preparing Your Poster Presentations
- I'll bring 32" x 40" foam core poster boards,
easels, and push pins for you to attach your
hardcopies to the poster board. Each group will have
their own poster board and easel.
- A poster board can hold 8 letter-size (8 1/2 x 11)
pieces of paper in landscape format, or 9 pages in
portrait format.
- You should prepare an 8 or 9 page presentation and bring
the hardcopies with you to class. One of these pages must
be a title page with the title of your project and your names.
- I'll order some good food and drinks for us.
Hints for Coming Up with a Topic
If you are currently working on a masters or Ph.D. thesis,
you are encouraged to to pursue a topic that is directly related to
your thesis research. It's OK (ideal in fact!) to use the group
project as a way to make progress on your thesis.
There are two basic approaches you can use for your group research projects.
- Develop a new idea or a new twist on an existing idea, and then do
enough evaluation to serve as a proof of concept.
- Do an extensive evaluation of an existing idea that gives you
some insight into the advantages or disadvantages of that idea.
Here are examples of some projects from previous years:
- 2007
- Michael Ho,
VM forking in Xen
- Hiroshi Isozaki and Syed Wasif Haider,
Implementing DELETE function in a trusted P2P network
- Orathai Sukwong,
User-Friendly Process Behavior Monitoring Tool
- Mark Hairgrove,
Capriccio for Multiprocessors
- Ajay Surie,
Enabling Opportunistic use of Transient Thin-Clients in Internet Suspend/Resume
- Ryan Frishberg and Keetaek Hong,
Examining performance characteristics of HTTP persistent connections and
pipelining and modifying the server scheduling algorithm to punish bad netizens
- Dan Granahan and Eric Tang,
MapReduce: An Investigation of Sorting
- Adrian Ng,
Activity Tracing Component: Preprocessing Traces in a Distributed Storage System
- Supriya Kher, Xiaohui Wang, and Shivani Kirubanandan,
C Implementation of Google's MapReduce: A Simplified Data Processing On Large Clusters
- Joseph C Laws
Efficient Memory Transport for ISR Systems
- 2006
- Chutika Udomsinn,
Determination of Replication Degree
- Himanshu Khurana and Clive Leung,
Evaluating methods of peer-to-peer keyword search
-
Benjamin Gilbert,
Xen and the Art of On-Demand Mobility
-
Sachin Kulkarni and Joseph Mou,
ISR Disk Performance with Xen and iSCSI
-
Sean O'Loughlin,
Attaching geographical locations to IP addresses in the context of a
world-wide disaster monitoring system
-
Rahul Iyer,
Parallel NFS implementation for an Object based Storage Backend
-
Theta Maxino,
Connecting Embedded and Enterprise Systems
-
Hwi (Paul) H. Cheong,
Implementation and evaluation of high-speed non-blocking server
-
Keisuke Ito,
Distributed Annotations Database Performance Measurements
-
Abhinav Mishra & Gesly George,
Exploring a peer-to-peer protocol for streaming content
- 2005
- Ian Kalinowski and Woon Ho Jung,
Pocket ISR: A Live USB-Bootable Version of Internet Suspend/Resume
- Supiti Buranawatanachoke and Kanat Tangwongsan,
A CAS Storage System for ISR
- Tudor Dumitras,
Estimating the Confidence in the QoS Guarantees of Internet Services
- Charles Fry,
Scalability of Fleet Object Store
- John Bucy,
Understanding OpenAFS performance
- Srikant Varadan, Guarav Mehta, and Gautam Kedia,
ISR Ballooning Study
- Bruce Kao and Eric Li,
A P2P Network for Content Distribution
- Pranav Goel and Rajat Venkatesh,
A CAS Storage System for ISR
- Andrew Widdowson,
A Live CD-Bootable Version of Internet Suspend/Resume
- 2004
- Prior to 2004
- Glenn Judd,
Improving 802.11 Access Point Selection: A Preliminary Investigation.
- Hen-I Yang and Anupam Dhanuka,
Performance Evaluation of Multiple Fields Matching Scheme.
- Gautaum Garg and Gene Soo,
Space-Time Codes.
- Punitha Manavalan and Michael Wagner,
Robot Telemetry Manager.
- Li-Chiou Chen & Xia Chen,
Evaluating Methods of Defending Distributed Denial of Service
Attacks.
- Pratish Halady, Rahul Mangharam and Vishal Soni,
Location Based Wireless Network Services.
- Nitin Gupta and Sandhya Gupta,
QoS in Web-Servers.
- Aravind Pavuluri and Saumitra Das,
An Active Architecture for User-Profile Based Dynamic Web Caching.
- Vijay Pandurangan and Mehmet Bakkaloglu,
PASISizing the Web.
- Arif Ulaugac and Nawaportn Wisitpongphan,
Micro-Evaluation of the Flash Server.
- David Oleszkiewicz and Ed Neto,
Distributed Anonymous Information Retrieval.
- Blake Scholl,
Distributed Computation of Performance-Aware Webmaps with HTTP Proxies.
- Thomas Madden and Christopher Palow,
Denial of Service Detector (DoSD).
- Shaheen Gandhi and Alan Wang,
Effects of Latency on Game State Prediction Methods.
- Asad Samar,
An Implementation of Capture Resilient Devices.
- Damon Becknel,
A Test of TFRC as an Effective Congestion Avoidance Protocol
- Devin Mahony and Noah Hafner,
The Personal Backup System
- James Casazza, Rory O'Neill, and James Grace,
Network Assisted Traffic Information in Automobiles
- Mike Schellhase,
Exploring Location-Based Messaging
- Mobeen Bajwa,
XVM: Providing Web Services Through Configuration
- Wei Zhang,
A Study of Internet Connectivity by Applying TCP-based Network Measurement Techiniques
Here are some possible ISR-related projects:
-
Exploring opportunistic state copying. Two basic techniques for
dealing with low bandwidth are prefetching (copying parcel state from
the server to the client before it is needed), and preputting (copying
dirty parcel state from the client to the server before the parcel is
checked in). The disadvantages of these approache are that they can
consume already scarce network bandwidth. This project would develop a
system for opportunistic prefetching and preputting that would
limit these operations to those times when the network is quiescient.
-
Adding CAS everywhere.The current version of ISR stores disk
state on both the content server and the local cache in a separate
2-level directory tree for each parcel, where the leaves of the tree
are encrypted 128 KB chunks. Another approach is to replace the
individual parcel trees on both the client and server with a single
flat content addressable storage (CAS) where the file name for each
chunk is its SHA1 hash. The goal is to reduce total storage
requirements and to simplify parcel management. The resulting system
would also be an ideal candidate for a p2p system based on consistent
hashing.
-
Combining ISR with portable storage devices.. One can imagine
a number of interesting ways to combine ISR and portable storage
devices.
- Pocket ISR. In this scenario, a complete client ISR system
(Knoppix Linux+VMWare+ISR) runs on a bootable 20-40 GB USB hard drive
that fits in a shirt pocket (Pocket ISR). You can turn any
machine into your personal ISR machine by plugging a Pocket ISR drive
into the USB port and then booting from USB. Parcels reside
permanently on the content server, but partial or complete parcel
state can be cached on the Pocket ISR disk.
- CD ISR. A variant of the Pocket ISR idea is to remaster
the Knoppix CD to include VMWare and the ISR client. When the CD
boots, it mounts a partition on the host hard drive, and then uses
this device to hold all of the local parcel state.
- LKA cache. A USB dongle could be used as a lookaside (LKA)
cache that the ISR client checks before going to the
network. Similarly, at checkin time the client could write dirty state
to the LKA instead of sending it over the network.
-
Dealing with dirty state and low bandwidth..
One problem with the current
ISR design is figuring out what to do with dirty state that is created
when there is little or no bandwidth back to the content server (such as
at home with a 30 KB/s cable modem upstream bandwidth). Portable storage
over sneaker net is one solution, but the ISR model needs to be
changed to handle the possibility. In particular, one would need the
ability to transport local client state to a new machine in the
middle of an ISR session.
-
Dealing with dirty state that isn't really dirty.. One of the
significant changes that the ISR model introduces is that some disk
state is dirtier than other state. For example, if an application on
the guest creates a temp file, writes to it, and then deletes it, then
dirty state is created on the disk and in the current ISR model, must
eventually be uploaded to the content server, even though to the guest
operation system the creation of that dirty state was a no-op. An
extreme example of this is defragmenting the disk! Similarly for data
that has been cached on disk by applications or operating system on
the guest. In this project, you will investigate the causes of
false-dirty data and develop techniques for minimizing the amount of
it that needs to be uploaded.
-
Building an ISR friendly guest file system. Another way
to deal with false-dirty data is at the file system level. This project
would modify an existing file system such as Linux's Ext2 fs to minimize
the size of compressed ISR parcels when they are transferred between
client and server. Techniques might include zeroing deleted files
and blocking disk sectors so that related sectors are in the same
chunk files.
-
Evaluating the effectivness of memory and disk ballooning. A
recent paper (Optimizing the Migration of Virtual Computers, OSDI '02)
describes an ISR-like system for migrating virtual machines. In the
paper, they introduce a technique, called ballooning, for reducing the
size of virtual machine memory images. The basic idea is to run a
helper process on the guest that allocates and zeros memory blocks,
relying on the page replacement mechanism of the guest OS to steal
pages that are not part of the working sets of other processes. The
result is a memory image with an increased number of highly
compressible zero pages. ISR currently does not do ballooning. The
evaluation in OSDI is a good start, but by no means convincing. The
project would design, implement, and evaluate ballooning in ISR on
representative workloads (designed and codified by the student team),
concluding with a firm recommendation about whether to include the
technique in a later ISR release.
-
Minimizing the size of downloaded memory images. Another way
to deal with false-dirty data is at the file system level. This project
would modify an existing file system such as Linux ext2 to minimize
the size of compressed ISR parcels when they are transferred between
client and server. Techniques might include zeroing deleted files
and blocking disk sectors so that related sectors are in the same
chunk files.
-
Software distributions in parcels. Unix machines in the CS and
ECE environments have access to shared software distributions in the
/usr/local directory. This mechanism allows each Unix
machine in the environment to have access to a uniform up-to-date set
of software programs that are updated nightly. Windows uses a
different mechanism, storing the software installation scripts on a
server, and relying on each user to download and install the software
they want for their system. The goal of this project is to develop
techniques for providing shared up-to-date software distributions
on Linux and/or Windows parcels.
-
Writing a Windows version of ISR. The ISR client software
currently runs on Linux 2.4 and 2.6 machines. This project would build
a version of ISR that runs on Windows. The main task would be writing
the equivalent of a Linux loadable kernel module (device driver) that
interposes VMWare requests to a non-existant raw disk and passes those
requests up to a user level process that interprets those requests.
Here are some other ideas for topics (in no particular order):
- Internet host counting.
Perform your own Internet Domain Survey and compare it to
the published survey at www.isc.org.
- Peer-to-peer systems.
There are a number of interesting unresolved issues for
peer-to-peer systems such as
Freenet. What are the performance bottlenecks? How anonymous are
Freenet objects really? How can Freenet peers be attacked and
defended? How to delete and update documents? How to invalidate
cached copies of updated documents? How to resolve the fundamental
tension between privacy and accountability? How to name objects? How
to search for objects in systems that are designed to make it
impossible to identify the origin server for any particular document?
Are there better approaches for caching and replicating objects
through the network.
- Peer-to-peer distributed keyword search. Develop a
peer-to-peer distributed search engine. Issues: Tradeoffs between
security and convenience, scaling outside the local area, searching on
metadata as well as contents.
- Peer-to-peer content publishing systems.
There are a number of interesting unresolved issues for
anonymous content publishing peer-to-peer systems such as
Freenet. What are the performance bottlenecks? How anonymous are
Freenet objects really? How can Freenet peers be attacked and
defended? How to delete and update documents? How to invalidate
cached copies of updated documents? How to resolve the fundamental
tension between privacy and accountability? How to name objects? How
to search for objects in systems that are designed to make it
impossible to identify the origin server for any particular document?
Are there better approaches for caching and replicating objects
through the network.
- Monitoring in a non-cooperative environment. Stefan
Savage at UCSD has developed a powerful technique for estimating
end-to-end bandwidths and packet-loss between hosts, where the remote
host is not cooperative in the sense that it would be impossible to
get an account on the machine (e.g., the Yahoo server). Savage's
approach is to exploit the behavior of TCP (which all servers must
implement to the specification) to gain information about the
effective bandwidth from the server to the client. For this project,
you might apply this general idea in some new context, or use Savage's
method for estimating packet loss in the context of a larger
application. For example, would it be possible to use Savage's
technique to build a client-side performance monitoring system that,
for a given HTTP transaction, would isolate the network transmission
time from the server processing time and determine which is the
bottleneck?
- Attaching geographical locations to IP addresses in the
context of a world-wide disaster monitoring system. When natural
disasters such as earthquakes occur, it is very difficult to make
accurate estimates of the geographical extent and severity of the
damage because the communication infrastructure disappears. However,
hosts that provide Internet services are always turned on, so the lack
of response from those systems contains some information. The idea is
to build a system that would sample hosts in earthquake prone regions
on a continual basis. Each sample is a bit vector, one bit per host.
Some interesting issues are assigning IP addresses to geographical
locations, developing a hierarchical scheme to aggregate response bit
vectors, and developing analysis techniques of the response bit
vectors to distinguish transients (e.g. localized power failures or
normal host downtime) from real damage.
- Scalable search engines. Current search engines are not
scalable because all of the work is done at the remote server site. As
a result, the servers are not able to perform much computation when
they satisfy a request, typically a quick lookup of an inverted index.
As a result, single-word queries, which directly index the database on
the server, typically work pretty well, but multiple word queries often
give poor results. The idea here is to investigate the following
question: Can we improve the performance of search engines
such as Google by doing some additional work on the client?
- Performance evaluation of content distribution networks.
Content distribution networks such as Akamai claim to significantly
reduce the latency of Web page downloads. The idea here is to
evaluate and quantify the performance benefits of the Akamai service.
When does it help? When does it not help?
- Defending against dDOS attacks. Distributed denial of
service attacks are somewhat scary and difficult to defend
against. The idea here would be to survey the existing approaches,
identify strengths and weaknesses, and propose and evaluate
some improvement.
- Evaluation of performance issues in high-speed non-blocking servers.
The idea here is to build a high speed server that never blocks on I/O (such
as the Flash server from Rice) and then do extensive micro-evaluation
of its performance in order to understand the extent of the performance
gain that is possible from such non-blocking servers.
|