Team 3 : Project Page
17-654: Analysis of Software Artifacts
18-846: Dependability Analysis of Middleware
TEAM 3
Team Members:
Team Roles:
- Project lead: Ackley
- Baseline: Fry / Boyer
- MTA front-end, Spam back-end: Fry / Wilson
- Database: Ackley / Wilson
- Testing: Ackley / Boyer
- Documentation: Ackley / Boyer / Fry / Wilson
Project Title: Spam'n'Beans - A High-Performance Mail Content Checker
Baseline Application Description:
A system that assists high-volume email servers by analyzing received email
against a central database of spam. Mail servers may be configured to reject
email classified as having a high probability of being spam.
Configuration:
- Java
- Enterprise JavaBeans (JBoss)
- Linux
Third-party software, if any (databases):
- Sendmail
- SpamAssassin
- PostgreSQL
Project Documents
Project Downloads:
Baseline Application
Interfaces
Scenarios/Interactions
- The customer configures their MTA to forward email
into the Spam'n'Beans system.
- The system will then compare the content of each
email against the spam database and assign a likelihood value that the email is spam.
- The system will return the original email back to the
customer's system with the appropriate spam likelihood embedded within the
message headers.
- See Baseline Use Cases
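The last step of the scenario, embedding the spam likelihood in the message headers, could look roughly like the sketch below. It is only an illustration: the class name `SpamHeaderTagger`, the header name `X-Spam-Likelihood`, and the two-decimal format are assumptions, since this page does not say which header the system actually writes.

```java
import java.util.Locale;

/** Hypothetical sketch: embeds a spam likelihood in the headers of a raw message. */
class SpamHeaderTagger {
    // Assumed header name; the project page does not name the real header.
    static final String HEADER = "X-Spam-Likelihood";

    /**
     * Inserts the header just before the blank line that separates
     * headers from body. The body itself is left unchanged.
     */
    static String tag(String rawMessage, double likelihood) {
        String line = HEADER + ": " + String.format(Locale.ROOT, "%.2f", likelihood) + "\r\n";
        int split = rawMessage.indexOf("\r\n\r\n");
        if (split < 0) {
            // No body separator found: treat the whole message as headers.
            boolean terminated = rawMessage.isEmpty() || rawMessage.endsWith("\r\n");
            return rawMessage + (terminated ? "" : "\r\n") + line;
        }
        // Keep the CRLF that ends the last original header, then add ours.
        return rawMessage.substring(0, split + 2) + line + rawMessage.substring(split + 2);
    }
}
```

The customer's MTA can then filter on the added header without the message body having been modified.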
Current Status
Downloads
Fault-Tolerant Baseline Application
Architecture
Fault Tolerance will be achieved with the following system
attributes:
- Replication
- The system is designed with a cluster
of replicated middle-tier servers, and one or more back-end
servers which are not replicated.
- One back-end server will host a
global naming server, an EJB container, the Fault Detector and
Replication Manager console, and the Load Balancing Manager.
A second back-end server will run the PostgreSQL database.
Neither back-end server is replicated, so each remains a
single point of failure within the system. The back-end servers will not
host middle-tier replicas.
- Each middle-tier
replica consists of its own JBoss naming service, EJB
container, and SpamAssassin daemon. All replicas in the
cluster may process client requests simultaneously, though any given
client request is processed by only one server at a time.
- Client requests are
load-balanced across the servers in the middle-tier cluster.
This spreads work across the available servers and is intended to
reduce the average turnaround time for client requests. To
permit load balancing, each client first queries the Load
Balancing Manager on the back-end server, which returns
a particular middle-tier server to contact.
The client then submits its mail processing request to
the indicated middle-tier server.
- The Replication Manager console running on a back-end
server offers administrative functions to dynamically add/remove machines to
the pool of replica servers and to individually launch/shut down each
middle-tier replica. This permits system administrators to perform
routine maintenance on servers with minimal disruption to the system:
a middle-tier server can be shut down, or added and launched, without
affecting the other middle-tier servers in the
cluster.
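As a rough illustration of the lookup described above, the sketch below models a Load Balancing Manager that hands out middle-tier servers from a pool that the Replication Manager console can grow or shrink. The class and method names are hypothetical, and round-robin selection is an assumption: this page does not state which balancing policy the project uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the Load Balancing Manager; round-robin is an assumed policy. */
class LoadBalancingManager {
    private final List<String> replicas = new ArrayList<>();
    private int next = 0;

    /** Console operation: add a machine to the pool of replica servers. */
    synchronized void addReplica(String host) {
        if (!replicas.contains(host)) {
            replicas.add(host);
        }
    }

    /** Console operation: remove a machine from the pool. */
    synchronized void removeReplica(String host) {
        replicas.remove(host);
    }

    /** Returns the middle-tier server that the calling client should contact. */
    synchronized String nextReplica() {
        if (replicas.isEmpty()) {
            throw new IllegalStateException("no middle-tier replica available");
        }
        next = next % replicas.size();      // the pool may have shrunk since the last call
        String host = replicas.get(next);
        next = (next + 1) % replicas.size();
        return host;
    }
}
```

Because removal only changes the pool, requests already in flight on other replicas are unaffected, which matches the maintenance scenario described above.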
- Fault Detection
- Faults are detected by throwing and
handling exceptions.
- Such exceptions fall into 3 categories:
- Application
- Caused by invalid user input; returned to the user as descriptive errors.
- Requests sent from invalid or inactive clients will
be returned to the user as a user error.
- Non-Fatal
- These are exceptions that occur as a result of a system
component failure.
- Such exceptions are not returned to the user, but are handled gracefully
using failover mechanisms.
- Network exceptions that clients receive from the
middle-tier servers are
considered Non-Fatal; clients will transparently fail over to a secondary server
using the same transaction ID.
- Exceptions that a middle-tier server receives while communicating with a
client will result in no action on the part of that server. If the client
detected the failure, it may retry the operation. The middle tier
must then detect and ignore the duplicate transaction (which may have
been routed to a secondary server). If the client does not retry
the operation, no corrective system action is
required.
- Fatal
- These are generally non-recoverable system faults which may require
a complete system shutdown and restart.
- These exceptions are returned to the user and
system administrator where possible, and may result in total system
failure.
- Examples include failures in the Database
backend, Replication Manager, etc.
- Exceptions received by the middle tier from the back-end
(database) server are considered Fatal and will be
reported to the client and system administrator.
- Exceptions received by the back-end (database)
server from the middle tier will result in a transaction rollback.
- The Replication Manager console also acts as a Fault Detector which
periodically polls all middle-tier servers
launched from it. If a poll
receives a Non-Fatal exception, the console will assume a
crash fault of that middle-tier server
and will automatically restart the failed
replica.
- If the client receives a Non-Fatal
exception, it will assume a crash fault
of the middle-tier server and will attempt to
transparently fail over to another middle-tier server.
Any Application exceptions received by the client are
considered to be caused by invalid user input, and are
indicated as such to the user. Any Fatal exceptions
received by the client indicate there is no opportunity
for failover, and are reported to the user to indicate that the
system is unavailable.
- Assuming fail-silent behavior, the system is
capable of handling any number of simultaneous or successive faults in the
middle-tier replicas, provided there is at least one middle-tier replica
that is running at all times.
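The console's polling behaviour can be sketched as follows. This is a minimal illustration under assumptions, not the project's code: `Pinger`, `FaultDetector`, and `pollOnce` are hypothetical names, and any exception from a poll is treated as a crash fault, in line with the fail-silent assumption above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical probe: throws if the replica on the given host cannot be reached. */
interface Pinger {
    void ping(String host) throws Exception;
}

/** Sketch of the console's Fault Detector: poll each replica, restart the ones that fail. */
class FaultDetector {
    private final List<String> launchedHosts;
    private final Pinger pinger;
    private final Consumer<String> restarter;   // hypothetical hook that relaunches a replica

    FaultDetector(List<String> launchedHosts, Pinger pinger, Consumer<String> restarter) {
        this.launchedHosts = launchedHosts;
        this.pinger = pinger;
        this.restarter = restarter;
    }

    /** Runs one polling pass and returns the hosts that were restarted. */
    List<String> pollOnce() {
        List<String> restarted = new ArrayList<>();
        for (String host : launchedHosts) {
            try {
                pinger.ping(host);
            } catch (Exception nonFatal) {
                // Any polling failure is assumed to be a crash fault of that replica.
                restarter.accept(host);
                restarted.add(host);
            }
        }
        return restarted;
    }
}
```

In a running console, `pollOnce` would be called on a timer; the polling period is not stated on this page.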
- Failover
- When a client
receives a Non-Fatal exception (indicating a middle-tier
server failure), it will transparently fail over
by contacting the Load Balancing Manager for another
middle-tier server and resubmitting its request. This
assumes the services of the back-end servers (such as the Load
Balancing Manager) are always available.
- The system is designed so that client communication with session beans is stateless and can be considered
idempotent. The only state retained for client requests is maintained
by entity beans which are stored in the back-end database. The state
of a client for a particular transaction is saved/updated only on successful
processing of a message. This should obviate the need for replica
checkpoints.
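The client-side handling of the three exception categories might be sketched like this. All type names here (`ApplicationException`, `NonFatalException`, `FatalException`, `MailChecker`, `ServerLocator`, `FailoverClient`) are hypothetical, and the `maxAttempts` bound is an assumption added so that the loop terminates; this page does not say how many times a client retries.

```java
/** Hypothetical exception types mirroring the three categories described above. */
class ApplicationException extends Exception {
    ApplicationException(String message) { super(message); }
}

class NonFatalException extends Exception {
    NonFatalException(String message) { super(message); }
}

class FatalException extends Exception {
    FatalException(String message, Throwable cause) { super(message, cause); }
}

/** Hypothetical remote interface of one middle-tier server. */
interface MailChecker {
    String check(String msgId, String message)
            throws ApplicationException, NonFatalException, FatalException;
}

/** Hypothetical client-side view of the Load Balancing Manager. */
interface ServerLocator {
    MailChecker next() throws FatalException;
}

/** Sketch of transparent failover: retry on another server with the same message ID. */
class FailoverClient {
    private final ServerLocator locator;
    private final int maxAttempts;   // assumed bound; not specified by the project page

    FailoverClient(ServerLocator locator, int maxAttempts) {
        this.locator = locator;
        this.maxAttempts = maxAttempts;
    }

    String submit(String msgId, String message) throws ApplicationException, FatalException {
        NonFatalException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            MailChecker server = locator.next();
            try {
                // Application and Fatal exceptions propagate to the caller unchanged.
                return server.check(msgId, message);
            } catch (NonFatalException e) {
                // Assume a crash fault and ask for another server, keeping the same msgId.
                last = e;
            }
        }
        throw new FatalException("no middle-tier server answered", last);
    }
}
```

Reusing the same message ID on each attempt is what lets the middle tier recognise a retried request as a duplicate.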
- Unique Identifiers
- Each client request will supply a unique
MsgID. This unique identifier permits
detection of duplicate transactions that may occur as a result
of a failover so that the middle-tier servers do not perform
the transaction twice.
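Duplicate detection keyed on the MsgID can be illustrated with the sketch below. It is a simplification under stated assumptions: the in-memory map stands in for the shared back-end database (a per-replica map would not see a retry that was routed to a different server), and `DuplicateFilter` and `process` are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch: perform the work for a given MsgID at most once. */
class DuplicateFilter {
    // Stand-in for a table in the shared back-end database, keyed by MsgID.
    private final Map<String, String> resultsByMsgId = new ConcurrentHashMap<>();

    /**
     * Scores the message the first time a MsgID is seen; a later request with
     * the same MsgID gets the stored result and the scorer is not run again.
     * The scorer must return a non-null result for the MsgID to be recorded.
     */
    String process(String msgId, String message, Function<String, String> scorer) {
        return resultsByMsgId.computeIfAbsent(msgId, id -> scorer.apply(message));
    }
}
```

Returning the stored result, rather than an error, keeps a retried request indistinguishable from a first attempt from the client's point of view; whether the project does this or simply ignores the duplicate is not specified here.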
- Fault Injection Techniques
- Inject a crash-fault using any of the following techniques:
- Use the console to shut down a replica gracefully
and remove it from the replica cluster.
- Use "repman" to shut down a replica, resulting in
the console restarting that replica.
- Use "kill" to kill a replica process, resulting
in the console restarting that replica.
- Disconnect the network cable or shut down a
machine running a replica, resulting in the console repeatedly attempting to
restart it unsuccessfully until the machine is restarted/re-connected.
- Run the "scripts/FaultInjector.pl" script to periodically kill
replicas
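The FaultInjector.pl script itself is not reproduced on this page. The sketch below is not a port of it, only a Java illustration of the same idea: pick one replica and kill it, then repeat on a timer. The names are hypothetical, and choosing the victim at random is an assumption.

```java
import java.util.List;
import java.util.Random;
import java.util.function.Consumer;

/** Hypothetical sketch of a crash-fault injector; not the project's Perl script. */
class FaultInjector {
    private final List<String> replicas;
    private final Consumer<String> killer;   // hypothetical hook, e.g. runs "kill" for that replica
    private final Random random;

    FaultInjector(List<String> replicas, Consumer<String> killer, Random random) {
        this.replicas = replicas;
        this.killer = killer;
        this.random = random;
    }

    /** Kills one replica chosen at random (an assumed policy) and returns its host. */
    String injectOnce() {
        if (replicas.isEmpty()) {
            throw new IllegalStateException("no replica to kill");
        }
        String victim = replicas.get(random.nextInt(replicas.size()));
        killer.accept(victim);
        return victim;
    }
}
```

Calling `injectOnce` on a fixed schedule gives the periodic behaviour; the console's Fault Detector is then expected to restart the killed replica.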
Scenarios/Interactions
- See Fault Tolerance Use Cases
- Current testing has used the following configuration:
- The PostgreSQL database server is always running on the same
machine.
- The console (Admin, Fault Detector, Load Balancer) machine may
be chosen at startup to be any one of the ECE cluster machines
(settlers, othello, or girltalk preferred). It is assumed that
JBoss has already been started on the machine which is to be the
console system before the console is
launched.
- The middle-tier cluster machines may be chosen shortly after
system startup (settlers, othello, or girltalk preferred).
- It is assumed that all machines in the ECE cluster
mount the same network paths. Thus the location of the JBoss
application and deployment directories is the same on each of the
systems.
Current Status
Fault tolerant performance graph
Fault tolerant performance data
Elapsed seconds | 1622
Seconds/message | 0.81
Messages/second | 1.23
Bytes processed | 11069358
Bytes/message | 5534.76
Bytes/second | 6824.62
Fault tolerant performance graph with fault injection
The FaultInjector.pl script was configured to kill one replica every 180 seconds,
or approximately every 200 messages.
Fault tolerant performance graph with multiple clients
Four machines were each running five clients. Every client processed 100 messages
for a total of 2000 messages. The time spent in the database grows substantially
when many clients are running simultaneously.
Downloads
Real-Time Fault-Tolerant Baseline Application
Scenarios/Interactions
Current Status
Downloads
High-Performance Real-Time Fault-Tolerant Baseline Application
Scenarios/Interactions
Current Status
Downloads
$Id: index.html,v 1.24 2004/04/19 22:07:40 gca Exp $