MEAD User Manual - Version 1.5
Software Usage Notice
By downloading this software the user agrees to accept this software entirely as is, with absolutely no warranty
whatsoever, expressed or implied. The responsibility for ensuring fitness and correctness for any purpose lies
entirely with the user. Questions about this document should be sent to
mead-support@lists.andrew.cmu.edu
If you face a specific problem with running or installing the MEAD system, please fill out the details of the problem
using the
Support Request Form
What is MEAD?
The Middleware for Embedded Adaptive Dependability (MEAD) infrastructure aims to enhance distributed real-time middleware applications with new capabilities including:
- Transparent, yet tunable, fault-tolerance in real time
- Proactive dependability
- Resource-aware system adaptation to crash, communication and timing faults
- Scalable and fast fault-detection and fault-recovery
The MEAD release currently comprises two components. The first component, a shared library, is the heart of
the replication framework and is essential for process replication. The shared library is loaded at runtime under a
CORBA application and provides transparent replication.
The second component, a user application, is responsible for resource monitoring of the distributed system and
provides automatic replica launching.
This application polls data from the MEAD Replication Library and from /proc on the host node where a
replica executes. In the future this application will provide hooks into the rest of the MEAD infrastructure.
The two components are referred to as the MEAD Replication Library (the replication framework) and the MEAD
Manager, respectively. More details on both can be found in the rest of this manual.
More information, including publications on the MEAD system, can be found at the
MEAD website.
Fault-Tolerance Fundamentals
This release of MEAD focuses on non-adaptive replication, i.e., the replication style does not currently change on the fly. Given the current state-of-the-art in ORBs, which are not completely stateless black-boxes, the unit of replication (also known as a replica) is effectively the process or the container, and not the individual object or the component. Note that replicating processes automatically replicates the components/objects hosted within those processes. The replication styles currently offered by MEAD are active replication and warm passive replication.
Active replication (see figure immediately below) allows multiple replicas, once spawned, to join a replica group. Each replica processes every invocation and sends its result back to the client. MEAD suppresses the duplicate responses, so the client believes it is communicating with a single server. Checkpoints (i.e., state snapshots) are triggered in active replication only when a new replica is launched and needs to synchronize its state with that of the other running replicas.
Warm passive replication (see figure immediately below) designates one replica to be the primary and the others to be backups, in the event of the primary's failure. Only the primary replica processes invocations from the client and responds to the client. All backups in the replica group periodically receive checkpoints from the primary that allow them to synchronize their respective states with that of the primary.
Both replication styles offer fault tolerance through redundancy. Active replication can be resource intensive
(because of increased CPU usage and bandwidth usage), but provides faster response and recovery times.
On the other hand, warm passive replication typically conserves resources, but the response time is bounded by the
network and processing speed of the primary replica. Also, increased bandwidth and CPU usage might result if the
size and frequency of checkpoints are sufficiently large. Warm passive replication's recovery times are also slower
because the recovery time includes the failover time from the crashed primary to a backup.
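The duplicate suppression used with active replication can be pictured with a small sketch. This is not MEAD's implementation; it only illustrates the idea that, when every replica answers, the first reply per request is delivered and later copies are dropped. The DuplicateFilter class and the numeric request identifier are assumptions made for this example.

```cpp
#include <cassert>
#include <set>

// Illustrative client-side filter (hypothetical, not part of MEAD).
// With active replication every replica replies to each request, so only
// the first reply for a given request should reach the application.
class DuplicateFilter {
public:
    // Returns true if this is the first reply seen for request_id (deliver
    // it), or false if it is a duplicate sent by another replica (drop it).
    bool accept(unsigned long request_id) {
        // std::set::insert reports whether the element was newly inserted.
        return seen_.insert(request_id).second;
    }

private:
    std::set<unsigned long> seen_;  // request ids already delivered
};
```

A real filter would also have to discard old identifiers so that the set does not grow without bound; that bookkeeping is left out here.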
Installation and Distribution
Release Notes
The following list outlines functionality and enhancements of the current and past MEAD releases as a change log:
- MEAD v1.5
- Resource monitoring integrated
- Automatic replica launch and bootstrapping
- 100% improvement in Active Replication client side round-trip latencies
- Several minor bug fixes and enhancements
- MEAD v1.1
- Added support for the Naming Service
- Added support for Spread Messages up to 80KB
- Improved RTT latency by ~600usec
- Several minor bug fixes and enhancements
- MEAD v1.0
- Initial MEAD release, replication framework only
- Support for Active or Warm Passive Server Replication
- Support for CORBA IOR files
We recommend using MEAD 1.5, which is a stable release.
Currently Supported Configurations
Emulab is the distributed test-bed of choice for the MEAD system. If you have never used Emulab before, please
refer to the Emulab "Getting Started" Tutorial.
More details to get MEAD running on Emulab can be found in the Running MEAD on Emulab section.
The current version of MEAD supports, and has been tested with, the following platforms, ORBs and third-party software.
Platforms | Red Hat Linux 9 (The BBN-RH9-SS7-8 image on Emulab) |
Compiler | g++ 3.2.2 |
Networks | 100Mbps Ethernet |
ORBs | TAO v 1.4.1 and ACE v 5.4.1 |
Third-Party Software | Spread group communication system v 3.17.1 |
MEAD uses the Spread group communication system for inter-object communication. More details can be found below under
the Running and Configuring Spread section of this manual.
Distribution
The directory containing the latest release of the MEAD software distribution is located on Emulab (ops.emulab.net) at
/groups/pces/uav_oep/mead_cmu/release/mead.
The current MEAD release comprises library code for the MEAD replication mechanism, the MEAD Manager
application, pre-built Spread libraries for Red Hat 9, and example scripts used to run the replication
infrastructure:
/MEAD ($MEAD_ROOT)
    /docs
    /examples
        /counter
        /stateless
        /namingservicestateless
    /manager
        /idl
        /src
    /replication
        /include
        /lib
        /obj
        /src
    /scripts
    /spreadlinuxbin
        /docs
        /include
Known Caveats and Recommendations
In the interests of fault containment and effective replication, we recommend hosting only one replica of a server
per node. While it is possible to host more than one replica of a server on a node, a single processor crash would then
take down several replicas at once. Running only one replica of a server process per node ensures that a node crash
affects at most one replica.
The current version of MEAD requires the CORBA application to be deterministic in behavior, i.e., any two replicas
of the server, when starting from the same initial state and receiving the same set of invocations in the same
order, should reach the same final state. This rules out the use of local timers, shared memory and any other OS
primitives that can lead to irreproducible behavior across different nodes in a distributed system. Determinism is
a common assumption in the development of fault-tolerant distributed systems; while ongoing development in the MEAD
project aims to eliminate the need for determinism, the current version of MEAD requires it of the application.
Finally, a few limitations of the MEAD Manager and Resource Monitoring Framework are listed in the Current
Limitations section.
Reporting Problems & Obtaining Support
If you face a specific problem with running or installing the MEAD system, please fill out the details of the problem
using the
Support Request Form.
For more general questions on MEAD, please email us at
mead-support@lists.andrew.cmu.edu
CORBA Application Requirements
In order to restore application-level state in the event of a server crash, the Fault-Tolerant CORBA standard
requires every CORBA object to support an additional Checkpointable interface, with methods for the retrieval and
assignment of application-level state. This interface is an abstract class and can be inherited by the application
when working with MEAD. Note that State can be defined in a custom way, based on the CORBA object's state. MEAD
simply invokes these methods in order to perform checkpointing and state transfer for both warm passive and active
replication. After implementing these functions the application programmer does not need to worry further about them
(except, of course, for updating the implementations of these methods should the application's definition of state
change). MEAD handles ORB-level and infrastructure-level state itself, so the application programmer does not need to
manage it.
exception NoStateAvailable{};
exception InvalidState{};
interface Checkpointable {
State get_state() raises(NoStateAvailable);
void set_state(in State s) raises(InvalidState);
};
When developing CORBA applications for MEAD, it is necessary to implement get_state() and set_state() at the object
level, and get_global_state() and set_global_state() for void pointer data at the application level (extern functions).
At the application level, void pointers are used to offer a less restrictive implementation. The state data
can be any type of structure containing all of the current state for the process as well as the CORBA object
state from above. The following C++ declarations (with C linkage) declare the external functions get_global_state() and
set_global_state():
extern "C" void *get_global_state (int *size);
extern "C" void set_global_state (void * state, int size);
The sample program "counter" ($MEAD_ROOT/examples/counter) contains simple usage of both the
Checkpointable interface and the globally extern'd functions.
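As a rough illustration of the application-level functions, the sketch below snapshots and restores a small state structure. It is not taken from the counter example: the AppState structure and its fields are hypothetical, and the buffer ownership rule (the caller releases the snapshot with free()) is an assumption that should be checked against the counter sources.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical application state; the real layout is up to the application.
struct AppState {
    long request_count;
    int  last_client_id;
};

static AppState g_state = {0, 0};

// Returns a heap-allocated copy of the process state and reports its size in
// bytes through *size. ASSUMPTION: the caller releases the buffer with free().
extern "C" void *get_global_state(int *size) {
    assert(size != NULL);
    void *snapshot = std::malloc(sizeof(AppState));
    if (snapshot == NULL) {
        *size = 0;  // no state available
        return NULL;
    }
    std::memcpy(snapshot, &g_state, sizeof(AppState));
    *size = static_cast<int>(sizeof(AppState));
    return snapshot;
}

// Overwrites the process state with a snapshot produced by get_global_state().
// Snapshots of an unexpected size are ignored rather than partially applied.
extern "C" void set_global_state(void *state, int size) {
    if (state == NULL || size != static_cast<int>(sizeof(AppState))) {
        return;
    }
    std::memcpy(&g_state, state, sizeof(AppState));
}
```

Copying a flat structure only works when the state holds no pointers; state that owns heap data would need to be serialized field by field into the buffer instead.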
Building MEAD
The MEAD library (libmead.so) requires the Spread group communication system (GCS) for communication; more details
can be found in the Running and Configuring Spread section of this manual. Spread must be installed and correctly
configured prior to building MEAD. More details can be found at the Spread website. The example programs require
ACE/TAO to be installed; details can be found at the TAO website.
To build all of the MEAD release and sample programs, type the following commands at the MEAD root to run a recursive make:
- source source_this_for_Emulab
- This will establish the $MEAD_ROOT environmental variable
- make
The replication makefile makes use of the Spread library (libspread.so) found in the spreadlinuxbin directory. If you
are not using Red Hat 9 it will be necessary to install and configure Spread on your target platform. For convenience
this MEAD distribution includes pre-built Spread binaries in the spreadlinuxbin directory.
Make can be executed individually in the $MEAD_ROOT/manager and $MEAD_ROOT/replication directories to compile just
the MEAD components.
Finally, both the sample CORBA applications as well as the MEAD Replication Library makefiles can compile the target
application in debug mode. This will enable portions of the inner workings of the MEAD Replication Library or CORBA
applications to be viewed from the standard output.
To Enable Debugging
To enable debugging of the Replication Library, look for the CPPFLAGS variables and values in its makefile; they are
commented out by default. Two values can be used for runtime debugging: -DDEBUG and -DTRACE. -DDEBUG enables various
print statements containing runtime values, while -DTRACE enables a full function trace of the MEAD Replication
Library's execution.
Uncomment the values and recompile to view debugging information.
Makefiles for the example programs also contain debugging flags. Look for and uncomment the mINCLUDES variable
that contains the -DDEBUG value.
To Enable the Resource Monitoring Framework
To enable resource monitoring, look for the CPPFLAGS in the $MEAD_ROOT/replication makefile and uncomment the
-DMONITOR flag. More details can be found in the section MEAD Manager and Resource Monitoring Framework.
MEAD Manager and Resource Monitoring Framework
Overview
The MEAD Manager is a run-time component of MEAD that performs automatic replica launching to 1) bootstrap the
fault-tolerant system and to 2) recover from process and processor crash faults. The Manager’s distributed resource
monitoring framework collects and broadcasts detailed resource usage statistics for each replica and node in the
system. The Manager connects to the MEAD Replication Library to perform event-triggered fault detection and
lightweight accounting of network traffic produced and received per process. System data and events are broadcast
over Spread, enabling remote system monitoring and management. The section of the following diagram labeled
"Now Available" incorporates all functional portions of the MEAD 1.5 release.
Architecture
The MEAD Manager process runs on each node in the distributed system and communicates with other Managers through
Spread. On launch, the Manager reads the mead.conf configuration file. The mead.conf file describes all programs to
run and monitor. Processes can be launched during the Manager’s startup or be triggered remotely through the
communication group. When launch is signaled on a host, the manager uses the program options to build a command line
and forks and execs a new process. The process is LD_PRELOADed with the MEAD Replication Library. When the process
launches, the MEAD Replication Library opens a socket (defined in environmental variable MANAGER_PORT) to the MEAD Manager that
is used to collect network data and to detect process failures through socket closures. The MEAD Manager
periodically collects resource data from the process and from the kernel through /proc, then packages and broadcasts
the data to the rest of the system. Much of the MEAD Manager’s functionality is customizable through the mead.conf
file including the resource data broadcast frequency. The MEAD Manager can also be configured to launch the spread
daemon on startup, simplifying system bootstrapping.
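The launch path described above (build a command line from the program options, then fork and exec with the library preloaded) can be sketched as follows. This is a minimal illustration and not the Manager's source: the function names split_args and launch_service are hypothetical, and only standard POSIX calls (fork, setenv, execv) are used.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Split a whitespace-delimited argument string into words, preserving order.
std::vector<std::string> split_args(const std::string &args) {
    std::vector<std::string> words;
    std::istringstream in(args);
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}

// Fork and exec `path` with `args`. If `preload` is non-empty, the child sets
// LD_PRELOAD to it before exec. Returns the child's pid, or -1 if fork fails.
pid_t launch_service(const std::string &path, const std::string &args,
                     const std::string &preload) {
    std::vector<std::string> words = split_args(args);
    std::vector<char *> argv;
    argv.push_back(const_cast<char *>(path.c_str()));
    for (std::size_t i = 0; i < words.size(); ++i) {
        argv.push_back(const_cast<char *>(words[i].c_str()));
    }
    argv.push_back(NULL);  // execv expects a NULL-terminated array

    pid_t pid = fork();
    if (pid == 0) {
        // Child: set the preload library only for the new program.
        if (!preload.empty()) {
            setenv("LD_PRELOAD", preload.c_str(), 1);
        }
        execv(path.c_str(), &argv[0]);
        _exit(127);  // reached only if exec failed
    }
    return pid;  // parent: child's pid, or -1 on fork failure
}
```

Setting LD_PRELOAD in the child after the fork keeps the library out of the launcher's own environment, so only the launched replica is intercepted.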
Configuration
The MEAD Manager is located in the $MEAD_ROOT/manager folder of the MEAD distribution. The following steps need to
be performed to execute the MEAD Manager:
- Source MEAD’s environment on each node of your experiment
  - In the root of the MEAD folder:
    % source source_this_for_Emulab
- Make the MEAD Replication Library with -DMONITOR enabled
  - Open $MEAD_ROOT/replication/Makefile and search for -DMONITOR.
    Make sure this line is uncommented, save the changes, and re-make the replication library.
- Run a top-level make at $MEAD_ROOT
  - This will build the example programs, the MEAD Replication Library, and the MEAD Manager
- Execute Spread on each node of your experiment
  - Details can be found in the Running and Configuring Spread section
- Configure the mead.conf file
  - Details can be found in The mead.conf File section
- Execute the MEAD Manager on each node of your experiment
  - % ./manager
  - The Manager display will update as connections are established with other Managers currently running in
    the Spread Segment.
  - The MEAD Manager will launch the applications configured in the mead.conf file, or wait for applications
    launched manually with the MEAD Replication Library to make connections to the Manager.
  - Applications launched manually with the MEAD Replication Library must have the environmental variable
    MANAGER_PORT set to the localhost port where the MEAD Manager listens; by default this is 11051.
  - After the applications are launched, watch the Manager display for resource updates, service details, and node
    states. Within a few seconds, the Managers should recognize each other and the interface will update
    accordingly.
The mead.conf File
The $MEAD_ROOT/manager/mead.conf file is used to customize the MEAD Manager’s behavior. The mead.conf file is
self-documenting and contains detailed instructions on the configuration of all available options and parameters.
The mead.conf file defines five main tags:
Tag | Description |
<managerHome> | Manager home directory; usually set to $PWD |
<stateBroadcastInterval> | Broadcast interval in milliseconds |
<spreadPath> | Spread path; if uncommented, the MEAD Manager will attempt to launch Spread during startup. If set to run Spread, $MEAD_ROOT/manager/spread.conf must be present and configured for the Spread Segment. For most purposes the spread.conf file in the manager directory will be identical to the spread.conf.emulab file in the scripts directory. Details on configuring the spread.conf file can be found in the Running and Configuring Spread section. |
<spreadPort> | Spread port; default is 6011. Should be the same as the port defined in $MEAD_ROOT/manager/spread.conf |
<service> | Defines an application to manage. Services can be launched during the Manager’s startup (bootstrapping). Services can also be set to re-launch if they crash. All services are monitored. More details of the <service> tag are found in the Service Tags section. |
Service Tags
Create a <service> section for each program that the MEAD Manager should monitor and/or launch. The following table
details the parameters of the <service> tag:
Name | Description | Status |
name | Service name | Mandatory |
path | Path to executable | Mandatory |
host | Host node to execute on. Setting this to “auto” will not automatically launch the process; instead the Manager will wait for the process to be launched either manually from the console or via Spread commands. | Mandatory |
arg | Executable arguments (command line arguments). All command line arguments are white space delimited and are inserted into the command line in order. | Optional |
env | Executable environment | Optional |
preload | Path of the library for LD_PRELOAD | Optional |
window | Launch process in a separate x-term. The output of programs that are not launched in a new shell is piped to a file in the MEAD/manager/output directory. | Optional |
monitor | Preload monitor library | Deprecated |
relaunch | Automatically relaunch replica on failure | Optional |
repStyle | MEAD Replication Library replication style | Not Available |
checkpointingInterval | MEAD Replication Library WARM_PASSIVE checkpointing interval | Not Available |
faultDetectionInterval | Socket based fault detection interval | Not Available |
Interfacing
Any program that connects to Spread can subscribe to the MEAD Manager’s Spread group to receive resource data
broadcasts or to launch and kill processes. The Spread group is named meadManagers, and the message types are defined
in the file $MEAD_ROOT/manager/src/meadMsgTypes.h. You can use the SpreadConnection wrapper class defined in
$MEAD_ROOT/manager/src/SpreadConnection.h to simplify connecting to Spread and receiving and sending Spread messages.
Current Limitations
As of the MEAD 1.5 release, the following limitations are acknowledged and we are actively looking for solutions:
- A SIGINT handler is currently installed by the MEAD Replication Library when compiled with -DMONITOR. This will
overwrite any user SIGINT handler.
- The MEAD Manager should only be used for server replicas in most cases. SIGIO is used for asynchronous socket
communication and seems to cause a conflict with applications based on user input (e.g., console-based clients).
- Due to the tight communication model between the MEAD Replication Library and the MEAD Manager, killing a MEAD
Manager may kill the replicas currently being monitored.
Running and Configuring Spread
Spread Overview
Spread is a group communication system originally developed at Johns Hopkins University and currently developed by
Spread Concepts LLC and the CNDS at Johns Hopkins University. Additional information can be found at the
Spread website as well as in the Spread User Manual.
The Spread daemon can be launched using this command:
% spread -c <spread.config.file>
Spread Segment
The Spread configuration file contains options for the Spread daemon, and comments on the configuration of each
option. MEAD is mostly concerned with the Spread_Segment { }, which contains a list of
<hostname> <IP> pairs for each node in the cluster, as well as a broadcast address and port for execution.
The following is a sample configuration file used in the Emulab environment. The broadcast address can differ in
range (i.e., in the first three dotted-decimal places) from the IP addresses of the three nodes; this is specific to
the Emulab environment where the nodes seem to be multi-homed. The broadcast is for local connectivity and the
individual host IPs are actual addresses that can be resolved for the machine.
Spread_Segment BroadcastAddress:SpreadPortNumber {
node0 node0_IP_address
node1 node1_IP_address
node2 node2_IP_address
: :
}
Finally, only use the computer name for the <Hostname> parameter and not the fully qualified name for the computer (e.g. use node0 not node0.test.pces.emulab.net).
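To make the segment layout concrete, the following sketch renders a Spread_Segment block from a list of (hostname, IP) pairs and strips any domain suffix from each hostname, as recommended above. The helper names short_hostname and render_spread_segment are hypothetical and are not part of MEAD or Spread; the indentation inside the braces is a cosmetic choice.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// (hostname, IP address) pair for one node in the segment.
typedef std::pair<std::string, std::string> HostEntry;

// Strip any domain suffix: "node0.test.pces.emulab.net" becomes "node0".
// A name without a dot is returned unchanged.
std::string short_hostname(const std::string &host) {
    return host.substr(0, host.find('.'));
}

// Render a Spread_Segment block for the given broadcast address, port and
// nodes, using short hostnames only.
std::string render_spread_segment(const std::string &broadcast, int port,
                                  const std::vector<HostEntry> &hosts) {
    std::ostringstream out;
    out << "Spread_Segment " << broadcast << ":" << port << " {\n";
    for (std::size_t i = 0; i < hosts.size(); ++i) {
        out << "    " << short_hostname(hosts[i].first) << " "
            << hosts[i].second << "\n";
    }
    out << "}\n";
    return out.str();
}
```

The returned text can be written into the Spread configuration file that is passed to the daemon with spread -c.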
Spread Timeouts
When compiling the Spread library and the Spread daemon, there is a series of timeouts that can be configured at
compile time. These can significantly affect the performance of both Spread and MEAD. More details
can be found in the Spread User Manual.
Spread Related Errors returned by MEAD
The following lists a few common errors related to the configuration of Spread that are returned from the MEAD library;
more details can be found at:
http://www.spread.org/docs/docspread.html.
When Spread is launched from the Spreadstart script in the $MEAD_ROOT/scripts folder, errors are generally masked by
the script. To troubleshoot the issue, launch Spread at the command line using the spread -c command shown in the
Spread Overview section.
- MEAD: Could not connect to SPREAD daemon! (error=-2)
  - The Spread_Segment may not have been set up properly to include the <Hostname> <IP> pair for the host where the
    current application resides.
  - The port used in the Spread_Segment may be different from the port used in the MEAD library.
Running MEAD on Emulab
The steps below detail the execution of the MEAD Replication Library using example bash scripts provided in the
$MEAD_ROOT/scripts directory. This run will replicate three servers in either the active or the warm passive replication
style and will use a single client to make requests of the three-way replicated server. An additional, optional step
includes the MEAD Manager for resource monitoring. These bash scripts set the environmental variables necessary to
execute; feel free to open them up and look at how the variables are used.
The primary test-bed for MEAD software is Emulab (BBN-RH9-SS7-8 Emulab image).
Some of the following steps need to be executed on every assigned Emulab node (the step is prefaced by [every node]),
while others need to be executed on only one assigned Emulab node (the step is prefaced by [any node]).
The symbol % represents the shell command-line prompt.
- Setting up the Emulab Experiment
- Enter the following values in the "Begin a Testbed Experiment" form on Emulab
- Select Project: pces (If this choice is not available to you, please contact us).
- Group: pces/uav_oep
- Description: Replication with MEAD
- Your NS file: Please download the sample NS file that we have provided in order to simplify the configuration of the
topology for this experimental run. You can edit this file for future runs.
- The topology consists of four nodes, one 100Mbps LAN; three nodes will be used to host three individual server replicas, and the fourth node will be used for the client application.
- Use the OSID BBN-RH9-SS7-8 for each of the nodes when setting up your experiment with Emulab.
- Wait to receive an email from the Emulab Testbed Operations team (notifying you of the availability of your requested experimental configuration) before proceeding further.
- Once you receive the notification about your experiment being started, ssh into the machines assigned to you, according to the instructions on Emulab.
- [any node] Obtaining the MEAD source distribution
- The MEAD distribution is available on users.emulab.net in the directory: /groups/pces/uav_oep/mead_cmu/release/mead/
- Use cp (or scp) to copy the MEAD source to your working directory (for convenience, copy the tarball from the root of the release folder).
- If you copied the .tar.gz file, then, you should do the following to extract the MEAD directory:
% tar -zxf mead-1.0.tar.gz
A directory called MEAD is created in the process of extracting the .tar.gz file.
- [every node] Obtaining the right environmental settings
- Enter the MEAD directory and obtain the right environmental settings for your binary and library
search-paths:
% source source_this_for_Emulab
- [any node] Building the MEAD library and the test applications
- Enter the MEAD root and type make to run a recursive make, e.g.,
% cd $MEAD_ROOT
% make
- Setting up the Spread configuration
- Find out the following information for your assigned Emulab nodes: each node's IP address, each node's hostname, and the broadcast address.
- [any node] The Emulab machines appear to be multi-homed (five
interfaces per node). To see this for yourself, try the following:
% ifconfig -a
You will see interfaces eth0 through eth4 defined.
- [any node] For the broadcast address, look at the output of:
% ifconfig eth3
and replace the last part of the dotted-decimal IP address with 255, e.g., if you see an IP address of
10.1.1.4, the broadcast address is 10.1.1.255
- [every node] For each node's IP address, look at the output of:
% hostname -i
to obtain the IP address associated with the interface.
- [every node] For each node's hostname, look at the output of:
% hostname
to obtain the qualified hostname associated with the interface. You will only need the first part of this name, e.g., for node8.*.pces.emulab.net, you only need to remember node8.
- [any node] Edit the file MEAD/scripts/spread.conf.emulab to specify the set
of Emulab nodes that you are using for your experiment.
- Edit the Spread_Segment by listing the <hostname> <ip> pairs within the braces.
- The default Spread port number is 6011, but can be changed; however, the new value of the port
must be identical in two places: (i) the Emulab configuration file
MEAD/scripts/spread.conf.emulab, and (ii) the SPREAD_PORT environment variable supplied to the
application (and, indirectly, to MEAD) at run-time.
- Here is an example for a broadcast address of 10.1.1.255, a Spread port number of 6011 and four
nodes (node6, node7, node8 and node9) with their respective IP addresses:
Spread_Segment 10.1.1.255:6011 {
node6 155.101.132.92
node7 155.101.132.93
node8 155.101.132.94
node9 155.101.132.95
}
- [every node] Running Spread
- Launch the Spread daemon (spread) on each host that will run either a server or a client. The Spread
daemon can be launched from the MEAD root directory, as follows:
% cd MEAD/scripts
% ./Spreadstart
- [every node – OPTIONAL STEP] Launch Manager
- If the MEAD Replication Library was built with the -DMONITOR flag active, applications launched
in step 8 (the test application scripts below)
will attempt to connect to the MEAD Manager. A MEAD Manager, if executing, will track the
resource usage of the host and the replica. If the Manager is not executing, the new replica will report a failed
connection attempt, but processing will continue. To launch a Manager on each host,
follow these steps:
- Open a separate terminal on a host that currently has the Spread daemon executing
- % cd $MEAD_ROOT
- % source source_this_for_Emulab
- % cd $MEAD_ROOT/manager
- Configure the mead.conf file for the resource monitoring interval, a custom Spread port, and a
<service> entry for the process. The service entry's host should be set to "auto".
More details can be found in The mead.conf File section.
- % ./manager
- Keep the terminals open to view the resource usage when executing the scripts in step 8.
- There are six scripts that demonstrate the execution of the three provided test applications. All of the
scripts are found in the MEAD/scripts directory, and display usage information. The scripts will not launch if
the Spread daemon is not running. The scripts provide settings for most of the MEAD environment variables. If
you wish to bypass the scripts and run processes directly at the command-line, please read the
MEAD Replication Library Parameters section of this document.
Choose one of the three applications below (counter, stateless, or namingservicestateless) and launch
one server replica on each of the three nodes of your cluster. Finally, launch the client as instructed below.
Servers (replicas) can be killed at any time using Control-C. Launching a server script will add a new server
replica to the group, initialize its state, and then either start normal processing (for active replication) or
state transfer (for warm passive replication).
counter is a "stateful application", where the server maintains the count of user requests, and implements the
Checkpointable interface for the CORBA objects and exports global state for the actual application.
- To run a three-way actively replicated version of this test application:
% cd $MEAD_ROOT/scripts
% ./counterServer_run SVR ACTIVE
% ./counterClient_run CLNT SVR
- To run a three-way warm passively replicated version of this test application:
% cd $MEAD_ROOT/scripts
% ./counterServer_run SVR WARM_PASSIVE
% ./counterClient_run CLNT SVR
stateless is a "stateless application" that sends a string containing the process identifier of the
process and the current number of requests that the object has serviced.
- To run a three-way actively replicated version of this test application:
% cd $MEAD_ROOT/scripts
% ./statelessServer_run SVR ACTIVE
% ./statelessClient_run CLNT SVR
- To run a three-way warm passively replicated version of this test application:
% cd $MEAD_ROOT/scripts
% ./statelessServer_run SVR WARM_PASSIVE
% ./statelessClient_run CLNT SVR
namingservicestateless is the same stateless application as above, but it makes use of the Naming
Service to acquire references to the CORBA objects. In addition to launching the scripts for the
application, the Naming Service must be started. The port that the Naming Service is running on must be
allowed to pass through the MEAD library. This is done by setting the environmental variables for the Naming Service
host and port when launching MEAD. Use NSstart and NSkill to start and stop the Naming Service.
- To run a three-way actively replicated version of this test application:
% cd $MEAD_ROOT/scripts
% ./statelessServer_run SVR ACTIVE
% ./statelessClient_run CLNT SVR
- To run a three-way warm passively replicated version of this test application:
% cd $MEAD_ROOT/scripts
% ./statelessServer_run SVR WARM_PASSIVE
% ./statelessClient_run CLNT SVR
After the server replicas are done processing invocations from the client, systematically kill (^C)
the server replicas, until the system is down to the last remaining replica. Then, bring the other
two server replicas back up. Finally kill the previously remaining server replica. The system will
still continue to function, due to MEAD's fault-tolerant support.
- [every node] Cleaning Up
- Kill the Spread daemon (at the end of all your experiments, and not before, or during, your
experiments):
% cd MEAD/scripts
% ./killSpread
Must-Read Notes
Some common things to watch out for when running MEAD:
- The source_this_for_Emulab file must be sourced only from the $MEAD_ROOT directory, where it is located.
- The Spread daemon must be running on every node on which you expect to run clients or servers using MEAD.
- The client group id (CLNT) and the server group id (SVR) should not be the same for a process; these identifiers represent the client and server object group names, respectively, and should be distinct.
- The same Spread group id (e.g. SVR) should only be used once per host.
- When replicating a server, ensure that all of the replicas of the server are launched with the same object group identifier and the same replication style, i.e., within the same group, all of the replicas must have the same GID and the same REPLICATION_STYLE.
- For example, when launching an actively replicated server, use the same command on all three server machines (./counterServer_run SVR ACTIVE).
- Pure clients should always be launched using ACTIVE replication style in order to enable duplicate suppression
(the client scripts in the $MEAD_ROOT/scripts directory automatically take care of this), regardless of the
server's replication style.
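Some of the rules above can be checked before launching anything. The sketch below validates a client/server group id pair: each id must be well formed and the two must differ. The format check uses the 3 to 8 alphanumeric bytes given for GID in the MEAD Replication Library Parameters table of this manual; the function names is_valid_gid and is_valid_pair are hypothetical and are not part of MEAD.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True if `gid` is an alphanumeric string of 3 to 8 bytes.
bool is_valid_gid(const std::string &gid) {
    if (gid.size() < 3 || gid.size() > 8) {
        return false;
    }
    for (std::size_t i = 0; i < gid.size(); ++i) {
        // Cast to unsigned char: isalnum is undefined for negative values.
        if (!std::isalnum(static_cast<unsigned char>(gid[i]))) {
            return false;
        }
    }
    return true;
}

// True if the client and server group ids are both well formed and distinct.
bool is_valid_pair(const std::string &client_gid,
                   const std::string &server_gid) {
    return is_valid_gid(client_gid) && is_valid_gid(server_gid) &&
           client_gid != server_gid;
}
```

A launch script wrapper could call is_valid_pair("CLNT", "SVR") before starting a client and refuse to continue when it returns false.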
Using the Naming Service
As of the MEAD 1.1 release, MEAD supports running with the CORBA Naming Service. To enable the MEAD Replication Library
to work with CORBA applications that use the Naming Service, the following environment variables must be set:
USING_NS (set to yes), NS_HOST and NS_PORT.
For example, when launching a simple application, try the following:
% env LD_PRELOAD=$MEAD_ROOT/replication/libmead.so.1 \
REPLICATION_STYLE=ACTIVE \
GID=MYCLNT \
SERVER_ID=MYSVR \
IS_STATELESS=yes \
SPREAD_PORT=6011 \
MANAGER_PORT=11051 \
USING_NS=yes \
NS_HOST=10.1.1.4 \
NS_PORT=6012 \
IS_SERVER=no \
$MEAD_ROOT/examples/counter/client iorfile time
Note that USING_NS, NS_HOST and NS_PORT are all set. Both NS_HOST and NS_PORT are required for the
MEAD Replication Library to work with the Naming Service.
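The dependency between USING_NS, NS_HOST and NS_PORT can be verified before launch. This is a sketch only: check_ns_env is a hypothetical helper, not part of MEAD, and it reads the three variables from the current shell.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with MEAD: verifies the Naming Service
# settings in the current shell before an application is launched.
check_ns_env() {
    # Nothing to check unless the Naming Service is in use.
    [ "${USING_NS:-no}" = "yes" ] || return 0

    # USING_NS=yes requires both the host and the port to be set.
    if [ -z "${NS_HOST:-}" ] || [ -z "${NS_PORT:-}" ]; then
        echo "error: USING_NS=yes requires both NS_HOST and NS_PORT" >&2
        return 1
    fi
    return 0
}
```

Calling check_ns_env just before the env command above would catch a missing NS_HOST or NS_PORT early.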
MEAD Replication Library Parameters
The MEAD Replication Library is configured through several environment variables. The following table lists the
name, description and acceptable values of each variable.
| Name | Description | Value |
|---|---|---|
| LD_PRELOAD | Tells the run-time linker and loader to load the listed libraries first. Note that the library will be loaded for all applications in that environment or process address space. | libmead.so.1 |
| REPORT_CALLS | Filters the debugging messages returned by the MEAD library. The parameter is a string value that includes the names of intercepted system call functions. Only used when the MEAD Replication Library is compiled with the -DREPORT option. | all, socket, read, writev, bind, select, connect, accept, etc. |
| REPLICATION_STYLE | Sets the replication style for the application that is being launched. | ACTIVE or WARM_PASSIVE |
| GID | Object group name. | Alphanumeric string, 3 to 8 bytes long |
| SERVER_ID | Object group name of a server that a client application will connect to. | Alphanumeric string, 3 to 8 bytes long |
| SERVER2_ID | Object group name of another type of server that the client can connect to. | Alphanumeric string, 3 to 8 bytes long |
| IS_STATELESS | Lets MEAD know that the application is stateless. | yes/no |
| SPREAD_PORT | Port number on which MEAD expects the Spread daemon to be running. | Dynamic port range only |
| USING_NS | Lets MEAD know that the Naming Service is being used. If set to 'yes', NS_HOST and NS_PORT must also be set. | yes/no |
| NS_HOST | IP address of the host that the Naming Service is running on. Intercepted calls to this IP and port combination will be allowed to pass through the MEAD interceptor. | IP (xxx.xxx.xxx.xxx) |
| NS_PORT | Port number that MEAD will bypass during the interception process. | Dynamic port range only |
| MANAGER_PORT | MEAD Resource Manager port. This port allows a connection to be established between a MEAD replica and the MEAD Resource Manager. | Numeric value; default is 11051 |
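Some of the acceptable values listed above can be checked mechanically before launching a replica. The sketch below assumes ASCII group names; valid_group_id and valid_replication_style are hypothetical helpers that are not part of MEAD.

```shell
#!/bin/sh
# Hypothetical validators, not shipped with MEAD, mirroring the Value column
# for GID/SERVER_ID/SERVER2_ID and REPLICATION_STYLE.

# Succeeds if $1 is an alphanumeric string of 3 to 8 characters.
valid_group_id() {
    case $1 in
        *[!A-Za-z0-9]*) return 1 ;;   # contains a non-alphanumeric character
    esac
    [ "${#1}" -ge 3 ] && [ "${#1}" -le 8 ]
}

# Succeeds if $1 is one of the two replication styles MEAD currently offers.
valid_replication_style() {
    case $1 in
        ACTIVE|WARM_PASSIVE) return 0 ;;
        *) return 1 ;;
    esac
}
```

A launch script could call valid_group_id "$GID" and valid_replication_style "$REPLICATION_STYLE" and refuse to start the application if either one fails.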
Other Configurations
MEAD will run in other environments that have configured, functional copies of both ACE/TAO and Spread. The examples
above should also work, provided that:
- The environment's paths are set correctly for both the build-time and the run-time libraries.
- The MEAD makefiles for both the MEAD Replication Library and the sample applications are modified to include the
proper paths to libraries and include files.
- The MEAD Replication Library is rebuilt using the version of Spread for the intended platform.
- The sample CORBA applications are rebuilt using the version of ACE/TAO for the intended platform.
- All of the scripts in the scripts sub-directory are changed to include the proper path for Spread.
Trouble-shooting
| Problem | Description/Possible Solution |
|---|---|
| LD_PRELOAD is not set | In most cases the CORBA application will still function, but without the replication mechanism. |
| Spread is not running | MEAD will return the following error message: Could not connect to SPREAD daemon! (error=-2). The application will not actually launch. |
| Environment variables are not correct | Critical environment variables are checked during initialization. MEAD will return error messages stating which variables are not set. |
| LD_LIBRARY_PATH does not include the right libraries | The run-time linker and loader will return its standard errors. In most cases they include the name of the library whose path has not been set properly. |
| Compiled with -DMONITOR but the manager is not running | A minor error indicating a failed connection will be returned. The MEAD Replication Library will still function for the duration of execution, but resource monitoring will not be performed. |
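The first row of the table can be checked before launching an application. This is a sketch under two assumptions: check_preload is a hypothetical helper that is not part of MEAD, and the LD_PRELOAD value holds a single library path, which is passed as the argument.

```shell
#!/bin/sh
# Hypothetical diagnostic, not shipped with MEAD.
# Usage: check_preload "$LD_PRELOAD"
# Assumes the value holds a single library path.
check_preload() {
    if [ -z "$1" ]; then
        echo "LD_PRELOAD is not set: the application would run without replication"
        return 1
    fi
    if [ ! -r "$1" ]; then
        echo "LD_PRELOAD names an unreadable file: $1"
        return 1
    fi
    echo "LD_PRELOAD ok: $1"
}
```

A typical call would be check_preload "$LD_PRELOAD" in the same shell that is about to launch the replica.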
Contributors
Contributors to MEAD include:
- Priya Narasimhan
- Tudor Dumitras
- Aaron Paulos
- Soila Pertet
- Charlie Reverte
- Joe Slember
- Deepti Srivastava