MEAD User Manual - Version 1.5

Software Usage Notice

By downloading this software the user agrees to accept this software entirely as is, with absolutely no warranty whatsoever, expressed or implied. The responsibility for ensuring fitness and correctness for any purpose lies entirely with the user. Send questions about this document to mead-support@lists.andrew.cmu.edu.

If you face a specific problem with running or installing the MEAD system, please fill out the details of the problem using the Support Request Form.

What is MEAD?

The Middleware for Embedded Adaptive Dependability (MEAD) infrastructure aims to enhance distributed real-time middleware applications with new capabilities including:

Transparent, yet tunable, fault-tolerance in real time
Proactive dependability
Resource-aware system adaptation to crash, communication and timing faults
Scalable and fast fault-detection and fault-recovery

The MEAD release currently comprises two components. The first component, a shared library, is the heart of the replication framework and is essential for process replication. The shared library is loaded at runtime under a CORBA application and provides transparent replication. The second component, a user application, is responsible for resource monitoring of the distributed system and provides automatic replica launching. This application polls data from the MEAD Replication Library and from /proc on the host node where a replica executes. In the future this application will provide hooks into the rest of the MEAD infrastructure. More details on these two components can be found later in this manual. The components are referred to as the MEAD Replication Library (the replication framework) and the MEAD Manager, respectively.

More information, including publications on the MEAD system, can be found at the MEAD website.

Fault-Tolerance Fundamentals

This release of MEAD focuses on non-adaptive replication, i.e., the replication style does not currently change on the fly. Given the current state-of-the-art in ORBs, which are not completely stateless black-boxes, the unit of replication (also known as a replica) is effectively the process or the container, and not the individual object or the component. Note that replicating processes automatically replicates the components/objects hosted within those processes. The replication styles currently offered by MEAD are active replication and warm passive replication.

Active replication (see figure immediately below) allows multiple replicas, when spawned, to join a replica group. Each replica processes every invocation and sends its results back to the client. MEAD suppresses the duplicate responses, so the client believes it is communicating with a single server. Checkpoints (i.e., state snapshots) are triggered in active replication only when a new replica is launched and needs to synchronize its state with that of the other running replicas.

Warm passive replication (see figure immediately below) designates one replica to be the primary and the others to be backups, in the event of the primary's failure. Only the primary replica processes invocations from the client and responds to the client. All backups in the replica group periodically receive checkpoints from the primary that allow them to synchronize their respective states with that of the primary.

Both replication styles offer fault tolerance through redundancy. Active replication can be resource intensive (because of increased CPU usage and bandwidth usage), but provides faster response and recovery times. On the other hand, warm passive replication typically conserves resources, but the response time is bounded by the network and processing speed of the primary replica. Also, bandwidth and CPU usage might increase if checkpoints are sufficiently large or frequent. Warm passive replication's recovery times are also slower because the recovery time includes the failover time from the crashed primary to a backup.

Installation and Distribution

Release Notes

The following list outlines functionality and enhancements of the current and past MEAD releases as a change log:

MEAD v1.5
- Resource monitoring integrated
- Automatic replica launch and bootstrapping
- 100% improvement in Active Replication client side round-trip latencies
- Several minor bug fixes and enhancements
MEAD v1.1
- Added support for the Naming Service
- Added support for Spread Messages up to 80KB
- Improved RTT latency by ~600usec
- Several minor bug fixes and enhancements
MEAD v1.0
- Initial MEAD release, replication framework only
- Support for Active or Warm Passive Server Replication
- Support for CORBA IOR files

We recommend using MEAD 1.5, which is a stable release.

Currently Supported Configurations

Emulab is the distributed test-bed of choice for the MEAD system. If you have never used Emulab before, please refer to the Emulab "Getting Started" Tutorial. More details to get MEAD running on Emulab can be found in the Running MEAD on Emulab section.

The current version of MEAD supports, and has been tested with, the following platforms, ORBs and third-party software.

Platforms	Red Hat Linux 9 (The BBN-RH9-SS7-8 image on Emulab)
Compiler	g++ 3.2.2
Networks	100Mbps Ethernet
ORBs	TAO v 1.4.1 and ACE v 5.4.1
Third-Party Software	Spread group communication system v 3.17.1

MEAD uses the Spread group communication system for inter-object communication. More details can be found below under the Running and Configuring Spread section of this manual.

Distribution

The directory containing the latest release of the MEAD software distribution is located on Emulab (ops.emulab.net) at /groups/pces/uav_oep/mead_cmu/release/mead.

The current MEAD release comprises library code for the MEAD replication mechanism, the MEAD Manager application, pre-built Spread libraries for Red Hat 9, and example scripts used to run the replication infrastructure:

   /MEAD ($MEAD_ROOT)
     /docs
     /examples
         /counter
         /stateless
         /namingservicestateless
     /manager
         /idl
         /src
     /replication
         /include
         /lib
         /obj
         /src
     /scripts
     /spreadlinuxbin
         /docs
         /include

Known Caveats and Recommendations

In the interests of fault containment and effective replication, we recommend hosting only one replica of a server per node. It is possible to host more than one replica of a server on a node, but then a single processor crash takes down all of them; running one replica of a server process per node ensures that a processor crash affects at most one replica.

The current version of MEAD requires the CORBA application to be deterministic in behavior, i.e., any two replicas of the server, when starting from the same initial state, and receiving the same set of invocations in the same order, should reach the same final state. This eliminates the use of local timers, shared memory and any other OS primitives that can lead to irreproducible behavior across different nodes in a distributed system. Determinism is a common assumption in the development of fault-tolerant distributed systems; while ongoing development in the MEAD project aims to eliminate the need for determinism, the current version of MEAD requires this of the application.

Finally, a few limitations of the MEAD Manager and Resource Monitoring Framework are listed in the Current Limitations section.

Reporting Problems & Obtaining Support

If you face a specific problem with running or installing the MEAD system, please fill out the details of the problem using the Support Request Form. For more general questions on MEAD, please email us at mead-support@lists.andrew.cmu.edu

CORBA Application Requirements

In order to restore application-level state in the event of a server crash, the Fault-Tolerant CORBA standard requires every CORBA object to support an additional Checkpointable interface, with methods for the retrieval and assignment of application-level state. This interface is an abstract class and can be inherited by the application when working with MEAD. Note that State can be defined in a custom way, based on the CORBA object's state. MEAD simply invokes these methods in order to perform checkpointing and state transfer for both warm passive and active replication. After implementing these functions, the application programmer does not need to worry further about them (except, of course, for updating the implementations of these methods should the application's definition of state change). MEAD handles ORB-level and infrastructure-level state, so the application programmer does not need to manage those either.

  exception NoStateAvailable{};
  exception InvalidState{};

  interface Checkpointable {
    State get_state() raises(NoStateAvailable);
    void set_state(in State s) raises(InvalidState);
  };

When developing CORBA applications for MEAD, it is necessary to implement get_state() and set_state() at the object level, and get_global_state() and set_global_state() for void pointer data at the application level (extern functions).

At the application level, the example below uses void pointers to offer a less restrictive implementation. The state can be any type of structure containing all of the current state for the process as well as the CORBA object state from above. The following C++ declarations (with C linkage) declare the external functions get_global_state() and set_global_state():

  extern "C" void *get_global_state (int *size);
  extern "C" void set_global_state (void * state, int size);

The sample program "counter" ($MEAD_ROOT/examples/counter) contains simple usage of both the Checkpointable interface and the globally extern'd functions.

Building MEAD

The MEAD library (libmead.so) requires the Spread GCS for communication; see the Running and Configuring Spread section of this document. Spread must be installed and correctly configured prior to building MEAD. More details can be found at the Spread website. The example programs require ACE/TAO to be installed; details can be found at the TAO website.

To build all of the MEAD release and sample programs, type the following commands at the MEAD root to run a recursive make:

source source_this_for_Emulab
- This will establish the $MEAD_ROOT environmental variable
make

The replication makefile makes use of the Spread library (libspread.so) found in the spreadlinuxbin directory. If you are not using Red Hat 9 it will be necessary to install and configure Spread on your target platform. For convenience this MEAD distribution includes pre-built Spread binaries in the spreadlinuxbin directory.

Make can be executed individually in the $MEAD_ROOT/manager and $MEAD_ROOT/replication directories to compile just the MEAD components.

Finally, the makefiles for both the sample CORBA applications and the MEAD Replication Library can compile their targets in debug mode. This prints portions of the inner workings of the MEAD Replication Library or the CORBA applications to standard output.

To Enable Debugging

To enable debugging of the Replication Library, look for the CPPFLAGS variables and values in its makefile; they are commented out by default. Two values can be used for runtime debugging: -DDEBUG and -DTRACE. -DDEBUG enables various print statements containing runtime values, while -DTRACE enables a full function trace of the MEAD Replication Library's execution. Uncomment the values and recompile to view debugging information.

Makefiles for the example programs also contain debugging flags. Look for and uncomment the mINCLUDES variable that contains the -DDEBUG value.

To Enable the Resource Monitoring Framework

To enable resource monitoring, look for the CPPFLAGS in the $MEAD_ROOT/replication makefile and uncomment the -DMONITOR flag. More details can be found in the section MEAD Manager and Resource Monitoring Framework.
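
Because these flags are commented out by default, it can help to see which ones are currently disabled before editing a makefile. The following is a minimal sketch and not part of the MEAD distribution: list_disabled_flags is a hypothetical helper, and it assumes only that the makefile uses ordinary # comments and spells the flags as -DDEBUG, -DTRACE and -DMONITOR. The actual layout of the MEAD makefiles may differ.

```shell
# Hypothetical helper (not shipped with MEAD): print, with line numbers,
# every commented-out line that mentions one of the MEAD build flags.
# Usage: list_disabled_flags <path-to-Makefile>
list_disabled_flags() {
    # A line counts as "disabled" when it starts with optional whitespace,
    # then '#', and mentions -DDEBUG, -DTRACE or -DMONITOR further along.
    grep -nE '^[[:space:]]*#.*-D(DEBUG|TRACE|MONITOR)' "$1"
}
```

For example, list_disabled_flags $MEAD_ROOT/replication/Makefile would print each matching line prefixed by its line number. No output (and a non-zero exit status from grep) means that no line matched, either because the flags are already active or because the makefile formats them differently.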

MEAD Manager and Resource Monitoring Framework

Overview

The MEAD Manager is a run-time component of MEAD that performs automatic replica launching to 1) bootstrap the fault-tolerant system and to 2) recover from process and processor crash faults. The Manager’s distributed resource monitoring framework collects and broadcasts detailed resource usage statistics for each replica and node in the system. The Manager connects to the MEAD Replication Library to perform event-triggered fault detection and lightweight accounting of network traffic produced and received per process. System data and events are broadcast over Spread, enabling remote system monitoring and management. The section of the following diagram labeled "Now Available" incorporates all functional portions of the MEAD 1.5 release.

Architecture

The MEAD Manager process runs on each node in the distributed system and communicates with other Managers through Spread. On launch, the Manager reads the mead.conf configuration file. The mead.conf file describes all programs to run and monitor. Processes can be launched during the Manager’s startup or be triggered remotely through the communication group. When launch is signaled on a host, the manager uses the program options to build a command line and forks and execs a new process. The process is LD_PRELOADed with the MEAD Replication Library. When the process launches, the MEAD Replication Library opens a socket (defined in environmental variable MANAGER_PORT) to the MEAD Manager that is used to collect network data and to detect process failures through socket closures. The MEAD Manager periodically collects resource data from the process and from the kernel through /proc, then packages and broadcasts the data to the rest of the system. Much of the MEAD Manager’s functionality is customizable through the mead.conf file including the resource data broadcast frequency. The MEAD Manager can also be configured to launch the spread daemon on startup, simplifying system bootstrapping.

Configuration

The MEAD Manager is located in the $MEAD_ROOT/manager folder of the MEAD distribution. The following steps need to be performed to execute the MEAD Manager:

source MEAD’s environment on each node for your experiment
- In the Root of the MEAD folder
  %source source_this_for_Emulab
make the MEAD Replication Library with -DMONITOR enabled
- Open the $MEAD_ROOT/replication/Makefile and search for -DMONITOR. Make sure this line is uncommented, save your changes, and re-make the replication library.
run a top level make at $MEAD_ROOT
- this will build the example programs, the MEAD Replication Library, and the MEAD Manager
execute Spread on each node of your experiment
- details can be found in the Running and Configuring Spread section
configure the mead.conf file
- details can be found in The mead.conf File section
execute the MEAD Manager on each node of your experiment
- % ./manager
- The Manager display will update as connections are established with other Managers currently running in the Spread Segment.
- The MEAD Manager will launch the applications configured in the mead.conf file, or wait for applications launched manually with the MEAD Replication Library to make connections to the Manager.
- Applications launched manually with the MEAD Replication Library must have the environment variable MANAGER_PORT set to the localhost port where the MEAD Manager listens; by default this is 11051.
- After the applications are launched, watch the monitor UI for resource updates, service details, and node states. Within a few seconds, the Managers should recognize each other and the interface will update accordingly.
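
For a manually launched application, the last two points above come down to preloading the library and exporting MANAGER_PORT. The sketch below only prints the command line it would use, so it can be inspected before anything is started. mead_manual_cmd is a hypothetical helper, not a MEAD script; the library path is the one used in the Naming Service example of this manual, and the defaults 6011 and 11051 are the ones quoted in this manual. A real launch also needs the other MEAD variables (REPLICATION_STYLE, GID and so on), which are deliberately left out here.

```shell
# Hypothetical helper (not part of MEAD): print the command that would start
# a program under the MEAD Replication Library with MANAGER_PORT set.
# Usage: mead_manual_cmd <program> [args...]
mead_manual_cmd() {
    # Fall back to the defaults quoted in this manual when nothing is set.
    manager_port=${MANAGER_PORT:-11051}
    spread_port=${SPREAD_PORT:-6011}
    # Abort with a message if the MEAD environment has not been sourced.
    lib=${MEAD_ROOT:?MEAD_ROOT is not set}/replication/libmead.so.1
    printf 'env LD_PRELOAD=%s SPREAD_PORT=%s MANAGER_PORT=%s %s\n' \
        "$lib" "$spread_port" "$manager_port" "$*"
}

# Example (after sourcing source_this_for_Emulab):
#   mead_manual_cmd ./my_server
```

Printing rather than executing is a deliberate choice: the command can be reviewed, extended with the remaining MEAD variables, and then pasted into the shell.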

The mead.conf File

The $MEAD_ROOT/manager/mead.conf file is used to customize the MEAD Manager’s behavior. The mead.conf file is self-documenting and contains detailed instructions on the configuration of all available options and parameters. The mead.conf file defines five main tags:

Tag	Description
<managerHome>	Manager home directory; usually set to $PWD
<stateBroadcastInterval>	Broadcast interval in milliseconds
<spreadPath>	Spread path; if uncommented, the MEAD Manager will attempt to launch Spread during startup. If set to run Spread, $MEAD_ROOT/manager/spread.conf must be present and configured for the Spread Segment. For most purposes the spread.conf file in the manager directory will be identical to the spread.conf.emulab file in the scripts directory. Details on configuring the spread.conf file can be found in the Spread Segment section.
<spreadPort>	Spread port; default is 6011. Must be the same as the port defined in $MEAD_ROOT/manager/spread.conf
<service>	Service tag; defines an application to manage. Services can be launched during the Manager’s startup (bootstrapping). Services can also be set to re-launch if they crash. All services are monitored. More details of the <service> tag are found in the Service Tags section.

Service Tags

Create a <service> section for each program that the MEAD Manager should monitor and/or launch. The following table details the parameters of the <service> tag:

Name	Description	Status
name	Service name	Mandatory
path	Path to executable	Mandatory
host	Host node to execute on. Setting this to “auto” disables automatic launch; instead, the Manager waits for the process to be launched either manually from the console or via Spread commands.	Mandatory
arg	Executable arguments (command line arguments). All command line arguments are white space delimited and are inserted into the command line in-order.	Optional
env	Executable environment	Optional
preload	Path for library for LD_PRELOAD	Optional
window	Launch process in separate x-term. The output of programs that are not launched in a new shell is piped to a file in the MEAD/manager/output directory.	Optional
monitor	Preload monitor library	Deprecated
relaunch	Automatically relaunch replica on failure	Optional
repStyle	MEAD Replication Library replication style	Not Available
checkpointingInterval	MEAD Replication Library WARM_PASSIVE checkpointing interval	Not Available
faultDetectionInterval	Socket based fault detection interval	Not Available

Interfacing

Any program that connects to Spread can subscribe to the MEAD Manager’s Spread group to receive resource data broadcasts or to launch and kill processes. The Spread group is named meadManagers, and the message types are defined in the file $MEAD_ROOT/manager/src/meadMsgTypes.h. You can use the SpreadConnection wrapper class defined in $MEAD_ROOT/manager/src/SpreadConnection.h to simplify connecting to Spread and sending and receiving Spread messages.

Current Limitations

As of the MEAD 1.5 release the following limitations are acknowledged and we are actively looking for solutions:

A SIGINT handler is currently installed by the MEAD Replication Library when compiled with -DMONITOR. This will overwrite any user SIGINT handler.
MEAD Manager should only be used for server replicas in most cases. SIGIO is used for asynchronous socket communication and seems to cause a conflict with user input based applications (e.g. console based clients).
Due to the tight communication model between the MEAD Replication Library and the MEAD Manager, killing a MEAD Manager may kill the replicas it is currently monitoring.

Running and Configuring Spread

Spread Overview

Spread is a group communication system that originated at Johns Hopkins University and is currently developed by Spread Concepts LLC and CNDS at Johns Hopkins University. Additional information can be found at the Spread website as well as in the Spread User Manual.

The Spread daemon can be launched using this command:

% spread -c <spread.config.file>

Spread Segment

The Spread configuration file contains options for the Spread daemon, and comments on the configuration of each option. MEAD is mostly concerned with the Spread_Segment { }, which contains a list of <hostname> <IP> pairs for each node in the cluster, as well as a broadcast address and port for execution. The following is a sample configuration file used in the Emulab environment. The broadcast address can differ in range (i.e., in the first three dotted-decimal places) from the IP addresses of the three nodes; this is specific to the Emulab environment where the nodes seem to be multi-homed. The broadcast is for local connectivity and the individual host IPs are actual addresses that can be resolved for the machine.

  Spread_Segment BroadcastAddress:SpreadPortNumber {
    node0    node0_IP_address
    node1    node1_IP_address
    node2    node2_IP_address
    :    :
  }

Finally, only use the computer name for the <Hostname> parameter and not the fully qualified name for the computer (e.g. use node0 not node0.test.pces.emulab.net).

Spread Timeouts

When compiling the Spread library and the Spread daemon, a series of timeouts can be configured and compiled in. These can significantly affect the performance of both Spread and MEAD. More details can be found in the Spread User Manual.

Spread Related Errors returned by MEAD

The following lists a few common errors, related to the configuration of Spread, that are returned from the MEAD library; more details can be found at: http://www.spread.org/docs/docspread.html. When Spread is launched from the Spreadstart script in the $MEAD_ROOT/scripts folder, errors are generally masked by the script. To troubleshoot the issue, launch Spread at the command line using the command above.

MEAD: Could not connect to SPREAD daemon! (error=-2)
- The Spread_Segment may not have been set up properly to include the <Hostname> <IP> pair for the host where the current application resides.
- The port used in the Spread_Segment may be different from the port used in the MEAD library.
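
Both causes can be checked mechanically before restarting anything. The following is a rough sketch, not a tool shipped with MEAD or Spread: check_spread_conf is a hypothetical helper that looks for a Spread_Segment header using the expected port and for a line naming the given short hostname. It assumes the configuration follows the layout shown in the Spread Segment section (one "Spread_Segment address:port {" header and one "hostname IP" pair per line).

```shell
# Hypothetical troubleshooting helper (not part of MEAD or Spread).
# Usage: check_spread_conf <spread.conf> <short-hostname> <port>
# Returns 0 when both checks pass, 1 otherwise.
check_spread_conf() {
    conf=$1 host=$2 port=$3
    rc=0
    # Segment header, e.g.:  Spread_Segment 10.1.1.255:6011 {
    if ! grep -Eq "^[[:space:]]*Spread_Segment[[:space:]]+[0-9.]+:${port}[[:space:]]*[{]" "$conf"; then
        echo "no Spread_Segment uses port $port in $conf"
        rc=1
    fi
    # Host entry, e.g.:  node6  155.101.132.92
    if ! grep -Eq "^[[:space:]]*${host}[[:space:]]+[0-9.]+[[:space:]]*$" "$conf"; then
        echo "host $host is not listed in $conf"
        rc=1
    fi
    return $rc
}
```

A typical call on an Emulab node might be check_spread_conf $MEAD_ROOT/scripts/spread.conf.emulab node6 6011, with the port being whatever SPREAD_PORT the application is launched with.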

Running MEAD on Emulab

The steps below detail the execution of the MEAD Replication Library using example bash scripts provided in the $MEAD_ROOT/scripts directory. This run will replicate three servers in either active or warm passive replication styles and will use a single client to make requests of the three-way replicated server. An additional step will include the MEAD Manager for resource monitoring. These bash scripts set the environment variables necessary for execution; feel free to open them and look at how the variables are used.

The primary test-bed for MEAD software is Emulab (BBN-RH9-SS7-8 Emulab image).

Some of the following steps need to be executed on every assigned Emulab node (step is prefaced by [every node]) or only on one assigned Emulab node (step is prefaced by [one node]). The symbol % represents the shell command-line prompt.

Setting up the Emulab Experiment
- Enter the following values in the "Begin a Testbed Experiment" form on Emulab
  - Select Project: pces (If this choice is not available to you, please contact us).
  - Group: pces/uav_oep
  - Description: Replication with MEAD
  - Your NS file: Please download the sample NS file that we have provided in order to simplify the configuration of the topology for this experimental run. You can edit this file for future runs.
- The topology consists of four nodes, one 100Mbps LAN; three nodes will be used to host three individual server replicas, and the fourth node will be used for the client application.
- Use the OSID BBN-RH9-SS7-8 for each of the nodes when setting up your experiment with Emulab.
- Wait to receive an email from the Emulab Testbed Operations team (notifying you of the availability of your requested experimental configuration) before proceeding further.
- Once you receive the notification about your experiment being started, ssh into the machines assigned to you, according to the instructions on Emulab.
[any node] Obtaining the MEAD source distribution
- The MEAD distribution is available on users.emulab.net in the directory: /groups/pces/uav_oep/mead_cmu/release/mead/
- Use cp (or scp) to copy the MEAD source to your working directory (for convenience, copy the tarball from the root of the release folder).
- If you copied the .tar.gz file, then, you should do the following to extract the MEAD directory:
  % tar -zxf mead-1.0.tar.gz
  A directory called MEAD is created in the process of extracting the .tar.gz file.
[every node] Obtaining the right environmental settings
- Enter the MEAD directory and obtain the right environmental settings for your binary and library search-paths:
  % source source_this_for_Emulab
[any node] Building the MEAD library and the test applications
- Enter the MEAD root and type make to run a recursive make, e.g.,
  % cd $MEAD_ROOT
  % make
Setting up the Spread configuration
- Find out the following information for your assigned Emulab nodes: each node's IP address, each node's hostname, and the broadcast address.
  - [any node] The Emulab machines appear to be multi-homed (five interfaces per node). To see this for yourself, try the following:
    % ifconfig -a
    You will see interfaces eth0 through eth4 defined.
  - [any node] For the broadcast address, look at the output of:
    % ifconfig eth3
    and replace the last part of the dotted-decimal IP address with 255, e.g., if you see an IP address of 10.1.1.4, the broadcast address is 10.1.1.255
  - [every node] For each node's IP address, look at the output of:
    % hostname -i
    to obtain the IP address associated with the interface.
  - [every node] For each node's hostname, look at the output of:
    % hostname
    to obtain the qualified hostname associated with the interface. You will only need the first part of this name (i.e., for node8.*.pces.emulab.net, you only need to remember node8).
- [any node] Edit the file MEAD/scripts/spread.conf.emulab to specify the set of Emulab nodes that you are using for your experiment.
  - Edit the Spread_Segment by listing the <hostname> <ip> pairs within the braces.
  - The default Spread port number is 6011, but can be changed; however, the new value of the port must be identical in two places: (i) the Emulab configuration file MEAD/scripts/spread.conf.emulab, and (ii) the SPREAD_PORT environment variable supplied to the application (and, indirectly, to MEAD) at run-time.
  - Here is an example for a broadcast address of 10.1.1.255, a Spread port number of 6011 and four nodes (node6, node7, node8 and node9) with their respective IP addresses:
      Spread_Segment 10.1.1.255:6011 {
        node6  155.101.132.92
        node7  155.101.132.93
        node8  155.101.132.94
        node9  155.101.132.95
      }
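
The derivations in this step (broadcast address from an interface address, short name from a qualified hostname) are easy to get wrong by hand, so here is a small sketch that produces a Spread_Segment stanza from them. The function names broadcast_of, short_host and spread_segment are hypothetical and not part of MEAD or Spread, and the node names and addresses are only sample values. broadcast_of simply replaces the last octet with 255, exactly as instructed above, so it is only appropriate where that rule holds for your network.

```shell
# Illustrative helpers; the names are hypothetical, not part of MEAD or Spread.

# Replace the last octet of a dotted-decimal address with 255.
broadcast_of() {
    printf '%s.255\n' "${1%.*}"
}

# Keep only the first label of a qualified hostname.
short_host() {
    printf '%s\n' "${1%%.*}"
}

# Usage: spread_segment <broadcast> <port> <host> <ip> [<host> <ip> ...]
spread_segment() {
    bcast=$1 port=$2
    shift 2
    printf 'Spread_Segment %s:%s {\n' "$bcast" "$port"
    # Consume the remaining arguments two at a time: hostname, then IP.
    while [ "$#" -ge 2 ]; do
        printf '    %s  %s\n' "$(short_host "$1")" "$2"
        shift 2
    done
    printf '}\n'
}

# Sample values only; substitute the output of ifconfig, hostname -i and hostname.
spread_segment "$(broadcast_of 10.1.1.4)" 6011 \
    node6.test.pces.emulab.net 155.101.132.92 \
    node7.test.pces.emulab.net 155.101.132.93
```

The printed stanza can be pasted into MEAD/scripts/spread.conf.emulab in place of the existing Spread_Segment block.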
[every node] Running Spread
- Launch the Spread daemon (spread) on each host that will run either a server or a client. The Spread daemon can be launched from the MEAD root directory, as follows:
  % cd MEAD/scripts
  % ./Spreadstart
[every node – OPTIONAL STEP] Launch Manager
- If the MEAD Replication Library was built with the -DMONITOR flag active, applications launched in the application step that follows will attempt to connect to the MEAD Manager. A MEAD Manager, if executing, will track the resource usage of the host and the replica. If the Manager is not executing, the new replica will report a failed connection attempt, but processing will continue. To launch a Manager per host, follow these steps:
  - Open a separate terminal on a host that currently has the Spread daemon executing
  - % cd $MEAD_ROOT
  - % source source_this_for_Emulab
  - % cd $MEAD_ROOT/manager
  - Configure the mead.conf file with the resource monitoring interval, any custom Spread port, and a <service> entry for the process. The service entry's host should be set to "auto". More details can be found in The mead.conf File section.
  - % ./manager
  - Keep the terminals open to view the resource usage when executing the scripts in the application step that follows.
There are six scripts that demonstrate the execution of the three provided test applications. All of the scripts are found in the MEAD/scripts directory, and display usage information. The scripts will not launch if the Spread daemon is not running. The scripts provide settings for most of the MEAD environment variables. If you wish to bypass the scripts and run processes directly at the command-line, please read the MEAD Replication Library Parameters section of this document.

Choose one of the three applications below (counter, stateless, or namingservicestateless) and launch one server replica on each of the three nodes of your cluster. Finally, launch the client as instructed below. Servers (replicas) can be killed at any time using Control-C. Launching a server script will add a new server replica to the group, initialize its state, and then start either normal processing (for active replication) or state transfer (for warm passive replication).

counter is a "stateful application", where the server maintains the count of user requests, and implements the Checkpointable interface for the CORBA objects and exports global state for the actual application.
- To run a three-way actively replicated version of this test application:
  % cd $MEAD_ROOT/scripts
  % ./counterServer_run SVR ACTIVE
  % ./counterClient_run CLNT SVR
- To run a three-way warm passively replicated version of this test application:
  % cd $MEAD_ROOT/scripts
  % ./counterServer_run SVR WARM_PASSIVE
  % ./counterClient_run CLNT SVR
  
stateless is a "stateless application" that sends a string containing the process identifier of the process and the current number of requests that the object has serviced.
- To run a three-way actively replicated version of this test application:
  % cd $MEAD_ROOT/scripts
  % ./statelessServer_run SVR ACTIVE
  % ./statelessClient_run CLNT SVR
- To run a three-way warm passively replicated version of this test application:
  % cd $MEAD_ROOT/scripts
  % ./statelessServer_run SVR WARM_PASSIVE
  % ./statelessClient_run CLNT SVR
  
namingservicestateless is the same stateless application as above, but it uses the Naming Service to acquire references to the CORBA objects. In addition to launching the scripts for the application, the Naming Service must be started. The MEAD library must be told to let traffic on the Naming Service's port pass through; this is done by setting the environment variables for the Naming Service host and port when launching MEAD. Use NSstart and NSkill to start and stop the applications.
- To run a three-way actively replicated version of this test application:
  % cd $MEAD_ROOT/scripts
  % ./statelessServer_run SVR ACTIVE
  % ./statelessClient_run CLNT SVR
- To run a three-way warm passively replicated version of this test application:
  % cd $MEAD_ROOT/scripts
  % ./statelessServer_run SVR WARM_PASSIVE
  % ./statelessClient_run CLNT SVR
  
After the server replicas are done processing invocations from the client, systematically kill (^C) the server replicas until the system is down to the last remaining replica. Then, bring the other two server replicas back up. Finally, kill the previously remaining server replica. The system will continue to function, thanks to MEAD's fault-tolerance support.
[every node] Cleaning Up
- Kill the Spread daemon (at the end of all your experiments, and not before, or during, your experiments):
  % cd MEAD/scripts
  % ./killSpread

Must-Read Notes

Some common things to watch out for when running MEAD:

The source_this_for_Emulab file must be sourced only from the $MEAD_ROOT directory, where it is located.
The Spread daemon must be running on every node on which you expect to run clients or servers using MEAD.
The client group id (CLNT) and the server group id (SVR) should not be the same for a process; these identifiers represent the client and server object group names, respectively, and should be distinct.
The same Spread group id (e.g. SVR) should only be used once per host.
When replicating a server, ensure that all of the replicas of the server are launched with the same object group identifier and the same replication style, i.e., within the same group, all of the replicas must possess the same GID and should have the same REPLICATION_STYLE.
- For example, when launching an actively replicated server, ensure that you use the same command on all three server machines (./counterServer_run SVR ACTIVE).
Pure clients should always be launched using ACTIVE replication style in order to enable duplicate suppression (the client scripts in the $MEAD_ROOT/scripts directory automatically take care of this), regardless of the server's replication style.
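
Two of the notes above can be checked mechanically before a pure client is launched. The following is a minimal sketch, assuming a POSIX shell; the function name check_client_launch is hypothetical and is not part of the MEAD distribution. It only verifies that the client and server group ids differ and that the replication style is ACTIVE.

```shell
#!/bin/sh
# Hypothetical pre-launch check for a pure client; not part of MEAD.
# Arguments: client group id, server group id, replication style.
check_client_launch() {
    client_gid=$1
    server_gid=$2
    style=$3
    # The client and server object group names must be distinct.
    if [ "$client_gid" = "$server_gid" ]; then
        echo "client and server group ids must differ: $client_gid" >&2
        return 1
    fi
    # Pure clients use ACTIVE regardless of the server's replication style.
    if [ "$style" != "ACTIVE" ]; then
        echo "pure clients should use ACTIVE, got: $style" >&2
        return 1
    fi
    return 0
}
```

For instance, check_client_launch CLNT SVR ACTIVE succeeds, while check_client_launch SVR SVR ACTIVE reports an error and returns a non-zero status.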

Using the Naming Service

As of the 1.1 release, MEAD supports the CORBA Naming Service. To enable the MEAD Replication Library to work with CORBA applications that use the Naming Service, the following environment variables must be set:

USING_NS
NS_PORT
NS_HOST

For example, when launching a simple application try the following:

  % env LD_PRELOAD=$MEAD_ROOT/replication/libmead.so.1 \
      REPLICATION_STYLE=ACTIVE \
      GID=MYCLNT \
      SERVER_ID=MYSVR \
      IS_STATELESS=yes \
      SPREAD_PORT=6011 \
      MANAGER_PORT=11051 \
      USING_NS=yes \
      NS_HOST=10.1.1.4 \
      NS_PORT=6012 \
      IS_SERVER=no \
      $MEAD_ROOT/examples/counter/client iorfile time

Notice that the three variables listed above are set. When USING_NS is set to yes, both NS_HOST and NS_PORT are required for the MEAD Replication Library to work with the Naming Service.
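
The dependency between these variables can be checked before launching. The following is a minimal sketch, assuming a POSIX shell; check_ns_env is a hypothetical helper name, not something shipped with MEAD. It only tests whether the variables are set, not whether a Naming Service is actually reachable at that host and port.

```shell
#!/bin/sh
# Hypothetical helper (not part of MEAD): verify the Naming Service
# settings in the current environment before launching a process.
check_ns_env() {
    # Nothing to check unless the Naming Service is in use.
    [ "${USING_NS:-no}" = "yes" ] || return 0
    if [ -z "${NS_HOST:-}" ]; then
        echo "USING_NS=yes but NS_HOST is not set" >&2
        return 1
    fi
    if [ -z "${NS_PORT:-}" ]; then
        echo "USING_NS=yes but NS_PORT is not set" >&2
        return 1
    fi
    return 0
}
```

A launch script could call check_ns_env after exporting its settings and abort if it returns a non-zero status.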

MEAD Replication Library Parameters

The MEAD Replication Library is configured through several environment variables. The following table lists the names and descriptions of the variables, along with their acceptable values.

Name	Description	Value
LD_PRELOAD	Tells the run-time linker and loader to load the listed libraries first. Note that the library will be loaded for all applications in that environment or process address space.	libmead.so.1
REPORT_CALLS	Filters the debugging messages returned by the MEAD library. The parameter is a string value that includes the names of intercepted system call functions. Only used when the MEAD Replication Library is compiled with the -DREPORT option.	all, socket, read, writev, bind, select, connect, accept, etc.
REPLICATION_STYLE	Used to set the replication style for the application that is being launched.	ACTIVE or WARM_PASSIVE
GID	Object group name.	alpha-numeric strings between 3 and 8 bytes.
SERVER_ID	Object group name for a server that a client application will connect to.	alpha-numeric strings between 3 and 8 bytes.
SERVER2_ID	Object group name for another type of server that the client can connect to.	alpha-numeric strings between 3 and 8 bytes.
IS_STATELESS	Lets MEAD know that the application is stateless.	yes/no
SPREAD_PORT	Port number on which MEAD expects the Spread daemon to be running.	dynamic port range only
USING_NS	Lets MEAD know the Naming Service is being used. If set to 'yes' NS_HOST and NS_PORT must also be set.	yes/no
NS_HOST	IP address of the host on which the Naming Service is running. Intercepted calls to this IP and port combination will be allowed to pass through the MEAD interceptor.	IP (xxx.xxx.xxx.xxx)
NS_PORT	Port number that MEAD will bypass during the interception process.	dynamic port range only
MANAGER_PORT	MEAD Resource Manager Port. This port allows a connection to be established between a MEAD replica and the MEAD Resource Manager.	Numeric value; default is 11051
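
Some of the value constraints in the table can be checked in a launch script. The following is a minimal sketch, assuming a POSIX shell; the function names valid_group_id and valid_replication_style are hypothetical and not part of MEAD. It covers only the group name format (GID, SERVER_ID, SERVER2_ID) and the replication style; port ranges are deliberately left unchecked.

```shell
#!/bin/sh
# Hypothetical validators for values from the parameter table; not part of MEAD.

# Group names: alphanumeric strings of 3 to 8 bytes.
valid_group_id() {
    case "$1" in
        *[!A-Za-z0-9]*) return 1 ;;   # reject any non-alphanumeric character
    esac
    # Only ASCII letters and digits remain, so character count equals byte count.
    [ "${#1}" -ge 3 ] && [ "${#1}" -le 8 ]
}

# Replication style: ACTIVE or WARM_PASSIVE.
valid_replication_style() {
    case "$1" in
        ACTIVE|WARM_PASSIVE) return 0 ;;
        *) return 1 ;;
    esac
}
```

For instance, valid_group_id MYCLNT succeeds, while valid_group_id ab fails because the name is shorter than 3 bytes.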

Other Configurations

MEAD will run in other environments that have configured functional copies of both ACE/TAO and Spread. The examples above should work as well provided that:

The proper environment paths are set for the build and runtime libraries.
The MEAD makefiles for both the MEAD Replication Library and the sample applications are modified to include the proper paths to libraries and include files.
The MEAD Replication Library is rebuilt using the version of Spread for the intended platform.
The sample CORBA applications are rebuilt using the version of ACE/TAO for the intended platform.
All of the scripts in the scripts sub-directory are changed to include the proper path for Spread.

Trouble-shooting

Title	Description/Possible Solution
LD_PRELOAD is not set	In most cases the CORBA application will still function, without the replication mechanism.
When Spread is not running	MEAD will return the following error message: Could not connect to SPREAD daemon! (error=-2). The application will not actually launch.
env variables are not correct	Critical environmental variables are checked during initialization. MEAD will return error messages that variables are not set.
LD_LIBRARY_PATH does not include right libraries	These are standard errors returned by the run-time linker and loader. In most cases they include the name of the library that could not be found on the library path.
Compiled with -DMONITOR but manager is not running	A minor error message is returned, indicating a failed connection. The MEAD Replication Library will still function for the duration of execution, but resource monitoring will not be performed.
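
Because an application launched without LD_PRELOAD usually still runs, only without replication, this problem can go unnoticed. The following is a minimal sketch of a pre-launch check, assuming a POSIX shell; check_preload is a hypothetical helper, not part of MEAD. It takes the intended LD_PRELOAD value as an argument so that the check itself does not run with the library preloaded.

```shell
#!/bin/sh
# Hypothetical helper; not part of the MEAD distribution.
# Argument: the path that is about to be assigned to LD_PRELOAD.
check_preload() {
    lib=$1
    if [ -z "$lib" ]; then
        echo "LD_PRELOAD value is empty: the application would run without replication" >&2
        return 1
    fi
    if [ ! -r "$lib" ]; then
        echo "library not readable: $lib" >&2
        return 1
    fi
    return 0
}
```

A launch script could call check_preload "$MEAD_ROOT/replication/libmead.so.1" and proceed with the env command only if it succeeds.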

References

Publications on the MEAD system: http://www.ece.cmu.edu/~mead
The Spread group communication system: http://www.spread.org/
Emulab distributed test-bed: http://www.emulab.net/
- Emulab "Getting Started" Tutorial
Fault-Tolerant CORBA specification: http://www.omg.org/docs/formal/04-03-21.pdf

Contributors

Contributors to MEAD include:

Priya Narasimhan
Tudor Dumitras
Aaron Paulos
Soila Pertet
Charlie Reverte
Joe Slember
Deepti Srivastava