15-712
Vahe Poladian
Nov. 16, 2001

IFG Based Event Distribution Middleware

Traditionally, publish/subscribe systems are subject-based: producers and consumers of messages identify the messages they send, or are interested in receiving, by _subject_. Subject-based routing is coarse-grained in that it forces the recipient of a message to sift through all of the data in order to filter it down to a more relevant set. Furthermore, in the real world several related data streams delivering essentially similar information may use different formats for presenting that information (e.g. NASDAQ-based quotes vs. NYSE-based quotes). In such cases there is a need for efficient transformation of messages.

The authors of this paper present a system which is:

a. based on a subject-based publish/subscribe system, but additionally offers the following features:
   i. content-based publish/subscribe; in other words, the subscribers (consumers) of information can specify content-sensitive rules for receiving relevant data,
   ii. content transformations that are stateless; transforms of this type convert one or more related messages of one format into one or more messages of another format,
   iii. transformations that are more data-manipulative in nature, e.g. summaries, averages, etc. These the authors call data-stream interpretations.

The key ideas of the paper are as follows:

1. Use Information Flow Graphs (IFGs) to define the various components and dataflows of publish/subscribe systems, roughly including data sources and sinks, and various kinds of data-manipulation operations such as "select" (filter), "transform", "collapse", and "expand" (notice that the latter two are central to developing one of the examples in the paper).

2. Come up with a formalism for IFGs (components and transformations) in order to prove certain properties about the IFGs.
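To make the four operator kinds concrete, here is a minimal sketch in Python that models each one as a plain function over streams of events. This is my own illustration based on the review's description, not the paper's formalism; the function names, signatures, and the choice to model "collapse" as a per-key fold are all assumptions.

```python
def select(pred, stream):
    """Select (filter): pass through only events matching pred."""
    return (e for e in stream if pred(e))

def transform(fn, stream):
    """Transform: stateless conversion of each event to another format."""
    return (fn(e) for e in stream)

def collapse(key, combine, stream):
    """Collapse: fold many events into one summary value per key
    (e.g. the latest quote per ticker) -- in the paper's terms,
    a data-stream interpretation. Modeled here as an eager fold."""
    state = {}
    for e in stream:
        k = key(e)
        state[k] = combine(state.get(k), e)
    return state

def expand(fn, stream):
    """Expand: turn one event into zero or more derived events."""
    return (out for e in stream for out in fn(e))
```

For example, `collapse(lambda e: e["sym"], lambda old, new: new, quotes)` keeps only the most recent quote per ticker symbol, which is the kind of stateful summary the subject-based systems described above cannot express.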
Such properties are essential in that they allow various dataflows to be "commuted" in order to arrive at a more optimal or efficient configuration. Two sources of optimization are mentioned:

i. select and transform flows can be commuted, so select flows can be pushed ahead of transform flows. This clearly reduces the amount of data being transformed,
ii. several selects in a row can be combined, and several transforms in a row can be combined.

Notice that composing the rules above means a mixed chain of selects and transforms can be combined nicely, e.g.:

   [T T S T S T] --> (by rule i) [S S T T T T] --> (by rule ii) [S T]

Such optimizations could clearly translate into tangible performance gains at run-time. Furthermore, reasoning about IFGs formally will allow more research in the future.

--

The ideas of the paper seem new and good. However, the research work itself is in a somewhat tentative state (the reader gets a hint of that by reading section 5.2 on current status and future work), and even the exposition of the paper suggests as much: there are 15 citations in the bibliography but only a handful are referenced. Furthermore, judging from the current-status section, the authors are aiming to handle a lot of problems. Although all of those are related, they do not all seem easy. Let's consider them:

a. implementing stateful operations in IFGs (related),
b. mapping the optimized IFGs onto physical brokers (related),
c. incorporating protocols for reliable and ordered delivery (a somewhat different area),
d. putting it all together in real-world projects (a stretch).

Especially for the last point, I think they need quite a bit more research into the current state of industry, including analysis of more than one commercial tool of the kind they allude to. I am not sure whether mentioning all that future work helps the credibility of the research.
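The two rewrite rules can be sketched in a few lines of Python. This is an illustration under my own assumptions (events as plain values, transforms as total functions), not the paper's algorithm: rule ii fuses a chain of transforms by function composition, and rule i pushes a select below a transform by rewriting its predicate to test the pre-image, so that the cheap filter runs first and the transform is only applied to surviving events.

```python
def compose(*fns):
    """Rule ii: fuse a chain of transforms into a single transform."""
    def fused(e):
        for fn in fns:
            e = fn(e)
        return e
    return fused

def push_select(pred, tr):
    """Rule i: commute a select past a transform. Returns the predicate
    that, applied *before* tr, admits exactly the events pred would
    have admitted *after* tr (i.e. pred composed with tr)."""
    return lambda e: pred(tr(e))
```

For instance, a pipeline [T1 T2 S] can be rewritten as [S' T] with `T = compose(t1, t2)` and `S' = push_select(p, T)`; both versions admit and emit the same events, but the rewritten one never transforms an event that would have been dropped.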