15-712
Vahe Poladian
Nov. 16, 2001

IFG Based Event Distribution Middleware

Traditionally, publish/subscribe systems are subject-based: producers and consumers of messages identify the messages they send, or are interested in receiving, by _subject_. Subject-based routing is coarse-grained in that it forces the recipient of a message to sift through all of the data in order to filter it down to a more relevant set. Furthermore, in the real world several related data streams delivering essentially similar information may use different formats for presenting that information (e.g. NASDAQ-based quotes vs. NYSE-based quotes). In such cases there is a need for efficient transformation of messages.

The authors of this paper present a system which is:

a. based on a subject-based publish/subscribe system, but additionally offers the following features:
   i. content-based publish/subscribe; in other words, the subscribers (consumers) of information can specify content-sensitive rules for receiving relevant data,
   ii. content transformations that are stateless; transforms of this type convert one or more related messages of one format into one or more messages of another format,
   iii. transformations that are more data-manipulative in nature, e.g. summaries, averages, etc. These the authors call data-stream interpretations.

The key ideas of the paper are as follows:

1. Use Information Flow Graphs (IFGs) to define the various components and dataflows of publish/subscribe systems, roughly including data sources and sinks, and various kinds of data-manipulation operations such as "select" (filter), "transform", "collapse", and "expand" (notice that the latter two are central to developing one of the examples in the paper).

2. Come up with a formalism for IFGs (components and transformations) in order to prove certain properties about the IFGs.
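To make the four operator kinds concrete, here is a minimal sketch in Python that models each one as a plain function over streams of events. This is my own illustration based on the review's description, not the paper's formalism; the function names, signatures, and the choice to model "collapse" as a per-key fold are all assumptions.

```python
def select(pred, stream):
    """Select (filter): pass through only events matching pred."""
    return (e for e in stream if pred(e))

def transform(fn, stream):
    """Transform: stateless conversion of each event to another format."""
    return (fn(e) for e in stream)

def collapse(key, combine, stream):
    """Collapse: fold many events into one summary value per key
    (e.g. the latest quote per ticker) -- in the paper's terms,
    a data-stream interpretation. Modeled here as an eager fold."""
    state = {}
    for e in stream:
        k = key(e)
        state[k] = combine(state.get(k), e)
    return state

def expand(fn, stream):
    """Expand: turn one event into zero or more derived events."""
    return (out for e in stream for out in fn(e))
```

For example, `collapse(lambda e: e["sym"], lambda old, new: new, quotes)` keeps only the most recent quote per ticker symbol, which is the kind of stateful summary the subject-based systems described above cannot express.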
Such properties are essential in that they allow various dataflows to be "commuted" in order to arrive at a more optimal or efficient configuration. Two sources of optimization are mentioned:

i. select and transform flows can be commuted, so select flows can be pushed ahead of transform flows. This clearly reduces the amount of data being transformed,
ii. several selects in a row can be combined, and several transforms in a row can be combined.

Notice that composing the rules above means a mixed chain of selects and transforms can be combined nicely, e.g.:

   [T T S T S T] --> (by rule i) [S S T T T T] --> (by rule ii) [S T]

Such optimizations could clearly translate into tangible performance gains at run-time. Furthermore, reasoning about IFGs formally will allow more research in the future.

--

The ideas of the paper seem new and good. However, the research work itself is in a somewhat tentative state (the reader gets a hint of that by reading section 5.2 on current status and future work), and even the exposition of the paper suggests as much: there are 15 citations in the bibliography but only a handful are referenced. Furthermore, judging from the current-status section, the authors are aiming to handle a lot of problems. Although all of those are related, they do not all seem easy. Let's consider them:

a. implementing stateful operations in IFGs (related),
b. mapping the optimized IFGs onto physical brokers (related),
c. incorporating protocols for reliable and ordered delivery (a somewhat different area),
d. putting it all together in real-world projects (a stretch).

Especially for the last point, I think they need quite a bit more research into the current state of industry, including analysis of more than one commercial tool of the kind they allude to. I am not sure whether mentioning all that future work helps the credibility of the research.
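The two rewrite rules can be sketched in a few lines of Python. This is an illustration under my own assumptions (events as plain values, transforms as total functions), not the paper's algorithm: rule ii fuses a chain of transforms by function composition, and rule i pushes a select below a transform by rewriting its predicate to test the pre-image, so that the cheap filter runs first and the transform is only applied to surviving events.

```python
def compose(*fns):
    """Rule ii: fuse a chain of transforms into a single transform."""
    def fused(e):
        for fn in fns:
            e = fn(e)
        return e
    return fused

def push_select(pred, tr):
    """Rule i: commute a select past a transform. Returns the predicate
    that, applied *before* tr, admits exactly the events pred would
    have admitted *after* tr (i.e. pred composed with tr)."""
    return lambda e: pred(tr(e))
```

For instance, a pipeline [T1 T2 S] can be rewritten as [S' T] with `T = compose(t1, t2)` and `S' = push_select(p, T)`; both versions admit and emit the same events, but the rewritten one never transforms an event that would have been dropped.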