Massively Distributed Monitoring Christopher Peplin (peplin@cmu.edu) May 1, 2011 Abstract: As the scale of distributed systems continue to grow, the basic question of monitoring the system's status becomes more difficult. How do you monitor an extremely large scale network of nodes without requiring a massive, centralized cluster for data collection? To monitor these systems from a central location would mean potentially hundreds of thousands of new connections every second, each with a very small update. Persistent connections do not help, since a server cannot maintain that many open connections. This paper evaluates different approaches to monitoring and describes the implementation of a few novel monitoring techniques in the peer-to-peer video distribution network Astral. Peers in Astral use a self-organized dynamic hierarchy, temporal batching and update filtering to increase the scalability of the monitoring subsystem. Some of the trade-offs associated with these optimizations are also enumerated.