The Anatomy of a Small-Scale Hypertextual Web Search Engine

Adhish Ramkumar, Robert Maratos
Carnegie Mellon University

ABSTRACT

This paper studies the feasibility of implementing a search engine on small web domains and how the web and commodity hardware have changed since the early 2000s. To quantify this, the paper evaluates four measurements: the fan-out observed when a web crawler crawls Carnegie Mellon University subdomains, how quickly crawlers recover from failure, the percentage of dead links encountered, and how long it takes to index specific subdomains. The system described in this paper was initially limited to scraping pages under the domain of Carnegie Mellon University's Department of Electrical & Computer Engineering (ece.cmu.edu), to avoid the storage limitations that could arise from the sheer volume of pages published by the university. After evaluating the storage needs of this smaller domain, the paper analyzes the performance of the system as it scales up to include all pages published by Carnegie Mellon University (www.cmu.edu).
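
To make two of these measurements concrete, the following is a minimal Python sketch, not the system evaluated in this paper. It assumes hypothetical crawl records (a mapping from each crawled URL to its outlinks, and a mapping from each URL to its HTTP status). The helper names `in_scope`, `average_fan_out`, and `dead_link_percentage` are illustrative, and treating a status of 400 or above, or a failed fetch, as a dead link is an assumption rather than the paper's stated definition.

```python
from urllib.parse import urlparse


def in_scope(url, domain="ece.cmu.edu"):
    """Return True if the URL's host is the domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def average_fan_out(pages, domain="ece.cmu.edu"):
    """Mean number of distinct in-scope outlinks per crawled page.

    pages: dict mapping a crawled URL to the list of links found on it
    (hypothetical record format).
    """
    if not pages:
        return 0.0
    counts = [
        len({link for link in links if in_scope(link, domain)})
        for links in pages.values()
    ]
    return sum(counts) / len(counts)


def dead_link_percentage(statuses):
    """Percentage of fetched URLs considered dead.

    statuses: dict mapping a URL to its HTTP status code, or None if the
    fetch failed. Assumption: status >= 400 or None counts as dead.
    """
    if not statuses:
        return 0.0
    dead = sum(1 for s in statuses.values() if s is None or s >= 400)
    return 100.0 * dead / len(statuses)
```

Widening the crawl from the department to the whole university would, under this sketch, amount to passing `domain="cmu.edu"` so that every university subdomain is in scope.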