FastHASH

FastHASH: A Filtering Algorithm for NGS Read Mapping

FastHASH is a filtering algorithm designed to speed up seed-and-extend based read mappers for NGS (Next Generation Sequencing) technologies.

FastHASH has two independent components: Adjacency Filtering (AF) and Cheap K-mer Selection (CKS). Both aim to reduce the number of edit-distance calculations the mapper performs: AF filters out false mappings through simple computations, while CKS reduces the number of potential mappings by carefully selecting the seed K-mers.
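The two components can be illustrated with a minimal Python sketch. It is not the mrFAST implementation. The function names, the toy seed length K = 4, the choice of e + 1 seeds, and the position tolerance of e in the adjacency check are all assumptions made for illustration. CKS is modeled as keeping the e + 1 non-overlapping K-mers with the fewest reference locations, and AF as rejecting a candidate location when more than e of the read's K-mers are not found near their expected adjacent positions.

```python
from bisect import bisect_left
from collections import defaultdict

K = 4  # toy seed length (assumption); real mappers use longer seeds


def build_index(reference, k=K):
    """Map each k-mer to the sorted list of its start positions in the reference."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def split_read(read, k=K):
    """Non-overlapping k-mers of the read, paired with their offsets in the read."""
    return [(off, read[off:off + k]) for off in range(0, len(read) - k + 1, k)]


def cheap_kmer_selection(seeds, index, e):
    """CKS sketch: keep the e + 1 seeds with the fewest reference locations."""
    return sorted(seeds, key=lambda s: len(index.get(s[1], ())))[:e + 1]


def has_location_near(locations, target, tol):
    """True if the sorted list holds a value in [target - tol, target + tol]."""
    i = bisect_left(locations, target - tol)
    return i < len(locations) and locations[i] <= target + tol


def adjacency_filter(read_start, seeds, index, e):
    """AF sketch: pass only if at most e seeds lack a hit near their expected spot."""
    misses = 0
    for off, kmer in seeds:
        if not has_location_near(index.get(kmer, []), read_start + off, e):
            misses += 1
            if misses > e:
                return False  # too many missing neighbours: skip edit distance
    return True


def candidate_locations(read, index, e, k=K):
    """Locations that survive both steps and would go on to edit-distance checks."""
    seeds = split_read(read, k)
    candidates = set()
    for off, kmer in cheap_kmer_selection(seeds, index, e):
        for loc in index.get(kmer, []):
            start = loc - off
            if start >= 0 and adjacency_filter(start, seeds, index, e):
                candidates.add(start)
    return sorted(candidates)


reference = "ACGTTGCAAGGCTTAACCGGATCGATTACA"
index = build_index(reference)
read = reference[8:20]  # an error-free read taken from position 8
print(candidate_locations(read, index, e=1))  # [8]
```

In this sketch only the locations returned by candidate_locations would be handed to the expensive edit-distance step. Every other location is discarded using hash-table lookups and binary searches alone.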

With FastHASH, the mapper performs far fewer edit-distance calculations, which drastically improves its performance.

FastHASH is used by mrFAST versions after 2.5. We observed up to 19x speedup after applying FastHASH to mrFAST.

As a filter, FastHASH does not affect the sensitivity of the mapper. For mrFAST, this means the mapping results remain the same after FastHASH is integrated into the mapper.

Important notice for mrFAST users: mrFAST with FastHASH (post 2.5.0.0) is drastically faster than plain mrFAST (pre 2.5.0.0). When using mrFAST, please make sure your version is later than 2.5.0.0.

Click here to download the latest version of mrFAST.

Please cite the following paper if you use FastHASH or any mrFAST version after 2.5.0.0:
  • Hongyi Xin, Donghyuk Lee, Farhad Hormozdiari, Samihan Yedkar, Onur Mutlu, and Can Alkan,
    "Accelerating Read Mapping with FastHASH," BMC Genomics, 14(Suppl 1):S13, 21 January 2013.
    Also appears in Proceedings of the 11th Asia Pacific Bioinformatics Conference (APBC), Vancouver, BC, Canada, January 2013.