Erik Garrison

+1 502 382 6005 / +39 320 244 2758
Vico San Pietro a Maiella, 6
80138 Napoli, NA, Italy

Genomicist with a quantitative social science background. Harvard undergrad, Cambridge PhD. Learned in the ways of free culture. Sharing in the powers of free software. Lover of commonwealths. Born in Kentucky, matured in Massachusetts, honed in England. Lives in Italy and travels the world, both physically and virtually.

Education

PhD in Genomics
Cambridge University
October 2014 – January 2019
Student at the Wellcome Sanger Institute, advised by Richard Durbin. Thesis, “Graphical pangenomics,” presented methods for using pangenomes encoded as sequence variation graphs in alignment and genome inference. Led the development of vg, an open-source toolkit enabling the use of genome graphs in bioinformatic analysis. Visiting researcher at Stazione Zoologica Anton Dohrn and visiting student at Cambridge Genetics. Explored applications of variation graphs to analyses in population genetics, ancient DNA, marine biology, metagenomics, and genome assembly.

Bachelor of Arts in Social Studies
Harvard University
Fall 2002 – Spring 2006
Undergraduate Fellow, Harvard Institute for Quantitative Social Science. Senior thesis focused on the relationship between social structure and communication technologies. Electives included classes in functional programming, theoretical computer science, peer-to-peer networks, and linear algebra. Spanish language citation. Rower from 2002 to 2005.

Experience

University of California, Santa Cruz
February 2019 – present
Postdoctoral Fellow with Benedict Paten. Developing scalable methods for pangenomic analysis based on genome variation graphs.

September 2015 – March 2018
Part-time contractor with research and development team. Explored machine learning based approaches to variant calling as part of the PrecisionFDA Challenge, producing the hhga variant caller. Maintenance, development, and continued support of vcflib and freebayes.

Boston College
February 2010 – September 2014
Research associate in the laboratory of Gabor Marth. Designed and implemented freebayes, a genetic variant detector for short-read sequencing data. Developed tools to manipulate sequencing data and descriptions of genetic variation. Wrote the first haplotype-based and graph-based variant detection methods for short-read sequencing data. Generated the final 1000 Genomes Project release and helped to produce its paper as part of the project writing group.

The Echo Nest
January 2009 – May 2009
Contractor. Designed and implemented control and monitoring systems to manage a compute cluster deployed in the Amazon EC2 cloud.

One Laptop Per Child
May 2008 – January 2009
Software engineer. Focused on operating system build processes, customer support, maintenance, software design planning, and communication among a globally dispersed group of volunteers and educators.

Harvard Medical School
August 2006 – April 2008
Contractor in the laboratory of George Church. Designed, wrote, and tested data acquisition and system control software for the “Polonator” open-source DNA sequencing device.

National Bureau of Economic Research
May 2006 – May 2007
Research assistant. Wrote software to efficiently process Wikipedia’s XML-based data dumps (wikiq), and evaluated metrics of user contribution. Analyzed data related to the internationalization of clinical trials.

Harvard Kennedy School of Government
January 2005 – September 2005
Research assistant. Obtained and processed data for country-level quantitative studies of terrorism and violent extremism.

Research

I build scalable computational approaches to infer genomes from DNA sequencing data. My work on this topic began with the development of Bayesian methods to detect and genotype genomic variants, with application of these methods to the thousands of human genomes cataloged in the 1000 Genomes Project. Lessons learned in that effort guided me to work on unbiased methods for genome inference based on graphical models of pangenomes. In these, the genome is encoded in a graph that may represent a population sample of individuals from the same species, a metagenome, the diploid genome of a single individual, or any other useful collection of genomic sequence information. I have shown that this approach provides more accurate alignment of reads when it is possible to construct a high-quality pangenome.

I develop and share my source code publicly under permissive open licenses. I am a frequent reviewer for Bioinformatics, Nature Biotechnology, and Nucleic Acids Research. I am a review editor for Frontiers in Genetics, Frontiers in Plant Science, and Frontiers in Bioengineering and Biotechnology. I have contributed to the following written works:

[1]   Glenn Hickey, David Heller, Jean Monlong, Jonas Andreas Sibbesen, Jouni Sirén, Jordan Eizenga, Eric Dawson, Erik Garrison, Adam Novak, and Benedict Paten. Genotyping structural variants in pangenome graphs using the vg toolkit. bioRxiv:654566, 2019.

[2]   Erik Garrison, Jouni Sirén, Adam M Novak, Glenn Hickey, Jordan M Eizenga, Eric T Dawson, William Jones, Shilpa Garg, Charles Markello, Michael F Lin, et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nature Biotechnology, 36(9):875–879, 2018.

[3]   Jouni Sirén, Erik Garrison, Adam M Novak, Benedict Paten, and Richard Durbin. Haplotype-aware graph indexes. arXiv:1805.03834, 2018.

[4]   Vincenza Colonna, Nunzio D’Agostino, Erik Garrison, Jonas Meisner, Anders Albrechtsen, Angelo Facchiano, Teodoro Cardi, and Pasquale Tripodi. Genomic diversity and novel genome-wide association with fruit morphology in Capsicum, from 746k polymorphic sites. bioRxiv:487165, 2018.

[5]   Benedict Paten, Jordan M Eizenga, Yohei M Rosen, Adam M Novak, Erik Garrison, and Glenn Hickey. Superbubbles, ultrabubbles, and cacti. Journal of Computational Biology, 25(7):649–663, 2018.

[6]   Shilpa Garg, Mikko Rautiainen, Adam M Novak, Erik Garrison, Richard Durbin, and Tobias Marschall. A graph-based approach to diploid genome assembly. Bioinformatics, 34(13):i105–i114, 2018.

[7]   Eric T Dawson, Sarah Wagner, David Roberson, Meredith Yeager, Joseph Boland, Erik Garrison, Mark Schiffman, Tina Raine-Bennet, Thomas Lorey, Phillip Castle, et al. rkmh: A MinHash toolbox for analyzing HPV coinfections. American Association for Cancer Research, 2018.

[8]   Adam M Novak, Glenn Hickey, Erik Garrison, Sean Blum, Abram Connelly, Alexander Dilthey, Jordan Eizenga, MA Saleh Elmohamed, Sally Guthrie, André Kahles, et al. Genome graphs. bioRxiv:101378, 2017.

[9]   Benedict Paten, Adam M Novak, Jordan M Eizenga, and Erik Garrison. Genome graphs and the evolution of genome inference. Genome Research, 27(5):665–676, 2017.

[10]   Eric T Dawson, Erik Garrison, Adam Novak, Benedict Paten, Jordan Eizenga, Glenn Hickey, Stephen Chanock, and Richard Durbin. Germline structural variant detection with variation graphs. American Association for Cancer Research, 2017.

[11]   Adam M Novak, Erik Garrison, and Benedict Paten. A graph extension of the positional Burrows–Wheeler transform and its applications. Algorithms for Molecular Biology, 12(1):18, 2017.

[12]   Sebastian M Waszak, Grace Tiao, Bin Zhu, Tobias Rausch, Francesc Muyas, Bernardo Rodriguez-Martin, Raquel Rabionet, Sergei Yakneen, Georgia Escaramis, Yilong Li, et al. Germline determinants of the somatic mutation landscape in 2,642 cancer genomes. bioRxiv:208330, 2017.

[13]   Computational Pan-Genomics Consortium. Computational pan-genomics: status, promises and challenges. Briefings in Bioinformatics, 19(1):118–135, 2016.

[14]   G David Poznik, Yali Xue, Fernando L Mendez, Thomas F Willems, Andrea Massaia, Melissa A Wilson Sayres, Qasim Ayub, Shane A McCarthy, Apurva Narechania, Seva Kashin, et al. Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences. Nature Genetics, 48(6):593, 2016.

[15]   1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature, 526(7571):68, 2015.

[16]   Colby Chiang, Ryan M Layer, Gregory G Faust, Michael R Lindberg, David B Rose, Erik P Garrison, Gabor T Marth, Aaron R Quinlan, and Ira M Hall. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nature Methods, 12(10):966, 2015.

[17]   Danny Challis, Lilian Antunes, Erik Garrison, Eric Banks, Uday S Evani, Donna Muzny, Ryan Poplin, Richard A Gibbs, Gabor Marth, and Fuli Yu. The distribution and mutagenesis of short coding indels from 1,128 whole exomes. BMC Genomics, 16(1):143, 2015.

[18]   Massimiliano Cocca, Marc Pybus, Pier Francesco Palamara, Erik Garrison, Michela Traglia, Cinzia F Sala, Sheila Ulivi, Yasin Memari, Anja Kolb-Kokocinski, Richard Durbin, et al. Purging of deleterious variants in Italian founder populations with extended autozygosity. bioRxiv:022947, 2015.

[19]   Peter H Sudmant, Tobias Rausch, Eugene J Gardner, Robert E Handsaker, Alexej Abyzov, John Huddleston, Yan Zhang, Kai Ye, Goo Jun, Markus Hsi-Yang Fritz, et al. An integrated map of structural variation in 2,504 human genomes. Nature, 526(7571):75, 2015.

[20]   Olivier Delaneau, Jonathan Marchini, Gil A McVean, Peter Donnelly, Gerton Lunter, Jonathan L Marchini, Simon Myers, Anjali Gupta-Hinch, Zamin Iqbal, Iain Mathieson, et al. Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel. Nature Communications, 5:3934, 2014.

[21]   Wan-Ping Lee, Michael P Stromberg, Alistair Ward, Chip Stewart, Erik P Garrison, and Gabor T Marth. MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping. PLoS One, 9(3):e90581, 2014.

[22]   Vincenza Colonna, Qasim Ayub, Yuan Chen, Luca Pagani, Pierre Luisi, Marc Pybus, Erik Garrison, Yali Xue, and Chris Tyler-Smith. Human genomic regions with exceptionally high or low levels of population differentiation identified from 911 whole-genome sequences. bioRxiv:005462, 2014.

[23]   Ekta Khurana, Yao Fu, Vincenza Colonna, Xinmeng Jasmine Mu, Hyun Min Kang, Tuuli Lappalainen, Andrea Sboner, Lucas Lochovsky, Jieming Chen, Arif Harmanci, et al. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science, 342(6154):1235587, 2013.

[24]   Mengyao Zhao, Wan-Ping Lee, Erik P Garrison, and Gabor T Marth. SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications. PLoS One, 8(12):e82138, 2013.

[25]   Erik Garrison and Gabor Marth. Haplotype-based variant detection from short-read sequencing. arXiv:1207.3907, 2012.

[26]   1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422):56, 2012.

[27]   Simon Gravel, Brenna M Henn, Ryan N Gutenkunst, Amit R Indap, Gabor T Marth, Andrew G Clark, Fuli Yu, Richard A Gibbs, Carlos D Bustamante, David L Altshuler, et al. Demographic history and rare allele sharing among human populations. Proceedings of the National Academy of Sciences, 108(29):11983–11988, 2011.

[28]   Chip Stewart, Deniz Kural, Michael P Strömberg, Jerilyn A Walker, Miriam K Konkel, Adrian M Stütz, Alexander E Urban, Fabian Grubert, Hugo YK Lam, Wan-Ping Lee, et al. A comprehensive map of mobile element insertion polymorphisms in humans. PLoS Genetics, 7(8):e1002236, 2011.

[29]   Derek W Barnett, Erik P Garrison, Aaron R Quinlan, Michael P Strömberg, and Gabor T Marth. Bamtools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics, 27(12):1691–1692, 2011.

[30]   1000 Genomes Project Consortium et al. A map of human genome variation from population-scale sequencing. Nature, 467(7319):1061, 2010.

Selected talks

Variation graphs for efficient unbiased pangenomic sequence interpretation. Biology of Genomes. Cold Spring Harbor, 2018.

Resequencing against a pangenome. NBDC/DBCLS BioHackathon. Keio University. Tsuruoka, Japan, 2016.

Variant detection using a graph of genomic variation. Advances in Genome Biology and Technology, 2014.

From short reads to genotypes, haplotypes, and frequencies. Penn State, 2014.

A generalized human reference as a graph of genomic variation. American Society of Human Genetics, 2013.

Simultaneous assembly of thousands of human genomes. Biology of Genomes, 2013.

Haplotype-based variant detection and interpretation enables the population-scale analysis of multi-nucleotide sequence variants. American Society of Human Genetics, 2012.

Haplotype-based variant detection from short-read sequencing. Biology of Genomes, 2012.

Teaching

Course lead and instructor. Computational Pangenomics. Instituto Gulbenkian de Ciência. Oeiras, Portugal. March 2018.

Instructor. NGS alignment and variant calling practical. OBiLab, Consiglio Nazionale delle Ricerche. Napoli, Italy. April 2015.

Instructor. Biology for Adaptation Genomics. Weggis, Switzerland. Winters 2015-2018.

Instructor. Wellcome Genome Campus Advanced Course on Next Generation Sequencing Bioinformatics. Hinxton, UK. November 2015.

Guest lecturer. Iowa Bioinformatics Summer. Iowa City, Iowa, USA. May 2015.

Instructor. SeqShop. University of Michigan. Ann Arbor, Michigan, USA. June 2014 and May 2015.

Trainer. Galaxy Community Conference 2013. Oslo, Norway. June 2013.

Fellowships and funding

Undergraduate Fellow. Harvard Institute for Quantitative Social Science. 2005-2006.

PhD fellowship. Wellcome Trust. 2014-2018.

Discovery Project Grant DP190103705. Australian Research Council. 2019-2021.

NLnet Foundation NGI0 Discovery Fund. Privacy-preserving variation graphs. 2020.

Languages

Native English. Fluent Italian. Conversational Spanish.
