Article

Towards complete and error-free genome assemblies of all vertebrate species

A. Rhie, S. McCarthy, O. Fedrigo, J. Damas, G. Formenti, S. Koren, M. Uliano-Silva, W. Chow, A. Fungtammasan, J. Kim, C. Lee, B. Ko, M. Chaisson, G. Gedman, L. Cantin, F. Thibaud-Nissen, L. Haggerty, I. Bista, M. Smith, B. Haase, J. Mountcastle, S. Winkler, S. Paez, J. Howard, S. Vernes, T. Lama, F. Grutzner, W. Warren, C. Balakrishnan, D. Burt, J. George, M. Biegler, D. Iorns, A. Digby, D. Eason, B. Robertson, T. Edwards, M. Wilkinson, G. Turner, A. Meyer, A. Kautt, P. Franchini, H. Detrich, H. Svardal, M. Wagner, G. Naylor, M. Pippel, M. Malinsky, M. Mooney, M. Simbirsky, B. Hannigan, T. Pesout, M. Houck, A. Misuraca, S. Kingan, R. Hall, Z. Kronenberg, I. Sović, C. Dunn, Z. Ning, A. Hastie, J. Lee, S. Selvaraj, R. Green, N. Putnam, I. Gut, J. Ghurye, E. Garrison, Y. Sims, J. Collins, S. Pelan, J. Torrance, A. Tracey, J. Wood, R. Dagnew, D. Guan, S. London, D. Clayton, C. Mello, S. Friedrich, P. Lovell, E. Osipova, F. Al-Ajli, S. Secomandi, H. Kim, C. Theofanopoulou, M. Hiller, Y. Zhou, R. Harris, K. Makova, P. Medvedev, J. Hoffman, P. Masterson, K. Clark, F. Martin, K. Howe, P. Flicek, B. Walenz, W. Kwak, H. Clawson,
2021

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species (refs. 1–4). To address this issue, the international Genome 10K (G10K) consortium (refs. 5,6) has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, lineage-specific chromosome rearrangements, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help enable a new era of discovery across the life sciences.


Versions

  • 1. Version of Record, 2021-04-29

Metadata

About the authors
  • A. Rhie
    National Human Genome Research Institute (NHGRI)
  • S. McCarthy
    University of Cambridge, Wellcome Sanger Institute
  • O. Fedrigo
    Rockefeller University
  • J. Damas
    University of California, Davis
  • G. Formenti
    Rockefeller University
  • S. Koren
    National Human Genome Research Institute (NHGRI)
  • M. Uliano-Silva
    Leibniz-Institut für Zoo- und Wildtierforschung, Berlin Center for Genomics in Biodiversity Research
  • W. Chow
    Wellcome Sanger Institute
  • A. Fungtammasan
    DNAnexus
  • J. Kim
    Seoul National University
  • C. Lee
    Seoul National University
  • B. Ko
    Department of Food and Animal Biotechnology
  • M. Chaisson
    University of Southern California
  • G. Gedman
    Rockefeller University
  • L. Cantin
    Rockefeller University
  • F. Thibaud-Nissen
    National Center for Biotechnology Information (NCBI)
  • L. Haggerty
    European Bioinformatics Institute
  • I. Bista
    University of Cambridge, Wellcome Sanger Institute
  • M. Smith
    Wellcome Sanger Institute
  • B. Haase
    Rockefeller University
  • J. Mountcastle
    Rockefeller University
  • S. Winkler
    Max Planck Institute of Molecular Cell Biology and Genetics, Technische Universität Dresden
  • S. Paez
    Rockefeller University
  • J. Howard
    Novogene
  • S. Vernes
    Max Planck Institute for Psycholinguistics, Donders Institute for Brain, Cognition and Behaviour, University of St Andrews
  • T. Lama
    University of Massachusetts Amherst
  • F. Grutzner
    The University of Adelaide
  • W. Warren
    University of Missouri
  • C. Balakrishnan
    East Carolina University
  • D. Burt
    The University of Queensland
  • J. George
    Clemson University
  • M. Biegler
    Rockefeller University
  • D. Iorns
    Genetic Rescue Foundation
  • A. Digby
    Te Papa Atawhai
  • D. Eason
    Te Papa Atawhai
  • B. Robertson
    University of Otago
  • T. Edwards
    The University of Arizona
  • M. Wilkinson
    The Natural History Museum, London
  • G. Turner
    Bangor University
  • A. Meyer
    Universität Konstanz
  • A. Kautt
    Universität Konstanz, Harvard University
  • P. Franchini
    Universität Konstanz
  • H. Detrich
    Northeastern University
  • H. Svardal
    Universiteit Antwerpen, Naturalis Biodiversity Center
  • M. Wagner
    Karl-Franzens-Universität Graz
  • G. Naylor
    Florida Museum of Natural History
  • M. Pippel
    Max Planck Institute of Molecular Cell Biology and Genetics, Center for Systems Biology Dresden
  • M. Malinsky
    Wellcome Sanger Institute, Universität Basel, Zoologisches Institut
  • M. Mooney
    Tag.bio
  • M. Simbirsky
    DNAnexus
  • B. Hannigan
    DNAnexus
  • T. Pesout
    University of California, Santa Cruz
  • M. Houck
    San Diego Zoo Global
  • A. Misuraca
    San Diego Zoo Global
  • S. Kingan
    Pacific Biosciences
  • R. Hall
    Pacific Biosciences
  • Z. Kronenberg
    Pacific Biosciences
  • I. Sović
    Pacific Biosciences, Digital BioLogic
  • C. Dunn
    Pacific Biosciences
  • Z. Ning
    Wellcome Sanger Institute
  • A. Hastie
    Bionano Genomics, Inc.
  • J. Lee
    Bionano Genomics, Inc.
  • S. Selvaraj
    Arima Genomics
  • R. Green
    University of California, Santa Cruz, Dovetail Genomics
  • N. Putnam
    Independent
  • I. Gut
    Centro de Regulación Genómica, Barcelona, Universitat Pompeu Fabra Barcelona
  • J. Ghurye
    Dovetail Genomics, University of Maryland, College Park (UMD)
  • E. Garrison
    University of California, Santa Cruz
  • Y. Sims
    Wellcome Sanger Institute
  • J. Collins
    Wellcome Sanger Institute
  • S. Pelan
    Wellcome Sanger Institute
  • J. Torrance
    Wellcome Sanger Institute
  • A. Tracey
    Wellcome Sanger Institute
  • J. Wood
    Wellcome Sanger Institute
  • R. Dagnew
    University of Southern California
  • D. Guan
    University of Cambridge, Harbin Institute of Technology
  • S. London
    The Department of Psychology, The University of Chicago
  • D. Clayton
    Clemson University
  • C. Mello
    OHSU School of Medicine
  • S. Friedrich
    OHSU School of Medicine
  • P. Lovell
    OHSU School of Medicine
  • E. Osipova
    Max Planck Institute of Molecular Cell Biology and Genetics, Center for Systems Biology Dresden, Max Planck Institute for the Physics of Complex Systems
  • F. Al-Ajli
    Monash University Malaysia, Qatar Falcon Genome Project
  • S. Secomandi
    Università degli Studi di Milano
  • H. Kim
    Seoul National University, Department of Food and Animal Biotechnology
  • C. Theofanopoulou
    Rockefeller University
  • M. Hiller
    LOEWE Centre for Translational Biodiversity Genomics, Senckenberg Forschungsinstitut und Naturmuseum, Goethe-Universität Frankfurt am Main
  • Y. Zhou
    BGI-Shenzhen
  • R. Harris
    Pennsylvania State University
  • K. Makova
    Pennsylvania State University
  • P. Medvedev
    Pennsylvania State University
  • J. Hoffman
    National Center for Biotechnology Information (NCBI)
  • P. Masterson
    National Center for Biotechnology Information (NCBI)
  • K. Clark
    National Center for Biotechnology Information (NCBI)
  • F. Martin
    European Bioinformatics Institute
  • K. Howe
    European Bioinformatics Institute
  • P. Flicek
    European Bioinformatics Institute
  • B. Walenz
    National Human Genome Research Institute (NHGRI)
  • W. Kwak
    Hoonygen
  • H. Clawson
    University of California, Santa Cruz
Journal
  • Nature
Volume
  • 592
Issue
  • 7856
Pages
  • 737-746
Funding organization
  • Wellcome Trust
Grant number
  • 750747
Document type
  • journal article
Creative Commons license type
  • CC BY
Legal status of the document
  • Open license
Source
  • Scopus