June 9, 2020
Largest catalog of human genetic diversity
At a Glance
- Researchers have created a massive catalog of human genome data, along with tools to understand it.
- Using DNA from over 140,000 people, they analyzed genomic variation, how variants affect gene function, and which may cause disease or serve as new drug targets.
The genome is the complete set of your DNA, including all of your genes. The human genome was first decoded nearly two decades ago. The genetic sequencing of thousands of genomes has allowed researchers to begin to understand how the human body is built and maintained.
But each person’s genome is unique. Not enough genomes have been sequenced to understand all the ways that genetic variation can contribute to disease. To better understand the genetic diversity of the human genome, the Genome Aggregation Database (gnomAD) Consortium was formed over eight years ago to collect and study the genomes of people around the world.
The international gnomAD team of over 100 scientists released its first set of discoveries in a collection of seven papers published on May 27, 2020 in Nature, Nature Communications, and Nature Medicine. The work was funded in part by several NIH institutes (see Funding section below for full list).
The flagship paper cataloged the genetic variation in both the protein coding and non-coding regions of human DNA. Included were more than 125,000 exomes (which include only the parts that code for proteins) and 15,000 whole genomes, from populations in Europe, East and South Asia, Africa, and more. The researchers identified a total of 241 million variants that were either small single point mutations (changes in a single DNA building block, called a nucleotide) or insertions or deletions of short pieces of DNA.
The team explored how likely certain variants are to cause a loss of function in the proteins produced from the gene. Protein-coding genes were categorized based on their ability to tolerate genetic variations without being disrupted or inactivated by them. This analysis found more than 443,000 genetic variants that were likely to cause a loss of protein function.
The second paper explored why mutations identified as likely to cause a loss of function don’t always cause the problems that might be expected. The team found that such variants are within segments of DNA that are often spliced out of the final mRNA copies of the gene used to produce proteins.
A third paper detailed the analysis of more than 433,000 structural variants in the human genome. Structural variants are changes that span long stretches of DNA, of at least 50 nucleotides. Structural variants were less likely to appear in protein coding regions than in non-protein coding regions. The team estimated that only about 0.13% of people carry a structural variant with any clinical significance.
The fourth paper explored how loss of function variations could be used to identify new drug targets. The fifth paper provided an example of how gnomAD could be used to validate drug targets. It analyzed the effects of loss of function variants in a gene called LRRK2, which has been associated with Parkinson’s disease. The results suggest that therapies to inhibit the LRRK2 protein would be unlikely to cause severe side effects.
The sixth paper described the impacts of variants in the region that sits immediately before the protein coding region of genes, called the 5′ untranslated region. The researchers identified specific genes where variants in this region could lead to disease. One novel variant they uncovered was tied to neurofibromatosis. Finally, the last paper showed how gnomAD could be used to analyze multi-nucleotide variants, which are clusters of two or more variants that are often inherited together.
“The wide-ranging impact this resource has already had on medical research and clinical practice is a testament to the incredible value of genomic data sharing and aggregation,” says Dr. Daniel MacArthur at the Broad Institute of MIT and Harvard, who is a lead author on the papers. “More than 350 independent studies have already made use of gnomAD for research on cancer predisposition, cardiovascular disease, rare genetic disorders, and more since we made the data available.”
The consortium’s next steps are to expand gnomAD to increase the number of genomes and diversity of populations included. “We are very far from saturating discoveries or solving variant interpretation,” MacArthur says. “The next steps for the consortium will be focused on increasing the size and population diversity of these resources, and linking the resulting massive-scale genetic data sets with clinical information.”
by Tianna Hicklin, Ph.D.
Related Links
- Diversity Enhances Genomic Analyses
- Catalog of Human Genetic Diversity Expands
- Charting Genetic Variation Across the Globe
- Comparing the Mouse and Human Genomes
- Expanding Our Understanding of Genomics
- Cataloging Human Genetic Variation
- Finding Treasure in “Junk” DNA
- Genetic Variations Affect Control of the Genome
- Map of Structural Variation in the Human Genome
References:
Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP, Gauthier LD, Brand H, Solomonson M, Watts NA, Rhodes D, Singer-Berk M, England EM, Seaby EG, Kosmicki JA, Walters RK, Tashman K, Farjoun Y, Banks E, Poterba T, Wang A, Seed C, Whiffin N, Chong JX, Samocha KE, Pierce-Hoffman E, Zappala Z, O'Donnell-Luria AH, Minikel EV, Weisburd B, Lek M, Ware JS, Vittal C, Armean IM, Bergelson L, Cibulskis K, Connolly KM, Covarrubias M, Donnelly S, Ferriera S, Gabriel S, Gentry J, Gupta N, Jeandet T, Kaplan D, Llanwarne C, Munshi R, Novod S, Petrillo N, Roazen D, Ruano-Rubio V, Saltzman A, Schleicher M, Soto J, Tibbetts K, Tolonen C, Wade G, Talkowski ME; Genome Aggregation Database Consortium, Neale BM, Daly MJ, MacArthur DG. Nature. 2020 May;581(7809):434-443. doi: 10.1038/s41586-020-2308-7. Epub 2020 May 27. PMID: 32461654.
Cummings BB, Karczewski KJ, Kosmicki JA, Seaby EG, Watts NA, Singer-Berk M, Mudge JM, Karjalainen J, Satterstrom FK, O'Donnell-Luria AH, Poterba T, Seed C, Solomonson M, Alföldi J; Genome Aggregation Database Production Team; Genome Aggregation Database Consortium, Daly MJ, MacArthur DG. Nature. 2020 May;581(7809):452-458. doi: 10.1038/s41586-020-2329-2. Epub 2020 May 27. PMID: 32461655.
Collins RL, Brand H, Karczewski KJ, Zhao X, Alföldi J, Francioli LC, Khera AV, Lowther C, Gauthier LD, Wang H, Watts NA, Solomonson M, O'Donnell-Luria A, Baumann A, Munshi R, Walker M, Whelan CW, Huang Y, Brookings T, Sharpe T, Stone MR, Valkanas E, Fu J, Tiao G, Laricchia KM, Ruano-Rubio V, Stevens C, Gupta N, Cusick C, Margolin L; Genome Aggregation Database Production Team; Genome Aggregation Database Consortium, Taylor KD, Lin HJ, Rich SS, Post WS, Chen YI, Rotter JI, Nusbaum C, Philippakis A, Lander E, Gabriel S, Neale BM, Kathiresan S, Daly MJ, Banks E, MacArthur DG, Talkowski ME. Nature. 2020 May;581(7809):444-451. doi: 10.1038/s41586-020-2287-8. Epub 2020 May 27. PMID: 32461652.
Minikel EV, Karczewski KJ, Martin HC, Cummings BB, Whiffin N, Rhodes D, Alföldi J, Trembath RC, van Heel DA, Daly MJ; Genome Aggregation Database Production Team; Genome Aggregation Database Consortium, Schreiber SL, MacArthur DG. Nature. 2020 May;581(7809):459-464. doi: 10.1038/s41586-020-2267-z. Epub 2020 May 27. PMID: 32461653.
Whiffin N, Armean IM, Kleinman A, Marshall JL, Minikel EV, Goodrich JK, Quaife NM, Cole JB, Wang Q, Karczewski KJ, Cummings BB, Francioli L, Laricchia K, Guan A, Alipanahi B, Morrison P, Baptista MAS, Merchant KM; Genome Aggregation Database Production Team; Genome Aggregation Database Consortium, Ware JS, Havulinna AS, Iliadou B, Lee JJ, Nadkarni GN, Whiteman C; 23andMe Research Team, Daly M, Esko T, Hultman C, Loos RJF, Milani L, Palotie A, Pato C, Pato M, Saleheen D, Sullivan PF, Alföldi J, Cannon P, MacArthur DG. Nat Med. 2020 May 27. doi: 10.1038/s41591-020-0893-5. Online ahead of print. PMID: 32461697.
Whiffin N, Karczewski KJ, Zhang X, Chothani S, Smith MJ, Evans DG, Roberts AM, Quaife NM, Schafer S, Rackham O, Alföldi J, O'Donnell-Luria AH, Francioli LC; Genome Aggregation Database Production Team; Genome Aggregation Database Consortium, Cook SA, Barton PJR, MacArthur DG, Ware JS. Nat Commun. 2020 May 27;11(1):2523. doi: 10.1038/s41467-019-10717-9. PMID: 32461616.
Wang Q, Pierce-Hoffman E, Cummings BB, Alföldi J, Francioli LC, Gauthier LD, Hill AJ, O'Donnell-Luria AH; Genome Aggregation Database Production Team; Genome Aggregation Database Consortium, Karczewski KJ, MacArthur DG. Nat Commun. 2020 May 27;11(1):2539. doi: 10.1038/s41467-019-12438-5. PMID: 32461613.
Funding: NIH’s National Institute of General Medical Sciences (NIGMS), National Human Genome Research Institute (NHGRI), National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), National Institute of Mental Health (NIMH), National Heart, Lung, and Blood Institute (NHLBI), National Institute of Allergy and Infectious Diseases (NIAID), National Center for Advancing Translational Sciences (NCATS), National Institute of Dental and Craniofacial Research (NIDCR), and National Center for Research Resources (NCRR); Swiss National Science Foundation; BioMarin Pharmaceutical Inc.; Sanofi Genzyme Inc.; Broad Institute; Wellcome Trust; Medical Research Council (UK); University of Sheffield; Barts Charity; Health Data Research UK; NHS National Institute for Health Research; Rosetrees/Stoneygate Imperial College; Simons Foundation; National Science Foundation; Desmond and Ann Heathwood; Southern California Diabetes Endocrinology Research Center; Michael J. Fox Foundation; Estonian Research Council; Royal Brompton and Harefield NHS Foundation; Imperial College London; Fondation Leducq; Department of Health, UK; Imperial College Academic Health Science Centre; Nakajima Foundation Scholarship.