Assembling a catalogue of human protein-coding variation
Together with collaborators from a wide range of disease-specific research consortia, we have assembled and reprocessed the world’s largest collection of human exome data, the Exome Aggregation Consortium (ExAC) collection, providing unprecedented resolution of the patterns of genetic variation in human protein-coding genes. We have released a public dataset of variation from 60,706 individuals, and we are currently mining it for insight into human evolution, gene function, and disease gene identification. The ExAC website has been accessed over 3.5 million times since its launch in October 2014 and has become the default reference dataset for many clinical diagnostic labs.
To find out more about our research in this area you can read the ExAC manuscript in Nature, or our Science Translational Medicine paper describing the application of the ExAC dataset to understand variation in the PRNP gene (you can also read the deeply personal back-story behind this paper). Finally, you can download all of the code and data required to fully reproduce the analyses described in the paper.
More recently, we released a massive update of this reference database, more than doubling the number of samples and giving it a new name: the Genome Aggregation Database (gnomAD). You can access the new dataset here, and read more about how we made it here.
Daniel presents an update on gnomAD at the 2017 Future of Genomic Medicine meeting: