Assembling a catalogue of human protein-coding variation
Together with collaborators from a wide range of disease-specific research consortia, we have assembled and reprocessed the world’s largest collection of human exome data, the Exome Aggregation Consortium (ExAC) collection, providing unprecedented resolution of the patterns of genetic variation in human protein-coding genes. We have released a public dataset of variation from 60,706 individuals, and we are currently mining this dataset for insight into human evolution, gene function, and disease gene identification. The ExAC website has been accessed over 3.5 million times since its launch in October 2014, and has become the default reference dataset for many clinical diagnostic labs.
To find out more about our research in this area, you can read the preprint of our manuscript, or our recent Science Translational Medicine paper describing how the ExAC dataset was applied to understand variation in the PRNP gene (you can also read the deeply personal back-story behind this paper). Finally, you can download all of the code and data required to fully reproduce the analyses described in the paper.