New DNA sequencing technologies are rapidly transforming the diagnosis of rare genetic diseases, but they also carry a risk: by allowing us to see all of the hundreds of “interesting-looking” variants in a patient’s genome, they make it easy for researchers to spin a causal narrative around genetic changes that have nothing to do with disease status. Such false positive reports can have serious consequences: incorrect diagnoses, unnecessary or ineffective treatment, and reproductive decisions (such as embryo termination) based on spurious test results. To minimize such outcomes, the field needs to agree on clear statistical guidelines for determining whether a variant is truly causally linked to disease.
In a paper published in Nature this week, we report the consensus statement from a workshop, sponsored by the National Human Genome Research Institute, on guidelines for assessing the evidence for variant causality. We argue for a careful two-stage approach to assessing evidence: first, evaluating the overall support for a causal role of the affected gene in the disease phenotype; second, assessing the probability that the variant(s) carried by the patient do indeed play a causal role in that patient’s disease state. We argue for the primacy of statistical genetic evidence for new disease genes, which can be supplemented (but not replaced) by additional informatic and experimental support; and we emphasize the need for all forms of evidence to be placed within a statistical framework that considers the probability of any of the reported lines of evidence arising by chance.
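Why does the probability of evidence “arising by chance” matter so much once you can see every variant? A back-of-the-envelope sketch helps. Suppose, purely for illustration (these assumptions are ours, not the paper’s), that each of n candidate variants independently has a small probability p of looking convincingly “interesting” by chance. The probability that at least one of them does is then 1 − (1 − p)^n. The function name and the 1% figure below are hypothetical, and real variants are not independent, so treat this as intuition rather than a model.

```python
def prob_any_spurious(n_variants: int, p_spurious: float) -> float:
    """Probability that at least one of n independent candidate variants
    looks 'interesting' purely by chance, if each does so with probability p.

    Hypothetical illustration: assumes independence between variants.
    """
    if n_variants < 0:
        raise ValueError("n_variants must be non-negative")
    if not 0.0 <= p_spurious <= 1.0:
        raise ValueError("p_spurious must be between 0 and 1")
    # Complement of the event "no variant looks interesting by chance".
    return 1.0 - (1.0 - p_spurious) ** n_variants


# Hypothetical per-variant chance of a spurious "hit": 1%.
for n in (1, 10, 100, 500):
    print(f"{n:>4} candidate variants: {prob_any_spurious(n, 0.01):.3f}")
```

With these made-up numbers, the chance of at least one spurious “hit” climbs from 1% for a single variant to roughly 63% for 100 candidates and over 99% for 500, which is why a plausible-looking story about one variant in a genome carries little weight on its own.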
The paper itself is open access, so you can read the whole thing – we won’t rehash a complete summary here. However, we did want to discuss the back story and expand on a few issues raised in the paper.
Tangent: the history behind the paper
The idea for the workshop arose from a Twitter conversation between the two of us about the possibility of an explosion of false positive reports of disease-causing mutations in the age of exome sequencing. We were not alone in this concern – over the span of several weeks we had numerous conversations with other researchers who shared similar trepidation. We are lucky to be part of a genomics community that plays an active role in creating brain trusts to tackle bigger issues, as it did with the problem of replication in GWAS in 2007. In addition, while other groups are working on guidelines in related areas (including an upcoming update to the 2007 ACMG “recommendations for standards for interpretation and reporting of sequence variations”), there was not yet a clear set of standards for evaluating the evidence for variant pathogenicity in the sequencing era.
So we decided to pull together a group of people who were also grappling with this issue, with the goal of creating a guidelines document. The workshop became a reality thanks to a discussion with Teri Manolio of the National Human Genome Research Institute, who generously offered to host it and then played a critical role in the organization and follow-up of the event. These efforts were also shepherded by a steering group composed of David Dimmock, Heidi Rehm, Jay Shendure, Teri Manolio, and the two of us.
The whole workshop was captured on video and can be viewed in the NHGRI’s archives. What isn’t directly visible there is the huge amount of work that went into organizing the day, including the writing of six white papers by individual working groups on separate sub-topics.
The workshop was a great success and sparked extensive (and sometimes contentious) discussion. However, the real work began only after the workshop: hammering the points of consensus reached during the meeting into a manuscript that would be accessible to a wider audience. We wanted this report to be usable by non-genomicists entering this arena to study specific conditions, but of course we also wanted it to capture the relevant, detailed evidence amassed by many groups in the field.
The resulting manuscript was very much the fruit of the entire group, but we want to acknowledge a few individuals in particular for sustained and robust discussion and editing of the manuscript: Les Biesecker, David Goldstein, Greg Cooper, Don Conrad, Mark Daly, Heidi Rehm, Goncalo Abecasis, David Adams and Ben Voight. We also owe a major debt of gratitude to Matt Hurles and another (anonymous) reviewer for Nature, whose substantive comments dramatically improved the final paper; and to Magdalena Skipper from Nature for her editorial oversight.
What the field needs now
We end the paper with an outline of specific priorities for research and infrastructure development. Those priorities have not changed greatly since the workshop in September 2012. For instance, we still feel there is an urgent need for improved databases for reporting the pathogenicity of mutations, although we are extremely optimistic about the potential of NCBI’s ClinVar database to serve as a centralized repository for human disease mutations.
We remain concerned that for nearly all reported disease-causing variants we have little or no information about penetrance (that is, the probability that a carrier of the mutation will in fact develop the disease in question). Penetrance can only be assessed accurately through very large, unbiased population surveys (here is a recent example). Really getting to grips with the true impact of reported rare disease-causing variants will require that these sites be genotyped in hundreds of thousands of people, and that carriers be followed up for signs of disease – an ambitious undertaking, but one that is already underway through efforts such as the UK Biobank.
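To make the penetrance point concrete, here is a minimal sketch (our own illustration, not from the paper) of how penetrance could be estimated from a population survey: the fraction of carriers who are affected, with a Wilson score interval to show how uncertain that fraction is. It assumes carriers were identified without regard to disease status; carriers found by testing affected families would bias the estimate upward. The function name and the counts are hypothetical.

```python
import math


def penetrance_estimate(affected_carriers: int, total_carriers: int, z: float = 1.96):
    """Return (point estimate, lower, upper) for penetrance, using a Wilson
    score interval (z = 1.96 gives roughly 95% coverage).

    Assumes carriers were ascertained independently of disease status,
    e.g. by genotyping an unselected population cohort.
    """
    if total_carriers <= 0 or not 0 <= affected_carriers <= total_carriers:
        raise ValueError("need total_carriers > 0 and 0 <= affected_carriers <= total_carriers")
    n = total_carriers
    p_hat = affected_carriers / n
    denom = 1.0 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return p_hat, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts: same observed proportion, tenfold different numbers of carriers.
for affected, carriers in ((6, 40), (60, 400)):
    p, lo, hi = penetrance_estimate(affected, carriers)
    print(f"{affected}/{carriers} carriers affected: {p:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Both hypothetical cohorts give the same point estimate, but the interval from 400 carriers is much narrower than the one from 40, which is the statistical reason such surveys need to be so large when the variants themselves are rare.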
Critically, while our paper provides largely “soft” guidance on approaches to assessing variant causality, we desperately need such guidelines to be implemented within formal statistical frameworks that quantify the relative strength of evidence for pathogenicity. In the paper we outline one possible approach, largely fleshed out by Don Conrad, that leverages large datasets of normal human variation, but much work remains to be done to generate robust implementations that can be run routinely in a clinical setting. Developing such frameworks will require deep engagement between the clinical genetics and statistical genetics communities.
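As one very simple illustration of putting evidence from large datasets of normal variation on a statistical footing (this is not the framework outlined in the paper; the function name and counts are hypothetical), a gene-level comparison might ask whether rare, predicted-damaging variants in a gene are more common among patients than in a large reference population, using a one-sided Fisher’s exact test. A clinically usable version would also need to handle ancestry matching, differences in sequencing coverage and variant calling between cohorts, and correction for testing thousands of genes.

```python
from math import comb


def fisher_one_sided_greater(case_carriers: int, case_total: int,
                             ctrl_carriers: int, ctrl_total: int) -> float:
    """One-sided Fisher's exact test p-value for enrichment of carriers in cases.

    2x2 table:          carriers   non-carriers
               cases       a            b
               controls    c            d
    With all margins fixed, the number of carriers among cases follows a
    hypergeometric distribution; the p-value is P(X >= a).
    """
    a, b = case_carriers, case_total - case_carriers
    c, d = ctrl_carriers, ctrl_total - ctrl_carriers
    if min(a, b, c, d) < 0:
        raise ValueError("carrier counts must lie between 0 and the group size")
    n_carriers = a + c          # carriers in both groups combined
    n_cases = a + b             # size of the case group
    n_total = a + b + c + d     # everyone in the table
    denom = comb(n_total, n_cases)
    max_x = min(n_carriers, n_cases)
    # Exact integer arithmetic for each tail term, divided once at the end.
    tail = sum(comb(n_carriers, x) * comb(n_total - n_carriers, n_cases - x)
               for x in range(a, max_x + 1))
    return tail / denom


# Hypothetical counts: 8 of 200 patients vs 60 of 30,000 reference individuals
# carry a rare predicted-damaging variant in the same gene.
p = fisher_one_sided_greater(8, 200, 60, 30_000)
print(f"one-sided p = {p:.2e}")
```

Working with exact integers via `math.comb` keeps the sketch dependency-free; in practice a library routine such as SciPy’s `fisher_exact` would be the more usual choice.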
Finally, we need to ensure that these advances, and the best-practice guidelines that result from them, are disseminated to clinical and research labs, to editors and referees evaluating papers (or preprints) for publication, and to databases accepting variant reports.
We were encouraged that such a diverse group of researchers was able to reach consensus across a wide range of issues in this paper: for instance, the clear need for more statistically rigorous approaches to implicating variants, and the imperative for enhanced data sharing. However, this paper is just a starting point for a far broader conversation. We welcome comments from the community, and look forward to seeing these guidelines implemented!