Response to “Proposal to Update Data Management of Genomic Summary Results Under the NIH Genomic Data Sharing Policy”

Executive summary: the NIH is seeking comments on a new proposed policy on genomic data sharing. While there is much to like about the new policy, we are very concerned about the proposed requirement for a click-through agreement on all aggregate genomic resources (which would include heavily used databases such as ExAC and gnomAD). Our draft response to the Request for Comments is below. If you agree with our concern, please consider replying to the Request for Comments yourself, using the template text at the end of this post if useful.

Draft response to Request for Comments
We would like to applaud the NIH for moving in the right direction with its new “Proposal to Update Data Management of Genomic Summary Results Under the NIH Genomic Data Sharing Policy”. The rapid and open sharing of summary statistics from aggregate genomic data brings tremendous benefit to the scientific community, while the potential harms of such sharing are largely theoretical. Our own experience with the ExAC and gnomAD public variant frequency databases has shown that the benefits of sharing aggregate data to academic, clinical and pharmaceutical scientists are substantial: in the past three years the browsers have received over 10 million page views from over 200,000 unique users in 166 countries, and diagnostic laboratories have used them in the analysis of more than 50,000 rare disease families. The broader sharing of aggregate statistics enabled by the new policy will bring even greater value.

However, we remain very concerned about one aspect of the new Genomic Summary Results Data Sharing Policy: the creation of a new “rapid-access” tier, which requires users to accept a click-through agreement before they can access summary statistics. Our concerns can be summarized as follows: (1) click-through agreements make programmatic access to datasets difficult; (2) they greatly complicate or prevent several important types of data reuse; and (3) they are highly unlikely to deter anyone with genuine malicious intent. Overall, our position is that click-through agreements are a security fig leaf: they give the impression of extra protection but do no actual good, and they can do non-trivial harm. We would also like to emphasize that ExAC and gnomAD, along with other aggregate data sharing sites such as the Exome Variant Server, do not have and have never had click-through agreements, and to the best of our knowledge no harm has ever come to participants as a result.

To explain those points in a bit more detail:

  1. Summary statistic resources such as gnomAD must be accessible through application programming interfaces (APIs), so that people can query them with software (e.g. to pull out just a targeted slice of the data) and run automated queries (e.g. to retrieve the frequency of a specific variant when a user loads another web page about that variant). Most implementations of click-through agreements will prevent or greatly complicate any form of programmatic access. Technical workarounds exist (for example, requiring users to accept the agreement once in exchange for an access key), but every one of them places some kind of barrier in front of programmatic access.
  2. Probably the single biggest obstacle created by click-through agreements is that they prevent or substantially complicate data reuse. Right now anyone can download the complete list of frequencies from gnomAD and load it into another website, or use it to build other useful web services (the complete ExAC dataset has been downloaded thousands of times). Under any kind of click-through agreement, those developers either could not do this at all or would have to incorporate the same agreement into their own usage policies, which may be incompatible with their intended use.
  3. Most importantly, click-through agreements do nothing to prevent the types of usage that are most likely to be harmful. ExAC and gnomAD have been on the web for almost three years and have been accessed more than 10 million times, yet we are not aware of a single incident that posed any risk of harm to participants. The vast majority of users simply want to use the data in their research. The theoretical bad actor intent on misuse is extremely unlikely to be dissuaded by a click-through agreement, nor would such an agreement offer any real after-the-fact protection if a malicious actor decided to do harm.

In summary, click-through agreements will degrade or destroy programmatic access and data reuse without having any meaningful effect on participant safety. Any policy that advocates click-through agreements as a solution should spell out exactly which types of misuse the click-through will prevent, and should justify the resulting barriers to data use.

We believe it would be a mistake to incorporate click-through agreements into any NIH-wide policy. Instead, we suggest that the NIH require every website sharing aggregate genetic data to display a clear statement on the responsible use of that data (such as not attempting to reidentify participants), perhaps with a link on every page, but with no click-through barrier. This would strike a reasonable balance between serving the needs of the research community and protecting the public trust.


A request for gnomAD users and supporters
If you are a member of the ExAC/gnomAD community and agree that the public sharing of summary statistics is both harmless to participants and of great benefit to science, we urge you to read the new policy proposal and to respond to the NIH’s Request for Comments by October 20th.

Feel free to edit or use the text below:

I am writing as an avid user of the ExAC and gnomAD databases. [Please provide a brief description of your use of these resources and their benefits to your research]

I believe that the new “Proposal to Update Data Management of Genomic Summary Results Under the NIH Genomic Data Sharing Policy” is a step in the right direction: there is no evidence that controlled access to summary statistics prevents any harm to participants, and open access to variant frequency data through ExAC and gnomAD has been very important to my research.

However, I am concerned about the proposal to create a new rapid-access category for summary statistics that would require the use of click-through agreements. These agreements make it difficult to reuse summary statistics and to access data programmatically. Most importantly, there is no evidence that they prevent harm to participants. A wide variety of summary statistics, including ExAC and gnomAD, have been publicly available without click-through agreements for many years, and in that time no harm of any kind has been reported to any participant whose data contributed to them.

I urge the NIH to modify this proposal and to designate summary statistics as open access, with the exception of data from communities and populations who believe they are especially vulnerable to harm from possible reidentification.


