Characterizing the Biomedical Data-Sharing Landscape

Paper by Angela G. Villanueva et al.: “Advances in technologies and biomedical informatics have expanded capacity to generate and share biomedical data. With a lens on genomic data, we present a typology characterizing the data-sharing landscape in biomedical research to advance understanding of the key stakeholders and existing data-sharing practices. The typology highlights the diversity of data-sharing efforts and facilitators and reveals how novel data-sharing efforts are challenging existing norms regarding the role of individuals whom the data describe.

Technologies such as next-generation sequencing have dramatically expanded capacity to generate genomic data at a reasonable cost, while advances in biomedical informatics have created new tools for linking and analyzing diverse data types from multiple sources. Further, many research-funding agencies now mandate that grantees share data. The National Institutes of Health’s (NIH) Genomic Data Sharing (GDS) Policy, for example, requires NIH-funded research projects generating large-scale human genomic data to share those data via an NIH-designated data repository such as the Database of Genotypes and Phenotypes (dbGaP). Another example is the Parent Project Muscular Dystrophy, a non-profit organization that requires applicants to propose a data-sharing plan and takes into account an applicant’s history of data sharing.

The flow of data to and from different projects, institutions, and sectors is creating a medical information commons (MIC), a data-sharing ecosystem consisting of networked resources sharing diverse health-related data from multiple sources for research and clinical uses. This concept aligns with the 2018 NIH Strategic Plan for Data Science, which uses the term “data ecosystem” to describe “a distributed, adaptive, open system with properties of self-organization, scalability and sustainability” and proposes to “modernize the biomedical research data ecosystem” by funding projects such as the NIH Data Commons. Consistent with Elinor Ostrom’s discussion of nested institutional arrangements, an MIC is both singular and plural and may describe the ecosystem as a whole or individual components contributing to the ecosystem. Thus, resources like the NIH Data Commons with its associated institutional arrangements are MICs, and also form part of the larger MIC that encompasses all such resources and arrangements.

Although many research funders incentivize data sharing, in practice, progress in making biomedical data broadly available to maximize its utility is often hampered by a broad range of technical, legal, cultural, normative, and policy challenges that include achieving interoperability, changing the standards for academic promotion, and addressing data privacy and security concerns. Addressing these challenges requires multi-stakeholder involvement. To identify relevant stakeholders and advance understanding of the contributors to an MIC, we conducted a landscape analysis of existing data-sharing efforts and facilitators. Our work builds on typologies describing various aspects of data sharing that focused on biobanks, research consortia, or where data reside (e.g., degree of data centralization). While these works are informative, we aimed to capture the biomedical data-sharing ecosystem with a wider scope. Understanding the components of an MIC ecosystem and how they interact, and identifying emerging trends that test existing norms (such as norms respecting the role of individuals whom the data describe), is essential to fostering effective practices, policies and governance structures, guiding resource allocation, and promoting the overall sustainability of the MIC….(More)”