Beyond Open vs. Closed: Balancing Individual Privacy and Public Accountability in Data Sharing


Paper by Bill Howe et al: “Data too sensitive to be “open” for analysis and re-purposing typically remains “closed” as proprietary information. This dichotomy undermines efforts to make algorithmic systems more fair, transparent, and accountable. Access to proprietary data in particular is needed by government agencies to enforce policy, researchers to evaluate methods, and the public to hold agencies accountable; all of these needs must be met while preserving individual privacy and firm competitiveness. In this paper, we describe an integrated legaltechnical approach provided by a third-party public-private data trust designed to balance these competing interests.

Basic membership allows firms and agencies to enable low-risk access to data for compliance reporting and core methods research, while modular data sharing agreements support a wide array of projects and use cases. Unless specifically stated otherwise in an agreement, all data access is initially provided to end users through customized synthetic datasets that offer a) strong privacy guarantees, b) removal of signals that could expose competitive advantage for the data providers, and c) removal of biases that could reinforce discriminatory policies, all while maintaining empirically good fidelity to the original data. We find that the liberal use of synthetic data, in conjunction with strong legal protections over raw data, strikes a tunable balance between transparency, proprietorship, privacy, and research objectives; and that the legal-technical framework we describe can form the basis for organizational data trusts in a variety of contexts….(More)”.