Federated Learning for Privacy-Preserving Data Access


Paper by Małgorzata Śmietanka, Hirsh Pithadia and Philip Treleaven: “Federated learning is a pioneering privacy-preserving data technology and also a new machine learning model trained on distributed data sets.

Companies collect huge amounts of historic and real-time data to drive their business and collaborate with other organisations. However, data privacy is becoming increasingly important because of regulations (e.g. EU GDPR) and the need to protect their sensitive and personal data. Companies need to manage data access: firstly within their organizations (so they can control staff access), and secondly protecting raw data when collaborating with third parties. What is more, companies are increasingly looking to ‘monetize’ the data they’ve collected. However, under new legislation, the use of data by different organizations is becoming increasingly difficult (Yu, 2016).

Federated learning, pioneered by Google, is the emerging privacy-preserving data technology and also a new class of distributed machine learning models. This paper discusses federated learning as a solution for privacy-preserving data access and distributed machine learning applied to distributed data sets. It also presents a privacy-preserving federated learning infrastructure….(More)”.
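
To make the idea of training on distributed data sets concrete, here is a minimal sketch of federated averaging, a commonly used federated learning scheme. It is an illustration only and is not taken from the paper: the function names (`local_update`, `federated_average`, `train`), the linear model, and the toy data are all assumptions. Each client trains on its own records and sends back only model parameters, which a coordinator averages; the raw data never leaves the client.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one client
Model = Tuple[float, float]          # (weight, bias) of y = weight * x + bias


def local_update(model: Model, data: Dataset, lr: float = 0.05, epochs: int = 5) -> Model:
    """Run a few gradient-descent steps on one client's private data."""
    w, b = model
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error over this client's data only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Tuple[Model, int]]) -> Model:
    """Combine client models, weighting each by its number of samples."""
    total = sum(n for _, n in updates)
    w = sum(m[0] * n for m, n in updates) / total
    b = sum(m[1] * n for m, n in updates) / total
    return w, b


def train(clients: List[Dataset], rounds: int = 200) -> Model:
    """Coordinator loop: only model parameters cross the client boundary."""
    model: Model = (0.0, 0.0)
    for _ in range(rounds):
        updates = [(local_update(model, data), len(data)) for data in clients]
        model = federated_average(updates)
    return model


if __name__ == "__main__":
    # Hypothetical toy data: two organisations, each holding points on y = 2x + 1.
    clients = [
        [(0.0, 1.0), (1.0, 3.0)],
        [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
    ]
    w, b = train(clients)
    print(f"w={w:.2f}, b={b:.2f}")
```

In this sketch the coordinator sees parameters but not records. Sharing parameters alone is not a complete privacy guarantee, so real deployments typically add further protections such as secure aggregation or differential privacy.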

Four Principles to Make Data Tools Work Better for Kids and Families


Blog by the Annie E. Casey Foundation: “Advanced data analytics are deeply embedded in the operations of public and private institutions and shape the opportunities available to youth and families. Whether these tools benefit or harm communities depends on their design, use and oversight, according to a report from the Annie E. Casey Foundation.

Four Principles to Make Advanced Data Analytics Work for Children and Families examines the growing field of advanced data analytics and offers guidance to steer the use of big data in social programs and policy….

The Foundation report identifies four principles — complete with examples and recommendations — to help steer the growing field of data science in the right direction.

Four Principles for Data Tools

  1. Expand opportunity for children and families. Most established uses of advanced analytics in education, social services and criminal justice focus on problems facing youth and families. Promising uses of advanced analytics go beyond mitigating harm and help to identify so-called odds beaters and new opportunities for youth.
    • Example: The Children’s Data Network at the University of Southern California is helping the state’s departments of education and social services explore why some students succeed despite negative experiences and what protective factors merit more investment.
    • Recommendation: Government and its philanthropic partners need to test if novel data science applications can create new insights and when it’s best to apply them.
       
  2. Provide transparency and evidence. Advanced analytical tools must earn and maintain a social license to operate. The public has a right to know what decisions these tools are informing or automating, how they have been independently validated, and who is accountable for answering and addressing concerns about how they work.
    • Recommendations: Local and state task forces can be excellent laboratories for testing how to engage youth and communities in discussions about advanced analytics applications and the policy frameworks needed to regulate their use. In addition, public and private funders should avoid supporting private algorithms whose design and performance are shielded by trade secrecy claims. Instead, they should fund and promote efforts to develop, evaluate and adapt transparent and effective models.
       
  3. Empower communities. The field of advanced data analytics often treats children and families as clients, patients and consumers. Put to better use, these same tools can help elucidate and reform the systems acting upon children and families. For this shift to occur, institutions must focus analyses and risk assessments on structural barriers to opportunity rather than individual profiles.
    • Recommendation: In debates about the use of data science, greater investment is needed to amplify the voices of youth and their communities.
       
  4. Promote equitable outcomes. Useful advanced analytics tools should promote more equitable outcomes for historically disadvantaged groups. New investments in advanced analytics are only worthwhile if they aim to correct the well-documented bias embedded in existing models.
    • Recommendations: Advanced analytical tools should only be introduced when they reduce the opportunity deficit for disadvantaged groups — a move that will take organizing and advocacy to establish and new policy development to institutionalize. Philanthropy and government also have roles to play in helping communities test and improve tools and examples that already exist….(More)”.

A Legal Framework for Access to Data – A Competition Policy Perspective


Paper by Heike Schweitzer and Robert Welker: “The paper strives to systematise the debate on access to data from a competition policy angle. At the outset, two general policy approaches to access to data are distinguished: a “private control of data” approach versus an “open access” approach. We argue that, when it comes to private sector data, the “private control of data” approach is preferable. According to this approach, the “whether” and “how” of data access should generally be left to the market. However, public intervention can be justified by significant market failures. We discuss the presence of such market failures and the policy responses, including, in particular, competition policy responses, with a view to three different data access scenarios: access to data by co-generators of usage data (Scenario 1); requests for access to bundled or aggregated usage data by third parties vis-à-vis a service or product provider who controls such datasets, with the goal to enter complementary markets (Scenario 2); requests by firms to access the large usage data troves of the Big Tech online platforms for innovative purposes (Scenario 3). On this basis we develop recommendations for data access policies….(More)”.

Not fit for Purpose: A critical analysis of the ‘Five Safes’


Paper by Chris Culnane, Benjamin I. P. Rubinstein, and David Watts: “Adopted by government agencies in Australia, New Zealand, and the UK as policy instrument or as embodied into legislation, the ‘Five Safes’ framework aims to manage risks of releasing data derived from personal information. Despite its popularity, the Five Safes has undergone little legal or technical critical analysis. We argue that the Five Safes is fundamentally flawed: from being disconnected from existing legal protections and appropriation of notions of safety without providing any means to prefer strong technical measures, to viewing disclosure risk as static through time and not requiring repeat assessment. The Five Safes provides little confidence that resulting data sharing is performed using ‘safety’ best practice or for purposes in service of public interest….(More)”.

COVID-19 Data and Data Sharing Agreements: The Potential of Sunset Clauses and Sunset Provisions


A report by SDSN TReNDS and DataReady Limited on behalf of Contracts4DataCollaboration: “Building upon issues discussed in the C4DC report, “Laying the Foundation for Effective Partnerships: An Examination of Data Sharing Agreements,” this brief examines the potential of sunset clauses or sunset provisions to be a legally binding, enforceable, and accountable way of ensuring COVID-19 related data sharing agreements are wound down responsibly at the end of the pandemic. The brief is divided into four substantive parts: Part I introduces sunset clauses as legislative tools, highlighting a number of examples of how they have been used in both COVID-19 related and other contexts; Part II discusses sunset provisions in the context of data sharing agreements and attempts to explain the complex interrelationship between data ownership, intellectual property, and sunset provisions; Part III identifies some key issues policymakers should consider when assessing the utility and viability of sunset provisions within their data sharing agreements and arrangements; and Part IV highlights the value of a memorandum of understanding (MoU) as a viable vehicle for sunset provisions in contexts where data sharing agreements are either non-existent or not regularly used….(More)”. (Contracts 4 Data Collaboration Framework).

NIH Releases New Policy for Data Management and Sharing


NIH Blogpost by Carrie Wolinetz: “Today, nearly twenty years after the publication of the Final NIH Statement on Sharing Research Data in 2003, we have released a Final NIH Policy for Data Management and Sharing. This represents the agency’s continued commitment to share and make broadly available the results of publicly funded biomedical research. We hope it will be a critical step in moving towards a culture change, in which data management and sharing is seen as integral to the conduct of research. Responsible data management and sharing is good for science; it maximizes availability of data to the best and brightest minds, underlies reproducibility, honors the participation of human participants by ensuring their data is both protected and fully utilized, and provides an element of transparency to ensure public trust and accountability.

This policy has been years in the making and has benefited enormously from feedback and input from stakeholders throughout the process. We are grateful to all those who took the time to comment on Request for Information, the Draft policy, or to participate in workshops or Tribal consultations. That thoughtful feedback has helped shape the Final policy, which we believe strikes a balance between reasonable expectations for data sharing and flexibility to allow for a diversity of data types and circumstances. How we incorporated public comments and decision points that led to the Final policy are detailed in the Preamble to the DMS policy.

The Final policy applies to all research funded or conducted by NIH that results in the generation of scientific data. The Final Policy has two main requirements: (1) the submission of a Data Management and Sharing Plan (Plan); and (2) compliance with the approved Plan. We are asking for Plans at the time of submission of the application, because we believe planning and budgeting for data management and sharing needs to occur hand in hand with planning the research itself. NIH recognizes that science evolves throughout the research process, which is why we have built in the ability to update DMS Plans, but at the end of the day, we are expecting investigators and institutions to be accountable to the Plans they have laid out for themselves….

Anticipating that variation in readiness, and in recognition of the cultural change we are trying to seed, there is a two-year implementation period. This time will be spent developing the information, support, and tools that the biomedical enterprise will need to comply with this new policy. NIH has already provided additional supplementary information – on (1) elements of a data management and sharing plan; (2) allowable costs; and (3) selecting a data repository – in concert with the policy release….(More)”

Your phone already tracks your location. Now that data could fight voter suppression


Article by Seth Rosenblatt: “Smartphone location data is a dream for marketers who want to know where you go and how long you spend there—and a privacy nightmare. But this kind of geolocation data could also be used to protect people’s voting rights on Election Day.

The newly founded nonprofit Center for New Data is now tracking voters at the polls using smartphone location data to help researchers understand how easy—or difficult—it is for people to vote in different places. Called the Observing Democracy project, the nonpartisan effort is making data on how far people have to travel to vote and how long they have to wait in line available in a privacy-friendly way so it can be used to craft election policies that ensure voting is accessible for everyone.

Election data has already fueled changes in various municipalities and states. A 66-page lawsuit filed by Fair Fight Action against the state of Georgia in the wake of Stacey Abrams’s narrow loss to Brian Kemp in the 2018 gubernatorial race relies heavily on data to back its assertions of unconstitutionally delayed and deferred voter registration, unfair challenges to absentee and provisional ballots, and unjustified purges of voter rolls—all hallmarks of voter suppression.

The promise of Observing Democracy is to make this type of impactful data available much more rapidly than ever before. Barely a month old, Observing Democracy isn’t wasting any time: Its all-volunteer staffers will be receiving data potentially as soon as Nov. 4 on voter wait times at polling locations, travel times to polling stations, and how frequently ballot drop-off boxes are visited, courtesy of location-data mining companies X-Mode Social and Veraset, which was spun off from SafeGraph….(More)”.

To mitigate the costs of future pandemics, establish a common data space


Article by Stephanie Chin and Caitlin Chin: “To improve data sharing during global public health crises, it is time to explore the establishment of a common data space for highly infectious diseases. Common data spaces integrate multiple data sources, enabling a more comprehensive analysis of data based on greater volume, range, and access. At its essence, a common data space is like a public library system, which has collections of different types of resources from books to video games; processes to integrate new resources and to borrow resources from other libraries; a catalog system to organize, sort, and search through resources; a library card system to manage users and authorization; and even curated collections or displays that highlight themes among resources.

Even before the COVID-19 pandemic, there was significant momentum to make critical data more widely accessible. In the United States, Title II of the Foundations for Evidence-Based Policymaking Act of 2018, or the OPEN Government Data Act, requires federal agencies to publish their information online as open data, using standardized, machine-readable data formats. This information is now available on the federal data.gov catalog and includes 50 state- or regional-level data hubs and 47 city- or county-level data hubs. In Europe, the European Commission released a data strategy in February 2020 that calls for common data spaces in nine sectors, including healthcare, shared by EU businesses and governments.

Going further, a common data space could help identify outbreaks and accelerate the development of new treatments by compiling line list incidence data, epidemiological information and models, genome and protein sequencing, testing protocols, results of clinical trials, passive environmental monitoring data, and more.

Moreover, it could foster a common understanding and consensus around the facts, a prerequisite to reach international buy-in on policies to address situations unique to COVID-19 or future pandemics, such as the distribution of medical equipment and PPE, disruption to the tourism industry and global supply chains, social distancing or quarantine, and mass closures of businesses….(More)”. See also Call for Action for a Data Infrastructure to tackle Pandemics and other Dynamic Threats.

Third Wave of Open Data


Paper (and site) by Stefaan G. Verhulst, Andrew Young, Andrew J. Zahuranec, Susan Ariel Aaronson, Ania Calderon, and Matt Gee on “How To Accelerate the Re-Use of Data for Public Interest Purposes While Ensuring Data Rights and Community Flourishing”: “The paper begins with a description of earlier waves of open data. Emerging from freedom of information laws adopted over the last half century, the First Wave of Open Data brought about newfound transparency, albeit one only available on request to an audience largely composed of journalists, lawyers, and activists. 

The Second Wave of Open Data, seeking to go beyond access to public records and inspired by the open source movement, called upon national governments to make their data open by default. Yet, this approach too had its limitations, leaving many data silos at the subnational level and in the private sector untouched.

The Third Wave of Open Data seeks to build on earlier successes and take into account lessons learned to help open data realize its transformative potential. Incorporating insights from various data experts, the paper describes the emergence of a Third Wave driven by the following goals:

  1. Publishing with Purpose by matching the supply of data with the demand for it, providing assets that match public interests;
  2. Fostering Partnerships and Data Collaboration by forging relationships with community-based organizations, NGOs, small businesses, local governments, and others who understand how data can be translated into meaningful real-world action;
  3. Advancing Open Data at the Subnational Level by providing resources to cities, municipalities, states, and provinces to address the lack of subnational information in many regions; and
  4. Prioritizing Data Responsibility and Data Rights by understanding the risks of using (and not using) data to promote and preserve the public’s general welfare.

Riding the Wave

Achieving these goals will not be an easy task and will require investments and interventions across the data ecosystem. The paper highlights eight actions that decision and policy makers can take to foster more equitable, impactful benefits… (More) (PDF)”

Data to Go: The Value of Data Portability as a Means to Data Liquidity


Juliet McMurren and Stefaan G. Verhulst at Data & Policy: “If data is the “new oil,” why isn’t it flowing? For almost two decades, data management in fields such as government, healthcare, finance, and research has aspired to achieve a state of data liquidity, in which data can be reused where and when it is needed. For the most part, however, this aspiration remains unrealized. The majority of the world’s data continues to stagnate in silos, controlled by data holders and inaccessible to both its subjects and others who could use it to create or improve services, for research, or to solve pressing public problems.

Efforts to increase liquidity have focused on forms of voluntary institutional data sharing such as data pools or other forms of data collaboratives. Although useful, these arrangements can only advance liquidity so far. Because they vest responsibility and control over liquidity in the hands of data holders, their success depends on data holders’ willingness and ability to provide access to their data for the greater good. While that willingness exists in some fields, particularly medical research, a willingness to share data is much less likely where data holders are commercial competitors and data is the source of their competitive advantage. And even where willingness exists, the ability of data holders to share data safely, securely, and interoperably may not. Without a common set of secure, standardized, and interoperable tools and practices, the best that such bottom-up collaboration can achieve is a disconnected patchwork of initiatives, rather than the data liquidity proponents are seeking.

Data portability is one potential solution to this problem. As enacted in the EU General Data Protection Regulation (2018) and the California Consumer Privacy Act (2018), the right to data portability asserts that individuals have a right to obtain, copy, and reuse their personal data and transfer it between platforms or services. In so doing, it shifts control over data liquidity to data subjects, obliging data holders to release data whether or not it is in their commercial interests to do so. Proponents of data portability argue that, once data is unlocked and free to move between platforms, it can be combined and reused in novel ways and in contexts well beyond those in which it was originally collected, all while enabling greater individual control.

To date, however, arguments for the benefits of the right to data portability have typically failed to connect this rights-based approach with the larger goal of data liquidity and how portability might advance it. This failure to connect these principles and to demonstrate their collective benefits to data subjects, data holders, and society has real-world consequences. Without a clear view of what can be achieved, policymakers are unlikely to develop interventions and incentives to advance liquidity and portability, individuals will not exercise their rights to data portability, and industry will not experiment with use cases and develop the tools and standards needed to make portability and liquidity a reality.

Toward these ends, we have been exploring the current literature on data portability and liquidity, searching for lessons and insights into the benefits that can be unlocked when data liquidity is enabled through the right to data portability. Below we identify some of the greatest potential benefits for society, individuals, and data-holding organizations. These benefits are sometimes in conflict with one another, making the field a contentious one that demands further research on the trade-offs and empirical evidence of impact. In the final section, we also discuss some barriers and challenges to achieving greater data liquidity….(More)”.