The DeepMind debacle demands dialogue on data


Hetan Shah in Nature: “Without public approval, advances in how we use data will stall. That is why a regulator’s ruling against the operator of three London hospitals is about more than mishandling records from 1.6 million patients. It is a missed opportunity to have a conversation with the public about appropriate uses for their data….

What can be done to address this deficit? Beyond meeting legal standards, all relevant institutions must take care to show themselves trustworthy in the eyes of the public. The lapses of the Royal Free hospitals and DeepMind provide, by omission, valuable lessons.

The first is to be open about what data are transferred. The extent of data transfer between the Royal Free and DeepMind came to light through investigative journalism. In my opinion, had the project proceeded under open contracting, it would have been subject to public scrutiny, and to questions about whether a company owned by Google — often accused of data monopoly — was best suited to create a relatively simple app.

The second lesson is that data transfer should be proportionate to the task. Information-sharing agreements should specify clear limits. It is unclear why an app for kidney injury requires the identifiable records of every patient seen by three hospitals over a five-year period.

Finally, governance mechanisms must be strengthened. It is shocking to me that the Royal Free did not assess the privacy impact of its actions before handing over access to records. DeepMind does deserve credit for (belatedly) setting up an independent review panel for health-care projects, especially because the panel has a designated budget and has not required members to sign non-disclosure agreements. (The two groups also agreed a new contract late last year, after criticism.)

More is needed. The Information Commissioner asked the Royal Free to improve its processes but did not fine it or require it to rescind data. This rap on the knuckles is unlikely to deter future, potentially worse, misuses of data. People are aware of the potential for over-reach, from the US government’s demands for state voter records to the Chinese government’s alleged plans to create a ‘social credit’ system that would monitor private behaviour.

Innovations such as artificial intelligence, machine learning and the Internet of Things offer great opportunities, but will falter without a public consensus around the role of data. To develop this, all data collectors and crunchers must be open and transparent. Consider how public confidence in genetic modification was lost in Europe, and how that has set back progress.

Public dialogue can build trust through collaborative efforts. A 14-member Citizen’s Reference Panel on health technologies was convened in Ontario, Canada in 2009. The Engage2020 programme incorporates societal input in the Horizon2020 stream of European Union science funding….(More)”

Modernizing government’s approach to transportation and land use data: Challenges and opportunities


Adie Tomer and Ranjitha Shivaram at Brookings: “In the fields of transportation and land use planning, the public sector has long taken the leading role in the collection, analysis, and dissemination of data. Often, public data sets drawn from traveler diaries, surveys, and supply-side transportation maps were the only way to understand how people move around in the built environment – how they get to work, how they drop kids off at school, where they choose to work out or relax, and so on.

But, change is afoot: today, there are not only new data providers, but also new types of data. Cellphones, GPS trackers, and other navigation devices offer real-time demand-side data. For instance, mobile phone data can point to where distracted driving is a problem and help implement measures to deter such behavior. Insurance data and geo-located police data can guide traffic safety improvements, especially in accident-prone zones. Geotagged photo data can illustrate the use of popular public spaces by locals and tourists alike, enabling greater return on investment from public spaces. Data from exercise apps like Fitbit and Runkeeper can help identify recreational hot spots that attract people and those that don’t.

However, integrating all this data into how we actually plan and build communities—including the transportation systems that move all of us and our goods—will not be easy. There are several core challenges. Limited staff capacity and restricted budgets in public agencies can slow adoption. Governmental procurement policies are stuck in an analog era. Privacy concerns introduce risk and uncertainty. Private data could be simply unavailable to public consumers. And even if governments could acquire all of the new data and analytics that interest them, their planning and investment models must be updated to fully utilize these new resources.

Using a mix of primary research and expert interviews, this report catalogs emerging data sets related to transportation and land use, and assesses the ease by which they can be integrated into how public agencies manage the built environment. It finds that there is reason for the hype; we have the ability to know more about how humans move around today than at any time in history. But, despite all the obvious opportunities, not addressing core challenges will limit public agencies’ ability to put all that data to use for the collective good….(More)”

Uber Releases Open Source Project for Differential Privacy


Katie Tezapsidis at Uber Security: “Data analysis helps Uber continuously improve the user experience by preventing fraud, increasing efficiency, and providing important safety features for riders and drivers. Data gives our teams timely feedback about what we’re doing right and what needs improvement.

Uber is committed to protecting user privacy and we apply this principle throughout our business, including our internal data analytics. While Uber already has technical and administrative controls in place to limit who can access specific databases, we are adding additional protections governing how that data is used — even in authorized cases.

We are excited to give a first glimpse of our recent work on these additional protections with the release of a new open source tool, which we’ll introduce below.

Background: Differential Privacy

Differential privacy is a formal definition of privacy and is widely recognized by industry experts as providing strong and robust privacy assurances for individuals. In short, differential privacy allows general statistical analysis without revealing information about a particular individual in the data. Results do not even reveal whether any individual appears in the data. For this reason, differential privacy provides an extra layer of protection against re-identification attacks as well as attacks using auxiliary data.

Differential privacy can provide high accuracy results for the class of queries Uber commonly uses to identify statistical trends. Consequently, differential privacy allows us to calculate aggregations (averages, sums, counts, etc.) of elements like groups of users or trips on the platform without exposing information that could be used to infer details about a specific user or trip.

Differential privacy is enforced by adding noise to a query’s result, but some queries are more sensitive to the data of a single individual than others. To account for this, the amount of noise added must be tuned to the sensitivity of the query, which is defined as the maximum change in the query’s output when an individual’s data is added to or removed from the database.

As part of their job, a data analyst at Uber might need to know the average trip distance in a particular city. A large city, like San Francisco, might have hundreds of thousands of trips with an average distance of 3.5 miles. If any individual trip is removed from the data, the average remains close to 3.5 miles. This query therefore has low sensitivity, and thus requires less noise to enable each individual to remain anonymous within the crowd.

Conversely, the average trip distance in a smaller city with far fewer trips is more influenced by a single trip and may require more noise to provide the same degree of privacy. Differential privacy defines the precise amount of noise required given the sensitivity.
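To make the large-city versus small-city contrast concrete, here is a minimal sketch of noise calibrated to sensitivity. It uses the textbook Laplace mechanism, not Uber's Elastic Sensitivity tool. The function names, the `max_distance` clamp, the privacy parameter `epsilon`, and the decision to treat the trip count as public are all assumptions made for illustration and do not come from the original post.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mean_sensitivity(n_trips, max_distance):
    # Upper bound on how far the average can move when one trip is removed,
    # assuming every distance is clamped to [0, max_distance] (hypothetical cap).
    if n_trips < 2:
        raise ValueError("need at least two trips")
    return max_distance / (n_trips - 1)


def noisy_mean(distances, max_distance, epsilon, rng):
    # Clamp each trip so that a single record has bounded influence.
    clamped = [min(max(d, 0.0), max_distance) for d in distances]
    true_mean = sum(clamped) / len(clamped)
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    scale = mean_sensitivity(len(clamped), max_distance) / epsilon
    return true_mean + laplace_noise(scale, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    large_city = [3.5] * 200_001  # many trips: low sensitivity, little noise
    small_city = [3.5] * 101      # few trips: high sensitivity, more noise
    print(noisy_mean(large_city, 50.0, 1.0, rng))
    print(noisy_mean(small_city, 50.0, 1.0, rng))
```

Under these assumptions the large city's sensitivity bound is 50 / 200,000 = 0.00025 miles, while the small city's is 50 / 100 = 0.5 miles, so the same `epsilon` requires noise on a scale 2,000 times larger for the small city.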

A major challenge for practical differential privacy is how to efficiently compute the sensitivity of a query. Existing methods lack sufficient support for the features used in Uber’s queries and many approaches require replacing the database with a custom runtime engine. Uber uses many different database engines and replacing these databases is infeasible. Moreover, custom runtimes cannot meet Uber’s demanding scalability and performance requirements.

Introducing Elastic Sensitivity

To address these challenges we adopted Elastic Sensitivity, a technique developed by security researchers at the University of California, Berkeley for efficiently calculating the sensitivity of a query without requiring changes to the database. The full technical details of Elastic Sensitivity are described here.

Today, we are excited to share a tool developed in collaboration with these researchers to calculate Elastic Sensitivity for SQL queries. The tool is available now on GitHub. It is designed to integrate easily with existing data environments and support additional state-of-the-art differential privacy mechanisms, which we plan to share in the coming months….(More)”.

Intelligent sharing: unleashing the potential of health and care data in the UK to transform outcomes


Report by Future Care Capital: “….Data is often referred to as the ‘new oil’ – the 21st century raw material which, when hitched to algorithmic refinement, may be mined for insight and value – and ‘data flows’ are said to have exerted a greater impact upon global growth than traditional goods flows in recent years (Manyika et al, 2016). Small wonder, then, that governments around the world are endeavouring to strike a balance between individual privacy rights and protections on the one hand, and organisational permissions to facilitate the creation of social, economic and environmental value from broad-ranging data on the other: ‘data rights’ are now of critical importance courtesy of technological advancements. The tension between the two is particularly evident where health and care data in the UK is concerned. Individuals are broadly content with anonymised data from their medical records being used for public benefit but are, understandably, anxious about the implications of the most intimate aspects of their lives being hacked or, else, shared without their knowledge or consent….

The potential for health and care data to be transformative remains, and there is growing concern that opportunities to improve the use of health and care data in peoples’ interests are being missed….

we recommend additional support for digitisation efforts in social care settings. We call upon the Government to streamline processes associated with Information Governance (IG) modelling to help data sharing initiatives that traverse organisational boundaries. We also advocate for investment and additional legal safeguards to make more anonymised data sets available for research and innovation. Crucially, we recommend expediting the scope for individuals to contribute health and care data to sharing initiatives led by the public sector through promotion, education and pilot activities – so that data is deployed to transform public health and support the ‘pivot to prevention’.

In Chapter Two, we explore the rationale and scope for the UK to build upon emergent practice from around the world and become a global leader in ‘data philanthropy’ – to push at the boundaries of existing plans and programmes, and support the development of and access to unrivalled health and care data sets. We look at member-controlled ‘data cooperatives’ and what we’ve termed ‘data communities’ operated by trusted intermediaries. We also explore ‘data collaboratives’ which involve the private sector engaging in data philanthropy for public benefit. Here, we make recommendations about promoting a culture of data philanthropy through the demonstration of tangible benefits to participants and the wider public, and we call upon Government to assess the appetite and feasibility of establishing the world’s first National Health and Care Data Donor Bank….(More)”

 

The ethics issue: Should we abandon privacy online?


Special issue of the New Scientist: “Those who would give up essential Liberty to purchase a little temporary Safety,” Benjamin Franklin once said, “deserve neither Liberty nor Safety.” But if Franklin were alive today, where would he draw the line? Is the freedom to send an encrypted text message essential? How about the right to keep our browsing history private? What is the sweet spot between our need to be left alone and our desire to keep potential criminals from communicating in secret?

In an age where fear of terrorism is high in the public consciousness, governments are likely to err on the side of safety. Over the past decade, the authorities have been pushing for – and getting – greater powers of surveillance than they have ever had, all in the name of national security.

The downsides are not immediately obvious. After all, you might think you have nothing to hide. But most of us have perfectly legal secrets we’d rather someone else didn’t see. And although the chances of the authorities turning up to take you away in a black SUV on the basis of your WhatsApp messages are small in free societies, the chances of insurance companies raising your premiums are not….(More)”.

Four lessons NHS Trusts can learn from the Royal Free case


Blog by Elizabeth Denham, Information Commissioner in the UK: “Today my office has announced that the Royal Free London NHS Foundation Trust did not comply with the Data Protection Act when it turned over the sensitive medical data of around 1.6 million patients to Google DeepMind, a private sector firm, as part of a clinical safety initiative. As a result of our investigation, the Trust has been asked to sign an undertaking committing it to changes to ensure it is acting in accordance with the law, and we’ll be working with them to make sure that happens.

But what about the rest of the sector? As organisations increasingly look to unlock the huge potential that creative uses of data can have for patient care, what are the lessons to be learned from this case?

It’s not a choice between privacy or innovation

It’s welcome that the trial looks to have been positive. The Trust has reported successful outcomes. Some may reflect that data protection rights are a small price to pay for this.

But what stood out to me on looking through the results of the investigation is that the shortcomings we found were avoidable. The price of innovation didn’t need to be the erosion of legally ensured fundamental privacy rights….

Don’t dive in too quickly

Privacy impact assessments are a key data protection tool of our era, as evolving law and best practice around the world demonstrate. Privacy impact assessments play an increasingly prominent role in data protection, and they’re a crucial part of digital innovation. ….

New cloud processing technologies mean you can, not that you always should

Changes in technology mean that vast data sets can be made more readily available and can be processed faster and using greater data processing technologies. That’s a positive thing, but just because evolving technologies can allow you to do more doesn’t mean these tools should always be fully utilised, particularly during a trial initiative….

Know the law, and follow it

No-one suggests that red tape should get in the way of progress. But when you’re setting out to test the clinical safety of a new service, remember that the rules are there for a reason….(More)”

Open Government: Concepts and Challenges for Public Administration’s Management in the Digital Era


Tippawan Lorsuwannarat in the Journal of Public and Private Management: “This paper has four main objectives. First, to disseminate a study on the meaning and development of open government. Second, to describe the components of an open government. Third, to examine the international movement situation involved with open government. And last, to analyze the challenges related to the application of open government in Thailand’s current digital era. The paper suggests four periods of open government by linking to the concepts of public administration in accordance with the use of information technology in the public sector. The components of open government are consistent with the meaning of open government, including open data, open access, and open engagement. The current international situation of open government considers the ranking of open government and open government partnership. The challenges of adopting open government in Thailand include clear policy regarding open government, digital gap, public organizational culture, laws supporting privacy and data infrastructure….(More)”.

Big Data: A Twenty-First Century Arms Race


Report by Atlantic Council and Thomson Reuters: “We are living in a world awash in data. Accelerated interconnectivity, driven by the proliferation of internet-connected devices, has led to an explosion of data – big data. A race is now underway to develop new technologies and implement innovative methods that can handle the volume, variety, velocity, and veracity of big data and apply it smartly to provide decisive advantage and help solve major challenges facing companies and governments.

For policy makers in government, big data and associated technologies, like machine learning and artificial intelligence, have the potential to drastically improve their decision-making capabilities. How governments use big data may be a key factor in improved economic performance and national security. This publication looks at how big data can maximize the efficiency and effectiveness of government and business, while minimizing modern risks. Five authors explore big data across three cross-cutting issues: security, finance, and law.

Chapter 1, “The Conflict Between Protecting Privacy and Securing Nations,” Els de Busser
Chapter 2, “Big Data: Exposing the Risks from Within,” Erica Briscoe
Chapter 3, “Big Data: The Latest Tool in Fighting Crime,” Benjamin Dean, Fellow
Chapter 4, “Big Data: Tackling Illicit Financial Flows,” Tatiana Tropina
Chapter 5, “Big Data: Mitigating Financial Crime Risk,” Miren Aparicio….Read the Publication (PDF)

A Road-Map To Transform The Secure And Accessible Use Of Data For High Impact Program Management, Policy Development, And Scholarship


Preface and Roadmap by Andrew Reamer and Julia Lane: “Throughout the United States, there is broadly emerging support to significantly enhance the nation’s capacity for evidence-based policymaking. This support is shared across the public and private sectors and all levels of geography. In recent years, efforts to enable evidence-based analysis have been authorized by the U.S. Congress, and funded by state and local governments and philanthropic foundations.

The potential exists for substantial change. There has been dramatic growth in technological capabilities to organize, link, and analyze massive volumes of data from multiple, disparate sources. A major resource is administrative data, which offer both advantages and challenges in comparison to data gathered through the surveys that have been the basis for much policymaking to date. To date, however, capability-building efforts have been largely “artisanal” in nature. As a result, the ecosystem of evidence-based policymaking capacity-building efforts is thin and weakly connected.

Each attempt to add a node to the system faces multiple barriers that require substantial time, effort, and luck to address. Those barriers are systemic. Too much attention is paid to the interests of researchers, rather than to the engagement of data producers. Individual projects serve focused needs and operate at a relative distance from one another. Researchers, policymakers and funding agencies thus need to move from these artisanal efforts to new, generalized solutions that will catalyze the creation of a robust, large-scale data infrastructure for evidence-based policymaking.

This infrastructure will have to be a “complex, adaptive ecosystem” that expands, regenerates, and replicates as needed while allowing customization and local control. To create a path for achieving this goal, the U.S. Partnership on Mobility from Poverty commissioned 12 papers and then hosted a day-long gathering (January 23, 2017) of over 60 experts to discuss findings and implications for action. Funded by the Gates Foundation, the papers and workshop panels were organized around three topics: privacy and confidentiality, data providers, and comprehensive strategies.

This issue of the Annals showcases those 12 papers which jointly propose solutions for catalyzing the development of a data infrastructure for evidence-based policymaking.

This preface:

  • places current evidence-based policymaking efforts in historical context,
  • briefly describes the nature of multiple current efforts,
  • provides a conceptual framework for catalyzing the growth of any large institutional ecosystem,
  • identifies the major dimensions of the data infrastructure ecosystem,
  • describes key barriers to the expansion of that ecosystem, and
  • suggests a roadmap for catalyzing that expansion….(More)

(All 12 papers can be accessed here).

Blockchains, personal data and the challenge of governance


Theo Bass at NESTA: “…There are a number of dominant internet platforms (Google, Facebook, Amazon, etc.) that hoard, analyse and sell information about their users in the name of a more personalised and efficient service. This has become a problem.

People feel they are losing control over how their data is used and reused on the web. 500 million adblocker downloads is a symptom of a market which isn’t working well for people. As Irene Ng mentions in a recent guest blog on the Nesta website, the secondary data market is thriving (online advertising is a major player), as companies benefit from the opacity and lack of transparency about where profit is made from personal data.

It’s said that blockchain’s key characteristics could provide a foundational protocol for a fairer digital identity system on the web. Beyond its application as digital currency, blockchain could provide a new set of technical standards for transparency, openness, and user consent, on top of which a whole new generation of services might be built.

While the aim is ambitious, a handful of projects are rising to the challenge.

Blockstack is creating a global system of digital IDs, which are written into the bitcoin blockchain. Nobody can touch them other than the owner of that ID. Blockstack are building a new generation of applications on top of this infrastructure which promises to provide “a new decentralized internet where users own their data and apps run locally”.

Sovrin attempts to provide users with “self-sovereign identity”. The argument is that “centralized” systems for storing personal data make it a “treasure chest for attackers”. Sovrin argues that users should more easily be able to have “ownership” over their data, and the exchange of data should be made possible through a decentralised, tamper-proof ledger of transactions between users.

Our own DECODE project is piloting a set of collaboratively owned, local sharing economy platforms in Barcelona and Amsterdam. The blockchain aims to provide a public record of entitlements over where people’s data is stored, who can access it and for what purpose (with some additional help from new techniques in zero-knowledge cryptography to preserve people’s privacy).

There’s no doubt this is an exciting field of innovation. But the debate is characterised by a lot of hype. The following sections therefore discuss some of the challenges thrown up when we start thinking about implementations beyond bitcoin.

Blockchains and the challenge of governance

As mentioned above, bitcoin is a “bearer asset”. This is a necessary feature of decentralisation — all users maintain sole ownership over the digital money they hold on the network. If users get hacked (digital wallets sometimes do), or if a password gets lost, the money is irretrievable.

While the example of losing a password might seem trivial, it highlights some difficult questions for proponents of blockchain’s wider uses. What happens if there’s a dispute over an online transaction, but no intermediary to settle it? What happens if someone’s digital assets or their digital identity are breached and sensitive data falls into the wrong hands? It might be necessary to assign responsibility to a governing actor to help resolve the issue, but of course this would require the introduction of a trusted middleman.

Bitcoin doesn’t try to answer these questions; its anonymous creators deliberately tried to avoid implementing a clear model of governance over the network, probably because they knew that bitcoin would be used by people as a method for subverting the law. Bitcoin still sees a lot of use in gray economies, including for the sale of drugs and gambling.

But if blockchains are set to enter the mainstream, providing for businesses, governments and nonprofits, then they won’t be able to function irrespective of the law. They will need to find use-cases that can operate alongside legal frameworks and jurisdictional boundaries. They will need to demonstrate regulatory compliance, create systems of rules and provide accountability when things go awry. This cannot just be solved through increasingly sophisticated coding.

All of this raises a potential paradox recently elaborated in a post by Vili Lehdonvirta of the Oxford Internet Institute: is it possible to successfully govern blockchains without undermining their entire purpose?….

If blockchain advocates only work towards purely technical solutions and ignore real-world challenges of trying to implement decentralisation, then we’ll only ever see flawed implementations of the technology. This is already happening in the form of centrally administered, proprietary or ‘half-baked’ blockchains, which don’t offer much more value than traditional databases….(More)”.