Data & Policy


Data & Policy, an open-access journal exploring the potential of data science for governance and public decision-making, published its first cluster of peer-reviewed articles last week.

The articles include three contributions specifically concerned with data protection by design:

· Gefion Thuermer and colleagues (University of Southampton) distinguish data trusts from other data-sharing mechanisms and discuss the need for workflows with data protection at their core;

· Swee Leng Harris (King’s College London) explores Data Protection Impact Assessments as a framework for helping us know whether government use of data is lawful, transparent and upholds human rights;

· Giorgia Bincoletto’s (University of Bologna) study investigates data protection concerns arising from cross-border interoperability of Electronic Health Record systems in the European Union.

Also published is research by Jacqueline Lam and colleagues (University of Cambridge; Hong Kong University) on how fine-grained data from satellites and other sources can help us understand environmental inequality and socio-economic disparities in China; the study also reflects on the importance of safeguarding data privacy and security. See also this week’s blogs on the potential of Data Collaboratives for COVID-19, by Editor-in-Chief Stefaan Verhulst (the GovLab), and on how COVID-19 exposes a widening data divide for the Global South, by Stefania Milan (University of Amsterdam) and Emiliano Treré (Cardiff University).

Data & Policy is an open access, peer-reviewed venue for contributions that consider how systems of policy and data relate to one another. Read the 5 ways you can contribute to Data & Policy and contact [email protected] with any questions….(More)”.

The Concept of Function Creep


Paper by Bert-Jaap Koops: “Function creep – the expansion of a system or technology beyond its original purposes – is a well-known phenomenon in STS, technology regulation, and surveillance studies. Correction: it is a well-referenced phenomenon. Yearly, hundreds of publications use the term to criticise developments in technology regulation and data governance. But why function creep is problematic, and why authors call system expansion ‘function creep’ rather than ‘innovation’, is underresearched. If the core problem is unknown, we can hardly identify suitable responses; therefore, we first need to understand what the concept actually refers to.

Surprisingly, no-one has ever written a paper about the concept itself. This paper fills that gap in the literature, by analysing and defining ‘function creep’. This creates conceptual clarity that can help structure future debates and address function creep concerns. First, I analyse what ‘function creep’ refers to, through semiotic analysis of the term and its role in discourse. Second, I discuss concepts that share family resemblances, including other ‘creep’ concepts and many theoretical notions from STS, economics, sociology, public policy, law, and discourse theory. Function creep can be situated in the nexus of reverse adaptation and self-augmentation of technology, incrementalism and disruption in policy and innovation, policy spillovers, ratchet effects, transformative use, and slippery slope argumentation.

Based on this, function creep can be defined as *an imperceptibly transformative and therewith contestable change in a data-processing system’s proper activity*. What distinguishes function creep from innovation is that it denotes some qualitative change in functionality that causes concern not only because of the change itself, but also because the change is insufficiently acknowledged as transformative and therefore requiring discussion. Argumentation theory illuminates how the pejorative ‘function creep’ functions in debates: it makes visible that what looks like linear change is actually non-linear, and simultaneously calls for much-needed debate about this qualitative change…(More)”.

The explanation game: a formal framework for interpretable machine learning


Paper by David S. Watson & Luciano Floridi: “We propose a formal framework for interpretable machine learning. Combining elements from statistical learning, causal interventionism, and decision theory, we design an idealised explanation game in which players collaborate to find the best explanation(s) for a given algorithmic prediction. Through an iterative procedure of questions and answers, the players establish a three-dimensional Pareto frontier that describes the optimal trade-offs between explanatory accuracy, simplicity, and relevance. Multiple rounds are played at different levels of abstraction, allowing the players to explore overlapping causal patterns of variable granularity and scope. We characterise the conditions under which such a game is almost surely guaranteed to converge on a (conditionally) optimal explanation surface in polynomial time, and highlight obstacles that will tend to prevent the players from advancing beyond certain explanatory thresholds. The game serves a descriptive and a normative function, establishing a conceptual space in which to analyse and compare existing proposals, as well as design new and improved solutions….(More)”
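As a rough illustration of the three-dimensional Pareto frontier the abstract describes, the sketch below marks which candidate explanations are undominated when scored on accuracy, simplicity and relevance (higher taken as better on every axis). It is an assumption of this digest, not the authors’ game-theoretic procedure, and it does not capture the iterative question-and-answer rounds or levels of abstraction.

```python
import numpy as np

def pareto_frontier(scores):
    """Return the indices of candidate explanations that are Pareto-optimal.

    `scores` is an (n_candidates, 3) array of (accuracy, simplicity, relevance)
    values, higher being better on every axis. A candidate is kept if no other
    candidate is at least as good on all three criteria and strictly better on
    at least one.
    """
    scores = np.asarray(scores, dtype=float)
    frontier = []
    for i, s in enumerate(scores):
        dominated = np.any(np.all(scores >= s, axis=1) & np.any(scores > s, axis=1))
        if not dominated:
            frontier.append(i)
    return frontier

# Three hypothetical explanations scored on (accuracy, simplicity, relevance):
# the third is dominated by the second and drops off the frontier.
print(pareto_frontier([[0.9, 0.2, 0.5], [0.7, 0.8, 0.6], [0.6, 0.7, 0.5]]))  # [0, 1]
```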

Mobile phone data and COVID-19: Missing an opportunity?


Paper by Nuria Oliver et al: “This paper describes how mobile phone data can guide government and public health authorities in determining the best course of action to control the COVID-19 pandemic and in assessing the effectiveness of control measures such as physical distancing. It identifies key gaps and reasons why this kind of data is still only scarcely used, although its value in similar epidemics has been proven in a number of use cases. It presents ways to overcome these gaps and key recommendations for urgent action, most notably the establishment of mixed expert groups at national and regional levels, and the inclusion and support of governments and public authorities early on. It is authored by a group of experienced data scientists, epidemiologists, demographers and representatives of mobile network operators who have jointly put their work at the service of the global effort to combat the COVID-19 pandemic….(More)”.

Deliberative Mini-Publics as a Response to Populist Democratic Backsliding


Chapter by Oran Doyle and Rachael Walsh: “Populisms come in different forms, but all involve a political rhetoric that invokes the will of a unitary people to combat perceived constraints, whether economic, legal, or technocratic. In this chapter, our focus is democratic backsliding aided by populist rhetoric. Some have suggested deliberative democracy as a means to combat this form of populism. Deliberative democracy encourages and facilitates both consultation and contestation, emphasizing plurality of voices, the legitimacy of disagreement, and the imperative of reasoned persuasion. Its participatory and inclusive character has the potential to undermine the credibility of populists’ claims to speak for a unitary people. Ireland has been widely referenced in constitutionalism’s deliberative turn, given its recent integration of deliberative mini-publics into the constitutional amendment process.

Reviewing the Irish experience, we suggest that deliberative mini-publics are unlikely to reverse democratic backsliding. Populist rhetoric is fueled by the very measures intended to combat democratic backsliding: enhanced constitutional constraints merely illustrate how the will of the people is being thwarted. The virtues of Ireland’s experiment in deliberative democracy — citizen participation, integration with representative democracy, deliberation, balanced information, expertise — have all been criticized in ways that are at least consistent with populist narratives. The failure of such narratives to take hold in Ireland, we suggest, may be due to a political system that is already resistant to populist rhetoric, as well as a tradition of participatory constitutionalism. The experiment with deliberative mini-publics may have strengthened Ireland’s constitutional culture by reinforcing anti-populist features. But it cannot be assumed that this experience would be replicated in larger countries polarized along political, ethnic, or religious lines….(More)”.

Urgently Needed for Policy Guidance: An Operational Tool for Monitoring the COVID-19 Pandemic


Paper by Stephane Luchini et al: “The radical uncertainty around the current COVID-19 pandemic requires that governments around the world be able to track in real time not only how the virus spreads but, most importantly, which policies are effective in keeping the spread of the disease in check. To improve the quality of health decision-making, we argue that it is necessary to monitor and compare the acceleration/deceleration of confirmed cases against health policy responses, across countries. To do so, we provide a simple mathematical tool to estimate the convexity/concavity of trends in epidemiological surveillance data. Had it been applied at the onset of the crisis, it would have offered more opportunities to measure the impact of the policies undertaken in different Asian countries, and would have allowed European and North American governments to draw quicker lessons from these Asian experiences when making policy decisions. Our tool can be especially useful as the epidemic is currently extending to lower-income African and South American countries, some of which have weaker health systems….(More)”.
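The paper’s own estimator is not reproduced here; as a minimal sketch of the underlying idea, assuming the signal of interest is the smoothed second difference of cumulative confirmed cases (positive when the curve is convex and accelerating, negative when it is concave and decelerating), one might compute:

```python
import numpy as np

def acceleration(cumulative_cases, window=7):
    """Smoothed second difference of cumulative confirmed cases.

    Positive values indicate a convex (accelerating) trend, negative values a
    concave (decelerating) one. A moving average over `window` days damps
    day-to-day reporting noise. This is a generic approximation, not the
    estimator defined in the paper.
    """
    cases = np.asarray(cumulative_cases, dtype=float)
    second_diff = np.diff(cases, n=2)          # change in daily new cases
    kernel = np.ones(window) / window
    return np.convolve(second_diff, kernel, mode="valid")

# Hypothetical series that accelerates, then decelerates after an intervention.
cumulative = np.cumsum([1, 2, 4, 8, 15, 25, 38, 50, 58, 63, 66, 68, 69, 70])
print(acceleration(cumulative, window=3))
```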

Researchers Develop Faster Way to Replace Bad Data With Accurate Information


NCSU Press Release: “Researchers from North Carolina State University and the Army Research Office have demonstrated a new model of how competing pieces of information spread in online social networks and the Internet of Things (IoT). The findings could be used to disseminate accurate information more quickly, displacing false information about anything from computer security to public health….

In their paper, the researchers show that a network’s size plays a significant role in how quickly “good” information can displace “bad” information. However, a large network is not necessarily better or worse than a small one. Instead, the speed at which good data travels is primarily affected by the network’s structure.

A highly interconnected network can disseminate new data very quickly. And the larger the network, the faster the new data will travel.

However, in networks that are connected primarily by a limited number of key nodes, those nodes serve as bottlenecks. As a result, the larger this type of network is, the slower the new data will travel.

The researchers also identified an algorithm that can be used to assess which point in a network would allow you to spread new data throughout the network most quickly.

“Practically speaking, this could be used to ensure that an IoT network purges old data as quickly as possible and is operating with new, accurate data,” Wenye Wang says.

“But these findings are also applicable to online social networks, and could be used to facilitate the spread of accurate information regarding subjects that affect the public,” says Jie Wang. “For example, we think it could be used to combat misinformation online.”…(More)”

Full paper: “Modeling and Analysis of Conflicting Information Propagation in a Finite Time Horizon”
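The release does not describe the algorithm itself. Purely as an illustration of the general problem it addresses (choosing the point in a network from which corrected data spreads fastest), one could rank nodes by closeness centrality, under the assumption that information travels along shortest paths; the helper below is hypothetical, not the researchers’ method.

```python
import networkx as nx

def best_seed_node(graph: nx.Graph):
    """Pick the node from which new, accurate data would on average reach the
    rest of the network in the fewest hops.

    Nodes are ranked by closeness centrality (the inverse of their mean
    shortest-path distance to all other nodes). This is a stand-in heuristic,
    not the algorithm identified in the paper.
    """
    centrality = nx.closeness_centrality(graph)
    return max(centrality, key=centrality.get)

# Hub-and-spoke example: the hub (node 0) is both the bottleneck and the
# fastest place from which to push out corrected data.
g = nx.star_graph(6)            # node 0 connected to nodes 1..6
print(best_seed_node(g))        # -> 0
```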

Responding to COVID-19 with AI and machine learning


Paper by Mihaela van der Schaar et al: “…AI and machine learning can use data to make objective and informed recommendations, and can help ensure that scarce resources are allocated as efficiently as possible. Doing so will save lives and can help reduce the burden on healthcare systems and professionals….

1. Managing limited resources

AI and machine learning can help us identify people who are at highest risk of being infected by the novel coronavirus. This can be done by integrating electronic health record data with a multitude of “big data” pertaining to human-to-human interactions (from cellular operators, traffic, airlines, social media, etc.). This will make allocation of resources like testing kits more efficient, as well as informing how we, as a society, respond to this crisis over time….

2. Developing a personalized treatment course for each patient 

As mentioned above, COVID-19 symptoms and disease evolution vary widely from patient to patient in terms of severity and characteristics. A one-size-fits-all approach for treatment doesn’t work. We also are a long way off from mass-producing a vaccine. 

Machine learning techniques can help determine the most efficient course of treatment for each individual patient on the basis of observational data about previous patients, including their characteristics and treatments administered. We can use machine learning to answer key “what-if” questions about each patient, such as “What if we wait a couple of hours before putting them on a ventilator?” or “Would the outcome for this patient be better if we switched them from supportive care to an experimental treatment earlier?”
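A hedged sketch of how such a “what-if” comparison can be framed from observational data, here as a simple two-model (T-learner) contrast rather than the authors’ own causal machine-learning methods, with all names hypothetical:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def what_if_benefit(X, treated, outcome, x_new):
    """Rough answer to a 'what-if' question for a new patient.

    Fits one outcome model per arm of the observational data (a simple
    T-learner) and compares the predicted outcomes for the new patient under
    'treat now' versus 'wait'. It ignores confounding adjustment, censoring and
    uncertainty, all of which a real clinical model would have to handle.
    """
    X, treated, outcome = map(np.asarray, (X, treated, outcome))
    model_now = GradientBoostingRegressor().fit(X[treated == 1], outcome[treated == 1])
    model_wait = GradientBoostingRegressor().fit(X[treated == 0], outcome[treated == 0])
    x_new = np.asarray(x_new).reshape(1, -1)
    return model_now.predict(x_new)[0] - model_wait.predict(x_new)[0]
```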

3. Informing policies and improving collaboration

…It’s hard to get a clear sense of which decisions result in the best outcomes. In such a stressful situation, it’s also hard for decision-makers to be aware of the outcomes of decisions being made by their counterparts elsewhere. 

Once again, data-driven AI and machine learning can provide objective and usable insights that far exceed the capabilities of existing methods. We can gain valuable insight into what the differences between policies are, why policies are different, which policies work better, and how to design and adopt improved policies….

4. Managing uncertainty

….We can use an area of machine learning called transfer learning to account for differences between populations, substantially eliminating bias while still extracting usable data that can be applied from one population to another. 
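One common baseline for this kind of population shift, not necessarily the method the authors use, is covariate-shift correction: reweighting source-population records so that they statistically resemble the target population. A minimal sketch, with the helper name hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(X_source, X_target):
    """Weights that make source-population records resemble the target population.

    A classifier is trained to tell source from target samples; its predicted
    probabilities are converted into density-ratio weights, the standard
    covariate-shift correction. Shown only to make the transfer-learning idea
    concrete, not as the paper's methodology.
    """
    X = np.vstack([X_source, X_target])
    y = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p_target = clf.predict_proba(X_source)[:, 1]
    return p_target / (1.0 - p_target)   # upweight source patients that resemble the target
```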

We can also use methods to make us aware of the degree of uncertainty of any given conclusion or recommendation generated from machine learning. This means that decision-makers can be provided with confidence estimates that tell them how confident they can be about a recommended course of action.
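As a generic illustration of attaching a confidence estimate to a recommendation (the excerpt does not specify the authors’ uncertainty-quantification techniques), one can train an ensemble on bootstrap resamples and report how strongly its members agree:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

def recommend_with_confidence(X_train, y_train, x_new, n_models=25, seed=0):
    """Return a recommendation plus a rough confidence score.

    Trains an ensemble on bootstrap resamples and reports the share of members
    that agree with the majority vote; low agreement flags the recommendation
    as uncertain. Assumes integer class labels. A generic sketch, not the
    authors' uncertainty-quantification machinery.
    """
    x_new = np.asarray(x_new).reshape(1, -1)
    votes = []
    for i in range(n_models):
        Xb, yb = resample(X_train, y_train, random_state=seed + i)
        model = RandomForestClassifier(n_estimators=50, random_state=seed + i).fit(Xb, yb)
        votes.append(int(model.predict(x_new)[0]))
    votes = np.array(votes)
    recommendation = np.bincount(votes).argmax()
    confidence = float(np.mean(votes == recommendation))
    return recommendation, confidence
```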

5. Expediting clinical trials

Randomized clinical trials (RCTs) are generally used to judge the relative effectiveness of a new treatment. However, these trials can be slow and costly, and may fail to uncover specific subgroups for which a treatment may be most effective. A specific problem posed by COVID-19 is that subjects selected for RCTs tend not to be elderly or to have underlying conditions; as we know, COVID-19 has a particularly severe impact on both of those patient groups….

The AI and machine learning techniques I’ve mentioned above do not require further peer review or further testing. Many have already been implemented on a smaller scale in real-world settings. They are essentially ready to go, with only slight adaptations required….(More) (Full Paper)”.

Human migration: the big data perspective


Alina Sîrbu et al at the International Journal of Data Science and Analytics: “How can big data help to understand the migration phenomenon? In this paper, we try to answer this question through an analysis of various phases of migration, comparing traditional and novel data sources and models at each phase. We concentrate on three phases of migration, at each phase describing the state of the art and recent developments and ideas. The first phase includes the journey, and we study migration flows and stocks, providing examples where big data can have an impact. The second phase discusses the stay, i.e. migrant integration in the destination country. We explore various data sets and models that can be used to quantify and understand migrant integration, with the final aim of providing the basis for the construction of a novel multi-level integration index. The last phase is related to the effects of migration on the source countries and the return of migrants….(More)”.

The Law and Economics of Online Republication


Paper by Ronen Perry: “Jerry publishes unlawful content about Newman on Facebook, Elaine shares Jerry’s post, the share automatically turns into a tweet because her Facebook and Twitter accounts are linked, and George immediately retweets it. Should Elaine and George be liable for these republications? The question is neither theoretical nor idiosyncratic. On occasion, it reaches the headlines, as when Jennifer Lawrence’s representatives announced she would sue every person involved in the dissemination, through various online platforms, of her illegally obtained nude pictures. Yet this is only the tip of the iceberg. Numerous potentially offensive items are reposted daily, their exposure expands in widening circles, and they sometimes “go viral.”

This Article is the first to provide a law and economics analysis of the question of liability for online republication. Its main thesis is that liability for republication generates a specter of multiple defendants which might dilute the originator’s liability and undermine its deterrent effect. The Article concludes that, subject to several exceptions and methodological caveats, only the originator should be liable. This seems to be the American rule, as enunciated in Batzel v. Smith and Barrett v. Rosenthal. It stands in stark contrast to the prevalent rules in other Western jurisdictions and has been challenged by scholars on various grounds since its very inception.

The Article unfolds in three Parts. Part I presents the legal framework. It first discusses the rules applicable to republication of self-created content, focusing on the emergence of the single publication rule and its natural extension to online republication. It then turns to republication of third-party content. American law makes a clear-cut distinction between offline republication which gives rise to a new cause of action against the republisher (subject to a few limited exceptions), and online republication which enjoys an almost absolute immunity under § 230 of the Communications Decency Act. Other Western jurisdictions employ more generous republisher liability regimes, which usually require endorsement, a knowing expansion of exposure or repetition.

Part II offers an economic justification for the American model. Law and economics literature has shown that attributing liability for constant indivisible harm to multiple injurers, where each could have single-handedly prevented that harm (“alternative care” settings), leads to dilution of liability. Online republication scenarios often involve multiple tortfeasors. However, they differ from previously analyzed phenomena because they are not alternative care situations, and because the harm—increased by the conduct of each tortfeasor—is not constant and indivisible. Part II argues that neither feature precludes the dilution argument. It explains that the impact of the multiplicity of injurers in the online republication context on liability and deterrence provides a general justification for the American rule. This rule’s relatively low administrative costs afford additional support.

Part III considers the possible limits of the theoretical argument. It maintains that exceptions to the exclusive originator liability rule should be recognized when the originator is unidentifiable or judgment-proof, and when either the republisher’s identity or the republication’s audience was unforeseeable. It also explains that the rule does not preclude liability for positive endorsement with a substantial addition, which constitutes a new original publication, or for the dissemination of illegally obtained content, which is an independent wrong. Lastly, Part III addresses possible challenges to the main argument’s underlying assumptions, namely that liability dilution is a real risk and that it is undesirable….(More)”.