Urban Slums in a Datafying Milieu: Challenges for Data-Driven Research Practice


Paper by Bijal Brahmbhatt et al: “With the ongoing trend of urban datafication and growing use of data/evidence to shape developmental initiatives by state as well as non-state actors, this exploratory case study engages with the complex and often contested domains of data use. This study uses on-the-ground experience of working with informal settlements in Indian cities to examine how information value chains work in practice and the contours of their power to intervene in building an agenda of social justice into governance regimes. Using illustrative examples from ongoing action-oriented projects of Mahila Housing Trust in India such as the Energy Audit Project, Slum Mapping Exercise and women-led climate resilience building under the Global Resilience Partnership, it raises questions about challenges of making effective linkages between data, knowledge and action in and for slum communities in the global South by focussing on two issues.

First, it reveals dilemmas of achieving data accuracy when working with slum communities in developing cities where populations are dynamically changing, and where digitisation and use of ICT has limited operational currency. The second issue focuses on data ownership. It foregrounds the need for complementary inputs and the heavy requirement for support systems in informal settlements in order to translate data-driven knowledge into actionable forms. Absence of these will blunt the edge of data-driven community participation in local politics. Through these intersecting streams, the study attempts to address how entanglements between southern urbanism, datafication, governance and social justice diversify the discourse on data justice. It highlights existing hurdles and structural hierarchies within a data-heavy developmental register emergent across multiple cities in the global South where data-driven governmental regimes interact with convoluted urban forms and realities….(More)”.

Algorithmic Impact Assessments under the GDPR: Producing Multi-layered Explanations


Paper by Margot E. Kaminski and Gianclaudio Malgieri: “Policy-makers, scholars, and commentators are increasingly concerned with the risks of using profiling algorithms and automated decision-making. The EU’s General Data Protection Regulation (GDPR) has tried to address these concerns through an array of regulatory tools. As one of us has argued, the GDPR combines individual rights with systemic governance, towards algorithmic accountability. The individual tools are largely geared towards individual “legibility”: making the decision-making system understandable to an individual invoking her rights. The systemic governance tools, instead, focus on bringing expertise and oversight into the system as a whole, and rely on the tactics of “collaborative governance,” that is, use public-private partnerships towards these goals. How these two approaches to transparency and accountability interact remains a largely unexplored question, with much of the legal literature focusing instead on whether there is an individual right to explanation.

The GDPR contains an array of systemic accountability tools. Of these tools, impact assessments (Art. 35) have recently received particular attention on both sides of the Atlantic, as a means of implementing algorithmic accountability at early stages of design, development, and training. The aim of this paper is to address how a Data Protection Impact Assessment (DPIA) links the two faces of the GDPR’s approach to algorithmic accountability: individual rights and systemic collaborative governance. We address the relationship between DPIAs and individual transparency rights. We propose, too, that impact assessments link the GDPR’s two methods of governing algorithmic decision-making by both providing systemic governance and serving as an important “suitable safeguard” (Art. 22) of individual rights….(More)”.

Data Fiduciary in Order to Alleviate Principal-Agent Problems in the Artificial Big Data Age


Paper by Julia M. Puaschunder: “The classic principal-agent problem in political science and economics describes agency dilemmas or problems when one person, the agent, is put in a situation to make decisions on behalf of another entity, the principal. A dilemma occurs in situations when individual profit maximization of principal and agent are pitted against each other. This so-called moral hazard is nowadays emerging in the artificial big data age, when big data reaping entities have to act on behalf of agents, who provide their data with trust in the principal’s integrity and responsible big data conduct. Yet to this day, no data fiduciary has been clearly described and established to protect the agent from misuse of data. This article introduces the agent’s predicament between utility derived from information sharing and dignity in privacy as well as hyper-hyperbolic discounting fallibilities to not clearly foresee what consequences information sharing can have over time and in groups. The principal’s predicament between secrecy and selling big data insights or using big data for manipulative purposes will be outlined. Finally, the article draws a clear distinction between manipulation and nudging in relation to the potential social class division of those who nudge and those who are nudged…(More)”.

Nudging the Nudger: Toward a Choice Architecture for Regulators


Working Paper by Susan E. Dudley and Zhoudan Xie: “Behavioral research has shown that individuals do not always behave in ways that match textbook definitions of rationality. Recognizing that “bounded rationality” also occurs in the regulatory process and building on public choice insights that focus on how institutional incentives affect behavior, this article explores the interaction between the institutions in which regulators operate and their cognitive biases. It attempts to understand the extent to which the “choice architecture” regulators face reinforces or counteracts predictable cognitive biases. Just as behavioral insights are increasingly used to design choice architecture to frame individual decisions in ways that encourage welfare-enhancing choices, consciously designing the institutions that influence regulators’ policy decisions with behavioral insights in mind could lead to more public-welfare-enhancing policies. The article concludes with some modest ideas for improving regulators’ choice architecture and suggestions for further research….(More)”.

The crowd in crowdsourcing: Crowdsourcing as a pragmatic research method


Lina Eklund, Isabell Stamm, and Wanda Katja Liebermann at First Monday: “Crowdsourcing, as a digital process employed to obtain information, ideas, and solicit contributions of work, creativity, etc., from large online crowds stems from business, yet is increasingly used in research. Engaging with previous literature and a symposium on academic crowdsourcing, this study explores the underlying assumptions about crowdsourcing as a potential academic research method and how these affect the knowledge produced. Results identify crowdsourcing research as research about and with the crowd, explore how tasks can be productive, reconfiguring, and evaluating, and how these are linked to intrinsic and extrinsic rewards. We also identify three types of platforms: commercial platforms, research-specific platforms, and project specific platforms. Finally, the study suggests that crowdsourcing is a digital method that could be considered a pragmatic method; the challenge of a sound crowdsourcing project is to think about the researcher’s relationship to the crowd, the tasks, and the platform used….(More)”.

Risk identification and management for the research use of government administrative data


Paper by Elizabeth Shepherd, Anna Sexton, Oliver Duke-Williams, and Alexandra Eveleigh: “Government administrative data have enormous potential for public and individual benefit through improved educational and health services to citizens, medical research, environmental and climate interventions and exploitation of scarce energy resources. Administrative data is usually “collected primarily for administrative (not research) purposes by government departments and other organizations for the purposes of registration, transaction and record keeping, during the delivery of a service” such as health care, vehicle licensing, tax and social security systems (https://esrc.ukri.org/funding/guidance-for-applicants/research-ethics/useful-resources/key-terms-glossary/). Administrative data are usually distinguished from data collected for statistical use such as the census. Unlike administrative records, they do not provide evidence of activities and generally lack metadata and context relating to provenance. Administrative data, unlike open data, are not routinely made open or accessible, but access can be provided only on request to named researchers for specified research projects through research access protocols that often take months to negotiate and are subject to significant constraints around re-use such as the use of safe havens. Researchers seldom make use of freedom of information or access to information protocols to access such data because they need specific datasets and particular levels of granularity and an ability to re-process data, which are not made generally available. This study draws on research undertaken by the authors as part of the Administrative Data Research Centre in England (ADRC-E). The research examined perspectives on the sharing, linking and re-use (secondary use) of administrative data in England, viewed through three analytical themes: trust, consent and risk. 
This study presents the analysis of the identification and management of risk in the research use of government administrative data and presents a risk framework. Risk management (i.e. coordinated activities that allow organizations to control risks, Lemieux, 2010) enables us to think about the balance between risk and benefit for the public good and for other stakeholders. Mitigating activities or management mechanisms used to control the identified risks depend on the resources available to implement the options, on the risk appetite or tolerance of the community and on the cost and likely effectiveness of the mitigation. Mitigation and risk do not work in isolation and should be holistically viewed by keeping the whole information infrastructure in balance across the administrative data system and between multiple stakeholders.

This study seeks to establish a clearer picture of risk with regard to government administrative data in England. It identifies and categorizes the risks arising from the research use of government administrative data. It identifies mitigating risk management activities, linked to five key stakeholder communities and discusses the locus of responsibility for risk management actions. The identification of the risks and of mitigation strategies is derived from the viewpoints of the interviewees and associated documentation; therefore, they reflect their lived experience. The five stakeholder groups identified from the data are as follows: individual researchers; employers of researchers; wider research community; data creators and providers and data subjects and the broader public. The primary sections of the study, following the methodology and research context, set out the seven identified types of risk events in the research use of administrative data, present a stakeholder mapping of the communities in this research affected by the risks and discuss the findings related to managing and mitigating the risks identified. The conclusion presents the elements of a new risk framework to inform future actions by the government data community and enable researchers to exploit the power of administrative data for public good….(More)”.

Three Eras of Digital Governance


Paper by Jonathan L. Zittrain: “To understand where digital governance is going, we must take stock of where it’s been, because the timbre of mainstream thinking around digital governance today is dramatically different than it was when study of “Internet governance” coalesced in the late 1990s.

Perhaps the most obvious change has been from emphasizing networked technologies’ positive effects and promise – couched around concepts like connectivity, innovation, and, by this author, “generativity” – to pointing out their harms and threats. It’s not that threats weren’t previously recognized, but rather that they were more often seen in external clamps on technological development and upon the corresponding new freedoms for users, whether government intervention to block VOIP services like Skype to protect incumbent telco revenues, or in the shaping of technology to effect undue surveillance, whether for government or corporate purposes.

The shift in emphasis from positive to negative corresponds to a change in the overarching frameworks for talking about regulating information technology. We have moved from a discourse around rights – particularly those of end-users, and the ways in which abstention by intermediaries is important to facilitate citizen flourishing – to one of public health, which naturally asks for a weighing of the systemic benefits or harms of a technology, and to think about what systemic interventions might curtail its apparent excesses.

Each framework captures important values around the use of technology that can both empower and limit individual freedom of action, including to engage in harmful conduct. Our goal today should be to identify where competing values frameworks themselves preclude understanding of others’ positions about regulation, and to see if we can map a path forward that, if not reconciling the frameworks, allows for satisfying, if ever-evolving, resolutions to immediate questions of public and private governance…(More)”.

Bottom-up data Trusts: disturbing the ‘one size fits all’ approach to data governance


Sylvie Delacroix and Neil D Lawrence at International Data Privacy Law: “From the friends we make to the foods we like, via our shopping and sleeping habits, most aspects of our quotidian lives can now be turned into machine-readable data points. For those able to turn these data points into models predicting what we will do next, this data can be a source of wealth. For those keen to replace biased, fickle human decisions, this data—sometimes misleadingly—offers the promise of automated, increased accuracy. For those intent on modifying our behaviour, this data can help build a puppeteer’s strings. As we move from one way of framing data governance challenges to another, salient answers change accordingly. Just like the wealth redistribution way of framing those challenges tends to be met with a property-based, ‘it’s our data’ answer, when one frames the problem in terms of manipulation potential, dignity-based, human rights answers rightly prevail (via fairness and transparency-based answers to contestability concerns). Positive data-sharing aspirations tend to be raised within altogether different conversations from those aimed at addressing the above concerns. Our data Trusts proposal challenges these boundaries.

This article proceeds from an analysis of the very particular type of vulnerability concomitant with our ‘leaking’ data on a daily basis, to show that data ownership is both unlikely and inadequate as an answer to the problems at stake. We also argue that the current construction of top-down regulatory constraints on contractual freedom is both necessary and insufficient. To address the particular type of vulnerability at stake, bottom-up empowerment structures are needed. The latter aim to ‘give a voice’ to data subjects whose choices when it comes to data governance are often reduced to binary, ill-informed consent. While the rights granted by instruments like the GDPR can be used as tools in a bid to shape possible data-reliant futures—such as better use of natural resources, medical care, etc, their exercise is both demanding and unlikely to be as impactful when leveraged individually. As a bottom-up governance structure that is uniquely capable of taking into account the vulnerabilities outlined in the first section, we highlight the constructive potential inherent in data Trusts. This potential crosses the traditional boundaries between individualist protection concerns on one hand and collective empowerment aspirations on the other.

The second section explains how the Trust structure allows data subjects to choose to pool the rights they have over their personal data within the legal framework of a data Trust. It is important that there be a variety of data Trusts, arising out of a mix of publicly and privately funded initiatives. Each Trust will encapsulate a particular set of aspirations, reflected in the terms of the Trust. Bound by a fiduciary obligation of undivided loyalty, data trustees will exercise the data rights held under the Trust according to its particular terms. In contrast to a recently commissioned report, we explain why data can indeed be held in a Trust, and why the extent to which certain kinds of data may be said to give rise to property rights is neither here nor there as far as our proposal is concerned. What matters, instead, is the extent to which regulatory instruments such as the GDPR confer rights, and for what kind of data. The breadth of those rights will determine the possible scope of data Trusts in various jurisdictions.

Our ‘Case Studies’ aim to illustrate the complementarity of our data Trusts proposal with the legal provisions pertaining to different kinds of personal data, from medical, genetic, financial, and loyalty card data to social media feeds. The final section critically considers a variety of implementation challenges, which range from Trust Law’s cross-jurisdictional aspects to uptake and exit procedures, including issues related to data of shared provenance. We conclude by highlighting the way in which an ecosystem of data Trusts addresses ethical, legal, and political needs that are complementary to those within the reach of regulatory interventions such as the GDPR….(More)”.

Tracking the Labor Market with “Big Data”


Tomaz Cajner, Leland Crane, Ryan Decker, Adrian Hamins-Puertolas, and Christopher Kurz at FEDS Notes: “Payroll employment growth is one of the most reliable business cycle indicators. Each postwar recession in the United States has been characterized by a year-on-year drop in payroll employment as measured by the BLS Current Employment Statistics (CES) survey, and, outside of these recessionary declines, the year-on-year payroll employment growth has always been positive. Thus, it is not surprising that policymakers, financial markets, and the general public pay a great deal of attention to the CES payroll employment gains reported at the beginning of each month.

However, while the CES survey is one of the most carefully conducted measures of labor market activity and uses an extremely large sample, it is still subject to significant sampling error and nonsampling errors. For example, when the BLS first reported that private nonfarm payroll gains were 148,000 in July 2019, the associated 90 percent confidence interval was +/- 100,000 due to sampling error alone….

One such source of alternative labor market data is the payroll-processing company ADP, which covers 20 percent of the private workforce. These are the data that underlie ADP’s monthly National Employment Report (NER), which forecasts BLS payroll employment changes by using a combination of ADP-derived data and other publicly available data. In our research, we explore the information content of the ADP microdata alone by producing an estimate of employment changes independent from the BLS payroll series as well as from other data sources.

A potential concern when using the ADP data is that only the firms which hire ADP to manage their payrolls will appear in the data, and this may introduce sample selection issues….(More)”.

The Economics of Social Data: An Introduction


Paper by Dirk Bergemann and Alessandro Bonatti: “Large internet platforms collect data from individual users in almost every interaction on the internet. Whenever an individual browses a news website, searches for a medical term or for a travel recommendation, or simply checks the weather forecast on an app, that individual generates data. A central feature of the data collected from the individuals is its social aspect. Namely, the data captured from an individual user is not only informative about this specific individual, but also about users in some metric similar to the individual. Thus, the individual data is really social data. The social nature of the data generates an informational externality that we investigate in this note….(More)”.