Leveraging Private Data for Public Good: A Descriptive Analysis and Typology of Existing Practices


New report by Stefaan Verhulst, Andrew Young, Michelle Winowatan, and Andrew J. Zahuranec: “To address the challenges of our times, we need both new solutions and new ways to develop those solutions. The responsible use of data will be key toward that end. Since pioneering the concept of “data collaboratives” in 2015, The GovLab has studied and experimented with innovative ways to leverage private-sector data to tackle various societal challenges, such as urban mobility, public health, and climate change.

While we have seen an uptake in normative discussions on how data should be shared, little analysis exists of the actual practice. This paper seeks to address that gap by answering the following question: What are the variables and models that determine functional access to private-sector data for public good? In Leveraging Private Data for Public Good: A Descriptive Analysis and Typology of Existing Practices, we describe the emerging universe of data collaboratives and develop a typology of six practice areas. Our goal is to provide insight into current applications to accelerate the creation of new data collaboratives. The report outlines dozens of examples, as well as a set of recommendations to enable more systematic, sustainable, and responsible data collaboration….(More)”

The Colombian Anti-Corruption Referendum: Why It Failed?


Paper by Michael Haman: “The objective of this article is to analyze the results of the anti-corruption referendum in Colombia in 2018. Colombia is a country with a significant corruption problem. More than 99% of the voters who came to the polls voted in favor of the proposals. However, the anti-corruption referendum nonetheless failed because not enough citizens were mobilized to participate. The article addresses the reasons why turnout was very low…

Conclusions: I find that the more transparent a municipality, the higher the percentage of the municipal electorate that voted for proposals in the anti-corruption referendum. Moreover, I find that in municipalities where support for Sergio Fajardo in the presidential election was higher and support for Iván Duque was lower, support for the referendum proposals was higher. Also, turnout was lower in municipalities with higher poverty rates and higher homicide rates…(More)”.
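To make the municipal-level analysis concrete, here is a minimal sketch in Python of the kind of regressions the conclusions describe. Everything below is illustrative: the data are synthetic, and the variable names (transparency, fajardo_share, etc.) are hypothetical stand-ins for the paper’s actual measures.

```python
# A minimal sketch (not the author's code) of the two municipal-level
# relationships reported above: referendum support vs. transparency and
# Fajardo support, and turnout vs. poverty and homicide rates.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200  # synthetic "municipalities"
df = pd.DataFrame({
    "transparency": rng.random(n),
    "fajardo_share": rng.random(n),
    "poverty_rate": rng.random(n),
    "homicide_rate": rng.random(n),
})
# Toy outcomes constructed to mimic the signs of the reported findings.
df["support"] = 0.4 * df["transparency"] + 0.3 * df["fajardo_share"] + rng.normal(0, 0.05, n)
df["turnout"] = 0.5 - 0.2 * df["poverty_rate"] - 0.1 * df["homicide_rate"] + rng.normal(0, 0.05, n)

support_model = smf.ols("support ~ transparency + fajardo_share", data=df).fit()
turnout_model = smf.ols("turnout ~ poverty_rate + homicide_rate", data=df).fit()
print(support_model.params, turnout_model.params, sep="\n")
```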

Identifying Citizens’ Needs by Combining Artificial Intelligence (AI) and Collective Intelligence (CI)


Report by Andrew Zahuranec, Andrew Young and Stefaan G. Verhulst: “Around the world, public leaders are seeking new ways to better understand the needs of their citizens and, subsequently, to improve governance and how we solve public problems. The approaches proposed toward changing public engagement tend to focus on leveraging two innovations. The first involves artificial intelligence (AI), which offers unprecedented abilities to quickly process vast quantities of data to deepen insights into public needs. The second is collective intelligence (CI), which provides means for tapping into the “wisdom of the crowd.” Both have strengths and weaknesses, but little is known about how combining the two could offset their respective weaknesses while radically transforming how we meet public demands for more responsive governance.

Today, The GovLab is releasing a new report, Identifying Citizens’ Needs By Combining AI and CI, which seeks to identify and assess how institutions might responsibly experiment with how they engage with citizens by leveraging AI and CI together.

The report, authored by Stefaan G. Verhulst, Andrew J. Zahuranec, and Andrew Young, builds upon an initial examination of the intersection of AI and CI conducted in the context of the MacArthur Foundation Research Network on Opening Governance. …

The report features five in-depth case studies and an overview of eight additional examples from around the world on how AI and CI together can help to: 

  • Anticipate citizens’ needs and expectations through cognitive insights and process automation and pre-empt problems through improved forecasting and anticipation;
  • Analyze large volumes of citizen data and feedback, such as identifying patterns in complaints (see the sketch after this list);
  • Allow public officials to create highly personalized campaigns and services; or
  • Empower government service representatives to deliver relevant actions….(More)”.
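As a concrete illustration of the second bullet above, the following is a hedged sketch of using machine learning to surface patterns in citizen complaints. The complaint texts are invented, and the setup (TF-IDF features plus k-means clustering) is one plausible approach, not the method of any case study in the report.

```python
# A toy sketch: cluster free-text citizen complaints to surface
# recurring themes (e.g., streetlights vs. sanitation).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

complaints = [
    "Streetlight on 5th Ave has been out for weeks",
    "Pothole near the school entrance",
    "Garbage collection missed again on Elm St",
    "Broken streetlight at the park",
]

X = TfidfVectorizer(stop_words="english").fit_transform(complaints)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for complaint, label in zip(complaints, labels):
    print(label, complaint)  # complaints grouped by inferred theme
```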

Restricting data’s use: A spectrum of concerns in need of flexible approaches


Dharma Akmon and Susan Jekielek at IASSIST Quarterly: “As researchers consider making their data available to others, they are concerned with the responsible use of data. As a result, they often seek to place restrictions on secondary use. The Research Connections archive at ICPSR makes available the datasets of dozens of studies related to childcare and early education. Of the 103 studies archived to date, 20 have some restrictions on access. While ICPSR’s data access systems were designed primarily to accommodate public use data (i.e. data without disclosure concerns) and potentially disclosive data, our interactions with depositors reveal a more nuanced range of needs for restricting use. Some data present a relatively low risk of threatening participants’ confidentiality, yet the data producers still want to monitor who is accessing the data and how they plan to use them. Other studies contain data with such a high risk of disclosure that their use must be restricted to a virtual data enclave. Still other studies rest on agreements with participants that require continuing oversight of secondary use by data producers, funders, and participants. This paper describes data producers’ range of needs to restrict data access and discusses how systems can better accommodate these needs….(More)”.
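The spectrum the paper describes can be made concrete with a small schematic. The sketch below is an assumption-laden illustration of tiered access controls, not ICPSR’s actual system; the tier names and the download rule are hypothetical.

```python
# A schematic sketch of a spectrum of access tiers, from public-use
# files to a virtual data enclave, as described in the excerpt above.
from enum import Enum
from dataclasses import dataclass

class AccessTier(Enum):
    PUBLIC_USE = "direct download; no disclosure concerns"
    MONITORED = "low risk, but depositor wants to track who uses the data and how"
    RESTRICTED = "data-use agreement with continuing oversight by producers/funders"
    ENCLAVE = "high disclosure risk; analysis only inside a virtual data enclave"

@dataclass
class Study:
    name: str
    tier: AccessTier

    def may_download(self) -> bool:
        # Only the lower tiers permit taking data off-site.
        return self.tier in (AccessTier.PUBLIC_USE, AccessTier.MONITORED)

print(Study("Early education survey", AccessTier.ENCLAVE).may_download())  # False
```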

Urban Systems Design: From “science for design” to “design in science”


Introduction to Special Issue of Urban Analytics and City Science by Perry PJ Yang and Yoshiki Yamagata: “The direct design of cities is often regarded as impossible, owing to the fluidity, complexity, and uncertainty entailed in urban systems. And yet, we do design our cities, however imperfectly. Cities are objects of our own creation, they are intended landscapes, manageable, experienced and susceptible to analysis (Lynch, 1984). Urban design as a discipline has always focused on “design” in its professional practices. Urban designers tend to ask normative questions about how good city forms are designed, or how a city and its urban spaces ought to be made, thereby problematizing urban form-making and the values entailed. These design questions are analytically distinct from “science”-related research that tends to ask positive questions such as how cities function, or what properties emerge from interactive processes of urban systems. The latter questions require data, analytic techniques, and research methods to generate insight.

This theme issue “Urban Systems Design” is an attempt to outline a research agenda by connecting urban design and systems science, which is grounded in both normative and positive questions. It aims to contribute to the emerging field of urban analytics and city science that is central to this journal. Recent discussions of smart cities inspire urban design, planning and architectural professionals to address questions of how smart cities are shaped and what should be made. What are the impacts of information and communication technologies (ICT) on the questions of how built environments are designed and developed? How would the internet of things (IoT), big data analytics and urban automation influence how humans perceive, experience, use and interact with the urban environment? In short, what are the emerging new urban forms driven by the rapid move to ‘smart cities’?…(More)”.

#Kremlin: Using Hashtags to Analyze Russian Disinformation Strategy and Dissemination on Twitter


Paper by Sarah Oates and John Gray: “Reports of Russian interference in U.S. elections have raised grave concerns about the spread of foreign disinformation on social media sites, but there is little detailed analysis that links traditional political communication theory to social media analytics. As a result, it is difficult for researchers and analysts to gauge the nature or level of the threat that is disseminated via social media. This paper leverages both social science and data science by using traditional content analysis and Twitter analytics to trace how key aspects of Russian strategic narratives were distributed via #skripal, #mh17, #Donetsk, and #russophobia in late 2018.

This work will define how key Russian international communicative goals are expressed through strategic narratives, describe how to find hashtags that reflect those narratives, and analyze user activity around the hashtags. This tests both how Twitter amplifies specific Russian information goals and the relative success (or failure) of particular hashtags in spreading those messages. This research uses Mentionmapp, a system co-developed by one of the authors (Gray) that employs network analytics and machine intelligence to identify the behavior of Twitter users as well as generate profiles of users via posting history and connections. This study demonstrates how political communication theory can be used to frame the study of social media; how to relate knowledge of Russian strategic priorities to labels on social media such as Twitter hashtags; and how to test this approach by examining a set of Russian propaganda narratives as they are represented by hashtags. Our research finds that some Twitter users are consistently active across multiple Kremlin-linked hashtags, suggesting that knowledge of these hashtags is an important way to identify online influencers spreading Russian propaganda. More broadly, we suggest that Twitter dichotomies such as bot/human or troll/citizen should be used with caution, and analysis should instead address the nuances in Twitter use that reflect varying levels of engagement or even awareness in spreading foreign disinformation online….(More)”.
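As an illustration of the cross-hashtag finding, here is a minimal sketch of flagging accounts that post across several campaign hashtags. The data layout and the two-hashtag threshold are assumptions for illustration; this is not Mentionmapp’s implementation.

```python
# A toy sketch: count how many distinct campaign hashtags each user
# posts under, and flag accounts active across several of them.
import pandas as pd

tweets = pd.DataFrame({
    "user": ["a", "a", "a", "b", "c", "c"],
    "hashtag": ["#skripal", "#mh17", "#russophobia", "#mh17", "#skripal", "#skripal"],
})

hashtags_per_user = tweets.groupby("user")["hashtag"].nunique()
influencer_candidates = hashtags_per_user[hashtags_per_user >= 2]
print(influencer_candidates)  # users consistently active across multiple hashtags
```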

The personification of big data


Paper by Phillip Douglas Stevenson and Christopher Andrew Mattson: “Organizations all over the world, both national and international, gather demographic data so that the progress of nations and peoples can be tracked. This data is often made available to the public in the form of aggregated national-level data or individual responses (microdata). Product designers likewise conduct surveys to better understand their customers and create personas. Personas are archetypes of the individuals who will use, maintain, sell, or otherwise be affected by the products created by designers. Personas help designers better understand the person the product is designed for. Unfortunately, collecting customer information and creating personas is often a slow and expensive process.

In this paper, we introduce a new method of creating personas, leveraging publicly available databanks of both aggregated national-level data and information on individuals (microdata) in the population. A computational persona generator is introduced that creates a population of personas that mirrors a real population in terms of size and statistics. Realistic individual personas are filtered from this population for use in product development…(More)”.
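A toy sketch of the generator’s core idea follows: draw a synthetic population whose marginals match published aggregate statistics, then filter individuals to serve as personas. All distributions and percentages below are invented for illustration and are not the authors’ model.

```python
# A toy persona generator: sample a synthetic population that mirrors
# aggregate statistics, then filter candidate personas from it.
import numpy as np

rng = np.random.default_rng(0)
N = 10_000  # synthetic population mirroring a real one in size

population = {
    "age": rng.normal(loc=38, scale=14, size=N).clip(18, 90),
    "urban": rng.random(N) < 0.81,         # assumed 81% urban share
    "has_internet": rng.random(N) < 0.75,  # assumed 75% internet access
}

# Filter "realistic individual personas": e.g., urban adults who are online.
mask = population["urban"] & population["has_internet"]
print(f"{mask.sum()} of {N} synthetic individuals match the persona filter")
```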

Artificial Intelligence and Digital Repression: Global Challenges to Governance


Paper by Steven Feldstein: “Across the world, artificial intelligence (AI) is showing its potential for abetting repressive regimes and upending the relationship between citizen and state, thereby exacerbating a global resurgence of authoritarianism. AI is a component in a broader ecosystem of digital repression, but it is relevant to several different techniques, including surveillance, censorship, disinformation, and cyber attacks. AI offers three distinct advantages to autocratic leaders: it helps solve principal-agent loyalty problems, it offers substantial cost-efficiencies over traditional means of surveillance, and it is particularly effective against external regime challenges. China is a key proliferator of AI technology to authoritarian and illiberal regimes; such proliferation is an important component of Chinese geopolitical strategy. To counter the spread of high-tech repression abroad, as well as potential abuses at home, policy makers in democratic states must think seriously about how to mitigate harms and to shape better practices….(More)”

Whose Commons? Data Protection as a Legal Limit of Open Science


Mark Phillips and Bartha M. Knoppers in the Journal of Law, Medicine and Ethics: “Open science has recently gained traction as establishment institutions have come on-side and thrown their weight behind the movement and initiatives aimed at the creation of information commons. At the same time, the movement’s traditional insistence on unrestricted dissemination and reuse of all information of scientific value has been challenged by the push to strengthen protection of personal data. This article assesses tensions between open science and data protection, with a focus on the European Union’s General Data Protection Regulation (GDPR).

Powerful institutions across the globe have recently joined the ranks of those making substantive commitments to “open science.” For example, the European Commission and the NIH National Cancer Institute are supporting large-scale collaborations, such as the Cancer Genome Collaboratory, the European Open Science Cloud, and the Genomic Data Commons, with the aim of making giant stores of genomic and other data readily available for analysis by researchers. In the field of neuroscience, the Montreal Neurological Institute is midway through a novel five-year project through which it plans to adopt open science across the full spectrum of its research. The commitment is “to make publicly available all positive and negative data by the date of first publication, to open its biobank to registered researchers and, perhaps most significantly, to withdraw its support of patenting on any direct research outputs.” The resources and influence of these institutions seem to be tipping the scales, transforming open science from a longstanding aspirational ideal into an existing reality.

Although open science lacks any standard, accepted definition, one widely-cited model proposed by the Austria-based advocacy effort openscienceASAP describes it by reference to six principles: open methodology, open source, open data, open access, open peer review, and open educational resources. The overarching principle is “the idea that scientific knowledge of all kinds should be openly shared as early as is practical in the discovery process.” This article adopts this principle as a working definition of open science, with a particular emphasis on open sharing of human data.

As noted above, many of the institutions committed to open science use the word “commons” to describe their initiatives, and the two concepts are closely related. “Medical information commons” refers to “a networked environment in which diverse sources of health, medical, and genomic information on large populations become widely shared resources.” Commentators explicitly link the success of information commons and progress in the research and clinical realms to open science-based design principles such as data access and transparent analysis (i.e., sharing of information about methods and other metadata together with medical or health data).

But what legal, as well as ethical and social, factors will ultimately shape the contours of open science? Should all restrictions be fought, or should some be allowed to persist, and if so, in what form? Given that a commons is not a free-for-all, in that its governing rules shape its outcomes, how might we tailor law and policy to channel open science to fulfill its highest aspirations, such as universalizing practical access to scientific knowledge and its benefits, and avoid potential pitfalls? This article primarily concerns research data, although passing reference is also made to the terms under which academic publications are made available, which are subject to similar debates….(More)”.

The Market for Data Privacy


Paper by Tarun Ramadorai, Antoine Uettwiller and Ansgar Walther: “We scrape a comprehensive set of US firms’ privacy policies to facilitate research on the supply of data privacy. We analyze these data with the help of expert legal evaluations, and also acquire data on firms’ web tracking activities. We find considerable and systematic variation in privacy policies along multiple dimensions including ease of access, length, readability, and quality, both within and between industries. Motivated by a simple theory of big data acquisition and usage, we analyze the relationship between firm size, knowledge capital intensity, and privacy supply. We find that large firms with intermediate data intensity have longer, legally watertight policies, but are more likely to share user data with third parties….(More)”.
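To illustrate the measurement step, here is a hedged sketch of crude length and readability proxies for a scraped policy. The policy text is invented, and these simple counts are stand-ins for the paper’s richer measures, which also rely on expert legal evaluation.

```python
# A toy sketch: simple length and readability proxies for a privacy policy.
import re

policy = (
    "We collect information you provide. We may share this information "
    "with third parties for analytics purposes, subject to applicable law."
)

words = re.findall(r"[A-Za-z']+", policy)
sentences = [s for s in re.split(r"[.!?]+", policy) if s.strip()]

print("length (words):", len(words))
print("avg sentence length:", round(len(words) / len(sentences), 1))
print("avg word length:", round(sum(map(len, words)) / len(words), 2))
```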