Restricting data’s use: A spectrum of concerns in need of flexible approaches


Dharma Akmon and Susan Jekielek at IASSIST Quarterly: “As researchers consider making their data available to others, they are concerned with the responsible use of data. As a result, they often seek to place restrictions on secondary use. The Research Connections archive at ICPSR makes available the datasets of dozens of studies related to childcare and early education. Of the 103 studies archived to date, 20 have some restrictions on access. While ICPSR’s data access systems were designed primarily to accommodate public use data (i.e. data without disclosure concerns) and potentially disclosive data, our interactions with depositors reveal a more nuanced range of needs for restricting use. Some data present a relatively low risk of threatening participants’ confidentiality, yet the data producers still want to monitor who is accessing the data and how they plan to use them. Other studies contain data with such a high risk of disclosure that their use must be restricted to a virtual data enclave. Still other studies rest on agreements with participants that require continuing oversight of secondary use by data producers, funders, and participants. This paper describes data producers’ range of needs to restrict data access and discusses how systems can better accommodate these needs….(More)”.

Urban Systems Design: From “science for design” to “design in science”


Introduction to Special Issue of Urban Analytics and City Science by Perry PJ Yang and Yoshiki Yamagata: “The direct design of cities is often regarded as impossible, owing to the fluidity, complexity, and uncertainty entailed in urban systems. And yet, we do design our cities, however imperfectly. Cities are objects of our own creation, they are intended landscapes, manageable, experienced and susceptible to analysis (Lynch, 1984). Urban design as a discipline has always focused on “design” in its professional practices. Urban designers tend to ask normative questions about how good city forms are designed, or how a city and its urban spaces ought to be made, thereby problematizing urban form-making and the values entailed. These design questions are analytically distinct from “science”-related research that tends to ask positive questions such as how cities function, or what properties emerge from interactive processes of urban systems. The latter questions require data, analytic techniques, and research methods to generate insight.

This theme issue “Urban Systems Design” is an attempt to outline a research agenda by connecting urban design and systems science, which is grounded in both normative and positive questions. It aims to contribute to the emerging field of urban analytics and city science that is central to this journal. Recent discussions of smart cities inspire urban design, planning and architectural professionals to address questions of how smart cities are shaped and what should be made. What are the impacts of information and communication technologies (ICT) on the questions of how built environments are designed and developed? How would the internet of things (IoT), big data analytics and urban automation influence how humans perceive, experience, use and interact with the urban environment? In short, what are the emerging new urban forms driven by the rapid move to ‘smart cities’?…(More)”.

#Kremlin: Using Hashtags to Analyze Russian Disinformation Strategy and Dissemination on Twitter


Paper by Sarah Oates and John Gray: “Reports of Russian interference in U.S. elections have raised grave concerns about the spread of foreign disinformation on social media sites, but there is little detailed analysis that links traditional political communication theory to social media analytics. As a result, it is difficult for researchers and analysts to gauge the nature or level of the threat that is disseminated via social media. This paper leverages both social science and data science by using traditional content analysis and Twitter analytics to trace how key aspects of Russian strategic narratives were distributed via #skripal, #mh17, #Donetsk, and #russophobia in late 2018.

This work will define how key Russian international communicative goals are expressed through strategic narratives, describe how to find hashtags that reflect those narratives, and analyze user activity around the hashtags. This tests both how Twitter amplifies specific information goals of the Russians as well as the relative success (or failure) of particular hashtags to spread those messages effectively. This research uses Mentionmapp, a system co-developed by one of the authors (Gray) that employs network analytics and machine intelligence to identify the behavior of Twitter users as well as generate profiles of users via posting history and connections. This study demonstrates how political communication theory can be used to frame the study of social media; how to relate knowledge of Russian strategic priorities to labels on social media such as Twitter hashtags; and to test this approach by examining a set of Russian propaganda narratives as they are represented by hashtags. Our research finds that some Twitter users are consistently active across multiple Kremlin-linked hashtags, suggesting that knowledge of these hashtags is an important way to identify Russian propaganda online influencers. More broadly, we suggest that Twitter dichotomies such as bot/human or troll/citizen should be used with caution and analysis should instead address the nuances in Twitter use that reflect varying levels of engagement or even awareness in spreading foreign disinformation online….(More)”.

The personification of big data


Paper by Stevenson, Phillip Douglas and Mattson, Christopher Andrew: “Organizations all over the world, both national and international, gather demographic data so that the progress of nations and peoples can be tracked. This data is often made available to the public in the form of aggregated national level data or individual responses (microdata). Product designers likewise conduct surveys to better understand their customer and create personas. Personas are archetypes of the individuals who will use, maintain, sell or otherwise be affected by the products created by designers. Personas help designers better understand the person the product is designed for. Unfortunately, the process of collecting customer information and creating personas is often a slow and expensive process.

In this paper, we introduce a new method of creating personas, leveraging publicly available databanks of both aggregated national-level data and information on individuals in the population. A computational persona generator is introduced that creates a population of personas that mirrors a real population in terms of size and statistics. Realistic individual personas are filtered from this population for use in product development…(More)”.

Artificial Intelligence and Digital Repression: Global Challenges to Governance


Paper by Steven Feldstein: “Across the world, artificial intelligence (AI) is showing its potential for abetting repressive regimes and upending the relationship between citizen and state, thereby exacerbating a global resurgence of authoritarianism. AI is a component in a broader ecosystem of digital repression, but it is relevant to several different techniques, including surveillance, censorship, disinformation, and cyber attacks. AI offers three distinct advantages to autocratic leaders: it helps solve principal-agent loyalty problems, it offers substantial cost-efficiencies over traditional means of surveillance, and it is particularly effective against external regime challenges. China is a key proliferator of AI technology to authoritarian and illiberal regimes; such proliferation is an important component of Chinese geopolitical strategy. To counter the spread of high-tech repression abroad, as well as potential abuses at home, policy makers in democratic states must think seriously about how to mitigate harms and to shape better practices….(More)”.

Whose Commons? Data Protection as a Legal Limit of Open Science


Mark Phillips and Bartha M. Knoppers in the Journal of Law, Medicine and Ethics: “Open science has recently gained traction as establishment institutions have come on-side and thrown their weight behind the movement and initiatives aimed at creation of information commons. At the same time, the movement’s traditional insistence on unrestricted dissemination and reuse of all information of scientific value has been challenged by the movement to strengthen protection of personal data. This article assesses tensions between open science and data protection, with a focus on the GDPR.

Powerful institutions across the globe have recently joined the ranks of those making substantive commitments to “open science.” For example, the European Commission and the NIH National Cancer Institute are supporting large-scale collaborations, such as the Cancer Genome Collaboratory, the European Open Science Cloud, and the Genomic Data Commons, with the aim of making giant stores of genomic and other data readily available for analysis by researchers. In the field of neuroscience, the Montreal Neurological Institute is midway through a novel five-year project through which it plans to adopt open science across the full spectrum of its research. The commitment is “to make publicly available all positive and negative data by the date of first publication, to open its biobank to registered researchers and, perhaps most significantly, to withdraw its support of patenting on any direct research outputs.” The resources and influence of these institutions seem to be tipping the scales, transforming open science from a longstanding aspirational ideal into an existing reality.

Although open science lacks any standard, accepted definition, one widely-cited model proposed by the Austria-based advocacy effort openscienceASAP describes it by reference to six principles: open methodology, open source, open data, open access, open peer review, and open educational resources. The overarching principle is “the idea that scientific knowledge of all kinds should be openly shared as early as is practical in the discovery process.” This article adopts this principle as a working definition of open science, with a particular emphasis on open sharing of human data.

As noted above, many of the institutions committed to open science use the word “commons” to describe their initiatives, and the two concepts are closely related. “Medical information commons” refers to “a networked environment in which diverse sources of health, medical, and genomic information on large populations become widely shared resources.” Commentators explicitly link the success of information commons and progress in the research and clinical realms to open science-based design principles such as data access and transparent analysis (i.e., sharing of information about methods and other metadata together with medical or health data).

But what legal, as well as ethical and social, factors will ultimately shape the contours of open science? Should all restrictions be fought, or should some be allowed to persist, and if so, in what form? Given that a commons is not a free-for-all, in that its governing rules shape its outcomes, how might we tailor law and policy to channel open science to fulfill its highest aspirations, such as universalizing practical access to scientific knowledge and its benefits, and avoid potential pitfalls? This article primarily concerns research data, although passing reference is also made to the approach to the terms under which academic publications are available, which are subject to similar debates….(More)”.

The Market for Data Privacy


Paper by Tarun Ramadorai, Antoine Uettwiller and Ansgar Walther: “We scrape a comprehensive set of US firms’ privacy policies to facilitate research on the supply of data privacy. We analyze these data with the help of expert legal evaluations, and also acquire data on firms’ web tracking activities. We find considerable and systematic variation in privacy policies along multiple dimensions including ease of access, length, readability, and quality, both within and between industries. Motivated by a simple theory of big data acquisition and usage, we analyze the relationship between firm size, knowledge capital intensity, and privacy supply. We find that large firms with intermediate data intensity have longer, legally watertight policies, but are more likely to share user data with third parties….(More)”.

Privacy-Preserved Data Sharing for Evidence-Based Policy Decisions: A Demonstration Project Using Human Services Administrative Records for Evidence-Building Activities


Paper by the Bipartisan Policy Center: “Emerging privacy-preserving technologies and approaches hold considerable promise for improving data privacy and confidentiality in the 21st century. At the same time, more information is becoming accessible to support evidence-based policymaking.

In 2017, the U.S. Commission on Evidence-Based Policymaking unanimously recommended that further attention be given to the deployment of privacy-preserving data-sharing applications. If these types of applications can be tested and scaled in the near-term, they could vastly improve insights about important policy problems by using disparate datasets. At the same time, the approaches could promote substantial gains in privacy for the American public.

There are numerous ways to engage in privacy-preserving data sharing. This paper primarily focuses on secure computation, which allows information to be accessed securely, guarantees privacy, and permits analysis without making private information available. Three key issues motivated the launch of a domestic secure computation demonstration project using real government-collected data:

  • Using new privacy-preserving approaches addresses pressing needs in society. Current widely accepted approaches to managing privacy risks—like preventing the identification of individuals or organizations in public datasets—will become less effective over time. While there are many practices currently in use to keep government-collected data confidential, they do not often incorporate modern developments in computer science, mathematics, and statistics in a timely way. New approaches can enable researchers to combine datasets to improve the capability for insights, without being impeded by traditional concerns about bringing large, identifiable datasets together. In fact, if successful, traditional approaches to combining data for analysis may not be as necessary.
  • There are emerging technical applications to deploy certain privacy-preserving approaches in targeted settings. These emerging procedures are increasingly enabling larger-scale testing of privacy-preserving approaches across a variety of policy domains, governmental jurisdictions, and agency settings to demonstrate the privacy guarantees that accompany data access and use.
  • Widespread adoption and use by public administrators will only follow meaningful and successful demonstration projects. For example, secure computation approaches are complex and can be difficult to understand for those unfamiliar with their potential. Implementing new privacy-preserving approaches will require thoughtful attention to public policy implications, public opinions, legal restrictions, and other administrative limitations that vary by agency and governmental entity.

This project used real-world government data to illustrate the applicability of secure computation compared to the classic data infrastructure available to some local governments. The project took place in a domestic, non-intelligence setting to increase the salience of potential lessons for public agencies….(More)”.
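
For readers unfamiliar with secure computation, one of its simplest building blocks is additive secret sharing: each party splits its private number into random-looking shares, and only the combined total is ever reconstructed. The Python sketch below is illustrative only and is not the protocol used in the demonstration project; the function names, the modulus, and the sample caseload counts are all assumptions made up for this example.

```python
import secrets

# Hypothetical modulus for the example: a large prime, so totals do not wrap around.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n_parties shares that sum to it modulo MODULUS."""
    # All but the last share are uniformly random, so any single share
    # (or any incomplete set of shares) reveals nothing about `value`.
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Compute the sum of the inputs using only shares and partial sums."""
    n = len(private_values)
    # Each party splits its own value and hands one share to every party.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up the j-th share it received from everyone.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Publishing the partial sums discloses the total and nothing else.
    return sum(partial_sums) % MODULUS


# Made-up caseload counts held by three separate agencies.
total = secure_sum([120, 45, 300])
```

In this toy setting no agency sees another agency's count, yet all of them learn the combined total. Real deployments add authenticated channels between parties and protocols for richer computations than a sum.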

Our data, our society, our health: a vision for inclusive and transparent health data science in the UK and beyond


Paper by Elizabeth Ford et al. in Learning Health Systems: “The last six years have seen sustained investment in health data science in the UK and beyond, which should result in a data science community that is inclusive of all stakeholders, working together to use data to benefit society through the improvement of public health and wellbeing.

However, opportunities made possible through the innovative use of data are still not being fully realised, resulting in research inefficiencies and avoidable health harms. In this paper we identify the most important barriers to achieving higher productivity in health data science. We then draw on previous research, domain expertise, and theory, to outline how to go about overcoming these barriers, applying our core values of inclusivity and transparency.

We believe a step-change can be achieved through meaningful stakeholder involvement at every stage of research planning, design and execution; team-based data science; as well as harnessing novel and secure data technologies. Applying these values to health data science will safeguard a social license for health data research, and ensure transparent and secure data usage for public benefit….(More)”.

Big Data and Dahl’s Challenge of Democratic Governance


Alex Ingrams in the Review of Policy Research: “Big data applications have been acclaimed as potentially transformative for the public sector. But, despite this acclaim, most theory of big data is narrowly focused around technocratic goals. The conceptual frameworks that situate big data within democratic governance systems recognizing the role of citizens are still missing. This paper explores the democratic governance impacts of big data in three policy areas using Robert Dahl’s dimensions of control and autonomy. Key impacts and potential tensions are highlighted. There is evidence of impacts on both dimensions, but the dimensions conflict as well as align in notable ways and focused policy efforts will be needed to find a balance….(More)”.