Incentivising research data sharing: a scoping review


Paper by Helen Buckley Woods and Stephen Pinfield: “Numerous mechanisms exist to incentivise researchers to share their data. This scoping review aims to identify and summarise evidence of the efficacy of different interventions to promote open data practices and provide an overview of current research…. Seven major themes in the literature were identified: publisher/journal data sharing policies, metrics, software solutions, research data sharing agreements in general, open science ‘badges’, funder mandates, and initiatives….

A number of key messages for data sharing include: the need to build on existing cultures and practices, meeting people where they are and tailoring interventions to support them; the importance of publicising and explaining the policy/service widely; the need to have disciplinary data champions to model good practice and drive cultural change; the requirement to resource interventions properly; and the imperative to provide robust technical infrastructure and protocols, such as labelling of data sets, use of DOIs, data standards and use of data repositories….(More)”.

Biases in human mobility data impact epidemic modeling


Paper by Frank Schlosser, Vedran Sekara, Dirk Brockmann, and Manuel Garcia-Herranz: “Large-scale human mobility data is a key resource in data-driven policy making and across many scientific fields. Most recently, mobility data was extensively used during the COVID-19 pandemic to study the effects of governmental policies and to inform epidemic models. Large-scale mobility is often measured using digital tools such as mobile phones. However, it remains an open question how truthfully these digital proxies represent the actual travel behavior of the general population. Here, we examine mobility datasets from multiple countries and identify two fundamentally different types of bias caused by unequal access to, and unequal usage of mobile phones. We introduce the concept of data generation bias, a previously overlooked type of bias, which is present when the amount of data that an individual produces influences their representation in the dataset. We find evidence for data generation bias in all examined datasets in that high-wealth individuals are overrepresented, with the richest 20% contributing over 50% of all recorded trips, substantially skewing the datasets. This inequality is consequential, as we find mobility patterns of different wealth groups to be structurally different, where the mobility networks of high-wealth users are denser and contain more long-range connections. To mitigate the skew, we present a framework to debias data and show how simple techniques can be used to increase representativeness. Using our approach we show how biases can severely impact outcomes of dynamic processes such as epidemic simulations, where biased data incorrectly estimates the severity and speed of disease transmission. Overall, we show that a failure to account for biases can have detrimental effects on the results of studies and urge researchers and practitioners to account for data-fairness in all future studies of human mobility…(More)”.
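
As a rough illustration of the data generation bias described in the abstract, the sketch below uses a hypothetical toy dataset in which one high-wealth user records most of the trips. It is not the authors' debiasing framework. It assumes a simple per-user reweighting, where each trip is weighted by the inverse of its user's trip count, so every user contributes equally. The names `trips`, `user_group`, `trip_share_by_group`, and `per_user_weights` are invented for this example.

```python
from collections import Counter, defaultdict


def trip_share_by_group(trips, user_group, weights=None):
    """Return each group's share of the (optionally weighted) trip total."""
    totals = defaultdict(float)
    for i, user in enumerate(trips):
        weight = 1.0 if weights is None else weights[i]
        totals[user_group[user]] += weight
    grand_total = sum(totals.values())
    return {group: total / grand_total for group, total in totals.items()}


def per_user_weights(trips):
    """Weight each trip by 1 / (trips recorded by its user), so each user sums to 1."""
    counts = Counter(trips)
    return [1.0 / counts[user] for user in trips]


# Hypothetical toy data: each entry names the user who generated one recorded trip.
trips = ["a"] * 6 + ["b"] * 2 + ["c"] + ["d"]
# Hypothetical wealth labels: one high-wealth user, three low-wealth users.
user_group = {"a": "high", "b": "low", "c": "low", "d": "low"}

raw_share = trip_share_by_group(trips, user_group)
balanced_share = trip_share_by_group(trips, user_group, per_user_weights(trips))

print("raw:", raw_share)
print("reweighted:", balanced_share)
```

In this toy case the single high-wealth user accounts for 60% of raw trips but 25% of users, and the reweighting brings the trip share back in line with the user share. The sketch addresses only unequal usage among people already in the dataset; unequal access to phones would need external population data, which is not modelled here.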

Expecting the Unexpected: Effects of Data Collection Design Choices on the Quality of Crowdsourced User-Generated Content


Paper by Roman Lukyanenko: “As crowdsourced user-generated content becomes an important source of data for organizations, a pressing question is how to ensure that data contributed by ordinary people outside of traditional organizational boundaries is of suitable quality to be useful for both known and unanticipated purposes. This research examines the impact of different information quality management strategies, and corresponding data collection design choices, on key dimensions of information quality in crowdsourced user-generated content. We conceptualize a contributor-centric information quality management approach focusing on instance-based data collection. We contrast it with the traditional consumer-centric fitness-for-use conceptualization of information quality that emphasizes class-based data collection. We present laboratory and field experiments conducted in a citizen science domain that demonstrate trade-offs between the quality dimensions of accuracy, completeness (including discoveries), and precision between the two information management approaches and their corresponding data collection designs. Specifically, we show that instance-based data collection results in higher accuracy, dataset completeness and number of discoveries, but this comes at the expense of lower precision. We further validate the practical value of the instance-based approach by conducting an applicability check with potential data consumers (scientists, in our context of citizen science). In a follow-up study, we show, using human experts and supervised machine learning techniques, that substantial precision gains on instance-based data can be achieved with post-processing. We conclude by discussing the benefits and limitations of different information quality and data collection design choice for information quality in crowdsourced user-generated content…(More)”.

Regulating New Tech: Problems, Pathways, and People


Paper by Cary Coglianese: “New technologies bring with them many promises, but also a series of new problems. Even though these problems are new, they are not unlike the types of problems that regulators have long addressed in other contexts. The lessons from regulation in the past can thus guide regulatory efforts today. Regulators must focus on understanding the problems they seek to address and the causal pathways that lead to these problems. Then they must undertake efforts to shape the behavior of those in industry so that private sector managers focus on their technologies’ problems and take actions to interrupt the causal pathways. This means that regulatory organizations need to strengthen their own technological capacities; however, they need most of all to build their human capital. Successful regulation of technological innovation rests with top quality people who possess the background and skills needed to understand new technologies and their problems….(More)”.

Technology and democracy: a paradox wrapped in a contradiction inside an irony


Paper by Stephan Lewandowsky and Peter Pomerantsev: “Democracy is in retreat around the globe. Many commentators have blamed the Internet for this development, whereas others have celebrated the Internet as a tool for liberation, with each opinion being buttressed by supporting evidence. We try to resolve this paradox by reviewing some of the pressure points that arise between human cognition and the online information architecture, and their fallout for the well-being of democracy. We focus on the role of the attention economy, which has monetised dwell time on platforms, and the role of algorithms that satisfy users’ presumed preferences. We further note the inherent asymmetry in power between platforms and users that arises from these pressure points, and we conclude by sketching out the principles of a new Internet with democratic credentials….(More)”.

The role of artificial intelligence in disinformation


Paper by Noémi Bontridder and Yves Poullet: “Artificial intelligence (AI) systems are playing an overarching role in the disinformation phenomenon our world is currently facing. Such systems boost the problem not only by increasing opportunities to create realistic AI-generated fake content, but also, and essentially, by facilitating the dissemination of disinformation to a targeted audience and at scale by malicious stakeholders. This situation entails multiple ethical and human rights concerns, in particular regarding human dignity, autonomy, democracy, and peace. In reaction, other AI systems are developed to detect and moderate disinformation online. Such systems do not escape from ethical and human rights concerns either, especially regarding freedom of expression and information. Having originally started with ascending co-regulation, the European Union (EU) is now heading toward descending co-regulation of the phenomenon. In particular, the Digital Services Act proposal provides for transparency obligations and external audit for very large online platforms’ recommender systems and content moderation. While with this proposal, the Commission focusses on the regulation of content considered as problematic, the EU Parliament and the EU Council call for enhancing access to trustworthy content. In light of our study, we stress that the disinformation problem is mainly caused by the business model of the web that is based on advertising revenues, and that adapting this model would reduce the problem considerably. We also observe that while AI systems are inappropriate to moderate disinformation content online, and even to detect such content, they may be more appropriate to counter the manipulation of the digital ecosystem….(More)”.

Evidence-Based Policymaking: What Human Service Agencies Can Learn from Implementation Science and Integrated Data Systems


Paper by Sharon Zanti & M. Lori Thomas: “The evidence-based policymaking movement compels government leaders and agencies to rely on the best available research evidence to inform policy and program decisions, yet how to do this effectively remains a challenge. This paper demonstrates how the core concepts from two emerging fields—Implementation Science (IS) and Integrated Data Systems (IDS)—can help human service agencies and their partners realize the aims of the evidence-based policymaking movement. An IS lens can help agencies address the role of context when implementing evidence-based practices, complement other quality and process improvement efforts, simultaneously study implementation and effectiveness outcomes, and guide de-implementation of ineffective policies. The IDS approach offers governance frameworks to support ethical and legal data use, provides high-quality administrative data for in-house analyses, and allows for more time-sensitive analyses of pressing agency needs. Ultimately, IS and IDS can support human service agencies in more efficiently using government resources to deliver the best available programs and policies to the communities they serve. Although this paper focuses on examples within the United States context, key concepts and guidance are intended to be broadly applicable across geographies, given that IS, IDS, and the evidence-based policymaking movement are globally relevant….(More)”.

Business Data Sharing through Data Marketplaces: A Systematic Literature Review


Paper by Antragama E. Abbas, Wirawan Agahari, Montijn van de Ven, Anneke Zuiderwijk, and Mark de Reuver: “Data marketplaces are expected to play a crucial role in tomorrow’s data economy, but such marketplaces are seldom commercially viable. Currently, there is no clear understanding of the knowledge gaps in data marketplace research, especially not of neglected research topics that may advance such marketplaces toward commercialization. This study provides an overview of the state-of-the-art of data marketplace research. We employ a Systematic Literature Review (SLR) approach to examine 133 academic articles and structure our analysis using the Service-Technology-Organization-Finance (STOF) model. We find that the extant data marketplace literature is primarily dominated by technical research, such as discussions about computational pricing and architecture. To move past the first stage of the platform’s lifecycle (i.e., platform design) to the second stage (i.e., platform adoption), we call for empirical research in non-technological areas, such as customer expected value and market segmentation….(More)”.

Creating and governing social value from data


Paper by Diane Coyle and Stephanie Diepeveen: “Data is increasingly recognised as an important economic resource for innovation and growth, but its innate characteristics mean market-based valuations inadequately account for the impact of its use on social welfare. This paper extends the literature on the value of data by providing a framework that takes into account its non-rival nature and integrates its inherent positive and negative externalities. Positive externalities consist of the scope for combining different data sets or enabling innovative uses of existing data, while negative externalities include potential privacy loss. We propose a framework integrating these and explore the policy trade-offs shaping net social welfare through a case study of geospatial data and the transport sector in the UK, where insufficient recognition of the trade-offs has contributed to suboptimal policy outcomes. We conclude by proposing methods for empirical approaches to social data valuation, essential evidence for decisions regarding the policy trade-offs. This article therefore lays important groundwork for novel approaches to the measurement of the net social welfare contribution of data, and hence illuminating opportunities for greater and more equitable creation of value from data in our societies….(More)”.

Conceptualizing AI literacy: An exploratory review


Paper by Davy Tsz Kit Ng, Jac Ka Lok Leung, Samuel K.W. Chu, and Maggie Qiao Shen: “Artificial Intelligence (AI) has spread across industries (e.g., business, science, art, education) to enhance user experience, improve work efficiency, and create many future job opportunities. However, public understanding of AI technologies and how to define AI literacy is under-explored. This vision poses upcoming challenges for our next generation to learn about AI. On this note, an exploratory review was conducted to conceptualize the newly emerging concept “AI literacy”, in search for a sound theoretical foundation to define, teach and evaluate AI literacy. Grounded in literature on 30 existing peer-reviewed articles, this review proposed four aspects (i.e., know and understand, use, evaluate, and ethical issues) for fostering AI literacy based on the adaptation of classic literacies. This study sheds light on the consolidated definition, teaching, and ethical concerns on AI literacy, establishing the groundwork for future research such as competency development and assessment criteria on AI literacy….(More)”.