No Ground Truth? No Problem: Improving Administrative Data Linking Using Active Learning and a Little Bit of Guile


Paper by Sarah Tahamont et al: “While linking records across large administrative datasets [“big data”] has the potential to revolutionize empirical social science research, many administrative data files do not have common identifiers and are thus not designed to be linked to others. To address this problem, researchers have developed probabilistic record linkage algorithms which use statistical patterns in identifying characteristics to perform linking tasks. Naturally, the accuracy of a candidate linking algorithm can be substantially improved when an algorithm has access to “ground-truth” examples — matches which can be validated using institutional knowledge or auxiliary data. Unfortunately, the cost of obtaining these examples is typically high, often requiring a researcher to manually review pairs of records in order to make an informed judgement about whether they are a match. When a pool of ground-truth information is unavailable, researchers can use “active learning” algorithms for linking, which ask the user to provide ground-truth information for select candidate pairs. In this paper, we investigate the value of providing ground-truth examples via active learning for linking performance. We confirm popular intuition that data linking can be dramatically improved with the availability of ground truth examples. But critically, in many real-world applications, only a relatively small number of tactically-selected ground-truth examples are needed to obtain most of the achievable gains. With a modest investment in ground truth, researchers can approximate the performance of a supervised learning algorithm that has access to a large database of ground truth examples using a readily available off-the-shelf tool…(More)”.
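The abstract does not spell out how active learning chooses which pairs to show a reviewer. The following is a minimal, hypothetical Python sketch of one common approach, uncertainty sampling; it is not the authors' off-the-shelf tool. Each candidate pair becomes a comparison vector, a logistic-regression match model is refit after every new label, and the next pair sent for review is the one whose predicted match probability is closest to 0.5. The record fields, the hidden `person_id` used to simulate a human reviewer, and all function names are illustrative assumptions.

```python
import math
from difflib import SequenceMatcher

# Hypothetical sketch of active-learning record linkage via uncertainty sampling.
# Field names ("name", "birth_year", "person_id") are illustrative assumptions.


def features(a, b):
    """Comparison vector for a candidate pair: [bias, name similarity, same birth year]."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    same_year = 1.0 if a["birth_year"] == b["birth_year"] else 0.0
    return [1.0, name_sim, same_year]


def predict(w, x):
    """Logistic model: estimated probability that the pair is a match."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(labeled, epochs=500, lr=0.5):
    """Fit logistic-regression weights by batch gradient descent on (x, y) pairs."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, y in labeled:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wj - lr * gj / len(labeled) for wj, gj in zip(w, grad)]
    return w


def most_uncertain(w, pool):
    """Uncertainty sampling: index of the unlabeled pair with p(match) closest to 0.5."""
    return min(range(len(pool)), key=lambda i: abs(predict(w, pool[i][2]) - 0.5))


def active_learning(pairs, oracle, budget):
    """Query the oracle (a human reviewer) on `budget` tactically chosen pairs."""
    pool = [(a, b, features(a, b)) for a, b in pairs]
    labeled = []
    w = [0.0, 0.0, 0.0]  # untrained: every pair starts at p = 0.5
    for _ in range(min(budget, len(pool))):
        a, b, x = pool.pop(most_uncertain(w, pool))
        labeled.append((x, oracle(a, b)))
        w = train(labeled)  # refit after each new ground-truth label
    return w


# Usage with toy records; the hidden person_id simulates what a reviewer would verify.
reviewer_calls = []


def reviewer(a, b):
    """Hypothetical stand-in for manual review of one candidate pair."""
    reviewer_calls.append((a["name"], b["name"]))
    return 1 if a["person_id"] == b["person_id"] else 0


pairs = [
    ({"name": "Ann Lee", "birth_year": 1962, "person_id": 1},
     {"name": "Ann Lee", "birth_year": 1962, "person_id": 1}),
    ({"name": "Bo Diaz", "birth_year": 1950, "person_id": 2},
     {"name": "Bo Diaz", "birth_year": 1950, "person_id": 2}),
    ({"name": "Ann Lee", "birth_year": 1962, "person_id": 1},
     {"name": "Zed Quux", "birth_year": 1999, "person_id": 3}),
    ({"name": "Bo Diaz", "birth_year": 1950, "person_id": 2},
     {"name": "Cy Park", "birth_year": 1988, "person_id": 4}),
]
weights = active_learning(pairs, reviewer, budget=3)
```

Here the reviewer labels only three of the four candidate pairs. The design choice worth noting is that the model, not the reviewer, decides which pairs are worth labeling, which is the sense in which the ground-truth examples are "tactically selected".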

Valuing the U.S. Data Economy Using Machine Learning and Online Job Postings


Paper by J Bayoán Santiago Calderón and Dylan Rassier: “With the recent proliferation of data collection and uses in the digital economy, the understanding and statistical treatment of data stocks and flows is of interest among compilers and users of national economic accounts. In this paper, we measure the value of own-account data stocks and flows for the U.S. business sector by summing the production costs of data-related activities implicit in occupations. Our method augments the traditional sum-of-costs methodology for measuring other own-account intellectual property products in national economic accounts by proxying occupation-level time-use factors using a machine learning model and the text of online job advertisements (Blackburn 2021). In our experimental estimates, we find that annual current-dollar investment in own-account data assets for the U.S. business sector grew from $84 billion in 2002 to $186 billion in 2021, with an average annual growth rate of 4.2 percent. Cumulative current-dollar investment for the period 2002–2021 was $2.6 trillion. In addition to the annual current-dollar investment, we present historical-cost net stocks, real growth rates, and effects on value-added by the industrial sector…(More)”.
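The sum-of-costs idea described in the abstract can be illustrated with a hedged Python sketch. It assumes the time-use factor for each occupation (the share of work time spent on data activities, which the paper proxies with a machine learning model applied to job-ad text) is already known, and it folds all non-wage costs into a single hypothetical multiplier. The occupations, wages, shares, and multiplier are invented for illustration and are not figures from the paper.

```python
from dataclasses import dataclass

# Hypothetical sum-of-costs sketch; all figures below are invented for illustration.


@dataclass
class Occupation:
    title: str
    employment: int         # number of workers in the occupation
    mean_wage: float        # average annual wage per worker, dollars
    data_time_share: float  # assumed fraction of work time spent on data activities


def own_account_data_investment(occupations, nonwage_multiplier):
    """Sum-of-costs estimate: data-related wage bill scaled up for non-wage costs.

    `nonwage_multiplier` is a single hypothetical factor standing in for costs
    such as benefits, intermediate inputs and capital services.
    """
    data_wage_bill = sum(
        o.employment * o.mean_wage * o.data_time_share for o in occupations
    )
    return data_wage_bill * nonwage_multiplier


occupations = [
    Occupation("data scientist", 100, 100_000.0, 0.8),
    Occupation("office clerk", 200, 40_000.0, 0.1),
]
estimate = own_account_data_investment(occupations, nonwage_multiplier=1.5)
```

With these made-up inputs, the data-related wage bill is 100 × $100,000 × 0.8 + 200 × $40,000 × 0.1 = $8.8 million, which the 1.5 multiplier scales to $13.2 million. The point of the time-use factor is that only the data-related fraction of each occupation's costs counts toward investment in data assets.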

Using the future wheel methodology to assess the impact of open science in the transport sector


Paper by Anja Fleten Nielsen et al: “Open Science enhances information sharing and makes scientific results of transport research more transparent and accessible at all levels and to everyone, allowing integrity and reproducibility. However, what future impacts will Open Science have on the societal, environmental and economic development within the transport sector? Using the Future Wheel methodology, we conducted a workshop with transport experts from both industry and academia to answer this question. The main findings of this study point in the direction of previous studies in other fields, in terms of increased innovation, increased efficiency, economic savings, more equality, and increased participation of citizens. In addition, we found several potential transport-specific impacts: lower emissions, faster travel times, improved traffic safety, increased awareness of transport policies, and artificial intelligence improving mobility services. Several potential negative outcomes of Open Science were also identified by the expert group: job loss, new types of risks, increased cost, increased conflicts, time delays, increased inequality and increased energy consumption. If we know the negative outcomes, it is much easier to put in place strategies that are sustainable for a broader stakeholder group, which also increases the probability of taking advantage of all the positive impacts of Open Science…(More)”

Data in design: How big data and thick data inform design thinking projects


Paper by Marzia Mortati, Stefano Magistretti, Cabirio Cautela, and Claudio Dell’Era: “Scholars and practitioners have recognized that making innovation happen today requires renewed approaches focused on agility, dynamicity, and other organizational capabilities that enable firms to cope with uncertainty and complexity. In turn, the literature has shown that design thinking is a useful methodology to cope with ill-defined and wicked problems. In this study, we address the question of the little-known role of different types of data in innovation projects characterized by ill-defined problems requiring creativity to be solved. Rooted in qualitative observation (thick data) and quantitative analyses (big data), we investigate the role of data in eight design thinking projects dealing with ill-defined and wicked problems. Our findings highlight the practical and theoretical implications of eight practices that differently make use of big and thick data, informing academics and practitioners on how different types of data are utilized in design thinking projects and the related principles and practices…(More)”.

Data Cooperatives as Catalysts for Collaboration, Data Sharing, and the (Trans)Formation of the Digital Commons


Paper by Michael Max Bühler et al: “Network effects, economies of scale, and lock-in effects increasingly lead to a concentration of digital resources and capabilities, hindering the free and equitable development of digital entrepreneurship (SDG9), new skills, and jobs (SDG8), especially in small communities (SDG11) and their small and medium-sized enterprises (“SMEs”). To ensure the affordability and accessibility of technologies, promote digital entrepreneurship and community well-being (SDG3), and protect digital rights, we propose data cooperatives [1,2] as a vehicle for secure, trusted, and sovereign data exchange [3,4]. In post-pandemic times, community/SME-led cooperatives can play a vital role by ensuring that supply chains to support digital commons are uninterrupted, resilient, and decentralized [5]. Digital commons and data sovereignty provide communities with affordable and easy access to information and the ability to collectively negotiate data-related decisions. Moreover, cooperative commons (a) provide access to the infrastructure that underpins the modern economy, (b) preserve property rights, and (c) ensure that privatization and monopolization do not further erode self-determination, especially in a world increasingly mediated by AI. Thus, governance plays a significant role in accelerating communities’/SMEs’ digital transformation and addressing their challenges. Cooperatives thrive on digital governance and standards such as open trusted Application Programming Interfaces (APIs) that increase the efficiency, technological capabilities, and capacities of participants and, most importantly, integrate, enable, and accelerate the digital transformation of SMEs in the overall process. This policy paper presents and discusses several transformative use cases for cooperative data governance.
The use cases demonstrate how platform/data-cooperatives and their novel value creation can be leveraged to take digital commons and value chains to a new level of collaboration while addressing the most pressing community issues. The proposed framework for a digital federated and sovereign reference architecture will create a blueprint for sustainable development both in the Global South and North…(More)”

The disarming simplicity of wicked problems: The biography of an idea


Paper by Niraj Verma: “The idea of “wicked problems” indicates the intractability and dilemmatic nature of design and planning. At the same time, it also encourages the development of design methods and information systems. So how do designers, technologists, and administrators reconcile and respond to these competing ideas? Using William James’s “psychology of truth,” the paper answers this question by putting wicked problems in intellectual relief. It also suggests that as long as pluralism, diversity, and interdisciplinary thinking are in good currency, the idea of wicked problems will retain its popularity, appeal, and usefulness…(More)”.

Knowledge monopolies and the innovation divide: A governance perspective


Paper by Hani Safadi and Richard Thomas Watson: “The rise of digital platforms creates knowledge monopolies that threaten innovation. Their power derives from the imposition of data obligations and persistent coupling on platform participation and their usurpation of the rights to data created by other participants to facilitate information asymmetries. Knowledge monopolies can use machine learning to develop competitive insights unavailable to every other platform participant. This information asymmetry stifles innovation, stokes the growth of the monopoly, and reinforces its ascendency. National or regional governance structures, such as laws and regulatory authorities, constrain economic monopolies deemed not in the public interest. We argue the need for legislation and an associated regulatory mechanism to curtail coercive data obligations, control, eliminate data rights exploitation, and prevent mergers and acquisitions that could create or extend knowledge monopolies…(More)”.

Towards Responsible Quantum Technology


Paper by Mauritz Kop et al: “The expected societal impact of quantum technologies (QT) urges us to proceed and innovate responsibly. This article proposes a conceptual framework for Responsible QT that seeks to integrate considerations about ethical, legal, social, and policy implications (ELSPI) into quantum R&D, while responding to the Responsible Research and Innovation dimensions of anticipation, inclusion, reflection and responsiveness. After examining what makes QT unique, we argue that quantum innovation should be guided by a methodological framework for Responsible QT, aimed at jointly safeguarding against risks by proactively addressing them, engaging stakeholders in the innovation process, and continuing to advance QT (‘SEA’). We further suggest operationalizing the SEA-framework by establishing quantum-specific guiding principles. The impact of quantum computing on information security is used as a case study to illustrate (1) the need for a framework that guides Responsible QT, and (2) the usefulness of the SEA-framework for QT generally. Additionally, we examine how our proposed SEA-framework for responsible innovation can inform the emergent regulatory landscape affecting QT, and provide an outlook of how regulatory interventions for QT as base-layer technology could be designed, contextualized, and tailored to their exceptional nature in order to reduce the risk of unintended counterproductive effects of policy interventions.

Laying the groundwork for a responsible quantum ecosystem, the research community and other stakeholders are called upon to further develop the recommended guiding principles, and discuss their operationalization into best practices and real-world applications. Our proposed framework should be considered a starting point for these much-needed, highly interdisciplinary efforts…(More)”.

Unpacking Social Capital


Paper by Ruben Durante, Nicola Mastrorocco, Luigi Minale & James M. Snyder Jr.: “We use novel and unique survey data from Italy to shed light on key questions regarding the measurement of social capital and the use of social capital indicators for empirical work. Our data cover a sample of over 600,000 respondents interviewed between 2000 and 2015. We identify four distinct components of social capital – i) social participation, ii) political participation, iii) trust in others, and iv) trust in institutions – and examine how they relate to each other. We then study how each dimension of social capital relates to various socioeconomic factors both at the individual and the aggregate level, and to various proxies of social capital commonly used in the literature. Finally, building on previous work, we investigate to what extent different dimensions of social capital predict differences in key economic, political, and health outcomes. Our findings support the view that social capital is a multifaceted object with multiple dimensions that, while related, are distinct from each other. Future work should take such multidimensionality into account and carefully consider what measure of social capital to use…(More)”.
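One simple way to check whether components of social capital are “related but distinct” is to look at pairwise correlations between respondent-level component scores: positive but clearly imperfect correlations fit that description. The Python sketch below only illustrates that idea and is not the authors’ procedure; the abstract does not say which statistics the paper uses, and the component scores are made-up toy values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def component_correlations(components):
    """Pairwise correlations between every pair of social-capital components."""
    return {
        (first, second): pearson(components[first], components[second])
        for first, second in combinations(components, 2)
    }


# Toy respondent-level scores (hypothetical; one value per respondent).
components = {
    "social_participation": [1, 0, 1, 1, 0, 1],
    "political_participation": [1, 0, 0, 1, 0, 1],
    "trust_in_others": [4, 2, 3, 5, 2, 3],
    "trust_in_institutions": [3, 3, 2, 4, 1, 2],
}
for pair, r in component_correlations(components).items():
    print(pair, round(r, 2))
```

With four components there are six pairs to inspect. A single composite index would hide exactly this structure, which is why the abstract cautions researchers to choose the social capital measure deliberately.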

Responding to the coronavirus disease-2019 pandemic with innovative data use: The role of data challenges


Paper by Jamie Danemayer, Andrew Young, Siobhan Green, Lydia Ezenwa and Michael Klein: “Innovative, responsible data use is a critical need in the global response to the coronavirus disease-2019 (COVID-19) pandemic. Yet potentially impactful data are often unavailable to those who could utilize them, particularly in data-poor settings, posing a serious barrier to effective pandemic mitigation. Data challenges, a public call-to-action for innovative data use projects, can identify and address these specific barriers. To understand gaps and progress relevant to effective data use in this context, this study thematically analyses three sets of qualitative data focused on/based in low/middle-income countries: (a) a survey of innovators responding to a data challenge, (b) a survey of organizers of data challenges, and (c) a focus group discussion with professionals using COVID-19 data for evidence-based decision-making. Data quality and accessibility and human resources/institutional capacity were frequently reported limitations to effective data use among innovators. New fit-for-purpose tools and the expansion of partnerships were the most frequently noted areas of progress. Discussion participants identified that building capacity for external/national actors to understand the needs of local communities can address a lack of partnerships while de-siloing information. A synthesis of themes demonstrated that gaps, progress, and needs commonly identified by these groups are relevant beyond COVID-19, highlighting the importance of a healthy data ecosystem to address emerging threats. This is supported by data holders prioritizing the availability and accessibility of their data without causing harm; funders and policymakers committed to integrating innovations with existing physical, data, and policy infrastructure; and innovators designing sustainable, multi-use solutions based on principles of good data governance…(More)”.