Meta Kills a Crucial Transparency Tool At the Worst Possible Time


Interview by Vittoria Elliott: “Earlier this month, Meta announced that it would be shutting down CrowdTangle, the social media monitoring and transparency tool that has allowed journalists and researchers to track the spread of mis- and disinformation. It will cease to function on August 14, 2024—just months before the US presidential election.

Meta’s move is just the latest example of a tech company rolling back transparency and security measures as the world enters the biggest global election year in history. The company says it is replacing CrowdTangle with a new Content Library API, which will require researchers and nonprofits to apply for access to the company’s data. But the Mozilla Foundation and 140 other civil society organizations protested last week that the new offering lacks much of CrowdTangle’s functionality, asking the company to keep the original tool operating until January 2025.

Meta spokesperson Andy Stone countered in posts on X that the groups’ claims “are just wrong,” saying the new Content Library will contain “more comprehensive data than CrowdTangle” and be made available to nonprofits, academics, and election integrity experts. When asked why commercial newsrooms, like WIRED, are to be excluded from the Content Library, Meta spokesperson Eric Porterfield said that it was “built for research purposes.” While journalists might not have direct access, he suggested they could use commercial social network analysis tools, or “partner with an academic institution to help answer a research question related to our platforms.”

Brandon Silverman, cofounder and former CEO of CrowdTangle, who continued to work on the tool after Facebook acquired it in 2016, says it’s time to force platforms to open up their data to outsiders. The conversation has been edited for length and clarity…(More)”.

Commons-based Data Set Governance for AI


Report by Open Future: “In this white paper, we propose an approach to sharing data sets for AI training as a public good governed as a commons. By adhering to the six principles of commons-based governance, data sets can be managed in a way that generates public value while making shared resources resilient to extraction or capture by commercial interests.

The purpose of defining these principles is two-fold:

First, we propose these principles as input into policy debates on data and AI governance. A commons-based approach can be introduced through regulatory means, funding and procurement rules, statements of principles, or data sharing frameworks. Second, these principles can also serve as a blueprint for the design of data sets that are governed and shared as a commons. To this end, we also provide practical examples of how these principles are being brought to life. Projects like Big Science or Common Voice have demonstrated that commons-based data sets can be successfully built.

These principles, tailored for the governance of AI data sets, are built on our previous work on the Data Commons Primer. They are also the outcome of our research into the governance of AI data sets, including the AI_Commons case study. Finally, they are based on ongoing efforts to define how AI systems can be shared and made open, in which we have been participating – including the OSI-led process to define open-source AI systems, and the DPGA Community of Practice exploring AI systems as Digital Public Goods…(More)”.


Unconventional data, unprecedented insights: leveraging non-traditional data during a pandemic


Paper by Kaylin Bolt et al: “The COVID-19 pandemic prompted new interest in non-traditional data sources to inform response efforts and mitigate knowledge gaps. While non-traditional data offers some advantages over traditional data, it also raises concerns related to biases, representativity, informed consent and security vulnerabilities. This study focuses on three specific types of non-traditional data: mobility, social media, and participatory surveillance platform data. Qualitative results are presented on the successes, challenges, and recommendations of key informants who used these non-traditional data sources during the COVID-19 pandemic in Spain and Italy….

Non-traditional data proved valuable in providing rapid results and filling data gaps, especially when traditional data faced delays. Increased data access and innovative collaborative efforts across sectors facilitated its use. Challenges included unreliable access and data quality concerns, particularly the lack of comprehensive demographic and geographic information. To further leverage non-traditional data, participants recommended prioritizing data governance, establishing data brokers, and sustaining multi-institutional collaborations. The value of non-traditional data was perceived as underutilized in public health surveillance, program evaluation and policymaking. Participants saw opportunities to integrate them into public health systems with the necessary investments in data pipelines, infrastructure, and technical capacity…(More)”.

Governing the use of big data and digital twin technology for sustainable tourism


Report by Eko Rahmadian: “The tourism industry is increasingly utilizing big data to gain valuable insights and enhance decision-making processes. The advantages of big data, such as real-time information, robust data processing capabilities, and improved stakeholder decision-making, make it a promising tool for analyzing various aspects of tourism, including sustainability. Moreover, integrating big data with prominent technologies like machine learning, artificial intelligence (AI), and the Internet of Things (IoT) has the potential to revolutionize smart and sustainable tourism.

Despite the potential benefits, the use of big data for sustainable tourism remains limited, and its implementation poses challenges related to governance, data privacy, ethics, stakeholder communication, and regulatory compliance. Addressing these challenges is crucial to ensure the responsible and sustainable use of these technologies. Therefore, strategies must be developed to navigate these issues through a proper governing system.

To bridge the existing gap, this dissertation focuses on the current research on big data for sustainable tourism and strategies for governing its use and implementation in conjunction with emerging technologies. Specifically, this PhD dissertation centers on mobile positioning data (MPD) as a case due to its unique benefits, challenges, and complexity. Also, this project introduces three frameworks, namely: 1) a conceptual framework for digital twins (DT) for smart and sustainable tourism, 2) a documentation framework for architectural decisions (DFAD) to ensure the successful implementation of the DT technology as a governance mechanism, and 3) a big data governance framework for official statistics (BDGF). This dissertation not only presents these frameworks and their benefits but also investigates the issues and challenges related to big data governance while empirically validating the applicability of the proposed frameworks…(More)”.

Community views on the secondary use of general practice data: Findings from a mixed-methods study


Paper by Annette J. Braunack-Mayer et al: “General practice data, particularly when combined with hospital and other health service data through data linkage, are increasingly being used for quality assurance, evaluation, health service planning and research. Using general practice data is particularly important in countries where general practitioners (GPs) are the first and principal source of health care for most people.

Although there is broad public support for the secondary use of health data, there are good reasons to question whether this support extends to general practice settings. GP–patient relationships may be very personal and longstanding and the general practice health record can capture a large amount of information about patients. There is also the potential for multiple angles on patients’ lives: GPs often care for, or at least record information about, more than one generation of a family. These factors combine to amplify patients’ and GPs’ concerns about sharing patient data….

Adams et al. have developed a model of social licence, specifically in the context of sharing administrative data for health research, based on an analysis of the social licence literature and founded on two principal elements: trust and legitimacy. In this model, trust is founded on research enterprises being perceived as reliable and responsive, including in relation to privacy and security of information, and having regard to the community’s interests and well-being.

Transparency and accountability measures may be used to demonstrate trustworthiness and, as a consequence, to generate trust. Transparency involves a level of openness about the way data are handled and used as well as about the nature and outcomes of the research. Adams et al. note that lack of transparency can undermine trust. They also note that the quality of public engagement is important and that simply providing information is not sufficient. While this is one element of transparency, other elements such as accountability and collaboration are also part of the trusting, reflexive relationship necessary to establish and support social licence.

The second principal element, legitimacy, is founded on research enterprises conforming to the legal, cultural and social norms of society and, again, acting in the best interests of the community. In diverse communities with a range of views and interests, it is necessary to develop a broad consensus on what amounts to the common good through deliberative and collaborative processes.

Social licence cannot be assumed. It must be built through public discussion and engagement to avoid undermining the relationship of trust with health care providers and confidence in the confidentiality of health information…(More)”

Outpacing Pandemics: Solving the First and Last Mile Challenges of Data-Driven Policy Making


Article by Stefaan Verhulst, Daniela Paolotti, Ciro Cattuto, and Alessandro Vespignani: “As society continues to emerge from the legacy of COVID-19, a dangerous complacency seems to be setting in. Amidst recurrent surges of cases, each serving as a reminder of the virus’s persistence, there is a noticeable decline in collective urgency to prepare for future pandemics. This situation represents not just a lapse in memory but a significant shortfall in our approach to pandemic preparedness. It dramatically underscores the urgent need to develop novel and sustainable approaches and responses and to reinvent how we approach public health emergencies.

Among the many lessons learned from previous infectious disease outbreaks, the potential and utility of data, and particularly non-traditional forms of data, are surely among the most important. Among other benefits, data has proven useful in providing intelligence and situational awareness in early stages of outbreaks, empowering citizens to protect their health and the health of vulnerable community members, advancing compliance with non-pharmaceutical interventions to mitigate societal impacts, tracking vaccination rates and the availability of treatment, and more. A variety of research now highlights the particular role played by open source data (and other non-traditional forms of data) in these initiatives.

Although multiple data sources are useful at various stages of outbreaks, we focus on two critical stages proven to be especially challenging: what we call the first mile and the last mile.

We argue that focusing on these two stages (or chokepoints) can help pandemic responses and rationalize resources. In particular, we highlight the role of Data Stewards at both stages and in overall pandemic response effectiveness…(More)”.

How to craft fair, transparent data-sharing agreements


Article by Stephanie Kanowitz: “Data collaborations are critical to government decision-making, but actually sharing data can be difficult—not so much the mechanics of the collaboration, but hashing out the rules and policies governing it. A new report offers three resources that will make data sharing more straightforward, foster accountability and build trust among the parties.

“We’ve heard over and over again that one of the biggest barriers to collaboration around data turns out to be data sharing agreements,” said Stefaan Verhulst, co-founder of the Governance Lab at New York University and an author of the November report, “Moving from Idea to Practice.” It’s sometimes a lot to ask stakeholders “to provide access to some of their data,” he said.

To help, Verhulst and other researchers identified three components of successful data-sharing agreements: conducting principled negotiations, establishing the elements of a data-sharing agreement and assessing readiness.

To address the first, the report breaks the components of negotiation into a framework with four tenets: separating people from the problem, focusing on interests rather than positions, identifying options and using objective criteria. From discussions with stakeholders in data sharing agreement workshops that GovLab held through its Open Data Policy Lab, three principles emerged—fairness, transparency and reciprocity…(More)”.

Measuring Global Migration: Towards Better Data for All


Book by Frank Laczko, Elisa Mosler Vidal, Marzia Rango: “This book focuses on how to improve the collection, analysis and responsible use of data on global migration and international mobility. While migration remains a topic of great policy interest for governments around the world, there is a serious lack of reliable, timely, disaggregated and comparable data on it, and often insufficient safeguards to protect migrants’ information. Meanwhile, vast amounts of data about the movement of people are being generated in real time due to new technologies, but these have not yet been fully captured and utilized by migration policymakers, who often do not have enough data to inform their policies and programmes. The lack of migration data has been internationally recognized; the Global Compact for Safe, Orderly and Regular Migration urges all countries to improve data on migration to ensure that policies and programmes are “evidence-based”, but does not spell out how this could be done.

This book examines both the technical issues associated with improving data on migration and the wider political challenges of how countries manage the collection and use of migration data. The first part of the book discusses how much we really know about international migration based on existing data, and key concepts and approaches which are often used to measure migration. The second part of the book examines what measures could be taken to improve migration data, highlighting examples of good practice from around the world in recent years, across a range of different policy areas, such as health, climate change and sustainable development more broadly.

Written by leading experts on international migration data, this book is the perfect guide for students, policymakers and practitioners looking to understand more about the existing evidence base on migration and what can be done to improve it…(More)”. (See also: Big Data For Migration Alliance).

Public Value of Data: B2G data-sharing Within the Data Ecosystem of Helsinki


Paper by Vera Djakonoff: “Datafication penetrates all levels of society. In order to harness public value from an expanding pool of privately produced data, there has been growing interest in facilitating business-to-government (B2G) data-sharing. This research examines the development of B2G data-sharing within the data ecosystem of the City of Helsinki. The research has identified expectations ecosystem actors have for B2G data-sharing and factors that influence the city’s ability to unlock public value from privately produced data.

The research context is smart cities, with a specific focus on the City of Helsinki. Smart cities are in an advantageous position to develop novel public-private collaborations. Helsinki, on the international stage, stands out as a pioneer in the realm of data-driven smart city development. For this research, nine data ecosystem actors representing the city and companies participated in semi-structured thematic interviews through which their perceptions and experiences were mapped.

The theoretical framework of this research draws from the public value management (PVM) approach in examining the smart city data ecosystem and alignment of diverse interests for a shared purpose. Additionally, the research transcends the examination of the interests in isolation and looks at how technological artefacts shape the social context and interests surrounding them. Here, the focus is on the properties of data as an artefact with anti-rival value-generation potential.

The findings of this research reveal that while ecosystem actors recognise that more value can be drawn from data through collaboration, this is not apparent at the level of individual initiatives and transactions. This research shows that the city’s commitment to and facilitation of a long-term shared sense of direction and purpose among ecosystem actors is central to developing B2G data-sharing for public value outcomes. Here, participatory experimentation is key, promoting an understanding of the value of data and rendering visible the diverse motivations and concerns of ecosystem actors, enabling learning for wise, data-driven development…(More)”.

Elon Musk is now taking applications for data to study X — but only EU risk researchers need apply…


Article by Natasha Lomas: “Lawmakers take note: Elon Musk-owned X appears to have quietly complied with a hard legal requirement in the European Union that requires larger platforms (aka VLOPs) to provide researchers with data access in order to study systemic risks arising from use of their services — risks such as disinformation, child safety issues, gender-based violence and mental health concerns.

X (or Twitter as it was still called at the time) was designated a VLOP under the EU’s Digital Services Act (DSA) back in April, after the bloc’s regulators confirmed it met their criteria for an extra layer of rules intended to drive algorithmic accountability by applying transparency measures to larger platforms.

Researchers intending to study systemic risks in the EU now appear to at least be able to apply for access to study X’s data by accessing a web form through a button which appears at the bottom of this page on its developer platform. (Note researchers can be based in the EU but don’t have to be to meet the criteria; they just need to intend to study systemic risks in the EU.)…(More)”.