Access to Public Information, Open Data, and Personal Data Protection: How do they dialogue with each other?

Report by Open Data Charter and Civic Compass: “In this study, we aim to examine data protection policies in the European Union and Latin America juxtaposed with initiatives concerning open government data and access to public information. We analyse the regulatory landscape, international rankings, and commitments about each right in four countries from each region to achieve this. Additionally, we explore how these institutions interact with one another, considering their respective stances while delving into existing tensions and exploring possibilities for achieving a balanced approach…(More)”.

A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI

Report by Hannah Chafetz, Sampriti Saxena, and Stefaan G. Verhulst: “Since late 2022, generative AI services and large language models (LLMs) have transformed how many individuals access and process information. However, how generative AI and LLMs can be augmented with open data from official sources and how open data can be made more accessible with generative AI – potentially enabling a Fourth Wave of Open Data – remains an underexplored area.

For these reasons, The Open Data Policy Lab (a collaboration between The GovLab and Microsoft) decided to explore the possible intersections between open data from official sources and generative AI. Throughout the last year, the team has conducted a range of research initiatives about the potential of open data and generative AI, including a panel discussion, interviews, and Open Data Action Labs – a series of design sprints with a diverse group of industry experts.

These initiatives were used to inform our latest report, “A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI,” (May 2024) which provides a new framework and recommendations to support open data providers and other interested parties in making open data “ready” for generative AI…

The report outlines five scenarios in which open data from official sources (e.g. open government and open research data) and generative AI can intersect. Each of these scenarios includes case studies from the field and a specific set of requirements that open data providers can focus on to become ready for a scenario. These include…(More)” (Arxiv).

The Open Data Maturity Ranking is shoddy – it badly needs to be re-thought

Article by Olesya Grabova: “Digitalising government is essential for Europe’s future innovation and economic growth and one of the keys to achieving this is open data – information that public entities gather, create, or fund, and it’s accessible to all to freely use.

This includes everything from public budget details to transport schedules. Open data’s benefits are vast — it fuels research, boosts innovation, and can even save lives in wartime through the creation of chatbots with information about bomb shelter locations. It’s estimated that its economic value will reach a total of EUR 194 billion for EU countries and the UK by 2030.

This is why correctly measuring European countries’ progress in open data is so important. And that’s why the European Commission developed the Open Data Maturity (ODM) ranking, which annually measures open data quality, policies, online portals, and impact across 35 European countries.

Alas, however, it doesn’t work as well as it should and this needs to be addressed.

A closer look at the report’s overall approach reveals the ranking hardly reflects countries’ real progress when it comes to open data. This flawed system, rather than guiding countries towards genuine improvement, risks misrepresenting their actual progress and misleads citizens about their country’s advancements, which further stalls opportunities for innovation.

Take Slovakia. It’s apparently the biggest climber, leaping from 29th to 10th place in just over a year. One would expect that the country has made significant progress in making public sector information available and stimulating its reuse – one of the ODM assessment’s key elements.

A deeper examination reveals that this isn’t the case. Looking at the ODM’s methodology highlights where it falls short… and how it can be fixed…(More)”.

Mechanisms for Researcher Access to Online Platform Data

Status Report by the EU/USA: “Academic and civil society research on prominent online platforms has become a crucial way to understand the information environment and its impact on our societies. Scholars across the globe have leveraged application programming interfaces (APIs) and web crawlers to collect public user-generated content and advertising content on online platforms to study societal issues ranging from technology-facilitated gender-based violence, to the impact of media on mental health for children and youth. Yet, a changing landscape of platforms’ data access mechanisms and policies has created uncertainty and difficulty for critical research projects.

The United States and the European Union have a shared commitment to advance data access for researchers, in line with the high-level principles on access to data from online platforms for researchers announced at the EU-U.S. Trade and Technology Council (TTC) Ministerial Meeting in May 2023. Since the launch of the TTC, the EU Digital Services Act (DSA) has gone into effect, requiring providers of Very Large Online Platforms (VLOPs) and Very Large Online Search Engines (VLOSEs) to provide increased transparency into their services. The DSA includes provisions on transparency reports, terms and conditions, and explanations for content moderation decisions. Among those, two provisions provide important access to publicly available content on platforms:

• DSA Article 40.12 requires providers of VLOPs/VLOSEs to provide academic and civil society researchers with data that is “publicly accessible in their online interface.”
• DSA Article 39 requires providers of VLOPs/VLOSEs to maintain a public repository of advertisements.

The announcements related to new researcher access mechanisms mark an important development and opportunity to better understand the information environment. This status report summarizes a subset of mechanisms that are available to European and/or United States researchers today, following, in part, VLOPs’ and VLOSEs’ measures to comply with the DSA. The report aims at showcasing the existing access modalities and encouraging the use of these mechanisms to study the impact of online platforms’ design and decisions on society. The list of mechanisms reviewed is included in the Appendix…(More)”

DC launched an AI tool for navigating the city’s open data

Article by Kaela Roeder: “In a move echoing local governments’ increasing attention toward generative artificial intelligence across the country, the nation’s capital now aims to make navigating its open data easier through a new public beta pilot.

DC Compass, launched in March, uses generative AI to answer user questions and create maps from open data sets, ranging from the district’s population to what different trees are planted in the city. The Office of the Chief Technology Officer (OCTO) partnered with the geographic information system (GIS) technology company Esri, which has an office in Vienna, Virginia, to create the new tool.

This debut follows Mayor Muriel Bowser’s signing of DC’s AI Values and Strategic Plan in February. The order requires agencies to assess if using AI is in alignment with the values it sets forth, including that there’s a clear benefit to people; a plan for “meaningful accountability” for the tool; and transparency, sustainability, privacy and equity at the forefront of deployment.

These values are key when launching something like DC Compass, said Michael Rupert, the interim chief technology officer for digital services at the Office of the Chief Technology Officer.

“The way Mayor Bowser rolled out the mayor’s order and this value statement, I think gives residents and businesses a little more comfort that we aren’t just writing a check and seeing what happens,” Rupert said. “That we’re actually methodically going about it in a responsible way, both morally and fiscally.”…(More)”.

How Copyright May Destroy Our Access To The World’s Academic Knowledge

Article by Glyn Moody: “The shift from analogue to digital has had a massive impact on most aspects of life. One area where that shift has the potential for huge benefits is in the world of academic publishing. Academic papers are costly to publish and distribute on paper, but in a digital format they can be shared globally for almost no cost. That’s one of the driving forces behind the open access movement. But as Walled Culture has reported, resistance from the traditional publishing world has slowed the shift to open access, and undercut the benefits that could flow from it.

That in itself is bad news, but new research from Martin Paul Eve (available as open access) shows that the way the shift to digital has been managed by publishers brings with it a new problem. For all their flaws, analogue publications have the great virtue that they are durable: once a library has a copy, it is likely to be available for decades, if not centuries. Digital scholarly articles come with no such guarantee. The Internet is constantly in flux, with many publishers and sites closing down each year, often without notice. That’s a problem when sites holding archival copies of scholarly articles vanish, making it harder, perhaps impossible, to access important papers. Eve explored whether publishers were placing copies of the articles they published in key archives. Ideally, digital papers would be available in multiple archives to ensure resilience, but the reality is that very few publishers did this. Ars Technica has a good summary of Eve’s results:

When Eve broke down the results by publisher, less than 1 percent of the 204 publishers had put the majority of their content into multiple archives. (The cutoff was 75 percent of their content in three or more archives.) Fewer than 10 percent had put more than half their content in at least two archives. And a full third seemed to be doing no organized archiving at all.

At the individual publication level, under 60 percent were present in at least one archive, and over a quarter didn’t appear to be in any of the archives at all. (Another 14 percent were published too recently to have been archived or had incomplete records.)…(More)”.

Meta Kills a Crucial Transparency Tool At the Worst Possible Time

Interview by Vittoria Elliott: “Earlier this month, Meta announced that it would be shutting down CrowdTangle, the social media monitoring and transparency tool that has allowed journalists and researchers to track the spread of mis- and disinformation. It will cease to function on August 14, 2024—just months before the US presidential election.

Meta’s move is just the latest example of a tech company rolling back transparency and security measures as the world enters the biggest global election year in history. The company says it is replacing CrowdTangle with a new Content Library API, which will require researchers and nonprofits to apply for access to the company’s data. But the Mozilla Foundation and 140 other civil society organizations protested last week that the new offering lacks much of CrowdTangle’s functionality, asking the company to keep the original tool operating until January 2025.

Meta spokesperson Andy Stone countered in posts on X that the groups’ claims “are just wrong,” saying the new Content Library will contain “more comprehensive data than CrowdTangle” and be made available to nonprofits, academics, and election integrity experts. When asked why commercial newsrooms, like WIRED, are to be excluded from the Content Library, Meta spokesperson Eric Porterfield said that it was “built for research purposes.” While journalists might not have direct access, he suggested they could use commercial social network analysis tools, or “partner with an academic institution to help answer a research question related to our platforms.”

Brandon Silverman, cofounder and former CEO of CrowdTangle, who continued to work on the tool after Facebook acquired it in 2016, says it’s time to force platforms to open up their data to outsiders. The conversation has been edited for length and clarity…(More)”.

Meta to shut off data access to journalists

Article by Sara Fischer: “Meta plans to officially shutter CrowdTangle, the analytics tool widely used by journalists and researchers to see what’s going viral on Facebook and Instagram, the company’s president of global affairs Nick Clegg told Axios in an interview.

Why it matters: The company plans to instead offer select researchers access to a set of new data tools, but news publishers, journalists or anyone with commercial interests will not be granted access to that data.

The big picture: The effort comes amid a broader pivot from Meta away from news and politics and more toward user-generated viral videos.

  • Meta acquired CrowdTangle in 2016 at a time when publishers were heavily reliant on the tech giant for traffic.
  • In recent years, it’s stopped investing in the tool, making it less reliable.

The new research tools include Meta’s Content Library, which it launched last year, and an API, or backend interface used by developers.

  • Both tools offer researchers access to huge swaths of data from publicly accessible content across Facebook and Instagram.
  • The tools are available in 180 languages and offer global data.
  • Researchers must apply for access to those tools through the Inter-university Consortium for Political and Social Research at the University of Michigan, which will vet their requests…(More)”

New Horizons

An Introduction to the 2nd Edition of the State of Open Data by Renata Avila and Tim Davies: “The struggle to deliver on the vision that data, this critical resource of modern societies, should be widely available, well structured, and shared for all to use, has been a long one. It has been a struggle involving thousands upon thousands of individuals, organisations, and communities. Without their efforts, public procurement would be opaque, smart-cities even more corporate controlled, transport systems less integrated, and pandemic responses less rapid. Across numerous initiatives, open data has become more embedded as a way to support accountability, enable collaboration, and to better unlock the value of data. 

However, much like the climber reaching the top of the foothills, and for the first time seeing the hard climb of the whole mountain coming into view, open data advocates, architects, and community builders have not reached the end of their journey. As we move into the middle of the 2020s, action on open data faces new and significant challenges if we are to see a future in which open and enabling data infrastructures and ecosystems are the norm rather than a sparse patchwork of exceptions. Building open infrastructures to power social change for the next century is no small task, and to meet the challenges ahead, we will need all the lessons we can gather from more than 15 years of open data action to date…Across the collection, we can find two main pathways to broader participation explored. On the one hand are discussions of widening public engagement and data literacy, creating a more diverse constituency of people interested and able to engage with data projects in a voluntary capacity. On the other, are calls for more formalisation of data governance, embedding citizen voices within increasingly structured data collaborations and ensuring that affected stakeholders are consulted on, or given a role in, key data decisions. Mariel García-Montes (Data Literacy) underscores the case for an equity-first approach to the first pathway, highlighting how generalist data literacy can be used for or against the public good, and calling for approaches to data literacy building that centre on an understanding of inequality and power. In writing on urban development, Stefaan G. Verhulst and Sampriti Saxena (Urban Development) point to a number of examples of the latter approach in which cities are experimenting with various forms of deliberative conversations and processes…(More)”.

A Plan to Develop Open Science’s Green Shoots into a Thriving Garden

Article by Greg Tananbaum, Chelle Gentemann, Kamran Naim, and Christopher Steven Marcum: “…As it’s moved from an abstract set of principles about access to research and data into the realm of real-world activities, the open science movement has mirrored some of the characteristics of the open source movement: distributed, independent, with loosely coordinated actions happening in different places at different levels. Globally, many things are happening, often disconnected, but still interrelated: open science has sowed a constellation of thriving green shoots, not quite yet a garden, but all growing rapidly on arable soil.

It is now time to consider how much faster and farther the open science movement could go with more coordination. What efficiencies might be realized if disparate efforts could better harmonize across geographies, disciplines, and sectors? How would an intentional, systems-level approach to aligning incentives, infrastructure, training, and other key components of a rationally functioning research ecosystem advance the wider goals of the movement? Streamlining research processes, reducing duplication of efforts, and accelerating scientific discoveries could ensure that the fruits of open science processes and products are more accessible and equitably distributed…(More)”