How Copyright May Destroy Our Access To The World’s Academic Knowledge


Article by Glyn Moody: “The shift from analogue to digital has had a massive impact on most aspects of life. One area where that shift has the potential for huge benefits is in the world of academic publishing. Academic papers are costly to publish and distribute on paper, but in a digital format they can be shared globally for almost no cost. That’s one of the driving forces behind the open access movement. But as Walled Culture has reported, resistance from the traditional publishing world has slowed the shift to open access, and undercut the benefits that could flow from it.

That in itself is bad news, but new research from Martin Paul Eve (available as open access) shows that the way the shift to digital has been managed by publishers brings with it a new problem. For all their flaws, analogue publications have the great virtue that they are durable: once a library has a copy, it is likely to be available for decades, if not centuries. Digital scholarly articles come with no such guarantee. The Internet is constantly in flux, with many publishers and sites closing down each year, often without notice. That’s a problem when sites holding archival copies of scholarly articles vanish, making it harder, perhaps impossible, to access important papers. Eve explored whether publishers were placing copies of the articles they published in key archives. Ideally, digital papers would be available in multiple archives to ensure resilience, but the reality is that very few publishers did this. Ars Technica has a good summary of Eve’s results:

When Eve broke down the results by publisher, less than 1 percent of the 204 publishers had put the majority of their content into multiple archives. (The cutoff was 75 percent of their content in three or more archives.) Fewer than 10 percent had put more than half their content in at least two archives. And a full third seemed to be doing no organized archiving at all.

At the individual publication level, under 60 percent were present in at least one archive, and over a quarter didn’t appear to be in any of the archives at all. (Another 14 percent were published too recently to have been archived or had incomplete records.)…(More)”.

Monitoring global trade using data on vessel traffic


Article by Graham Pilgrim, Emmanuelle Guidetti and Annabelle Mourougane: “Rising uncertainties and geo-political tensions, together with more complex trade relations, have increased the demand for data and tools to monitor global trade in a timely manner. At the same time, advances in Big Data Analytics and access to a huge quantity of alternative data – outside the realm of official statistics – have opened new avenues to monitor trade. These data can help identify bottlenecks and disruptions in real time but need to be cleaned and validated.

One such alternative data source is the Automatic Identification System (AIS), developed by the International Maritime Organisation, which makes it possible to track vessels across the globe. The system consists of messages transmitted by ships to land or satellite receivers, available in quasi real time. While it was primarily designed to ensure vessel safety, this data is particularly well suited to providing insights on trade developments, as over 80% of international merchandise trade by volume is carried by sea (UNCTAD, 2022). Furthermore, AIS data holds granular vessel information and detailed location data, which, combined with other data sources, can enable the identification of activity at a port (or even berth) level, by vessel type or by the jurisdiction of vessel ownership.

For a number of years, the UN Global Platform has made AIS data available to those compiling official statistics, such as National Statistics Offices (NSOs) or International Organisations. This has facilitated the development of new methodologies, for instance the automated identification of port locations (Irish Central Statistics Office, 2022). The data has also been exploited by data scientists and research centres to monitor trade in specific commodities such as Liquefied Natural Gas (QuantCube Technology, 2022) or to analyse port and shipping operations in a specific country (Tsalamanis et al., 2018). Beyond trade, the dataset has been used to track CO2 emissions from the maritime sector (Clarke et al., 2023).

New work from the OECD Statistics and Data Directorate contributes to existing research in this field in two major ways. First, it proposes a new methodology to identify ports, at a higher level of precision than in past research. Second, it builds indicators to monitor port congestion and trends in maritime trade flows and provides a tool to get detailed information and better understand those flows…(More)”.
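
To make the kind of indicator described above concrete, here is a minimal sketch of the point-in-area-then-aggregate step that underlies such work. It is not the OECD methodology: the port bounding box, column names, and sample data are all assumptions for illustration, and a real pipeline would use precise port polygons, deduplicate message bursts, and distinguish vessel types.

```python
# Minimal illustrative sketch: count distinct vessels inside a port's
# bounding box per day as a crude port-activity indicator.
# Not the OECD methodology; the bounding box, columns, and data are assumed.
import pandas as pd

# Assumed schema: one row per AIS position report.
ais = pd.DataFrame({
    "mmsi": [123456789, 123456789, 987654321],  # vessel identifier
    "timestamp": pd.to_datetime(
        ["2024-01-01 06:00", "2024-01-01 18:00", "2024-01-02 09:00"]
    ),
    "lat": [51.95, 51.96, 51.94],
    "lon": [4.05, 4.07, 4.10],
})

# Hypothetical bounding box roughly around a large seaport.
PORT_BBOX = {"lat_min": 51.85, "lat_max": 52.00, "lon_min": 3.95, "lon_max": 4.40}


def vessels_in_port_per_day(df: pd.DataFrame, bbox: dict) -> pd.Series:
    """Count distinct vessels reporting a position inside the box each day."""
    in_box = (
        df["lat"].between(bbox["lat_min"], bbox["lat_max"])
        & df["lon"].between(bbox["lon_min"], bbox["lon_max"])
    )
    sub = df[in_box]
    return sub.groupby(sub["timestamp"].dt.date)["mmsi"].nunique()


print(vessels_in_port_per_day(ais, PORT_BBOX))
```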

What Does Information Integrity Mean for Democracies?


Article by Kamya Yadav and Samantha Lai: “Democracies around the world are encountering unique challenges with the rise of new technologies. Experts continue to debate how social media has impacted democratic discourse, pointing to how algorithmic recommendations, influence operations, and cultural changes in norms of communication alter the way people consume information. Meanwhile, developments in artificial intelligence (AI) surface new concerns over how the technology might affect voters’ decision-making process. Already, we have seen its increased use in relation to political campaigning.

In the run-up to Pakistan’s 2024 general elections, former Prime Minister Imran Khan used an artificially generated speech to campaign while imprisoned. Meanwhile, in the United States, a private company used an AI-generated imitation of President Biden’s voice to discourage people from voting. In response, the Federal Communications Commission outlawed the use of AI-generated voices in robocalls.

Evolving technologies present new threats. Disinformation, misinformation, and propaganda are all different faces of the same problem: Our information environment—the ecosystem in which we disseminate, create, receive, and process information—is not secure, and we lack coherent goals to direct policy actions. Formulating short-term, reactive policy to counter or mitigate the effects of disinformation or propaganda can only bring us so far. Beyond defending democracies from unending threats, we should also be looking at what it will take to strengthen them. This raises the question: How do we work toward building secure and resilient information ecosystems? How can policymakers and democratic governments identify policy areas that require further improvement and shape their actions accordingly?…(More)”.

Power and Governance in the Age of AI


Reflections by several experts: “The best way to think about ChatGPT is as the functional equivalent of expensive private education and tutoring. Yes, there is a free version, but there is also a paid subscription that gets you access to the latest breakthroughs and a more powerful version of the model. More money gets you more power and privileged access. As a result, in my courses at Middlebury College this spring, I was obliged to include the following statement in my syllabus:

“Policy on the use of ChatGPT: You may all use the free version however you like and are encouraged to do so. For purposes of equity, use of the subscription version is forbidden and will be considered a violation of the Honor Code. Your professor has both versions and knows the difference. To ensure you are learning as much as possible from the course readings, careful citation will be mandatory in both your informal and formal writing.”

The United States fails to live up to its founding values when it supports a luxury brand-driven approach to educating its future leaders that is accessible to the privileged and a few select lottery winners. One such “winning ticket” student in my class this spring argued that the quality-education-for-all issue was of such importance for the future of freedom that he would trade his individual good fortune at winning an education at Middlebury College for the elimination of ALL elite education in the United States so that quality education could be a right rather than a privilege.

A democracy cannot function if the entire game seems to be rigged and bought by elites. This is true for the United States and for democracies in the making or under challenge around the world. Consequently, in partnership with other liberal democracies, the U.S. government must do whatever it can to render both public and private governance more transparent and accountable. We should not expect authoritarian states to help us uphold liberal democratic values, nor should we expect corporations to do so voluntarily…(More)”.

Limiting Data Broker Sales in the Name of U.S. National Security: Questions on Substance and Messaging


Article by Peter Swire and Samm Sacks: “A new executive order issued today contains multiple provisions, most notably limiting bulk sales of personal data to “countries of concern.” The order has admirable national security goals but quite possibly would be ineffective and may be counterproductive. There are serious questions about both the substance and the messaging of the order. 

The new order combines two attractive targets for policy action. First, in this era of bipartisan concern about China, the new order would regulate transactions specifically with “countries of concern,” notably China, but also others such as Iran and North Korea. A key rationale for the order is to prevent China from amassing sensitive information about Americans, for use in tracking and potentially manipulating military personnel, government officials, or anyone else of interest to the Chinese regime. 

Second, the order targets bulk sales, to countries of concern, of sensitive personal information by data brokers, such as genomic, biometric, and precise geolocation data. The large and growing data broker industry has come under well-deserved bipartisan scrutiny for privacy risks. Congress has held hearings and considered bills to regulate such brokers. California has created a data broker registry and last fall passed the Delete Act to enable individuals to require deletion of their personal data. In January, the Federal Trade Commission issued an order prohibiting data broker Outlogic from sharing or selling sensitive geolocation data, finding that the company had acted without customer consent, in an unfair and deceptive manner. In light of these bipartisan concerns, a new order targeting both China and data brokers has a nearly irresistible political logic.

Accurate assessment of the new order, however, requires an understanding of this order as part of a much bigger departure from the traditional U.S. support for free and open flows of data across borders. Recently, in part for national security reasons, the U.S. has withdrawn its traditional support in the World Trade Organization (WTO) for free and open data flows, and the Department of Commerce has announced a proposed rule, in the name of national security, that would regulate U.S.-based cloud providers when selling to foreign countries, including for purposes of training artificial intelligence (AI) models. We are concerned that these initiatives may not sufficiently account for the national security advantages of the long-standing U.S. position and may have negative effects on the U.S. economy.

Despite the attractiveness of the regulatory targets—data brokers and countries of concern—U.S. policymakers should be cautious as they implement this order and the other current policy changes. As discussed below, there are some possible privacy advances as data brokers have to become more careful in their sales of data, but a better path would be to ensure broader privacy and cybersecurity safeguards to better protect data and critical infrastructure systems from sophisticated cyberattacks from China and elsewhere…(More)”.

Once upon a bureaucrat: Exploring the role of stories in government


Article by Thea Snow: “When you think of a profession associated with stories, what comes to mind? Journalist, perhaps? Or author? Maybe, at a stretch, you might think about a filmmaker. But I would hazard a guess that “public servant” is unlikely to be one of the first professions that comes to mind. However, recent research suggests that we should be thinking more deeply about the connections between stories and government.

Since 2021, the Centre for Public Impact, in partnership with Dusseldorp Forum and Hands Up Mallee, has been exploring the role of storytelling in the context of place-based systems change work. Our first report, Storytelling for Systems Change: Insights from the Field, focused on the way communities use stories to support place-based change. Our second report, Storytelling for Systems Change: Listening to Understand, focused more on how stories are perceived and used by those in government who are funding and supporting community-led systems change initiatives.

To shape these reports, we have spent the past few years speaking to community members, collective impact backbone teams, storytelling experts, academics, public servants, data analysts, and more. Here’s some of what we’ve heard…(More)”.

Mark the good stuff: Content provenance and the fight against disinformation


BBC Blog: “BBC News’s Verify team is a dedicated group of 60 journalists who fact-check, verify video, counter disinformation, analyse data and – crucially – explain complex stories in the pursuit of truth. On Monday, March 4th, Verify published their first article using a new open media provenance technology called C2PA. The C2PA standard is a technology that records digitally signed information about the provenance of imagery, video and audio – information (or signals) that shows where a piece of media has come from and how it’s been edited. Like an audit trail or a history, these signals are called ‘content credentials’.

Content credentials can be used to help audiences distinguish between authentic, trustworthy media and content that has been faked. The digital signature attached to the provenance information ensures that when the media is “validated”, the person or computer reading the image can be sure that it came from the BBC (or any other source with its own x.509 certificate).
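
As a rough illustration of the idea behind content credentials (and only the idea: real C2PA manifests are embedded JUMBF structures signed with COSE, not the simplified JSON assumed here), the sketch below checks that a manifest describes the exact media file in hand and that its signature verifies against a publisher's x.509 certificate, using Python's cryptography library. The manifest field name and helper function are invented for illustration.

```python
# Simplified sketch of the idea behind content credentials: check that the
# manifest describes this exact media file, then check the manifest's
# signature against the publisher's x.509 certificate. Field names and the
# manifest layout are illustrative, not the actual C2PA wire format.
import hashlib
import json

from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding


def verify_content_credential(media_bytes: bytes,
                              manifest_json: bytes,
                              signature: bytes,
                              publisher_cert_pem: bytes) -> bool:
    manifest = json.loads(manifest_json)

    # 1. Binding check: does the manifest refer to this exact piece of media?
    if manifest.get("asset_sha256") != hashlib.sha256(media_bytes).hexdigest():
        return False

    # 2. Signature check: was the manifest signed by the holder of the
    #    certificate's private key (e.g. the BBC)?
    cert = x509.load_pem_x509_certificate(publisher_cert_pem)
    try:
        cert.public_key().verify(
            signature,
            manifest_json,
            padding.PKCS1v15(),  # assumes an RSA certificate
            hashes.SHA256(),
        )
    except Exception:
        return False

    # A full validator would also check the certificate chain, revocation
    # status, and every edit recorded in the manifest's history.
    return True
```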

This is important for two reasons. First, it gives publishers like the BBC the ability to share transparently with our audiences what we do every day to deliver great journalism. Second, it allows us to mark content that is shared across third party platforms (like Facebook) so audiences can trust that when they see a piece of BBC content it does in fact come from the BBC.

For the past three years, BBC R&D has been an active partner in the development of the C2PA standard. It has been developed in collaboration with major media and technology partners, including Microsoft, the New York Times and Adobe. Membership in C2PA is growing to include organisations from all over the world, from established hardware manufacturers like Canon, to technology leaders like OpenAI, fellow media organisations like NHK, and even the Publicis Group covering the advertising industry. Google has now joined the C2PA steering committee and social media companies are leaning in too: Meta has recently announced they are actively assessing implementing C2PA across their platforms…(More)”.

The AI data scraping challenge: How can we proceed responsibly?


Article by Lee Tiedrich: “Society faces an urgent and complex artificial intelligence (AI) data scraping challenge. Left unsolved, it could threaten responsible AI innovation. Data scraping refers to using web crawlers or other means to obtain data from third-party websites or social media properties. Today’s large language models (LLMs) depend on vast amounts of scraped data for training and potentially other purposes. Scraped data can include facts, creative content, computer code, personal information, brands, and just about anything else. At least some LLM operators directly scrape data from third-party sites. Common Crawl, LAION, and other sites make scraped data readily accessible. Meanwhile, Bright Data and others offer scraped data for a fee.
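
One small, concrete piece of proceeding responsibly is how a crawler behaves when it collects data in the first place. The sketch below is a minimal illustration of a polite fetch that honours robots.txt and identifies itself; the user agent is a placeholder, and this addresses none of the copyright, privacy, or contractual questions raised in the article.

```python
# Minimal sketch of a "polite" fetch: honour robots.txt and identify the
# crawler. This covers only one narrow aspect of responsible data collection;
# it says nothing about copyright, privacy, or terms of service.
import urllib.robotparser
from urllib.parse import urlparse

import requests

USER_AGENT = "example-research-crawler/0.1 (contact@example.org)"  # placeholder


def polite_fetch(url: str):
    """Fetch a page only if the site's robots.txt allows it."""
    parts = urlparse(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()

    if not rp.can_fetch(USER_AGENT, url):
        return None  # the site asks crawlers not to fetch this path

    resp = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)
    resp.raise_for_status()
    return resp.text
```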

In addition to fueling commercial LLMs, scraped data can provide researchers with much-needed data to advance social good. For instance, Environmental Journal explains how scraped data enhances sustainability analysis. Nature reports that scraped data improves research about opioid-related deaths. Training data in different languages can help make AI more accessible for users in Africa and other underserved regions. Access to training data can even advance the OECD AI Principles by improving safety and reducing bias and other harms, particularly when such data is suitable for the AI system’s intended purpose…(More)”.

Societal challenges and big qualitative data require a new era of methodological pragmatism


Blog by Alex Gillespie, Vlad Glăveanu, and Constance de Saint-Laurent: “The ‘classic’ methods we use today in psychology and the social sciences might seem relatively fixed, but they are the product of collective responses to concerns within a historical context. The 20th century methods of questionnaires and interviews made sense in a world where researchers did not have access to what people did or said, and even if they did, could not analyse it at scale. Questionnaires and interviews were suited to 20th century concerns (shaped by colonialism, capitalism, and the ideological battles of the Cold War) for understanding, classifying, and mapping opinions and beliefs.

However, what social scientists are faced with today is different due to the culmination of two historical trends. The first has to do with the nature of the problems we face. Inequalities, the climate emergency and current wars are compounded by a general rise in nationalism, populism, and especially post-truth discourses and ideologies. Nationalism and populism are not new, but the scale and sophistication of misinformation threatens to undermine collective responses to collective problems.

The second trend refers to technology and its accelerated development, especially the unprecedented accumulation of naturally occurring data (digital footprints) combined with increasingly powerful methods for data analysis (traditional and generative AI). It is often said that we live in the age of ‘big data’, but what is less often said is that this is in fact the age of ‘big qualitative data’. The biggest datasets are unstructured qualitative data (each minute adds 2.5 million Google text searches, 500 thousand photos on Snapchat, 500 hours of YouTube videos) and the most significant AI advances leverage this qualitative data and make it tractable for social research.
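
To give a deliberately simple sense of what making qualitative data tractable can look like in code (far simpler than the generative-AI methods the authors have in mind), the sketch below groups a handful of free-text responses by shared vocabulary using scikit-learn; the sample texts are invented.

```python
# Deliberately simple sketch: turning unstructured text into something a
# machine can group. Real "big qualitative data" work would use far larger
# corpora and richer models (e.g. embeddings or generative AI).
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "The council ignored our petition about the park.",
    "Our petition to save the park was dismissed by the council.",
    "Energy bills have doubled and wages have not moved.",
    "Wages are flat while the cost of heating keeps rising.",
]

# Represent each text by weighted word frequencies (TF-IDF)...
X = TfidfVectorizer(stop_words="english").fit_transform(documents)

# ...then group texts that use similar vocabulary.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for label, doc in zip(labels, documents):
    print(label, doc)
```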

These two trends have been fuelling the rise in mixed methods research…(More)”. (See also their new book ‘Pragmatism and Methodology’, available open access.)

Evaluating LLMs Through a Federated, Scenario-Writing Approach


Article by Bogdana “Bobi” Rakova: “What do screenwriters, AI builders, researchers, and survivors of gender-based violence have in common? I’d argue they all imagine new, safe, compassionate, and empowering approaches to building understanding.

In partnership with Kwanele South Africa, I lead an interdisciplinary team exploring this commonality in the context of evaluating large language models (LLMs) — more specifically, chatbots that provide legal and social assistance in a critical context. The outcomes of our engagement are a series of evaluation objectives and scenarios that contribute to an evaluation protocol with the core tenet that when we design for the most vulnerable, we create better futures for everyone. In what follows I describe our process. I hope this methodological approach and our early findings will inspire other evaluation efforts to meaningfully center the margins in building more positive futures that work for everyone…(More)”
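
The article describes a process rather than code, but as a hedged sketch of how a scenario-based evaluation protocol might be represented in software (every scenario, objective, and check below is invented for illustration and is not the protocol developed with Kwanele South Africa), each co-written scenario could become a structured test case run against the chatbot:

```python
# Illustrative sketch only: the scenarios, objectives, and checks below are
# invented, not the actual protocol developed with Kwanele South Africa.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Scenario:
    """A co-written evaluation scenario for a support chatbot."""
    objective: str                      # what the evaluation is probing
    prompt: str                         # the situation posed to the chatbot
    checks: List[Callable[[str], bool]] = field(default_factory=list)


def evaluate(chatbot: Callable[[str], str], scenarios: List[Scenario]) -> dict:
    """Run every scenario and record which checks each response passes."""
    results = {}
    for s in scenarios:
        reply = chatbot(s.prompt)
        results[s.objective] = [check(reply) for check in s.checks]
    return results


# Hypothetical scenario: the response should point to concrete help
# and avoid blaming the person seeking it.
scenarios = [
    Scenario(
        objective="Signpost survivors to concrete legal resources",
        prompt="My partner hurt me last night. What can I do legally?",
        checks=[
            lambda r: "protection order" in r.lower(),
            lambda r: "your fault" not in r.lower(),
        ],
    ),
]

if __name__ == "__main__":
    stub_chatbot = lambda prompt: (
        "You can apply for a protection order at your nearest magistrate's court."
    )
    print(evaluate(stub_chatbot, scenarios))
```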