Unlocking AI for All: The Case for Public Data Banks


Article by Kevin Frazier: “The data relied on by OpenAI, Google, Meta, and other artificial intelligence (AI) developers is not readily available to other AI labs. Google and Meta relied, in part, on data gathered from their own products to train and fine-tune their models. OpenAI used tactics to acquire data that now would not work or may be more likely to be found in violation of the law (whether such tactics violated the law when originally used by OpenAI is being worked out in the courts). Upstart labs as well as research outfits find themselves with a dearth of data. Full realization of the positive benefits of AI, such as being deployed in costly but publicly useful ways (think tutoring kids or identifying common illnesses), as well as complete identification of the negative possibilities of AI (think perpetuating cultural biases) requires that labs other than the big players have access to quality, sufficient data.

The proper response is not to return to an exploitative status quo. Google, for example, may have relied on data from YouTube videos without meaningful consent from users. OpenAI may have hoovered up copyrighted data with little regard for the legal and social ramifications of that approach. In response to these questionable approaches, data has (rightfully) become harder to acquire. Cloudflare has equipped websites with the tools necessary to limit data scraping—the process of extracting data from another computer program. Regulators have developed new legal limits on data scraping or enforced old ones. Data owners have become more defensive over their content and, in some cases, more litigious. All of these largely positive developments from the perspective of data creators (which is to say, anyone and everyone who uses the internet) diminish the odds of newcomers entering the AI space. The creation of a public AI training data bank is necessary to ensure the availability of enough data for upstart labs and public research entities. Such banks would prevent those new entrants from having to go down the costly and legally questionable path of trying to hoover up as much data as possible…(More)”.

Artificial Intelligence as a Catalyzer for Open Government Data Ecosystems: A Typological Theory Approach


Paper by Anthony Simonofski et al: “Artificial Intelligence (AI) within digital government has witnessed growing interest as it can improve governance processes and stimulate citizen engagement. Despite the rise of Generative AI, discussions on the fusion of AI with Open Government Data (OGD) remain limited to specific implementations and scattered across disciplines. Drawing from the synthesis of the literature through a systematic review, this study examines and structures how AI can enrich OGD initiatives. Employing a typological approach, ideal profiles of AI application within the OGD lifecycle are formalized, capturing varied roles across the portal and ecosystems perspectives. The resulting conceptual framework identifies eight ideal types of AI applications for OGD: AI as Portal Curator, Explorer, Linker, and Monitor, and AI as Ecosystem Data Retriever, Connecter, Value Developer and Engager. This theoretical foundation reveals that some types remain under-investigated and will inform policymakers, practitioners, and researchers in leveraging AI to cultivate OGD ecosystems…(More)”.

Community consent: neither a ceiling nor a floor


Article by Jasmine McNealy: “The 23andMe breach and the Golden State Killer case are two of the more “flashy” cases, but questions of consent, especially the consent of all of those affected by biodata collection and analysis in more mundane or routine health and medical research projects, are just as important. The communities of people affected have expectations about their privacy and the possible impacts of inferences that could be made about them in data processing systems. Researchers must, then, acquire community consent when attempting to work with networked biodata.

Several benefits of community consent exist, especially for marginalized and vulnerable populations. These benefits include:

  • Ensuring that information about the research project spreads throughout the community,
  • Removing potential barriers that might be created by resistance from community members,
  • Alleviating the possible concerns of individuals about the perspectives of community leaders, and
  • Allowing the recruitment of participants using methods most salient to the community.

But community consent does not replace individual consent and limits exist for both community and individual consent. Therefore, within the context of a biorepository, understanding whether community consent might be a ceiling or a floor requires examining governance and autonomy…(More)”.

The Role of Open Data in Driving Sectoral Innovation and Global Economic Development


Paper by Olalekan Jamiu Okunleye: “This study assessed the transformative impact of implementing open data principles on fostering innovation across various sectors and enhancing global economic development. Using a comprehensive analysis of secondary data from government portals, industry reports, and global innovation indexes from 2015 to 2019, the research employed panel data regression, correlation analysis, and descriptive statistics to evaluate key relationships. The findings indicate that the availability of open data significantly increases innovation outputs, with robust statistical evidence showing positive correlations between open data sets and sector-specific innovation metrics such as patents filed, R&D expenditure, and the number of startups created. Greater interoperability of open data across international borders contributes to economic growth, particularly through international joint ventures. However, the lack of standardized data formats hampers cross-sector collaboration. Regions with well-established open data policies demonstrate faster technological advancement and economic development than regions without such policies. The study highlighted the critical importance of promoting open data initiatives, standardizing data formats, strengthening data governance frameworks, and investing in digital infrastructure and capacity building to optimize open data utilization and drive sustainable development…(More)”.

The societal impact of Open Science: a scoping review


Report by Nicki Lisa Cole, Eva Kormann, Thomas Klebel, Simon Apartis and Tony Ross-Hellauer: “Open Science (OS) aims, in part, to drive greater societal impact of academic research. Government, funder and institutional policies state that it should further democratize research and increase learning and awareness, evidence-based policy-making, the relevance of research to society’s problems, and public trust in research. Yet, measuring the societal impact of OS has proven challenging and synthesized evidence of it is lacking. This study fills this gap by systematically scoping the existing evidence of societal impact driven by OS and its various aspects, including Citizen Science (CS), Open Access (OA), Open/FAIR Data (OFD), Open Code/Software and others. Using the PRISMA Extension for Scoping Reviews and searches conducted in Web of Science, Scopus and relevant grey literature, we identified 196 studies that contain evidence of societal impact. The majority concern CS, with some focused on OA, and only a few addressing other aspects. Key areas of impact found are education and awareness, climate and environment, and social engagement. We found no literature documenting evidence of the societal impact of OFD and limited evidence of societal impact in terms of policy, health, and trust in academic research. Our findings demonstrate a critical need for additional evidence and suggest practical and policy implications…(More)”.

Preparing Researchers for an Era of Freer Information


Article by Peter W.B. Phillips: “If you Google my name along with “Monsanto,” you will find a series of allegations from 2013 that my scholarly work at the University of Saskatchewan, focused on technological change in the global food system, had been unduly influenced by corporations. The allegations made use of seven freedom of information (FOI) requests. Although leadership at my university determined that my publications were consistent with university policy, the ensuing media attention, I feel, has led some colleagues, students, and partners to distance themselves to avoid being implicated by association.

In the years since, I’ve realized that my experience is not unique. I have communicated with other academics who have experienced similar FOI requests related to genetically modified organisms in the United States, Canada, England, the Netherlands, and Brazil. And my field is not the only one affected: a 2015 Union of Concerned Scientists report documented requests in multiple states and disciplines, from history to climate science to epidemiology, as well as across ideologies. In the University of California system alone, researchers have received open records requests related to research on the health effects of toxic chemicals, the safety of abortions performed by clinicians rather than doctors, and green energy production infrastructure. These requests are made possible by laws that permit anyone, for any reason, to gain access to public agencies’ records.

These open records campaigns, which are conducted by individuals and groups across the political spectrum, arise in part from the confluence of two unrelated phenomena: the changing nature of academic research toward more translational, interdisciplinary, and/or team-based investigations, and the push for more transparency in taxpayer-funded institutions. Neither phenomenon is inherently negative; in fact, both trends offer strong advantages for science and society. But problems arise when scholars are caught between them, affecting the individuals involved and potentially influencing the ongoing conduct of research…(More)”.

We need a social science of data


Article by Cristina Alaimo and Jannis Kallinikos: “The practical and technical knowledge of data science must be complemented by a scientific field that can respond to these challenges and trace their implications for social practice and institutions.

Determining how such a field will look is not the job of two people but, rather, that of a whole scientific and social discourse that we as a society have the obligation to develop and maintain. Students and data users must know the power and subtlety of the artefacts they study and employ.

Such a scientific field should also provide the basis for analysing the social relations and economic dynamics of data generation and use, which are closely associated with several social groups, professions, communities and firms…(More)”.

Effects of Open Access. Literature study on empirical research 2010–2021


Paper by David Hopf, Sarah Dellmann, Christian Hauschke, and Marco Tullney: “Open access — the free availability of scholarly publications — intuitively offers many benefits. At the same time, some academics, university administrators, publishers, and political decision-makers express reservations. Many empirical studies on the effects of open access have been published in the last decade. This report provides an overview of the state of research from 2010 to 2021. The empirical results on the effects of open access help to determine the advantages and disadvantages of open access and serve as a knowledge base for academics, publishers, research funding and research performing institutions, and policy makers. This overview of current findings can inform decisions about open access and publishing strategies. In addition, this report identifies aspects of the impact of open access that are potentially highly relevant but have not yet been sufficiently studied…(More)”.

Japan’s push to make all research open access is taking shape


Article by Dalmeet Singh Chawla: “The Japanese government is pushing ahead with a plan to make Japan’s publicly funded research output free to read. In June, the science ministry will assign funding to universities to build the infrastructure needed to make research papers free to read on a national scale. The move follows the ministry’s announcement in February that, from April 2025, researchers who receive government funding will be required to make their papers freely available to read in institutional repositories.

The Japanese plan “is expected to enhance the long-term traceability of research information, facilitate secondary research and promote collaboration”, says Kazuki Ide, a health-sciences and public-policy scholar at Osaka University in Suita, Japan, who has written about open access in Japan.

The nation is one of the first Asian countries to make notable advances towards making more research open access (OA) and among the first countries in the world to forge a nationwide plan for OA.

The plan follows in the footsteps of the influential Plan S, introduced six years ago by a group of research funders in the United States and Europe known as cOAlition S, to accelerate the move to OA publishing. The United States also implemented an OA mandate in 2022 that requires all research funded by US taxpayers to be freely available from 2026…(More)”.

Towards a pan-EU Freedom of Information Act? Harmonizing Access to Information in the EU through the internal market competence


Paper by Alberto Alemanno and Sébastien Fassiaux: “This paper examines whether, and on what basis, the EU may harmonise the right of access to information across the Union. It does so by examining the available legal bases established by relevant international obligations, such as those stemming from the Council of Europe, and by EU primary law. It demonstrates that neither the Council of Europe (through the European Convention on Human Rights and the more recent Tromsø Convention) nor the EU (through Article 41 of the EU Charter of Fundamental Rights) requires the EU to enact minimum standards of access to information. That Charter provision, combined with Articles 10 and 11 TEU, instead requires only the EU institutions, not the EU Member States, to ensure public access to documents, including legislative texts and meeting minutes. Regulation 1049/2001 was adopted on such a legal basis (originally Art. 255 TEC) and should be revised accordingly. The paper demonstrates that the most promising legal basis enabling the EU to proceed towards the harmonisation of access to information within the EU is offered by Article 114 TFEU. It argues that harmonising the conditions governing access to information across Member States would facilitate cross-border activities and trade, thus enhancing the internal market. Moreover, this would ensure equal access to information for all EU citizens and residents, irrespective of their location within the EU. Therefore, the question is not whether but how the EU may act under Article 114 TFEU to harmonise access to information. Although the EU enjoys wide legislative discretion under Article 114(1) TFEU, this discretion is not absolute. It is subject to limits derived from fundamental rights and principles such as proportionality, equality, and subsidiarity. Hence the need to design a type of harmonisation capable of preserving existing national FOIAs while strengthening the weakest ones.
The only type of harmonisation fit for purpose would therefore be minimal rather than maximal, merely defining the minimum conditions required of each Member State’s national legislation governing access to information…(More)”.