A shared destiny for public sector data


Blog post by Shona Nicol: “As a data professional, it can sometimes feel hard to get others interested in data. Perhaps like many in this profession, I can often express the importance and value of data for good in an overly technical way. However, when our biggest challenges in Scotland include eradicating child poverty, growing the economy and tackling the climate emergency, I would argue that we should all take an interest in data because it’s going to be foundational in helping us solve these problems.

Data is already intrinsic to shaping our society and how services are delivered. And public sector data is a vital component in making sure that services for the people of Scotland are being delivered efficiently and effectively. Despite an ever-growing awareness of the transformative power of data to improve the design and delivery of services, feedback from public sector staff shows that they can face difficulties when trying to influence colleagues and senior leaders around the need to invest in data.

A vision gap

In the Scottish Government’s data maturity programme and more widely, we regularly hear about the challenges data professionals encounter when trying to enact change. This community tell us that a long-term vision for public sector data for Scotland could help them by providing the context for what they are trying to achieve locally.

Earlier this year we started to scope how we might do this. We recognised that organisations are already working to deliver local and national strategies and policies that relate to data, so any vision had to be able to sit alongside those, be meaningful in different settings, agnostic of technology and relevant to any public sector organisation. We wanted to offer opportunities for alignment, not enforce an instruction manual…(More)”.

Unlocking AI for All: The Case for Public Data Banks


Article by Kevin Frazier: “The data relied on by OpenAI, Google, Meta, and other artificial intelligence (AI) developers is not readily available to other AI labs. Google and Meta relied, in part, on data gathered from their own products to train and fine-tune their models. OpenAI used tactics to acquire data that now would not work or may be more likely to be found in violation of the law (whether such tactics violated the law when originally used by OpenAI is being worked out in the courts). Upstart labs as well as research outfits find themselves with a dearth of data. Full realization of the positive benefits of AI, such as being deployed in costly but publicly useful ways (think tutoring kids or identifying common illnesses), as well as complete identification of the negative possibilities of AI (think perpetuating cultural biases) requires that labs other than the big players have access to quality, sufficient data.

The proper response is not to return to an exploitative status quo. Google, for example, may have relied on data from YouTube videos without meaningful consent from users. OpenAI may have hoovered up copyrighted data with little regard for the legal and social ramifications of that approach. In response to these questionable approaches, data has (rightfully) become harder to acquire. Cloudflare has equipped websites with the tools necessary to limit data scraping—the process of extracting data from another computer program. Regulators have developed new legal limits on data scraping or enforced old ones. Data owners have become more defensive over their content and, in some cases, more litigious. All of these largely positive developments from the perspective of data creators (which is to say, anyone and everyone who uses the internet) diminish the odds of newcomers entering the AI space. The creation of a public AI training data bank is necessary to ensure the availability of enough data for upstart labs and public research entities. Such banks would prevent those new entrants from having to go down the costly and legally questionable path of trying to hoover up as much data as possible…(More)”.

Artificial Intelligence as a Catalyzer for Open Government Data Ecosystems: A Typological Theory Approach


Paper by Anthony Simonofski et al: “Artificial Intelligence (AI) within digital government has witnessed growing interest as it can improve governance processes and stimulate citizen engagement. Despite the rise of Generative AI, discussions on AI fusion with Open Government Data (OGD) remain limited to specific implementations and scattered across disciplines. Drawing from the synthesis of the literature through a systematic review, this study examines and structures how AI can enrich OGD initiatives. Employing a typological approach, ideal profiles of AI application within the OGD lifecycle are formalized, capturing varied roles across the portal and ecosystems perspectives. The resulting conceptual framework identifies eight ideal types of AI applications for OGD: AI as Portal Curator, Explorer, Linker, and Monitor, and AI as Ecosystem Data Retriever, Connecter, Value Developer and Engager. This theoretical foundation shows the under-investigation of some types and will inform policymakers, practitioners, and researchers in leveraging AI to cultivate OGD ecosystems…(More)”.

Community consent: neither a ceiling nor a floor


Article by Jasmine McNealy: “The 23andMe breach and the Golden State Killer case are two of the more “flashy” cases, but questions of consent, especially the consent of all of those affected by biodata collection and analysis in more mundane or routine health and medical research projects, are just as important. The communities of people affected have expectations about their privacy and the possible impacts of inferences that could be made about them in data processing systems. Researchers must, then, acquire community consent when attempting to work with networked biodata. 

Several benefits of community consent exist, especially for marginalized and vulnerable populations. These benefits include:

  • Ensuring that information about the research project spreads throughout the community,
  • Removing potential barriers that might be created by resistance from community members,
  • Alleviating the possible concerns of individuals about the perspectives of community leaders, and 
  • Allowing the recruitment of participants using methods most salient to the community.

But community consent does not replace individual consent and limits exist for both community and individual consent. Therefore, within the context of a biorepository, understanding whether community consent might be a ceiling or a floor requires examining governance and autonomy…(More)”.

The Role of Open Data in Driving Sectoral Innovation and Global Economic Development


Paper by Olalekan Jamiu Okunleye: “This study assessed the transformative impact of implementing open data principles on fostering innovation across various sectors and enhancing global economic development. Using a comprehensive analysis of secondary data from government portals, industry reports, and global innovation indexes between 2015 and 2019, the research employed panel data regression, correlation analysis, and descriptive statistics to evaluate key relationships. The findings indicate that the availability of open data significantly increases innovation outputs, with robust statistical evidence showing positive correlations between open data sets and sector-specific innovation metrics such as patents filed, R&D expenditure, and the number of startups created. Greater interoperability of open data across international borders contributes to economic growth, particularly through international joint ventures. However, the lack of standardized data formats hampers cross-sector collaboration. Regions with well-established open data policies demonstrate faster technological advancements and economic development compared to regions without such policies. The study highlighted the critical importance of promoting open data initiatives, standardizing data formats, strengthening data governance frameworks, and investing in digital infrastructure and capacity building to optimize open data utilization and drive sustainable development…(More)”.

The societal impact of Open Science: a scoping review


Report by Nicki Lisa Cole, Eva Kormann, Thomas Klebel, Simon Apartis and Tony Ross-Hellauer: “Open Science (OS) aims, in part, to drive greater societal impact of academic research. Government, funder and institutional policies state that it should further democratize research and increase learning and awareness, evidence-based policy-making, the relevance of research to society’s problems, and public trust in research. Yet, measuring the societal impact of OS has proven challenging and synthesized evidence of it is lacking. This study fills this gap by systematically scoping the existing evidence of societal impact driven by OS and its various aspects, including Citizen Science (CS), Open Access (OA), Open/FAIR Data (OFD), Open Code/Software and others. Using the PRISMA Extension for Scoping Reviews and searches conducted in Web of Science, Scopus and relevant grey literature, we identified 196 studies that contain evidence of societal impact. The majority concern CS, with some focused on OA, and only a few addressing other aspects. Key areas of impact found are education and awareness, climate and environment, and social engagement. We found no literature documenting evidence of the societal impact of OFD and limited evidence of societal impact in terms of policy, health, and trust in academic research. Our findings demonstrate a critical need for additional evidence and suggest practical and policy implications…(More)”.

Preparing Researchers for an Era of Freer Information


Article by Peter W.B. Phillips: “If you Google my name along with “Monsanto,” you will find a series of allegations from 2013 that my scholarly work at the University of Saskatchewan, focused on technological change in the global food system, had been unduly influenced by corporations. The allegations made use of seven freedom of information (FOI) requests. Although leadership at my university determined that my publications were consistent with university policy, the ensuing media attention, I feel, has led some colleagues, students, and partners to distance themselves to avoid being implicated by association.

In the years since, I’ve realized that my experience is not unique. I have communicated with other academics who have experienced similar FOI requests related to genetically modified organisms in the United States, Canada, England, Netherlands, and Brazil. And my field is not the only one affected: a 2015 Union of Concerned Scientists report documented requests in multiple states and disciplines—from history to climate science to epidemiology—as well as across ideologies. In the University of California system alone, researchers have received open records requests related to research on the health effects of toxic chemicals, the safety of abortions performed by clinicians rather than doctors, and the green energy production infrastructure. These requests are made possible by laws that permit anyone, for any reason, to gain access to public agencies’ records.

These open records campaigns, which are conducted by individuals and groups across the political spectrum, arise in part from the confluence of two unrelated phenomena: the changing nature of academic research toward more translational, interdisciplinary, and/or team-based investigations and the push for more transparency in taxpayer-funded institutions. Neither phenomenon is inherently negative; in fact, there are strong advantages for science and society in both trends. But problems arise when scholars are caught between them—affecting the individuals involved and potentially influencing the ongoing conduct of research…(More)”.

We need a social science of data


Article by Cristina Alaimo and Jannis Kallinikos: “The practical and technical knowledge of data science must be complemented by a scientific field that can respond to these challenges and trace their implications for social practice and institutions.

Determining how such a field will look is not the job of two people but, rather, that of a whole scientific and social discourse that we as a society have the obligation to develop and maintain. Students and data users must know the power and subtlety of the artefacts they study and employ.

Such a scientific field should also provide the basis for analysing the social relations and economic dynamics of data generation and use, which are closely associated with several social groups, professions, communities and firms…(More)”.

Effects of Open Access. Literature study on empirical research 2010–2021


Paper by David Hopf, Sarah Dellmann, Christian Hauschke, and Marco Tullney: “Open access — the free availability of scholarly publications — intuitively offers many benefits. At the same time, some academics, university administrators, publishers, and political decision-makers express reservations. Many empirical studies on the effects of open access have been published in the last decade. This report provides an overview of the state of research from 2010 to 2021. The empirical results on the effects of open access help to determine the advantages and disadvantages of open access and serve as a knowledge base for academics, publishers, research funding and research performing institutions, and policy makers. This overview of current findings can inform decisions about open access and publishing strategies. In addition, this report identifies aspects of the impact of open access that are potentially highly relevant but have not yet been sufficiently studied…(More)”.

Japan’s push to make all research open access is taking shape


Article by Dalmeet Singh Chawla: “The Japanese government is pushing ahead with a plan to make Japan’s publicly funded research output free to read. In June, the science ministry will assign funding to universities to build the infrastructure needed to make research papers free to read on a national scale. The move follows the ministry’s announcement in February that researchers who receive government funding will be required to make their papers freely available to read on the institutional repositories from April 2025.

The Japanese plan “is expected to enhance the long-term traceability of research information, facilitate secondary research and promote collaboration”, says Kazuki Ide, a health-sciences and public-policy scholar at Osaka University in Suita, Japan, who has written about open access in Japan.

The nation is one of the first Asian countries to make notable advances towards making more research open access (OA) and among the first countries in the world to forge a nationwide plan for OA.

The plan follows in the footsteps of the influential Plan S, introduced six years ago by a group of research funders in the United States and Europe known as cOAlition S, to accelerate the move to OA publishing. The United States also implemented an OA mandate in 2022 that requires all research funded by US taxpayers to be freely available from 2026…(More)”.