Philosophy of Open Science


Book by Sabina Leonelli: “The Open Science [OS] movement aims to foster the wide dissemination, scrutiny and re-use of research components for the good of science and society. This Element examines the role played by OS principles and practices within contemporary research and how this relates to the epistemology of science. After reviewing some of the concerns that have prompted calls for more openness, it highlights how the interpretation of openness as the sharing of resources, so often encountered in OS initiatives and policies, may have the unwanted effect of constraining epistemic diversity and worsening epistemic injustice, resulting in unreliable and unethical scientific knowledge. By contrast, this Element proposes to frame openness as the effort to establish judicious connections among systems of practice, predicated on a process-oriented view of research as a tool for effective and responsible agency…(More)”.

Setting data free: The politics of open data for food and agriculture


Paper by M. Fairbairn and Z. Kish: “Open data is increasingly being promoted as a route to achieve food security and agricultural development. This article critically examines the promotion of open agri-food data for development through a document-based case study of the Global Open Data for Agriculture and Nutrition (GODAN) initiative as well as through interviews with open data practitioners and participant observation at open data events. While the concept of openness is striking for its ideological flexibility, we argue that GODAN propagates an anti-political, neoliberal vision for how open data can enhance agricultural development. This approach centers values such as private innovation, increased production, efficiency, and individual empowerment, in contrast to more political and collectivist approaches to openness practiced by some agri-food social movements. We further argue that open agri-food data projects, in general, have a tendency to reproduce elements of “data colonialism,” extracting data with minimal consideration for the collective harms that may result, and embedding their own values within universalizing information infrastructures…(More)”.

Open data for AI: what now?


UNESCO Report: “…A vast amount of data about the world, covering the environment, industry, agriculture, and health, is now being collected through automatic processes, including sensors. Such data may be readily available, but they are potentially too big for humans to handle or analyse effectively; nonetheless, they could serve as input to AI systems. AI and data science techniques have demonstrated great capacity to analyse large amounts of data, as currently illustrated by generative AI systems, and to help uncover formerly unknown hidden patterns that deliver actionable information in real time. However, many contemporary AI systems run on proprietary datasets; data that fulfil the criteria of open data would benefit AI systems further and mitigate potential hazards of these systems, such as a lack of fairness, accountability, and transparency.

The aim of these guidelines is to apprise Member States of the value of open data, and to outline how data are curated and opened. Member States are encouraged not only to support openness of high-quality data, but also to embrace the use of AI technologies and facilitate capacity building, training and education in this regard, including inclusive open data as well as AI literacy…(More)”.

How data helped Mexico City reduce high-impact crime by more than 50%


Article by Alfredo Molina Ledesma: “When Claudia Sheinbaum Pardo became Mayor of Mexico City in 2018, she wanted a new approach to tackling the city’s most pressing problems. Crime was at the very top of the agenda – only 7% of the city’s inhabitants considered it a safe place. New policies were needed to turn this around.

Data became a central part of the city’s new strategy. The Digital Agency for Public Innovation was created in 2019 – tasked with using data to help transform the city. To put this into action, the city administration immediately implemented an open data policy and launched their official data platform, Portal de Datos Abiertos. The policy and platform aimed to make data that Mexico City collects accessible to anyone: municipal agencies, businesses, academics, and ordinary people.

“The main objective of the open data strategy of Mexico City is to enable more people to make use of the data generated by the government in a simple and interactive manner,” said Jose Merino, Head of the Digital Agency for Public Innovation. “In other words, what we aim for is to democratize the access and use of information.” To achieve this goal, a new open source tool for interactive data visualization, called Sistema Ajolote, was developed and integrated into the Open Data Portal…

Information that had never been made public before, such as street-level crime from the Attorney General’s Office, is now accessible to everyone. Academics, businesses and civil society organizations can access the data to create solutions and innovations that complement the city’s new policies. One example is the successful “Hoyo de Crimen” app, which proposes safe travel routes based on the latest street-level crime data, enabling people to avoid crime hotspots as they walk or cycle through the city.

Since the introduction of the open data policy – which has contributed to a comprehensive crime reduction and social support strategy – high-impact crime in the city has decreased by 53%, and 43% of Mexico City residents now consider the city to be a safe place…(More)”.

Open Data on GitHub: Unlocking the Potential of AI


Paper by Anthony Cintron Roman, Kevin Xu, Arfon Smith, Jehu Torres Vega, Caleb Robinson, Juan M Lavista Ferres: “GitHub is the world’s largest platform for collaborative software development, with over 100 million users. GitHub is also used extensively for open data collaboration, hosting more than 800 million open data files, totaling 142 terabytes of data. This study highlights the potential of open data on GitHub and demonstrates how it can accelerate AI research. We analyze the existing landscape of open data on GitHub and the patterns of how users share datasets. Our findings show that GitHub is one of the largest hosts of open data in the world and has experienced an accelerated growth of open data assets over the past four years. By examining the open data landscape on GitHub, we aim to empower users and organizations to leverage existing open datasets and improve their discoverability — ultimately contributing to the ongoing AI revolution to help address complex societal issues. We release the three datasets that we have collected to support this analysis as open datasets at this https URL…(More)”

How Does Data Access Shape Science?


Paper by Abhishek Nagaraj & Matteo Tranchero: “This study examines the impact of access to confidential administrative data on the rate, direction, and policy relevance of economics research. To study this question, we exploit the progressive geographic expansion of the U.S. Census Bureau’s Federal Statistical Research Data Centers (FSRDCs). FSRDCs boost data diffusion, help empirical researchers publish more articles in top outlets, and increase citation-weighted publications. Besides direct data usage, spillovers to non-adopters also drive this effect. Further, citations to exposed researchers in policy documents increase significantly. Our findings underscore the importance of data access for scientific progress and evidence-based policy formulation…(More)”.

Critical factors influencing information disclosure in public organisations


Paper by Francisca Tejedo-Romero & Joaquim Filipe Ferraz Esteves Araujo: “Open government initiatives around the world and the passage of freedom of information laws are opening public organisations through information disclosure to ensure transparency and encourage citizen participation and engagement. At the municipal level, social, economic, and political factors are found to account for this trend. However, the findings on this issue are inconclusive and may differ from country to country. This paper contributes to this discussion by analysing a unitary country where the same set of laws and rules governs the constituent municipalities. It seeks to identify critical factors that affect the disclosure of municipal information. For this purpose, a longitudinal study was carried out over a period of four years using panel data methodology. The main conclusions seem to point to municipalities’ intention to increase the dissemination of information to reduce low levels of voter turnout and increase civic involvement and political participation. Municipalities governed by leftist parties and those that have high indebtedness are most likely to disclose information. Additionally, internet access has created new opportunities for citizens to access information, which exerts pressure for greater dissemination of information by municipalities. These findings are important to practitioners because they indicate the need to improve citizens’ internet access and to maintain information disclosure strategies beyond election periods…(More)”.

Towards High-Value Datasets determination for data-driven development: a systematic literature review


Paper by Anastasija Nikiforova, Nina Rizun, Magdalena Ciesielska, Charalampos Alexopoulos, and Andrea Miletič: “Open government data (OGD) is seen as a political and socio-economic phenomenon that promises to promote civic engagement and stimulate public sector innovations in various areas of public life. To bring the expected benefits, data must be reused and transformed into value-added products or services. This, in turn, sets another precondition for data that are expected to not only be available and comply with open data principles, but also be of value, i.e., of interest for reuse by the end-user. This refers to the notion of ‘high-value dataset’ (HVD), recognized by the European Data Portal as a key trend in the OGD area in 2022. While there is progress in this direction, e.g., the Open Data Directive, incl. identifying 6 key categories, a list of HVDs and arrangements for their publication and re-use, these can be seen as ‘core’ / ‘base’ datasets aimed at increasing interoperability of public sector data with a high priority, contributing to the development of a more mature OGD initiative. Depending on the specifics of a region and country – geographical location, social, environmental, economic issues, cultural characteristics, (under)developed sectors and market specificities – more datasets can be recognized as of high value for a particular country. However, there is no standardized approach to assist chief data officers in this. In this paper, we present a systematic review of existing literature on HVD determination, which is expected to form an initial knowledge base for this process, incl. the approaches and indicators used to determine them, data, stakeholders…(More)”.

For chemists, the AI revolution has yet to happen


Editorial Team at Nature: “Many people are expressing fears that artificial intelligence (AI) has gone too far — or risks doing so. Take Geoffrey Hinton, a prominent figure in AI, who recently resigned from his position at Google, citing the desire to speak out about the technology’s potential risks to society and human well-being.

But against those big-picture concerns, in many areas of science you will hear a different frustration being expressed more quietly: that AI has not yet gone far enough. One of those areas is chemistry, for which machine-learning tools promise a revolution in the way researchers seek and synthesize useful new substances. But a wholesale revolution has yet to happen — because of the lack of data available to feed hungry AI systems.

Any AI system is only as good as the data it is trained on. These systems rely on what are called neural networks, which their developers teach using training data sets that must be large, reliable and free of bias. If chemists want to harness the full potential of generative-AI tools, they need to help to establish such training data sets. More data are needed — both experimental and simulated — including historical data and otherwise obscure knowledge, such as that from unsuccessful experiments. And researchers must ensure that the resulting information is accessible. This task is still very much a work in progress…(More)”.

What do data portals do? Tracing the politics of online devices for making data public


Paper by Jonathan Gray: “The past decade has seen the rise of “data portals” as online devices for making data public. They have been accorded a prominent status in political speeches, policy documents, and official communications as sites of innovation, transparency, accountability, and participation. Drawing on research on data portals around the world, data portal software, and associated infrastructures, this paper explores three approaches for studying the social life of data portals as technopolitical devices: (a) interface analysis, (b) software analysis, and (c) metadata analysis. These three approaches contribute to the study of the social lives of data portals as dynamic, heterogeneous, and contested sites of public sector datafication. They are intended to contribute to critically assessing how participation around public sector datafication is invited and organized with portals, as well as to rethinking and recomposing them…(More)”.