Evidence-Based Policymaking: A Path to Data Culture


Article by Sajana Maharjan Amatya and Pranaya Sthapit: “…The first requirement of evidence-based planning is access to a supply of timely and reliable data. In Nepal, local governments produce large volumes of data, but it is too often locked away in multiple information systems operated by each municipal department. Gaining access to the data in these systems can be difficult because different departments often use different, proprietary formats. These information silos block a 360-degree view of the available data—to say nothing of issues like redundancy, duplication, and inefficiency—and they frustrate public participation in an age when citizens expect streamlined digital access.

As a first step towards solving this artificial problem of data supply, D4D helps local governments gather their data onto one unified platform to release its full potential. We think of this as creating a “data lake” in each municipality for decentralized, democratic access. Freeing access to this already-existing evidence can open the door to fundamental changes in government procedures and the development and implementation of local policies, plans, and strategies.
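To make the idea of a municipal “data lake” concrete, here is a minimal sketch in Python (the department names, file paths, and CSV layout are hypothetical) that consolidates separate departmental exports into a single queryable SQLite store using pandas:

```python
import sqlite3
from pathlib import Path

import pandas as pd

# Hypothetical example: each municipal department exports its records as CSV.
# Consolidating them into one SQLite file gives every department (and the
# public) a single place to query, instead of separate, proprietary silos.
DEPARTMENT_EXPORTS = {
    "health": "exports/health_department.csv",
    "education": "exports/education_department.csv",
    "infrastructure": "exports/infrastructure_department.csv",
}


def build_municipal_data_lake(db_path: str = "municipal_data_lake.db") -> None:
    """Load each department's CSV export into one shared SQLite database."""
    with sqlite3.connect(db_path) as conn:
        for department, csv_path in DEPARTMENT_EXPORTS.items():
            if not Path(csv_path).exists():
                print(f"Skipping {department}: no export found at {csv_path}")
                continue
            frame = pd.read_csv(csv_path)
            # One table per department, all living in the same database file.
            frame.to_sql(department, conn, if_exists="replace", index=False)
            print(f"Loaded {len(frame)} rows into table '{department}'")


if __name__ == "__main__":
    build_municipal_data_lake()
```

Once the exports live in one store, any department or citizen-facing portal can query across them with ordinary SQL rather than requesting files in incompatible formats.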

Among the most telling shortcomings of Nepal’s legacy data policies has been the way that political interests have held sway in the local planning process, as exemplified by the political decision to distribute equal funds to all wards regardless of their unequal needs. In a more rational system, information about population size and other socioeconomic data about relative need would be a much more important factor in the allocation of funds. The National Planning Commission, a federal agency, has even distributed guidelines to Nepal’s local governments indicating that budgets should not simply be equal from ward to ward. But in practice, municipalities tend to allocate the same budget to each of their wards because elected leaders fear they will lose votes if they don’t get an equal share. Inevitably, ignoring evidence of relative need leads to the ad hoc allocation of funds to small, fragmented initiatives that mainly focus on infrastructure while overlooking other issues.
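The gap between equal and need-based allocation is easy to see with a small worked example; the ward populations, deprivation index, and budget below are invented purely for illustration and are not drawn from the article:

```python
# Toy illustration (invented numbers): splitting a municipal budget equally
# across wards versus in proportion to a simple need score that combines
# population with a socioeconomic deprivation index.
TOTAL_BUDGET = 10_000_000  # hypothetical annual budget

wards = {
    # ward: (population, deprivation index 0-1, higher = greater need)
    "Ward 1": (12_000, 0.2),
    "Ward 2": (4_000, 0.7),
    "Ward 3": (25_000, 0.5),
    "Ward 4": (7_000, 0.9),
}


def equal_allocation(total: float, wards: dict) -> dict:
    share = total / len(wards)
    return {name: share for name in wards}


def need_based_allocation(total: float, wards: dict) -> dict:
    # Need score: population weighted upward by the deprivation index.
    scores = {name: pop * (1 + dep) for name, (pop, dep) in wards.items()}
    total_score = sum(scores.values())
    return {name: total * score / total_score for name, score in scores.items()}


equal = equal_allocation(TOTAL_BUDGET, wards)
need = need_based_allocation(TOTAL_BUDGET, wards)
for name in wards:
    print(f"{name}: equal {equal[name]:,.0f} vs need-based {need[name]:,.0f}")
```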

The application of available data to the planning cycle is what evidence-based planning is all about. The key is to codify the use of data throughout the planning process. So, D4D developed a framework and guidelines for evidence-based budgeting and planning for elected officials, committee members, and concerned citizens…(More)”.

Can Mobility of Care Be Identified From Transit Fare Card Data? A Case Study In Washington D.C.


Paper by Daniela Shuman et al.: “Studies in the literature have found significant differences in travel behavior by gender on public transit that are largely attributable to household and care responsibilities falling disproportionately on women. While the majority of studies have relied on survey and qualitative data to assess “mobility of care”, we propose a novel data-driven workflow utilizing transit fare card transactions, name-based gender inference, and geospatial analysis to identify mobility of care trip making. We find that the share of women travelers trip-chaining in the direct vicinity of mobility of care places of interest is 10%–15% higher than that of men….(More)”.
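As a rough illustration of the kind of workflow the authors describe, the sketch below flags fare card taps that fall within walking distance of care-related places of interest and compares the share by inferred gender; the column names, coordinates, gender labels, and 400-metre radius are simplified assumptions, not the paper’s actual data or parameters:

```python
import math

import pandas as pd

# Hypothetical "mobility of care" places of interest near D.C. transit stops.
CARE_POI = pd.DataFrame({
    "name": ["daycare", "grocery", "clinic"],
    "lat": [38.905, 38.912, 38.899],
    "lon": [-77.030, -77.020, -77.040],
})

# Hypothetical fare card taps, already joined with name-based gender inference.
taps = pd.DataFrame({
    "card_id": [1, 1, 2, 2],
    "inferred_gender": ["F", "F", "M", "M"],
    "stop_lat": [38.906, 38.950, 38.900, 38.960],
    "stop_lon": [-77.031, -77.000, -77.050, -76.990],
})


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points."""
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


RADIUS_M = 400  # assumed walking distance defining the "direct vicinity" of a care POI


def near_care_poi(row):
    return any(
        haversine_m(row.stop_lat, row.stop_lon, poi.lat, poi.lon) <= RADIUS_M
        for poi in CARE_POI.itertuples()
    )


taps["care_trip"] = taps.apply(near_care_poi, axis=1)

# Share of taps near care POIs, broken down by inferred gender.
print(taps.groupby("inferred_gender")["care_trip"].mean())
```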

How to design an AI ethics board



Paper by Jonas Schuett, Anka Reuel, Alexis Carlier: “Organizations that develop and deploy artificial intelligence (AI) systems need to take measures to reduce the associated risks. In this paper, we examine how AI companies could design an AI ethics board in a way that reduces risks from AI. We identify five high-level design choices: (1) What responsibilities should the board have? (2) What should its legal structure be? (3) Who should sit on the board? (4) How should it make decisions and should its decisions be binding? (5) What resources does it need? We break down each of these questions into more specific sub-questions, list options, and discuss how different design choices affect the board’s ability to reduce risks from AI. Several failures have shown that designing an AI ethics board can be challenging. This paper provides a toolbox that can help AI companies to overcome these challenges…(More)”.

City data ecosystems between theory and practice: A qualitative exploratory study in seven European cities


Paper by Giovanni Liva, Marina Micheli, Sven Schade, Alexander Kotsev, Matteo Gori and Cristiano Codagnone: “The exponential growth of data collection opens possibilities for analyzing data to address political and societal challenges. Still, European cities are not utilizing the potential of data generated by their citizens, industries, academia, and public authorities for their public service mission. The reasons are complex and relate to an intertwined set of organizational, technological, and legal barriers, although good practices exist that could be scaled, sustained, and further developed. The article contributes to research on data-driven innovation in the public sector by comparing high-level expectations about data ecosystems with actual practices of data sharing and innovation at the local and regional level. Our approach consists of triangulating the analysis of in-depth interviews with representatives of the local administrations with documents obtained from the cities. The interviews investigated the experiences and perspectives of local administrations regarding establishing a local or regional data ecosystem. The article examines experiences and obstacles to data sharing within seven administrations, investigating what currently prevents the establishment of data ecosystems. The findings are summarized along three main lines: first, the limited involvement of private sector organizations as actors in local data ecosystems through emerging forms of data sharing; second, a concern with technological aspects and a lack of attention to social and organizational issues; and third, a conceptual decision to apply a centralized rather than a federated digital infrastructure…(More)”.

How a small news site built an innovative data project to visualise the impact of climate change on Uruguay’s capital


Interview by Marina Adami: “La ciudad sumergida (The submerged city), an investigation produced by Uruguayan science and technology news site Amenaza Roboto, is one of the winners of this year’s Sigma Awards for data journalism. The project uses maps of the country’s capital, Montevideo, to create impressive visualisations of the impact that projected sea-level rise is expected to have on the city and its infrastructure. The project is a first of its kind for Uruguay, a small South American country in which data journalism is still a novelty. It is also a good example of how news outlets can investigate and communicate the disastrous effects of climate change in local communities.

I spoke to Miguel Dobrich, a journalist, educator and digital entrepreneur who worked on the project together with colleagues Gabriel Farías, Natalie Aubet and Nahuel Lamas, to find out what lessons other outlets can take from this project and from Amenaza Roboto’s experiments with analysing public data, collaborating with scientists, and keeping the focus on their communities….(More)”

Towards High-Value Datasets determination for data-driven development: a systematic literature review


Paper by Anastasija Nikiforova, Nina Rizun, Magdalena Ciesielska, Charalampos Alexopoulos, and Andrea Miletič: “Open government data (OGD) is seen as a political and socio-economic phenomenon that promises to promote civic engagement and stimulate public sector innovations in various areas of public life. To bring the expected benefits, data must be reused and transformed into value-added products or services. This, in turn, sets another precondition for data that are expected not only to be available and comply with open data principles, but also to be of value, i.e., of interest for reuse by the end-user. This refers to the notion of ‘high-value dataset’ (HVD), recognized by the European Data Portal as a key trend in the OGD area in 2022. While there is progress in this direction – e.g., the Open Data Directive, which identifies six key categories, a list of HVDs, and arrangements for their publication and re-use – these can be seen as ‘core’ or ‘base’ datasets aimed at increasing the interoperability of public sector data as a high priority, contributing to the development of a more mature OGD initiative. Depending on the specifics of a region and country – geographical location, social, environmental, and economic issues, cultural characteristics, (under)developed sectors and market specificities – more datasets can be recognized as of high value for a particular country. However, there is no standardized approach to assist chief data officers in this. In this paper, we present a systematic review of the existing literature on HVD determination, which is expected to form an initial knowledge base for this process, including the approaches and indicators used to determine such datasets, the data involved, and the stakeholders concerned…(More)”.

Global Data Stewardship


Online course by Stefaan G. Verhulst: “Creating a systematic and sustainable data access program is critical for data stewardship. What you do with your data, how you reuse it, and how you make it available to the general public can help others reimagine what’s possible for data sharing and cross-sector data collaboration. In this course, instructor Stefaan Verhulst shows you how to develop and manage data reuse initiatives as a competent and responsible global data steward.

Drawing on insights from current research and practical, real-world examples, learn about the growing importance of data stewardship, data supply, and data demand to understand the value proposition and societal case for data reuse. Get tips on designing and implementing data collaboration models, governance frameworks, and infrastructure, as well as best practices for measuring, sunsetting, and supporting data reuse initiatives. Upon completing this course, you’ll be ready to put your new skill set to work and continue your data stewardship learning journey….(More)”

For chemists, the AI revolution has yet to happen


Editorial Team at Nature: “Many people are expressing fears that artificial intelligence (AI) has gone too far — or risks doing so. Take Geoffrey Hinton, a prominent figure in AI, who recently resigned from his position at Google, citing the desire to speak out about the technology’s potential risks to society and human well-being.

But against those big-picture concerns, in many areas of science you will hear a different frustration being expressed more quietly: that AI has not yet gone far enough. One of those areas is chemistry, for which machine-learning tools promise a revolution in the way researchers seek and synthesize useful new substances. But a wholesale revolution has yet to happen — because of the lack of data available to feed hungry AI systems.

Any AI system is only as good as the data it is trained on. These systems rely on what are called neural networks, which their developers teach using training data sets that must be large, reliable and free of bias. If chemists want to harness the full potential of generative-AI tools, they need to help to establish such training data sets. More data are needed — both experimental and simulated — including historical data and otherwise obscure knowledge, such as that from unsuccessful experiments. And researchers must ensure that the resulting information is accessible. This task is still very much a work in progress…(More)”.

The latest in homomorphic encryption: A game-changer shaping up


Article by Katharina Koerner: “Privacy professionals are witnessing a revolution in privacy technology. The emergence and maturing of new privacy-enhancing technologies (PETs) that allow for data use and collaboration without sharing plain text data or sending data to a central location are part of this revolution.

The United Nations, the Organisation for Economic Co-operation and Development, the U.S. White House, the European Union Agency for Cybersecurity, the UK Royal Society, and Singapore’s media and privacy authorities all released reports, guidelines and regulatory sandboxes around the use of PETs in quick succession. We are in an era where there are high hopes for data insights to be leveraged for the public good while maintaining privacy principles and enhanced security.

A prominent example of a PET is fully homomorphic encryption (FHE), often mentioned in the same breath as differential privacy, federated learning, secure multiparty computation, private set intersection, synthetic data, zero-knowledge proofs or trusted execution environments.

As FHE advances and becomes standardized, it has the potential to revolutionize the way we handle, protect and utilize personal data. Staying informed about the latest advancements in this field can help privacy pros prepare for the changes ahead in this rapidly evolving digital landscape.

Homomorphic encryption: A game changer?

FHE is a groundbreaking cryptographic technique that enables third parties to process information without revealing the data itself by running computations on encrypted data.

This technology can have far-reaching implications for secure data analytics. Requests to a databank can be answered without accessing its plain text data, as the analysis is conducted on data that remains encrypted. This adds a third layer of security for data when in use, along with protecting data at rest and in transit…(More)”.
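“Computing on encrypted data” is easiest to see with a toy example. The sketch below implements the Paillier scheme, which is additively homomorphic rather than fully homomorphic, in pure Python with deliberately tiny, insecure parameters; it illustrates the principle only, not the FHE schemes the article discusses:

```python
import math
import random

# Toy Paillier cryptosystem: additively (not fully) homomorphic.
# Primes are tiny and hard-coded purely for illustration -- never do this in practice.
p, q = 61, 53
n = p * q                      # public modulus
n_sq = n * n
g = n + 1                      # standard simple choice of generator
lam = math.lcm(p - 1, q - 1)   # private key component


def L(x):
    return (x - 1) // n


mu = pow(L(pow(g, lam, n_sq)), -1, n)   # private key component (modular inverse)


def encrypt(m):
    """Encrypt an integer m < n under the public key (n, g)."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(c):
    return (L(pow(c, lam, n_sq)) * mu) % n


# Homomorphic property: multiplying ciphertexts adds the underlying plaintexts.
c1, c2 = encrypt(17), encrypt(25)
c_sum = (c1 * c2) % n_sq
assert decrypt(c_sum) == 17 + 25   # the "databank" never saw 17 or 25 in the clear
print(decrypt(c_sum))              # -> 42
```

Production FHE schemes, such as the lattice-based BFV and CKKS constructions, additionally support multiplication on ciphertexts, which is what makes arbitrary computations on encrypted data possible.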

Data Privacy and Algorithmic Inequality


Paper by Zhuang Liu, Michael Sockin & Wei Xiong: “This paper develops a foundation for a consumer’s preference for data privacy by linking it to the desire to hide behavioral vulnerabilities. Data sharing with digital platforms enhances the matching efficiency for standard consumption goods, but also exposes individuals with self-control issues to temptation goods. This creates a new form of inequality in the digital era—algorithmic inequality. Although data privacy regulations provide consumers with the option to opt out of data sharing, these regulations cannot fully protect vulnerable consumers because of data-sharing externalities. The coordination problem among consumers may also lead to multiple equilibria with drastically different levels of data sharing by consumers. Our quantitative analysis further illustrates that although data is non-rival and beneficial to social welfare, it can also exacerbate algorithmic inequality…(More)”.