Smart Rural: The Open Data Gap


Paper by Johanna Walker et al: “The smart city paradigm has underpinned a great deal of the use and production of open data for the benefit of policymakers and citizens. This paper posits that this further deepens the existing urban-rural divide. It investigates the availability and use of rural open data along two parameters: pertaining to rural populations, and to key parts of the rural economy (agriculture, fisheries and forestry). It explores the relationship between key statistics of national/rural economies and rural open data, and the use and users of rural open data where it is available. It finds that although countries with more rural populations are not necessarily earlier in their Open Data Maturity journey, there is still a lack of institutionalisation of open data in rural areas; that there is an apparent gap between the importance of agriculture to a country’s GDP and the amount of agricultural data published openly; and lastly, that the smart city paradigm cannot simply be transferred to the rural setting. It suggests instead the adoption of the emerging ‘smart region’ paradigm as that most likely to support the specific data needs of rural areas….(More)”.

Emerging models of data governance in the age of datafication


Paper by Marina Micheli et al: “The article examines four models of data governance emerging in the current platform society. While major attention is currently given to the dominant model of corporate platforms collecting and economically exploiting massive amounts of personal data, other actors, such as small businesses, public bodies and civic society, also take part in data governance. The article sheds light on four models emerging from the practices of these actors: data sharing pools, data cooperatives, public data trusts and personal data sovereignty. We propose a social science-informed conceptualisation of data governance. Drawing on the notion of data infrastructure, we identify the models as a function of the stakeholders’ roles, their interrelationships, articulations of value, and governance principles. Addressing the politics of data, we consider the actors’ competitive struggles for governing data. This conceptualisation brings to the forefront the power relations and multifaceted economic and social interactions within data governance models emerging in an environment mainly dominated by corporate actors. These models highlight that civic society and public bodies are key actors for democratising data governance and redistributing value produced through data. Through the discussion of the models, their underpinning principles and limitations, the article aims to inform future investigations of socio-technical imaginaries for the governance of data, particularly now that the policy debate around data governance is very active in Europe….(More)”.

Models and Modeling in the Sciences: A Philosophical Introduction


Book by Stephen M. Downes: “Biologists, climate scientists, and economists all rely on models to move their work forward. In this book, Stephen M. Downes explores the use of models in these and other fields to introduce readers to the various philosophical issues that arise in scientific modeling. Readers learn that paying attention to models plays a crucial role in appraising scientific work. 

This book first presents a wide range of models from a number of different scientific disciplines. After assembling some illustrative examples, Downes demonstrates how models shed light on many perennial issues in philosophy of science and in philosophy in general. Reviewing the range of views on how models represent their targets introduces readers to the key issues in debates on representation, not only in science but in the arts as well. Also, standard epistemological questions are cast in new and interesting ways when readers confront the question, “What makes for a good (or bad) model?”…(More)”.

Ethical Challenges and Opportunities Associated With the Ability to Perform Medical Screening From Interactions With Search Engines


Viewpoint by Elad Yom-Tov and Yuval Cherlow: “Recent research has shown the efficacy of screening for serious medical conditions from data collected while people interact with online services. In particular, queries to search engines and interactions with them were shown to be advantageous for screening a range of conditions including diabetes, several forms of cancer, eating disorders, and depression. These screening abilities offer unique advantages in that they can serve broad strata of society, including people in underserved populations and in countries with poor access to medical services. However, these advantages need to be balanced against the potential harm to privacy, autonomy, and nonmaleficence, which are recognized as cornerstones of ethical medical care. Here, we discuss these opportunities and challenges, both when collecting data to develop online screening services and when deploying them. We offer several solutions that balance the advantages of these services with the ethical challenges they pose….(More)”.

US Government Guide to Global Sharing of Personal Information


Book by IAPP: “The Guide to U.S. Government Practice on Global Sharing of Personal Information, Third Edition is a reference tool on U.S. government practice in G2G-sharing arrangements. The third edition contains new agreements, including the U.S.-U.K. Cloud Act Agreement, EU-U.S. Umbrella Agreement, United States-Mexico-Canada Agreement, and EU-U.S. Privacy Shield framework. This book examines those agreements as a way of establishing how practice has evolved. In addition to reviewing past agreements, it reviews the international privacy principles of the Organisation for Economic Co-operation and Development and the Asia-Pacific Economic Cooperation for their relevance to G2G sharing. The guide is intended for lawyers, privacy professionals and individuals who wish to understand U.S. practice for sharing personal information across borders….(More)”.

Monitoring global digital gender inequality using the online populations of Facebook and Google


Paper by Ridhi Kashyap, Masoomali Fatehkia, Reham Al Tamime, and Ingmar Weber: “Background: In recognition of the empowering potential of digital technologies, gender equality in internet access and digital skills is an important target in the United Nations (UN) Sustainable Development Goals (SDGs). Gender-disaggregated data on internet use are limited, particularly in less developed countries.

Objective: We leverage anonymous, aggregate data on the online populations of Google and Facebook users available from their advertising platforms to fill existing data gaps and measure global digital gender inequality.

Methods: We generate indicators of country-level gender gaps on Google and Facebook. Using these online indicators independently and in combination with offline development indicators, we build regression models to predict gender gaps in internet use and digital skills computed using available survey data from the International Telecommunication Union (ITU).
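
As an editorial illustration of this modelling setup, the sketch below fits regressions that predict an ITU-style internet-use gender gap from platform gender-gap indicators alone, from an offline development indicator alone, and from both combined. All data, column names and coefficients are hypothetical placeholders, not the authors’ dataset or exact specification.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 120  # hypothetical number of countries

# Simulated indicators: gender-gap ratios (female/male shares), loosely
# coupled to a development index so the example has realistic structure.
hdi = rng.uniform(0.35, 0.95, n)                        # offline indicator
fb_gap = np.clip(0.4 + 0.6 * hdi + rng.normal(0, 0.08, n), 0, 1.2)
google_gap = np.clip(0.3 + 0.7 * hdi + rng.normal(0, 0.10, n), 0, 1.2)
itu_gap = np.clip(0.2 * hdi + 0.6 * fb_gap + rng.normal(0, 0.05, n), 0, 1.2)

df = pd.DataFrame({"fb_gap": fb_gap, "google_gap": google_gap,
                   "hdi": hdi, "itu_internet_gap": itu_gap})

# Compare online-only, offline-only, and combined feature sets,
# mirroring the comparison described in the Methods/Results.
feature_sets = {
    "online only":  ["fb_gap", "google_gap"],
    "offline only": ["hdi"],
    "combined":     ["fb_gap", "google_gap", "hdi"],
}
for name, cols in feature_sets.items():
    r2 = cross_val_score(LinearRegression(), df[cols],
                         df["itu_internet_gap"], cv=5, scoring="r2")
    print(f"{name}: mean cross-validated R^2 = {r2.mean():.2f}")
```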

Results: We find that women are significantly underrepresented in the online populations of Google and Facebook in South Asia and sub-Saharan Africa. These platform-specific gender gaps are strong predictors of gaps in internet access and basic digital skills in these populations. Comparing platforms, we find that Facebook gender gap indicators outperform Google indicators at predicting ITU internet-use and low-level digital-skill gender gaps. Models using these online indicators outperform those using only offline development indicators. The best-performing models, however, are those that combine Facebook and Google online indicators with a country’s development indicators, such as the Human Development Index….(More)”.

The economics of Business to Government data sharing


Paper by Bertin Martens and Nestor Duch Brown: “Data and information are fundamental inputs for effective evidence-based policy making and the provision of public services. In recent years, some private firms have been collecting large amounts of data which, were they available to governments, could greatly improve governments’ capacity to take better policy decisions and increase social welfare. Business-to-Government (B2G) data sharing can result in substantial benefits for society. It can save governments costs by allowing them to benefit from the use of data collected by businesses without having to collect the same data again. Moreover, it can support the production of new and innovative outputs based on the shared data by different users. Finally, the data available to government may give only an incomplete or even biased picture, while aggregating complementary datasets shared by different parties (including businesses) may result in improved policies with strong social welfare benefits.


The examples assembled by the High Level Expert Group on B2G data sharing show that most current B2G data transactions remain one-off experimental pilot projects that do not seem to be sustainable over time. Overall, the volume of B2G operations still seems to be relatively small and clearly sub-optimal from a social welfare perspective. The market does not seem to be scaling in line with the economic potential for welfare gains to society. There are likely to be significant potential economic benefits from additional B2G data sharing operations. These could be enabled by measures that improve their governance conditions and thereby increase the overall number of transactions. To design such measures, it is important to understand the nature of the current barriers to B2G data sharing operations. In this paper, we focus on the most important barriers from an economic perspective: (a) monopolistic data markets, (b) high transaction costs and perceived risks in data sharing and (c) a lack of incentives for private firms to contribute to the production of public benefits. The following reflections are mainly conceptual, since there is currently little quantitative empirical evidence on the different aspects of B2G transactions.

  • Monopolistic data markets. Some firms (big tech companies, for instance) may be in a privileged position as the exclusive providers of the type of data that a public body seeks to access. This position enables the firms to charge a price for the data well beyond a reasonable rate of return on costs. While a monopolistic market is still a functioning market, the resulting price may lead to some governments not being able or willing to purchase the data, and may therefore cause social welfare losses. Nonetheless, monopolistic pricing may still be justified from an innovation perspective: it strengthens incentives to invest in more and better data collection systems and thereby increases the supply of data in the long run. In some cases, the data seller may be in a position to price-discriminate between commercial buyers and a public body, charging a lower price to the latter since the data would not be used for commercial purposes.
  • High transaction costs and perceived risks. An important barrier to data sharing comes from the ex-ante costs of finding a suitable data-sharing partner, negotiating a contractual arrangement, and re-formatting and cleaning the data, among others. Potentially interested public bodies may not be aware of available datasets or may not be in a position to handle them or understand their advantages and disadvantages. There are also ex-post risks related to uncertainty about the quality and/or usefulness of the data, the technical implementation of the data sharing deal, ensuring compliance with the agreed conditions, the risk of data leaks to unauthorized third parties, and exposure of personal and confidential data.
  • Lack of incentives. Firms may be reluctant to share data with governments because doing so might have a negative impact on them. This could be due to suspicions that the data delivered might be used to implement market regulations or to enforce competition rules that could hurt firms’ profits. Moreover, if firms share data with government under preferential conditions, they may have difficulty justifying the foregone profit to shareholders, since the benefits generated by better policies or public services fuelled by the private data will accrue to society as a whole and are often difficult to express in monetary terms. Finally, firms might fear being put at a competitive disadvantage if they provide data to public bodies (perhaps under preferential conditions) and their competitors do not.

Several mechanisms could be designed to overcome the barriers that may be holding back B2G data sharing initiatives. One would be to provide stronger incentives for the data supplier firm to engage in this type of transaction. These incentives can be direct, i.e., monetary, or indirect, i.e., reputational (e.g. as part of corporate social responsibility programmes). Another would be to guarantee the data transfer by making the transaction mandatory, with fair cost compensation. An intermediate approach would be to facilitate voluntary B2G operations without mandating them, for example by reducing the transaction costs and perceived risks for the data supplier, e.g. by setting up trusted data intermediary platforms or appropriate contractual provisions. A possible EU governance framework for B2G data sharing operations could cover these options….(More)”.

Bringing Structure and Design to Data Governance


Report by John Wilbanks et al: “Before COVID-19 took over the world, the Governance team at Sage Bionetworks had started working on an analysis of data governance structures and systems to be published as a “green paper” in late 2020. Today we’re happy to publicly release that paper, Mechanisms to Govern Responsible Sharing of Open Data: A Progress Report.

In the paper, we provide a landscape analysis of models of governance for open data sharing based on our observations in the biomedical sciences. We offer an overview of those observations and identify areas where we think this work can expand to support open data sharing beyond the sciences.

The central argument of this paper is that the “right” system of governance is determined by first understanding the nature of the collaborative activities intended. These activities map to types of governance structures, which in turn can be built out of standardized parts — what we call governance design patterns. In this way, governance for data science can be easy to build, follow key laws and ethics regimes, and enable innovative models of collaboration. We provide an initial survey of structures and design patterns, as well as examples of how we leverage this approach to rapidly build out ethics-centered governance in biomedical research.

While there is no one-size-fits-all solution, we argue for learning from ongoing data science collaborations and building on existing standards and tools. In so doing, we argue for data governance as a discipline worthy of expertise, attention, standards, and innovation.

We chose to call this report a “green paper” in recognition of its maturity and coverage: it’s a snapshot of our data governance ecosystem in biomedical research, not the world of all data governance, and the entire field of data governance is in its infancy. We have licensed the paper under CC-BY 4.0 and published it on GitHub via Manubot in hopes that the broader data governance community might fill in holes we left, correct mistakes we made, add references and toolkits and reference implementations, and generally treat this as a framework for talking about how we share data…(More)”.

The open source movement takes on climate data


Article by Heather Clancy: “…many companies are moving to disclose “climate risk,” although far fewer are moving to actually minimize it. And as those tasked with preparing those reports can attest, the process of gathering the data for them is frustrating and complex, especially as the level of detail desired and required by investors becomes deeper.

That pain point was the inspiration for a new climate data project launched this week that will be spearheaded by the Linux Foundation, the nonprofit host organization for thousands of the most influential open source software and data initiatives in the world, such as GitHub. The foundation is central to the evolution of the Linux software that runs in the back offices of most major financial services firms.

There are four powerful founding members for the new group, the LF Climate Finance Foundation (LFCF): insurance and asset management company Allianz, cloud software giants Amazon and Microsoft, and data intelligence powerhouse S&P Global. The foundation’s “planning team” includes the World Wide Fund for Nature (WWF), Ceres and the Sustainability Accounting Standards Board (SASB).

The group’s intention is to collaborate on an open source project called the OS-Climate platform, which will include economic and physical risk scenarios that investors, regulators, companies, financial analysts and others can use for their analysis. 

The idea is to create a “public service utility” where certain types of climate data can be accessed easily, then combined with other, more proprietary information that someone might be using for risk analysis, according to Truman Semans, CEO of OS-Climate, who was instrumental in getting the effort off the ground. “There are a whole lot of initiatives out there that address pieces of the puzzle, but no unified platform to allow those to interoperate,” he told me.

Why does this matter? It helps to understand the history of open source software, which was once an approach that many powerful software companies, notably Microsoft, abhorred because they were worried about the financial hit to their intellectual property. Flash forward to today and the open source software movement, “staffed” by literally millions of software developers, is credited with accelerating the creation of common system-level elements so that companies can focus their own resources on solving problems directly related to their business.

In short, this budding effort could make the right data available more quickly, so that businesses — particularly financial institutions — can make better informed decisions.

Or, as Microsoft’s chief intellectual property counsel, Jennifer Yokoyama, observed in the announcement press release: “Addressing climate issues in a meaningful way requires people and organizations to have access to data to better understand the impact of their actions. Opening up and sharing our contribution of significant and relevant sustainability data through the LF Climate Finance Foundation will help advance the financial modeling and understanding of climate change impact — an important step in effecting political change. We’re excited to collaborate with the other founding members and hope additional organizations will join.”…(More)”

Prioritizing COVID-19 tests based on participatory surveillance and spatial scanning


Paper by O.B. Leal-Neto et al: “Participatory surveillance has shown promising results from its conception to its application in several public health events. The use of a collaborative information pathway provides a rapid way to collect data on symptomatic individuals in the territory, complementing traditional health surveillance systems. In Brazil, this methodology has been used at the national level since 2014 during mass gathering events, which are of great importance for monitoring public health emergencies.

With the occurrence of the COVID-19 pandemic, the limitations of the main non-pharmaceutical interventions for epidemic control (in this case, testing and social isolation), and the added challenges of underreporting and delayed notification of cases, there is demand for alternative sources of up-to-date information to complement the current system for disease surveillance. Several studies have demonstrated the benefits of participatory surveillance in coping with COVID-19, reinforcing the opportunity to modernize the way health surveillance is carried out. Additionally, spatial scanning techniques have been used to understand syndromic scenarios, investigate outbreaks, and analyze epidemiological risk, constituting relevant tools for health management. While traditional health systems have data-quality limitations, data generated by participatory surveillance can be combined with these traditional techniques to clarify epidemiological risks that demand urgent decision-making. Moreover, with limited testing available, identifying priority areas for intervention is an important activity in the early response to public health emergencies. This study aimed to describe and analyze priority areas for COVID-19 testing by combining data from participatory surveillance and traditional surveillance of respiratory syndromes….(More)”.
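
To illustrate the spatial scanning step described above, here is a minimal editorial sketch of a Kulldorff-style Poisson scan over simulated participatory-surveillance reports: it ranks circular candidate areas by log-likelihood ratio so the highest-scoring ones could be prioritized for testing. The data, parameter values, and helper names are assumptions for illustration and do not reproduce the paper’s actual pipeline or tooling.

```python
import numpy as np

def poisson_llr(c, n, C, N):
    """Kulldorff's Poisson log-likelihood ratio for a window holding c of C
    total cases among n of N total participants; 0 unless risk is elevated."""
    if c == 0 or n == 0 or n >= N or c >= C:
        return 0.0
    if c / n <= (C - c) / (N - n):
        return 0.0  # window risk not above outside risk
    return (c * np.log(c / n) + (C - c) * np.log((C - c) / (N - n))
            - C * np.log(C / N))

def scan(coords, cases, pop, radii, top=5):
    """Score circular windows centred on each reporting location and
    return the highest-scoring candidate areas."""
    C, N = cases.sum(), pop.sum()
    scored = []
    for i, centre in enumerate(coords):
        dist = np.linalg.norm(coords - centre, axis=1)
        for r in radii:
            inside = dist <= r
            llr = poisson_llr(cases[inside].sum(), pop[inside].sum(), C, N)
            scored.append((llr, i, r))
    return sorted(scored, reverse=True)[:top]

# Hypothetical aggregated self-reports per neighbourhood
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(50, 2))   # neighbourhood centroids (km)
pop = rng.integers(200, 2000, size=50)      # app participants per area
cases = rng.binomial(pop, 0.02)             # symptomatic reports
cases[:5] += 40                             # inject a simulated hotspot

for llr, i, r in scan(coords, cases.astype(float), pop.astype(float),
                      radii=[1.0, 2.0]):
    print(f"centre {i}, radius {r} km, LLR = {llr:.1f}")
```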