Paper by André Kurmann, Étienne Lalé and Lien Ta: “The COVID-19 pandemic has led to an explosion of research using private-sector datasets to measure business dynamics and employment in real time. Yet questions remain about the representativeness of these datasets and how to distinguish business openings and closings from sample churn – i.e., sample entry of already operating businesses and sample exit of businesses that continue operating. This paper proposes new methods to address these issues and applies them to the case of Homebase, a real-time dataset of mostly small service-sector businesses that has been used extensively in the literature to study the effects of the pandemic. We match the Homebase establishment records with information on business activity from Safegraph, Google, and Facebook to assess the representativeness of the data and to estimate the probability of business closings and openings among sample exits and entries. We then exploit the high frequency and geographic detail of the data to study whether small service-sector businesses have been hit harder by the pandemic than larger firms, and the extent to which the Paycheck Protection Program (PPP) helped small businesses keep their workforce employed. We find that our real-time estimates of small business dynamics and employment during the pandemic are remarkably representative and closely fit population counterparts from administrative data that have recently become available. Distinguishing business closings and openings from sample churn is critical for these results. We also find that while employment by small businesses contracted more severely in the beginning of the pandemic than employment of larger businesses, it also recovered more strongly thereafter. In turn, our estimates suggest that the rapid rollout of PPP loans significantly mitigated the negative employment effects of the pandemic. Business closings and openings are a key driver of both results, thus underlining the importance of properly correcting for sample churn…(More)”.
One Data Point Can Beat Big Data
Essay by Gerd Gigerenzer: “…In my research group at the Max Planck Institute for Human Development, we’ve studied simple algorithms (heuristics) that perform well under volatile conditions. One way to derive these rules is to rely on psychological AI: to investigate how the human brain deals with situations of disruption and change. Back in 1838, for instance, Thomas Brown formulated the Law of Recency, which states that recent experiences come to mind faster than those in the distant past and are often the sole information that guides human decisions. Contemporary research indicates that people do not automatically rely on what they recently experienced, but do so only in unstable situations where the distant past is not a reliable guide for the future. In this spirit, my colleagues and I developed and tested the following “brain algorithm”:
Recency heuristic for predicting the flu: Predict that this week’s proportion of flu-related doctor visits will equal that of the most recent data, from one week ago.
Unlike Google’s secret Flu Trends algorithm, this rule is transparent and can be easily applied by everyone. Its logic can be understood. It relies on a single data point only, which can be looked up on the website of the Centers for Disease Control and Prevention. And it dispenses with combing through 50 million search terms and trial-and-error testing of millions of algorithms. But how well does it actually predict the flu?
Three fellow researchers and I tested the recency rule using the same eight years of data on which the Google Flu Trends algorithm was tested, that is, weekly observations between March 2007 and August 2015. During that time, the proportion of flu-related visits among all doctor visits ranged between one percent and eight percent, with an average of 1.8 percent per week (Figure 1). This means that if every week you were to make the simple but false prediction that there are zero flu-related doctor visits, you would have a mean absolute error of 1.8 percentage points over those eight years. Google Flu Trends predicted much better than that, with a mean error of 0.38 percentage points (Figure 2). The recency heuristic had a mean error of only 0.20 percentage points, which is even better. If we exclude the period of the swine flu outbreak, that is, before the first update of Google Flu Trends, the result remains essentially the same (0.38 and 0.19, respectively)….(More)”.
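As a minimal sketch (not from the essay itself), the recency heuristic and the mean-absolute-error comparison described above could be implemented as follows; the weekly series here is made up for illustration, whereas the essay uses CDC data on the proportion of flu-related doctor visits:

```python
# Sketch of the recency heuristic and its evaluation by mean absolute error (MAE).
# The "visits" series below is illustrative, not the CDC data used in the essay.
import numpy as np

def recency_forecast(series):
    """Predict each week's value as the previous week's observation."""
    series = np.asarray(series, dtype=float)
    return series[:-1]  # forecasts for weeks 1..N-1

def mean_absolute_error(actual, predicted):
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))

# Hypothetical weekly percentages of flu-related doctor visits
visits = [1.2, 1.4, 1.9, 2.8, 3.5, 2.9, 2.1, 1.6, 1.3]

forecast = recency_forecast(visits)   # last week's value
actual = np.asarray(visits[1:])       # this week's value

print("Recency heuristic MAE:", mean_absolute_error(actual, forecast))
print("Zero-prediction MAE:  ", mean_absolute_error(actual, np.zeros_like(actual)))
```

The point of the comparison is that a one-parameter-free rule using a single data point can be benchmarked against both a naive zero prediction and a big-data model on the same error metric.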
Nowcasting daily population displacement in Ukraine through social media advertising data
Pre-Publication Paper by Douglas R. Leasure et al.: “In times of crisis, real-time data mapping population displacements are invaluable for targeted humanitarian response. The Russian invasion of Ukraine on February 24, 2022 forcibly displaced millions of people from their homes, including nearly 6m refugees flowing across the border in just a few weeks, but information was scarce regarding displaced and vulnerable populations who remained inside Ukraine. We leveraged near real-time social media marketing data to estimate sub-national population sizes every day, disaggregated by age and sex. Our metric of internal displacement estimated that 5.3m people had been internally displaced away from their baseline administrative region by March 14. Results revealed four distinct displacement patterns: large-scale evacuations, refugee staging areas, internal areas of refuge, and irregular dynamics. While this innovative approach provided one of the only quantitative estimates of internal displacement in virtual real time, we conclude by acknowledging risks and challenges for the future…(More)”.
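The paper's methodology is considerably more involved, but a rough sketch of the underlying idea — comparing daily region-level population estimates against pre-invasion baselines and summing net outflows — might look like the following. All figures below are illustrative, not the paper's; in practice the estimates are derived from social media advertising audiences scaled to census baselines, with additional modelling not shown here.

```python
# Rough sketch: summarizing internal displacement from daily sub-national
# population estimates. Region names and numbers are made up for illustration.

baseline = {"Kyiv": 4.0, "Kharkiv": 2.7, "Lviv": 2.5, "Dnipro": 3.2}   # millions
current  = {"Kyiv": 2.9, "Kharkiv": 1.8, "Lviv": 3.4, "Dnipro": 3.0}   # millions

# Net outflow from each region relative to its pre-invasion baseline.
outflows = {r: max(0.0, baseline[r] - current[r]) for r in baseline}

# Summing net outflows gives a crude estimate of people displaced away from
# their baseline region (ignoring refugee flows abroad, natural population
# change, and measurement error in the ad-audience data).
internal_displacement = sum(outflows.values())
print(f"Estimated internally displaced: {internal_displacement:.1f}m")
```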
Closing the Data Divide for a More Equitable U.S. Digital Economy
Report by Gillian Diebold: “In the United States, access to many public and private services, including those in the financial, educational, and health-care sectors, is intricately linked to data. But adequate data is not collected equitably from all Americans, creating a new challenge: the data divide, in which not everyone has enough high-quality data collected about them or their communities and therefore cannot benefit from data-driven innovation. This report provides an overview of the data divide in the United States and offers recommendations for how policymakers can address these inequalities…(More)”.
Making Government Data Publicly Available: Guidance for Agencies on Releasing Data Responsibly
Report by Hugh Grant-Chapman and Hannah Quay-de la Vallee: “Government agencies rely on a wide range of data to effectively deliver services to the populations with which they engage. Civic-minded advocates frequently argue that the public benefits of this data can be better harnessed by making it available for public access. Recent years, however, have also seen growing recognition that the public release of government data can carry certain risks. Government agencies hoping to release data publicly should consider those potential risks in deciding which data to make publicly available and how to go about releasing it.
This guidance offers an introduction to making data publicly available while addressing privacy and ethical data use issues. It is intended for administrators at government agencies that deliver services to individuals — especially those at the state and local levels — who are interested in publicly releasing government data. This guidance focuses on challenges that may arise when releasing aggregated data derived from sensitive information, particularly individual-level data.
The report begins by highlighting key benefits and risks of making government data publicly available. Benefits include empowering members of the general public, supporting research on program efficacy, supporting the work of organizations providing adjacent services, reducing agencies’ administrative burden, and holding government agencies accountable. Potential risks include breaches of individual privacy; irresponsible uses of the data by third parties; and the possibility that the data is not used at all, resulting in wasted resources.
In light of these benefits and risks, the report presents four recommended actions for publishing government data responsibly:
- Establish data governance processes and roles;
- Engage external communities;
- Ensure responsible use and privacy protection; and
- Evaluate resource constraints.
These key considerations also take into account federal and state laws as well as emerging computational and analytical techniques for protecting privacy when releasing data, such as differential privacy and synthetic data. Each of these techniques involves unique benefits and trade-offs to be considered in the context of the goals of a given data release…(More)”.
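As a hedged illustration of one of the techniques the report mentions, differential privacy via the Laplace mechanism adds calibrated noise to aggregate statistics before release. The sketch below is illustrative only; the epsilon values and count are made up, and a real release would require careful sensitivity analysis and privacy-budget accounting.

```python
# Minimal sketch of the Laplace mechanism for releasing a noisy count.
# Epsilon values and the count are illustrative, not a recommended configuration.
import numpy as np

rng = np.random.default_rng()

def dp_count(true_count, epsilon, sensitivity=1.0):
    """Release a count with Laplace noise scaled to sensitivity / epsilon."""
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# e.g., number of individuals receiving a service within one ZIP code
true_count = 137
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps:4}: released count ≈ {dp_count(true_count, eps):.1f}")
```

Smaller epsilon values add more noise (stronger privacy protection, less accuracy), which is exactly the kind of trade-off the report asks agencies to weigh against the goals of a given data release.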
U.S. Government Effort to Tap Private Weather Data Moves Along Slowly
Article by Isabelle Bousquette: “The U.S. government’s six-year-old effort to improve its weather forecasting ability by purchasing data from private-sector satellite companies has started to show results, although the process is moving more slowly than anticipated.
After a period of testing, the National Oceanic and Atmospheric Administration, a scientific, service and regulatory arm of the Commerce Department, began purchasing data from two satellite companies, Spire Global Inc. of Vienna, Va., and GeoOptics Inc. of Pasadena, Calif.
The weather data from these two companies fills gaps in coverage left by NOAA’s own satellites, the agency said. NOAA also began testing data from a third company this year.
Beyond these companies, new entrants to the field offering weather data based on a broader range of technologies have been slow to emerge, the agency said.
“We’re getting a subset of what we hoped,” said Dan St. Jean, deputy director of the Office of System Architecture and Advanced Planning at NOAA’s Satellite and Information Service.
NOAA’s weather forecasts help the government formulate hurricane evacuation plans and make other important decisions. The agency began seeking out private sources of satellite weather data in 2016. The idea was to find a more cost-effective alternative to funding NOAA’s own satellite constellations, the agency said. It also hoped to seed competition and innovation in the private satellite sector.
It isn’t yet clear whether there is a cost benefit to using private data, in part because the relatively small number of competitors in the market has made it challenging to determine a steady market price, NOAA said.
“All the signs in the nascent ‘new space’ industry indicated that there would be a plethora of venture capitalists wanting to compete for NOAA’s commercial pilot/purchase dollars. But that just never materialized,” said Mr. St. Jean…(More)”.
(Re)making data markets: an exploration of the regulatory challenges
Paper by Linnet Taylor, Hellen Mukiri-Smith, Tjaša Petročnik, Laura Savolainen & Aaron Martin: “Regulating the data market will be one of the major challenges of the twenty-first century. In order to think about regulating this market, however, we first need to make its dimensions and dynamics more accessible to observation and analysis. In this paper we explore what the state of the sociological and legal research on markets can tell us about the market for data: what kind of market it is, the practices and configurations of actors that constitute it, and what kinds of data are traded there. We start from the subjective opacity of this market to researchers interested in regulation and governance, review conflicting positions on its extent, diversity and regulability, and then explore comparisons from food and medicine regulation to understand the possible normative and practical implications and aims inherent in attempting to regulate how data is shared and traded. We conclude that there is a strong argument for a normative shift in the aims of regulation with regard to the data market, away from a prioritisation of the economic value of data and toward a more nuanced approach that aims to align the uses of data with the needs and rights of the communities reflected in it…(More)”.
Forest data governance as a reflection of forest governance: Institutional change and endurance in Finland and Canada
Paper by Salla Rantala, Brent Swallow, Anu Lähteenmäki-Uutela and Riikka Paloniemi: “The rapid development of new digital technologies for natural resource management has created a need to design and update governance regimes for effective and transparent generation, sharing and use of digital natural resource data. In this paper, we contribute to this novel area of investigation from the perspective of institutional change. We develop a conceptual framework to analyze how emerging natural resource data governance is shaped by related natural resource governance: complex, multilevel systems of actors, institutions and their interplay. We apply this framework to study forest data governance and its roots in forest governance in Finland and Canada. In Finland, an emphasis on open forest data and the associated legal reform represents the institutionalization of a mixed open data-bioeconomy discourse, pushed by higher-level institutional requirements towards greater openness and shaped by changing actor dynamics in relation to diverse forest values. In Canada, a strong institutional lock-in around public-private partnerships in forest management has engendered an approach that is based on voluntary data sharing agreements and fragmented data management, conforming with the entrenched interests of autonomous sub-national actors and thus extending the path-dependence of forest governance to forest data governance. We conclude by proposing how the framework could be further developed and tested to help explain which factors condition the formation of natural resource data institutions and subsequently the (re-)distribution of benefits they govern. Transparent and efficient data approaches can be enabled only if the analysis of data institutions is given as much attention as the technological development of data solutions…(More)”.
Designing Data Spaces: The Ecosystem Approach to Competitive Advantage
Open access book edited by Boris Otto, Michael ten Hompel, and Stefan Wrobel: “…provides a comprehensive view on data ecosystems and platform economics from methodical and technological foundations up to reports from practical implementations and applications in various industries.
To this end, the book is structured in four parts: Part I “Foundations and Contexts” provides a general overview of building, running, and governing data spaces and an introduction to the IDS and GAIA-X projects. Part II “Data Space Technologies” subsequently details various implementation aspects of IDS and GAIA-X, including, e.g., data usage control, the use of blockchain technologies, and semantic data integration and interoperability. Next, Part III describes various “Use Cases and Data Ecosystems” from application areas such as agriculture, healthcare, industry, energy, and mobility. Finally, Part IV offers an overview of several “Solutions and Applications”, including products and experiences from companies such as Google, SAP, Huawei, T-Systems, Innopay and many more.
Overall, the book provides professionals in industry with an encompassing overview of the technological and economic aspects of data spaces, based on the International Data Spaces and Gaia-X initiatives. It presents implementations and business cases and gives an outlook on future developments. In doing so, it aims to advance the vision of a social data market economy based on data spaces that embrace trust and data sovereignty…(More)”.
Identifying and addressing data asymmetries so as to enable (better) science
Paper by Stefaan Verhulst and Andrew Young: “As a society, we need to become more sophisticated in assessing and addressing data asymmetries—and their resulting political and economic power inequalities—particularly in the realm of open science, research, and development. This article seeks to start filling the analytical gap regarding data asymmetries globally, with a specific focus on the asymmetrical availability of privately-held data for open science, and a look at current efforts to address these data asymmetries. It provides a taxonomy of asymmetries, as well as both their societal and institutional impacts. Moreover, this contribution outlines a set of solutions that could provide a toolbox for open science practitioners and data demand-side actors that stand to benefit from increased access to data. The concept of data liquidity (and portability) is explored at length in connection with efforts to generate an ecosystem of responsible data exchanges. We also examine how data holders and demand-side actors are experimenting with new and emerging operational models and governance frameworks for purpose-driven, cross-sector data collaboratives that connect previously siloed datasets. Key solutions discussed include professionalizing and re-imagining data steward roles and functions (i.e., individuals or groups who are tasked with managing data and their ethical and responsible reuse within organizations). We present these solutions through case studies on notable efforts to address science data asymmetries. We examine these cases using a repurposable analytical framework that could inform future research. We conclude with recommended actions that could support the creation of an evidence base on work to address data asymmetries and unlock the public value of greater science data liquidity and responsible reuse…(More)”.