Paper by Katharine G. Abraham: “The infrastructure and methods for developed countries’ economic statistics, largely established in the mid-20th century, rest almost entirely on survey and administrative data. The increasing difficulty of obtaining survey responses threatens the sustainability of this model. Meanwhile, users of economic data are demanding ever more timely and granular information. “Big data” originally created for other purposes offer the promise of new approaches to the compilation of economic data. Drawing primarily on the U.S. experience, the paper considers the challenges to incorporating big data into the ongoing production of official economic statistics and provides examples of progress towards that goal to date. Beyond their value for the routine production of a standard set of official statistics, new sources of data create opportunities to respond more nimbly to emerging needs for information. The concluding section of the paper argues that national statistical offices should expand their mission to seize these opportunities…(More)”.
Data Spaces: Design, Deployment and Future Directions
Open access book edited by Edward Curry, Simon Scerri, and Tuomo Tuikka: “…aims to educate data space designers to understand what is required to create a successful data space. It explores cutting-edge theory, technologies, methodologies, and best practices for data spaces for both industrial and personal data and provides the reader with a basis for understanding the design, deployment, and future directions of data spaces.
The book captures the early lessons and experience in creating data spaces. It arranges these contributions into three parts covering design, deployment, and future directions respectively.
- The first part explores the design space of data spaces. Individual chapters detail organisational design for data spaces, data platforms, data governance, federated learning, personal data sharing, data marketplaces, and hybrid artificial intelligence for data spaces.
- The second part describes the use of data spaces within real-world deployments. Its chapters are co-authored with industry experts and include case studies of data spaces in sectors including Industry 4.0, food safety, FinTech, health care, and energy.
- The third and final part details future directions for data spaces, including challenges and opportunities for common European data spaces and privacy-preserving techniques for trustworthy data sharing…(More)”.
A ‘Feminist’ Server to Help People Own Their Own Data
Article by Padmini Ray Murray: “All of our digital lives reside on servers – mostly in corporate server farms owned by the likes of Google, Amazon, Apple, and Microsoft. These farms contain machines that store massive volumes of data generated by every single user of the internet, and these vast infrastructures allow people to store, connect, and exchange information online.
Consequently, there is a vast distance between users and where and how their data is stored, which means that individuals have very little control over how their data is stored and used. Yet given the heavy reliance on these massive corporate technologies, individuals are left with little choice but to accept the terms dictated by these businesses. The conceptual alternative of the feminist server was created by groups of feminist and queer activists who were concerned about how little power they had over owning and managing their data on the internet. The feminist server was described as a project interested in “creating a more autonomous infrastructure to ensure that data, projects and memory of feminist groups are properly accessible, preserved and managed” – a safe digital library to store and manage content generated by feminist groups. It was also a direct challenge to the traditionally male-dominated spaces of computer hardware management, spaces which could be very exclusionary and hostile to women or queer individuals who might be interested in learning how to use these technologies.
There are two related ways in which a server can be considered feminist. The first is based on who runs the server, and the second on who owns it. Feminist critics have pointed out that the running of servers is often in the hands of male experts who are not keen to share and explain the knowledge required to maintain a server – a role known as systems administrator or, colloquially, “sysadmin”. Thus the concept of feminist servers emerged out of a need to challenge patriarchal dominance in hardware and infrastructure spaces, and to create alternatives that were nurturing, anti-capitalist, and worked on the basis of community and solidarity…(More)”.
New WHO policy requires sharing of all research data
Press release: “Science and public health can benefit tremendously from sharing and reuse of health data. Sharing data allows us to have the fullest possible understanding of health challenges, to develop new solutions, and to make decisions using the best available evidence.
The Research for Health department has helped spearhead the launch of a new policy from the Science Division which covers all research undertaken by or with support from WHO. The goal is to make sure that all research data is shared equitably, ethically and efficiently. Through this policy, WHO indicates its commitment to transparency in order to reach the goal of one billion more people enjoying better health and well-being.
The WHO policy is accompanied by practical guidance to enable researchers to develop and implement a data management and sharing plan, before the research has even started. The guide provides advice on the technical, ethical and legal considerations to ensure that data, even patient data, can be shared for secondary analysis without compromising personal privacy. Data sharing is now a requirement for research funding awarded by WHO and TDR.
“We have seen the problems caused by the lack of data sharing on COVID-19,” said Dr. Soumya Swaminathan, WHO Chief Scientist. “When data related to research activities are shared ethically, equitably and efficiently, there are major gains for science and public health.”
The policy to share data from all research funded or conducted by WHO, and practical guidance to do so, can be found here…(More)”.
The Public Good and Public Attitudes Toward Data Sharing Through IoT
Paper by Karen Mossberger, Seongkyung Cho and Pauline Cheong: “The Internet of Things has created a wealth of new data that is expected to deliver important benefits for IoT users and for society, including for the public good. Much of the literature has focused on data collection through individual adoption of IoT devices, and on big data collection by companies, with accompanying fears of data misuse. While citizens also increasingly produce data as they move about in public spaces, less is known about citizen support for data collection in smart city environments, or for data sharing for a variety of public-regarding purposes. Through a nationally representative survey of over 2,000 respondents, as well as interviews, we explore the willingness of citizens to share their data with different parties and in various circumstances, using the contextual integrity framework, the literature on the ‘publicness’ of organizations, and public value creation. We describe the results of the survey across different uses, for data sharing from devices and for data collection in public spaces. We conduct multivariate regression to predict the individual characteristics that influence attitudes toward the use of IoT data for public purposes. Across different contexts, between half and two-thirds of survey respondents were willing to share data from their own IoT devices for public benefits, and 80–93% supported the use of sensors in public places for a variety of collective benefits. Yet government is less trusted with these data than other organizations with public purposes, such as universities, nonprofits and health care institutions. Trust in government, among other factors, was significantly related to data sharing and support for smart city data collection. Cultivating trust through transparent and responsible data stewardship will be important for the future use of IoT data for public good…(More)”.
Trust Based Resolving of Conflicts for Collaborative Data Sharing in Online Social Networks
Paper by Nisha P. Shetty et al: “In the twenty-first century, the era of the Internet, social networking platforms like Facebook and Twitter play a predominant role in everybody’s life. The ever-increasing adoption of gadgets such as mobile phones and tablets has made social media available at all times. This recent surge in online interaction has made it imperative to have ample protection against privacy breaches to ensure fine-grained, personalized data publishing online. Privacy concerns over communal data shared amongst multiple users are not properly addressed in most social media platforms. The proposed work deals with effectively suggesting whether or not to grant access to data that is co-owned by multiple users. Conflicts in such scenarios are resolved by taking into consideration the privacy risk and confidentiality loss incurred if the data is shared. For secure sharing of data, a trust framework based on the user’s interest and interaction parameters is put forth. The proposed work can be extended to any multiuser data-sharing platform….(More)”.
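The core decision rule described here – grant or deny access to co-owned content by weighing trust against the privacy risk of sharing – can be sketched in a few lines. The sketch below is an illustrative reconstruction under assumed scoring: the weights, trust scores, and threshold are invented for exposition and are not taken from the paper.

```python
# Illustrative sketch of a trust-weighted decision on co-owned content.
# The scoring formula, example scores, and threshold are assumptions for
# illustration; they are not taken from the Shetty et al. paper.

from dataclasses import dataclass

@dataclass
class CoOwner:
    name: str
    trust_in_requester: float  # 0..1, e.g. derived from interaction history
    privacy_risk: float        # 0..1, perceived confidentiality loss if shared

def share_decision(co_owners: list[CoOwner], threshold: float = 0.0) -> bool:
    """Share only if the average (trust - risk) across co-owners clears the threshold."""
    net = sum(o.trust_in_requester - o.privacy_risk for o in co_owners) / len(co_owners)
    return net >= threshold

owners = [
    CoOwner("alice", trust_in_requester=0.9, privacy_risk=0.2),
    CoOwner("bob",   trust_in_requester=0.4, privacy_risk=0.7),
]
print("Grant access:", share_decision(owners))  # True: aggregate trust outweighs aggregate risk
```

The design point is that every co-owner's stake enters the decision, rather than only the uploader's, which is exactly the gap in communal data sharing that the paper targets.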
Uncovering the genetic basis of mental illness requires data and tools that aren’t just based on white people
Article by Hailiang Huang: “Mental illness is a growing public health problem. In 2019, an estimated 1 in 8 people around the world were affected by mental disorders like depression, schizophrenia or bipolar disorder. While scientists have long known that many of these disorders run in families, their genetic basis isn’t entirely clear. One reason why is that the majority of existing genetic data used in research is overwhelmingly from white people.
In 2003, the Human Genome Project generated the first “reference genome” of human DNA from a combination of samples donated by upstate New Yorkers, all of whom were of European ancestry. Researchers across many biomedical fields still use this reference genome in their work. But it doesn’t provide a complete picture of human genetics. Someone with a different genetic ancestry will have a number of variations in their DNA that aren’t captured by the reference sequence.
When most of the world’s ancestries are not represented in genomic data sets, studies won’t be able to provide a true representation of how diseases manifest across all of humanity. Despite this, ancestral diversity in genetic analyses hasn’t improved in the two decades since the Human Genome Project announced its first results. As of June 2021, over 80% of genetic studies have been conducted on people of European descent. Less than 2% have included people of African descent, even though these individuals have the most genetic variation of all human populations.
To uncover the genetic factors driving mental illness, I, Sinéad Chapman and our colleagues at the Broad Institute of MIT and Harvard have partnered with collaborators around the world to launch Stanley Global, an initiative that seeks to collect a more diverse range of genetic samples from beyond the U.S. and Northern Europe and to train the next generation of global researchers. Not only does the genetic data lack diversity, but so do the tools and techniques scientists use to sequence and analyze human genomes. So we are implementing a new sequencing technology that addresses the inadequacies of previous approaches, which don’t account for the genetic diversity of global populations…(More)”.
Measuring Small Business Dynamics and Employment with Private-Sector Real-Time Data
Paper by André Kurmann, Étienne Lalé and Lien Ta: “The COVID-19 pandemic has led to an explosion of research using private-sector datasets to measure business dynamics and employment in real time. Yet questions remain about the representativeness of these datasets and how to distinguish business openings and closings from sample churn – i.e., sample entry of already operating businesses and sample exit of businesses that continue operating. This paper proposes new methods to address these issues and applies them to the case of Homebase, a real-time dataset of mostly small service-sector businesses that has been used extensively in the literature to study the effects of the pandemic. We match the Homebase establishment records with information on business activity from Safegraph, Google, and Facebook to assess the representativeness of the data and to estimate the probability of business closings and openings among sample exits and entries. We then exploit the high frequency and geographic detail of the data to study whether small service-sector businesses have been hit harder by the pandemic than larger firms, and the extent to which the Paycheck Protection Program (PPP) helped small businesses keep their workforce employed. We find that our real-time estimates of small business dynamics and employment during the pandemic are remarkably representative and closely fit population counterparts from administrative data that have recently become available. Distinguishing business closings and openings from sample churn is critical for these results. We also find that while employment by small businesses contracted more severely at the beginning of the pandemic than employment at larger businesses, it also recovered more strongly thereafter. In turn, our estimates suggest that the rapid rollout of PPP loans significantly mitigated the negative employment effects of the pandemic. Business closings and openings are a key driver of both results, thus underlining the importance of properly correcting for sample churn…(More)”.
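The churn-correction idea at the heart of the paper can be made concrete: an establishment that drops out of the sample should count as a closing only with some probability, informed by whether matched outside sources (Safegraph, Google, Facebook) still show activity. The sketch below is a loose paraphrase of that logic; the field names and conditional probabilities are invented placeholders, not the paper's estimates.

```python
# Loose sketch of separating true business closings from sample churn.
# Field names and probabilities are invented for illustration; the paper's
# actual estimator matches Homebase records to Safegraph/Google/Facebook.

def estimated_closings(sample_exits):
    """
    sample_exits: list of dicts, one per establishment that left the sample,
    each with a boolean 'external_activity' flag from matched outside data.
    Returns the expected number of true closings among the exits.
    """
    # Assumed conditional probabilities (hypothetical values):
    P_CLOSING_IF_INACTIVE = 0.85  # no outside activity -> likely a true closing
    P_CLOSING_IF_ACTIVE = 0.05    # still active elsewhere -> likely sample churn

    return sum(
        P_CLOSING_IF_ACTIVE if e["external_activity"] else P_CLOSING_IF_INACTIVE
        for e in sample_exits
    )

exits = [
    {"id": "est-001", "external_activity": True},   # probably sample churn
    {"id": "est-002", "external_activity": False},  # probably a true closing
    {"id": "est-003", "external_activity": False},
]
print(f"Expected true closings: {estimated_closings(exits):.2f} of {len(exits)} exits")
```

Counting every exit as a closing would overstate business deaths (here, 3 instead of roughly 1.75), which is why the paper stresses that correcting for churn is critical to its results.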
One Data Point Can Beat Big Data
Essay by Gerd Gigerenzer: “…In my research group at the Max Planck Institute for Human Development, we’ve studied simple algorithms (heuristics) that perform well under volatile conditions. One way to derive these rules is to rely on psychological AI: to investigate how the human brain deals with situations of disruption and change. Back in 1838, for instance, Thomas Brown formulated the Law of Recency, which states that recent experiences come to mind faster than those in the distant past and are often the sole information guiding human decisions. Contemporary research indicates that people do not automatically rely on what they recently experienced, but do so only in unstable situations where the distant past is not a reliable guide to the future. In this spirit, my colleagues and I developed and tested the following “brain algorithm”:
Recency heuristic for predicting the flu: Predict that this week’s proportion of flu-related doctor visits will equal those of the most recent data, from one week ago.
Unlike Google’s secret Flu Trends algorithm, this rule is transparent and can easily be applied by anyone. Its logic can be understood. It relies on a single data point only, which can be looked up on the website of the Centers for Disease Control and Prevention. And it dispenses with combing through 50 million search terms and trial-and-error testing of millions of algorithms. But how well does it actually predict the flu?
Three fellow researchers and I tested the recency rule on the same eight years of data on which the Google Flu Trends algorithm was tested, that is, weekly observations between March 2007 and August 2015. During that time, the proportion of flu-related visits among all doctor visits ranged between one percent and eight percent, with an average of 1.8 percent per week (Figure 1). This means that if every week you made the simple but false prediction of zero flu-related doctor visits, you would have a mean absolute error of 1.8 percentage points over the whole period. Google Flu Trends predicted much better than that, with a mean error of 0.38 percentage points (Figure 2). The recency heuristic had a mean error of only 0.20 percentage points, which is even better. If we exclude the period of the swine flu outbreak, that is, the period before the first update of Google Flu Trends, the results remain essentially the same (0.38 and 0.19, respectively)….(More)”.
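For concreteness, the recency heuristic and its evaluation by mean absolute error fit in a few lines of code. The sketch below uses made-up weekly numbers in place of the actual CDC series, so only the mechanics – not the reported error figures – carry over.

```python
# Recency heuristic: predict that this week's flu-visit share equals last
# week's observed value. The weekly_pct series is an invented placeholder,
# NOT the CDC data used in the essay.

def recency_forecast(series):
    """Forecast each week as the previous week's observation."""
    return series[:-1]  # forecast for weeks 1..n-1 is the value at weeks 0..n-2

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(predicted)

# Placeholder: percentage of doctor visits that were flu-related, by week.
weekly_pct = [1.2, 1.4, 1.9, 2.8, 3.5, 2.9, 2.1, 1.6, 1.3, 1.1]

actual = weekly_pct[1:]                  # the heuristic is scored from week 2 on
predicted = recency_forecast(weekly_pct)

print(f"Recency heuristic MAE: {mean_absolute_error(actual, predicted):.2f} pp")
# The always-zero baseline mentioned in the essay, for comparison:
print(f"Zero baseline MAE:     {mean_absolute_error(actual, [0.0]*len(actual)):.2f} pp")
```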
Nowcasting daily population displacement in Ukraine through social media advertising data
Pre-publication paper by Douglas R. Leasure et al: “In times of crisis, real-time data mapping population displacements are invaluable for a targeted humanitarian response. The Russian invasion of Ukraine on February 24, 2022 forcibly displaced millions of people from their homes, including nearly 6m refugees flowing across the border in just a few weeks, but information was scarce regarding displaced and vulnerable populations who remained inside Ukraine. We leveraged near-real-time social media marketing data to estimate sub-national population sizes every day, disaggregated by age and sex. Our metric of internal displacement estimated that 5.3m people had been internally displaced away from their baseline administrative region by March 14. Results revealed four distinct displacement patterns: large-scale evacuations, refugee staging areas, internal areas of refuge, and irregular dynamics. While this innovative approach provided one of the only quantitative estimates of internal displacement in virtual real time, we conclude by acknowledging risks and challenges for the future…(More)”.
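As a rough illustration of the displacement metric described above – comparing near-real-time regional population estimates against a pre-invasion baseline – a minimal sketch might look like the following. The region names and counts are invented, and the published method additionally corrects for social-media penetration and disaggregates by age and sex, which this omits.

```python
# Minimal sketch of a baseline-deviation displacement metric.
# Regions and counts are invented placeholders; the published method
# additionally corrects for platform penetration and disaggregates
# estimates by age and sex.

def internal_displacement(baseline: dict, current: dict) -> float:
    """Sum of population losses across regions relative to baseline.

    Under the simplifying assumption of a closed population, people missing
    from their baseline region have moved to (or through) another region.
    """
    return sum(
        max(baseline[region] - current.get(region, 0), 0)
        for region in baseline
    )

baseline_pop = {"Kyiv": 2_900_000, "Kharkiv": 1_400_000, "Lviv": 720_000}
current_pop  = {"Kyiv": 1_800_000, "Kharkiv":   900_000, "Lviv": 1_300_000}

displaced = internal_displacement(baseline_pop, current_pop)
print(f"Estimated internally displaced: {displaced:,}")  # 1,600,000 here
```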