Trust Based Resolving of Conflicts for Collaborative Data Sharing in Online Social Networks


Paper by Nisha P. Shetty et al: “In the twenty-first century, the era of the Internet, social networking platforms like Facebook and Twitter play a predominant role in everybody’s life. The ever-increasing adoption of gadgets such as mobile phones and tablets has made social media available at all times. This recent surge in online interaction has made it imperative to provide ample protection against privacy breaches and to ensure fine-grained, personalized data publishing online. Privacy concerns over communal data shared amongst multiple users are not properly addressed on most social media platforms. The proposed work deals with effectively suggesting whether or not to grant access to data that is co-owned by multiple users. Conflicts in such scenarios are resolved by taking into consideration the privacy risk and confidentiality loss that would be observed if the data were shared. For secure sharing of data, a trust framework based on the users’ interest and interaction parameters is put forth. The proposed work can be extended to any multiuser data-sharing platform…(More)”.
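
The abstract does not spell out the scoring model, but the general idea it describes — weigh each co-owner’s trust in the requester against the privacy risk and confidentiality loss they would incur — can be sketched as follows. A minimal sketch, assuming illustrative field names, weights, and a threshold that are not taken from the paper:

```python
# Minimal sketch of a trust-weighted decision rule for co-owned content.
# The field names, weights, and threshold are illustrative assumptions,
# not the scoring model proposed in the paper.

from dataclasses import dataclass


@dataclass
class CoOwner:
    name: str
    trust_in_requester: float    # 0..1, e.g. derived from shared interests and interactions
    privacy_risk: float          # 0..1, perceived risk to this co-owner if the item is shared
    confidentiality_loss: float  # 0..1, expected loss of confidentiality on disclosure


def suggest_sharing(co_owners, threshold=0.5):
    """Suggest whether a co-owned item should be shared with a requester."""
    net_scores = []
    for owner in co_owners:
        expected_harm = 0.5 * owner.privacy_risk + 0.5 * owner.confidentiality_loss
        net_scores.append(owner.trust_in_requester - expected_harm)
    # Share only if co-owners are on average comfortable and nobody is strongly harmed.
    return sum(net_scores) / len(net_scores) >= threshold and min(net_scores) > -0.25


owners = [
    CoOwner("alice", trust_in_requester=0.9, privacy_risk=0.2, confidentiality_loss=0.1),
    CoOwner("bob", trust_in_requester=0.6, privacy_risk=0.4, confidentiality_loss=0.3),
]
print(suggest_sharing(owners))  # True under these toy values
```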

Uncovering the genetic basis of mental illness requires data and tools that aren’t just based on white people


Article by Hailiang Huang: “Mental illness is a growing public health problem. In 2019, an estimated 1 in 8 people around the world were affected by mental disorders like depression, schizophrenia or bipolar disorder. While scientists have long known that many of these disorders run in families, their genetic basis isn’t entirely clear. One reason why is that the majority of existing genetic data used in research is overwhelmingly from white people.

In 2003, the Human Genome Project generated the first “reference genome” of human DNA from a combination of samples donated by upstate New Yorkers, all of whom were of European ancestry. Researchers across many biomedical fields still use this reference genome in their work. But it doesn’t provide a complete picture of human genetics. Someone with a different genetic ancestry will have a number of variations in their DNA that aren’t captured by the reference sequence.

When most of the world’s ancestries are not represented in genomic data sets, studies won’t be able to provide a true representation of how diseases manifest across all of humanity. Despite this, ancestral diversity in genetic analyses hasn’t improved in the two decades since the Human Genome Project announced its first results. As of June 2021, over 80% of genetic studies have been conducted on people of European descent. Less than 2% have included people of African descent, even though these individuals have the most genetic variation of all human populations.

To uncover the genetic factors driving mental illness, Sinéad Chapman and our colleagues at the Broad Institute of MIT and Harvard have partnered with collaborators around the world to launch Stanley Global, an initiative that seeks to collect a more diverse range of genetic samples from beyond the U.S. and Northern Europe, and to train the next generation of researchers around the world. Not only does the genetic data lack diversity, but so do the tools and techniques scientists use to sequence and analyze human genomes. So we are implementing a new sequencing technology that addresses the inadequacies of previous approaches, which don’t account for the genetic diversity of global populations…(More)”.

Measuring Small Business Dynamics and Employment with Private-Sector Real-Time Data


Paper by André Kurmann, Étienne Lalé and Lien Ta: “The COVID-19 pandemic has led to an explosion of research using private-sector datasets to measure business dynamics and employment in real time. Yet questions remain about the representativeness of these datasets and how to distinguish business openings and closings from sample churn – i.e., sample entry of already operating businesses and sample exits of businesses that continue operating. This paper proposes new methods to address these issues and applies them to the case of Homebase, a real-time dataset of mostly small service-sector businesses that has been used extensively in the literature to study the effects of the pandemic. We match the Homebase establishment records with information on business activity from Safegraph, Google, and Facebook to assess the representativeness of the data and to estimate the probability of business closings and openings among sample exits and entries. We then exploit the high frequency and geographic detail of the data to study whether small service-sector businesses have been hit harder by the pandemic than larger firms, and the extent to which the Paycheck Protection Program (PPP) helped small businesses keep their workforce employed. We find that our real-time estimates of small business dynamics and employment during the pandemic are remarkably representative and closely fit population counterparts from administrative data that have recently become available. Distinguishing business closings and openings from sample churn is critical for these results. We also find that while employment by small businesses contracted more severely at the beginning of the pandemic than employment of larger businesses, it also recovered more strongly thereafter. In turn, our estimates suggest that the rapid rollout of PPP loans significantly mitigated the negative employment effects of the pandemic. Business closings and openings are a key driver of both results, underlining the importance of properly correcting for sample churn…(More)”.
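
The excerpt does not reproduce the authors’ matching and estimation procedure; the sketch below only illustrates the underlying idea of separating likely closings from sample churn by checking whether an establishment still shows activity in an external signal after it leaves the panel. The data frames, column names, and the simple decision rule are assumptions introduced for illustration.

```python
# Illustrative only: classify sample exits as likely closings vs. sample churn
# by checking for external activity after the establishment leaves the panel.
# Column names and the decision rule are assumptions, not the paper's estimator.

import pandas as pd

# Establishments with the last week they appear in the real-time sample.
exits = pd.DataFrame({
    "establishment_id": [1, 2, 3],
    "last_week_in_sample": pd.to_datetime(["2020-04-06", "2020-05-04", "2020-06-01"]),
})

# External activity signal (e.g. foot traffic or page activity) by week.
activity = pd.DataFrame({
    "establishment_id": [1, 1, 2, 3, 3],
    "week": pd.to_datetime(["2020-03-30", "2020-04-13", "2020-04-27", "2020-06-08", "2020-06-15"]),
})

merged = exits.merge(activity, on="establishment_id", how="left")
# Any activity observed after the exit week points to sample churn, not a closing.
merged["active_after_exit"] = merged["week"] > merged["last_week_in_sample"]
churn_flag = merged.groupby("establishment_id")["active_after_exit"].any()

exits["likely_closing"] = ~exits["establishment_id"].map(churn_flag).astype(bool)
print(exits[["establishment_id", "likely_closing"]])
```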

One Data Point Can Beat Big Data


Essay by Gerd Gigerenzer: “…In my research group at the Max Planck Institute for Human Development, we’ve studied simple algorithms (heuristics) that perform well under volatile conditions. One way to derive these rules is to rely on psychological AI: to investigate how the human brain deals with situations of disruption and change. Back in 1838, for instance, Thomas Brown formulated the Law of Recency, which states that recent experiences come to mind faster than those in the distant past and are often the sole information that guides human decisions. Contemporary research indicates that people do not automatically rely on what they recently experienced, but do so only in unstable situations where the distant past is not a reliable guide to the future. In this spirit, my colleagues and I developed and tested the following “brain algorithm”:

Recency heuristic for predicting the flu: Predict that this week’s proportion of flu-related doctor visits will equal those of the most recent data, from one week ago.

Unlike Google’s secret Flu Trends algorithm, this rule is transparent and can easily be applied by everyone. Its logic can be understood. It relies on only a single data point, which can be looked up on the website of the Centers for Disease Control and Prevention. And it dispenses with combing through 50 million search terms and the trial-and-error testing of millions of algorithms. But how well does it actually predict the flu?

Three fellow researchers and I tested the recency rule using the same eight years of data on which the Google Flu Trends algorithm was tested, that is, weekly observations between March 2007 and August 2015. During that time, the proportion of flu-related visits among all doctor visits ranged between one percent and eight percent, with an average of 1.8 percent of visits per week (Figure 1). This means that if every week you were to make the simple but false prediction that there are zero flu-related doctor visits, you would have a mean absolute error of 1.8 percentage points over four years. Google Flu Trends predicted much better than that, with a mean error of 0.38 percentage points (Figure 2). The recency heuristic had a mean error of only 0.20 percentage points, which is even better. If we exclude the period when the swine flu happened, that is, before the first update of Google Flu Trends, the result remains essentially the same (0.38 and 0.19, respectively)…(More)”.
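
The recency heuristic and the mean-absolute-error comparison are simple enough to write down in a few lines. A minimal sketch follows, run on made-up weekly proportions rather than the CDC series used in the essay:

```python
# Minimal sketch of the recency heuristic and its mean absolute error.
# The weekly series below is made up; the essay evaluates the rule on CDC data
# on the weekly proportion of flu-related doctor visits.

def recency_forecast(series):
    """Predict each week's value as the previous week's observed value."""
    return series[:-1]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(predicted)


weekly_flu_share = [1.2, 1.5, 2.3, 3.1, 2.8, 1.9, 1.4]  # toy data, % of doctor visits

predictions = recency_forecast(weekly_flu_share)   # forecasts for weeks 2..n
actuals = weekly_flu_share[1:]                     # the weeks being predicted
print(f"Recency heuristic MAE: {mean_absolute_error(actuals, predictions):.2f} percentage points")
```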

Nowcasting daily population displacement in Ukraine through social media advertising data


Pre-Publication Paper by Douglas R. Leasure et al: “In times of crisis, real-time data on population displacement are invaluable for a targeted humanitarian response. The Russian invasion of Ukraine on February 24, 2022 forcibly displaced millions of people from their homes, including nearly 6m refugees who crossed the border in just a few weeks, but information was scarce regarding the displaced and vulnerable populations who remained inside Ukraine. We leveraged near real-time social media marketing data to estimate sub-national population sizes every day, disaggregated by age and sex. Our metric of internal displacement estimated that 5.3m people had been internally displaced away from their baseline administrative region by March 14. Results revealed four distinct displacement patterns: large-scale evacuations, refugee staging areas, internal areas of refuge, and irregular dynamics. While this innovative approach provided one of the only quantitative estimates of internal displacement in virtual real time, we conclude by acknowledging risks and challenges for the future…(More)”.
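
The paper’s estimation pipeline is considerably more involved, but the core displacement metric implied by the summary — compare each region’s daily, audience-based population estimate against its pre-invasion baseline and sum the losses — can be sketched as follows. The region names and counts here are invented for illustration.

```python
# Toy sketch of a baseline-comparison displacement metric. Region names and
# counts are invented; the paper derives daily sub-national estimates from
# social media advertising audience data.

baseline = {"Kyiv": 2_950_000, "Kharkiv": 1_430_000, "Lviv": 720_000}  # pre-invasion
current = {"Kyiv": 2_100_000, "Kharkiv": 980_000, "Lviv": 1_050_000}   # a later day

# People displaced away from their baseline region: sum of regional losses.
displaced_away = sum(
    max(baseline[region] - current.get(region, 0), 0) for region in baseline
)

# Regions gaining population relative to baseline (e.g. internal areas of refuge).
receiving_regions = {
    region: current[region] - baseline[region]
    for region in baseline
    if current.get(region, 0) > baseline[region]
}

print(f"Estimated displaced away from baseline regions: {displaced_away:,}")
print("Regions gaining population:", receiving_regions)
```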

Closing the Data Divide for a More Equitable U.S. Digital Economy


Report by Gillian Diebold: “In the United States, access to many public and private services, including those in the financial, educational, and health-care sectors, is intricately linked to data. But adequate data is not collected equitably from all Americans, creating a new challenge: the data divide, in which many people do not have enough high-quality data collected about them or their communities and therefore cannot benefit from data-driven innovation. This report provides an overview of the data divide in the United States and offers recommendations for how policymakers can address these inequalities…(More)”.

Making Government Data Publicly Available: Guidance for Agencies on Releasing Data Responsibly


Report by Hugh Grant-Chapman and Hannah Quay-de la Vallee: “Government agencies rely on a wide range of data to effectively deliver services to the populations with which they engage. Civic-minded advocates frequently argue that the public benefits of this data can be better harnessed by making it available for public access. Recent years, however, have also seen growing recognition that the public release of government data can carry certain risks. Government agencies hoping to release data publicly should consider those potential risks in deciding which data to make publicly available and how to go about releasing it.

This guidance offers an introduction to making data publicly available while addressing privacy and ethical data use issues. It is intended for administrators at government agencies that deliver services to individuals — especially those at the state and local levels — who are interested in publicly releasing government data. This guidance focuses on challenges that may arise when releasing aggregated data derived from sensitive information, particularly individual-level data.

The report begins by highlighting key benefits and risks of making government data publicly available. Benefits include empowering members of the general public, supporting research on program efficacy, supporting the work of organizations providing adjacent services, reducing agencies’ administrative burden, and holding government agencies accountable. Potential risks include breaches of individual privacy; irresponsible uses of the data by third parties; and the possibility that the data is not used at all, resulting in wasted resources.

In light of these benefits and risks, the report presents four recommended actions for publishing government data responsibly:

  1. Establish data governance processes and roles;
  2. Engage external communities;
  3. Ensure responsible use and privacy protection; and
  4. Evaluate resource constraints.

These key considerations also take into account federal and state laws as well as emerging computational and analytical techniques for protecting privacy when releasing data, such as differential privacy and synthetic data. Each of these techniques involves unique benefits and trade-offs to be considered in the context of the goals of a given data release…(More)”.
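
As a concrete illustration of one of the techniques named above, the sketch below releases a differentially private count by adding Laplace noise scaled to the privacy parameter. The epsilon value and the raw count are placeholders; a real release would also involve careful privacy budgeting and sensitivity analysis.

```python
# Illustrative Laplace mechanism for releasing a differentially private count.
# The epsilon value and the raw count are placeholders; real releases require
# careful privacy budgeting, sensitivity analysis, and review.

import numpy as np


def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Add Laplace noise with scale sensitivity/epsilon to a count."""
    scale = sensitivity / epsilon
    return true_count + np.random.laplace(loc=0.0, scale=scale)


np.random.seed(42)
# e.g. number of program participants in a county, released with epsilon = 0.5
print(round(dp_count(true_count=1_284, epsilon=0.5)))
```

Smaller epsilon values add more noise and provide stronger privacy protection, at the cost of less precise published figures.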

U.S. Government Effort to Tap Private Weather Data Moves Along Slowly


Article by Isabelle Bousquette: “The U.S. government’s six-year-old effort to improve its weather forecasting ability by purchasing data from private-sector satellite companies has started to show results, although the process is moving more slowly than anticipated.

After a period of testing, the National Oceanic and Atmospheric Administration, a scientific, service and regulatory arm of the Commerce Department, began purchasing data from two satellite companies, Spire Global Inc. of Vienna, Va., and GeoOptics Inc. of Pasadena, Calif.

The weather data from these two companies fills gaps in coverage left by NOAA’s own satellites, the agency said. NOAA also began testing data from a third company this year.

Beyond these companies, new entrants to the field offering weather data based on a broader range of technologies have been slow to emerge, the agency said.

“We’re getting a subset of what we hoped,” said Dan St. Jean, deputy director of the Office of System Architecture and Advanced Planning at NOAA’s Satellite and Information Service.

NOAA’s weather forecasts help the government formulate hurricane evacuation plans and make other important decisions. The agency began seeking out private sources of satellite weather data in 2016. The idea was to find a more cost-effective alternative to funding NOAA’s own satellite constellations, the agency said. It also hoped to seed competition and innovation in the private satellite sector.

It isn’t yet clear whether there is a cost benefit to using private data, in part because the relatively small number of competitors in the market has made it challenging to determine a steady market price, NOAA said.

“All the signs in the nascent ‘new space’ industry indicated that there would be a plethora of venture capitalists wanting to compete for NOAA’s commercial pilot/purchase dollars. But that just never materialized,” said Mr. St. Jean…(More)”.

(Re)making data markets: an exploration of the regulatory challenges


Paper by Linnet Taylor, Hellen Mukiri-Smith, Tjaša Petročnik, Laura Savolainen & Aaron Martin: “Regulating the data market will be one of the major challenges of the twenty-first century. In order to think about regulating this market, however, we first need to make its dimensions and dynamics more accessible to observation and analysis. In this paper we explore what the state of the sociological and legal research on markets can tell us about the market for data: what kind of market it is, the practices and configurations of actors that constitute it, and what kinds of data are traded there. We start from the subjective opacity of this market to researchers interested in regulation and governance, review conflicting positions on its extent, diversity and regulability, and then explore comparisons from food and medicine regulation to understand the possible normative and practical implications and aims inherent in attempting to regulate how data is shared and traded. We conclude that there is a strong argument for a normative shift in the aims of regulation with regard to the data market, away from a prioritisation of the economic value of data and toward a more nuanced approach that aims to align the uses of data with the needs and rights of the communities reflected in it…(More)”

Forest data governance as a reflection of forest governance: Institutional change and endurance in Finland and Canada


Paper by Salla Rantala, Brent Swallow, Anu Lähteenmäki-Uutela and Riikka Paloniemi: “The rapid development of new digital technologies for natural resource management has created a need to design and update governance regimes for effective and transparent generation, sharing and use of digital natural resource data. In this paper, we contribute to this novel area of investigation from the perspective of institutional change. We develop a conceptual framework to analyze how emerging natural resource data governance is shaped by related natural resource governance: complex, multilevel systems of actors, institutions and their interplay. We apply this framework to study forest data governance and its roots in forest governance in Finland and Canada. In Finland, an emphasis on open forest data and the associated legal reform represents the institutionalization of a mixed open data-bioeconomy discourse, pushed by higher-level institutional requirements towards greater openness and shaped by changing actor dynamics in relation to diverse forest values. In Canada, a strong institutional lock-in around public-private partnerships in forest management has engendered an approach based on voluntary data sharing agreements and fragmented data management, conforming to the entrenched interests of autonomous sub-national actors and thus extending the path dependence of forest governance to forest data governance. We conclude by proposing how the framework could be further developed and tested to help explain which factors condition the formation of natural resource data institutions and, subsequently, the (re-)distribution of the benefits they govern. Transparent and efficient data approaches can be enabled only if the analysis of data institutions is given attention equal to that devoted to the technological development of data solutions…(More)”.