Eliminate data asymmetries to democratize data use


Article by Rahul Matthan: “Anyone who possesses a large enough store of data can reasonably expect to glean powerful insights from it. These insights are more often than not used to enhance advertising revenues or ensure greater customer stickiness. In other instances, they’ve been subverted to alter our political preferences and manipulate us into taking decisions we otherwise may not have.

The ability to generate insights places those who have access to these data sets at a distinct advantage over those whose data is contained within them. It allows the former to benefit from the data in ways that the latter may not even have thought possible when they consented to provide it. Given how easily these insights can be used to harm those to whom it pertains, there is a need to mitigate the effects of this data asymmetry.

Privacy law attempts to do this by providing data principals with tools they can use to exert control over their personal data. It requires data collectors to obtain informed consent from data principals before collecting their data and forbids them from using it for any purpose other than that which has been previously notified. This is why, even if that consent has been obtained, data fiduciaries cannot collect more data than is absolutely necessary to achieve the stated purpose and are only allowed to retain that data for as long as is necessary to fulfil the stated purpose.

In India, we’ve gone one step further and built techno-legal solutions to help reduce this data asymmetry. The Data Empowerment and Protection Architecture (DEPA) framework makes it possible to extract data from the silos in which they reside and transfer it on the instructions of the data principal to other entities, which can then use it to provide other services to the data principal. This data micro-portability dilutes the historical advantage that incumbents enjoy on account of collecting data over the entire duration of their customer engagement. It eliminates data asymmetries by establishing the infrastructure that creates a competitive market for data-based services, allowing data principals to choose from a range of options as to how their data could be used for their benefit by service providers.

This, however, is not the only type of asymmetry we have to deal with in this age of big data. In a recent article, Stefaan Verhulst of GovLab at New York University pointed out that it is no longer enough to possess large stores of data—you need to know how to effectively extract value from it. Many businesses might have vast stores of data that they have accumulated over the years they have been in operation, but very few of them are able to effectively extract useful signals from that noisy data.

Without the know-how to translate data into actionable information, merely owning a large data set is of little value.

Unlike data asymmetries, which can be mitigated by making data more widely available, information asymmetries can only be addressed by radically democratizing the techniques and know-how that are necessary for extracting value from data. This know-how is largely proprietary and hard to access even in a fully competitive market. What’s more, in many instances, the computation power required far exceeds the capacity of entities for whom data analysis is not the main purpose of their business…(More)”.

Data and displacement: Ethical and practical issues in data-driven humanitarian assistance for IDPs


Blog by Vicki Squire: “Ten years since the so-called “data revolution” (Pearn et al, 2022), the rise of “innovation” and the proliferation of “data solutions” has rendered the assessment of changing data practices within the humanitarian sector ever more urgent. New data acquisition modalities have provoked a range of controversies across multiple contexts and sites (e.g. Human Rights Watch, 2021, 2022a, 2022b). Moreover, a range of concerns have been raised about data sharing (e.g. Fast, 2022) and the inequities embedded within humanitarian data (e.g. Data Values, 2022).

With this in mind, the Data and Displacement project set out to explore the practical and ethical implications of data-driven humanitarian assistance in two contexts characterised by high levels of internal displacement: north-eastern Nigeria and South Sudan. Our interdisciplinary research team includes academics from each of the regions under analysis, as well as practitioners from the International Organization for Migration. From the start, the research was designed to centre the lived experiences of Internally Displaced Persons (IDPs), while also shedding light on the production and use of humanitarian data from multiple perspectives.

We conducted primary research during 2021-2022. Our research combines dataset analysis and visualisation techniques with a thematic analysis of 174 semi-structured qualitative interviews. In total we interviewed 182 people: 42 international data experts, donors, and humanitarian practitioners from a range of governmental and non-governmental organisations; 40 stakeholders and practitioners working with IDPs across north-eastern Nigeria and South Sudan (20 in each region); and 100 IDPs in camp-like settings (50 in each region). Our findings point to a disconnect between international humanitarian standards and practices on the ground, the need to revisit existing ethical guidelines such as informed consent, and the importance of investing in data literacies…(More)”.

Can Smartphones Help Predict Suicide?


Ellen Barry in The New York Times: “In March, Katelin Cruz left her latest psychiatric hospitalization with a familiar mix of feelings. She was, on the one hand, relieved to leave the ward, where aides took away her shoelaces and sometimes followed her into the shower to ensure that she would not harm herself.

But her life on the outside was as unsettled as ever, she said in an interview, with a stack of unpaid bills and no permanent home. It was easy to slide back into suicidal thoughts. For fragile patients, the weeks after discharge from a psychiatric facility are a notoriously difficult period, with a suicide rate around 15 times the national rate, according to one study.

This time, however, Ms. Cruz, 29, left the hospital as part of a vast research project which attempts to use advances in artificial intelligence to do something that has eluded psychiatrists for centuries: to predict who is likely to attempt suicide and when that person is likely to attempt it, and then, to intervene.

On her wrist, she wore a Fitbit programmed to track her sleep and physical activity. On her smartphone, an app was collecting data about her moods, her movement and her social interactions. Each device was providing a continuous stream of information to a team of researchers on the 12th floor of the William James Building, which houses Harvard’s psychology department.

In the field of mental health, few new areas generate as much excitement as machine learning, which uses computer algorithms to better predict human behavior. There is, at the same time, exploding interest in biosensors that can track a person’s mood in real time, factoring in music choices, social media posts, facial expression and vocal expression.

Matthew K. Nock, a Harvard psychologist who is one of the nation’s top suicide researchers, hopes to knit these technologies together into a kind of early-warning system that could be used when an at-risk patient is released from the hospital…(More)”.

Governing the Environment-Related Data Space


Stefaan G. Verhulst, Anthony Zacharzewski and Christian Hudson at Data & Policy: “Today, The GovLab and The Democratic Society published their report, “Governing the Environment-Related Data Space”, written by Jörn Fritzenkötter, Laura Hohoff, Paola Pierri, Stefaan G. Verhulst, Andrew Young, and Anthony Zacharzewski. The report captures the findings of their joint research centered on the responsible and effective reuse of environment-related data to achieve greater social and environmental impact.

Environment-related data (ERD) encompasses numerous kinds of data across a wide range of sectors. It can best be defined as data related to any element of the Driver-Pressure-State-Impact-Response (DPSIR) Framework. If leveraged effectively, this wealth of data could help society establish a sustainable economy, take action against climate change, and support environmental justice — as recognized recently by French President Emmanuel Macron and UN Secretary General’s Special Envoy for Climate Ambition and Solutions Michael R. Bloomberg when establishing the Climate Data Steering Committee.

While several actors are working to improve access to, as well as promote the (re)use of, ERD, two key challenges that hamper progress on this front are data asymmetries and data enclosures. Data asymmetries occur due to the ever-increasing amounts of ERD scattered across diverse actors, with larger and more powerful stakeholders often maintaining unequal access. Asymmetries lead to problems with accessibility and findability (data enclosures), leading to limited sharing and collaboration, and stunting the ability to use data and maximize its potential to address public ills.

The risks and costs of data enclosure and data asymmetries are high. Information bottlenecks cause resources to be misallocated, slow scientific progress, and limit our understanding of the environment.

A fit-for-purpose governance framework could offer a solution to these barriers by creating space for more systematic, sustainable, and responsible data sharing and collaboration. Better data sharing can in turn ease information flows, mitigate asymmetries, and minimize data enclosures.

And there are some clear criteria for an effective governance framework…(More)”

Designing a Data Sharing Tool Kit


Paper by Ilka Jussen, Julia Christina Schweihoff, Maleen Stachon and Frederik Möller: “Sharing data is essential to the success of modern data-driven business models. They play a crucial role for companies in creating new and better services and optimizing existing processes. While the interest in data sharing is growing, companies face an array of challenges preventing them from fully exploiting data sharing opportunities. Mitigating these risks and weighing them against their potential is a creative, interdisciplinary task in each company. The paper starts precisely at this point and proposes a Tool Kit with three Visual Inquiry Tools (VITs) to work on finding data sharing potential conjointly. We do this using a design-oriented research approach and contribute to research and practice by providing three VITs that help different stakeholders or companies in an ecosystem to visualize and design their data-sharing activities…(More)”.

Big Data and Official Statistics


Paper by Katharine G. Abraham: “The infrastructure and methods for developed countries’ economic statistics, largely established in the mid-20th century, rest almost entirely on survey and administrative data. The increasing difficulty of obtaining survey responses threatens the sustainability of this model. Meanwhile, users of economic data are demanding ever more timely and granular information. “Big data” originally created for other purposes offer the promise of new approaches to the compilation of economic data. Drawing primarily on the U.S. experience, the paper considers the challenges to incorporating big data into the ongoing production of official economic statistics and provides examples of progress towards that goal to date. Beyond their value for the routine production of a standard set of official statistics, new sources of data create opportunities to respond more nimbly to emerging needs for information. The concluding section of the paper argues that national statistical offices should expand their mission to seize these opportunities…(More)”.

Data Spaces: Design, Deployment and Future Directions


Open access book edited by Edward Curry, Simon Scerri, and Tuomo Tuikka: “…aims to educate data space designers to understand what is required to create a successful data space. It explores cutting-edge theory, technologies, methodologies, and best practices for data spaces for both industrial and personal data and provides the reader with a basis for understanding the design, deployment, and future directions of data spaces.

The book captures the early lessons and experience in creating data spaces. It arranges these contributions into three parts covering design, deployment, and future directions respectively.

  • The first part explores the design space of data spaces. The single chapters detail the organisational design for data spaces, data platforms, data governance, federated learning, personal data sharing, data marketplaces, and hybrid artificial intelligence for data spaces.
  • The second part describes the use of data spaces within real-world deployments. Its chapters are co-authored with industry experts and include case studies of data spaces in sectors including industry 4.0, food safety, FinTech, health care, and energy.
  • The third and final part details future directions for data spaces, including challenges and opportunities for common European data spaces and privacy-preserving techniques for trustworthy data sharing…(More)”.

A ‘Feminist’ Server to Help People Own Their Own Data


Article by Padmini Ray Murray: “All of our digital lives reside on servers – mostly in corporate server farms owned by the likes of Google, Amazon, Apple, and Microsoft. These farms contain machines that store massive volumes of data generated by every single user of the internet. These vast infrastructures allow people to store, connect, and exchange information on the internet.

Consequently, there is a massive distance between users and where and how the data is stored, which means that individuals have very little control over how their data is stored and used. However, due to the huge reliance on these massive corporate technologies, individuals are left with very little choice but to accept the terms dictated by these businesses. The conceptual alternative of the feminist server was created by groups of feminist and queer activists who were concerned about how little power they have over owning and managing their data on the internet. The idea of the feminist server was described as a project that is interested in “creating a more autonomous infrastructure to ensure that data, projects and memory of feminist groups are properly accessible, preserved and managed” – a safe digital library to store and manage content generated by feminist groups. This was also a direct challenge to the traditionally male-dominated spaces of computer hardware management, spaces which could be very exclusionary and hostile to women or queer individuals who might be interested in learning how to use these technologies. 

There are two related ways by which a server can be considered as feminist. The first is based on who runs the server, and the second is based on who owns the server. Feminist critics have pointed out how the running of servers is often in the hands of male experts who are not keen to share and explain the knowledge required to maintain a server – a role known as a systems admin or, colloquially, a “sysadmin” person. Thus the concept of feminist servers emerged out of a need to challenge patriarchal dominance in hardware and infrastructure spaces, to create alternatives that were nurturing, anti-capitalist, and worked on the basis of community and solidarity…(More)”.

New WHO policy requires sharing of all research data


Press release: “Science and public health can benefit tremendously from sharing and reuse of health data. Sharing data allows us to have the fullest possible understanding of health challenges, to develop new solutions, and to make decisions using the best available evidence.

The Research for Health department has helped spearhead the launch of a new policy from the Science Division which covers all research undertaken by or with support from WHO. The goal is to make sure that all research data is shared equitably, ethically and efficiently. Through this policy, WHO indicates its commitment to transparency in order to reach the goal of one billion more people enjoying better health and well-being.

The WHO policy is accompanied by practical guidance to enable researchers to develop and implement a data management and sharing plan, before the research has even started. The guide provides advice on the technical, ethical and legal considerations to ensure that data, even patient data, can be shared for secondary analysis without compromising personal privacy.  Data sharing is now a requirement for research funding awarded by WHO and TDR. 

“We have seen the problems caused by the lack of data sharing on COVID-19,” said Dr. Soumya Swaminathan, WHO Chief Scientist. “When data related to research activities are shared ethically, equitably and efficiently, there are major gains for science and public health.”

The policy to share data from all research funded or conducted by WHO, and practical guidance to do so, can be found here…(More)”.

The Public Good and Public Attitudes Toward Data Sharing Through IoT


Paper by Karen Mossberger, Seongkyung Cho and Pauline Cheong: “The Internet of Things has created a wealth of new data that is expected to deliver important benefits for IoT users and for society, including for the public good. Much of the literature has focused on data collection through individual adoption of IoT devices, and big data collection by companies with accompanying fears of data misuse. While citizens also increasingly produce data as they move about in public spaces, less is known about citizen support for data collection in smart city environments, or for data sharing for a variety of public-regarding purposes. Through a nationally representative survey of over 2,000 respondents as well as interviews, we explore the willingness of citizens to share their data with different parties and in various circumstances, using the contextual integrity framework, the literature on the ‘publicness’ of organizations, and public value creation. We describe the results of the survey across different uses, for data sharing from devices and for data collection in public spaces. We conduct multivariate regression to predict individual characteristics that influence attitudes toward use of IoT data for public purposes. Across different contexts, from half to 2/3 of survey respondents were willing to share data from their own IoT devices for public benefits, and 80-93% supported the use of sensors in public places for a variety of collective benefits. Yet government is less trusted with this data than other organizations with public purposes, such as universities, nonprofits and health care institutions. Trust in government, among other factors, was significantly related to data sharing and support for smart city data collection. Cultivating trust through transparent and responsible data stewardship will be important for future use of IoT data for public good…(More)”.