Data for Social Good: Non-Profit Sector Data Projects


Open Access Book by Jane Farmer, Anthony McCosker, Kath Albury & Amir Aryani: “In February 2020, just pre-COVID, a group of managers from community organisations met with us researchers about data for social good. “We want to collaborate with data,” said one CEO. “We want to find the big community challenges, work together to fix them and monitor the change we make over ten years.” The managers created a small, pooled fund and, through the 2020–2021 COVID lockdowns, used Zoom to workshop. Together we identified organisations’ datasets, probed their strengths and weaknesses, and found ways to share and visualise data. There were early frustrations about what data was available, its ‘granularity’ and whether new insights about the community could be found, but about half-way through the project, there was a tipping point, and something changed. While still focused on discovery from visualisations comparing their data by suburb, the group started to talk about other benefits. Through drawing in staff from across their organisations, they saw how the work of departments could be integrated by using data, and they developed new confidence in using analytics techniques. Together, the organisations developed an understanding of each other’s missions and services, while developing new relationships, trust and awareness of the possibilities of collaborating to address community needs. Managers completed the pilot having codesigned an interactive Community Resilience Dashboard, which enabled them to visualise their own organisations’ data and open public data to reveal new landscapes about community financial wellbeing and social determinants of health. They agreed they also had so much more: a collective data-capable partnership, internally and across organisations, with new potential to achieve community social justice driven by data.

We use this story to signify how right now is a special—indeed critical—time for non-profit organisations and communities to build their capability to work with data. Certainly, in high-income countries, there is pressure on non-profits to operate like commercial businesses—prioritising efficiency and using data about their outputs and impacts to compete for funding. However, beyond the immediate operational horizon, non-profits can use data analytics techniques to drive community social justice and potentially strengthen the institutional capability of the whole social welfare sector. Non-profits generate a lot of data, but innovating with technology is not a traditional competence, and it demands infrastructure investment and a specialist workforce. Given non-profits' meagre access to funding, this book examines how organisations of different types and sizes can use data for social good and find a path to data capability. The aim is to inspire and give practical examples of how non-profits can make data useful. While an emerging range of novel data-for-social-good cases can be found around the world, the case studies featured in this book exemplify our research and developing thinking in experimental data projects with diverse non-profits that harnessed various types of data. We outline a way to gain data capability through collaborating internally across departments and externally with other non-profits and skilled data analytics partners. We term this way of working collaborative data action…(More)”.

Wicked Problems Might Inspire Greater Data Sharing


Paper by Susan Ariel Aaronson: “In 2021, the United Nations Development Program issued a plea in its Digital Economy Report. “Global data-sharing can help address major global development challenges such as poverty, health, hunger and climate change. …Without global cooperation on data and information, research to develop the vaccine and actions to tackle the impact of the pandemic would have been a much more difficult task. Thus, in the same way as some data can be public goods, there is a case for some data to be considered as global public goods, which need to be addressed and provided through global governance.” (UNDP: 2021, 178). Global public goods are goods and services with benefits and costs that potentially extend to all countries, people, and generations. Global data sharing can also help solve what scholars call wicked problems—problems so complex that they require innovative, cost-effective and global mitigating strategies. Wicked problems are problems that no one knows how to solve without creating further problems. Hence, policymakers must find ways to encourage greater data sharing among entities that hold large troves of various types of data, while protecting that data from theft, manipulation, etc. Many factors impede global data sharing for public good purposes; this analysis focuses on two.
First, policymakers generally don’t think about data as a global public good; they view data as a commercial asset that they should nurture and control. While they may understand that data can serve the public interest, they are more concerned with using data to serve their country’s economic interest. Second, many leaders of civil society and business see the data they have collected as proprietary. So far, many leaders of private entities with troves of data are not convinced that their organization will benefit from such sharing. At the same time, companies do voluntarily share some data for social good purposes.

However, data cannot meet its public good purpose if it is not shared among societal entities. Moreover, if policymakers treat data as a sovereign asset, they are unlikely to encourage data sharing across borders oriented towards addressing shared problems. Consequently, society will be less able to use data both as a commercial asset and as a resource to enhance human welfare. As the Bennett Institute and ODI have argued, “value comes from data being brought together, and that requires organizations to let others use the data they hold.” But that also means the entities that collected the data may not accrue all of the benefits from that data (Bennett Institute and ODI: 2020a: 4). In short, private entities are not sufficiently incentivized to share data for the global public good…(More)”.

Global healthcare fairness: We should be sharing more, not less, data


Paper by Kenneth P. Seastedt et al: “The availability of large, deidentified health datasets has enabled significant innovation in using machine learning (ML) to better understand patients and their diseases. However, questions remain regarding the true privacy of this data, patient control over their data, and how we regulate data sharing in a way that does not encumber progress or further potentiate biases for underrepresented populations. After reviewing the literature on potential reidentifications of patients in publicly available datasets, we argue that the cost of slowing ML progress—measured in terms of access to future medical innovations and clinical software—is too great to justify limiting data sharing through large publicly available databases over concerns of imperfect data anonymization. This cost is especially great for developing countries, where the barriers preventing inclusion in such databases will continue to rise, further excluding these populations and increasing existing biases that favor high-income countries. Preventing artificial intelligence’s progress towards precision medicine and sliding back to clinical practice dogma may pose a larger threat than concerns of potential patient reidentification within publicly available datasets. While the risk to patient privacy should be minimized, we believe this risk will never be zero, and society has to determine an acceptable risk threshold below which data sharing can occur—for the benefit of a global medical knowledge system….(More)”.
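The paper's argument turns on the idea that reidentification risk can be measured and bounded rather than eliminated. One common (and deliberately simple) way to quantify residual risk in a released dataset is k-anonymity. The sketch below is illustrative only, with invented records and column names, and is not drawn from the paper itself.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier columns.

    A record in a class of size k can only be narrowed down to 1-in-k
    individuals by an attacker who knows those attributes.
    """
    groups = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return min(groups.values())

# Toy "deidentified" dataset: age band and ZIP prefix remain as quasi-identifiers.
records = [
    {"age": "30-39", "zip3": "021", "dx": "asthma"},
    {"age": "30-39", "zip3": "021", "dx": "diabetes"},
    {"age": "40-49", "zip3": "021", "dx": "asthma"},
]
print(k_anonymity(records, ["age", "zip3"]))  # the 40-49 record is unique: k = 1
```

A regulator choosing an "acceptable risk threshold" in the paper's sense would, in this framing, set a minimum k that a dataset must satisfy before release.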

Eliminate data asymmetries to democratize data use


Article by Rahul Matthan: “Anyone who possesses a large enough store of data can reasonably expect to glean powerful insights from it. These insights are more often than not used to enhance advertising revenues or ensure greater customer stickiness. In other instances, they’ve been subverted to alter our political preferences and manipulate us into taking decisions we otherwise may not have.

The ability to generate insights places those who have access to these data sets at a distinct advantage over those whose data is contained within them. It allows the former to benefit from the data in ways that the latter may not even have thought possible when they consented to provide it. Given how easily these insights can be used to harm the people to whom the data pertains, there is a need to mitigate the effects of this data asymmetry.

Privacy law attempts to do this by providing data principals with tools they can use to exert control over their personal data. It requires data collectors to obtain informed consent from data principals before collecting their data and forbids them from using it for any purpose other than that which has been previously notified. This is why, even if that consent has been obtained, data fiduciaries cannot collect more data than is absolutely necessary to achieve the stated purpose and are only allowed to retain that data for as long as is necessary to fulfil the stated purpose.
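The obligations described here (purpose limitation and bounded retention) can be made concrete in code. The sketch below is a minimal, hypothetical illustration of how a system might enforce both at access time; the class and field names are invented, not taken from any statute or from the article.

```python
from datetime import datetime, timedelta, timezone

class ConsentedRecord:
    """Illustrative wrapper enforcing purpose limitation and retention limits."""

    def __init__(self, data, purpose, retention_days):
        self.data = data
        self.purpose = purpose  # the purpose notified at collection time
        self.expires = datetime.now(timezone.utc) + timedelta(days=retention_days)

    def access(self, purpose):
        # Purpose limitation: data may only be used for the notified purpose.
        if purpose != self.purpose:
            raise PermissionError(f"data was collected only for {self.purpose!r}")
        # Retention limit: once the period elapses, access must be refused.
        if datetime.now(timezone.utc) >= self.expires:
            raise PermissionError("retention period elapsed; data must be deleted")
        return self.data

record = ConsentedRecord({"email": "a@example.org"}, purpose="billing", retention_days=90)
print(record.access("billing"))  # permitted: matches the notified purpose
# record.access("marketing")     # would raise PermissionError: purpose mismatch
```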

In India, we’ve gone one step further and built techno-legal solutions to help reduce this data asymmetry. The Data Empowerment and Protection Architecture (DEPA) framework makes it possible to extract data from the silos in which they reside and transfer it on the instructions of the data principal to other entities, which can then use it to provide other services to the data principal. This data micro-portability dilutes the historical advantage that incumbents enjoy on account of collecting data over the entire duration of their customer engagement. It eliminates data asymmetries by establishing the infrastructure that creates a competitive market for data-based services, allowing data principals to choose from a range of options as to how their data could be used for their benefit by service providers.
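The core idea behind DEPA's data micro-portability can be sketched in miniature: a data provider releases only the fields named in a principal's consent artefact, and nothing else. The example below is a heavily simplified, hypothetical illustration; real DEPA deployments involve machine-readable consent artefacts, consent-manager intermediaries (account aggregators), and encrypted transport, and the class and field names here are invented.

```python
class ConsentArtefact:
    """Simplified stand-in for DEPA's machine-readable consent artefact."""
    def __init__(self, principal, provider, consumer, fields):
        self.principal = principal  # whose data is being moved
        self.provider = provider    # the silo holding the data
        self.consumer = consumer    # the service receiving it
        self.fields = fields        # the only fields consented to

class DataProvider:
    def __init__(self, name, silo):
        self.name, self.silo = name, silo

    def share(self, artefact):
        if artefact.provider != self.name:
            raise PermissionError("artefact not addressed to this provider")
        record = self.silo[artefact.principal]
        # Release only the consented fields, nothing more.
        return {f: record[f] for f in artefact.fields}

bank = DataProvider("bank", {"alice": {"balance": 1200, "pan": "XXXX"}})
artefact = ConsentArtefact("alice", provider="bank", consumer="lender", fields=["balance"])
print(bank.share(artefact))  # {'balance': 1200}: portability without full disclosure
```

The competitive-market point in the paragraph above follows from this shape: because "alice" can issue an artefact addressed to any provider, an incumbent's accumulated silo no longer locks her into that incumbent's services.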

This, however, is not the only type of asymmetry we have to deal with in this age of big data. In a recent article, Stefaan Verhulst of GovLab at New York University pointed out that it is no longer enough to possess large stores of data—you need to know how to effectively extract value from it. Many businesses might have vast stores of data that they have accumulated over the years they have been in operation, but very few of them are able to effectively extract useful signals from that noisy data.

Without the know-how to translate data into actionable information, merely owning a large data set is of little value.

Unlike data asymmetries, which can be mitigated by making data more widely available, information asymmetries can only be addressed by radically democratizing the techniques and know-how that are necessary for extracting value from data. This know-how is largely proprietary and hard to access even in a fully competitive market. What’s more, in many instances, the computation power required far exceeds the capacity of entities for whom data analysis is not the main purpose of their business…(More)”.

Data and displacement: Ethical and practical issues in data-driven humanitarian assistance for IDPs


Blog by Vicki Squire: “Ten years since the so-called “data revolution” (Pearn et al, 2022), the rise of “innovation” and the proliferation of “data solutions” have rendered the assessment of changing data practices within the humanitarian sector ever more urgent. New data acquisition modalities have provoked a range of controversies across multiple contexts and sites (e.g. Human Rights Watch, 2021, 2022a, 2022b). Moreover, a range of concerns have been raised about data sharing (e.g. Fast, 2022) and the inequities embedded within humanitarian data (e.g. Data Values, 2022).

With this in mind, the Data and Displacement project set out to explore the practical and ethical implications of data-driven humanitarian assistance in two contexts characterised by high levels of internal displacement: north-eastern Nigeria and South Sudan. Our interdisciplinary research team includes academics from each of the regions under analysis, as well as practitioners from the International Organization for Migration. From the start, the research was designed to centre the lived experiences of Internally Displaced Persons (IDPs), while also shedding light on the production and use of humanitarian data from multiple perspectives.

We conducted primary research during 2021-2022. Our research combines dataset analysis and visualisation techniques with a thematic analysis of 174 semi-structured qualitative interviews. In total we interviewed 182 people: 42 international data experts, donors, and humanitarian practitioners from a range of governmental and non-governmental organisations; 40 stakeholders and practitioners working with IDPs across north-eastern Nigeria and South Sudan (20 in each region); and 100 IDPs in camp-like settings (50 in each region). Our findings point to a disconnect between international humanitarian standards and practices on the ground, the need to revisit existing ethical guidelines such as informed consent, and the importance of investing in data literacies…(More)”.

Can Smartphones Help Predict Suicide?


Ellen Barry in The New York Times: “In March, Katelin Cruz left her latest psychiatric hospitalization with a familiar mix of feelings. She was, on the one hand, relieved to leave the ward, where aides took away her shoelaces and sometimes followed her into the shower to ensure that she would not harm herself.

But her life on the outside was as unsettled as ever, she said in an interview, with a stack of unpaid bills and no permanent home. It was easy to slide back into suicidal thoughts. For fragile patients, the weeks after discharge from a psychiatric facility are a notoriously difficult period, with a suicide rate around 15 times the national rate, according to one study.

This time, however, Ms. Cruz, 29, left the hospital as part of a vast research project which attempts to use advances in artificial intelligence to do something that has eluded psychiatrists for centuries: to predict who is likely to attempt suicide and when that person is likely to attempt it, and then, to intervene.

On her wrist, she wore a Fitbit programmed to track her sleep and physical activity. On her smartphone, an app was collecting data about her moods, her movement and her social interactions. Each device was providing a continuous stream of information to a team of researchers on the 12th floor of the William James Building, which houses Harvard’s psychology department.

In the field of mental health, few new areas generate as much excitement as machine learning, which uses computer algorithms to better predict human behavior. There is, at the same time, exploding interest in biosensors that can track a person’s mood in real time, factoring in music choices, social media posts, facial expression and vocal expression.

Matthew K. Nock, a Harvard psychologist who is one of the nation’s top suicide researchers, hopes to knit these technologies together into a kind of early-warning system that could be used when an at-risk patient is released from the hospital…(More)”.

Governing the Environment-Related Data Space


Stefaan G. Verhulst, Anthony Zacharzewski and Christian Hudson at Data & Policy: “Today, The GovLab and The Democratic Society published their report, “Governing the Environment-Related Data Space”, written by Jörn Fritzenkötter, Laura Hohoff, Paola Pierri, Stefaan G. Verhulst, Andrew Young, and Anthony Zacharzewski. The report captures the findings of their joint research centered on the responsible and effective reuse of environment-related data to achieve greater social and environmental impact.

Environment-related data (ERD) encompasses numerous kinds of data across a wide range of sectors. It can best be defined as data related to any element of the Driver-Pressure-State-Impact-Response (DPSIR) Framework. If leveraged effectively, this wealth of data could help society establish a sustainable economy, take action against climate change, and support environmental justice — as recognized recently by French President Emmanuel Macron and UN Secretary General’s Special Envoy for Climate Ambition and Solutions Michael R. Bloomberg when establishing the Climate Data Steering Committee.
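Since the report defines ERD by reference to the DPSIR Framework, one practical consequence is that a data catalogue can tag each dataset with the DPSIR element it describes, making related holdings findable across sectors. The sketch below is a hypothetical illustration of such tagging; the dataset names are invented, not drawn from the report.

```python
from enum import Enum

class DPSIR(Enum):
    """The five elements of the Driver-Pressure-State-Impact-Response framework."""
    DRIVER = "driver"        # e.g. population growth, energy demand
    PRESSURE = "pressure"    # e.g. CO2 emissions, land-use change
    STATE = "state"          # e.g. air-quality measurements
    IMPACT = "impact"        # e.g. respiratory-illness rates
    RESPONSE = "response"    # e.g. emissions-policy compliance data

# Hypothetical catalogue entries tagged by DPSIR element.
catalogue = {
    "national_energy_demand": DPSIR.DRIVER,
    "fleet_co2_emissions": DPSIR.PRESSURE,
    "urban_pm25_sensors": DPSIR.STATE,
}

# Find every dataset describing a pressure on the environment.
pressures = [name for name, tag in catalogue.items() if tag is DPSIR.PRESSURE]
print(pressures)  # ['fleet_co2_emissions']
```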

While several actors are working to improve access to, as well as promote the (re)use of, ERD, two key challenges hamper progress on this front: data asymmetries and data enclosures. Data asymmetries arise because ever-increasing amounts of ERD are scattered across diverse actors, with larger and more powerful stakeholders often enjoying privileged access. Asymmetries lead to problems with accessibility and findability (data enclosures), which limit sharing and collaboration and stunt the ability to use data and maximize its potential to address public ills.

The risks and costs of data enclosure and data asymmetries are high. Information bottlenecks cause resources to be misallocated, slow scientific progress, and limit our understanding of the environment.

A fit-for-purpose governance framework could offer a solution to these barriers by creating space for more systematic, sustainable, and responsible data sharing and collaboration. Better data sharing can in turn ease information flows, mitigate asymmetries, and minimize data enclosures.

And there are some clear criteria for an effective governance framework…(More)”.

Designing a Data Sharing Tool Kit


Paper by Ilka Jussen, Julia Christina Schweihoff, Maleen Stachon and Frederik Möller: “Sharing data is essential to the success of modern data-driven business models. Shared data plays a crucial role for companies in creating new and better services and optimizing existing processes. While interest in data sharing is growing, companies face an array of challenges preventing them from fully exploiting data sharing opportunities. Mitigating these risks and weighing them against their potential is a creative, interdisciplinary task in each company. The paper starts precisely at this point and proposes a Tool Kit with three Visual Inquiry Tools (VITs) for jointly identifying data sharing potential. We do this using a design-oriented research approach and contribute to research and practice by providing three VITs that help different stakeholders or companies in an ecosystem to visualize and design their data-sharing activities…(More)”.

Big Data and Official Statistics


Paper by Katharine G. Abraham: “The infrastructure and methods for developed countries’ economic statistics, largely established in the mid-20th century, rest almost entirely on survey and administrative data. The increasing difficulty of obtaining survey responses threatens the sustainability of this model. Meanwhile, users of economic data are demanding ever more timely and granular information. “Big data” originally created for other purposes offer the promise of new approaches to the compilation of economic data. Drawing primarily on the U.S. experience, the paper considers the challenges to incorporating big data into the ongoing production of official economic statistics and provides examples of progress towards that goal to date. Beyond their value for the routine production of a standard set of official statistics, new sources of data create opportunities to respond more nimbly to emerging needs for information. The concluding section of the paper argues that national statistical offices should expand their mission to seize these opportunities…(More)”.

Data Spaces: Design, Deployment and Future Directions


Open access book edited by Edward Curry, Simon Scerri, and Tuomo Tuikka: “…aims to educate data space designers to understand what is required to create a successful data space. It explores cutting-edge theory, technologies, methodologies, and best practices for data spaces for both industrial and personal data and provides the reader with a basis for understanding the design, deployment, and future directions of data spaces.

The book captures the early lessons and experience in creating data spaces. It arranges these contributions into three parts covering design, deployment, and future directions respectively.

  • The first part explores the design space of data spaces. Its chapters detail organisational design for data spaces, data platforms, data governance, federated learning, personal data sharing, data marketplaces, and hybrid artificial intelligence for data spaces.
  • The second part describes the use of data spaces within real-world deployments. Its chapters are co-authored with industry experts and include case studies of data spaces in sectors including industry 4.0, food safety, FinTech, health care, and energy.
  • The third and final part details future directions for data spaces, including challenges and opportunities for common European data spaces and privacy-preserving techniques for trustworthy data sharing…(More)”.