We the Dead: Preserving Data at the End of the World


Book by Brian Michael Murphy: “Locked away in refrigerated vaults, sanitized by gas chambers, and secured within bombproof caverns deep under mountains are America’s most prized materials: the ever-expanding collection of records that now accompany each of us from birth to death. This data complex backs up and protects our most vital information against decay and destruction, and yet it binds us to corporate and government institutions whose power is also preserved in its bunkers, infrastructures, and sterilized spaces.

We the Dead traces the emergence of the data complex in the early twentieth century and guides readers through its expansion in a series of moments when Americans thought they were living just before the end of the world. Depression-era eugenicists feared racial contamination and the downfall of the white American family, while contemporary technologists seek ever denser and more durable materials for storing data, from microetched metal discs to cryptocurrency keys encoded in synthetic DNA. Artfully written and packed with provocative ideas, this haunting book illuminates the dark places of the data complex and the ways it increasingly blurs the lines between human and machine, biological body and data body, life and digital afterlife…(More)”.

CNSTAT Report Emphasizes the Need for a National Data Infrastructure


Article by Molly Gahagen: “Having credible and accessible data is essential for various sectors of society to function. In the recent report, “Toward a 21st Century National Data Infrastructure: Mobilizing Information for the Common Good,” by the Committee on National Statistics (CNSTAT) of the National Academies of Sciences, Engineering, and Medicine, the importance of national data infrastructure is emphasized…

Emphasizing the importance of reliable statistics for national, state, and local government officials, as well as businesses and citizens, the report cites the need for a modern national data infrastructure that incorporates data from multiple federal agencies. Initial recommendations and potential outcomes of such a system are contained in the report.

Recommendations include practices to incorporate data from many sources, safeguard privacy, freely share statistics with the public, ensure transparency and create a modern system that would allow for easy access and enhanced security.

Potential outcomes of this infrastructure highlighted by the report's authors include increased evidence-based policymaking at several levels of government, uniform regulations for reporting and accessing data, and increased security. The report describes how this would tie into wider initiatives to promote research and evidence-based policymaking, including Congress's passage of the Foundations for Evidence-Based Policymaking Act of 2018.

CNSTAT’s future reports seek to address blending multiple data sources, data equity, technology and tools, among other topics…(More)”.

Four ways that AI and robotics are helping to transform other research fields


Article by Michael Eisenstein: “Artificial intelligence (AI) is already proving a revolutionary tool for bioinformatics; the AlphaFold database set up by London-based company DeepMind, owned by Google, is allowing scientists to predict the structures of 200 million proteins across 1 million species. But other fields are benefiting too. Here, we describe the work of researchers pursuing cutting-edge AI and robotics techniques to better anticipate the planet’s changing climate, uncover the hidden history behind artworks, understand deep sea ecology and develop new materials.

Marine biology with a soft touch

It takes a tough organism to withstand the rigours of deep-sea living. But these resilient species are also often remarkably delicate, ranging from soft and squishy creatures such as jellyfish and sea cucumbers, to firm but fragile deep-sea fishes and corals. Their fragility makes studying these organisms a complex task.

The rugged metal manipulators found on many undersea robots are more likely to harm such specimens than to retrieve them intact. But ‘soft robots’ based on flexible polymers are giving marine biologists such as David Gruber, of the City University of New York, a gentler alternative for interacting with these enigmatic denizens of the deep…(More)”.

Global healthcare fairness: We should be sharing more, not less, data


Paper by Kenneth P. Seastedt et al: “The availability of large, deidentified health datasets has enabled significant innovation in using machine learning (ML) to better understand patients and their diseases. However, questions remain regarding the true privacy of this data, patient control over their data, and how we regulate data sharing in a way that does not encumber progress or further potentiate biases for underrepresented populations. After reviewing the literature on potential reidentifications of patients in publicly available datasets, we argue that the cost—measured in terms of access to future medical innovations and clinical software—of slowing ML progress is too great to limit sharing data through large publicly available databases for concerns of imperfect data anonymization. This cost is especially great for developing countries where the barriers preventing inclusion in such databases will continue to rise, further excluding these populations and increasing existing biases that favor high-income countries. Preventing artificial intelligence’s progress towards precision medicine and sliding back to clinical practice dogma may pose a larger threat than concerns of potential patient reidentification within publicly available datasets. While the risk to patient privacy should be minimized, we believe this risk will never be zero, and society has to determine an acceptable risk threshold below which data sharing can occur—for the benefit of a global medical knowledge system….(More)”.

Investment Case: Multiplying Progress Through Data Ecosystems


Report by Dalberg: “Data and data ecosystems enable decision makers to improve lives and livelihoods by better understanding the world around them and acting in more effective and targeted ways. In a time of growing crises and shrinking budgets, it is imperative that every dollar is spent in the most efficient and equitable way. Data ecosystems provide decision makers with the information needed to assess and predict challenges, identify and customize solutions, and monitor and evaluate real-time progress. Together, this enables decisions that are more collaborative, effective, efficient, equitable, timely, and transparent. And this is only getting easier—ongoing advances in our ability to harness and apply data are creating opportunities to better target resources and create even more transformative impact…(More)”.

Eliminate data asymmetries to democratize data use


Article by Rahul Matthan: “Anyone who possesses a large enough store of data can reasonably expect to glean powerful insights from it. These insights are more often than not used to enhance advertising revenues or ensure greater customer stickiness. In other instances, they’ve been subverted to alter our political preferences and manipulate us into taking decisions we otherwise may not have.

The ability to generate insights places those who have access to these data sets at a distinct advantage over those whose data is contained within them. It allows the former to benefit from the data in ways that the latter may not even have thought possible when they consented to provide it. Given how easily these insights can be used to harm those to whom the data pertains, there is a need to mitigate the effects of this data asymmetry.

Privacy law attempts to do this by providing data principals with tools they can use to exert control over their personal data. It requires data collectors to obtain informed consent from data principals before collecting their data and forbids them from using it for any purpose other than that which has been previously notified. This is why, even if that consent has been obtained, data fiduciaries cannot collect more data than is absolutely necessary to achieve the stated purpose and are only allowed to retain that data for as long as is necessary to fulfil the stated purpose.

In India, we’ve gone one step further and built techno-legal solutions to help reduce this data asymmetry. The Data Empowerment and Protection Architecture (DEPA) framework makes it possible to extract data from the silos in which they reside and transfer it on the instructions of the data principal to other entities, which can then use it to provide other services to the data principal. This data micro-portability dilutes the historical advantage that incumbents enjoy on account of collecting data over the entire duration of their customer engagement. It eliminates data asymmetries by establishing the infrastructure that creates a competitive market for data-based services, allowing data principals to choose from a range of options as to how their data could be used for their benefit by service providers.

This, however, is not the only type of asymmetry we have to deal with in this age of big data. In a recent article, Stefaan Verhulst of GovLab at New York University pointed out that it is no longer enough to possess large stores of data—you need to know how to effectively extract value from it. Many businesses might have vast stores of data that they have accumulated over the years they have been in operation, but very few of them are able to effectively extract useful signals from that noisy data.

Without the know-how to translate data into actionable information, merely owning a large data set is of little value.

Unlike data asymmetries, which can be mitigated by making data more widely available, information asymmetries can only be addressed by radically democratizing the techniques and know-how that are necessary for extracting value from data. This know-how is largely proprietary and hard to access even in a fully competitive market. What’s more, in many instances, the computation power required far exceeds the capacity of entities for whom data analysis is not the main purpose of their business…(More)”.
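The DEPA framework described in the excerpt above can be pictured as a gate: data leaves a silo only when a valid, purpose-limited consent artefact authorizes that specific transfer to that specific recipient. The sketch below is a purely illustrative rendering of that gating logic; the class and function names are hypothetical and do not reflect DEPA's actual specification or APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ConsentArtifact:
    """Hypothetical, simplified stand-in for a DEPA-style electronic consent record."""
    data_principal: str          # the person the data is about
    data_provider: str           # the silo currently holding the data
    data_consumer: str           # the service provider the data may flow to
    permitted_fields: frozenset  # only these fields may be shared
    purpose: str                 # the stated purpose the data may be used for
    expires_at: datetime         # consent is time-bound

@dataclass
class DataRequest:
    consumer: str
    fields: frozenset
    purpose: str

def release_data(record: dict, consent: ConsentArtifact, request: DataRequest) -> dict:
    """Release only the consented fields, and only if the request matches the consent."""
    now = datetime.now(timezone.utc)
    if request.consumer != consent.data_consumer:
        raise PermissionError("Requester is not the consented data consumer")
    if request.purpose != consent.purpose:
        raise PermissionError("Purpose does not match the consented purpose")
    if now > consent.expires_at:
        raise PermissionError("Consent has expired")
    if not request.fields <= consent.permitted_fields:
        raise PermissionError("Request exceeds the consented fields")
    # Data minimisation: share nothing beyond what was explicitly consented to.
    return {k: v for k, v in record.items() if k in request.fields}
```

A real deployment would also involve consent managers, digitally signed consent artefacts, and standardised data-transfer APIs; the sketch only captures the core idea of purpose- and field-limited release on the data principal's instruction.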

Data and displacement: Ethical and practical issues in data-driven humanitarian assistance for IDPs


Blog by Vicki Squire: “Ten years since the so-called “data revolution” (Pearn et al, 2022), the rise of “innovation” and the proliferation of “data solutions” have rendered the assessment of changing data practices within the humanitarian sector ever more urgent. New data acquisition modalities have provoked a range of controversies across multiple contexts and sites (e.g. Human Rights Watch, 2021, 2022a, 2022b). Moreover, a range of concerns have been raised about data sharing (e.g. Fast, 2022) and the inequities embedded within humanitarian data (e.g. Data Values, 2022).

With this in mind, the Data and Displacement project set out to explore the practical and ethical implications of data-driven humanitarian assistance in two contexts characterised by high levels of internal displacement: north-eastern Nigeria and South Sudan. Our interdisciplinary research team includes academics from each of the regions under analysis, as well as practitioners from the International Organization for Migration. From the start, the research was designed to centre the lived experiences of Internally Displaced Persons (IDPs), while also shedding light on the production and use of humanitarian data from multiple perspectives.

We conducted primary research during 2021-2022. Our research combines dataset analysis and visualisation techniques with a thematic analysis of 174 semi-structured qualitative interviews. In total we interviewed 182 people: 42 international data experts, donors, and humanitarian practitioners from a range of governmental and non-governmental organisations; 40 stakeholders and practitioners working with IDPs across north-eastern Nigeria and South Sudan (20 in each region); and 100 IDPs in camp-like settings (50 in each region). Our findings point to a disconnect between international humanitarian standards and practices on the ground, the need to revisit existing ethical guidelines such as informed consent, and the importance of investing in data literacies…(More)”.

Can Smartphones Help Predict Suicide?


Ellen Barry in The New York Times: “In March, Katelin Cruz left her latest psychiatric hospitalization with a familiar mix of feelings. She was, on the one hand, relieved to leave the ward, where aides took away her shoelaces and sometimes followed her into the shower to ensure that she would not harm herself.

But her life on the outside was as unsettled as ever, she said in an interview, with a stack of unpaid bills and no permanent home. It was easy to slide back into suicidal thoughts. For fragile patients, the weeks after discharge from a psychiatric facility are a notoriously difficult period, with a suicide rate around 15 times the national rate, according to one study.

This time, however, Ms. Cruz, 29, left the hospital as part of a vast research project which attempts to use advances in artificial intelligence to do something that has eluded psychiatrists for centuries: to predict who is likely to attempt suicide and when that person is likely to attempt it, and then, to intervene.

On her wrist, she wore a Fitbit programmed to track her sleep and physical activity. On her smartphone, an app was collecting data about her moods, her movement and her social interactions. Each device was providing a continuous stream of information to a team of researchers on the 12th floor of the William James Building, which houses Harvard’s psychology department.

In the field of mental health, few new areas generate as much excitement as machine learning, which uses computer algorithms to better predict human behavior. There is, at the same time, exploding interest in biosensors that can track a person’s mood in real time, factoring in music choices, social media posts, facial expression and vocal expression.

Matthew K. Nock, a Harvard psychologist who is one of the nation’s top suicide researchers, hopes to knit these technologies together into a kind of early-warning system that could be used when an at-risk patient is released from the hospital…(More)”.

Hurricane Ian Destroyed Their Homes. Algorithms Sent Them Money


Article by Chris Stokel-Walker: “The algorithms that power Skai’s damage assessments are trained by manually labeling satellite images of a couple of hundred buildings in a disaster-struck area that are known to have been damaged. The software can then, at speed, detect damaged buildings across the whole affected area. A research paper on the underlying technology presented at a 2020 academic workshop on AI for disaster response claimed the auto-generated damage assessments match those of human experts with between 85 and 98 percent accuracy.

In Florida this month, GiveDirectly sent its push notification offering $700 to any user of the Providers app with a registered address in neighborhoods of Collier, Charlotte, and Lee Counties where Google’s AI system deemed more than 50 percent of buildings had been damaged. So far, 900 people have taken up the offer, and half of those have been paid. If every recipient takes up GiveDirectly’s offer, the organization will pay out $2.4 million in direct financial aid.

Some may be skeptical of automated disaster response. But in the chaos after an event like a hurricane making landfall, the conventional, human response can be far from perfect. Diaz points to an analysis GiveDirectly conducted looking at their work after Hurricane Harvey, which hit Texas and Louisiana in 2017, before the project with Google. Two out of the three areas that were most damaged and economically depressed were initially overlooked. A data-driven approach is “much better than what we’ll have from boots on the ground and word of mouth,” Diaz says.

GiveDirectly and Google’s hands-off, algorithm-led approach to aid distribution has been welcomed by some disaster assistance experts—with caveats. Reem Talhouk, a research fellow at Northumbria University’s School of Design and Centre for International Development in the UK, says that the system appears to offer a more efficient way of delivering aid. And it protects the dignity of recipients, who don’t have to queue up for handouts in public…(More)”.
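The eligibility rule described in the excerpt above (a $700 offer to registered users in neighborhoods where the model estimates that more than 50 percent of buildings were damaged) amounts to a simple threshold filter over per-neighborhood damage estimates. Here is a minimal sketch of that logic, assuming hypothetical data structures rather than GiveDirectly's or Google's actual code:

```python
from dataclasses import dataclass

PAYOUT_USD = 700          # per-household offer cited in the article
DAMAGE_THRESHOLD = 0.50   # neighborhoods with >50% of buildings damaged qualify

@dataclass
class Neighborhood:
    name: str
    buildings_total: int
    buildings_damaged: int   # as estimated by the satellite-imagery model

    @property
    def damage_fraction(self) -> float:
        return self.buildings_damaged / self.buildings_total

def eligible_neighborhoods(neighborhoods):
    """Return neighborhoods whose estimated damage exceeds the threshold."""
    return [n for n in neighborhoods if n.damage_fraction > DAMAGE_THRESHOLD]

def maximum_payout(registered_users_by_neighborhood, eligible):
    """Upper bound on payouts if every eligible registered user accepts the offer."""
    users = sum(registered_users_by_neighborhood.get(n.name, 0) for n in eligible)
    return users * PAYOUT_USD

# Roughly 3,430 registered users in qualifying areas (2,400,000 / 700) would
# produce the $2.4 million maximum payout cited in the article.
```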

‘Dark data’ is killing the planet – we need digital decarbonisation


Article by Tom Jackson and Ian R. Hodgkinson: “More than half of the digital data firms generate is collected, processed and stored for single-use purposes. Often, it is never re-used. This could be your multiple near-identical images held on Google Photos or iCloud, a business’s outdated spreadsheets that will never be used again, or data from internet of things sensors that have no purpose.

This “dark data” is anchored to the real world by the energy it requires. Even data that is stored and never used again takes up space on servers – typically huge banks of computers in warehouses. Those computers and those warehouses all use lots of electricity.

This is a significant energy cost that is hidden in most organisations. Maintaining an effective organisational memory is a challenge, but at what cost to the environment?

In the drive towards net zero many organisations are trying to reduce their carbon footprints. Guidance has generally centred on reducing traditional sources of carbon production, through mechanisms such as carbon offsetting via third parties (planting trees to make up for emissions from using petrol, for instance).

While most climate change activists are focused on limiting emissions from the automotive, aviation and energy industries, the processing of digital data is already comparable to these sectors and is still growing. In 2020, digitisation was purported to generate 4% of global greenhouse gas emissions. Production of digital data is increasing fast – this year the world is expected to generate 97 zettabytes (that is: 97 trillion gigabytes) of data. By 2025, it could almost double to 181 zettabytes. It is therefore surprising that little policy attention has been placed on reducing the digital carbon footprint of organisations…(More)”.
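As a quick check on the scale of the figures quoted above, the conversion from zettabytes to gigabytes and the projected growth work out as follows. This is a minimal sketch using decimal SI units; the 97 ZB and 181 ZB figures come from the article itself.

```python
ZB_IN_BYTES = 10**21   # SI zettabyte
GB_IN_BYTES = 10**9    # SI gigabyte

data_2022_zb = 97      # zettabytes the world is expected to generate this year
data_2025_zb = 181     # projected zettabytes by 2025

trillion_gb_2022 = data_2022_zb * ZB_IN_BYTES / GB_IN_BYTES / 10**12
print(f"{data_2022_zb} ZB = {trillion_gb_2022:.0f} trillion GB")   # 97 trillion gigabytes

growth = data_2025_zb / data_2022_zb
print(f"Projected growth by 2025: {growth:.2f}x")                  # ~1.87x, i.e. almost double
```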