Community consent: neither a ceiling nor a floor


Article by Jasmine McNealy: “The 23andMe breach and the Golden State Killer case are two of the more “flashy” cases, but questions of consent, especially the consent of all of those affected by biodata collection and analysis in more mundane or routine health and medical research projects, are just as important. The communities of people affected have expectations about their privacy and the possible impacts of inferences that could be made about them in data processing systems. Researchers must, then, acquire community consent when attempting to work with networked biodata. 

Several benefits of community consent exist, especially for marginalized and vulnerable populations. These benefits include:

  • Ensuring that information about the research project spreads throughout the community,
  • Removing potential barriers that might be created by resistance from community members,
  • Alleviating the possible concerns of individuals about the perspectives of community leaders, and 
  • Allowing the recruitment of participants using methods most salient to the community.

But community consent does not replace individual consent, and limits exist for both community and individual consent. Therefore, within the context of a biorepository, understanding whether community consent might be a ceiling or a floor requires examining governance and autonomy…(More)”.

The Data That Powers A.I. Is Disappearing Fast


Article by Kevin Roose: “For years, the people building powerful artificial intelligence systems have used enormous troves of text, images and videos pulled from the internet to train their models.

Now, that data is drying up.

Over the past year, many of the most important web sources used for training A.I. models have restricted the use of their data, according to a study published this week by the Data Provenance Initiative, an M.I.T.-led research group.

The study, which looked at 14,000 web domains that are included in three commonly used A.I. training data sets, discovered an “emerging crisis in consent,” as publishers and online platforms have taken steps to prevent their data from being harvested.

The researchers estimate that in the three data sets — called C4, RefinedWeb and Dolma — 5 percent of all data, and 25 percent of data from the highest-quality sources, has been restricted. Those restrictions are set up through the Robots Exclusion Protocol, a decades-old method for website owners to prevent automated bots from crawling their pages using a file called robots.txt.

The study also found that as much as 45 percent of the data in one set, C4, had been restricted by websites’ terms of service.

“We’re seeing a rapid decline in consent to use data across the web that will have ramifications not just for A.I. companies, but for researchers, academics and noncommercial entities,” said Shayne Longpre, the study’s lead author, in an interview.

Data is the main ingredient in today’s generative A.I. systems, which are fed billions of examples of text, images and videos. Much of that data is scraped from public websites by researchers and compiled in large data sets, which can be downloaded and freely used, or supplemented with data from other sources…(More)”.
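
The Robots Exclusion Protocol the study refers to is easy to see in miniature: a site owner publishes a plain-text robots.txt file at the domain root, and well-behaved crawlers parse it and honor its rules before fetching pages. The sketch below uses Python’s standard-library parser; the crawler names and URL are illustrative examples, not data from the study:

    # A minimal sketch of how robots.txt restrictions are expressed and
    # checked. The rules, crawler names and URL are illustrative only.
    from urllib.robotparser import RobotFileParser

    # A publisher blocking an AI training crawler while still allowing
    # other bots might serve a robots.txt like this one.
    rules = """
    User-agent: GPTBot
    Disallow: /

    User-agent: *
    Allow: /
    """

    rp = RobotFileParser()
    rp.parse(rules.splitlines())  # the parser strips leading whitespace

    print(rp.can_fetch("GPTBot", "https://example.com/article"))     # False
    print(rp.can_fetch("Googlebot", "https://example.com/article"))  # True

Note that the protocol is purely advisory: nothing technically stops a crawler from ignoring the file, which is part of why the study frames the trend as a crisis of consent rather than a solved problem.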

The Five Stages Of AI Grief


Essay by Benjamin Bratton: “‘Alignment’ and ‘human-centered AI’ are just words representing our hopes and fears about AI seeming to be out of control — but also about the idea that complex technologies were never under human control to begin with. For reasons more political than perceptive, some insist that ‘AI’ is not even ‘real,’ that it is just math or just an ideological construction of capitalism turning itself into a naturalized fact. Some critics are clearly very angry at the all-too-real prospects of pervasive machine intelligence. Others recognize the reality of AI but are convinced it is something that can be controlled by legislative sessions, policy papers and community workshops. This does not ameliorate the depression felt by still others, who foresee existential catastrophe.

All these reactions may confuse those who see the evolution of machine intelligence, and the artificialization of intelligence itself, as an overdetermined consequence of deeper developments. What to make of these responses?

Sigmund Freud used the term “Copernican” to describe modern decenterings of the human from a place of intuitive privilege. After Nicolaus Copernicus and Charles Darwin, he nominated psychoanalysis as the third such revolution. He also characterized the response to such decenterings as “traumas.”

Trauma brings grief. This is normal. In her 1969 book, “On Death and Dying,” the Swiss psychiatrist Elisabeth Kübler-Ross identified the “five stages of grief”: denial, anger, bargaining, depression and acceptance. Perhaps Copernican Traumas are no different…(More)”.

Reliability of U.S. Economic Data Is in Jeopardy, Study Finds


Article by Ben Casselman: “A report says new approaches and increased spending are needed to ensure that government statistics remain dependable and free of political influence.

Federal Reserve officials use government data to help determine when to raise or lower interest rates. Congress and the White House use it to decide when to extend jobless benefits or send out stimulus payments. Investors place billions of dollars’ worth of bets that are tied to monthly reports on job growth, inflation and retail sales.

But a new study says the integrity of that data is in increasing jeopardy.

The report, issued on Tuesday by the American Statistical Association, concludes that government statistics are reliable right now. But that could soon change, the study warns, citing factors including shrinking budgets, falling survey response rates and the potential for political interference.

The authors — statisticians from George Mason University, the Urban Institute and other institutions — likened the statistical system to physical infrastructure like highways and bridges: vital, but often ignored until something goes wrong.

“We do identify this sort of downward spiral as a threat, and that’s what we’re trying to counter,” said Nancy Potok, who served as chief statistician of the United States from 2017 to 2019 and was one of the report’s authors. “We’re not there yet, but if we don’t do something, that threat could become a reality, and in the not-too-distant future.”

The report, “The Nation’s Data at Risk,” highlights the threats facing statistics produced across the federal government, including data on education, health, crime and demographic trends.

But the risks to economic data are particularly notable because of the attention it receives from policymakers and investors. Most of that data is based on surveys of households or businesses. And response rates to government surveys have plummeted in recent years, as they have for private polls. The response rate to the Current Population Survey — the monthly survey of about 60,000 households that is the basis for the unemployment rate and other labor force statistics — has fallen to about 70 percent in recent months, from nearly 90 percent a decade ago…(More)”.

Precision public health in the era of genomics and big data


Paper by Megan C. Roberts et al: “Precision public health (PPH) considers the interplay between genetics, lifestyle and the environment to improve disease prevention, diagnosis and treatment on a population level—thereby delivering the right interventions to the right populations at the right time. In this Review, we explore the concept of PPH as the next generation of public health. We discuss the historical context of using individual-level data in public health interventions and examine recent advancements in how data from human and pathogen genomics and social, behavioral and environmental research, as well as artificial intelligence, have transformed public health. Real-world examples of PPH are discussed, emphasizing how these approaches are becoming a mainstay in public health, as well as outstanding challenges in their development, implementation and sustainability. Data sciences, ethical, legal and social implications research, capacity building, equity research and implementation science will have a crucial role in realizing the potential for ‘precision’ to enhance traditional public health approaches…(More)”.

An Algorithm Told Police She Was Safe. Then Her Husband Killed Her.


Article by Adam Satariano and Roser Toll Pifarré: “Spain has become dependent on an algorithm to combat gender violence, with the software so woven into law enforcement that it is hard to know where its recommendations end and human decision-making begins. At its best, the system has helped police protect vulnerable women and, overall, has reduced the number of repeat attacks in domestic violence cases. But the reliance on VioGén has also meant that victims whose risk levels were miscalculated have been attacked again — sometimes with fatal consequences.

Spain now has 92,000 active cases of gender violence victims who were evaluated by VioGén, with most of them — 83 percent — classified as facing little risk of being hurt by their abuser again. Yet roughly 8 percent of women whom the algorithm found to be at negligible risk and 14 percent at low risk have reported being harmed again, according to Spain’s Interior Ministry, which oversees the system.

At least 247 women have also been killed by their current or former partner since 2007 after being assessed by VioGén, according to government figures. While that is a tiny fraction of gender violence cases, it points to the algorithm’s flaws. The New York Times found that in a judicial review of 98 of those homicides, 55 of the slain women were scored by VioGén as negligible or low risk for repeat abuse…(More)”.
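
The percentages in that excerpt are easier to grasp as absolute numbers. The arithmetic below is a back-of-the-envelope illustration only: the Interior Ministry figures quoted above do not specify exactly how the 92,000 active cases divide between the negligible and low tiers, so the tier shares assumed here are hypothetical:

    # Back-of-the-envelope illustration, not official figures.
    # ASSUMPTION: the "83 percent classified as facing little risk" share is
    # treated as the negligible tier, and a further 10 percent is assumed
    # for the low-risk tier; the article does not give the exact split.
    active_cases = 92_000

    negligible_tier = active_cases * 0.83  # assumed share
    low_tier = active_cases * 0.10         # assumed share

    # Repeat-harm rates reported by Spain's Interior Ministry:
    print(round(negligible_tier * 0.08))  # ~6,100 women scored negligible, harmed again
    print(round(low_tier * 0.14))         # ~1,300 more from the low-risk tier

Even under these rough assumptions, single-digit error rates at this scale imply thousands of misclassified victims, which is the core of the article’s critique.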

10 profound answers about the math behind AI


Article by Ethan Siegel: “Why do machines learn? Even in the recent past, this would have been a ridiculous question, as machines — i.e., computers — were only capable of executing whatever instructions a human programmer had programmed into them. With the rise of generative artificial intelligence (AI), however, machines truly appear to be gifted with the ability to learn, refining their answers based on continued interactions with both human and non-human users. Large language model-based artificial intelligence programs, such as ChatGPT, Claude, Gemini and more, are now so widespread that they’re replacing traditional tools, including Google searches, in applications all across the world.

How did this come to be? How did we so swiftly come to live in an era where many of us are happy to turn over aspects of our lives that traditionally needed a human expert to a computer program? From financial to medical decisions, from quantum systems to protein folding, and from sorting data to finding signals in a sea of noise, many programs that leverage artificial intelligence (AI) and machine learning (ML) are far superior at these tasks compared with even the greatest human experts.

In his new book, Why Machines Learn: The Elegant Math Behind Modern AI, science writer Anil Ananthaswamy explores all of these aspects and more. I was fortunate enough to do a question-and-answer interview with him, and here are the 10 most profound responses he was generous enough to give…(More)”.

The Great Scrape: The Clash Between Scraping and Privacy


Paper by Daniel J. Solove and Woodrow Hartzog: “Artificial intelligence (AI) systems depend on massive quantities of data, often gathered by “scraping” – the automated extraction of large amounts of data from the internet. A great deal of scraped data is about people. This personal data provides the grist for AI tools such as facial recognition, deep fakes, and generative AI. Although scraping enables web searching, archival, and meaningful scientific research, scraping for AI can also be objectionable or even harmful to individuals and society.

Organizations are scraping at an escalating pace and scale, even though many privacy laws are seemingly incongruous with the practice. In this Article, we contend that scraping must undergo a serious reckoning with privacy law. Scraping violates nearly all of the key principles in privacy laws, including fairness; individual rights and control; transparency; consent; purpose specification and secondary use restrictions; data minimization; onward transfer; and data security. With scraping, data protection laws built around these requirements are ignored.

Scraping has evaded a reckoning with privacy law largely because scrapers act as if all publicly available data were free for the taking. But the public availability of scraped data shouldn’t give scrapers a free pass. Privacy law regularly protects publicly available data, and privacy principles are implicated even when personal data is accessible to others.

This Article explores the fundamental tension between scraping and privacy law. With the zealous pursuit and astronomical growth of AI, we are in the midst of what we call the “great scrape.” There must now be a great reconciliation…(More)”.

Digital Ethology


Book edited by Tomáš Paus and Hye-Chung Kum: “Countless permutations of physical, built, and social environments surround us in space and time, influencing the air we breathe, how hot or cold we are, how many steps we take, and with whom we interact as we go about our daily lives. Assessing the dynamic processes that play out between humans and the environment is challenging. Digital Ethology explores how aggregate area-level data, produced at multiple locations and points in time, can reveal bidirectional—and iterative—relationships between human behavior and the environment through their digital footprints.

Experts from geospatial and data science, behavioral and brain science, epidemiology and public health, ethics, law, and urban planning consider how humans transform their environments and how environments shape human behavior…(More)”.

Mapping the Landscape of AI-Powered Nonprofits


Article by Kevin Barenblat: “Visualize the year 2050. How do you see AI having impacted the world? Whatever you’re picturing… the reality will probably be quite a bit different. Just think about the personal computer. In its early days circa the 1980s, tech companies marketed the devices for the best use cases they could imagine: reducing paperwork, doing math, and keeping track of forgettable things like birthdays and recipes. It was impossible to imagine that decades later, those larger-than-a-toaster devices would be smaller than a Pop-Tart, connect with billions of other devices, and respond to voice and touch.

It can be hard for us to see how new technologies will ultimately be used. The same is true of artificial intelligence. With new use cases popping up every day, we are early in the age of AI. To make sense of all the action, many landscapes have been published to organize the tech stacks and private sector applications of AI. We could not, however, find an overview of how nonprofits are using AI for impact…

AI-powered nonprofits (APNs) are already advancing solutions to many social problems, and Google.org’s recent research brief AI in Action: Accelerating Progress Towards the Sustainable Development Goals shows that AI is driving progress towards all 17 SDGs. Three goals that stand out with especially strong potential to be transformed by AI are SDG 3 (Good Health and Well-Being), SDG 4 (Quality Education), and SDG 13 (Climate Action). As such, this series focuses on how AI-powered nonprofits are transforming the climate, health care, and education sectors…(More)”.