Community consent: neither a ceiling nor a floor


Article by Jasmine McNealy: “The 23andMe breach and the Golden State Killer case are two of the more ‘flashy’ cases, but questions of consent, especially the consent of all of those affected by biodata collection and analysis in more mundane or routine health and medical research projects, are just as important. The communities of people affected have expectations about their privacy and the possible impacts of inferences that could be made about them in data processing systems. Researchers must, then, acquire community consent when attempting to work with networked biodata.

Several benefits of community consent exist, especially for marginalized and vulnerable populations. These benefits include:

  • Ensuring that information about the research project spreads throughout the community,
  • Removing potential barriers that might be created by resistance from community members,
  • Alleviating the possible concerns of individuals about the perspectives of community leaders, and 
  • Allowing the recruitment of participants using methods most salient to the community.

But community consent does not replace individual consent, and limits exist for both. Therefore, within the context of a biorepository, understanding whether community consent might be a ceiling or a floor requires examining governance and autonomy…(More)”.

The Data That Powers A.I. Is Disappearing Fast


Article by Kevin Roose: “For years, the people building powerful artificial intelligence systems have used enormous troves of text, images and videos pulled from the internet to train their models.

Now, that data is drying up.

Over the past year, many of the most important web sources used for training A.I. models have restricted the use of their data, according to a study published this week by the Data Provenance Initiative, an M.I.T.-led research group.

The study, which looked at 14,000 web domains that are included in three commonly used A.I. training data sets, discovered an “emerging crisis in consent,” as publishers and online platforms have taken steps to prevent their data from being harvested.

The researchers estimate that in the three data sets — called C4, RefinedWeb and Dolma — 5 percent of all data, and 25 percent of data from the highest-quality sources, has been restricted. Those restrictions are set up through the Robots Exclusion Protocol, a decades-old convention by which website owners use a file called robots.txt to tell automated bots which pages they may not crawl.
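The mechanism is simple enough to see in miniature. Python’s standard-library `urllib.robotparser` reads the same robots.txt directives the study measured; the file below is a hypothetical sketch of the pattern the researchers describe, in which a site singles out AI crawlers by user-agent while leaving ordinary crawlers alone (GPTBot and CCBot are real crawler names, but the site and its rules here are invented for illustration):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind the study describes: AI-training
# crawlers are disallowed by name, while other bots remain welcome.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# An AI-training crawler is refused; an ordinary crawler is not.
print(rp.can_fetch("GPTBot", "https://example.com/article"))     # False
print(rp.can_fetch("Googlebot", "https://example.com/article"))  # True
```

Note that robots.txt is advisory rather than enforceable, which is why, as the study goes on to observe, many publishers layer terms-of-service restrictions on top of it.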

The study also found that as much as 45 percent of the data in one set, C4, had been restricted by websites’ terms of service.

“We’re seeing a rapid decline in consent to use data across the web that will have ramifications not just for A.I. companies, but for researchers, academics and noncommercial entities,” said Shayne Longpre, the study’s lead author, in an interview.

Data is the main ingredient in today’s generative A.I. systems, which are fed billions of examples of text, images and videos. Much of that data is scraped from public websites by researchers and compiled in large data sets, which can be downloaded and freely used, or supplemented with data from other sources…(More)”.

Exploring Digital Biomarkers for Depression Using Mobile Technology


Paper by Yuezhou Zhang et al: “With the advent of ubiquitous sensors and mobile technologies, wearables and smartphones offer a cost-effective means for monitoring mental health conditions, particularly depression. These devices enable the continuous collection of behavioral data, providing novel insights into the daily manifestations of depressive symptoms.

We found several significant links between depression severity and various behavioral biomarkers: elevated depression levels were associated with diminished sleep quality (assessed through Fitbit metrics), reduced sociability (approximated by Bluetooth), decreased levels of physical activity (quantified by step counts and GPS data), a slower cadence of daily walking (captured by smartphone accelerometers), and disturbances in circadian rhythms (analyzed across various data streams).

Leveraging digital biomarkers for assessing and continuously monitoring depression introduces a new paradigm in early detection and development of customized intervention strategies. Findings from these studies not only enhance our comprehension of depression in real-world settings but also underscore the potential of mobile technologies in the prevention and management of mental health issues…(More)”

Designing an Effective Governance Model for Data Collaboratives


Paper by Federico Bartolomucci & Francesco Leoni: “Data Collaboratives have gained traction as interorganizational partnerships centered on data exchange. They enhance the collective capacity to respond to contemporary societal challenges using data, while also providing participating organizations with innovation capabilities and reputational benefits. Unfortunately, data collaboratives often fail to advance beyond the pilot stage and are therefore limited in their capacity to deliver systemic change. The governance setting adopted by a data collaborative affects how it acts over the short and long term. We present a governance design model to develop context-dependent data collaboratives. Practitioners can use the proposed model and list of key reflective questions to evaluate the critical aspects of designing a governance model for their data collaboratives…(More)”.

The 4M Roadmap: A Higher Road to Profitability by Using Big Data for Social Good


Report by Brennan Lake: “As the private sector faces conflicting pressures to either embrace or shun socially responsible practices, companies with privately held big-data assets must decide whether to share access to their data for public good. While some managers object to data sharing over concerns of privacy and product cannibalization, others launch well-intentioned yet short-lived CSR projects that fail to deliver on lofty goals.

By embedding Shared-Value principles into ‘Data-for-Good’ programs, data-rich firms can launch responsible data-sharing initiatives that minimize risk, deliver sustained impact, and improve overall competitiveness in the process.

The 4M Roadmap by Brennan Lake, a Big-Data and Social Impact professional, guides managers to adopt a ‘Data-for-Good’ model that emphasizes four key pillars of value-creation: Mission, Messaging, Methods, and Monetization. Through deep analysis and private-sector case studies, The 4M Roadmap demonstrates how companies can engage in responsible data sharing to benefit society and business alike…(More)”.

A New National Purpose: Harnessing Data for Health


Report by the Tony Blair Institute: “We are at a pivotal moment where the convergence of large health and biomedical data sets, artificial intelligence and advances in biotechnology is set to revolutionise health care, drive economic growth and improve the lives of citizens. And the UK has strengths in all three areas. The immense potential of the UK’s health-data assets, from the NHS to biobanks and genomics initiatives, can unlock new diagnostics and treatments, deliver better and more personalised care, prevent disease and ultimately help people live longer, healthier lives.

However, realising this potential is not without its challenges. The complex and fragmented nature of the current health-data landscape, coupled with legitimate concerns around privacy and public trust, has made for slow progress. The UK has had a tendency to provide short-term funding across multiple initiatives, which has led to an array of individual projects – many of which have struggled to achieve long-term sustainability and deliver tangible benefits to patients.

To overcome these challenges, it will be necessary to be bold and imaginative. We must look for ways to leverage the unique strengths of the NHS, such as its nationwide reach and cradle-to-grave data coverage, to create a health-data ecosystem that is much more than the sum of its many parts. This will require us to think differently about how we collect, manage and utilise health data, and to create new partnerships and models of collaboration that break down traditional silos and barriers. It will mean treating data as a key health resource and managing it accordingly.

One model to do this is the proposed sovereign National Data Trust (NDT) – an endeavour to streamline access to and curation of the UK’s valuable health-data assets…(More)”.

On the Meaning of Community Consent in a Biorepository Context


Article by Astha Kapoor, Samuel Moore, and Megan Doerr: “Biorepositories, vital for medical research, collect and store human biological samples and associated data for future use. However, relying solely on the individual consent of data contributors for biorepository data governance is becoming inadequate. Big data analysis focuses on large-scale behaviors and patterns, shifting focus from singular data points to identifying data ‘journeys’ relevant to a collective. The individual becomes a small part of the analysis, with the harms and benefits emanating from the data occurring at an aggregated level.

Community refers to a particular qualitative aspect of a group of people that is not well captured by quantitative measures in biorepositories. This is not an excuse to dodge the question of how to account for communities in a biorepository context; rather, it shows that a framework is needed for defining different types of community that may be approached from a biorepository perspective. 

Engaging with communities in biorepository governance presents several challenges. Moving away from a purely individualized understanding of governance towards a more collectivizing approach necessitates an appreciation of the messiness of group identity, its ephemerality, and the conflicts entailed therein. So while community implies a certain degree of homogeneity (i.e., that all members of a community share something in common), it is important to understand that people can simultaneously consider themselves a member of a community while disagreeing with many of its members, the values the community holds, or the positions for which it advocates. The complex nature of community participation therefore requires proper treatment for it to be useful in a biorepository governance context…(More)”.

Data governance for the ecological transition: An infrastructure perspective


Article by Charlotte Ducuing: “This article uses infrastructure studies to provide a critical analysis of the European Union’s (EU) ambition to regulate data for the ecological transition. The EU’s regulatory project implicitly qualifies data as an infrastructure for a better economy and society. However, current EU law does not draw all the logical consequences derived from this qualification of data as infrastructure, which is one main reason why EU data legislation for the ecological transition may not deliver on its high political expectations. The ecological transition does not play a significant normative role in EU data legislation and is largely overlooked in the data governance literature. By drawing inferences from the qualification of data as an infrastructure more consistently, the article opens avenues for data governance that centre the ecological transition as a normative goal…(More)”.

We don’t need an AI manifesto — we need a constitution


Article by Vivienne Ming: “Loans drive economic mobility in America, even as they’ve been a historically powerful tool for discrimination. I’ve worked on multiple projects to reduce that bias using AI. What I learnt, however, is that even if an algorithm works exactly as intended, it is still solely designed to optimise the financial returns to the lender who paid for it. The loan application process is already impenetrable to most, and now your hopes for home ownership or small business funding are dying in a 50-millisecond computation…

In law, the right to a lawyer and judicial review are a constitutional guarantee in the US and an established civil right throughout much of the world. These are the foundations of your civil liberties. When algorithms act as an expert witness, testifying against you but immune to cross-examination, these rights are not simply eroded — they cease to exist.

People aren’t perfect. Neither ethics training for AI engineers nor legislation by woefully uninformed politicians can change that simple truth. I don’t need to assume that Big Tech chief executives are bad actors or that large companies are malevolent to understand that what is in their self-interest is not always in mine. The framers of the US Constitution recognised this simple truth and sought to leverage human nature for a greater good. The Constitution didn’t simply assume people would always act towards that greater good. Instead it defined a dynamic mechanism — self-interest and the balance of power — that would force compromise and good governance. Its vision of treating people as real actors rather than better angels produced one of the greatest frameworks for governance in history.

Imagine you were offered an AI-powered test for post-partum depression. My company developed that very test and it has the power to change your life, but you may choose not to use it for fear that we might sell the results to data brokers or activist politicians. You have a right to our AI acting solely for your health. It was for this reason I founded an independent non-profit, The Human Trust, that holds all of the data and runs all of the algorithms with sole fiduciary responsibility to you. No mother should have to choose between a life-saving medical test and her civil rights…(More)”.

Establish Data Collaboratives To Foster Meaningful Public Involvement


Article by Gwen Ottinger: “Federal agencies are striving to expand the role of the public, including members of marginalized communities, in developing regulatory policy. At the same time, agencies are considering how to mobilize data of increasing size and complexity to ensure that policies are equitable and evidence-based. However, community engagement has rarely been extended to the process of examining and interpreting data. This is a missed opportunity: community members can offer critical context to quantitative data, ground-truth data analyses, and suggest ways of looking at data that could inform policy responses to pressing problems in their lives. Realizing this opportunity requires a structure for public participation in which community members can expect both support from agency staff in accessing and understanding data and genuine openness to new perspectives on quantitative analysis. 

To deepen community involvement in developing evidence-based policy, federal agencies should form Data Collaboratives in which staff and members of the public engage in mutual learning about available datasets and their affordances for clarifying policy problems…(More)”.