data collaboratives

Using internet search data as part of medical research

Curated on September 7, 2024 by Stefaan Verhulst

Blog by Susan Thomas and Matthew Thompson: “…In the UK, almost 50 million health-related searches are made using Google per year. Globally there are 100s of millions of health-related searches every day. And, of course, people are doing these searches in real-time, looking for answers to their concerns in the moment. It’s also possible that, even if people aren’t noticing and searching about changes to their health, their behaviour is changing. Maybe they are searching more at night because they are having difficulty sleeping or maybe they are spending more (or less) time online. Maybe an individual’s search history could actually be really useful for researchers. This realisation has led medical researchers to start to explore whether individuals’ online search activity could help provide those subtle, almost unnoticeable signals that point to the beginning of a serious illness.

Our recent review found 23 studies have been published so far that have done exactly this. These studies suggest that online search activity among people later diagnosed with a variety of conditions ranging from pancreatic cancer and stroke to mood disorders, was different to people who did not have one of these conditions.

One of these studies was published by researchers at Imperial College London, who used online search activity to identify signals of women with gynaecological malignancies. They found that women with malignant (e.g. ovarian cancer) and benign conditions had different search patterns, up to two months prior to a GP referral.

Pause for a moment, and think about what this could mean. Ovarian cancer is one of the most devastating cancers women get. It’s desperately hard to detect early – and yet there are signals of this cancer visible in women’s internet searches months before diagnosis?…(More)”.

Data sovereignty for local governments. Considerations and enablers

Curated on September 2, 2024 by Stefaan Verhulst

Report by JRC: “Data sovereignty for local governments refers to a capacity to control and/or access data, and to foster a digital transformation aligned with societal values and EU Commission political priorities. Data sovereignty clauses are an instrument that local governments may use to compel companies to share data of public interest. Albeit promising, little is known about the peculiarities of this instrument and how it has been implemented so far. This policy brief aims at filling the gap by systematising existing knowledge and providing policy-relevant recommendations for its wider implementation…(More)”.

Community consent: neither a ceiling nor a floor

Curated on July 21, 2024 by Stefaan Verhulst

Article by Jasmine McNealy: “The 23andMe breach and the Golden State Killer case are two of the more “flashy” cases, but questions of consent, especially the consent of all of those affected by biodata collection and analysis in more mundane or routine health and medical research projects, are just as important. The communities of people affected have expectations about their privacy and the possible impacts of inferences that could be made about them in data processing systems. Researchers must, then, acquire community consent when attempting to work with networked biodata.

Several benefits of community consent exist, especially for marginalized and vulnerable populations. These benefits include:

Ensuring that information about the research project spreads throughout the community,
Removing potential barriers that might be created by resistance from community members,
Alleviating the possible concerns of individuals about the perspectives of community leaders, and
Allowing the recruitment of participants using methods most salient to the community.

But community consent does not replace individual consent and limits exist for both community and individual consent. Therefore, within the context of a biorepository, understanding whether community consent might be a ceiling or a floor requires examining governance and autonomy…(More)”.

The Data That Powers A.I. Is Disappearing Fast

Curated on July 21, 2024 by Stefaan Verhulst

Article by Kevin Roose: “For years, the people building powerful artificial intelligence systems have used enormous troves of text, images and videos pulled from the internet to train their models.

Now, that data is drying up.

Over the past year, many of the most important web sources used for training A.I. models have restricted the use of their data, according to a study published this week by the Data Provenance Initiative, an M.I.T.-led research group.

The study, which looked at 14,000 web domains that are included in three commonly used A.I. training data sets, discovered an “emerging crisis in consent,” as publishers and online platforms have taken steps to prevent their data from being harvested.

The researchers estimate that in the three data sets — called C4, RefinedWeb and Dolma — 5 percent of all data, and 25 percent of data from the highest-quality sources, has been restricted. Those restrictions are set up through the Robots Exclusion Protocol, a decades-old method for website owners to prevent automated bots from crawling their pages using a file called robots.txt.
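To make the robots.txt mechanism concrete, here is a minimal sketch of how a well-behaved crawler might check a site's rules before fetching a page. It uses Python's standard-library `urllib.robotparser`. The robots.txt content, the bot name `ExampleAIBot`, and the `example.com` URLs are all hypothetical and chosen only for illustration; they are not taken from the study or the article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption, for illustration only).
# It blocks one named crawler from the whole site and blocks every other
# crawler from a single directory.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("ExampleAIBot", "https://example.com/articles/1"),
        ("SomeOtherBot", "https://example.com/articles/1"),
        ("SomeOtherBot", "https://example.com/private/data"),
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Note that the protocol is advisory: the file states the site owner's wishes, and it is up to each crawler to run a check like this and honor the result.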

The study also found that as much as 45 percent of the data in one set, C4, had been restricted by websites’ terms of service.

“We’re seeing a rapid decline in consent to use data across the web that will have ramifications not just for A.I. companies, but for researchers, academics and noncommercial entities,” said Shayne Longpre, the study’s lead author, in an interview.

Data is the main ingredient in today’s generative A.I. systems, which are fed billions of examples of text, images and videos. Much of that data is scraped from public websites by researchers and compiled in large data sets, which can be downloaded and freely used, or supplemented with data from other sources…(More)”.

Exploring Digital Biomarkers for Depression Using Mobile Technology

Curated on July 8, 2024 (updated July 10, 2024) by Stefaan Verhulst

Paper by Yuezhou Zhang et al: “With the advent of ubiquitous sensors and mobile technologies, wearables and smartphones offer a cost-effective means for monitoring mental health conditions, particularly depression. These devices enable the continuous collection of behavioral data, providing novel insights into the daily manifestations of depressive symptoms.

We found several significant links between depression severity and various behavioral biomarkers: elevated depression levels were associated with diminished sleep quality (assessed through Fitbit metrics), reduced sociability (approximated by Bluetooth), decreased levels of physical activity (quantified by step counts and GPS data), a slower cadence of daily walking (captured by smartphone accelerometers), and disturbances in circadian rhythms (analyzed across various data streams).

Leveraging digital biomarkers for assessing and continuously monitoring depression introduces a new paradigm in early detection and development of customized intervention strategies. Findings from these studies not only enhance our comprehension of depression in real-world settings but also underscore the potential of mobile technologies in the prevention and management of mental health issues…(More)”.

Designing an Effective Governance Model for Data Collaboratives

Curated on July 4, 2024 by Stefaan Verhulst

Paper by Federico Bartolomucci & Francesco Leoni: “Data Collaboratives have gained traction as interorganizational partnerships centered on data exchange. They enhance the collective capacity of responding to contemporary societal challenges using data, while also providing participating organizations with innovation capabilities and reputational benefits. Unfortunately, data collaboratives often fail to advance beyond the pilot stage and are therefore limited in their capacity to deliver systemic change. The governance setting adopted by a data collaborative affects how it acts over the short and long term. We present a governance design model to develop context-dependent data collaboratives. Practitioners can use the proposed model and list of key reflective questions to evaluate the critical aspects of designing a governance model for their data collaboratives…(More)”.

The 4M Roadmap: A Higher Road to Profitability by Using Big Data for Social Good

Curated on June 25, 2024 by Stefaan Verhulst

Report by Brennan Lake: “As the private sector faces conflicting pressures to either embrace or shun socially responsible practices, companies with privately held big-data assets must decide whether to share access to their data for public good. While some managers object to data sharing over concerns of privacy and product cannibalization, others launch well intentioned yet short-lived CSR projects that fail to deliver on lofty goals.

By embedding Shared-Value principles into ‘Data-for-Good’ programs, data-rich firms can launch responsible data-sharing initiatives that minimize risk, deliver sustained impact, and improve overall competitiveness in the process.

The 4M Roadmap by Brennan Lake, a Big-Data and Social Impact professional, guides managers to adopt a ‘Data-for-Good’ model that emphasizes four key pillars of value-creation: Mission, Messaging, Methods, and Monetization. Through deep analysis and private-sector case studies, The 4M Roadmap demonstrates how companies can engage in responsible data sharing to benefit society and business alike…(More)”.

A New National Purpose: Harnessing Data for Health

Curated on May 25, 2024 (updated May 29, 2024) by Stefaan Verhulst

Report by the Tony Blair Institute: “We are at a pivotal moment where the convergence of large health and biomedical data sets, artificial intelligence and advances in biotechnology is set to revolutionise health care, drive economic growth and improve the lives of citizens. And the UK has strengths in all three areas. The immense potential of the UK’s health-data assets, from the NHS to biobanks and genomics initiatives, can unlock new diagnostics and treatments, deliver better and more personalised care, prevent disease and ultimately help people live longer, healthier lives.

However, realising this potential is not without its challenges. The complex and fragmented nature of the current health-data landscape, coupled with legitimate concerns around privacy and public trust, has made for slow progress. The UK has had a tendency to provide short-term funding across multiple initiatives, which has led to an array of individual projects – many of which have struggled to achieve long-term sustainability and deliver tangible benefits to patients.

To overcome these challenges, it will be necessary to be bold and imaginative. We must look for ways to leverage the unique strengths of the NHS, such as its nationwide reach and cradle-to-grave data coverage, to create a health-data ecosystem that is much more than the sum of its many parts. This will require us to think differently about how we collect, manage and utilise health data, and to create new partnerships and models of collaboration that break down traditional silos and barriers. It will mean treating data as a key health resource and managing it accordingly.

One model to do this is the proposed sovereign National Data Trust (NDT) – an endeavour to streamline access to and curation of the UK’s valuable health-data assets…(More)”.

On the Meaning of Community Consent in a Biorepository Context

Curated on May 21, 2024 by Stefaan Verhulst

Article by Astha Kapoor, Samuel Moore, and Megan Doerr: “Biorepositories, vital for medical research, collect and store human biological samples and associated data for future use. However, our reliance solely on the individual consent of data contributors for biorepository data governance is becoming inadequate. Big data analysis focuses on large-scale behaviors and patterns, shifting focus from singular data points to identifying data “journeys” relevant to a collective. The individual becomes a small part of the analysis, with the harms and benefits emanating from the data occurring at an aggregated level.

Community refers to a particular qualitative aspect of a group of people that is not well captured by quantitative measures in biorepositories. This is not an excuse to dodge the question of how to account for communities in a biorepository context; rather, it shows that a framework is needed for defining different types of community that may be approached from a biorepository perspective.

Engaging with communities in biorepository governance presents several challenges. Moving away from a purely individualized understanding of governance towards a more collectivizing approach necessitates an appreciation of the messiness of group identity, its ephemerality, and the conflicts entailed therein. So while community implies a certain degree of homogeneity (i.e., that all members of a community share something in common), it is important to understand that people can simultaneously consider themselves a member of a community while disagreeing with many of its members, the values the community holds, or the positions for which it advocates. The complex nature of community participation therefore requires proper treatment for it to be useful in a biorepository governance context…(More)”.

Data governance for the ecological transition: An infrastructure perspective

Curated on May 19, 2024 (updated May 22, 2024) by Stefaan Verhulst

Article by Charlotte Ducuing: “This article uses infrastructure studies to provide a critical analysis of the European Union’s (EU) ambition to regulate data for the ecological transition. The EU’s regulatory project implicitly qualifies data as an infrastructure for a better economy and society. However, current EU law does not draw all the logical consequences derived from this qualification of data as infrastructure, which is one main reason why EU data legislation for the ecological transition may not deliver on its high political expectations. The ecological transition does not play a significant normative role in EU data legislation and is largely overlooked in the data governance literature. By drawing inferences from the qualification of data as an infrastructure more consistently, the article opens avenues for data governance that centre the ecological transition as a normative goal…(More)”.