Patients are Pooling Data to Make Diabetes Research More Representative


Blog by Tracy Kariuki: “Saira Khan-Gallo knows how overwhelming managing and living healthily with diabetes can be. As a person living with type 1 diabetes for over two decades, she understands how tracking glucose levels, blood pressure, blood cholesterol, insulin intake, and, and, and…could all feel like drowning in an infinite pool of numbers.

But that doesn’t need to be the case. This is why Tidepool, a non-profit tech organization composed of caregivers and other people living with diabetes such as Gallo, is transforming diabetes data management. Its data visualization platform enables users to make sense of the data and derive insights into their health status….

Through its Big Data Donation Project, Tidepool has been supporting the advancement of diabetes research by sharing anonymized data from people living with diabetes with researchers.

To date, more than 40,000 individuals have chosen to donate data uploaded from their diabetes devices like blood glucose meters, insulin pumps and continuous glucose monitors, which is then shared by Tidepool with students, academics, researchers, and industry partners, making the database larger than many clinical trials. For instance, Oregon Health and Science University has used datasets collected from Tidepool to build an algorithm that predicts hypoglycemia, which is low blood sugar, with the goal of advancing closed loop therapy for diabetes management…(More)”.

What prevents us from reusing medical real-world data in research


Paper by Julia Gehrmann, Edit Herczog, Stefan Decker & Oya Beyan: “Recent studies show that Medical Data Science (MDS) carries great potential to improve healthcare. In particular, considering data from several medical areas and of different types, i.e. using multimodal data, significantly increases the quality of the research results. On the other hand, the inclusion of more features in an MDS analysis means that more medical cases are required to represent the full range of possible feature combinations in a quantity that would be sufficient for a meaningful analysis. Historically, data acquisition in medical research has relied on prospective data collection, e.g. in clinical studies. However, prospectively collecting the amount of data needed for advanced multimodal data analyses is not feasible for two reasons. Firstly, such a data collection process would cost an enormous amount of money. Secondly, it would take decades to generate enough data for longitudinal analyses, while the results are needed now. A worthwhile alternative is using real-world data (RWD) from clinical systems of, e.g., university hospitals. This data is immediately accessible in large quantities, providing full flexibility in the choice of the analyzed research questions. However, when compared to prospectively curated data, medical RWD usually lacks quality due to the specificities of medical RWD outlined in section 2. The reduced quality makes its preparation for analysis more challenging…(More)”.
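
The point about feature combinations can be made concrete with a little arithmetic. The sketch below is illustrative only: the feature names, the number of values per feature, and the threshold of 30 cases per combination are assumptions, not figures from the paper.

```python
import math

def cases_needed(levels_per_feature, min_cases_per_combination):
    """Rough lower bound on the cases needed to cover every feature combination."""
    combinations = math.prod(levels_per_feature.values())
    return combinations * min_cases_per_combination

# Hypothetical categorical features and the number of values each can take.
single_modality = {"age_group": 5, "sex": 2, "diagnosis": 10}
multimodal = {**single_modality, "lab_profile": 4, "imaging_finding": 6}

MIN_CASES = 30  # assumed minimum number of cases per combination

print(cases_needed(single_modality, MIN_CASES))
print(cases_needed(multimodal, MIN_CASES))
```

Under these assumptions, adding two more features multiplies the number of required cases by 24, which is the kind of growth that makes purely prospective collection impractical.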

AI tools are designing entirely new proteins that could transform medicine


Article by Ewen Callaway: “OK. Here we go.” David Juergens, a computational chemist at the University of Washington (UW) in Seattle, is about to design a protein that, in 3-billion-plus years of tinkering, evolution has never produced.

On a video call, Juergens opens a cloud-based version of an artificial intelligence (AI) tool he helped to develop, called RFdiffusion. This neural network, and others like it, are helping to bring the creation of custom proteins — until recently a highly technical and often unsuccessful pursuit — to mainstream science.

These proteins could form the basis for vaccines, therapeutics and biomaterials. “It’s been a completely transformative moment,” says Gevorg Grigoryan, the co-founder and chief technical officer of Generate Biomedicines in Somerville, Massachusetts, a biotechnology company applying protein design to drug development.

The tools are inspired by AI software that synthesizes realistic images, such as the Midjourney software that, this year, was famously used to produce a viral image of Pope Francis wearing a designer white puffer jacket. A similar conceptual approach, researchers have found, can churn out realistic protein shapes to criteria that designers specify — meaning, for instance, that it’s possible to speedily draw up new proteins that should bind tightly to another biomolecule. And early experiments show that when researchers manufacture these proteins, a useful fraction do perform as the software suggests.

The tools have revolutionized the process of designing proteins in the past year, researchers say. “It is an explosion in capabilities,” says Mohammed AlQuraishi, a computational biologist at Columbia University in New York City, whose team has developed one such tool for protein design. “You can now create designs that have sought-after qualities.”

“You’re building a protein structure customized for a problem,” says David Baker, a computational biophysicist at UW whose group, which includes Juergens, developed RFdiffusion. The team released the software in March 2023, and a paper describing the neural network appears this week in Nature. (A preprint version was released in late 2022, at around the same time that several other teams, including AlQuraishi’s and Grigoryan’s, reported similar neural networks)…(More)”.

COVID-19 digital contact tracing worked — heed the lessons for future pandemics


Article by Marcel Salathé: “During the first year of the COVID-19 pandemic, around 50 countries deployed digital contact tracing. When someone tested positive for SARS-CoV-2, anyone who had been in close proximity to that person (usually for 15 minutes or more) would be notified as long as both individuals had installed the contact-tracing app on their devices.

Digital contact tracing received much media attention, and much criticism, in that first year. Many worried that the technology provided a way for governments and technology companies to have even more control over people’s lives than they already do. Others dismissed the apps as a failure, after public-health authorities hit problems in deploying them.

Three years on, the data tell a different story.

The United Kingdom successfully integrated a digital contact-tracing app with other public-health programmes and interventions, and collected data to assess the app’s effectiveness. Several analyses now show that, even with the challenges of introducing a new technology during an emergency, and despite relatively low uptake, the app saved thousands of lives. It has also become clearer that many of the problems encountered elsewhere were not to do with the technology itself, but with integrating a twenty-first-century technology into what are largely twentieth-century public-health infrastructures…(More)”.

Non-traditional data sources in obesity research: a systematic review of their use in the study of obesogenic environments


Paper by Julia Mariel Wirtz Baker, Sonia Alejandra Pou, Camila Niclis, Eugenia Haluszka & Laura Rosana Aballay: “The field of obesity epidemiology has made extensive use of traditional data sources, such as health surveys and reports from official national statistical systems, whose variety of data can at times be too limited to explore a wider range of determinants relevant to obesity. Over time, other data sources began to be incorporated into obesity research, such as geospatial data (web mapping platforms, satellite imagery, and other databases embedded in Geographic Information Systems), social network data (such as Twitter, Facebook, Instagram, or other social networks), digital device data and others. The data revolution, facilitated by the massive use of digital devices with hundreds of millions of users and the emergence of the “Internet of Things” (IoT), has generated huge volumes of data from everywhere: customers, social networks and sensors, in addition to all the traditional sources mentioned above. In the research area, it offers fruitful opportunities, contributing in ways that traditionally sourced research data could not.

An international expert panel in obesity and big data pointed out some key factors in the definition of Big Data, stating that “it is always digital, has a large sample size, and a large volume or variety or velocity of variables that require additional computing power, as well as specialist skills in computer programming, database management and data science analytics”. Our interpretation of non-traditional data sources is an approximation to this definition, assuming that they are sources not traditionally used in obesity epidemiology and environmental studies, which can include digital devices, social media and geospatial data within a GIS, the latter mainly based on complex indexes that require advanced data analysis techniques and expertise.

Beyond the still discussed limitations, Big Data can be assumed as a great opportunity to improve the study of obesogenic environments, since it has been announced as a powerful resource that can provide new knowledge about human behaviour and social phenomena. Besides, it can contribute to the formulation and evaluation of policies and the development of interventions for obesity prevention. However, in this field of research, the suitability of these novel data sources is still a subject of considerable discussion, and their use has not been investigated from the obesogenic environment approach…(More)”.

Health Care Data Is a Researcher’s Gold Mine


Article by James O’Shaughnessy: “The UK’s National Health Service should aim to become the world’s leading platform for health research and development. We’ve seen some great examples of the potential we have for world-class research during the pandemic, with examples like the RECOVERY trial and the Covid vaccine platform, and since then through the partnerships with Moderna, Grail, and BioNTech. However, these examples of partnership with industry are often ad hoc arrangements. In general, funding and prestige are concentrated on research labs and early-phase trials, but when it comes to helping health care companies through the commercialization stages of their products, both public and private sector funding is much harder to access. This makes it hard for startups partnering with the NHS to scale their products and sell them on the domestic and international markets.

Instead, we need a systematic approach to leverage our strengths, such as the scale of the NHS, the diversity of our population, and the deep patient phenotyping that our data assets enable. That will give us the opportunity to generate vast amounts of real-world data about health care drugs and technologies—like pricing, performance, and safety—that can prepare companies to scale their innovations and go to market.

To achieve that, there are obstacles to overcome. For instance, setting up research projects is incredibly time-consuming. We have very bureaucratic processes that make the UK one of the slowest places in Europe to set up research studies.

Patients need more access to research. However, there’s really poor information at the moment about where clinical trials are taking place in the country and what kind of patients they are recruiting. We need a clinicaltrials.gov.uk website to give that sort of information.

There’s a significant problem when it comes to the question of patient consent to participate in R&D. Legally, unless patients have said explicitly that they want to be approached for a research project or a clinical trial, they can’t be contacted for that purpose. The catch-22 is that, of course, most patients are not aware of this, and you can’t legally contact them to inform them. We need to allow ethically approved researchers to proactively approach people to take part in studies which might be of benefit to them…(More)”.

Gamifying medical data labeling to advance AI


Article by Zach Winn: “…Duhaime began exploring ways to leverage collective intelligence to improve medical diagnoses. In one experiment, he trained groups of lay people and medical school students that he describes as “semiexperts” to classify skin conditions, finding that by combining the opinions of the highest performers he could outperform professional dermatologists. He also found that by combining algorithms trained to detect skin cancer with the opinions of experts, he could outperform either method on its own….The DiagnosUs app, which Duhaime developed with Centaur co-founders Zach Rausnitz and Tom Gellatly, is designed to help users test and improve their skills. Duhaime says about half of users are medical school students and the other half are mostly doctors, nurses, and other medical professionals…

The approach stands in sharp contrast to traditional data labeling and AI content moderation, which are typically outsourced to low-resource countries.

Centaur’s approach produces accurate results, too. In a paper with researchers from Brigham and Women’s Hospital, Massachusetts General Hospital (MGH), and Eindhoven University of Technology, Centaur showed its crowdsourced opinions labeled lung ultrasounds as reliably as experts did…

Centaur has found that the best performers come from surprising places. In 2021, to collect expert opinions on EEG patterns, researchers held a contest through the DiagnosUs app at a conference featuring about 50 epileptologists, each with more than 10 years of experience. The organizers made a custom shirt to give to the contest’s winner, who they assumed would be in attendance at the conference.

But when the results came in, a pair of medical students in Ghana, Jeffery Danquah and Andrews Gyabaah, had beaten everyone in attendance. The highest-ranked conference attendee had come in ninth…(More)”.
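
The idea of combining the opinions of the highest performers can be sketched as a two-step procedure: score each labeler against cases with known answers, then take a majority vote among the top scorers. The article does not describe how Centaur actually aggregates opinions, so the scoring rule, the choice of k, and all names and labels below are assumptions made for illustration.

```python
from collections import Counter

def accuracy(answers, gold):
    """Share of gold-standard cases this labeler answered correctly."""
    scored = [case for case in gold if case in answers]
    if not scored:
        return 0.0
    return sum(answers[case] == gold[case] for case in scored) / len(scored)

def top_performer_vote(all_answers, gold, case_id, k=3):
    """Majority label for case_id among the k most accurate labelers."""
    ranked = sorted(all_answers,
                    key=lambda name: accuracy(all_answers[name], gold),
                    reverse=True)
    votes = Counter(all_answers[name][case_id]
                    for name in ranked[:k]
                    if case_id in all_answers[name])
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical gold-standard cases (c1-c3) and one unlabeled case (c4).
gold = {"c1": "benign", "c2": "malignant", "c3": "benign"}
all_answers = {
    "ana": {"c1": "benign", "c2": "malignant", "c3": "benign", "c4": "malignant"},
    "ben": {"c1": "benign", "c2": "malignant", "c3": "malignant", "c4": "malignant"},
    "cy": {"c1": "malignant", "c2": "malignant", "c3": "benign", "c4": "benign"},
    "dee": {"c1": "malignant", "c2": "benign", "c3": "malignant", "c4": "benign"},
    "eli": {"c1": "malignant", "c2": "benign", "c3": "benign", "c4": "benign"},
}

print(top_performer_vote(all_answers, gold, "c4", k=3))  # vote of the top three
print(top_performer_vote(all_answers, gold, "c4", k=5))  # vote of the whole crowd
```

In this toy data the top three labelers and the full crowd disagree on case c4, which shows why weighting toward proven performers can change the outcome.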

Fighting poverty with synthetic data


Article by Jack Gisby, Anna Kiknadze, Thomas Mitterling, and Isabell Roitner-Fransecky: “If you have ever used a smartwatch or other wearable tech to track your steps, heart rate, or sleep, you are part of the “quantified self” movement. You are voluntarily submitting millions of intimate data points for collection and analysis. The Economist highlighted the benefits of good quality personal health and wellness data—increased physical activity, more efficient healthcare, and constant monitoring of chronic conditions. However, not everyone is enthusiastic about this trend. Many fear corporations will use the data to discriminate against the poor and vulnerable. For example, insurance firms could exclude patients based on preconditions obtained from personal data sharing.

Can we strike a balance between protecting the privacy of individuals and gathering valuable information? This blog explores applying a synthetic populations approach in New York City, a city with an established reputation for using big data approaches to support urban management, including for welfare provisions and targeted policy interventions.

To better understand poverty rates at the census tract level, World Data Lab, with the support of the Sloan Foundation, generated a synthetic population based on the borough of Brooklyn. Synthetic populations rely on a combination of microdata and summary statistics:

  • Microdata consists of personal information at the individual level. In the U.S., such data is available at the Public Use Microdata Area (PUMA) level. PUMAs are geographic areas partitioning the state, containing no fewer than 100,000 people each. However, due to privacy concerns, microdata is unavailable at the more granular census tract level. Microdata consists of both household and individual-level information, including last year’s household income, the household size, the number of rooms, and the age, sex, and educational attainment of each individual living in the household.
  • Summary statistics are based on populations rather than individuals and are available at the census tract level, given that there are fewer privacy concerns. Census tracts are small statistical subdivisions of a county, averaging about 4,000 inhabitants. In New York City, a census tract roughly equals a city block. Similar to microdata, summary statistics are available for individuals and households. On the census tract level, we know the total population, the corresponding demographic breakdown, the number of households within different income brackets, the number of households by number of rooms, and other similar variables…(More)”.
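
How the two inputs fit together can be sketched in a few lines. The blog excerpt does not describe World Data Lab's actual algorithm, so the following is only a minimal illustration under stated assumptions: it matches a single tract-level margin (households per income bracket) by resampling PUMA-level microdata, whereas real synthetic populations usually fit several margins at once, for example with iterative proportional fitting. All field names, bracket bounds, and counts are hypothetical.

```python
import random
from collections import defaultdict

def income_bracket(income, bounds):
    """Return the index of the first bracket whose upper bound exceeds income."""
    for i, upper in enumerate(bounds):
        if income < upper:
            return i
    return len(bounds)

def synthesize_tract(microdata, tract_bracket_counts, bounds, seed=0):
    """Resample PUMA-level households so the synthetic tract reproduces the
    tract's published number of households in each income bracket."""
    rng = random.Random(seed)
    by_bracket = defaultdict(list)
    for household in microdata:
        by_bracket[income_bracket(household["income"], bounds)].append(household)

    synthetic = []
    for bracket, count in tract_bracket_counts.items():
        if count == 0:
            continue
        donors = by_bracket[bracket]
        if not donors:
            raise ValueError(f"no microdata households in bracket {bracket}")
        # Sample with replacement: a tract may need more households than donors.
        synthetic.extend(rng.choices(donors, k=count))
    return synthetic

# Hypothetical PUMA microdata records.
microdata = [
    {"income": 18_000, "size": 1, "rooms": 2},
    {"income": 42_000, "size": 3, "rooms": 4},
    {"income": 55_000, "size": 2, "rooms": 3},
    {"income": 130_000, "size": 4, "rooms": 6},
]
bounds = [25_000, 75_000]          # brackets: under 25k, 25k to 75k, 75k and up
tract_counts = {0: 3, 1: 5, 2: 1}  # hypothetical households per bracket in one tract

tract = synthesize_tract(microdata, tract_counts, bounds)
print(len(tract))
```

The synthetic tract carries household-level detail (size, rooms) that the tract summary alone does not provide, while no real household is tied to a specific tract.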

When What’s Right Is Also Wrong: The Pandemic As A Corporate Social Responsibility Paradox


Article by Heidi Reed: “When the COVID-19 pandemic first hit, businesses were faced with difficult decisions where making the ‘right choice’ just wasn’t possible. For example, if a business chose to shut down, it might protect employees from catching COVID, but at the same time, it would leave them without a paycheck. This was particularly true in the U.S. where the government played a more limited role in regulating business behavior, leaving managers and owners to make hard choices.

In this way, the pandemic is a societal paradox in which the social objectives of public health and economic prosperity are both interdependent and contradictory. How does the public judge businesses then when they make decisions favoring one social objective over another? To answer this question, I qualitatively surveyed the American public at the start of the COVID-19 crisis about what they considered to be responsible and irresponsible business behavior in response to the pandemic. Analyzing their answers led me to create the 4R Model of Moral Sensemaking of Competing Social Problems.

The 4R Model relies on two dimensions: the extent to which people prioritize one social problem over another and the extent to which they exhibit psychological discomfort (i.e. cognitive dissonance). In the first mode, Reconcile, people view the problems as compatible. There is no need to prioritize then and no resulting dissonance. These people think, “Businesses can just convert to making masks to help the cause and still make a profit.”

The second mode, Resign, similarly does not prioritize one problem over another; however, the problems are seen as competing, suggesting a high level of cognitive dissonance. These people might say, “It’s dangerous to stay open, but if the business closes, people will lose their jobs. Both decisions are bad.”

In the third mode, Ranking, people use prioritizing to reduce cognitive dissonance. These people say things like, “I understand people will be fired, but it’s more important to stop the virus.”

In the fourth and final mode, Rectify, people start by ranking but show signs of lingering dissonance as they acknowledge the harm created by prioritizing one problem over another. Unlike with the Resign mode, they try to find ways to reduce this harm. A common response in this mode would be, “Businesses should shut down, but they should also try to help employees file for unemployment.”

The 4R model has strong implications for other grand challenges where there may be competing social objectives such as in addressing climate change. To this end, the typology helps corporate social responsibility (CSR) decision-makers understand how they may be judged when businesses are forced to re- or de-prioritize CSR dimensions. In other words, it helps us understand how people make moral sense of business behavior when the right thing to do is paradoxically also the wrong thing…(More)”.
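
The four modes described above follow from the model's two dimensions, which can be written as a small lookup. This is a simplification for illustration: the article speaks of the "extent" of prioritization and dissonance, while the sketch treats both as yes/no flags, and the function and parameter names are invented here.

```python
from enum import Enum

class Mode(Enum):
    RECONCILE = "Reconcile"
    RESIGN = "Resign"
    RANKING = "Ranking"
    RECTIFY = "Rectify"

def classify(prioritizes: bool, shows_dissonance: bool) -> Mode:
    """Map the two 4R dimensions (simplified to booleans) to a mode."""
    if prioritizes:
        # Ranking resolves the discomfort; Rectify ranks but dissonance lingers.
        return Mode.RECTIFY if shows_dissonance else Mode.RANKING
    # Without prioritizing: problems seen as compatible (Reconcile)
    # or as competing with no way out (Resign).
    return Mode.RESIGN if shows_dissonance else Mode.RECONCILE

for prioritizes in (False, True):
    for shows_dissonance in (False, True):
        mode = classify(prioritizes, shows_dissonance)
        print(prioritizes, shows_dissonance, mode.value)
```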

Local Data Spaces: Leveraging trusted research environments for secure location-based policy research


Paper by Jacob L. Macdonald, Mark A. Green, Maurizio Gibin, Simon Leech, Alex Singleton and Paul Longley: “This work explores the use of Trusted Research Environments for the secure analysis of sensitive, record-level data on local coronavirus disease-2019 (COVID-19) inequalities and economic vulnerabilities. The Local Data Spaces (LDS) project was a targeted rapid response and cross-disciplinary collaborative initiative using the Office for National Statistics’ Secure Research Service for localized comparison and analysis of health and economic outcomes over the course of the COVID-19 pandemic. Embedded researchers worked on co-producing a range of locally focused insights and reports built on secure secondary data and made appropriately open and available to the public and all local stakeholders for wider use. With secure infrastructure and overall data governance practices in place, accredited researchers were able to access a wealth of detailed data and resources to facilitate more targeted local policy analysis. Working with data within such infrastructure as part of a larger research project involved advance planning and coordination to be efficient. As new and novel granular data resources become securely available (e.g., record-level administrative digital health records or consumer data), a range of local policy insights can be gained across issues of public health or local economic vitality. Many of these new forms of data, however, often come with a large degree of sensitivity around issues of personal identifiability and how the data is used for public-facing research, and require secure and responsible use. Learning to work appropriately with secure data and research environments can open up many avenues for collaboration and analysis…(More)”.