Non-traditional data sources in obesity research: a systematic review of their use in the study of obesogenic environments


Paper by Julia Mariel Wirtz Baker, Sonia Alejandra Pou, Camila Niclis, Eugenia Haluszka & Laura Rosana Aballay: “The field of obesity epidemiology has made extensive use of traditional data sources, such as health surveys and reports from official national statistical systems, whose variety of data can at times be too limited to explore the wider range of determinants relevant to obesity. Over time, other data sources began to be incorporated into obesity research, such as geospatial data (web mapping platforms, satellite imagery, and other databases embedded in Geographic Information Systems), social network data (such as Twitter, Facebook, Instagram, or other social networks), digital device data and others. The data revolution, facilitated by the massive use of digital devices with hundreds of millions of users and the emergence of the “Internet of Things” (IoT), has generated huge volumes of data from everywhere: customers, social networks and sensors, in addition to all the traditional sources mentioned above. For research, it offers fruitful opportunities, contributing in ways that traditionally sourced research data could not.

An international expert panel in obesity and big data pointed out some key factors in the definition of Big Data, stating that “it is always digital, has a large sample size, and a large volume or variety or velocity of variables that require additional computing power, as well as specialist skills in computer programming, database management and data science analytics”. Our interpretation of non-traditional data sources is an approximation to this definition, assuming that they are sources not traditionally used in obesity epidemiology and environmental studies, which can include digital devices, social media and geospatial data within a GIS, the latter mainly based on complex indexes that require advanced data analysis techniques and expertise.

Beyond its still-debated limitations, Big Data can be seen as a great opportunity to improve the study of obesogenic environments, since it has been heralded as a powerful resource that can provide new knowledge about human behaviour and social phenomena. Besides, it can contribute to the formulation and evaluation of policies and the development of interventions for obesity prevention. However, in this field of research, the suitability of these novel data sources is still a subject of considerable discussion, and their use has not been investigated from the obesogenic environment approach…(More)”.

Health Care Data Is a Researcher’s Gold Mine


Article by James O’Shaughnessy: “The UK’s National Health Service should aim to become the world’s leading platform for health research and development. The pandemic showed the potential we have for world-class research, with examples like the RECOVERY trial and the Covid vaccine platform, and since then through the partnerships with Moderna, Grail, and BioNTech. However, these examples of partnership with industry are often ad hoc arrangements. In general, funding and prestige are concentrated on research labs and early-phase trials, but when it comes to helping health care companies through the commercialization stages of their products, both public and private sector funding is much harder to access. This makes it hard for startups partnering with the NHS to scale their products and sell them on the domestic and international markets.

Instead, we need a systematic approach to leverage our strengths, such as the scale of the NHS, the diversity of our population, and the deep patient phenotyping that our data assets enable. That will give us the opportunity to generate vast amounts of real-world data about health care drugs and technologies—like pricing, performance, and safety—that can prepare companies to scale their innovations and go to market.

To achieve that, there are obstacles to overcome. For instance, setting up research projects is incredibly time-consuming. We have very bureaucratic processes that make the UK one of the slowest places in Europe to set up research studies.

Patients need more access to research. However, there’s really poor information at the moment about where clinical trials are taking place in the country and what kinds of patients they are recruiting. We need a clinicaltrials.gov.uk website to give that sort of information.

There’s a significant problem when it comes to the question of patient consent to participate in R&D. Legally, unless patients have said explicitly that they want to be approached for a research project or a clinical trial, they can’t be contacted for that purpose. The catch-22 is that, of course, most patients are not aware of this, and you can’t legally contact them to inform them. We need to allow ethically approved researchers to proactively approach people to take part in studies which might be of benefit to them…(More)”.

Gamifying medical data labeling to advance AI


Article by Zach Winn: “…Duhaime began exploring ways to leverage collective intelligence to improve medical diagnoses. In one experiment, he trained groups of lay people and medical school students that he describes as “semiexperts” to classify skin conditions, finding that by combining the opinions of the highest performers he could outperform professional dermatologists. He also found that by combining algorithms trained to detect skin cancer with the opinions of experts, he could outperform either method on its own….The DiagnosUs app, which Duhaime developed with Centaur co-founders Zach Rausnitz and Tom Gellatly, is designed to help users test and improve their skills. Duhaime says about half of users are medical school students and the other half are mostly doctors, nurses, and other medical professionals…
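The aggregation idea Duhaime describes, pooling the judgments of the best-performing labelers, can be sketched in a few lines. This is an illustration only, with made-up data structures and function names, not Centaur's actual algorithm: rank raters by their accuracy on gold-standard cases, then take a majority vote among the top k on every case.

```python
from collections import Counter

def top_k_majority(labels_by_rater, gold, k=3):
    """Majority vote among the k raters most accurate on gold-standard cases.

    labels_by_rater: {rater: {case_id: label}}
    gold: {case_id: true_label} -- calibration cases with known answers
    Returns {case_id: consensus_label} over all cases the top raters saw.
    """
    def accuracy(labels):
        scored = [cid for cid in gold if cid in labels]
        if not scored:
            return 0.0
        return sum(labels[cid] == gold[cid] for cid in scored) / len(scored)

    # Rank raters by calibration accuracy and keep the top k
    ranked = sorted(labels_by_rater, key=lambda r: accuracy(labels_by_rater[r]), reverse=True)
    top = ranked[:k]

    # Majority vote among the top raters on every case any of them labeled
    cases = {cid for r in top for cid in labels_by_rater[r]}
    consensus = {}
    for cid in cases:
        votes = Counter(labels_by_rater[r][cid] for r in top if cid in labels_by_rater[r])
        consensus[cid] = votes.most_common(1)[0][0]
    return consensus
```

The same skeleton extends naturally to the hybrid human-plus-algorithm setup the article mentions: an algorithm's predictions can simply be entered as one more "rater" and weighted by its own calibration accuracy.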

The approach stands in sharp contrast to traditional data labeling and AI content moderation, which are typically outsourced to low-resource countries.

Centaur’s approach produces accurate results, too. In a paper with researchers from Brigham and Women’s Hospital, Massachusetts General Hospital (MGH), and Eindhoven University of Technology, Centaur showed that its crowdsourced opinions labeled lung ultrasounds as reliably as experts did…

Centaur has found that the best performers come from surprising places. In 2021, to collect expert opinions on EEG patterns, researchers held a contest through the DiagnosUs app at a conference featuring about 50 epileptologists, each with more than 10 years of experience. The organizers made a custom shirt to give to the contest’s winner, who they assumed would be in attendance at the conference.

But when the results came in, a pair of medical students in Ghana, Jeffery Danquah and Andrews Gyabaah, had beaten everyone in attendance. The highest-ranked conference attendee had come in ninth…(More)”

Fighting poverty with synthetic data


Article by Jack Gisby, Anna Kiknadze, Thomas Mitterling, and Isabell Roitner-Fransecky: “If you have ever used a smartwatch or other wearable tech to track your steps, heart rate, or sleep, you are part of the “quantified self” movement. You are voluntarily submitting millions of intimate data points for collection and analysis. The Economist highlighted the benefits of good quality personal health and wellness data—increased physical activity, more efficient healthcare, and constant monitoring of chronic conditions. However, not everyone is enthusiastic about this trend. Many fear corporations will use the data to discriminate against the poor and vulnerable. For example, insurance firms could exclude patients based on pre-existing conditions revealed through personal data sharing.

Can we strike a balance between protecting the privacy of individuals and gathering valuable information? This blog explores applying a synthetic population approach in New York City, a city with an established reputation for using big data approaches to support urban management, including for welfare provisions and targeted policy interventions.

To better understand poverty rates at the census tract level, World Data Lab, with the support of the Sloan Foundation, generated a synthetic population based on the borough of Brooklyn. Synthetic populations rely on a combination of microdata and summary statistics:

  • Microdata consists of personal information at the individual level. In the U.S., such data is available at the Public Use Microdata Area (PUMA) level. PUMAs are geographic areas that partition each state, containing no fewer than 100,000 people apiece. Due to privacy concerns, however, microdata is unavailable at the more granular census tract level. Microdata covers both household- and individual-level information, including last year’s household income, household size, the number of rooms, and the age, sex, and educational attainment of each individual living in the household.
  • Summary statistics are based on populations rather than individuals and are available at the census tract level, given that there are fewer privacy concerns. Census tracts are small statistical subdivisions of a county, averaging about 4,000 inhabitants; in New York City, a census tract roughly corresponds to a city block. As with microdata, summary statistics are available for both individuals and households. At the census tract level, we know the total population, the corresponding demographic breakdown, the number of households within different income brackets, the number of households by number of rooms, and other similar variables…(More)”.
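The combination of microdata and tract-level summary statistics described above is commonly implemented with iterative proportional fitting (IPF): PUMA-level microdata records are reweighted until their weighted totals match the published tract margins. The toy sketch below uses hypothetical attribute names and is not World Data Lab's actual pipeline:

```python
def ipf_weights(records, margins, iters=50):
    """Reweight microdata records so weighted counts match tract-level margins.

    records: list of dicts, e.g. {"inc": "low", "size": "small"}
    margins: {attribute: {category: target_count}} from summary statistics
    Assumes every category in the records appears in the margins.
    Returns one weight per record.
    """
    weights = [1.0] * len(records)
    for _ in range(iters):
        for attr, targets in margins.items():
            # Current weighted count per category of this attribute
            totals = {cat: 0.0 for cat in targets}
            for rec, w in zip(records, weights):
                totals[rec[attr]] += w
            # Scale weights so each category hits its target count
            for i, rec in enumerate(records):
                cat = rec[attr]
                if totals[cat] > 0:
                    weights[i] *= targets[cat] / totals[cat]
    return weights
```

At convergence, the weighted count of records in each category matches the tract-level totals, yielding a synthetic population consistent with both data sources; sampling records in proportion to their weights then produces synthetic individuals for the tract.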

When What’s Right Is Also Wrong: The Pandemic As A Corporate Social Responsibility Paradox


Article by Heidi Reed: “When the COVID-19 pandemic first hit, businesses were faced with difficult decisions where making the ‘right choice’ just wasn’t possible. For example, if a business chose to shut down, it might protect employees from catching COVID, but at the same time, it would leave them without a paycheck. This was particularly true in the U.S. where the government played a more limited role in regulating business behavior, leaving managers and owners to make hard choices.

In this way, the pandemic is a societal paradox in which the social objectives of public health and economic prosperity are both interdependent and contradictory. How does the public judge businesses then when they make decisions favoring one social objective over another? To answer this question, I qualitatively surveyed the American public at the start of the COVID-19 crisis about what they considered to be responsible and irresponsible business behavior in response to the pandemic. Analyzing their answers led me to create the 4R Model of Moral Sensemaking of Competing Social Problems.

The 4R Model relies on two dimensions: the extent to which people prioritize one social problem over another and the extent to which they exhibit psychological discomfort (i.e. cognitive dissonance). In the first mode, Reconcile, people view the problems as compatible. There is no need to prioritize then and no resulting dissonance. These people think, “Businesses can just convert to making masks to help the cause and still make a profit.”

The second mode, Resign, similarly does not prioritize one problem over another; however, the problems are seen as competing, suggesting a high level of cognitive dissonance. These people might say, “It’s dangerous to stay open, but if the business closes, people will lose their jobs. Both decisions are bad.”

In the third mode, Ranking, people use prioritizing to reduce cognitive dissonance. These people say things like, “I understand people will be fired, but it’s more important to stop the virus.”

In the fourth and final mode, Rectify, people start by ranking but show signs of lingering dissonance as they acknowledge the harm created by prioritizing one problem over another. Unlike with the Resign mode, they try to find ways to reduce this harm. A common response in this mode would be, “Businesses should shut down, but they should also try to help employees file for unemployment.”

The 4R model has strong implications for other grand challenges where there may be competing social objectives such as in addressing climate change. To this end, the typology helps corporate social responsibility (CSR) decision-makers understand how they may be judged when businesses are forced to re- or de-prioritize CSR dimensions. In other words, it helps us understand how people make moral sense of business behavior when the right thing to do is paradoxically also the wrong thing…(More)”

Local Data Spaces: Leveraging trusted research environments for secure location-based policy research


Paper by Jacob L. Macdonald, Mark A. Green, Maurizio Gibin, Simon Leech, Alex Singleton and Paul Longley: “This work explores the use of Trusted Research Environments for the secure analysis of sensitive, record-level data on local coronavirus disease-2019 (COVID-19) inequalities and economic vulnerabilities. The Local Data Spaces (LDS) project was a targeted rapid response and cross-disciplinary collaborative initiative using the Office for National Statistics’ Secure Research Service for localized comparison and analysis of health and economic outcomes over the course of the COVID-19 pandemic. Embedded researchers worked on co-producing a range of locally focused insights and reports built on secure secondary data and made appropriately open and available to the public and all local stakeholders for wider use. With secure infrastructure and overall data governance practices in place, accredited researchers were able to access a wealth of detailed data and resources to facilitate more targeted local policy analysis. Working with data within such infrastructure as part of a larger research project required advance planning and coordination to be efficient. As new and novel granular data resources become securely available (e.g., record-level administrative digital health records or consumer data), a range of local policy insights can be gained across issues of public health or local economic vitality. Many of these new forms of data however often come with a large degree of sensitivity around issues of personal identifiability and how the data is used for public-facing research and require secure and responsible use. Learning to work appropriately with secure data and research environments can open up many avenues for collaboration and analysis…(More)”

Opportunities and Challenges in Reusing Public Genomics Data


Introduction to Special Issue by Mahmoud Ahmed and Deok Ryong Kim: “Genomics data is accumulating in public repositories at an ever-increasing rate. Large consortia and individual labs continue to probe animal and plant tissue and cell cultures, generating vast amounts of data using established and novel technologies. The Human Genome Project kickstarted the era of systems biology (1, 2). Ambitious projects followed to characterize non-coding regions, variations across species, and between populations (3, 4, 5). The cost reduction allowed individual labs to generate numerous smaller high-throughput datasets (6, 7, 8, 9). As a result, the scientific community should consider strategies to overcome the challenges and maximize the opportunities to use these resources for research and the public good. In this collection, we elicit opinions and perspectives from researchers in the field on the opportunities and challenges of reusing public genomics data. The articles in this research topic converge on the need for data sharing while acknowledging the challenges that come with it. Two articles define and highlight the distinction between data and metadata; the characteristics of each should be considered when designing optimal sharing strategies. One article focuses on the specific issues surrounding the sharing of genomics interval data, and another on balancing the protection of pediatric rights with the benefits of sharing.

The definition of what counts as data is itself a moving target. As technology advances, data can be produced in more ways and from novel sources. Events of recent years have highlighted this fact. “The pandemic has underscored the urgent need to recognize health data as a global public good with mechanisms to facilitate rapid data sharing and governance,” write Schwalbe and colleagues (2020). The challenges facing these mechanisms could be technical, economic, legal, or political. Defining what data is and its type, therefore, is necessary to overcome these barriers because “the mechanisms to facilitate data sharing are often specific to data types.” Unlike genomics data, which has established platforms, sharing clinical data “remains in a nascent phase.” The article by Patrinos and colleagues (2022) considers the strong ethical imperative for protecting pediatric data while acknowledging the need to avoid overprotection. The authors discuss a model of consent for pediatric research that can balance the need to protect participants and generate health benefits.

Xue et al. (2023) focus on reusing genomic interval data. Identifying and retrieving the relevant data can be difficult, given the state of the repositories and the size of these datasets. Similarly, integrating interval data into reference genomes can be hard. The authors call for standardized formats for the data and the metadata to facilitate reuse.
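A widely used convention for genomic interval data of the kind Xue et al. discuss is the BED format, which records a chromosome name plus 0-based, half-open start and end coordinates. The sketch below, parsing a record and testing for overlap, is included only to illustrate why shared formats matter for reuse; it is not the authors' proposal:

```python
def parse_bed_line(line):
    """Parse the three required BED fields: chrom, 0-based start, end."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), int(fields[2])

def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base.

    Half-open coordinates mean adjacent intervals (end == start) do NOT overlap.
    """
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]
```

Without an agreed convention, even a detail as small as 0-based versus 1-based coordinates silently shifts every interval by one base, which is exactly the class of integration error standardized formats prevent.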

Sheffield and colleagues (2023) highlight the distinction between data and metadata. Metadata describes the characteristics of the sample, experiment, and analysis. The nature of this information differs from that of the primary data in size, source, and ways of use. Therefore, an optimal strategy for sharing metadata should consider these specific attributes. Challenges specific to sharing metadata include the need for standardized terms and formats that make it portable and easier to find.
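One way to operationalize the standardized terms Sheffield and colleagues call for is a validator that checks each metadata record against required fields and controlled vocabularies. The field names and vocabulary below are hypothetical, not a published standard:

```python
# Hypothetical schema: required metadata fields and one controlled vocabulary
REQUIRED_FIELDS = {"sample_id", "organism", "assay", "processing_pipeline"}
CONTROLLED = {"assay": {"RNA-seq", "ATAC-seq", "ChIP-seq", "WGS"}}

def validate_metadata(record):
    """Return a list of problems with a metadata record (empty list = valid)."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - record.keys())]
    for field, vocab in CONTROLLED.items():
        if field in record and record[field] not in vocab:
            problems.append(f"unrecognized {field}: {record[field]}")
    return problems
```

Running such checks at deposit time, rather than leaving free-text fields, is what makes metadata findable and portable across repositories.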

We go beyond the reuse issue to highlight two other aspects that might increase the utility of available public data in Ahmed et al. (2023). These are curation and integration…(More)”.

Behavioural Incentive Design for Health Policy


Book by Joan Costa-Font, Tony Hockley, Caroline Rudisill: “Behavioural economics has become a popular way of tackling a broad range of issues in public policy. By presenting a more descriptive, and possibly more accurate, representation of human behaviour than traditional economics, Behavioural Incentive Design for Health Policy tries to make sense of decisions that follow a wider conception of welfare, influenced by social norms and narratives, pro-social motivations and choice architectures that standard economics generally neglected. The authors show how this model can be applied to tackle a wide range of issues in public health, including smoking, the obesity crisis, exercise uptake, alcoholism, preventive screenings and attitudes towards vaccinations. The book shows not only how behavioural economics allows us to better understand such challenges, but also how it can be used to design effective incentives for addressing them. This book is an extensive reassessment of the interaction between behavioural incentives and health….(More)”.

Artificial Intelligence in the COVID-19 Response


Report by Sean Mann, Carl Berdahl, Lawrence Baker, and Federico Girosi: “We conducted a scoping review to identify AI applications used in the clinical and public health response to COVID-19. Interviews with stakeholders early in the research process helped inform our research questions and guide our study design. We conducted a systematic search, screening, and full text review of both academic and gray literature…

  • AI is still an emerging technology in health care, with growing but modest rates of adoption in real-world clinical and public health practice. The COVID-19 pandemic showcased the wide range of clinical and public health functions performed by AI as well as the limited evidence available on most AI products that have entered use.
  • We identified 66 AI applications (full list in Appendix A) used to perform a wide range of diagnostic, prognostic, and treatment functions in the clinical response to COVID-19. This included applications used to analyze lung images, evaluate user-reported symptoms, monitor vital signs, predict infections, and aid in breathing tube placement. Some applications were used by health systems to help allocate scarce resources to patients.
  • Many clinical applications were deployed early in the pandemic, and most were used in the United States, other high-income countries, or China. A few applications were used to care for hundreds of thousands or even millions of patients, although most were used to an unknown or limited extent.
  • We identified 54 AI-based public health applications used in the pandemic response. These included AI-enabled cameras used to monitor health-related behavior and health messaging chatbots used to answer questions about COVID-19. Other applications were used to curate public health information, produce epidemiologic forecasts, or help prioritize communities for vaccine allocation and outreach efforts.
  • We found studies supporting the use of 39 clinical applications and 8 public health applications, although few of these were independent evaluations, and we found no clinical trials evaluating any application’s impact on patient health. We found little evidence available on entire classes of applications, including some used to inform care decisions such as patient deterioration monitors.
  • Further research is needed, particularly independent evaluations on application performance and health impacts in real-world care settings. New guidance may be needed to overcome the unique challenges to evaluating AI application impacts on patient- and population-level health outcomes….(More)” – See also: The #Data4Covid19 Review

Technological Obsolescence


Essay by Jonathan Coopersmith: “In addition to killing over a million Americans, Covid-19 revealed embarrassing failures of local, state, and national public health systems to accurately and effectively collect, transmit, and process information. To some critics and reporters, the visible and easily understood face of those failures was the continued use of fax machines.

In reality, the critics were attacking the symptom, not the problem. Instead of “why were people still using fax machines?,” the better question was “what factors made fax machines more attractive than more capable technologies?” Those answers provide a better window into the complex, evolving world of technological obsolescence, a key component of our modern world—and on a smaller scale, provide a template to decide whether the NAE and other organizations should retain their fax machines.

The marketing dictionary of Monash University Business School defines technological obsolescence as “when a technical product or service is no longer needed or wanted even though it could still be in working order.” Significantly, the source is a business school, which implies strong economic and social factors in decision making about technology.  

Determining technological obsolescence depends not just on creators and promoters of new technologies but also on users, providers, funders, accountants, managers, standards setters—and, most importantly, competing needs and options. In short, it’s complicated.  

Like most aspects of technology, perspectives on obsolescence depend on your position. If existing technology meets your needs, upgrading may not seem worth the resources needed (e.g., for purchase and training). If, on the other hand, your firm or organization depends on income from providing, installing, servicing, training, advising, or otherwise benefiting from a new technology, not upgrading could jeopardize your future, especially in a very competitive market. And if you cannot find the resources to upgrade, you—and your users—may incur both visible and invisible costs…(More)”.