Cities, health, and the big data revolution


Blog by Harvard Public Health: “Cities influence our health in unexpected ways. From sidewalks to crosswalks, the built environment affects how much we move, impacting our risk for diseases like obesity and diabetes. A recent New York City study underscores that focusing solely on infrastructure, without understanding how people use it, can lead to ineffective interventions. Researchers analyzed over two million Google Street View images, combining them with health and demographic data to reveal these dynamics. Harvard Public Health spoke with Rumi Chunara, director of New York University’s Center for Health Data Science and lead author of the study.

Why study this topic?

We’re seeing an explosion of new data sources, like street-view imagery, being used to make decisions. But there’s often a disconnect—people using these tools don’t always have the public health knowledge to interpret the data correctly. We wanted to highlight the importance of combining data science and domain expertise to ensure interventions are accurate and impactful.

What did you find?

We discovered that the relationship between built environment features and health outcomes isn’t straightforward. It’s not just about having sidewalks; it’s about how often people are using them. Improving physical activity levels in a community could have a far greater impact on health outcomes than simply adding more infrastructure.

It also revealed the importance of understanding the local context. For instance, Google Street View data sometimes misclassifies sidewalks, particularly near highways or bridges, leading to inaccurate conclusions. Relying solely on this data, without accounting for these nuances, could result in less effective interventions…(More)”.

Establish data collaboratives to foster meaningful public involvement


Article by Gwen Ottinger: “…Data Collaboratives would move public participation and community engagement upstream in the policy process by creating opportunities for community members to contribute their lived experience to the assessment of data and the framing of policy problems. This would in turn foster two-way communication and trusting relationships between government and the public. Data Collaboratives would also help ensure that data and their uses in federal government are equitable, by inviting a broader range of perspectives on how data analysis can promote equity and where relevant data are missing. Finally, Data Collaboratives would be one vehicle for enabling individuals to participate in science, technology, engineering, math, and medicine activities throughout their lives, increasing the quality of American science and the competitiveness of American industry…(More)”.

Big data for decision-making in public transport management: A comparison of different data sources


Paper by Valeria Maria Urbano, Marika Arena, and Giovanni Azzone: “The conventional data used to support public transport management have inherent constraints related to scalability, cost, and the potential to capture space and time variability. These limitations underscore the importance of exploring innovative data sources to complement more traditional ones.

For public transport operators, who are tasked with making pivotal decisions spanning planning, operation, and performance measurement, innovative data sources are a frontier that is still largely unexplored. To fill this gap, this study first establishes a framework for evaluating innovative data sources, highlighting the specific characteristics that data should have to support decision-making in the context of transportation management. Second, a comparative analysis is conducted, using empirical data collected from primary public transport operators in the Lombardy region, with the aim of understanding whether and to what extent different data sources meet the above requirements.

The findings of this study support transport operators in selecting data sources aligned with different decision-making domains, highlighting related benefits and challenges. This underscores the importance of integrating different data sources to exploit their complementarities…(More)”.

Overcoming challenges associated with broad sharing of human genomic data


Paper by Jonathan E. LoTempio Jr & Jonathan D. Moreno: “Since the Human Genome Project, the consensus position in genomics has been that data should be shared widely to achieve the greatest societal benefit. This position relies on imprecise definitions of the concept of ‘broad data sharing’. Accordingly, the implementation of data sharing varies among landmark genomic studies. In this Perspective, we identify definitions of broad that have been used interchangeably, despite their distinct implications. We further offer a framework with clarified concepts for genomic data sharing and probe six examples in genomics that produced public data. Finally, we articulate three challenges. First, we explore the need to reinterpret the limits of general research use data. Second, we consider the governance of public data deposition from extant samples. Third, we ask whether, in light of changing concepts of broad, participants should be encouraged to share their status as participants publicly or not. Each of these challenges is followed with recommendations…(More)”.

Digitalizing sewage: The politics of producing, sharing, and operationalizing data from wastewater-based surveillance


Paper by Josie Wittmer, Carolyn Prouse, and Mohammed Rafi Arefin: “Expanded during the COVID-19 pandemic, Wastewater-Based Surveillance (WBS) is now heralded by scientists and policy makers alike as the future of monitoring and governing urban health. The expansion of WBS reflects larger neoliberal governance trends whereby digitalizing states increasingly rely on producing big data as a ‘best practice’ to surveil various aspects of everyday life. With a focus on three South Asian cities, our paper investigates the transnational pathways through which WBS data is produced, made known, and operationalized in ‘evidence-based’ decision-making in a time of crisis. We argue that in South Asia, wastewater surveillance data is actively produced through fragile but power-laden networks of transnational and local knowledge, funding, and practices. Using mixed qualitative methods, we found these networks produced artifacts like dashboards to communicate data to the public in ways that enabled claims to objectivity, ethical interventions, and transparency. Interrogating these representations, we demonstrate how these artifacts open up messy spaces of translation that trouble linear notions of objective data informing accountable, transparent, and evidence-based decision-making for diverse urban actors. By thinking through the production of precarious biosurveillance infrastructures, we respond to calls for more robust ethical and legal frameworks for the field and suggest that the fragility of WBS infrastructures has important implications for the long-term trajectories of urban public health governance in the global South…(More)”

Behaviour-based dependency networks between places shape urban economic resilience


Paper by Takahiro Yabe et al: “Disruptions, such as closures of businesses during pandemics, not only affect businesses and amenities directly but also influence how people move, spreading the impact to other businesses and increasing the overall economic shock. However, it is unclear how much businesses depend on each other during disruptions. Leveraging human mobility data and same-day visits in five US cities, we quantify dependencies between points of interest encompassing businesses, stores and amenities. We find that dependency networks computed from human mobility exhibit significantly higher rates of long-distance connections and biases towards specific pairs of point-of-interest categories. We show that using behaviour-based dependency relationships improves the predictability of business resilience during shocks by around 40% compared with distance-based models, and that neglecting behaviour-based dependencies can lead to underestimation of the spatial cascades of disruptions. Our findings underscore the importance of measuring complex relationships in patterns of human mobility to foster urban economic resilience to shocks…(More)”.

Data solidarity: Operationalising public value through a digital tool


Paper by Seliem El-Sayed, Ilona Kickbusch & Barbara Prainsack: “Most data governance frameworks are designed to protect the individuals from whom data originates. However, the impacts of digital practices extend to a broader population and are embedded in significant power asymmetries within and across nations. Further, inequities in digital societies impact everyone, not just those directly involved. Addressing these challenges requires an approach which moves beyond individual data control and is grounded in the values of equity and a just contribution of benefits and risks from data use. Solidarity-based data governance (in short: data solidarity), suggests prioritising data uses over data type and proposes that data uses that generate public value should be actively facilitated, those that generate significant risks and harms should be prohibited or strictly regulated, and those that generate private benefits with little or no public value should be ‘taxed’ so that profits generated by corporate data users are reinvested in the public domain. In the context of global health data governance, the public value generated by data use is crucial. This contribution clarifies the meaning, importance, and potential of public value within data solidarity and outlines methods for its operationalisation through the PLUTO tool, specifically designed to assess the public value of data uses…(More)”.

Kickstarting Collaborative, AI-Ready Datasets in the Life Sciences with Government-funded Projects


Article by Erika DeBenedictis, Ben Andrew & Pete Kelly: “In the age of Artificial Intelligence (AI), large high-quality datasets are needed to move the field of life science forward. However, the research community lacks strategies to incentivize collaboration on high-quality data acquisition and sharing. The government should fund collaborative roadmapping, certification, collection, and sharing of large, high-quality datasets in life science. In such a system, nonprofit research organizations engage scientific communities to identify key types of data that would be valuable for building predictive models, and define quality control (QC) and open science standards for collection of that data. Projects are designed to develop automated methods for data collection, certify data providers, and facilitate data collection in consultation with researchers throughout various scientific communities. Hosting of the resulting open data is subsidized as well as protected by security measures. This system would provide crucial incentives for the life science community to identify and amass large, high-quality open datasets that will immensely benefit researchers…(More)”.

Trust but Verify: A Guide to Conducting Due Diligence When Leveraging Non-Traditional Data in the Public Interest


New Report by Sara Marcucci, Andrew J. Zahuranec, and Stefaan Verhulst: “In an increasingly data-driven world, organizations across sectors are recognizing the potential of non-traditional data—data generated from sources outside conventional databases, such as social media, satellite imagery, and mobile usage—to provide insights into societal trends and challenges. When harnessed thoughtfully, this data can improve decision-making and bolster public interest projects in areas as varied as disaster response, healthcare, and environmental protection. However, with these new data streams come heightened ethical, legal, and operational risks that organizations need to manage responsibly. That’s where due diligence comes in, helping to ensure that data initiatives are beneficial and ethical.

The report, Trust but Verify: A Guide to Conducting Due Diligence When Leveraging Non-Traditional Data in the Public Interest, co-authored by Sara Marcucci, Andrew J. Zahuranec, and Stefaan Verhulst, offers a comprehensive framework to guide organizations in responsible data partnerships. Whether you’re a public agency or a private enterprise, this report provides a six-step process to ensure due diligence and maintain accountability, integrity, and trust in data initiatives…(More) (Blog)”.

Innovating with Non-Traditional Data: Recent Use Cases for Unlocking Public Value


Article by Stefaan Verhulst and Adam Zable: “Non-Traditional Data (NTD): “data that is digitally captured (e.g. mobile phone records), mediated (e.g. social media), or observed (e.g. satellite imagery), using new instrumentation mechanisms, often privately held.”

Digitalization and the resulting datafication have introduced a new category of data that, when re-used responsibly, can complement traditional data in addressing public interest questions—from public health to environmental conservation. Unlocking these often privately held datasets through data collaboratives is a key focus of what we have called The Third Wave of Open Data

To help bridge this gap, we have curated below recent examples of the use of NTD for research and decision-making that were published the past few months. They are organized into five categories:

  • Health and Well-being;
  • Humanitarian Aid;
  • Environment and Climate;
  • Urban Systems and Mobility, and 
  • Economic and Labor Dynamics…(More)”.