Destination? Care Blocks!


Blog by Natalia González Alarcón, Hannah Chafetz, Diana Rodríguez Franco, Uma Kalkar, Bapu Vaitla, & Stefaan G. Verhulst: “Time poverty” caused by unpaid care work overload, such as washing, cleaning, cooking, and caring for care-receivers, is a structural consequence of gender inequality. In the City of Bogotá, 1.2 million women, 30% of its total women’s population, carry out unpaid care work full-time. If such work were compensated, it would represent 13% of Bogotá’s GDP and 20% of the country’s GDP. Moreover, the care burden falls disproportionately on women’s shoulders and prevents them from furthering their education, achieving financial autonomy, participating in their community, and tending to their personal wellbeing.

To address the care burden and its spillover consequences on women’s economic autonomy, well-being and political participation, in October 2020, Bogotá Mayor Claudia López launched the Care Block Initiative. Care Blocks, or Manzanas del cuidado, are centralized areas for women’s economic, social, medical, educational, and personal well-being and advancement. They provide services simultaneously for caregivers and care-receivers.

As the program expands from 19 existing Care Blocks to 45 Care Blocks by the end of 2035, decision-makers face another issue: mobility is a critical and often limiting factor for women when accessing Care Blocks in Bogotá.

On May 19th, 2023, The GovLab, Data2X, and the Secretariat for Women’s Affairs in the City Government of Bogotá co-hosted a studio that aimed to scope a purposeful and gender-conscious data collaborative that addresses mobility-related issues affecting access to Care Blocks in Bogotá. Convening experts across the gender, mobility, policy, and data ecosystems, the studio focused on (1) prioritizing the critical questions as they relate to mobility and access to Care Blocks and (2) identifying the data sources and actors that could be tapped into to set up a new data collaborative…(More)”.

Health Care Data Is a Researcher’s Gold Mine


Article by James O’Shaughnessy: “The UK’s National Health Service should aim to become the world’s leading platform for health research and development. We’ve seen some great examples of the potential we have for world-class research during the pandemic, with examples like the RECOVERY trial and the Covid vaccine platform, and since then through the partnerships with Moderna, Grail, and BioNTech. However, these examples of partnership with industry are often ad hoc arrangements. In general, funding and prestige are concentrated on research labs and early-phase trials, but when it comes to helping health care companies through the commercialization stages of their products, both public and private sector funding is much harder to access. This makes it hard for startups partnering with the NHS to scale their products and sell them on the domestic and international markets.

Instead, we need a systematic approach to leverage our strengths, such as the scale of the NHS, the diversity of our population, and the deep patient phenotyping that our data assets enable. That will give us the opportunity to generate vast amounts of real-world data about health care drugs and technologies—like pricing, performance, and safety—that can prepare companies to scale their innovations and go to market.

To achieve that, there are obstacles to overcome. For instance, setting up research projects is incredibly time-consuming. We have very bureaucratic processes that make the UK one of the slowest places in Europe to set up research studies.

Patients need more access to research. However, there’s really poor information at the moment about where clinical trials are taking place in the country and what kind of patients they are recruiting. We need a clinicaltrials.gov.uk website to give that sort of information.

There’s a significant problem when it comes to the question of patient consent to participate in R&D. Legally, unless patients have said explicitly that they want to be approached for a research project or a clinical trial, they can’t be contacted for that purpose. The catch-22 is that, of course, most patients are not aware of this, and you can’t legally contact them to inform them. We need to allow ethically approved researchers to proactively approach people to take part in studies which might be of benefit to them…(More)”.

Opening industry data: The private sector’s role in addressing societal challenges


Paper by Jennifer Hansen and Yiu-Shing Pang: “This commentary explores the potential of private companies to advance scientific progress and solve social challenges through opening and sharing their data. Open data can accelerate scientific discoveries, foster collaboration, and promote long-term business success. However, concerns regarding data privacy and security can hinder data sharing. Companies have options to mitigate the challenges through developing data governance mechanisms, collaborating with stakeholders, communicating the benefits, and creating incentives for data sharing, among others. Ultimately, open data has immense potential to drive positive social impact and business value, and companies can explore solutions for their specific circumstances and tailor them to their specific needs…(More)”.

Fighting poverty with synthetic data


Article by Jack Gisby, Anna Kiknadze, Thomas Mitterling, and Isabell Roitner-Fransecky: “If you have ever used a smartwatch or other wearable tech to track your steps, heart rate, or sleep, you are part of the “quantified self” movement. You are voluntarily submitting millions of intimate data points for collection and analysis. The Economist highlighted the benefits of good quality personal health and wellness data—increased physical activity, more efficient healthcare, and constant monitoring of chronic conditions. However, not everyone is enthusiastic about this trend. Many fear corporations will use the data to discriminate against the poor and vulnerable. For example, insurance firms could exclude patients based on preconditions obtained from personal data sharing.

Can we strike a balance between protecting the privacy of individuals and gathering valuable information? This blog explores applying a synthetic populations approach in New York City, a city with an established reputation for using big data approaches to support urban management, including for welfare provisions and targeted policy interventions.

To better understand poverty rates at the census tract level, World Data Lab, with the support of the Sloan Foundation, generated a synthetic population based on the borough of Brooklyn. Synthetic populations rely on a combination of microdata and summary statistics:

  • Microdata consists of personal information at the individual level. In the U.S., such data is available at the Public Use Microdata Area (PUMA) level. PUMAs are geographic areas partitioning the state, containing no fewer than 100,000 people each. However, due to privacy concerns, microdata is unavailable at the more granular census tract level. Microdata consists of both household and individual-level information, including last year’s household income, the household size, the number of rooms, and the age, sex, and educational attainment of each individual living in the household.
  • Summary statistics are based on populations rather than individuals and are available at the census tract level, given that there are fewer privacy concerns. Census tracts are small statistical subdivisions of a county, averaging about 4,000 inhabitants. In New York City, a census tract roughly equals a building block. Similar to microdata, summary statistics are available for individuals and households. On the census tract level, we know the total population, the corresponding demographic breakdown, the number of households within different income brackets, the number of households by number of rooms, and other similar variables…(More)”.
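
The excerpt does not say how World Data Lab combines the two sources. One common way to do it is iterative proportional fitting (also called raking), which reweights microdata records until their weighted totals match the tract-level summary counts. The sketch below assumes that technique; the function names, field names, and counts are all hypothetical and chosen only for illustration.

```python
def rake_weights(records, targets, iterations=50):
    """Reweight microdata so weighted totals match tract-level counts.

    records: list of dicts, one per microdata household (hypothetical fields).
    targets: {attribute: {category: tract-level count}} summary statistics.
    Assumes every category seen in the records also appears in targets.
    """
    weights = [1.0] * len(records)
    for _ in range(iterations):
        for attr, wanted in targets.items():
            # Current weighted total for each category of this attribute.
            totals = {}
            for w, rec in zip(weights, records):
                totals[rec[attr]] = totals.get(rec[attr], 0.0) + w
            # Rescale each record so this attribute's totals hit the targets.
            weights = [
                w * wanted[rec[attr]] / totals[rec[attr]]
                if totals[rec[attr]] > 0 else 0.0
                for w, rec in zip(weights, records)
            ]
    return weights


def weighted_totals(records, weights, attr):
    """Sum the weights per category of one attribute."""
    totals = {}
    for w, rec in zip(weights, records):
        totals[rec[attr]] = totals.get(rec[attr], 0.0) + w
    return totals


# Hypothetical PUMA-level microdata: one record per household type.
microdata = [
    {"income": "low", "rooms": "small"},
    {"income": "low", "rooms": "large"},
    {"income": "high", "rooms": "small"},
    {"income": "high", "rooms": "large"},
]

# Hypothetical summary statistics for a single census tract (100 households).
tract_targets = {
    "income": {"low": 60, "high": 40},
    "rooms": {"small": 70, "large": 30},
}

tract_weights = rake_weights(microdata, tract_targets)
print(weighted_totals(microdata, tract_weights, "income"))
print(weighted_totals(microdata, tract_weights, "rooms"))
```

Each weight can be read as the estimated number of tract households that resemble that microdata record. A synthetic population would then be drawn from the records in proportion to these weights, so no real individual is placed in a specific tract.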

Local Data Spaces: Leveraging trusted research environments for secure location-based policy research


Paper by Jacob L. Macdonald, Mark A. Green, Maurizio Gibin, Simon Leech, Alex Singleton and Paul Longley: “This work explores the use of Trusted Research Environments for the secure analysis of sensitive, record-level data on local coronavirus disease-2019 (COVID-19) inequalities and economic vulnerabilities. The Local Data Spaces (LDS) project was a targeted rapid response and cross-disciplinary collaborative initiative using the Office for National Statistics’ Secure Research Service for localized comparison and analysis of health and economic outcomes over the course of the COVID-19 pandemic. Embedded researchers worked on co-producing a range of locally focused insights and reports built on secure secondary data and made appropriately open and available to the public and all local stakeholders for wider use. With secure infrastructure and overall data governance practices in place, accredited researchers were able to access a wealth of detailed data and resources to facilitate more targeted local policy analysis. Working with data within such infrastructure as part of a larger research project involved advanced planning and coordination to be efficient. As new and novel granular data resources become securely available (e.g., record-level administrative digital health records or consumer data), a range of local policy insights can be gained across issues of public health or local economic vitality. Many of these new forms of data, however, often come with a large degree of sensitivity around issues of personal identifiability and how the data is used for public-facing research, and require secure and responsible use. Learning to work appropriately with secure data and research environments can open up many avenues for collaboration and analysis…(More)”.

Opportunities and Challenges in Reusing Public Genomics Data


Introduction to Special Issue by Mahmoud Ahmed and Deok Ryong Kim: “Genomics data is accumulating in public repositories at an ever-increasing rate. Large consortia and individual labs continue to probe animal and plant tissue and cell cultures, generating vast amounts of data using established and novel technologies. The Human Genome Project kickstarted the era of systems biology (1, 2). Ambitious projects followed to characterize non-coding regions, variations across species, and between populations (3, 4, 5). The cost reduction allowed individual labs to generate numerous smaller high-throughput datasets (6, 7, 8, 9). As a result, the scientific community should consider strategies to overcome the challenges and maximize the opportunities to use these resources for research and the public good. In this collection, we will elicit opinions and perspectives from researchers in the field on the opportunities and challenges of reusing public genomics data. The articles in this research topic converge on the need for data sharing while acknowledging the challenges that come with it. Two articles defined and highlighted the distinction between data and metadata. The characteristics of each should be considered when designing optimal sharing strategies. One article focuses on the specific issues surrounding the sharing of genomics interval data, and another on balancing the need for protecting pediatric rights and the sharing benefits.

The definition of what counts as data is itself a moving target. As technology advances, data can be produced in more ways and from novel sources. Events of recent years have highlighted this fact. “The pandemic has underscored the urgent need to recognize health data as a global public good with mechanisms to facilitate rapid data sharing and governance,” write Schwalbe and colleagues (2020). The challenges facing these mechanisms could be technical, economic, legal, or political. Defining what data is and its type, therefore, is necessary to overcome these barriers because “the mechanisms to facilitate data sharing are often specific to data types.” Unlike genomics data, which has established platforms, sharing clinical data “remains in a nascent phase.” The article by Patrinos and colleagues (2022) considers the strong ethical imperative for protecting pediatric data while acknowledging the need not to overprotect. The authors discuss a model of consent for pediatric research that can balance the need to protect participants and generate health benefits.

Xue et al. (2023) focus on reusing genomic interval data. Identifying and retrieving the relevant data can be difficult, given the state of the repositories and the size of these data. Similarly, integrating interval data in reference genomes can be hard. The authors call for standardized formats for the data and the metadata to facilitate reuse.

Sheffield and colleagues (2023) highlight the distinction between data and metadata. Metadata describes the characteristics of the sample, experiment, and analysis. The nature of this information differs from that of the primary data in size, source, and ways of use. Therefore, an optimal strategy should consider these specific attributes for sharing metadata. Challenges specific to sharing metadata include the need for standardized terms and formats, making it portable and easier to find.

We go beyond the reuse issue to highlight two other aspects that might increase the utility of available public data in Ahmed et al. (2023). These are curation and integration…(More)”.

From the Economic Graph to Economic Insights: Building the Infrastructure for Delivering Labor Market Insights from LinkedIn Data


Blog by Patrick Driscoll and Akash Kaura: “LinkedIn’s vision is to create economic opportunity for every member of the global workforce. Since its inception in 2015, the Economic Graph Research and Insights (EGRI) team has worked to make this vision a reality by generating labor market insights such as:

In this post, we’ll describe how the EGRI Data Foundations team (Team Asimov) leverages LinkedIn’s cutting-edge data infrastructure tools such as Unified Metrics Platform, Pinot, and Datahub to ensure we can deliver data and insights robustly, securely, and at scale to a myriad of partners. We will illustrate this through a case study of how we built the pipeline for our most well-known and oft-cited flagship metric: the LinkedIn Hiring Rate…(More)”.

WHO Launches Global Infectious Disease Surveillance Network


Article by Shania Kennedy: “The World Health Organization (WHO) launched the International Pathogen Surveillance Network (IPSN), a public health network to prevent and detect infectious disease threats before they become epidemics or pandemics.

IPSN will rely on insights generated from pathogen genomics, which helps analyze the genetic material of viruses, bacteria, and other disease-causing micro-organisms to determine how they spread and how infectious or deadly they may be.

Using these data, researchers can identify and track diseases to improve outbreak prevention, response, and treatments.

“The goal of this new network is ambitious, but it can also play a vital role in health security: to give every country access to pathogen genomic sequencing and analytics as part of its public health system,” said WHO Director-General Tedros Adhanom Ghebreyesus, PhD, in the press release. “As was so clearly demonstrated to us during the COVID-19 pandemic, the world is stronger when it stands together to fight shared health threats.”

Genomics capacity worldwide was scaled up during the pandemic, but the press release indicates that many countries still lack effective tools and systems for public health data collection and analysis. This lack of resources and funding could slow the development of a strong global health surveillance infrastructure, which IPSN aims to help address.

The network will bring together experts in genomics and data analytics to optimize routine disease surveillance, including for COVID-19. According to the press release, pathogen genomics-based analyses of the SARS-CoV-2 virus helped speed the development of effective vaccines and the identification of more transmissible virus variants…(More)”.

Crime, inequality and public health: a survey of emerging trends in urban data science


Paper by Massimiliano Luca, Gian Maria Campedelli, Simone Centellegher, Michele Tizzoni, and Bruno Lepri: “Urban agglomerations are constantly and rapidly evolving ecosystems, with globalization and increasing urbanization posing new challenges in sustainable urban development well summarized in the United Nations’ Sustainable Development Goals (SDGs). The advent of the digital age generated by modern alternative data sources provides new tools to tackle these challenges with spatio-temporal scales that were previously unavailable with census statistics. In this review, we present how new digital data sources are employed to provide data-driven insights to study and track (i) urban crime and public safety; (ii) socioeconomic inequalities and segregation; and (iii) public health, with a particular focus on the city scale…(More)”.

Can Mobility of Care Be Identified From Transit Fare Card Data? A Case Study In Washington D.C.


Paper by Daniela Shuman et al.: “Studies in the literature have found significant differences in travel behavior by gender on public transit that are largely attributable to household and care responsibilities falling disproportionately on women. While the majority of studies have relied on survey and qualitative data to assess “mobility of care”, we propose a novel data-driven workflow utilizing transit fare card transactions, name-based gender inference, and geospatial analysis to identify mobility of care trip making. We find that the share of women travelers trip-chaining in the direct vicinity of mobility of care places of interest is 10%–15% higher than that of men…(More)”.