What prevents us from reusing medical real-world data in research


Paper by Julia Gehrmann, Edit Herczog, Stefan Decker & Oya Beyan: “Recent studies show that Medical Data Science (MDS) carries great potential to improve healthcare. Thereby, considering data from several medical areas and of different types, i.e. using multimodal data, significantly increases the quality of the research results. On the other hand, the inclusion of more features in an MDS analysis means that more medical cases are required to represent the full range of possible feature combinations in a quantity that would be sufficient for a meaningful analysis. Historically, data acquisition in medical research applies prospective data collection, e.g. in clinical studies. However, prospectively collecting the amount of data needed for advanced multimodal data analyses is not feasible for two reasons. Firstly, such a data collection process would cost an enormous amount of money. Secondly, it would take decades to generate enough data for longitudinal analyses, while the results are needed now. A worthwhile alternative is using real-world data (RWD) from clinical systems of e.g. university hospitals. This data is immediately accessible in large quantities, providing full flexibility in the choice of the analyzed research questions. However, when compared to prospectively curated data, medical RWD usually lacks quality due to the specificities of medical RWD outlined in section 2. The reduced quality makes its preparation for analysis more challenging…(More)”.

Unleashing the power of data for electric vehicles and charging infrastructure


Report by Thomas Deloison: “As the world moves toward widespread electric vehicle (EV) adoption, a key challenge lies ahead: deploying charging infrastructure rapidly and effectively. Solving this challenge will be essential to decarbonize transport, which has a higher reliance on fossil fuels than any other sector and accounts for a fifth of global carbon emissions. However, the companies and governments investing in charging infrastructure face significant hurdles, including high initial capital costs and difficulties related to infrastructure planning, permitting, grid connections and grid capacity development.

Data has the power to facilitate these processes: increased predictability and optimized planning and infrastructure management go a long way in easing investments and accelerating deployment. Last year, members of the World Business Council for Sustainable Development (WBCSD) demonstrated that digital solutions based on data sharing could reduce carbon emissions from charging by 15% and unlock crucial grid capacity and capital efficiency gains.

Exceptional advances in data, analytics and connectivity are making digital solutions a potent tool to plan and manage transport, energy and infrastructure. Thanks to the deployment of sensors and the rise of connectivity, businesses are collecting information faster than ever before, allowing for data flows between physical assets. Charging infrastructure operators, automotive companies, fleet operators, energy providers, building managers and governments collect insights on all aspects of electric vehicle charging infrastructure (EVCI), from planning and design to charging experiences at the station.

The real value of data lies in its aggregation. This will require breaking down siloes across industries and enabling digital collaboration. A digital action framework released by WBCSD, in collaboration with Arcadis, Fujitsu and other member companies and partners, introduces a set of recommendations for companies and governments to realize the full potential of digital solutions and accelerate EVCI deployments:

  • Map proprietary data, knowledge gaps and digital capacity across the value chain to identify possible synergies. The highest value potential from digital solutions will lie at the nexus of infrastructure, consumer behavior insights, grid capacity and transport policy. For example, to ensure the deployment of charging stations where they will be most needed and at the right capacity level, it is crucial to plan investments within energy grid capacity, spatial constraints and local projected demand for EVs.
  • Develop internal data collection and storage capacity with due consideration for existing structures for data sharing. A variety of schemes allow actors to engage in data sharing or monetization. Yet, their use is limited by mismatched use of data standards and specification and process uncertainty. Companies must build a strong understanding of these structures internally by providing internal training and guidance, and invest in sound data collection, storage and analysis capacity.
  • Foster a policy environment that supports digital collaboration across sectors and industries. Digital policies must provide incentives and due diligence frameworks to guide data exchanges across industries and support the adoption of common standards and protocols. For instance, it will be crucial to integrate linkages with energy systems and infrastructure beyond roads in the rollout of the European mobility data space…(More)”.

Questions as a Device for Data Responsibility: Toward a New Science of Questions to Steer and Complement the Use of Data Science for the Public Good in a Polycentric Way


Paper by Stefaan G. Verhulst: “We are at an inflection point today in our search to responsibly handle data in order to maximize the public good while limiting both private and public risks. This paper argues that the way we formulate questions should be given more consideration as a device for modern data responsibility. We suggest that designing a polycentric process for co-defining the right questions can play an important role in ensuring that data are used responsibly, and with maximum positive social impact. In making these arguments, we build on two bodies of knowledge—one conceptual and the other more practical. These observations are supplemented by the author’s own experience as founder and lead of “The 100 Questions Initiative.” The 100 Questions Initiative uses a unique participatory methodology to identify the world’s 100 most pressing, high-impact questions across a variety of domains—including migration, gender inequality, air quality, the future of work, disinformation, food sustainability, and governance—that could be answered by unlocking datasets and other resources. This initiative provides valuable practical insights and lessons into building a new “science of questions” and builds on theoretical and practical knowledge to outline a set of benefits of using questions for data responsibility. More generally, this paper argues that, combined with other methods and approaches, questions can help achieve a variety of key data responsibility goals, including data minimization and proportionality, increasing participation, and enhancing accountability…(More)”.

Weather Warning Inequity: Lack of Data Collection Stations Imperils Vulnerable People


Article by Chelsea Harvey: “Devastating floods and landslides triggered by extreme downpours killed hundreds of people in Rwanda and the Democratic Republic of Congo in May, when some areas saw more than 7 inches of rain in a day.

Climate change is intensifying rainstorms throughout much of the world, yet scientists haven’t been able to show that the event was influenced by warming.

That’s because they don’t have enough data to investigate it.

Weather stations are sparse across Africa, making it hard for researchers to collect daily information on rainfall and other weather variables. The data that does exist often isn’t publicly available.

“The main issue in some countries in Africa is funding,” said Izidine Pinto, a senior researcher on weather and climate at the Royal Netherlands Meteorological Institute. “The meteorological offices don’t have enough funding.”

There’s often too little money to build or maintain weather stations, and strapped-for-cash governments often choose to sell the data they do collect rather than make it free to researchers.

That’s a growing problem as the planet warms and extreme weather worsens. Reliable forecasts are needed for early warning systems that direct people to take shelter or evacuate before disasters strike. And long-term climate data is necessary for scientists to build computer models that help make predictions about the future.

The science consortium World Weather Attribution is the latest research group to run into problems. It investigates the links between climate change and individual extreme weather events all over the globe. In the last few months alone, the organization has demonstrated the influence of global warming on extreme heat in South Asia and the Mediterranean, floods in Italy, and drought in eastern Africa.

Most of its research finds that climate change is making weather events more likely to occur or more intense.

The group recently attempted to investigate the influence of climate change on the floods in Rwanda and Congo. But the study was quickly mired in challenges.

The team was able to acquire some weather station data, mainly in Rwanda, Joyce Kimutai, a research associate at Imperial College London and a co-author of the study, said at a press briefing announcing the findings Thursday. But only a few stations provided sufficient data, making it impossible to define the event or to be certain that climate model simulations were accurate…(More)”.

Non-traditional data sources in obesity research: a systematic review of their use in the study of obesogenic environments


Paper by Julia Mariel Wirtz Baker, Sonia Alejandra Pou, Camila Niclis, Eugenia Haluszka & Laura Rosana Aballay: “The field of obesity epidemiology has made extensive use of traditional data sources, such as health surveys and reports from official national statistical systems, whose variety of data can be at times limited to explore a wider range of determinants relevant to obesity. Over time, other data sources began to be incorporated into obesity research, such as geospatial data (web mapping platforms, satellite imagery, and other databases embedded in Geographic Information Systems), social network data (such as Twitter, Facebook, Instagram, or other social networks), digital device data and others. The data revolution, facilitated by the massive use of digital devices with hundreds of millions of users and the emergence of the “Internet of Things” (IoT), has generated huge volumes of data from everywhere: customers, social networks and sensors, in addition to all the traditional sources mentioned above. In the research area, it offers fruitful opportunities, contributing in ways that traditionally sourced research data could not.

An international expert panel in obesity and big data pointed out some key factors in the definition of Big Data, stating that “it is always digital, has a large sample size, and a large volume or variety or velocity of variables that require additional computing power, as well as specialist skills in computer programming, database management and data science analytics”. Our interpretation of non-traditional data sources is an approximation to this definition, assuming that they are sources not traditionally used in obesity epidemiology and environmental studies, which can include digital devices, social media and geospatial data within a GIS, the latter mainly based on complex indexes that require advanced data analysis techniques and expertise.

Beyond the still discussed limitations, Big Data can be assumed as a great opportunity to improve the study of obesogenic environments, since it has been announced as a powerful resource that can provide new knowledge about human behaviour and social phenomena. Besides, it can contribute to the formulation and evaluation of policies and the development of interventions for obesity prevention. However, in this field of research, the suitability of these novel data sources is still a subject of considerable discussion, and their use has not been investigated from the obesogenic environment approach…(More)”.

Data collaborations at a local scale: Lessons learnt in Rennes (2010–2021)


Paper by Simon Chignard and Marion Glatron: “Data sharing is a requisite for developing data-driven innovation and collaboration at the local scale. This paper aims to identify key lessons and recommendations for building trustworthy data governance at the local scale, including the public and private sectors. Our research is based on the experience gained in Rennes Metropole since 2010 and focuses on two thematic use cases: culture and energy. For each one, we analyzed how the power relations between actors and the local public authority shape the modalities of data sharing and exploitation. The paper will elaborate on challenges and opportunities at the local level, in perspective with the national and European frameworks…(More)”.

Destination? Care Blocks!


Blog by Natalia González Alarcón, Hannah Chafetz, Diana Rodríguez Franco, Uma Kalkar, Bapu Vaitla, & Stefaan G. Verhulst: “‘Time poverty’ caused by unpaid care work overload, such as washing, cleaning, cooking, and caring for care-receivers, is a structural consequence of gender inequality. In the City of Bogotá, 1.2 million women, 30% of the city’s female population, carry out unpaid care work full-time. If such work were compensated, it would represent 13% of Bogotá’s GDP and 20% of the country’s GDP. Moreover, the care burden falls disproportionately on women’s shoulders and prevents them from furthering their education, achieving financial autonomy, participating in their community, and tending to their personal wellbeing.

To address the care burden and its spillover consequences on women’s economic autonomy, well-being and political participation, in October 2020, Bogotá Mayor Claudia López launched the Care Block Initiative. Care Blocks, or Manzanas del cuidado, are centralized areas for women’s economic, social, medical, educational, and personal well-being and advancement. They provide services simultaneously for caregivers and care-receivers.

As the program expands from 19 existing Care Blocks to 45 Care Blocks by the end of 2035, decision-makers face another issue: mobility is a critical and often limiting factor for women when accessing Care Blocks in Bogotá.

On May 19th, 2023, The GovLab, Data2X, and the Secretariat for Women’s Affairs in the City Government of Bogotá co-hosted a studio that aimed to scope a purposeful and gender-conscious data collaborative that addresses mobility-related issues affecting access to Care Blocks in Bogotá. Convening experts across the gender, mobility, policy, and data ecosystems, the studio focused on (1) prioritizing the critical questions as they relate to mobility and access to Care Blocks and (2) identifying the data sources and actors that could be tapped to set up a new data collaborative…(More)”.

Health Care Data Is a Researcher’s Gold Mine


Article by James O’Shaughnessy: “The UK’s National Health Service should aim to become the world’s leading platform for health research and development. We’ve seen some great examples of the potential we have for world-class research during the pandemic, with examples like the RECOVERY trial and the Covid vaccine platform, and since then through the partnerships with Moderna, Grail, and BioNTech. However, these examples of partnership with industry are often ad hoc arrangements. In general, funding and prestige are concentrated on research labs and early-phase trials, but when it comes to helping health care companies through the commercialization stages of their products, both public and private sector funding is much harder to access. This makes it hard for startups partnering with the NHS to scale their products and sell them on the domestic and international markets.

Instead, we need a systematic approach to leverage our strengths, such as the scale of the NHS, the diversity of our population, and the deep patient phenotyping that our data assets enable. That will give us the opportunity to generate vast amounts of real-world data about health care drugs and technologies—like pricing, performance, and safety—that can prepare companies to scale their innovations and go to market.

To achieve that, there are obstacles to overcome. For instance, setting up research projects is incredibly time-consuming. We have very bureaucratic processes that make the UK one of the slowest places in Europe to set up research studies.

Patients need more access to research. However, there’s really poor information at the moment about where clinical trials are taking place in the country and what kind of patients they are recruiting. We need a clinicaltrials.gov.uk website to give that sort of information.

There’s a significant problem when it comes to the question of patient consent to participate in R&D. Legally, unless patients have said explicitly that they want to be approached for a research project or a clinical trial, they can’t be contacted for that purpose. The catch-22 is that, of course, most patients are not aware of this, and you can’t legally contact them to inform them. We need to allow ethically approved researchers to proactively approach people to take part in studies which might be of benefit to them…(More)”.

Opening industry data: The private sector’s role in addressing societal challenges


Paper by Jennifer Hansen and Yiu-Shing Pang: “This commentary explores the potential of private companies to advance scientific progress and solve social challenges through opening and sharing their data. Open data can accelerate scientific discoveries, foster collaboration, and promote long-term business success. However, concerns regarding data privacy and security can hinder data sharing. Companies have options to mitigate the challenges through developing data governance mechanisms, collaborating with stakeholders, communicating the benefits, and creating incentives for data sharing, among others. Ultimately, open data has immense potential to drive positive social impact and business value, and companies can explore solutions for their specific circumstances and tailor them to their specific needs…(More)”.

Fighting poverty with synthetic data


Article by Jack Gisby, Anna Kiknadze, Thomas Mitterling, and Isabell Roitner-Fransecky: “If you have ever used a smartwatch or other wearable tech to track your steps, heart rate, or sleep, you are part of the “quantified self” movement. You are voluntarily submitting millions of intimate data points for collection and analysis. The Economist highlighted the benefits of good quality personal health and wellness data—increased physical activity, more efficient healthcare, and constant monitoring of chronic conditions. However, not everyone is enthusiastic about this trend. Many fear corporations will use the data to discriminate against the poor and vulnerable. For example, insurance firms could exclude patients based on preconditions obtained from personal data sharing.

Can we strike a balance between protecting the privacy of individuals and gathering valuable information? This blog explores applying a synthetic populations approach in New York City, a city with an established reputation for using big data approaches to support urban management, including for welfare provisions and targeted policy interventions.

To better understand poverty rates at the census tract level, World Data Lab, with the support of the Sloan Foundation, generated a synthetic population based on the borough of Brooklyn. Synthetic populations rely on a combination of microdata and summary statistics:

  • Microdata consists of personal information at the individual level. In the U.S., such data is available at the Public Use Microdata Area (PUMA) level. PUMAs are geographic areas partitioning the state, containing no fewer than 100,000 people each. However, due to privacy concerns, microdata is unavailable at the more granular census tract level. Microdata consists of both household and individual-level information, including last year’s household income, the household size, the number of rooms, and the age, sex, and educational attainment of each individual living in the household.
  • Summary statistics are based on populations rather than individuals and are available at the census tract level, given that there are fewer privacy concerns. Census tracts are small statistical subdivisions of a county, averaging about 4,000 inhabitants. In New York City, a census tract roughly equals a building block. Similar to microdata, summary statistics are available for individuals and households. On the census tract level, we know the total population, the corresponding demographic breakdown, the number of households within different income brackets, the number of households by number of rooms, and other similar variables…(More)”.
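
To make the combination of microdata and summary statistics concrete, here is a minimal sketch in Python. The excerpt does not say which fitting method World Data Lab used; this sketch assumes iterative proportional fitting (IPF), a common technique for synthetic populations. All records, category names and tract totals below are invented for illustration.

```python
from collections import Counter


def ipf(seed, row_targets, col_targets, iterations=200, tol=1e-9):
    """Rescale a seed table until its row and column sums match the targets.

    seed        -- cross-tabulation taken from PUMA-level microdata
    row_targets -- tract-level household counts per income bracket
    col_targets -- tract-level household counts per room category
    """
    table = [[float(v) for v in row] for row in seed]
    for _ in range(iterations):
        # Row step: scale each row to its tract-level total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Column step: scale each column to its tract-level total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns match exactly after the column step, so check the rows.
        gap = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if gap < tol:
            break
    return table


# Hypothetical PUMA-level microdata: one (income bracket, rooms) pair per household.
microdata = [
    ("low", "1-2"), ("low", "1-2"), ("low", "3+"),
    ("high", "1-2"), ("high", "3+"), ("high", "3+"), ("high", "3+"),
]
income_levels = ["low", "high"]
room_levels = ["1-2", "3+"]

# The seed keeps the joint income/rooms pattern observed in the microdata.
counts = Counter(microdata)
seed = [[counts[(inc, rm)] for rm in room_levels] for inc in income_levels]

# Hypothetical census tract summary statistics (both sets sum to 100 households).
tract_income = [60, 40]  # households in the low and high income brackets
tract_rooms = [30, 70]   # households with 1-2 rooms and with 3+ rooms

fitted = ipf(seed, tract_income, tract_rooms)
for inc, row in zip(income_levels, fitted):
    print(inc, [round(v, 2) for v in row])
```

The fitted table estimates how many tract households fall into each income and room combination. It keeps the association seen in the coarser microdata while matching the published tract totals. Synthetic households could then be drawn in proportion to these cell values.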