Local Data Spaces: Leveraging trusted research environments for secure location-based policy research


Paper by Jacob L. Macdonald, Mark A. Green, Maurizio Gibin, Simon Leech, Alex Singleton and Paul Longley: “This work explores the use of Trusted Research Environments for the secure analysis of sensitive, record-level data on local coronavirus disease-2019 (COVID-19) inequalities and economic vulnerabilities. The Local Data Spaces (LDS) project was a targeted rapid response and cross-disciplinary collaborative initiative using the Office for National Statistics’ Secure Research Service for localized comparison and analysis of health and economic outcomes over the course of the COVID-19 pandemic. Embedded researchers worked on co-producing a range of locally focused insights and reports built on secure secondary data and made appropriately open and available to the public and all local stakeholders for wider use. With secure infrastructure and overall data governance practices in place, accredited researchers were able to access a wealth of detailed data and resources to facilitate more targeted local policy analysis. Working with data within such infrastructure as part of a larger research project involved advanced planning and coordination to be efficient. As new and novel granular data resources become securely available (e.g., record-level administrative digital health records or consumer data), a range of local policy insights can be gained across issues of public health or local economic vitality. Many of these new forms of data however often come with a large degree of sensitivity around issues of personal identifiability and how the data is used for public-facing research and require secure and responsible use. Learning to work appropriately with secure data and research environments can open up many avenues for collaboration and analysis…(More)”

Opportunities and Challenges in Reusing Public Genomics Data


Introduction to Special Issue by Mahmoud Ahmed and Deok Ryong Kim: “Genomics data is accumulating in public repositories at an ever-increasing rate. Large consortia and individual labs continue to probe animal and plant tissue and cell cultures, generating vast amounts of data using established and novel technologies. The human genome project kickstarted the era of systems biology (1, 2). Ambitious projects followed to characterize non-coding regions, variations across species, and between populations (3, 4, 5). The cost reduction allowed individual labs to generate numerous smaller high-throughput datasets (6, 7, 8, 9). As a result, the scientific community should consider strategies to overcome the challenges and maximize the opportunities to use these resources for research and the public good. In this collection, we will elicit opinions and perspectives from researchers in the field on the opportunities and challenges of reusing public genomics data. The articles in this research topic converge on the need for data sharing while acknowledging the challenges that come with it. Two articles defined and highlighted the distinction between data and metadata. The characteristic of each should be considered when designing optimal sharing strategies. One article focuses on the specific issues surrounding the sharing of genomics interval data, and another on balancing the need for protecting pediatric rights and the sharing benefits.

The definition of what counts as data is itself a moving target. As technology advances, data can be produced in more ways and from novel sources. Events of recent years have highlighted this fact. “The pandemic has underscored the urgent need to recognize health data as a global public good with mechanisms to facilitate rapid data sharing and governance,” write Schwalbe and colleagues (2020). The challenges facing these mechanisms could be technical, economic, legal, or political. Defining what data is and its type, therefore, is necessary to overcome these barriers because “the mechanisms to facilitate data sharing are often specific to data types.” Unlike genomics data, which has established platforms, sharing clinical data “remains in a nascent phase.” The article by Patrinos and colleagues (2022) considers the strong ethical imperative for protecting pediatric data while acknowledging the need not to overprotect. The authors discuss a model of consent for pediatric research that can balance the need to protect participants and generate health benefits.

Xue et al. (2023) focus on reusing genomic interval data. Identifying and retrieving the relevant data can be difficult, given the state of the repositories and the size of these data. Similarly, integrating interval data in reference genomes can be hard. The authors call for standardized formats for the data and the metadata to facilitate reuse.

Sheffield and colleagues (2023) highlight the distinction between data and metadata. Metadata describes the characteristics of the sample, experiment, and analysis. The nature of this information differs from that of the primary data in size, source, and ways of use. Therefore, an optimal strategy should consider these specific attributes for sharing metadata. Challenges specific to sharing metadata include the need for standardized terms and formats, making it portable and easier to find.

We go beyond the reuse issue to highlight two other aspects that might increase the utility of available public data in Ahmed et al. (2023). These are curation and integration…(More)”.

From the Economic Graph to Economic Insights: Building the Infrastructure for Delivering Labor Market Insights from LinkedIn Data


Blog by Patrick Driscoll and Akash Kaura: “LinkedIn’s vision is to create economic opportunity for every member of the global workforce. Since its inception in 2015, the Economic Graph Research and Insights (EGRI) team has worked to make this vision a reality by generating labor market insights such as:

In this post, we’ll describe how the EGRI Data Foundations team (Team Asimov) leverages LinkedIn’s cutting-edge data infrastructure tools such as Unified Metrics Platform, Pinot, and Datahub to ensure we can deliver data and insights robustly, securely, and at scale to a myriad of partners. We will illustrate this through a case study of how we built the pipeline for our most well-known and oft-cited flagship metric: the LinkedIn Hiring Rate…(More)”.

WHO Launches Global Infectious Disease Surveillance Network


Article by Shania Kennedy: “The World Health Organization (WHO) launched the International Pathogen Surveillance Network (IPSN), a public health network to prevent and detect infectious disease threats before they become epidemics or pandemics.

IPSN will rely on insights generated from pathogen genomics, which helps analyze the genetic material of viruses, bacteria, and other disease-causing micro-organisms to determine how they spread and how infectious or deadly they may be.

Using these data, researchers can identify and track diseases to improve outbreak prevention, response, and treatments.

“The goal of this new network is ambitious, but it can also play a vital role in health security: to give every country access to pathogen genomic sequencing and analytics as part of its public health system,” said WHO Director-General Tedros Adhanom Ghebreyesus, PhD, in the press release. “As was so clearly demonstrated to us during the COVID-19 pandemic, the world is stronger when it stands together to fight shared health threats.”

Genomics capacity worldwide was scaled up during the pandemic, but the press release indicates that many countries still lack effective tools and systems for public health data collection and analysis. This lack of resources and funding could slow the development of a strong global health surveillance infrastructure, which IPSN aims to help address.

The network will bring together experts in genomics and data analytics to optimize routine disease surveillance, including for COVID-19. According to the press release, pathogen genomics-based analyses of the SARS-CoV-2 virus helped speed the development of effective vaccines and the identification of more transmissible virus variants…(More)”.

Crime, inequality and public health: a survey of emerging trends in urban data science


Paper by Massimiliano Luca, Gian Maria Campedelli, Simone Centellegher, Michele Tizzoni, and Bruno Lepri: “Urban agglomerations are constantly and rapidly evolving ecosystems, with globalization and increasing urbanization posing new challenges in sustainable urban development well summarized in the United Nations’ Sustainable Development Goals (SDGs). The advent of the digital age generated by modern alternative data sources provides new tools to tackle these challenges with spatio-temporal scales that were previously unavailable with census statistics. In this review, we present how new digital data sources are employed to provide data-driven insights to study and track (i) urban crime and public safety; (ii) socioeconomic inequalities and segregation; and (iii) public health, with a particular focus on the city scale…(More)”.

Can Mobility of Care Be Identified From Transit Fare Card Data? A Case Study In Washington D.C.


Paper by Daniela Shuman, et al: “Studies in the literature have found significant differences in travel behavior by gender on public transit that are largely attributable to household and care responsibilities falling disproportionately on women. While the majority of studies have relied on survey and qualitative data to assess “mobility of care”, we propose a novel data-driven workflow utilizing transit fare card transactions, name-based gender inference, and geospatial analysis to identify mobility of care trip making. We find that the share of women travelers trip-chaining in the direct vicinity of mobility of care places of interest is 10% – 15% higher than men….(More)”.

How a small news site built an innovative data project to visualise the impact of climate change on Uruguay’s capital


Interview by Marina Adami: “La ciudad sumergida (The submerged city), an investigation produced by Uruguayan science and technology news site Amenaza Roboto, is one of the winners of this year’s Sigma Awards for data journalism. The project uses maps of the country’s capital, Montevideo, to create impressive visualisations of the impact sea level rises are predicted to have on the city and its infrastructure. The project is a first of its kind for Uruguay, a small South American country in which data journalism is still a novelty. It is also a good example of a way news outlets can investigate and communicate the disastrous effects of climate change in local communities. 

I spoke to Miguel Dobrich, a journalist, educator and digital entrepreneur who worked on the project together with colleagues Gabriel Farías, Natalie Aubet and Nahuel Lamas, to find out what lessons other outlets can take from this project and from Amenaza Roboto’s experiments with analysing public data, collaborating with scientists, and keeping the focus on their communities….(More)”

Global Data Stewardship


On-line Course by Stefaan G. Verhulst: “Creating a systematic and sustainable data access program is critical for data stewardship. What you do with your data, how you reuse it, and how you make it available to the general public can help others reimagine what’s possible for data sharing and cross-sector data collaboration. In this course, instructor Stefaan Verhulst shows you how to develop and manage data reuse initiatives as a competent and responsible global data steward.

Following the insights of current research and practical, real-world examples, learn about the growing importance of data stewardship, data supply, and data demand to understand the value proposition and societal case for data reuse. Get tips on designing and implementing data collaboration models, governance framework, and infrastructure, as well as best practices for measuring, sunsetting, and supporting data reuse initiatives. Upon completing this course, you’ll be ready to start pushing your new skill set and continue your data stewardship learning journey….(More)”

Big data proves mobility is not gender-neutral


Blog by Ellin Ivarsson, Aiga Stokenberg and Juan Ignacio Fulponi: “All over the world, there is growing evidence showing that women and men travel differently. While there are many reasons behind this, one key factor is the persistence of traditional gender norms and roles that translate into different household responsibilities, different work schedules, and, ultimately, different mobility needs. Greater overall risk aversion and sensitivity to safety issues also play an important role in how women get around. Yet gender often remains an afterthought in the transport sector, meaning most policies or infrastructure investment plans are not designed to take into account the specific mobility needs of women.

The good news is that big data can help change that. In a recent study, the World Bank Transport team combined several data sources to analyze how women travel around the Buenos Aires Metropolitan Area (AMBA), including mobile phone signal data, congestion data from Waze, public transport smart card data, and data from a survey implemented by the team in early 2022 with over 20,300 car and motorcycle users.

Our research revealed that, on average, women in AMBA travel less often than men, travel shorter distances, and tend to engage in more complex trips with multiple stops and purposes. On average, 65 percent of the trips made by women are shorter than 5 kilometers, compared to 60 percent among men. Also, women’s hourly travel patterns are different, with 10 percent more trips than men during the mid-day off-peak hour, mostly originating in central AMBA. This reflects the larger burden of household responsibilities faced by women – such as picking children up from school – and the fact that women tend to work more irregular hours…(More)” See also Gender gaps in urban mobility.

Digital Equity 2.0: How to Close the Data Divide


Report by Gillian Diebold: “For the last decade, closing the digital divide, or the gap between those subscribing to broadband and those not subscribing, has been a top priority for policymakers. But high-speed Internet and computing device access are no longer the only barriers to fully participating and benefiting from the digital economy. Data is also increasingly essential, including in health care, financial services, and education. Like the digital divide, a gap has emerged between the data haves and the data have-nots, and this gap has introduced a new set of inequities: the data divide.

Policymakers have put a great deal of effort into closing the digital divide, and there is now near-universal acceptance of the notion that obtaining widespread Internet access generates social and economic benefits. But closing the data divide has received little attention. Moreover, efforts to improve data collection are typically overshadowed by privacy advocates’ warnings against collecting any data. In fact, unlike the digital divide, many ignore the data divide or argue that the way to close it is to collect vastly less data. But without substantial efforts to increase data representation and access, certain individuals and communities will be left behind in an increasingly data-driven world.

This report describes the multipronged efforts needed to address digital inequity. For the digital divide, policymakers have expanded digital connectivity, increased digital literacy, and improved access to digital devices. For the data divide, policymakers should similarly take a holistic approach, including by balancing privacy and data innovation, increasing data collection efforts across a wide array of fronts, enhancing access to data, improving data quality, and improving data analytics efforts. Applying lessons from the digital divide to this new challenge will help policymakers design effective and efficient policy and create a more equitable and effective data economy for all Americans…(More)”.