Gender gaps in urban mobility


Paper by Laetitia Gauvin, Michele Tizzoni, Simone Piaggesi, Andrew Young, Natalia Adler, Stefaan Verhulst, Leo Ferres & Ciro Cattuto in Humanities and Social Sciences Communications: “Mobile phone data have been extensively used to study urban mobility. However, studies based on gender-disaggregated large-scale data are still lacking, limiting our understanding of gendered aspects of urban mobility and our ability to design policies for gender equality. Here we study urban mobility from a gendered perspective, combining commercial and open datasets for the city of Santiago, Chile.

We analyze call detail records for a large cohort of anonymized mobile phone users and reveal a gender gap in mobility: women visit fewer unique locations than men, and distribute their time less equally among such locations. Mapping this mobility gap over administrative divisions, we observe that a wider gap is associated with lower income and lack of public and private transportation options. Our results uncover a complex interplay between gendered mobility patterns, socio-economic factors and urban affordances, calling for further research and providing insights for policymakers and urban planners….(More)”.
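
As a rough illustration of the two indicators named in the abstract (the number of unique locations visited and how evenly time is spread across them), the sketch below computes them from a hypothetical table of anonymized visits. The column names and sample data are assumptions made for illustration only, not the paper's actual call-detail-record pipeline.

```python
# Minimal sketch (not the authors' code): two gender-disaggregated mobility
# indicators hinted at in the abstract, computed from a hypothetical table of
# anonymized visits with columns: user_id, gender, location_id, hours_spent.
import numpy as np
import pandas as pd

def mobility_indicators(visits: pd.DataFrame) -> pd.DataFrame:
    """Per-user number of unique locations and entropy of time allocation."""
    def per_user(group: pd.DataFrame) -> pd.Series:
        time_by_loc = group.groupby("location_id")["hours_spent"].sum()
        p = time_by_loc / time_by_loc.sum()       # share of time per location
        entropy = -(p * np.log(p)).sum()          # 0 when all time is in one place
        return pd.Series({
            "unique_locations": time_by_loc.size,
            "time_entropy": entropy,
        })
    return visits.groupby(["user_id", "gender"]).apply(per_user).reset_index()

# Example: compare the average of each indicator by gender (toy data).
if __name__ == "__main__":
    visits = pd.DataFrame({
        "user_id": [1, 1, 1, 2, 2],
        "gender": ["F", "F", "F", "M", "M"],
        "location_id": ["a", "b", "a", "a", "c"],
        "hours_spent": [5.0, 1.0, 2.0, 3.0, 3.0],
    })
    summary = mobility_indicators(visits).groupby("gender")[
        ["unique_locations", "time_entropy"]
    ].mean()
    print(summary)
```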

Why local data is the key to successful place making


Blog by Sally Kerr: “The COVID emergency has brought many challenges that were unimaginable a few months ago. The first priorities were safety and health, but when lockdown started one of the early issues was accessing and sharing local data to help everyone deal with and live through the emergency. Communities grappled with the scarcity of local data, finding it difficult to source for some services, food deliveries and goods. This was not a new issue, but the pandemic brought it into sharp relief.

Local data use covers a broad spectrum. People moving to a new area want information about the environment — schools, amenities, transport, crime rates and local health. For residents, continuing knowledge of business opening hours, events, local issues, council plans and roadworks remains important, not only for everyday living but to help understand issues and future plans that will change their environment. Really local data (hyperlocal data) is either fragmented or unavailable, making it difficult for local people to stay informed, whilst larger data sets about an area (e.g. population, school performance) are not always easy to understand or use. They sit in silos owned by different sectors, on disparate websites, usually collated for professional or research use.

Third sector organisations in a community will gather data relevant to their work such as contacts and event numbers but may not source wider data sets about the area, such as demographics, to improve their work. Using this data could strengthen future grant applications by validating their work. For Government or Health bodies carrying out place making community projects, there is a reliance on their own or national data sources supplemented with qualitative data snapshots. Their dependence on tried and tested sources is due to time and resource pressures but means there is no time to gather that rich seam of local data that profiles individual needs.

Imagine a future community where local data is collected and managed together for both official organisations and the community itself. Where there are shared aims and varied use. Current and relevant data would be accessible and easy to understand, provided in formats that suit the user — from data scientist to school child. A curated data hub would help citizens learn data skills and carry out collaborative projects on anything from air quality to local biodiversity, managing the data and offering increased insight and useful validation for wider decision making. Costs would be reduced, along with duplication of effort….(More)”.

Laying the Foundation for Effective Partnerships: An Examination of Data Sharing Agreements


Paper by Hayden Dahmm: “In the midst of the COVID-19 pandemic, data has never been more salient. COVID has generated new data demands and increased cross-sector data collaboration. Yet, these data collaborations require careful planning and evaluation of risks and opportunities, especially when sharing sensitive data. Data sharing agreements (DSAs) are written agreements that establish the terms for how data are shared between parties and are important for establishing accountability and trust.

However, negotiating DSAs is often time consuming, and collaborators lacking legal or financial capacity are disadvantaged. Contracts for Data Collaboration (C4DC) is a joint initiative between SDSN TReNDS, NYU’s GovLab, the World Economic Forum, and the University of Washington, working to strengthen trust and transparency of data collaboratives. The partners have created an online library of DSAs which represents a selection of data applications and contexts.

This report introduces C4DC and its DSA library. We demonstrate how the library can support the data community in strengthening future data collaborations by showcasing various DSA applications and key considerations. First, we explain our method of analyzing the agreements and consider how six major issues are addressed by different agreements in the library. Key issues discussed include data use, access, breaches, proprietary issues, publicization of the analysis, and deletion of data upon termination of the agreement. For each of these issues, we describe approaches illustrated with examples from the library. While our analysis suggests some pertinent issues are regularly left unaddressed in DSAs, we have identified common areas of practice that may be helpful for entities negotiating partnership agreements to consider in the future….(More)”.

Sector-Specific (Data-) Access Regimes of Competitors


Paper by Jörg Hoffmann: “The expected economic and social benefits of data access and sharing are enormous. And yet, particularly in the B2B context, the sharing of privately held data between companies has not taken off at an efficient scale. This has already led to the adoption of sector-specific data governance and access regimes. Two of these regimes are enshrined in the PSD2, which introduced an access-to-account rule and a data portability rule for specific account information for third-party payment providers.

This paper analyses these sector-specific access and portability regimes and identifies regulatory shortcomings that should be addressed and can serve as guidance for further data access regulation. It first develops regulatory guidelines built around the multiple regulatory dimensions of data and the potential adverse effects that may be created by overly broad data access regimes.

In this regard the paper assesses the role of factual data exclusivity in data-driven innovation incentives for undertakings, the role of industrial-policy-driven market regulation within the principle of a free market economy, the impact of data sharing on consumer sovereignty and choice, and ultimately data-induced distortions of competition. It develops its findings by drawing on basic IP and information economics, EU competition case law pertaining to refusal-to-supply cases, the rise of ‘surveillance capitalism’, and current competition policy considerations regarding the envisioned preventive competition control regime tackling data-rich ‘undertakings of paramount importance for competition across markets’ in Germany. This is then followed by an analysis of the PSD2 access and portability regimes in light of the regulatory principles….(More)”.

How data analysis helped Mozambique stem a cholera outbreak


Andrew Jack at the Financial Times: “When Mozambique was hit by two cyclones in rapid succession last year — causing death and destruction from a natural disaster on a scale not seen in Africa for a generation — government officials added an unusual recruit to their relief efforts. Apart from the usual humanitarian and health agencies, the National Health Institute also turned to Zenysis, a Silicon Valley start-up.

As the UN and non-governmental organisations helped to rebuild lives and tackle outbreaks of disease including cholera, Zenysis began gathering and analysing large volumes of disparate data. “When we arrived, there were 400 new cases of cholera a day and they were doubling every 24 hours,” says Jonathan Stambolis, the company’s chief executive. “None of the data was shared [between agencies]. Our software harmonised and integrated fragmented sources to produce a coherent picture of the outbreak, the health system’s ability to respond and the resources available.

“Three and a half weeks later, they were able to get infections down to zero in most affected provinces,” he adds. The government attributed that achievement to the availability of high-quality data to brief the public and international partners.

“They co-ordinated the response in a way that drove infections down,” he says. Zenysis formed part of a “virtual control room”, integrating information to help decision makers understand what was happening in the worst hit areas, identify sources of water contamination and where to prioritise cholera vaccinations.

It supported an “mAlert system”, which integrated health surveillance data into a single platform for analysis. The output was daily reports distilled from data issued by health facilities and accommodation centres in affected areas, from disease monitoring, and from laboratory surveillance testing….(More)”.
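
For readers who want a concrete picture of what “harmonising fragmented sources” can involve, here is a minimal, purely illustrative sketch (not the Zenysis platform): two hypothetical report fragments with different column names and date formats are mapped to a common schema and rolled up into a daily per-province case count.

```python
# Illustrative sketch only (not the Zenysis software): harmonise cholera case
# reports from two hypothetical sources that use different column names and
# date formats, then produce a single per-province daily count.
import pandas as pd

def harmonise(frame: pd.DataFrame, colmap: dict, date_format: str) -> pd.DataFrame:
    """Rename columns to a common schema and parse dates consistently."""
    out = frame.rename(columns=colmap)[["date", "province", "cases"]].copy()
    out["date"] = pd.to_datetime(out["date"], format=date_format)
    return out

# Hypothetical fragments, as might come from health facilities and accommodation centres.
facility_reports = pd.DataFrame({
    "report_date": ["2019-03-27", "2019-03-27"],
    "prov": ["Sofala", "Sofala"],
    "new_cases": [120, 85],
})
centre_reports = pd.DataFrame({
    "date": ["27/03/2019"],
    "province": ["Sofala"],
    "suspected_cholera": [60],
})

combined = pd.concat([
    harmonise(facility_reports,
              {"report_date": "date", "prov": "province", "new_cases": "cases"},
              "%Y-%m-%d"),
    harmonise(centre_reports,
              {"suspected_cholera": "cases"},
              "%d/%m/%Y"),
])

# One coherent picture: new cases per province per day.
daily = combined.groupby(["date", "province"], as_index=False)["cases"].sum()
print(daily)
```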

Practical Synthetic Data Generation: Balancing Privacy and the Broad Availability of Data


Book by Khaled El Emam, Lucy Mosquera, and Richard Hoptroff: “Building and testing machine learning models requires access to large and diverse data. But where can you find usable datasets without running into privacy issues? This practical book introduces techniques for generating synthetic data—fake data generated from real data—so you can perform secondary analysis to do research, understand customer behaviors, develop new products, or generate new revenue.

Data scientists will learn how synthetic data generation provides a way to make such data broadly available for secondary purposes while addressing many privacy concerns. Analysts will learn the principles and steps for generating synthetic data from real datasets. And business leaders will see how synthetic data can help accelerate time to a product or solution.

This book describes:

  • Steps for generating synthetic data using multivariate normal distributions
  • Methods for distribution fitting covering different goodness-of-fit metrics
  • How to replicate the simple structure of original data
  • An approach for modeling data structure to consider complex relationships
  • Multiple approaches and metrics you can use to assess data utility
  • How analysis performed on real data can be replicated with synthetic data
  • Privacy implications of synthetic data and methods to assess identity disclosure…(More)”.
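
As a hedged illustration of the first step listed above (not code from the book), the sketch below fits a multivariate normal distribution to a hypothetical numeric dataset, samples synthetic records from it, and compares correlation structure as a crude utility check.

```python
# Minimal sketch: generate synthetic data from a fitted multivariate normal.
import numpy as np
import pandas as pd

def synthesize_mvn(real: pd.DataFrame, n_synthetic: int, seed: int = 0) -> pd.DataFrame:
    """Fit mean and covariance to the real numeric data and draw synthetic rows."""
    rng = np.random.default_rng(seed)
    mean = real.mean().to_numpy()
    cov = real.cov().to_numpy()
    samples = rng.multivariate_normal(mean, cov, size=n_synthetic)
    return pd.DataFrame(samples, columns=real.columns)

# Hypothetical "real" dataset with two correlated numeric variables.
rng = np.random.default_rng(42)
age = rng.normal(45, 12, size=1_000)
income = 800 * age + rng.normal(0, 5_000, size=1_000)
real = pd.DataFrame({"age": age, "income": income})

synthetic = synthesize_mvn(real, n_synthetic=1_000)

# Crude utility check: the synthetic data should preserve the correlation structure.
print(real.corr().round(2))
print(synthetic.corr().round(2))
```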

Using Data for COVID-19 Requires New and Innovative Governance Approaches


Stefaan G. Verhulst and Andrew Zahuranec at Data & Policy blog: “There has been a rapid increase in the number of data-driven projects and tools released to contain the spread of COVID-19. Over the last three months, governments, tech companies, civic groups, and international agencies have launched hundreds of initiatives. These efforts range from simple visualizations of public health data to complex analyses of travel patterns.

When designed responsibly, data-driven initiatives could provide the public and their leaders the ability to be more effective in addressing the virus. The Atlantic and the New York Times have both published work that relies on innovative data use. These and other examples, detailed in our #Data4COVID19 repository, can fill vital gaps in our understanding and allow us to better respond to and recover from the crisis.

But data is not without risk. Collecting, processing, analyzing and using any type of data, no matter how good the intentions of its users, can lead to harmful ends. Vulnerable groups can be excluded. Analysis can be biased. Data use can reveal sensitive information about people and locations. In addressing all these hazards, organizations need to be intentional in how they work throughout the data lifecycle.

Decision Provenance: Documenting decisions and decision makers across the Data Life Cycle

Unfortunately the individuals and teams responsible for making these design decisions at each critical point of the data lifecycle are rarely identified or recognized by all those interacting with these data systems.

The lack of visibility into the origins of these decisions can undermine professional accountability, limit actors' ability to identify the optimal intervention points for mitigating data risks, and lead to missed uses of potentially impactful data. Tracking decision provenance is essential.

As Jatinder Singh, Jennifer Cobbe, and Chris Norval of the University of Cambridge explain, decision provenance refers to tracking and recording decisions about the collection, processing, sharing, analyzing, and use of data. It involves instituting mechanisms to force individuals to explain how and why they acted. It is about using documentation to provide transparency and oversight in the decision-making process for everyone inside and outside an organization.
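
As a minimal sketch of what recording decision provenance might look like in practice (our illustration, not the Cambridge authors' framework or The GovLab's tooling), the snippet below defines an append-only log of who decided what, when, and why at each stage of the data lifecycle, exportable for oversight.

```python
# Illustrative sketch of a decision provenance log (hypothetical structure).
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class DecisionRecord:
    stage: str        # e.g. "collection", "processing", "sharing", "analysis", "use"
    decision: str     # what was decided
    rationale: str    # why it was decided
    decided_by: str   # accountable individual or team
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

class DecisionLog:
    """Append-only log that can be exported for internal or external oversight."""
    def __init__(self) -> None:
        self._records: list[DecisionRecord] = []

    def record(self, **kwargs) -> DecisionRecord:
        rec = DecisionRecord(**kwargs)
        self._records.append(rec)
        return rec

    def export(self) -> str:
        return json.dumps([asdict(r) for r in self._records], indent=2)

# Example: documenting one decision in a COVID-19 mobility analysis.
log = DecisionLog()
log.record(
    stage="sharing",
    decision="Share only aggregated, anonymised mobility indicators with partners",
    rationale="Minimise re-identification risk while supporting situational awareness",
    decided_by="Data steward, analytics team",
)
print(log.export())
```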

Toward that end, The GovLab at NYU Tandon developed the Decision Provenance Mapping. We designed this tool for designated data stewards tasked with coordinating the responsible use of data across organizational priorities and departments….(More)”

Unlock the Hidden Value of Your Data


Stefaan G. Verhulst at the Harvard Business Review: “Twenty years ago, Kevin Rivette and David Kline wrote a book about the hidden value contained within companies’ underutilized patents. These patents, Rivette and Kline argued, represented “Rembrandts in the Attic” (the title of their book). Patents, the authors suggested, shouldn’t be seen merely as passive properties, but as strategic assets — a “new currency” that could be deployed in the quest for competition, brand reputation, and advances in research and development.

We are still living in the knowledge economy, and organizations are still trying to figure out how to unlock under-utilized assets. But the currency has changed: Today’s Rembrandts in the attic are data.

It is widely accepted now that the vast amounts of data that companies generate represent a tremendous repository of potential value. This value is monetary, and also social; it contains terrific potential to impact the public good. But do organizations — and do we as a society — know how to unlock this value? Do we know how to find the insights hidden in our digital attics and use them to improve society and people’s lives?

In what follows, I outline four steps that could help organizations maximize their data assets for public good. If there is an overarching theme, it is about the value of re-using data. Recent years have seen a growing open data movement, in which previously siloed government datasets have been made accessible to outside groups. Despite occasional trepidation on the part of data holders, research has consistently shown that such initiatives can be value-enhancing for both data holders and society. The same is true for private sector data assets. Better and more transparent reuse of data is arguably the single most important measure we can take to unleash this dual potential.

To help maximize data for the public good, we need to:

  • Develop methodologies to measure the value of data...
  • Develop structures to incentivize collaboration. ….
  • Encourage data collaboratives. 
  • Identify and nurture data stewards. …(More)”

Removing the pump handle: Stewarding data at times of public health emergency


Reema Patel at Significance: “There is a saying, incorrectly attributed to Mark Twain, that states: “History never repeats itself but it rhymes”. Seeking to understand the implications of the current crisis for the effective use of data, I’ve drawn on the nineteenth-century cholera outbreak in London’s Soho to identify some “rhyming patterns” that might inform our approaches to data use and governance at this time of public health crisis.

Where better to begin than with the work of Victorian pioneer John Snow? In 1854, Snow’s use of a dot map to illustrate clusters of cholera cases around public water pumps, and of statistics to establish the connection between the quality of water sources and cholera outbreaks, led to a breakthrough in public health interventions – and, famously, the removal of the handle of a water pump in Broad Street.

Data is vital

We owe a lot to Snow, especially now. His example teaches us that data has a central role to play in saving lives, and that the effective use of (and access to) data is critical for enabling timely responses to public health emergencies.

Take, for instance, transport app CityMapper’s rapid redeployment of its aggregated transport data. In the early days of the Covid-19 pandemic, this formed part of an analysis of compliance with social distancing restrictions across a range of European cities. There is also the US-based health weather map, which uses anonymised and aggregated data to visualise fever, specifically influenza-like illnesses. This data helped model early indications of where, and how quickly, Covid-19 was spreading….

Ethics and human rights still matter

As the current crisis evolves, many have expressed concern that the pandemic will be used to justify the rapid roll out of surveillance technologies that do not meet ethical and human rights standards, and that this will be done in the name of the “public good”. Examples of these technologies include symptom- and contact-tracing applications. Privacy experts are also increasingly concerned that governments will be trading off more personal data than is necessary or proportionate to respond to the public health crisis.

Many ethical and human rights considerations (including those listed at the bottom of this piece) are at risk of being overlooked at this time of emergency, and governments would be wise not to press ahead regardless, ignoring legitimate concerns about rights and standards. Instead, policymakers should begin to address these concerns by asking how we can prepare (now and in future) to establish clear and trusted boundaries for the use of data (personal and non-personal) in such crises.

Democratic states in Europe and the US have not, in recent memory, prioritised infrastructures and systems for a crisis of this scale – and this has contributed to our current predicament. Contrast this with Singapore, which suffered outbreaks of SARS and H1N1, and channelled this experience into implementing pandemic preparedness measures.

We cannot undo the past, but we can begin planning and preparing constructively for the future, and that means strengthening global coordination and finding mechanisms to share learning internationally. Getting the right data infrastructure in place has a central role to play in addressing ethical and human rights concerns around the use of data….(More)”.

The Law and Policy of Government Access to Private Sector Data (‘B2G Data Sharing’)


Paper by Heiko Richter: “The tremendous rate of technological advancement in recent years has fostered a policy debate about improving the state’s access to privately held data (‘B2G data sharing’). Access to such ‘data of general interest’ can significantly improve social welfare and serve the common good. At the same time, expanding the state’s access to privately held data poses risks. This chapter inquires into the potential and limits of mandatory access rules, which would oblige private undertakings to grant access to data for specific purposes that lie in the public interest. The article discusses the key questions that access rules should address and develops general principles for designing and implementing such rules. It puts particular emphasis on the opportunities and limitations for the implementation of horizontal B2G access frameworks. Finally, the chapter outlines concrete recommendations for legislative reforms….(More)”.