Paper by Adrienne Colborne and Michael Smit: “Curated, labeled, high-quality data is a valuable commodity for tasks such as business analytics and machine learning. Open data is a common source of such data—for example, retail analytics draws on open demographic data, and weather forecast systems draw on open atmospheric and ocean data. Governments release open data to achieve various objectives, such as transparency, informing citizen engagement, or supporting private enterprise.
Critical examination of ongoing social changes, including the post-truth phenomenon, suggests the quality, integrity, and authenticity of open data may be at risk. We introduce this risk through various lenses, describe some of the types of risk we expect using a threat model approach, identify approaches to mitigate each risk, and present real-world examples of cases where the risk has already caused harm. As an initial assessment of awareness of this disinformation risk, we compare our analysis to perspectives captured during open data stakeholder consultations in Canada…(More)”.
Article by the OECD: “…In January 2020, 117 organisations – including journals, funding bodies, and centres for disease prevention – signed a statement titled “Sharing research data and findings relevant to the novel coronavirus outbreak”, committing to provide immediate open access for peer-reviewed publications at least for the duration of the outbreak, to make research findings available via preprint servers, and to share results immediately with the World Health Organization (WHO). This was followed in March by the Public Health Emergency COVID-19 Initiative, launched by 12 countries at the level of chief science advisors or equivalent, calling for open access to publications and machine-readable access to data related to COVID-19, which resulted in an even stronger commitment by publishers.
The Open COVID Pledge was launched in April 2020 by an international coalition of scientists, lawyers, and technology companies, and calls on authors to make all intellectual property (IP) under their control available, free of charge, and without encumbrances to help end the COVID-19 pandemic, and reduce the impact of the disease….
Remaining challenges
While clinical, epidemiological and laboratory data about COVID-19 is widely available, including genomic sequencing of the pathogen, a number of challenges remain:
Not all data is sufficiently findable, accessible, interoperable, and reusable (FAIR).
Sources of data tend to be dispersed; even though many pooling initiatives are under way, curation often has to happen “on the fly”.
Personal health records need to be readily accessible for sharing, subject to the patient’s consent. Legislation aimed at fostering interoperability and avoiding information blocking has yet to be passed in many OECD countries. Access across borders is even more difficult under the current data protection frameworks of most OECD countries.
In order to achieve the dual objectives of respecting privacy while ensuring access to machine readable, interoperable and reusable clinical data, the Virus Outbreak Data Network (VODAN) proposes to create FAIR data repositories which could be used by incoming algorithms (virtual machines) to ask specific research questions.
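The pattern VODAN describes, in which the research question travels to the data rather than the data travelling to the researcher, can be sketched in a few lines. The endpoint, payload, and response below are hypothetical placeholders, not VODAN's actual interface:

```python
import requests  # third-party HTTP client (pip install requests)

# Hypothetical FAIR data point: in the data-visiting pattern, patient-level
# records never leave the repository, and only aggregate answers come back.
FAIR_DATA_POINT = "https://example-hospital.org/fdp/query"  # placeholder URL

query = {
    "question": "case_fatality_rate",                       # the research question
    "filters": {"age_band": "60-69", "period": "2020-03"},
    "output": "aggregate_only",                             # repository enforces this
}

response = requests.post(FAIR_DATA_POINT, json=query, timeout=30)
response.raise_for_status()
print(response.json())  # e.g. {"case_fatality_rate": 0.041, "n": 1250}
```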
In addition, many issues arise around the interpretation of data – this can be illustrated by the widely followed epidemiological statistics. Typically, the statistics concern “confirmed cases”, “deaths” and “recoveries”. Each of these items seems to be defined and treated differently in different countries, and is sometimes subject to methodological changes within the same country.
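A toy illustration, with invented records, shows how two plausible definitions of a “confirmed case” yield different counts from identical underlying data:

```python
# Invented patient records, for illustration only.
records = [
    {"pcr_positive": True,  "clinical_diagnosis": True},
    {"pcr_positive": False, "clinical_diagnosis": True},   # diagnosed without a lab test
    {"pcr_positive": True,  "clinical_diagnosis": False},
    {"pcr_positive": False, "clinical_diagnosis": False},
]

# Definition A: only laboratory-confirmed cases count.
confirmed_a = sum(r["pcr_positive"] for r in records)

# Definition B: clinically diagnosed cases also count (the change Hubei
# province made in February 2020, producing a one-day jump in reported cases).
confirmed_b = sum(r["pcr_positive"] or r["clinical_diagnosis"] for r in records)

print(confirmed_a, confirmed_b)  # 2 vs 3 from the same records
```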
Specific standards for COVID-19 data therefore need to be established, and this is one of the priorities of the UK COVID-19 Strategy. A working group within Research Data Alliance has been set up to propose such standards at an international level.
In some cases it could be inferred that the very transparency of the statistics may have guided governments to restrict testing in order to limit the number of “confirmed cases” and avoid a rapid rise in the figures. Lower testing rates can in turn reduce the effectiveness of quarantine measures, lowering the overall effectiveness of combating the disease….(More)”.
Article by Virginia Barbour and Martin Borchert: “In the few months since the first case of COVID-19 was identified, the underlying cause has been isolated, its symptoms agreed on, its genome sequenced, diagnostic tests developed, and potential treatments and vaccines are on the horizon. The astonishingly short time frame of these discoveries has only happened through a global open science effort.
The principles and practices of open science are what underpin good research—research that is reliable, reproducible, and has the broadest possible impact. Open science specifically requires the application of principles and practices that make research FAIR (Findable, Accessible, Interoperable, Reusable); researchers are making their data and preliminary publications openly accessible, and publishers are then making the peer-reviewed research immediately and freely available to all. The rapid dissemination of research—through preprints in particular as well as journal articles—stands in contrast to what happened in the 2003 SARS outbreak, when the majority of research on the disease was published well after the outbreak had ended.
Many outside observers might reasonably assume, given the digital world we all now inhabit, that science usually works like this. Yet this is very far from the norm for most research. Science is not something that just happens in response to emergencies or specific events—it is an ongoing, largely publicly funded, national and international enterprise….
Even once research is published, access to it is not seamless. The majority of academic journals still require a subscription for access. Subscriptions are expensive; Australian universities alone currently spend more than $300 million per year on subscriptions to academic journals. Access to academic journals also varies between universities with differing library budgets. The main markets for subscriptions to the commercial journal literature are higher education and health, with some access for government and commercial sectors….(More)”.
Blog by Walter J. Radermacher at Data & Policy: “It is rightly pointed out that in the midst of a crisis of enormous dimensions we need high quality statistics with utmost urgency, but that instead we are in danger of drowning in an ocean of data and information. The pandemic is accompanied and exacerbated by an infodemic. At this moment, and in this confusion and search for solutions, it seems appropriate to take advice from previous initiatives and draw lessons for the current situation. More than 20 years ago in the United Kingdom, the report “Statistics — A Matter of Trust” laid the foundations for overcoming the then-spreading crisis of confidence through a solidly structured statistical system. This report does not stand alone in international comparison. Rather, it is one of a series of global, European and national measures and agreements which, since the fall of the Berlin Wall in 1989, have strengthened official statistics as the backbone of policy in democratic societies, with the UN Fundamental Principles of Official Statistics and the European Statistics Code of Practice being prominent representatives. So, if we want to deal with our current difficulties, we should address precisely those points that have emerged as determining factors for the quality of statistics, with the following three questions: What (statistical products, quality profile)? How (methods)? Who (institutions)? The aim must be to ensure that statistical information is suitable for resolving conflicts, so that there is no need to argue about the facts, only about the conclusions to be drawn from them.
In the past, this task would have led relatively quickly to a situation where the need for information would have been directed to official statistics as the preferred provider; this has changed recently for many reasons. On the one hand, there is the danger that the much-cited data revolution and learning algorithms (so-called AI) are presented as an alternative to official statistics (which are perceived as too slow, too inflexible and too expensive), instead of emphasizing possible commonalities and cross-fertilization possibilities. On the other hand, after decades of austerity policies, official statistics are in a similarly defensive situation to that of the public health system in many respects and in many countries: There is a lack of financial reserves, personnel and know-how for the new and innovative work now so urgently needed.
It is therefore necessary, as in the 1990s, to ask the fundamental question again: do we (still, and again) really deserve official statistics as the backbone of democratic decision-making, and if so, what should their tasks be, and how should they be financed and anchored in the political system?…(More)”.
Blog by Amen Ra Mashariki: “Governments should protect the data and privacy rights of their communities even during emergencies. It is a false trade-off to require more data without protection. We can and should do both — collect the appropriate data and protect it. Establishing and protecting the data rights and privacy of our communities’ underserved, underrepresented, disabled, and vulnerable residents is the only way we can combat the negative impact of COVID-19 or any other crisis.
Building trust is critical. Governments can strengthen data privacy protocols, beef up transparency mechanisms, and protect the public’s data rights in the name of building trust — especially with the most vulnerable populations. Otherwise, residents will opt out of engaging with government, and without their information, leaders like first responders will be blind to their existence when making decisions and responding to emergencies, as we are seeing with COVID-19.
As Chief Analytics Officer of New York City, I often remembered the words of Defense Secretary Donald Rumsfeld, especially with regard to using data during emergencies: that there are “known knowns, known unknowns, and unknown unknowns, and we will always get hurt by the unknown unknowns.” In other words, the things we didn’t know — the data that we didn’t have — were always going to be what hurt us during emergencies….
There are three key steps that governments can take right now to use data most effectively in responding to emergencies — both for COVID-19 and in the future.
Seek Open Data First
In times of crisis and emergencies, many believe that government and private entities, either purposefully or inadvertently, are willing to trample on the data rights of the public in the name of appropriate crisis response. This should not be a trade-off. We can respond to crises while keeping data privacy and data rights at the forefront of our minds. Rather than dismissing data rights, governments can start by using data that is already openly available. This seems like a simple step, but it does two very important things. First, it forces you to understand the data that is already available in your jurisdiction. Second, it builds your ability to fill gaps in what you know about the city by looking outside of city government. …(More)”.
Andrew Young at The GovLab: “The GovLab and UNICEF, as part of the Responsible Data for Children initiative (RD4C), are pleased to share a set of user-friendly tools to support organizations and practitioners seeking to operationalize the RD4C Principles. These principles—Purpose-Driven, People-Centric, Participatory, Protective of Children’s Rights, Proportional, Professionally Accountable, and Prevention of Harms Across the Data Lifecycle—are especially important in the current moment, as actors around the world are taking a data-driven approach to the fight against COVID-19.
The initial components of the RD4C Toolkit are:
The RD4C Data Ecosystem Mapping Tool is intended to help users identify the systems generating data about children and the key components of those systems. After using this tool, users will be positioned to understand the breadth of data they generate and hold about children; assess data systems’ redundancies or gaps; identify opportunities for responsible data use; and achieve other insights.
The RD4C Decision Provenance Mapping methodology provides a way for actors designing or assessing data investments for children to identify key decision points and determine which internal and external parties influence those decision points. This distillation can help users to pinpoint any gaps and develop strategies for improving decision-making processes and advancing more professionally accountable data practices.
The RD4C Opportunity and Risk Diagnostic provides organizations with a way to take stock of the RD4C principles and how they might be realized as an organization reviews a data project or system. The tool’s high-level questions and prompts are intended to help users identify areas in need of attention and to strategize next steps for ensuring more responsible handling of data for and about children across their organization.
Finally, the Data for Children Collaborative with UNICEF developed an Ethical Assessment that “forms part of [their] safe data ecosystem, alongside data management and data protection policies and practices.” The tool reflects the RD4C Principles and aims to “provide an opportunity for project teams to reflect on the material consequences of their actions, and how their work will have real impacts on children’s lives.”
To learn more about Responsible Data for Children, visit rd4c.org or contact rd4c [at] thegovlab.org. To join the RD4C conversation and be alerted to future releases, subscribe at this link.”
Blogpost by Ania Calderon: “The rapid spread of this disease is exposing fault lines in our political and social balance — most visibly in the lack of protection for the poorest or investment in healthcare systems. It’s also forcing us to think about how we can work across jurisdictions and political contexts to foster better collaboration, build trust in institutions, and save lives.
As we said recently in a call for Open COVID-19 Data, governments need data from other countries to model and flatten the curve, but there is little consistency in how they gather it. Meanwhile, the consequences of different approaches show the balance required in effectively implementing open data policies. For example, Singapore has published detailed personal data about every coronavirus patient, including where they work and live and whether they had contact with others. This helped the city-state keep its infection and death rates extremely low in the early stages of the epidemic, but also led to proportionality concerns as people might be targeted and harmed.
Overall, few governments are publishing the information on which they are basing these huge decisions. This makes it hard to collaborate, scrutinise, and build trust. For example, the models can only be as good as the data that feed them, and we need to understand their limitations. Opening up the data and the source code behind them would give citizens confidence that officials were making decisions in the public interest rather than for political ones. It would also foster the international joined-up action needed to meet this challenge. And it would allow non-state actors into the process to plug gaps and deliver and scale effective solutions quickly.
As we say in our strategy, openness needs to be balanced with both individual and collective data rights, and policies need to account for context.
People may be willing to give up some of their privacy — like having their movements tracked by government smartphone apps — if that can help combat a global health crisis, even though the same measures would seem an unthinkable invasion of privacy to many in less exceptional times. We rightly worry how this data might be used later on, and by whom. This shows that data systems need to be able to respond to changing times while keeping fundamental human rights and civil liberties intact.
As with so many things, this crisis is forcing the world to question orthodoxies around individual and collective data rights and needs. It shines a light on policies and approaches which might help avoid future disasters and build a fairer, healthier, more collaborative society overall….(More)”.
Without a doubt, there is truth in such statements. But they also leave out a major shortcoming — the fact that much of the most useful data continue to remain inaccessible, hidden in silos, behind digital walls, and in untapped “treasuries.”
For close to a decade, the technology and public interest communities have pushed the idea of open data. At its core, open data represents a new paradigm of data availability and access. The movement borrows from the language of open source and is rooted in notions of a “knowledge commons”, a concept developed by, among others, scholars like Nobel Prize winner Elinor Ostrom.
Milestones and Limitations in Open Data
Significant milestones have been achieved in the short history of the open data movement. Around the world, an ever-increasing number of governments at the local, state and national levels now release large datasets for the public’s benefit. For example, New York City requires that all public data be published on a single web portal. The current portal site contains thousands of datasets that fuel projects on topics as diverse as school bullying, sanitation, and police conduct. In California, the Forest Practice Watershed Mapper allows users to track the impact of timber harvesting on aquatic life through the use of the state’s open data. Similarly, Denmark’s Building and Dwelling Register releases address data to the public free of charge, improving transparent property assessment for all interested parties.
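Portals like New York City's are typically consumed programmatically. The sketch below uses the portal's Socrata Open Data API conventions, with a placeholder dataset identifier rather than a real one:

```python
import json
from urllib.request import urlopen

# NYC's portal serves each dataset as JSON at a stable URL; "abcd-1234" is a
# placeholder identifier, and $limit is a standard Socrata query parameter.
URL = "https://data.cityofnewyork.us/resource/abcd-1234.json?$limit=5"

with urlopen(URL) as resp:
    rows = json.load(resp)

for row in rows:
    print(row)  # each row is a dict keyed by the dataset's column names
```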
A growing number of private companies have also initiated or engaged in “Data Collaborative” projects to leverage their private data toward the public interest. For example, Valassis, a direct-mail marketing company, shared its massive address database with community groups in New Orleans to visualize and track block-by-block repopulation rates after Hurricane Katrina. A wide range of data collaboratives are also currently being launched to respond to the COVID-19 pandemic. Through its COVID-19 Data Collaborative Program, the location-intelligence company Cuebiq is providing researchers access to the company’s data to study, for instance, the impacts of social distancing policies in Italy and New York City. The health technology company Kinsa Health’s US Health Weather initiative is likewise visualizing the rate of fever across the United States using data from its network of Smart Thermometers, thereby providing early indications of where COVID-19 outbreaks are likely.
Yet despite such initiatives, many open data projects (and data collaboratives) remain fledgling — especially those at the state and local level.
Among other issues, the field has trouble scaling projects beyond initial pilots, and many potential stakeholders — private sector and government “owners” of data, as well as public beneficiaries — remain skeptical of open data’s value. In addition, terabytes of potentially transformative data remain inaccessible for re-use. It is imperative that we continue to make the case to all stakeholders for the importance of open data, and for moving it from an interesting idea to an impactful reality. To do this, we need a new resource — one that can inform the public and data owners, and guide decision-makers on how to achieve open data responsibly, without undermining privacy and other rights.
Purpose of the Open Data Policy Lab
Today, with support from Microsoft and under the counsel of a global advisory board of open data leaders, The GovLab is launching an initiative designed precisely to build such a resource.
Our Open Data Policy Lab will draw on lessons and experiences from around the world to conduct analysis, provide guidance, build community, and take action to accelerate the responsible re-use and opening of data for the benefit of society and the equitable spread of economic opportunity…(More)”.
Essay by Frank D. LoMonte at the Journal of Civic Information: “In an April 1 interview with NPR’s “Morning Edition,” retired U.S. Army Gen. Stanley A. McChrystal, former commander of U.S. forces in Iraq, explained that, in a crisis situation, accurate information from government authorities can be crucial in reassuring the public – and in the absence of accurate information, speculation and rumor will proliferate. Joni Mitchell, who’s probably never before appeared in the same paragraph with Stanley McChrystal, might have put it a touch more poetically: “Don’t it always seem to go; That you don’t know what you’ve got ’til it’s gone.”
The outbreak of the coronavirus strain COVID-19, which prompted the U.S. Department of Health and Human Services to declare a public health emergency on Jan. 31, 2020, is introducing Americans to a newfound world of austerity and loss. Professional haircuts, sit-down restaurant meals and recreational plane flights increasingly seem like memories from a bygone golden age (small inconveniences, to be sure, alongside the suffering of thousands who’ve died and the families they’ve left behind).
Access to information from government agencies, too, is adapting to a mail-order, drive-through society. As public-health authorities reached consensus that the spread of COVID-19 could be contained only by eliminating non-essential travel and group gatherings, strict adherence to open-meeting and public-records laws became a casualty alongside salad bars and theme-park rides. Governors and legislatures relaxed, or entirely waived, compliance with statutes that require agencies to open their meetings to in-person public attendance and promptly fulfill requests for documents.
As with all other areas of public life, some sacrifices in open-government formalities are unavoidable. With agencies down to a skeleton crew of essential workers, it’s unrealistic to expect that decades-old paper documents will be speedily located and produced. And it’s unsafe to invite people to congregate at public hearings to address their elected officials. But the public shouldn’t be alone in the sacrifice….(More)”.
Article by Jeni Tennison: “Studying the past is futile in an unprecedented crisis. Science is the answer – and open-source information is paramount…Data is a necessary ingredient in day-to-day decision-making – but in this rapidly evolving situation, it’s especially vital. Everything has changed, almost overnight. Demands for food, transport, and energy have been overhauled as more people stop travelling and work from home. Jobs have been lost in some sectors, and workers are desperately needed in others. Historical experience can no longer tell us how our society or economy is working. Past models hold little predictive power in an unprecedented situation. To know what is happening right now, we need up-to-date information….
This data is also crucial for scientists, who can use it to replicate and build upon each other’s work. Yet no open data has been published alongside the evidence for the UK government’s coronavirus response. While a model that informed the US government’s response is freely available as a Google spreadsheet, the Imperial College London model that prompted the current lockdown has still not been published as open-source code. Making data open – publishing it on the web, in spreadsheets, without restrictions on access – is the best way to ensure it can be used by the people who need it most.
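The bar this sets is deliberately low. Assuming a hypothetical CSV published at a public URL, with column names invented for illustration, reuse amounts to a few lines of standard-library code:

```python
import csv
import io
from urllib.request import urlopen

# Hypothetical open CSV: no API keys, logins, or restrictions on access.
URL = "https://example.gov/open/covid-19-daily.csv"

with urlopen(URL) as resp:
    text = resp.read().decode("utf-8")

for row in csv.DictReader(io.StringIO(text)):
    print(row["date"], row["new_cases"])  # column names assumed for illustration
```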
There is currently no open data available on UK hospitalisation rates, and no regional, age or gender breakdown of daily deaths. The more granular breakdown of registered deaths provided by the Office for National Statistics is only published weekly, and with a delay. It is hard to tell whether this data does not exist or whether the NHS has prioritised creating dashboards for government decision makers over informing the rest of the country. But the UK is making progress with regard to data: potential Covid-19 cases identified through online and call-centre triage are now being published daily by NHS Digital.
Of course, not all data should be open. Singapore has been publishing detailed data about every infected person, including their age, gender, workplace, where they have visited and whether they had contact with other infected people. This can both harm the people who are documented and incentivise others to lie to authorities, undermining the quality of data.
When people are concerned about how data about them is handled, they demand transparency. To retain our trust, governments need to be open about how data is collected and used, how it’s being shared, with whom, and for what purpose. Openness about the use of personal data to help tackle the Covid-19 crisis will become more pressing as governments seek to develop contact tracing apps and immunity passports….(More)”.