What the drive for open science data can learn from the evolving history of open government data


Stefaan Verhulst, Andrew Young, and Andrew Zahuranec at The Conversation: “Nineteen years ago, a group of international researchers met in Budapest to discuss a persistent problem. While experts published an enormous amount of scientific and scholarly material, few of these works were accessible. New research remained locked behind paywalls run by academic journals. The result was that researchers struggled to learn from one another. They could not build on one another’s findings to achieve new insights. In response to these problems, the group developed the Budapest Open Access Initiative, a declaration calling for free and unrestricted access to scholarly journal literature in all academic fields.

In the years since, open access has become a priority for a growing number of universities, governments, and journals. But while access to scientific literature has increased, access to the scientific data underlying this research remains extremely limited. Researchers can increasingly see what their colleagues are doing but, in an era defined by the replication crisis, they cannot access the data to reproduce the findings or analyze it to produce new findings. In some cases there are good reasons to keep access to the data limited – such as confidentiality or sensitivity concerns – yet in many other cases data hoarding still reigns.

To make scientific research data open to citizens and scientists alike, open science data advocates can learn from open data efforts in other domains. By looking at the evolving history of the open government data movement, scientists can see both limitations to current approaches and identify ways to move forward from them….(More) (French version)”.

Wikipedia Is Finally Asking Big Tech to Pay Up


Noam Cohen at Wired: “From the start, Google and Wikipedia have been in a kind of unspoken partnership: Wikipedia produces the information Google serves up in response to user queries, and Google builds up Wikipedia’s reputation as a source of trustworthy information. Of course, there have been bumps, including Google’s bold attempt to replace Wikipedia with its own version of user-generated articles, under the clumsy name “Knol,” short for knowledge. Knol never did catch on, despite Google’s offer to pay the principal author of an article a share of advertising money. But after that failure, Google embraced Wikipedia even tighter—not only linking to its articles but reprinting key excerpts on its search result pages to quickly deliver Wikipedia’s knowledge to those seeking answers.

The two have grown in tandem over the past 20 years, each becoming its own household word. But whereas one mushroomed into a trillion-dollar company, the other has remained a midsize nonprofit, depending on the generosity of individual users, grant-giving foundations, and the Silicon Valley giants themselves to stay afloat. Now Wikipedia is seeking to rebalance its relationships with Google and other big tech firms like Amazon, Facebook, and Apple, whose platforms and virtual assistants lean on Wikipedia as a cost-free virtual crib sheet.

Today, the Wikimedia Foundation, which operates the Wikipedia project in more than 300 languages as well as other wiki-projects, is announcing the launch of a commercial product, Wikimedia Enterprise. The new service is designed for the sale and efficient delivery of Wikipedia’s content directly to these online behemoths (and eventually, to smaller companies too)….(More)”.

The Handbook: How to regulate?


Handbook edited by the Regulatory Institute: “…presents an inventory of regulatory techniques from over 40 jurisdictions and a basic universal method. The Handbook is based on the idea that officials with an inventory of regulatory techniques have more choices and can develop better regulations. The same goes for officials using methodological knowledge. The Handbook is made available free of charge because better regulations benefit us all….

The purpose of the Handbook is to assist officials involved in regulatory activities. Readers can draw inspiration from it, can learn how colleagues have tackled a certain regulatory challenge and can even develop a tailor-made systematic approach to improve their regulation. The Handbook can also be used as a basis for training courses or for self-training.

The Handbook is not intended to be read from A to Z. Instead, readers are invited to pick and choose the sections that are relevant to them. The Handbook was not developed to be the authoritative source of how to regulate, but to offer in the most neutral and objective way possibilities for improving regulation…

The Handbook explores the empty space between:

  • the constitution or similar documents setting the legal frame,
  • the sector-specific policies followed by the government, administration, or institution,
  • the impact assessment, better regulation, simplification, and other regulatory policies,
  • applicable drafting instructions or recommendations, and
  • the procedural settings of the respective jurisdiction….(More)”.

Thinking systems


Paper by Geoff Mulgan: “…describes methods for understanding how vital everyday systems work, and how they could work better, through improved shared cognition – observation, memory, creativity and judgement – organised as commons.

Much of our life we depend on systems: interconnected webs of activity that link many organisations, technologies and people. These bring us food and clothing; energy for warmth and light; mobility including rail, cars and global air travel; care, welfare and handling of waste. Arguably the biggest difference between the modern world and the world of a few centuries ago is the thickness and complexity of these systems. These have brought huge gains.

But one of their downsides is that they have made the world around us harder to understand or shape. A good example is the Internet: essential to much of daily life but largely obscure and opaque to its users. Its physical infrastructures, management, protocols and flows are almost unknown except to specialists, as are its governance structures and processes (if you are in any doubt, just ask a random sample of otherwise well-informed people). Other vital systems like those for food, energy or care are also hardly visible to those within them as well as those dependent on them. This makes it much harder to hold them to account, or to ensure they take account of more voices and needs. We often feel that the world is much more accessible thanks to powerful search engines and ubiquitous data. But try to get a picture of the systems around you and you quickly discover just how much is opaque and obscure.

If you think seriously about these systems it’s also hard not to be struck by another feature. Our systems generally use much more data and knowledge than their equivalents in the past. But this progress also highlights what’s missing in the data they use (often including the most important wants and needs). Moreover, huge amounts of potentially relevant data are lost immediately or never captured, and much of what is captured is neither organised nor shared. The result is a strangely lop-sided world: vast quantities of data are gathered and organised at great expense for some purposes (notably defense or click-through advertising)…

So how could we recapture our systems and help them make the most of intelligence of all kinds? The paper shares methods and approaches that could make our everyday systems richer in intelligence and also easier to guide. It advocates:

· A cognitive approach to systems – focusing on how they think, and specifically how they observe, analyse, create and remember. It argues that this approach can help to bridge the often abstract language of systems thinking and practical action

· Organising much of this systems intelligence as a commons – which is very rarely the case now

· New structures and roles within government and other organisations, and the growth of a practice of systems architects with skills straddling engineering, management, data and social science – who are adept at understanding, designing and improving intelligent systems that are transparent and self-aware.

The background to the paper is the great paradox of systems right now: there is a vast literature, a small industry of consultancies and labs, and no shortage of rhetorical commitment in many fields. Yet these have had at best uneven impact on how decisions are made or large organisations are run….(More)”.

The Ethics and Laws of Medical Big Data


Chapter by Hrefna Gunnarsdottir et al: “The COVID-19 pandemic has highlighted that leveraging medical big data can help to better predict and control outbreaks from the outset. However, there are still challenges to overcome in the 21st century to efficiently use medical big data, promote innovation and public health activities and, at the same time, adequately protect individuals’ privacy. The metaphor that property is a “bundle of sticks”, each representing a different right, applies equally to medical big data. Understanding medical big data in this way raises a number of questions, including: Who has the right to make money off its buying and selling, or is it inalienable? When does medical big data become sufficiently stripped of identifiers that the rights of an individual concerning the data disappear? How have different regimes such as the General Data Protection Regulation in Europe and the Health Insurance Portability and Accountability Act in the US answered these questions differently? In this chapter, we will discuss three topics: (1) privacy and data sharing, (2) informed consent, and (3) ownership. We will identify and examine ethical and legal challenges and make suggestions on how to address them. In our discussion of each of the topics, we will also give examples related to the use of medical big data during the COVID-19 pandemic, though the issues we raise extend far beyond it….(More)”.

The Third Wave of Open Data Toolkit


The GovLab: “Today, as part of Open Data Week 2021, the Open Data Policy Lab is launching The Third Wave of Open Data Toolkit, which provides organizations with specific operational guidance on how to foster responsible, effective, and purpose-driven re-use. The toolkit—authored by Andrew Young, Andrew J. Zahuranec, Stefaan G. Verhulst, and Kateryna Gazaryan—supports the work of data stewards, responsible data leaders at public, private, and civil society organizations empowered to seek new ways to create public value through cross-sector data collaboration. The toolkit provides this support in a few different ways. 

First, it offers a framework to make sense of the present and future open data ecosystem. Acknowledging that data re-use is the result of many stages, the toolkit separates each stage, identifying the way the data lifecycle plays into data collaboration, the way data collaboration plays into the production of insights, the way insights play into conditions that enable further collaboration, and so on. By understanding the processes through which data is created and used, data stewards can promote better and more impactful data management. 

Third Wave Framework

Second, the toolkit offers eight primers showing how data stewards can operationalize the actions previously identified as being part of the third wave. Each primer includes a brief explanation of what each action entails, offers some specific ways data stewards can implement these actions, and lists some supplementary pieces that might be useful in this work. The primers, which are available as part of the toolkit and as standalone two-pagers, are…(More)”.

2030 Compass CoLab


About: “2030 Compass CoLab invites a group of experts, using an online platform, to contribute their perspectives on potential interactions between the goals in the UN’s 2030 Agenda for Sustainable Development.

By combining the insight of participants who possess broad and diverse knowledge, we hope to develop a richer understanding of how the Sustainable Development Goals (SDGs) may be complementary or conflicting.

2030 Compass CoLab is part of a larger project, The Agenda 2030 Compass Methodology and toolbox for strategic decision making, funded by Vinnova, Sweden’s government agency for innovation.

Other elements of the larger project include:

  • Deliberations by a panel of experts who will convene in a series of live meetings to undertake in-depth analysis on interactions between the goals. 
  • Quantitative analysis of SDG indicator time series data, which will examine historical correlations between progress on the SDGs.
  • Development of a knowledge repository, residing in a new software tool under development as part of the project. This tool will be made available as a resource to guide the decisions of corporate executives, policy makers, and leaders of NGOs.

The overall project was inspired by the work of researchers at the Stockholm Environment Institute, described in Towards systemic and contextual priority setting for implementing the 2030 Agenda, a 2018 paper in Sustainability Science by Nina Weitz, Henrik Carlsen, Måns Nilsson, and Kristian Skånberg….(More)”.

DNA databases are too white, so genetics doesn’t help everyone. How do we fix that?


Tina Hesman Saey at ScienceNews: “It’s been two decades since the Human Genome Project first unveiled a rough draft of our genetic instruction book. The promise of that medical moon shot was that doctors would soon be able to look at an individual’s DNA and prescribe the right medicines for that person’s illness or even prevent certain diseases.

That promise, known as precision medicine, has yet to be fulfilled in any widespread way. True, researchers are getting clues about some genetic variants linked to certain conditions and some that affect how drugs work in the body. But many of those advances have benefited just one group: people whose ancestral roots stem from Europe. In other words, white people.

Instead of a truly human genome that represents everyone, “what we have is essentially a European genome,” says Constance Hilliard, an evolutionary historian at the University of North Texas in Denton. “That data doesn’t work for anybody apart from people of European ancestry.”

She’s talking about more than the Human Genome Project’s reference genome. That database is just one of many that researchers are using to develop precision medicine strategies. Often those genetic databases draw on data mainly from white participants. But race isn’t the issue. The problem is that collectively, those data add up to a catalog of genetic variants that don’t represent the full range of human genetic diversity.

When people of African, Asian, Native American or Pacific Island ancestry get a DNA test to determine if they inherited a variant that may cause cancer or if a particular drug will work for them, they’re often left with more questions than answers. The results often reveal “variants of uncertain significance,” leaving doctors with too little useful information. This happens less often for people of European descent. That disparity could change if genetics research included a more diverse group of participants, researchers agree (SN: 9/17/16, p. 8).

One solution is to make customized reference genomes for populations whose members die from cancer or heart disease at higher rates than other groups, for example, or who face other worse health outcomes, Hilliard suggests….(More)”.

Machine Learning Shows Social Media Greatly Affects COVID-19 Beliefs


Jessica Kent at HealthITAnalytics: “Using machine learning, researchers found that people’s biases about COVID-19 and its treatments are exacerbated when they read tweets from other users, a study published in JMIR showed.

The analysis also revealed that scientific events, like scientific publications, and non-scientific events, like speeches from politicians, equally influence health belief trends on social media.

The rapid spread of COVID-19 has resulted in an explosion of accurate and inaccurate information related to the pandemic – mainly across social media platforms, researchers noted.

“In the pandemic, social media has contributed to much of the information and misinformation and bias of the public’s attitude toward the disease, treatment and policy,” said corresponding study author Yuan Luo, chief Artificial Intelligence officer at the Institute for Augmented Intelligence in Medicine at Northwestern University Feinberg School of Medicine.

“Our study helps people to realize and re-think the personal decisions that they make when facing the pandemic. The study sends an ‘alert’ to the audience that the information they encounter daily might be right or wrong, and guide them to pick the information endorsed by solid scientific evidence. We also wanted to provide useful insight for scientists or healthcare providers, so that they can more effectively broadcast their voice to targeted audiences.”…(More)”.

Open Data Day 2021: How to unlock its potential moving forward?


Stefaan Verhulst, Andrew Young, and Andrew Zahuranec at Data and Policy: “For over a decade, data advocates have reserved one day out of the year to celebrate open data. Open Data Day 2021 comes at a time of unprecedented upheaval. As the world remains in the grip of COVID-19, open data researchers and practitioners must confront the challenge of how to use open data to address the types of complex, emergent challenges that are likely to define the rest of this century (and beyond). Amid threats like the ongoing pandemic, climate change, and systemic poverty, there is renewed pressure to find ways that open data can solve complex social, cultural, economic and political problems.

Over the past year, the Open Data Policy Lab, an initiative of The GovLab at NYU’s Tandon School of Engineering, held several sessions with leaders of open data from around the world. Over the course of these sessions, which we called the Summer of Open Data, we studied various strategies and trends, and identified future pathways for open data leaders to pursue. The results of this research suggest an emergent Third Wave of Open Data— one that offers a clear pathway for stakeholders of all types to achieve Open Data Day’s goal of “showing the benefits of open data and encouraging the adoption of open data policies in government, business, and civil society.”

The Third Wave of Open Data is central to how data is being collected, stored, shared, used, and reused around the world. In what follows, we explain this notion further, and argue that it offers a useful rubric through which to take stock of where we are — and to consider future goals — as we mark this latest iteration of Open Data Day.

The Past and Present of Open Data

The history of open data can be divided into several waves, each reflecting the priorities and values of the era in which they emerged….(More)”.

The Three Waves of Open Data