To turn the open data revolution from idea to reality, we need more evidence

Stefaan Verhulst at apolitical: “The idea that we are living in a data age — one characterised by unprecedented amounts of information with unprecedented potential — has become mainstream. We regularly read “data is the new oil,” or “data is the most valuable commodity in the global economy.”

Doubtless, there is truth in these statements. But a major, often unacknowledged problem is how much data remains inaccessible, hidden in silos and behind walls.

For close to a decade, the technology and public interest community has pushed the idea of open data. At its core, open data represents a new paradigm of information and information access.

Rooted in notions of an information commons — developed by scholars like Nobel Prize winner Elinor Ostrom — and borrowing from the language of open source, open data begins from the premise that data collected from the public, often using public funds or publicly funded infrastructure, should also belong to the public — or at least, be made broadly accessible to those pursuing public-interest goals.

The open data movement has reached significant milestones in its short history. An ever-increasing number of governments across both developed and developing economies have released large datasets for the public’s benefit….

Similarly, a growing number of private companies have created “Data Collaboratives,” leveraging their data — with varying degrees of limitation — to serve the public interest.

Despite such initiatives, many open data projects (and data collaboratives) remain fledgling. The field has trouble scaling projects beyond initial pilots. In addition, many potential stakeholders — private sector and government “owners” of data, as well as public beneficiaries — remain sceptical of open data’s value. Such limitations need to be overcome if open data and its benefits are to spread. We need hard evidence of its impact.

Ironically, the field is held back by an absence of good data on open data — that is, a lack of reliable empirical evidence that could guide new initiatives.

At the GovLab, a do-tank at New York University, we study the impact of open data. One of our overarching conclusions is that we need a far more solid evidence base to move open data from being a good idea to reality.

What do we know? Several initiatives undertaken at the GovLab offer insight. Our ODImpact website now includes more than 35 detailed case studies of open government data projects. These examples provide powerful evidence not only that open data can work but also about how it works….

We have also launched an Open Data Periodic Table to better understand what conditions predispose an open data project toward success or failure. For example, a clear problem definition, along with the capacity and culture to carry out open data projects, is vital. Successful projects also build cross-sector partnerships around open data and its potential uses, establish practices to assess and mitigate risks, and have transparent and responsive governance structures….(More)”.

The Three Goals and Five Functions of Data Stewards

Medium Article by Stefaan G. Verhulst: “…Yet even as we see more data steward-type roles defined within companies, there exists considerable confusion about just what they should be doing. In particular, we have noticed a tendency to conflate the roles of data stewards with those of individuals or groups who might be better described as chief privacy, chief data or security officers. This slippage is perhaps understandable, but our notion of the role is somewhat broader. While privacy and security are of course key components of trusted and effective data collaboratives, the real goal is to leverage private data for broader social goals — while preventing harm.

So what are the necessary attributes of data stewards? What are their roles, responsibilities, and goals? And how can they be most effective, both as champions of sharing within organizations and as facilitators for leveraging data with external entities? These are some of the questions we seek to address in our current research, and below we outline some key preliminary findings.

The following “Three Goals” and “Five Functions” can help define the aspirations of data stewards and what is needed to achieve them. While clearly only a start, these attributes can help guide companies currently considering setting up sharing initiatives or establishing data steward-like roles.

The Three Goals of Data Stewards

  • Collaborate: Data stewards are committed to working and collaborating with others, with the goal of unlocking the inherent value of data when a clear case exists that it serves the public good and that it can be used in a responsible manner.
  • Protect: Data stewards are committed to managing private data ethically, which means sharing information responsibly, and preventing harm to potential customers, users, corporate interests, the wider public and of course those individuals whose data may be shared.
  • Act: Data stewards are committed to acting proactively to identify partners who may be better positioned to unlock the value and insights contained within privately held data.


How Charities Are Using Artificial Intelligence to Boost Impact

Nicole Wallace at the Chronicle of Philanthropy: “The chaos and confusion of conflict often separate family members fleeing for safety. The nonprofit Refunite uses advanced technology to help loved ones reconnect, sometimes across continents and after years of separation.

Refugees register with the service by providing basic information — their name, age, birthplace, clan and subclan, and so forth — along with similar facts about the people they’re trying to find. Powerful algorithms search for possible matches among the more than 1.1 million individuals in the Refunite system. The analytics are further refined using the more than 2,000 searches that the refugees themselves do daily.
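Refunite’s actual matching system is proprietary, but the general technique the article describes, fuzzy matching of registration records across several fields, can be sketched with Python’s standard library. All names, fields, and weights below are hypothetical illustrations, not Refunite’s data model:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalised strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_score(query: dict, record: dict) -> float:
    """Weighted similarity across registration fields (weights are illustrative)."""
    weights = {"name": 0.5, "birthplace": 0.3, "clan": 0.2}
    return sum(w * similarity(query[f], record[f]) for f, w in weights.items())

def best_matches(query: dict, records: list, threshold: float = 0.8) -> list:
    """Return candidate records scoring above the threshold, best first."""
    scored = [(match_score(query, r), r) for r in records]
    return [r for s, r in sorted(scored, key=lambda x: x[0], reverse=True) if s >= threshold]

# Hypothetical registrations: spelling variants of the same person plus an unrelated record
records = [
    {"name": "Amina Hassan", "birthplace": "Mogadishu", "clan": "Hawiye"},
    {"name": "Aamina Hasan", "birthplace": "Mogadishu", "clan": "Hawiye"},
    {"name": "John Okello", "birthplace": "Gulu", "clan": "Acholi"},
]
query = {"name": "Amina Hasan", "birthplace": "Mogadishu", "clan": "Hawiye"}
print(best_matches(query, records))  # both "Amina" spelling variants rank; John does not
```

A production system would add identity resolution, phonetic encodings, and the culturally specific lineage patterns Mikkelsen describes, but the core idea of scoring approximate field matches is the same.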

The goal: find loved ones or those connected to them who might help in the hunt. Since Refunite introduced the first version of the system in 2010, it has helped more than 40,000 people reconnect.

One factor complicating the work: Cultures define family lineage differently. Refunite co-founder Christopher Mikkelsen confronted this problem when he asked a boy in a refugee camp if he knew where his mother was. “He asked me, ‘Well, what mother do you mean?’ ” Mikkelsen remembers. “And I went, ‘Uh-huh, this is going to be challenging.’ ”

Fortunately, artificial intelligence is well suited to learn and recognize different family patterns. But the technology struggles with some simple things like distinguishing the image of a chicken from that of a car. Mikkelsen believes refugees in camps could offset this weakness by tagging photographs — “car” or “not car” — to help train algorithms. Such work could earn them badly needed cash: The group hopes to set up a system that pays refugees for doing such work.

“To an American, earning $4 a day just isn’t viable as a living,” Mikkelsen says. “But to the global poor, getting an access point to earning this is revolutionizing.”

Another group, Wild Me, a nonprofit created by scientists and technologists, has built an open-source software platform that combines artificial intelligence and image recognition to identify and track individual animals. Using the system, scientists can better estimate the number of endangered animals and follow them over large expanses without using invasive techniques….

To fight sex trafficking, police officers often go undercover and interact with people trying to buy sex online. Sadly, demand is high, and there are never enough officers.

Enter Seattle Against Slavery. The nonprofit’s tech-savvy volunteers created chatbots designed to disrupt sex trafficking significantly. Using input from trafficking survivors and law-enforcement agencies, the bots can conduct simultaneous conversations with hundreds of people, engaging them in multiple, drawn-out conversations, and arranging rendezvous that don’t materialize. The group hopes to frustrate buyers so much that they give up their hunt for sex online….

A Philadelphia charity is using machine learning to adapt its services to clients’ needs.

Benefits Data Trust helps people enroll for government-assistance programs like food stamps and Medicaid. Since 2005, the group has helped more than 650,000 people access $7 billion in aid.

The nonprofit has data-sharing agreements with jurisdictions to access more than 40 lists of people who likely qualify for government benefits but do not receive them. The charity contacts those who might be eligible and encourages them to call the Benefits Data Trust for help applying….(More)”.
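The core of the approach described above, cross-referencing lists of likely-eligible people against enrolment records to find who to contact, can be sketched in a few lines of Python. All identifiers and program names here are hypothetical; real matching across 40-plus lists would require careful identity resolution and privacy safeguards:

```python
# Hypothetical records: person IDs likely eligible for each benefit,
# and person IDs already enrolled in it.
likely_eligible = {
    "snap":     {"p1", "p2", "p3", "p5"},
    "medicaid": {"p2", "p4", "p5"},
}
currently_enrolled = {
    "snap":     {"p3"},
    "medicaid": {"p4"},
}

def outreach_candidates(likely: dict, enrolled: dict) -> dict:
    """People likely eligible for a benefit they do not yet receive."""
    return {
        program: ids - enrolled.get(program, set())
        for program, ids in likely.items()
    }

print(outreach_candidates(likely_eligible, currently_enrolled))
# e.g. snap -> p1, p2, p5; medicaid -> p2, p5
```

The set difference per program yields exactly the outreach list: those flagged as likely eligible minus those already receiving the benefit.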

What is a data trust?

Essay by Jack Hardinges at ODI: “There are different interpretations of what a data trust is, or should be…

There is no widely used definition of a ‘data trust’, or even consensus on what one is. Much of the recent interest in data trusts in the UK was fuelled by their recommendation as a way to ‘share data in a fair, safe and equitable way’ by a UK government-commissioned independent review into Artificial Intelligence (AI) in 2017. However, there has been wider international interest in the concept for some time.

At a very high level, the aim of data trusts appears to be to give people and organisations confidence when enabling access to data in ways that provide them with some value (either directly or indirectly) in return. Beyond that high-level goal, there are a variety of views about what form they should take. In our work so far, we’ve found different interpretations of the term ‘data trust’:

  • A data trust as a repeatable framework of terms and mechanisms.
  • A data trust as a mutual organisation.
  • A data trust as a legal structure.
  • A data trust as a store of data.
  • A data trust as public oversight of data access….(More)”

Data Stewards: Data Leadership to Address 21st Century Challenges

Post by Stefaan Verhulst: “…Over the last two years, we have focused on the opportunities (and challenges) surrounding what we call “data collaboratives.” Data collaboratives are an emerging form of public-private partnership, in which information held by companies (or other entities) is shared with the public sector, civil society groups, research institutes and international organizations. …

For all its promise, the practice of data collaboratives remains ad hoc and limited. In part, this is a result of the lack of a well-defined, professionalized concept of data stewardship within corporations that has a mandate to explore ways to harness the potential of their data towards positive public ends.

Today, each attempt to establish a cross-sector partnership built on the analysis of private-sector data requires significant and time-consuming efforts, and businesses rarely have personnel tasked with undertaking such efforts and making relevant decisions.

As a consequence, the process of establishing data collaboratives and leveraging privately held data for evidence-based policy making and service delivery is onerous, generally one-off, not informed by best practices or any shared knowledge base, and prone to dissolution when the champions involved move on to other functions.

By establishing data stewardship as a corporate function, recognized and trusted within corporations as a valued responsibility, and by creating the methods and tools needed for responsible data-sharing, the practice of data collaboratives can become regularized, predictable, and de-risked….

To take stock of current practice and to scope needs and opportunities, we held a small yet in-depth kick-off event at the offices of the Cloudera Foundation in San Francisco on May 8, 2018, attended by representatives from LinkedIn, Facebook, Uber, Mastercard, DigitalGlobe, Cognizant, StreetLight Data, the World Economic Forum, and NetHope, among others.

Four Key Takeaways

The discussions were varied and wide-ranging.

Several participants reflected on the risks involved — including the risk of NOT sharing or collaborating on privately held data that could improve people’s lives (and in some cases save lives).

Others warned that the window of opportunity to increase the practice of data collaboratives may be closing — given new regulatory requirements and other barriers that may disincentivize corporations from engaging with third parties around their data.

Ultimately, four key takeaways emerged. These areas — at the nexus of opportunities and challenges — are worth considering further, because they help us better understand both the potential and limitations of data collaboratives….(More)”

City Data Exchange – Lessons Learned From A Public/Private Data Collaboration

Report by the Municipality of Copenhagen: “The City Data Exchange (CDE) is the product of a collaborative project between the Municipality of Copenhagen, the Capital Region of Denmark, and Hitachi. The purpose of the project is to examine the possibilities of creating a marketplace for the exchange of data between public and private organizations.

The CDE consists of three parts:

  • A collaboration between the different partners on the supply of, and demand for, specific data;
  • A platform for selling and purchasing data, aimed at both public and private organizations;
  • An effort to build further experience in the field of data exchange between public and private organizations.

In 2013, the City of Copenhagen and the Copenhagen Region decided to invest in the creation of a marketplace for the exchange of public- and private-sector data. The initial investment was meant as a seed for a self-sustaining marketplace. This was an innovative approach to test the readiness of the market to deliver new data-sharing solutions.

The CDE is the result of a tender by the Municipality of Copenhagen and the Capital Region of Denmark in 2015. Hitachi Consulting won the tender and has invested in, and worked with, the Municipality of Copenhagen and the Capital Region of Denmark to establish an organization and a technical platform.

The City Data Exchange (CDE) has closed a gap in regional data infrastructure. Both public- and private-sector organizations have used the CDE to gain insights into data use cases, new external data sources, GDPR issues, and the value of their data. Before the CDE was launched, there were only a few options available to purchase or sell data.

The City and the Region of Copenhagen are utilizing the insights from the CDE project to improve their internal activities and to shape new policies. The lessons from the CDE also provide insights into a wider national infrastructure for effective data sharing. Based on the insights from approximately 1000 people that the CDE has been in contact with, the recommendations are:

  • Start with the use case, as it is key to engage the data community that will use the data;
  • Create a data competence hub, where the data community can meet and get support;
  • Create simple standards and guidelines for data publishing.

The following paper presents some of the key findings from our work with the CDE. It has been compiled by Smart City Insights on behalf of the partners of the City Data Exchange project…(More)”.

4 reasons why Data Collaboratives are key to addressing migration

Stefaan Verhulst and Andrew Young at the Migration Data Portal: “If every era poses its dilemmas, then our current decade will surely be defined by questions over the challenges and opportunities of a surge in migration. The issues in addressing migration safely, humanely, and for the benefit of communities of origin and destination are varied and complex, and today’s public policy practices and tools are not adequate. Increasingly, it is clear, we need not only new solutions but also new, more agile, methods for arriving at solutions.

Data are central to meeting these challenges and to enabling public policy innovation in a variety of ways. Yet, for all of data’s potential to address public challenges, the truth remains that most data generated today are in fact collected by the private sector. These data contain tremendous potential insights and avenues for innovation in how we solve public problems. But because of access restrictions, privacy concerns and often limited data-science capacity, their vast potential often goes untapped.

Data Collaboratives offer a way around this limitation.

Data Collaboratives: A new form of Public-Private Partnership for a Data Age

Data Collaboratives are an emerging form of partnership, typically between the private and public sectors, but often also involving civil society groups and the education sector. Now in use across various countries and sectors, from health to agriculture to economic development, they allow for the opening and sharing of information held by the private sector, in the process freeing up data silos to serve public ends.

Although still fledgling, we have begun to see instances of Data Collaboratives implemented toward solving specific challenges within the broad and complex refugee and migrant space. As the examples we describe below suggest (and which we examine in more detail in the Stanford Social Innovation Review), the use of such Collaboratives is geographically dispersed and diffuse; there is an urgent need to pull together a cohesive body of knowledge to more systematically analyze what works, and what doesn’t.

This is something we have started to do at the GovLab. We have analyzed a wide variety of Data Collaborative efforts, across geographies and sectors, with a goal of understanding when and how they are most effective.

The benefits of Data Collaboratives in the migration field

As part of our research, we have identified four main value propositions for the use of Data Collaboratives in addressing different elements of the multi-faceted migration issue. …(More)”.

Data Stewards: Data Leadership to Address the Challenges of the 21st Century


The GovLab at the NYU Tandon School of Engineering is pleased to announce the launch of its Data Stewards website — a new portal for connecting organizations across sectors that seek to promote responsible data leadership that can address the challenges of the 21st century — developed with generous support from the William and Flora Hewlett Foundation.

Increasingly, the private sector is collaborating with the public sector and researchers on ways to use private-sector data and analytical expertise for public good. With these new practices of data collaboration comes the need to reimagine roles and responsibilities to steer the process of using this data, and the insights it can generate, to address society’s biggest questions and challenges: Data Stewards.

Today, establishing and sustaining these new collaborative and accountable approaches requires significant, time-consuming effort and investment of resources for both data holders on the supply side and the institutions that represent demand. By establishing Data Stewardship as a function — recognized within the private sector as a valued responsibility — the practice of Data Collaboratives can become more predictable, scalable, sustainable and de-risked.

Together with BrightFront Group and Adapt, we are:

  • Exploring the needs and priorities of current private sector Data Stewards who act as change agents within their firms. Responsible for determining what, when, how and with whom to share private data for public good, these individuals are critical catalysts for ensuring insights are turned into action.
  • Identifying and connecting existing Data Stewards across sectors and regions to create an online and in-person community for exchanging knowledge and best practices.
  • Developing methodologies, tools and frameworks to use data more responsibly, systematically and efficiently to decrease the transaction cost, time and energy currently needed to establish Data Collaboratives.

To learn more about the Data Stewards Initiative, including new insights, ideas, tools and information about the Data Steward of the Year Award program, visit

If you are a Data Steward, or would like to join a community of practice to learn from your peers, please contact to join the Network of Data Stewards.

For more information about The GovLab, visit

Creating a Machine Learning Commons for Global Development

Blog by Hamed Alemohammad: “Advances in sensor technology, cloud computing, and machine learning (ML) continue to converge to accelerate innovation in the field of remote sensing. However, fundamental tools and technologies still need to be developed to drive further breakthroughs and to ensure that the Global Development Community (GDC) reaps the same benefits that the commercial marketplace is experiencing. This process requires us to take a collaborative approach.

Data collaborative innovation — that is, a group of actors from different data domains working together toward common goals — might hold the key to finding solutions for some of the global challenges that the world faces. That is why Radiant.Earth is investing in new technologies such as Cloud Optimized GeoTIFFs, SpatioTemporal Asset Catalogues (STAC), and ML. Our approach to advance ML for global development begins with creating open libraries of labeled images and algorithms. This initiative and others require — and, in fact, will thrive as a result of — using a data collaborative approach.
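A SpatioTemporal Asset Catalogue organises imagery as JSON “items” carrying a bounding box, an acquisition timestamp, and links to cloud-hosted assets, so that a shared library can be searched by place and time. The minimal sketch below uses only Python’s standard library and hypothetical scene IDs and URLs; real STAC tooling offers far richer search:

```python
from datetime import datetime

# Hypothetical STAC-style items: each has a [west, south, east, north] bounding box,
# an acquisition datetime, and a link to a cloud-hosted asset.
items = [
    {"id": "scene-001", "bbox": [36.7, -1.4, 37.1, -1.1],
     "datetime": "2018-05-01T08:30:00Z",
     "assets": {"cog": "https://example.org/scene-001.tif"}},
    {"id": "scene-002", "bbox": [30.0, 10.0, 31.0, 11.0],
     "datetime": "2018-06-15T09:00:00Z",
     "assets": {"cog": "https://example.org/scene-002.tif"}},
]

def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(items, bbox, start, end):
    """Items overlapping bbox and acquired within [start, end] (UTC, ISO 8601)."""
    def ts(item):
        return datetime.fromisoformat(item["datetime"].replace("Z", "+00:00"))
    s = datetime.fromisoformat(start + "+00:00")
    e = datetime.fromisoformat(end + "+00:00")
    return [i for i in items if bbox_intersects(i["bbox"], bbox) and s <= ts(i) <= e]

nairobi = [36.6, -1.5, 37.2, -1.0]  # illustrative area of interest
print([i["id"] for i in search(items, nairobi, "2018-04-01T00:00:00", "2018-05-31T23:59:59")])
# -> ['scene-001']
```

Because items are plain, self-describing JSON, many organisations can contribute scenes to one catalogue and consumers can query it without knowing how each contributor stores its imagery, which is the interoperability that makes a shared labeled-image library practical.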

“Data is only as valuable as the decisions it enables.”

This quote by Ion Stoica, professor of computer science at the University of California, Berkeley, may best describe the challenge facing those of us who work with geospatial information:

How can we extract greater insights and value from the unending tsunami of data that is before us, allowing for more informed and timely decision making?…(More).

UK can lead the way on ethical AI, says Lords Committee

Lords Select Committee: “The UK is in a strong position to be a world leader in the development of artificial intelligence (AI). This position, coupled with the wider adoption of AI, could deliver a major boost to the economy for years to come. The best way to do this is to put ethics at the centre of AI’s development and use, concludes a report by the House of Lords Select Committee on Artificial Intelligence, AI in the UK: ready, willing and able?, published today….

One of the recommendations of the report is for a cross-sector AI Code to be established, which can be adopted nationally, and internationally. The Committee’s suggested five principles for such a code are:

  1. Artificial intelligence should be developed for the common good and benefit of humanity.
  2. Artificial intelligence should operate on principles of intelligibility and fairness.
  3. Artificial intelligence should not be used to diminish the data rights or privacy of individuals, families or communities.
  4. All citizens should have the right to be educated to enable them to flourish mentally, emotionally and economically alongside artificial intelligence.
  5. The autonomous power to hurt, destroy or deceive human beings should never be vested in artificial intelligence.

Other conclusions from the report include:

  • Many jobs will be enhanced by AI, many will disappear, and many new, as-yet-unknown jobs will be created. Significant Government investment in skills and training will be necessary to mitigate the negative effects of AI. Retraining will become a lifelong necessity.
  • Individuals need to be able to have greater personal control over their data, and the way in which it is used. The ways in which data is gathered and accessed need to change, so that everyone can have fair and reasonable access to data, while citizens and consumers can protect their privacy and personal agency. This means using established concepts, such as open data, ethics advisory boards and data protection legislation, and developing new frameworks and mechanisms, such as data portability and data trusts.
  • The monopolisation of data by big technology companies must be avoided, and greater competition is required. The Government, with the Competition and Markets Authority, must review the use of data by large technology companies operating in the UK.
  • The prejudices of the past must not be unwittingly built into automated systems. The Government should incentivise the development of new approaches to the auditing of datasets used in AI, and encourage greater diversity in the training and recruitment of AI specialists.
  • Transparency in AI is needed. The industry, through the AI Council, should establish a voluntary mechanism to inform consumers when AI is being used to make significant or sensitive decisions.
  • At earlier stages of education, children need to be adequately prepared for working with, and using, AI. The ethical design and use of AI should become an integral part of the curriculum.
  • The Government should be bold and use targeted procurement to provide a boost to AI development and deployment. It could encourage the development of solutions to public policy challenges through speculative investment. There have been impressive advances in AI for healthcare, which the NHS should capitalise on.
  • It is not currently clear whether existing liability law will be sufficient when AI systems malfunction or cause harm to users, and clarity in this area is needed. The Committee recommend that the Law Commission investigate this issue.
  • The Government needs to draw up a national policy framework, in lockstep with the Industrial Strategy, to ensure the coordination and successful delivery of AI policy in the UK….(More)”.