Better Data for Better Policy: Accessing New Data Sources for Statistics Through Data Collaboratives


Medium Blog by Stefaan Verhulst: “We live in an increasingly quantified world, one where data is driving key business decisions. Data is claimed to be the new competitive advantage. Yet, paradoxically, even as our reliance on data increases and the call for agile, data-driven policy making becomes more pronounced, many Statistical Offices are confronted with shrinking budgets and increased demand to adjust their practices to a data age. If Statistical Offices fail to find new ways to deliver the “evidence of tomorrow” by leveraging new data sources, public policy may be formed without access to the full range of available and relevant intelligence that most business leaders now take for granted. At worst, a thinning evidence base and a lack of rigorous data foundations could lead to errors and more “fake news,” with possibly harmful public policy implications.

While my talk was focused on the key ways data can inform and ultimately transform the full policy cycle (see full presentation here), a key premise I examined was the need to access, utilize and find insight in the vast reams of data and data expertise that exist in private hands through the creation of new kinds of public and private partnerships or “data collaboratives” to establish more agile and data-driven policy making.


Applied to statistics, such approaches have already shown promise in a number of settings and countries. Eurostat, for instance, has experimented with Statistics Belgium in leveraging call detail records provided by Proximus to document population density. Statistics Netherlands (CBS) recently launched a Center for Big Data Statistics (CBDS) in partnership with companies like Dell-EMC and Microsoft. Other National Statistics Offices (NSOs) are considering using scanner data for monitoring consumer prices (Austria); leveraging smart meter data (Canada); or using telecom data to complement transportation statistics (Belgium). We are now living undeniably in an era of data. Much of this data is held by private corporations. The key task is thus to find a way of utilizing this data for the greater public good.

Value Proposition — and Challenges

There are several reasons to believe that public policy making and official statistics could indeed benefit from access to privately collected and held data. Among the value propositions:

  • Using private data can increase the scope and breadth of available evidence, and thus the insights it offers policymakers;
  • Using private data can increase the quality and credibility of existing data sets (for instance, by complementing or validating them);
  • Private data can increase the timeliness and thus relevance of often-outdated information held by statistical agencies (social media streams, for example, can provide real-time insights into public behavior); and
  • Private data can lower costs and increase other efficiencies (for example, through more sophisticated analytical methods) for statistical organizations….(More)”.

“Nudge units” – where they came from and what they can do


Zeina Afif at the World Bank: “You could say that the first one began in 2009, when the US government recruited Cass Sunstein to head the Office of Information and Regulatory Affairs (OIRA) to streamline regulations. In 2010, the UK established the first Behavioural Insights Team (BIT) on a trial basis, under the Cabinet Office. Other countries followed suit, including the US, Australia, Canada, the Netherlands, and Germany. Shortly after, countries such as India, Indonesia, Peru, Singapore, and many others started exploring the application of behavioral insights to their policies and programs. International institutions such as the World Bank, UN agencies, OECD, and EU have also established behavioral insights units to support their programs. And just this month, the Sustainable Energy Authority of Ireland launched its own Behavioral Economics Unit.

The Future
As eMBeD, the behavioral science unit at the World Bank, continues to support governments across the globe in the implementation of their units, here are some common questions we often get asked.

What are the models for a Behavioral Insights Unit in Government?
As of today, over a dozen countries have integrated behavioral insights into their operations. While there is no single model to prescribe, setups vary from centralized to decentralized to networked….

In some countries, the units were first established at the ministerial level. One example is MineduLab in Peru, which was set up with eMBeD’s help. The unit works as an innovation lab, testing rigorous and leading research in education and behavioral science to address issues such as teacher absenteeism and motivation, parents’ engagement, and student performance….

What should be the structure of the team?
Most units start with two to four full-time staff. Profiles include policy advisors, social psychologists, experimental economists, and behavioral scientists. Experience in the public sector is essential to navigate the government and build support. It is also important to have staff familiar with designing and running experiments. Other important skills include psychology, social psychology, anthropology, design thinking, and marketing. While these skills are not always readily available in the public sector, it is important to note that all behavioral insights units have partnered with academics and experts in the field.

The U.S. team, originally called the Social and Behavioral Sciences Team, is staffed mostly by seconded academic faculty, researchers, and other departmental staff. MineduLab in Peru partnered with leading experts, including the Abdul Latif Jameel Poverty Action Lab (J-PAL), Fortalecimiento de la Gestión de la Educación (FORGE), Innovations for Poverty Action (IPA), and the World Bank….(More)”

A Brief History of Living Labs: From Scattered Initiatives to Global Movement


Paper by Seppo Leminen, Veli-Pekka Niitamo, and Mika Westerlund presented at the Open Living Labs Day Conference: “This paper analyses the emergence of living labs based on a literature review and interviews with early living labs experts. Our study makes a contribution to the growing literature of living labs by analysing the emergence of living labs from the perspectives of (i) early living lab pioneers, (ii) early living lab activities in Europe and especially Nokia Corporation, (iii) framework programs of the European Union supporting the development of living labs, (iv) the emergence of national living lab networks, and (v) the emergence of the European Network of Living Labs (ENoLL). Moreover, the paper highlights major events in the emergence of the living lab movement and labels three consecutive phases of the global living lab movement as (i) toward a new paradigm, (ii) practical experiences, and (iii) professional living labs….(More)”.

Open Space: The Global Effort for Open Access to Environmental Satellite Data


Book by Mariel Borowitz: “Key to understanding and addressing climate change is continuous and precise monitoring of environmental conditions. Satellites play an important role in collecting climate data, offering comprehensive global coverage that can’t be matched by in situ observation. And yet, as Mariel Borowitz shows in this book, much satellite data is not freely available but restricted; this remains true despite the data-sharing advocacy of international organizations and a global open data movement. Borowitz examines policies governing the sharing of environmental satellite data, offering a model of data-sharing policy development and applying it in case studies from the United States, Europe, and Japan—countries responsible for nearly half of the unclassified government Earth observation satellites.

Borowitz develops a model that centers on the government agency as the primary actor while taking into account the roles of such outside actors as other government officials and non-governmental actors, as well as the economic, security, and normative attributes of the data itself. The case studies include the U.S. National Aeronautics and Space Administration (NASA), the National Oceanic and Atmospheric Administration (NOAA), and the United States Geological Survey (USGS); the European Space Agency (ESA) and the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT); and the Japanese Aerospace Exploration Agency (JAXA) and the Japanese Meteorological Agency (JMA). Finally, she considers the policy implications of her findings for the future and provides recommendations on how to increase global sharing of satellite data….(More)”.

Priceless? A new framework for estimating the cost of open government reforms


New paper by Praneetha Vissapragada and Naomi Joswiak: “The Open Government Costing initiative, seeded with funding from the World Bank, was undertaken to develop a practical and actionable approach to pinpointing the full economic costs of various open government programs. The methodology developed through this initiative represents an important step towards conducting more sophisticated cost-benefit analyses – and ultimately understanding the true value – of open government reforms intended to increase citizen engagement, promote transparency and accountability, and combat corruption; such insights have been sorely lacking in the open government community to date. The Open Government Costing Framework and Methods section (Section 2 of this report) outlines the critical components needed to conduct cost analysis of open government programs, with the ultimate objective of putting a price tag on key open government reform programs in various countries at a particular point in time. This framework introduces a costing process with six essential steps: (1) defining the scope of the program, (2) identifying the types of costs to assess, (3) developing a framework for costing, (4) identifying key components, (5) collecting data, and (6) analyzing data. While the costing methods build on related approaches used in other sectors such as health and nutrition, the framework and methodology were specifically adapted for open government programs and thus address the unique challenges associated with these types of initiatives. Using the methods outlined in this document, we conducted a cost analysis of two case studies: (1) ProZorro, an e-procurement program in Ukraine; and (2) Sierra Leone’s Open Data Program….(More)”

Crowdsourced Morality Could Determine the Ethics of Artificial Intelligence


Dom Galeon in Futurism: “As artificial intelligence (AI) development progresses, experts have begun considering how best to give an AI system an ethical or moral backbone. A popular idea is to teach AI to behave ethically by learning from decisions made by the average person.

To test this assumption, researchers from MIT created the Moral Machine. Visitors to the website were asked to make choices regarding what an autonomous vehicle should do when faced with rather gruesome scenarios. For example, if a driverless car was being forced toward pedestrians, should it run over three adults to spare two children? Save a pregnant woman at the expense of an elderly man?

The Moral Machine was able to collect a huge swath of this data from random people, so Ariel Procaccia from Carnegie Mellon University’s computer science department decided to put that data to work.

In a new study published online, he and Iyad Rahwan — one of the researchers behind the Moral Machine — taught an AI using the Moral Machine’s dataset. Then, they asked the system to predict how humans would want a self-driving car to react in similar but previously untested scenarios….
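Under heavy simplification, the crowd-learning step described above amounts to fitting a preference model to pairwise choices. The sketch below is purely illustrative: the feature encoding (counts of children and adults spared), the synthetic "votes," and the tiny logistic model are all hypothetical stand-ins for the Moral Machine's far richer scenario data.

```python
import math

# Each dilemma offers two outcomes, described here only by
# (children spared, adults spared). Each vote records which side
# the crowd chose (label 1 = first outcome, 0 = second).
votes = [
    ((2, 0), (0, 3), 1),  # spared 2 children over 3 adults
    ((1, 0), (0, 1), 1),  # spared 1 child over 1 adult
    ((0, 2), (1, 0), 0),  # offered 2 adults vs 1 child: chose the child
    ((0, 5), (1, 0), 1),  # but 5 adults outweighed 1 child
]

w = [0.0, 0.0]  # learned weights for (children, adults)

def score(a, b):
    # Preference score for choosing outcome a over outcome b.
    return sum(wi * (x - y) for wi, x, y in zip(w, a, b))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Plain gradient ascent on the log-likelihood of the observed votes.
for _ in range(2000):
    for a, b, label in votes:
        grad = label - sigmoid(score(a, b))
        for i in range(len(w)):
            w[i] += 0.1 * grad * (a[i] - b[i])

# Predict an untested scenario: probability of sparing 1 child over 2 adults.
p_child = sigmoid(score((1, 0), (0, 2)))
print(round(p_child, 2))
```

The model generalizes from the votes: it has never seen "1 child vs 2 adults," but the learned weights imply a preference for sparing the child, which is the essence of predicting human choices in "similar but previously untested scenarios."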

This idea of having to choose between two morally problematic outcomes isn’t new. Ethicists even have a name for it: the doctrine of double effect. However, having to apply the concept to an artificially intelligent system is something humankind has never had to do before, and numerous experts have shared their opinions on how best to go about it.

OpenAI co-chairman Elon Musk believes that creating an ethical AI is a matter of coming up with clear guidelines or policies to govern development, and governments and institutions are slowly heeding Musk’s call. Germany, for example, crafted the world’s first ethical guidelines for self-driving cars. Meanwhile, Google parent company Alphabet’s AI DeepMind now has an ethics and society unit.

Other experts, including a team of researchers from Duke University, think that the best way to move forward is to create a “general framework” that describes how AI will make ethical decisions….(More)”.

TfL’s free open data boosts London’s economy


Press Release by Transport for London: “Research by Deloitte shows that the release of open data by TfL is generating economic benefits and savings of up to £130m a year…

TfL has worked with a wide range of professional and amateur developers, ranging from start-ups to global innovators, to deliver new products in the form that customers want. This has led to more than 600 apps now being powered specifically using TfL’s open data feeds, used by 42 per cent of Londoners.

The report found that TfL’s data provides the following benefits:

  • Saved time for passengers. TfL’s open data allows customers to plan journeys more accurately using apps with real-time information and advice on how to adjust their routes. This provides greater certainty on when the next bus/Tube will arrive and saves time – estimated at between £70m and £90m per year.
  • Better information to plan journeys, travel more easily and take more journeys. Customers can use apps to better plan journeys, enabling them to use TfL services more regularly and access other services. Conservatively, the value of these journeys is estimated at up to £20m per year.
  • Creating commercial opportunities for third party developers. A wide range of companies now use TfL’s open data commercially to help generate revenue, many of whom are based in London. Having free and up-to-date access to this data increases the ‘Gross Value Add’ (analogous to GDP) that these companies contribute to the London economy, both directly and across the supply chain and wider economy, of between £12m and £15m per year.
  • Leveraging value and savings from partnerships with major customer-facing technology platform owners. TfL receives back significant data on areas where it does not itself collect data (e.g. crowdsourced traffic data). This allows TfL to get an even better understanding of journeys in London and improve its operations….(More).

How online citizenship is unsettling rights and identities


James Bridle at Open Democracy: “Historically, and for those lucky enough to be born under the aegis of stable governments and national regimes, there have been two ways in which citizenship is acquired at birth. Jus soli – the right of soil – confers citizenship upon those born within the territory of a state regardless of their parentage. This right is common in the Americas, but less so elsewhere (and, since 2004, is to be found nowhere in Europe). More frequently, Jus sanguinis – the right of blood – determines a person’s citizenship based on the rights held by their parents. One might be denied citizenship in the place of one’s birth, but obtain it elsewhere….

One of the places we see traditional notions of the nation state and its methods of organisation and control – particularly the assignation of citizenship – coming under greatest stress is online, in the apparently borderless expanses of the internet, where information and data flow almost without restriction across the boundaries between states. And as our rights and protections are increasingly assigned not to our corporeal bodies but to our digital selves – the accumulations of information which stand as proxies for us in our relationships to states, banks, and corporations – so new forms of citizenship arise at these transnational digital junctions.

Jus algoritmi is a term coined by John Cheney-Lippold to describe a new form of citizenship which is produced by the surveillance state, whose primary mode of operation, like other state forms before it, is control through identification and categorisation. Jus algoritmi – the right of the algorithm – refers to the increasing use of software to make judgements about an individual’s citizenship status, and thus to decide what rights they have, and what operations upon their person are permitted….(More)”.

Blockchain Could Help Us Reclaim Control of Our Personal Data


Michael Mainelli at Harvard Business Review: “…numerous smaller countries, such as Singapore, are exploring national identity systems that span government and the private sector. One of the more successful stories of governments instituting an identity system is Estonia, with its ID-kaarts. Reacting to cyber-attacks against the nation, the Estonian government decided that it needed to become more digital, and even more secure. It decided to use a distributed ledger to build its system, rather than a traditional central database. Distributed ledgers are used in situations where multiple parties need to share authoritative information with each other without a central third party, such as for data-logging clinical assessments or storing data from commercial deals. These are multi-organization databases with a super audit trail. As a result, the Estonian system provides its citizens with an all-digital government experience, significantly reduced bureaucracy, and high citizen satisfaction with their government dealings.

Cryptocurrencies such as Bitcoin have increased the awareness of distributed ledgers with their use of a particular type of ledger — blockchain — to hold the details of coin accounts among millions of users. Cryptocurrencies have certainly had their own problems with their wallets and exchanges — even ID-kaarts are not without their technical problems — but the distributed ledger technology holds firm for Estonia and for cryptocurrencies. These technologies have been working in hostile environments now for nearly a decade.

The problem with a central database like the ones used to house social security numbers, or credit reports, is that once it’s compromised, a thief has the ability to copy all of the information stored there. Hence the huge numbers of people who can be affected — more than 140 million people in the Equifax breach, and more than 50 million at Home Depot — though perhaps Yahoo takes the cake with more than three billion alleged customer accounts hacked.  Of course, if you can find a distributed ledger online, you can copy it, too. However, a distributed ledger, while available to everyone, may be unreadable if its contents are encrypted. Bitcoin’s blockchain is readable to all, though you can encrypt things in comments. Most distributed ledgers outside cryptocurrencies are encrypted in whole or in part. The effect is that while you can have a copy of the database, you can’t actually read it.

This characteristic of encrypted distributed ledgers has big implications for identity systems.  You can keep certified copies of identity documents, biometric test results, health data, or academic and training certificates online, available at all times, yet safe unless you give away your key. At a whole system level, the database is very secure. Each single ledger entry among billions would need to be found and then individually “cracked” at great expense in time and computing, making the database as a whole very safe.

Distributed ledgers seem ideal for private distributed identity systems, and many organizations are working to provide such systems to help people manage the huge amount of paperwork modern society requires to open accounts, validate yourself, or make payments.  Taken a small step further, these systems can help you keep relevant health or qualification records at your fingertips.  Using “smart” ledgers, you can forward your documentation to people who need to see it, while keeping control of access, including whether another party can forward the information. You can even revoke someone’s access to the information in the future….(More)”.
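The "copyable but unreadable" property behind these identity systems can be illustrated with a toy hash-chained ledger whose entries are encrypted before being appended. Everything here is a simplified sketch: the XOR keystream is a stand-in for real encryption (e.g. AES), and a production distributed ledger would additionally handle consensus and multi-party replication.

```python
import hashlib
import json

def keystream(key: bytes, n: int) -> bytes:
    # Toy keystream derived from the key; illustrative only, not secure.
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(plaintext, keystream(key, len(plaintext))))

decrypt = encrypt  # XOR with the same keystream inverts itself

class Ledger:
    """Append-only chain: each block commits to the hash of its predecessor."""

    def __init__(self):
        self.blocks = []

    def head(self) -> str:
        if not self.blocks:
            return "0" * 64
        serialized = json.dumps(self.blocks[-1], sort_keys=True).encode()
        return hashlib.sha256(serialized).hexdigest()

    def append(self, ciphertext: bytes):
        self.blocks.append({"prev": self.head(), "data": ciphertext.hex()})

    def verify(self) -> bool:
        # Anyone holding a copy can audit the chain without reading entries.
        prev = "0" * 64
        for block in self.blocks:
            if block["prev"] != prev:
                return False
            prev = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
        return True

key = b"citizen-held-secret"
ledger = Ledger()
ledger.append(encrypt(key, b"degree: MSc Statistics, 2016"))
ledger.append(encrypt(key, b"license: class B, expires 2027"))

assert ledger.verify()  # the public copy is intact and auditable...
first = bytes.fromhex(ledger.blocks[0]["data"])
print(decrypt(key, first).decode())  # ...but readable only with the key
```

Tampering with any block breaks the hash chain for every later block, so a copied ledger can be checked for integrity by anyone, while the certified documents inside stay private until the key holder chooses to share them.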

Using Open Data to Analyze Urban Mobility from Social Networks


Paper by Caio Libânio Melo Jerônimo, Claudio E. C. Campelo, Cláudio de Souza Baptista: “The need for online technologies that favor the understanding of city dynamics has grown, mainly due to the ease of obtaining the necessary data, which, in most cases, can be gathered at no cost from social network services. This has made the acquisition of georeferenced data easier, increasing the interest in, and feasibility of, studying human mobility patterns and bringing new challenges for knowledge discovery in GIScience. This favorable scenario also encourages governments to make their data available for public access, increasing the possibilities for data scientists to analyze such data. This article presents an approach to extracting mobility metrics from Twitter messages and analyzing their correlation with social, economic and demographic open data. The proposed model was evaluated using a dataset of georeferenced Twitter messages and a set of social indicators, both related to Greater London. The results revealed that social indicators related to employment conditions present a higher correlation with the mobility metrics than any other social indicators investigated, suggesting that these social variables may be more relevant for studying mobility behaviors….(More)”.
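The paper's core computation can be sketched at toy scale: derive a per-area mobility metric from geotagged points (here, the radius of gyration, a common choice in the mobility literature) and correlate it with a social indicator. All areas, coordinates, and indicator values below are hypothetical; the actual study used real Twitter messages and Greater London indicators, and its specific metrics may differ.

```python
import math

def radius_of_gyration(points):
    # Spread of one area's geotagged positions around their centroid.
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points))

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length series.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical areas: (geotagged positions, employment-rate indicator).
areas = {
    "A": ([(0, 0), (1, 0), (0, 2), (3, 1)], 0.71),
    "B": ([(0, 0), (0.2, 0.1), (0.1, 0.3)], 0.55),
    "C": ([(0, 0), (2, 2), (4, 1), (1, 5)], 0.80),
}
mobility = [radius_of_gyration(pts) for pts, _ in areas.values()]
employment = [rate for _, rate in areas.values()]
r = pearson(mobility, employment)
print(round(r, 2))
```

In this contrived data the area whose users range furthest also has the highest employment indicator, so the correlation comes out strongly positive, mirroring the paper's finding that employment-related indicators track mobility metrics most closely.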