Index: Collective Intelligence


By Hannah Pierce and Audrie Pirkl

The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on collective intelligence and was originally published in 2017.

The Collective Intelligence Universe

  • Amount of money that Reykjavik’s Better Neighbourhoods program has provided each year to crowdsourced citizen projects since 2012: €2 million (Citizens Foundation)
  • Number of U.S. government challenges in which people are currently participating to submit their community solutions: 778 (Challenge.gov)
  • Percent of U.S. arts organizations that used social media to crowdsource ideas in 2013, from programming decisions to seminar scheduling details: 52% (Pew Research)
  • Number of Wikipedia members who have contributed to a page in the last 30 days: over 120,000 (Wikipedia Page Statistics)
  • Number of languages that the multinational crowdsourced Letters for Black Lives has been translated into: 23 (Letters for Black Lives)
  • Number of comments in a Reddit thread that established a more comprehensive timeline of the theater shooting in Aurora than the media: 1,272 (Reddit)
  • Number of physicians that are members of SERMO, a platform to crowdsource medical research: 800,000 (SERMO)
  • Number of citizen scientist projects registered on SciStarter: over 1,500 (Collective Intelligence 2017 Plenary Talk: Darlene Cavalier)
  • Entrants to NASA’s 2009 TopCoder Challenge: over 1,800 (NASA)

Infrastructure

  • Number of submissions for Blockholm (a digital platform that allows citizens to build “Minecraft” ideas on vacant city lots) within the first six months: over 10,000 (OpenLearn)
  • Number of people engaged by the Participatory Budgeting Project in the U.S.: over 300,000 (Participatory Budgeting Project)
  • Amount of money allocated to community projects through this initiative: $238,000,000

Health

  • Percentage of U.S. internet-using adults with chronic health conditions who have gone online to connect with others suffering from similar conditions: 23% (Pew Research)
  • Number of posts to Patient Opinion, a UK-based platform for patients to provide anonymous feedback to healthcare providers: over 120,000 (Nesta)
    • Percent of NHS health trusts utilizing the posts to improve services in 2015: 90%
    • Stories posted per month: nearly 1,000 (The Guardian)
  • Number of tumors reported to the English National Cancer Registration each year: over 300,000 (Gov.UK)
  • Number of users of an open source artificial pancreas system: 310 (Collective Intelligence 2017 Plenary Talk: Dana Lewis)

Government

  • Number of submissions from 40 countries to the 2017 Open (Government) Contracting Innovation Challenge: 88 (The Open Data Institute)
  • Public-service complaints received each day via Indonesian digital platform Lapor!: over 500 (McKinsey & Company)
  • Number of registered users of Unicef Uganda’s weekly SMS poll, U-Report: 356,468 (U-Report)
  • Number of reports regarding government corruption in India submitted to IPaidaBribe since 2011: over 140,000 (IPaidaBribe)

Business

  • Reviews posted since Yelp’s creation in 2004: 121 million (Statista)
  • Percent of Americans in 2016 who trust online customer reviews as much as personal recommendations: 84% (BrightLocal)
  • Number of companies and their subsidiaries mapped through the OpenCorporates platform: 60 million (Omidyar Network)

Crisis Response

Public Safety

  • Number of sexual harassment reports submitted from 50 cities in India and Nepal to SafeCity, a crowdsourcing site and mobile app: over 4,000 (SafeCity)
  • Number of people who used Facebook’s Safety Check, a feature now being used in a new disaster mapping project, in the first 24 hours after the terror attacks in Paris: 4.1 million (Facebook)

Rawification and the careful generation of open government data


Paper by Jérôme Denis and Samuel Goëta in Social Studies of Science: “Drawing on a two-year ethnographic study within several French administrations involved in open data programs, this article aims to investigate the conditions of the release of government data – the rawness of which open data policies require. This article describes two sets of phenomena. First, far from being taken for granted, open data emerge in administrations through a progressive process that entails uncertain collective inquiries and extraction work. Second, the opening process draws on a series of transformations, as data are modified to satisfy an important criterion of open data policies: the need for both human and technical intelligibility. There are organizational consequences of these two points, which can notably lead to the visibilization or the invisibilization of data labour. Finally, the article invites us to reconsider the apparent contradiction between the process of data release and the existence of raw data. Echoing the vocabulary of one of the interviewees, the multiple operations can be seen as a ‘rawification’ process by which open government data are carefully generated. Such a notion notably helps to build a relational model of what counts as data and what counts as work….(More)”.

Public Data Is More Important Than Ever–And Now It’s Easier To Find


Meg Miller at Co.Design: “Public data, in theory, is meant to be accessible to everyone. But in practice, even finding it can be near impossible, to say nothing of figuring out what to do with it once you do. Government data websites are often clunky and outdated, and some data is still trapped on physical media–like CDs or individual hard drives.

Tens of thousands of these CDs and hard drives, full of data on topics from Arkansas amusement parks to fire incident reporting, have arrived at the doorstep of the New York-based start-up Enigma over the past four years. The company has obtained thousands upon thousands more datasets by way of Freedom of Information Act (FOIA) requests. Enigma specializes in open data: gathering it, curating it, and analyzing it for insights into a client’s industry, for example, or for public service initiatives.

Enigma also shares its 100,000 datasets with the world through an online platform called Public—the broadest collection of public data that is open and searchable by everyone. Public has been around since Enigma launched in 2013, but today the company is introducing a redesigned version of the site that’s fresher and more user-friendly, with easier navigation and additional features that allow users to drill further down into the data.

But while the first iteration of Public was mostly concerned with making Enigma’s enormous trove of data—which it was already gathering and reformatting for client work—accessible to the public, the new site focuses more on linking that data in new ways. For journalists, researchers, and data scientists, the tool will offer more sophisticated ways of making sense of the data that they have access to through Enigma….

…the new homepage also curates featured datasets and collections to enforce a sense of discoverability. For example, an Enigma-curated collection of U.S. sanctions data from the U.S. Treasury Department’s Office of Foreign Assets Control (OFAC) shows data on the restrictions on entities or individuals that American companies can and can’t do business with in an effort to achieve specific national security or foreign policy objectives. A new round of sanctions against Russia has been in the news lately as an effort by President Trump to loosen restrictions on blacklisted businesses and individuals in Russia was overruled by the Senate last week. Enigma’s curated data selection on U.S. sanctions could help journalists contextualize recent events with data that shows changes in sanctions lists over time by presidential administration, for instance–or they could compare the U.S. sanctions list to the European Union’s….(More)”.
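
As a rough illustration of the kind of cross-list comparison described above, the sketch below diffs two sanctions-list extracts by entity name. It is a minimal sketch only: the file names and the "name" column are illustrative assumptions, not the actual schema of Enigma Public or the underlying registries.

```python
# Hypothetical sketch: compare a U.S. OFAC sanctions extract with an EU one.
# File names and the "name" column are illustrative assumptions, not the
# actual schema of Enigma Public or the underlying registries.
import csv

def load_names(path, name_column="name"):
    """Return the set of sanctioned-entity names found in a CSV extract."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[name_column].strip().lower() for row in csv.DictReader(f)}

ofac = load_names("ofac_sdn_extract.csv")
eu = load_names("eu_consolidated_extract.csv")

# Entities that appear on one list but not the other.
print(f"On the OFAC list only: {len(ofac - eu)}")
print(f"On the EU list only: {len(eu - ofac)}")
```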

Regulation of Big Data: Perspectives on Strategy, Policy, Law and Privacy


Paper by Pompeu Casanovas, Louis de Koker, Danuta Mendelson and David Watts: “…presents four complementary perspectives stemming from governance, law, ethics, and computer science. Big, Linked, and Open Data constitute complex phenomena whose economic and political dimensions require a plurality of instruments to enhance and protect citizens’ rights. Some conclusions are offered in the end to foster a more general discussion.

This article contends that the effective regulation of Big Data requires a combination of legal tools and other instruments of a semantic and algorithmic nature. It commences with a brief discussion of the concept of Big Data and views expressed by Australian and UK participants in a study of Big Data use in a law enforcement and national security perspective. The second part of the article highlights the interest of the UN’s Special Rapporteur on the Right to Privacy in these themes and the focus of their new program on Big Data. UK law reforms regarding authorisation of warrants for the exercise of bulk data powers are discussed in the third part. Reflecting on these developments, the paper closes with an exploration of the complex relationship between law and Big Data and the implications for regulation and governance of Big Data….(More)”.

Open Data’s Effect on Food Security


Jeremy de Beer, Jeremiah Baarbé, and Sarah Thuswaldner at Open AIR: “Agricultural data is a vital resource in the effort to address food insecurity. This data is used across the food-production chain. For example, farmers rely on agricultural data to decide when to plant crops, scientists use data to conduct research on pests and design disease resistant plants, and governments make policy based on land use data. As the value of agricultural data is understood, there is a growing call for governments and firms to open their agricultural data.

Open data is data that anyone can access, use, or share. Open agricultural data has the potential to address food insecurity by making it easier for farmers and other stakeholders to access and use the data they need. Open data also builds trust and fosters collaboration among stakeholders that can lead to new discoveries to address the problems of feeding a growing population.

 

A network of partnerships is growing around agricultural data research. The Open African Innovation Research (Open AIR) network is researching open agricultural data in partnership with the Plant Phenotyping and Imaging Research Centre (P2IRC) and the Global Institute for Food Security (GIFS). This research builds on a partnership with the Global Open Data for Agriculture and Nutrition (GODAN) and they are exploring partnerships with Open Data for Development (OD4D) and other open data organizations.

…published two works on open agricultural data. Published in partnership with GODAN, “Ownership of Open Data” describes how intellectual property law defines ownership rights in data. Firms that collect data own the rights to data, which is a major factor in the power dynamics of open data. In July, Jeremiah Baarbé and Jeremy de Beer will be presenting “A Data Commons for Food Security” …The paper proposes a licensing model that allows farmers to benefit from the datasets to which they contribute. The license supports SME data collectors, who need sophisticated legal tools; contributors, who need engagement, privacy, control, and benefit sharing; and consumers, who need open access….(More)”.

The final Global Open Data Index is now live


Open Knowledge International: “The updated Global Open Data Index has been published today, along with our report on the state of Open Data this year. The report includes a broad overview of the problems we found around data publication and how we can improve government open data. You can download the full report here.

Also, after the Public Dialogue phase, we have updated the Index. You can see the updated edition here.

We will also keep our forum open for discussions about open data quality and publication. You can see the conversation here.”

Inside the Algorithm That Tries to Predict Gun Violence in Chicago


Gun violence in Chicago has surged since late 2015, and much of the news media attention on how the city plans to address this problem has focused on the Strategic Subject List, or S.S.L.

The list is made by an algorithm that tries to predict who is most likely to be involved in a shooting, either as perpetrator or victim. The algorithm is not public, but the city has now placed a version of the list — without names — online through its open data portal, making it possible for the first time to see how Chicago evaluates risk.

We analyzed that information and found that the assigned risk scores — and what characteristics go into them — are sometimes at odds with the Chicago Police Department’s public statements and cut against some common perceptions.

■ Violence in the city is less concentrated at the top — among a group of about 1,400 people with the highest risk scores — than some public comments from the Chicago police have suggested.

■ Gangs are often blamed for the devastating increase in gun violence in Chicago, but gang membership had a small predictive effect and is being dropped from the most recent version of the algorithm.

■ Being a victim of a shooting or an assault is far more predictive of future gun violence than being arrested on charges of domestic violence or weapons possession.

■ The algorithm has been used in Chicago for several years, and its effectiveness is far from clear. Chicago accounted for a large share of the increase in urban murders last year….(More)”.
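
For readers who want to explore the published data themselves, here is a minimal, hypothetical sketch of that kind of analysis in Python, assuming the de-identified list has been downloaded from the city’s open data portal as a CSV. The file name and the column names (ssl_score, age_group) are illustrative assumptions, not the portal’s documented schema.

```python
# Hypothetical sketch: summarising the de-identified Strategic Subject List
# export from Chicago's open data portal with pandas. The file name and the
# column names ("ssl_score", "age_group") are assumptions about the schema.
import pandas as pd

ssl = pd.read_csv("strategic_subject_list.csv")

# How are risk scores distributed, and how concentrated is the top group?
print(ssl["ssl_score"].describe())
top = ssl.nlargest(1400, "ssl_score")  # roughly the "highest risk" group size cited above
print(f"Top group as a share of all records: {len(top) / len(ssl):.2%}")

# Mean score by a recorded attribute, e.g. age group.
print(ssl.groupby("age_group")["ssl_score"].mean().sort_values(ascending=False))
```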

Our path to better science in less time using open data science tools


Julia S. Stewart Lowndes et al. in Nature Ecology & Evolution: “Reproducibility has long been a tenet of science but has been challenging to achieve—we learned this the hard way when our old approaches proved inadequate to efficiently reproduce our own work. Here we describe how several free software tools have fundamentally upgraded our approach to collaborative research, making our entire workflow more transparent and streamlined. By describing specific tools and how we incrementally began using them for the Ocean Health Index project, we hope to encourage others in the scientific community to do the same—so we can all produce better science in less time.

Figure 1: Better science in less time, illustrated by the Ocean Health Index project.

Every year since 2012 we have repeated Ocean Health Index (OHI) methods to track change in global ocean health. Increased reproducibility and collaboration have reduced the amount of time required to repeat methods (size of bubbles) with updated data annually, allowing us to focus on improving methods each year (text labels show the biggest innovations). The original assessment in 2012 focused solely on scientific methods (for example, obtaining and analysing data, developing models, calculating, and presenting results; dark shading). In 2013, by necessity we gave more focus to data science (for example, data organization and wrangling, coding, versioning, and documentation; light shading), using open data science tools. We established R as the main language for all data preparation and modelling (using RStudio), which drastically decreased the time involved to complete the assessment. In 2014, we adopted Git and GitHub for version control, project management, and collaboration. This further decreased the time required to repeat the assessment. We also created the OHI Toolbox, which includes our R package ohicore for core analytical operations used in all OHI assessments. In subsequent years we have continued (and plan to continue) this trajectory towards better science in less time by improving code with principles of tidy data; standardizing file and data structure; and focusing more on communication, in part by creating websites with the same open data science tools and workflow. See text and Table 1 for more details….(More)”

Open Data Barometer 2016


Open Data Barometer: “Produced by the World Wide Web Foundation as a collaborative work of the Open Data for Development (OD4D) network and with the support of the Omidyar Network, the Open Data Barometer (ODB) aims to uncover the true prevalence and impact of open data initiatives around the world. It analyses global trends, and provides comparative data on countries and regions using an in-depth methodology that combines contextual data, technical assessments and secondary indicators.

Covering 115 jurisdictions in the fourth edition, the Barometer ranks governments on:

  • Readiness for open data initiatives.
  • Implementation of open data programmes.
  • Impact that open data is having on business, politics and civil society.

After three successful editions, the fourth marks another step towards becoming a global policymaking tool with a participatory and inclusive process and a strong regional focus. This year’s Barometer includes an assessment of government performance in fulfilling the Open Data Charter principles.

The Barometer is a truly global and collaborative effort, with input from more than 100 researchers and government representatives. It takes over six months and more than 10,000 hours of research work to compile. During this process, we address more than 20,000 questions and respond to more than 5,000 comments and suggestions.

The ODB global report is a summary of some of the most striking findings. The full data and methodology is available, and is intended to support secondary research and inform better decisions for the progression of open data policies and practices across the world…(More)”.
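
As a purely illustrative sketch of how a multi-dimensional ranking of this kind can be assembled, the snippet below combines the three dimensions listed above (readiness, implementation, impact) into a weighted composite score. The weights and country figures are invented and are not the Barometer’s published methodology.

```python
# Toy sketch of a weighted composite index over the three Barometer dimensions.
# The weights and country scores are invented for illustration; they are not
# the Open Data Barometer's published figures or methodology.
WEIGHTS = {"readiness": 0.25, "implementation": 0.50, "impact": 0.25}

countries = {
    "Country A": {"readiness": 80, "implementation": 65, "impact": 40},
    "Country B": {"readiness": 60, "implementation": 75, "impact": 55},
}

def composite(scores):
    """Weighted average of the dimension scores (each on a 0-100 scale)."""
    return sum(WEIGHTS[dim] * value for dim, value in scores.items())

# Rank countries by composite score, highest first.
for name, scores in sorted(countries.items(), key=lambda kv: -composite(kv[1])):
    print(f"{name}: {composite(scores):.1f}")
```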

Using Open Data to Combat Corruption


Robert Palmer at Open Data Charter: “…today we’re launching the Open Up Guide: Using Open Data to Combat Corruption. We think that with the right conditions in place, greater transparency can lead to more accountability, less corruption and better outcomes for citizens. This guide builds on the work in this area already done by the G20’s anti-corruption working group, Transparency International and the Web Foundation.

Inside the guide you’ll find a number of tools including:

  • A short overview of how open data can be used to combat corruption.
  • Use cases and methodologies. A series of case studies highlighting existing and future approaches to the use of open data in the anti-corruption field.
  • 30 priority datasets and the key attributes needed so that they can talk to each other. To address corruption networks it is particularly important that connections can be established and followed across data sets, national borders and different sectors (a minimal join sketch follows this list).
  • Data standards. Standards describe what should be published, and the technical details of how it should be made available. The report includes some of the relevant standards for anti-corruption work, and highlights the areas where there are currently no standards.
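
As a minimal sketch of why shared key attributes matter, the snippet below joins a hypothetical contract-award extract to a company-register extract on a common company identifier and flags awards with no matching company record. All file and column names are illustrative assumptions, not any published data standard.

```python
# Hypothetical sketch: follow a connection across two open datasets by joining
# on a shared key attribute (a company identifier). File and column names are
# illustrative assumptions, not any published anti-corruption data standard.
import pandas as pd

contracts = pd.read_csv("contract_awards.csv")    # assumed columns: contract_id, supplier_id, value
companies = pd.read_csv("company_register.csv")   # assumed columns: company_id, name, jurisdiction

linked = contracts.merge(
    companies, left_on="supplier_id", right_on="company_id", how="left"
)

# Awards whose supplier has no record in the register: a simple red-flag query
# that is only possible because the two datasets share a key attribute.
unmatched = linked[linked["company_id"].isna()]
print(f"{len(unmatched)} of {len(contracts)} awards have no matching company record")
```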

The guide has been developed by Transparency International-Mexico, Open Contracting Partnership and the Open Data Charter, building on input from government officials, open data experts, civil society and journalists. It’s been designed as a practical tool for governments who want to use open data to fight corruption. However, it’s still a work in progress and we want feedback on how to make it more useful. Please either comment directly on the Google Doc version of the guide, or email us at info@opendatacharter.net….View the full guide.”