Seeing Theory


Seeing Theory is a project designed and created by Daniel Kunin with support from Brown University’s Royce Fellowship Program. The goal of the project is to make statistics more accessible to a wider range of students through interactive visualizations.

Statistics is quickly becoming the most important and multi-disciplinary field of mathematics. According to the American Statistical Association, “statistician” is one of the top ten fastest-growing occupations, and statistics is one of the fastest-growing bachelor’s degrees. Statistical literacy is essential to our data-driven society. Yet, for all the increased importance of, and demand for, statistical competence, the pedagogical approaches in statistics have barely changed. Using Mike Bostock’s data visualization software, D3.js, Seeing Theory visualizes the fundamental concepts covered in an introductory college statistics or Advanced Placement statistics class. Students are encouraged to use Seeing Theory as an additional resource to their textbook, professor and peers….(More)”
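
Seeing Theory itself is built in D3.js, but the statistical ideas it animates are easy to sketch in a few lines of code. As a minimal illustration (ours, not the project's code), here is a Python simulation of the kind of demo its interactive views are built around: the running proportion of heads in repeated coin flips converging to the true probability.

```python
import random

def running_mean_of_flips(n_flips: int, p_heads: float = 0.5, seed: int = 42) -> list[float]:
    """Simulate repeated coin flips and return the running proportion of heads."""
    rng = random.Random(seed)
    heads = 0
    means = []
    for i in range(1, n_flips + 1):
        heads += rng.random() < p_heads  # True counts as 1
        means.append(heads / i)
    return means

means = running_mean_of_flips(10_000)
# The running mean wanders early, then settles near the true probability 0.5:
# the convergence an interactive visualization can animate flip by flip.
print(f"after 10 flips: {means[9]:.3f}; after 10,000 flips: {means[-1]:.4f}")
```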

Tactical Data Engagement guide


From the Sunlight Foundation: “United States cities face a critical challenge when it comes to fulfilling the potential of open data: that of moving beyond the mere provision of access to data toward the active facilitation of stakeholder use of data in ways that bring about community impact. Sunlight has been researching innovative projects and strategies that have helped cities tackle this challenge head on. Today we’re excited to share a guide for our new approach to open data in U.S. cities–an approach we’re calling “Tactical Data Engagement,” designed to drive community impact by connecting the dots between open data, public stakeholders, and collaborative action.

Access is critical but we have more work to do

Many city leaders have realized that open data is a valuable innovation to bring to city hall, and have invoked the promise of a new kind of relationship between government and the people: one where government works with the public in new collaborative ways. City mayors, managers, council members, and other leaders are making commitments to this idea in the US, with over 60 US cities having adopted open data reforms since 2006, nearly 20 in 2016 alone–many with the help of the Sunlight team as part of our support of the What Works Cities initiative. While cities are building the public policy infrastructure for open data, they are also making technical advancements as municipal IT and innovation departments build or procure new open data portals and release more and more government datasets proactively online….

However, … these developments alone are not enough. Portals and policies are critical infrastructure for the data-driven open government needed in the 21st century; but there has been and continues to be a disconnect between the rhetoric and promise of open data when compared to what it has meant in terms of practical reform. Let us be clear: the promise of open data is not about data on a website. The promise is for a new kind of relationship between government and the governed, one that brings about collaborative opportunities for impact. While many reforms have been successful in building an infrastructure of access, many have fallen short in leveraging that infrastructure for empowering residents and driving community change.

Announcing Tactical Data Engagement

In order to formulate an approach to help cities go further with their open data programs, Sunlight has been conducting an extensive review of the relevant literature on open data impact, and of the literature on approaches to community stakeholder engagement and co-creation (both civic-tech and open-data driven, as well as more traditional)….

The result so far is our “Tactical Data Engagement” Guide (still in beta) designed to address what we see as the most critical challenge currently facing the open data movement: helping city open data programs build on a new infrastructure of access to facilitate the collaborative use of open data to empower residents and create tangible community impact…(More)”

Open Data Privacy Playbook


A data privacy playbook by Ben Green, Gabe Cunningham, Ariel Ekblaw, Paul Kominers, Andrew Linzer, and Susan Crawford: “Cities today collect and store a wide range of data that may contain sensitive or identifiable information about residents. As cities embrace open data initiatives, more of this information is available to the public. While releasing data has many important benefits, sharing data comes with inherent risks to individual privacy: released data can reveal information about individuals that would otherwise not be public knowledge. In recent years, open data such as taxi trips, voter registration files, and police records have revealed information that many believe should not be released.

Effective data governance is a prerequisite for successful open data programs. The goal of this document is to codify responsible privacy-protective approaches and processes that could be adopted by cities and other government organizations that are publicly releasing data. Our report is organized around four recommendations:

  • Conduct risk-benefit analyses to inform the design and implementation of open data programs.
  • Consider privacy at each stage of the data lifecycle: collect, maintain, release, delete.
  • Develop operational structures and processes that codify privacy management widely throughout the City.
  • Emphasize public engagement and public priorities as essential aspects of data management programs.

Each chapter of this report is dedicated to one of these four recommendations, and provides fundamental context along with specific suggestions to carry them out. In particular, we provide case studies of best practices from numerous cities and a set of forms and tactics for cities to implement our recommendations. The Appendix synthesizes key elements of the report into an Open Data Privacy Toolkit that cities can use to manage privacy when releasing data….(More)”
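
To make the “release” stage of that data lifecycle concrete, here is a minimal sketch of one privacy-protective pre-release check the report's recommendations point toward: a k-anonymity test. This is our illustration, with hypothetical column names; the report itself prescribes processes and forms, not code. The idea is that every combination of quasi-identifiers must appear at least k times before a dataset is published.

```python
from collections import Counter

def is_k_anonymous(rows: list[dict], quasi_identifiers: list[str], k: int = 5) -> bool:
    """True if every combination of quasi-identifier values appears at least k times."""
    combos = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in combos.values())

# Hypothetical pre-release check: ZIP code plus birth year can re-identify residents.
records = [
    {"zip": "02138", "birth_year": 1985, "service_requests": 3},
    {"zip": "02138", "birth_year": 1985, "service_requests": 1},
    {"zip": "02139", "birth_year": 1990, "service_requests": 7},
]
if not is_k_anonymous(records, ["zip", "birth_year"], k=2):
    print("Fails k-anonymity: generalize or suppress quasi-identifiers before release.")
```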

Connecting the dots: Building the case for open data to fight corruption


Web Foundation: “This research, published with Transparency International, measures the progress made by 5 key countries in implementing the G20 Anti-Corruption Open Data Principles.

These principles, adopted by G20 countries in 2015, committed countries to increasing and improving the publication of public information, driving forward open data as a tool in anti-corruption efforts.

However, this research – looking at Brazil, France, Germany, Indonesia and South Africa – finds a disappointing lack of progress. No country studied has released all the datasets identified as being key to anti-corruption and much of the information is hard to find and hard use.

Key findings:

  • No country released all anti-corruption datasets
  • Quality issues mean data is often not useful or usable
  • Much of the data is not published in line with open data standards, making comparability difficult (see the sketch after this entry)
  • In many countries there is a lack of open data skills among officials in charge of anti-corruption initiatives

Download the overview report here (PDF), and access the individual country case studies Brazil, France, Germany, Indonesia and South Africa… (More)”
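
As a concrete illustration of the comparability problem flagged above, a publisher (or watchdog) can validate a released CSV against a minimal shared schema before relying on it. This is a sketch under assumptions: the column names and types below are invented for illustration, not drawn from any G20 standard.

```python
import csv

# Invented minimal schema for illustration; real comparability work would
# validate against a shared standard's full specification.
EXPECTED = {"official_name": str, "position": str, "declaration_date": str,
            "asset_value": float}

def schema_errors(path: str) -> list[str]:
    """Report missing columns and unparseable values against EXPECTED."""
    errors = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = set(EXPECTED) - set(reader.fieldnames or [])
        if missing:
            return [f"missing columns: {sorted(missing)}"]
        for line_no, row in enumerate(reader, start=2):  # header is line 1
            for col, cast in EXPECTED.items():
                try:
                    cast(row[col])
                except (TypeError, ValueError):
                    errors.append(f"line {line_no}: {col}={row[col]!r} is not a valid {cast.__name__}")
    return errors

# e.g. print(schema_errors("interest_register.csv"))
```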

Data Disrupts Corruption


Carlos Santiso & Ben Roseth at Stanford Social Innovation Review: “…The Panama Papers scandal demonstrates the power of data analytics to uncover corruption in a world flooded with terabytes needing only the computing capacity to make sense of it all. The Rousseff impeachment illustrates how open data can be used to bring leaders to account. Together, these stories show how data, both “big” and “open,” is driving the fight against corruption with fast-paced, evidence-driven, crowd-sourced efforts. Open data can put vast quantities of information into the hands of countless watchdogs and whistleblowers. Big data can turn that information into insight, making corruption easier to identify, trace, and predict. To realize the movement’s full potential, technologists, activists, officials, and citizens must redouble their efforts to integrate data analytics into policy making and government institutions….

Making big data open cannot, in itself, drive anticorruption efforts. “Without analytics,” a 2014 White House report on big data and individual privacy underscored, “big datasets could be stored, and they could be retrieved, wholly or selectively. But what comes out would be exactly what went in.”

In this context, it is useful to distinguish the four main stages of data analytics to illustrate its potential in the global fight against corruption:

  • Descriptive analytics uses data to describe what has happened when analyzing complex policy issues;
  • Diagnostic analytics goes a step further, mining and triangulating data to explain why a specific policy problem has happened, identify its root causes, and decipher underlying structural trends;
  • Predictive analytics uses data and machine-learning algorithms to predict what is most likely to occur;
  • Prescriptive analytics proposes what should be done to cause or prevent something from happening….
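
As a concrete, if simplified, illustration of how the first of these stages might look in practice, the sketch below (ours; the vendors and amounts are invented) applies a robust outlier test to contract awards, the kind of red flag anti-corruption analysts mine procurement records for.

```python
import statistics

def flag_outlier_awards(awards: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Flag awards whose modified z-score (median/MAD-based) exceeds threshold."""
    values = list(awards.values())
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    return [vendor for vendor, amount in awards.items()
            if mad > 0 and 0.6745 * (amount - median) / mad > threshold]

# Invented amounts: four routine awards and one that merits a closer look.
awards = {"Vendor A": 98_000, "Vendor B": 102_000, "Vendor C": 95_000,
          "Vendor D": 101_000, "Vendor E": 410_000}
print(flag_outlier_awards(awards))  # ['Vendor E']
```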

Despite the big data movement’s promise for fighting corruption, many challenges remain. The smart use of open and big data should focus not only on uncovering corruption, but also on better understanding its underlying causes and preventing its recurrence. Anticorruption analytics cannot exist in a vacuum; it must fit in a strategic institutional framework that starts with quality information and leads to reform. Even the most sophisticated technologies and data innovations cannot prevent what French novelist Théophile Gautier described as the “inexplicable attraction of corruption, even amongst the most honest souls.” Unless it is harnessed for improvements in governance and institutions, data analytics will not have the impact that it could, nor be sustainable in the long run…(More)”.

Big and open data are prompting a reform of scientific governance


Sabina Leonelli in Times Higher Education: “Big data are widely seen as a game-changer in scientific research, promising new and efficient ways to produce knowledge. And yet, large and diverse data collections are nothing new – they have long existed in fields such as meteorology, astronomy and natural history.

What, then, is all the fuss about? In my recent book, I argue that the true revolution is in the status accorded to data as research outputs in their own right. Along with this has come an emphasis on open data as crucial to excellent and reliable science.

Previously – ever since scientific journals emerged in the 17th century – data were private tools, owned by the scientists who produced them and scrutinised by a small circle of experts. Their usefulness lay in their function as evidence for a given hypothesis. This perception has shifted dramatically in the past decade. Increasingly, data are research components that can and should be made publicly available and usable.

Rather than the birth of a data-driven epistemology, we are thus witnessing the rise of a data-centric approach in which efforts to mobilise, integrate and visualise data become contributions to discovery, not just a by-product of hypothesis testing.

The rise of data-centrism highlights the challenges involved in gathering, classifying and interpreting data, and the concepts, technologies and social structures that surround these processes. This has implications for how research is conducted, organised, governed and assessed.

Data-centric science requires shifts in the rewards and incentives provided to those who produce, curate and analyse data. This challenges established hierarchies: laboratory technicians, librarians and database managers turn out to have crucial skills, subverting the common view of their jobs as marginal to knowledge production. Ideas of research excellence are also being challenged. Data management is increasingly recognised as crucial to the sustainability and impact of research, and national funders are moving away from citation counts and impact factors in evaluations.

New uses of data are forcing publishers to re-assess their business models and dissemination procedures, and research institutions are struggling to adapt their management and administration.

Data-centric science is emerging in concert with calls for increased openness in research….(More)”

Data in public health


Jeremy Berg in Science: “In 1854, physician John Snow helped curtail a cholera outbreak in a London neighborhood by mapping cases and identifying a central public water pump as the potential source. This event is considered by many to represent the founding of modern epidemiology. Data and analysis play an increasingly important role in public health today. This can be illustrated by examining the rise in the prevalence of autism spectrum disorders (ASDs), where data from varied sources highlight potential factors while ruling out others (such as childhood vaccines), facilitating wise policy choices…. A collaboration between the research community, a patient advocacy group, and a technology company (www.mss.ng) seeks to sequence the genomes of 10,000 well-phenotyped individuals from families affected by ASD, making the data freely available to researchers. Studies to date have confirmed that the genetics of autism are extremely complicated—a small number of genomic variations are closely associated with ASD, but many other variations have much lower predictive power. More than half of siblings, each of whom has ASD, have different ASD-associated variations. Future studies, facilitated by an open data approach, will no doubt help advance our understanding of this complex disorder….

A new data collection strategy was reported in 2013 to examine contagious diseases across the United States, including the impact of vaccines. Researchers digitized all available city and state notifiable disease data from 1888 to 2011, mostly from hard-copy sources. Information corresponding to nearly 88 million cases has been stored in a database that is open to interested parties without restriction (www.tycho.pitt.edu). Analyses of these data revealed that vaccine development and systematic vaccination programs have led to dramatic reductions in the number of cases. Overall, it is estimated that ∼100 million cases of serious childhood diseases have been prevented through these vaccination programs.
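
To make that concrete, here is a minimal sketch of the kind of before-and-after comparison such a database enables. This is our illustration: the CSV layout ("year" and "cases" columns) is hypothetical and far simpler than the actual Project Tycho schema, which is weekly and broken down by location and disease.

```python
import csv
from statistics import mean

def incidence_reduction(path: str, vaccine_year: int) -> float:
    """Fraction by which mean annual cases fell after vaccine introduction.

    Assumes a hypothetical CSV with 'year' and 'cases' columns.
    """
    with open(path, newline="") as f:
        rows = [(int(r["year"]), int(r["cases"])) for r in csv.DictReader(f)]
    before = mean(c for y, c in rows if y < vaccine_year)
    after = mean(c for y, c in rows if y >= vaccine_year)
    return 1 - after / before

# e.g., for measles counts with the vaccine licensed in 1963:
# print(f"{incidence_reduction('measles_us.csv', 1963):.0%} reduction")
```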

These examples illustrate how data collection and sharing through publication and other innovative means can drive research progress on major public health challenges. Such evidence, particularly on large populations, can help researchers and policy-makers move beyond anecdotes—which can be personally compelling, but often misleading—for the good of individuals and society….(More)”

DataRefuge


DataRefuge is a public, collaborative project designed to address the following concerns about federal climate and environmental data:

  • What are the best ways to safeguard data?
  • How do federal agencies play crucial roles in data collection, management, and distribution?
  • How do government priorities impact data’s accessibility?
  • Which projects and research fields depend on federal data?
  • Which data sets are of value to research and local communities, and why?

DataRefuge is also an initiative committed to identifying, assessing, prioritizing, securing, and distributing reliable copies of federal climate and environmental data so that it remains available to researchers. Data collected as part of the #DataRefuge initiative will be stored in multiple, trusted locations to help ensure continued accessibility.

DataRefuge acknowledges, and in fact draws attention to, the fact that there are no guarantees of perfectly safe information. But there are ways that we can create safe and trustworthy copies. DataRefuge is thus also a project to develop the best methods, practices, and protocols to do so.
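
One simple building block for such methods is fixity checking: record a cryptographic hash of every file when it is rescued, then verify each mirrored copy against that manifest. A minimal sketch (our illustration, not DataRefuge's actual tooling):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, streaming in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_copy(manifest: dict[str, str], mirror_dir: Path) -> list[str]:
    """Return names of mirrored files whose hashes do not match the manifest."""
    return [name for name, expected in manifest.items()
            if sha256_of(mirror_dir / name) != expected]

# A manifest records hashes at rescue time; any later mismatch means the
# copy is corrupt or tampered with and should not be trusted.
```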

DataRefuge depends on local communities. We welcome new collaborators who want to organize DataRescue Events or build DataRefuge in other ways.

There are many ways to be involved with building DataRefuge. They’re not mutually exclusive!…(More)”

Mapping open data governance models: Who makes decisions about government data and how?


Ana Brandusescu, Danny Lämmerhirt and Stefaan Verhulst call for a systematic and comparative investigation of the different governance models for open data policy and publication….

“An important value proposition behind open data involves increased transparency and accountability of governance. Yet little is known about how open data itself is governed. Who decides and how? How accountable are data holders to both the demand side and policy makers? How do data producers and actors assure the quality of government data? Who, if anyone, are the data stewards within government tasked with making its data open?

Getting a better understanding of open data governance is not only important from an accountability point of view. Better insight into the diversity of decision-making models and structures across countries would also help accelerate the implementation of common open data principles, such as those advocated by the International Open Data Charter.

In what follows, we seek to develop the initial contours of a research agenda on open data governance models. We start from the premise that different countries have different models to govern and administer their activities – in short, different ‘governance models’. Some countries are more devolved in their decision making, while others seek to organize “public administration” activities more centrally. These governance models clearly shape how open data is governed – producing a broad patchwork of open data governance arrangements across the world and making it difficult to identify who the open data decision makers and data gatekeepers or stewards are within a given country.

For example, if one wants to accelerate the opening up of education data across borders, in some countries this may fall under the authority of sub-national government (such as states, provinces, territories or even cities), while in other countries education is governed by central government or implemented through public-private partnership arrangements. Similarly, transportation or water data may be privatised, while in other cases it may be the responsibility of municipal or regional government. Responsibilities are therefore often distributed across administrative levels and agencies affecting how (open) government data is produced, and published….(More)”

The chaos of South Africa’s taxi system is being tackled with open data


Lynsey Chutel at Quartz: “In South Africa’s cities, the daily commute can be chaotic and unpredictable. A new open source data platform hopes to bring some order to that—or at least help others get it right.

Contributing to that chaos is a formal public transportation system that is inadequate for a growing urban population and an informal transportation network that whizzes through the streets unregulated. Where Is My Transport has done something unique by finally bringing these two systems together on one map.

Where Is My Transport has mapped Cape Town’s transport systems to create an integrated system, incorporating train, bus and minibus taxi routes. This last one is especially difficult, because the thousands of minibuses that ferry most South Africans are notoriously difficult to pin down.

Minibus taxis seat about 15 people and turn any corner into a bus stop, often halting traffic. They travel within neighborhoods and across the country and are the most affordable means of transport for the majority of South Africans. But they are also often unsafe vehicles, at times involved in horrific road accidents.

Devin De Vries, one of the platform’s co-founders, says he was inspired by the Digital Matatus project in Nairobi. The South African platform differs, however, in that it provides open source information for others who think they may have a solution to South Africa’s troubled public transportation system.

“Transport is a complex ecosystem, and we don’t think any one company will solve it,” De Vries told Quartz. “That’s why we made our platform open and hope that many endpoints—apps, websites, et cetera—will draw on the data so people can access it.”

This could lead to trip planning apps like Moovit or Transit for African commuters, or help cities better map their public transportation system, De Vries hopes…(More)”
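
To illustrate how an app might draw on such an open transport platform, here is a sketch under assumptions: the endpoint, token, and response fields below are hypothetical, invented for illustration rather than taken from WhereIsMyTransport's actual API.

```python
import requests  # third-party: pip install requests

BASE_URL = "https://api.example-transport.org/v1"  # hypothetical endpoint
TOKEN = "YOUR_API_TOKEN"                            # hypothetical credential

def routes_near(lat: float, lon: float, radius_m: int = 500) -> list[dict]:
    """Fetch routes (train, bus, minibus taxi) passing within radius_m of a point."""
    resp = requests.get(
        f"{BASE_URL}/routes",
        params={"point": f"{lat},{lon}", "radius": radius_m},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

# e.g., near central Cape Town (the 'mode' and 'name' fields are assumed):
# for route in routes_near(-33.925, 18.424):
#     print(route["mode"], route["name"])
```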