Better Governing Through Data


Editorial Board of the New York Times: “Government bureaucracies, as opposed to casual friendships, are seldom in danger from too much information. That is why a new initiative by the New York City comptroller, Scott Stringer, to use copious amounts of data to save money and solve problems, makes such intuitive sense.

Called ClaimStat, it seeks to collect and analyze information on the thousands of lawsuits and claims filed each year against the city. By identifying patterns in payouts and trouble-prone agencies and neighborhoods, the program is supposed to reduce the cost of claims the way CompStat, the fabled data-tracking program pioneered by the New York Police Department, reduces crime.

There is a great deal of money to be saved: In its 2015 budget, the city has set aside $674 million to cover settlements and judgments from lawsuits brought against it. That amount is projected to grow by the 2018 fiscal year to $782 million, which Mr. Stringer notes is more than the combined budgets of the Departments of Aging and Parks and Recreation and the Public Library.

The comptroller’s office issued a report last month that applied the ClaimStat approach to a handful of city agencies: the Police Department, Parks and Recreation, Health and Hospitals Corporation, Environmental Protection and Sanitation. It notes that the Police Department generates the most litigation of any city agency: 9,500 claims were filed against it in 2013, leading to settlements and judgments of $137.2 million.

After adjusting for the crime rate, the report found that several precincts in the South Bronx and Central Brooklyn had far more claims filed against their officers than other precincts in the city. What does that mean? It’s hard to know, but the implications for policy and police discipline would seem to be a challenge that the mayor, police commissioner and precinct commanders need to figure out. The data clearly point to a problem.

Far more obvious conclusions may be reached from ClaimStat data covering issues like park maintenance and sewer overflows. The city’s tree-pruning budget was cut sharply in 2010, and injury claims from fallen tree branches soared. Multimillion-dollar settlements ensued.

The great promise of ClaimStat is making such shortsightedness blindingly obvious. And in exposing problems like persistent flooding from sewer overflows, ClaimStat can pinpoint troubled areas down to the level of city blocks. (We’re looking at you, Canarsie, and Community District 2 on Staten Island.)

Mayor Bill de Blasio’s administration has offered only mild praise for the comptroller’s excellent idea (“the mayor welcomes all ideas to make the city more effective and better able to serve its citizens”) while noting, perhaps a little defensively, that it is already on top of this, at least where the police are concerned. It has created a “Risk Assessment and Compliance Unit” within the Police Department to examine claims and make recommendations. The mayor’s aides also point out that the city’s payouts have remained flat over the last 12 years, for which they credit a smart risk-assessment strategy that knows when to settle claims and when to fight back aggressively in court.

But the aspiration of a well-run city should not be to hold claims even but to shrink them. And, at a time when anecdotes and rampant theorizing are fueling furious debates over police crime-fighting strategies, it seems beyond arguing that the more actual information, independently examined and publicly available, the better.”
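The per-precinct comparison the editorial describes rests on a simple normalization: raw claim counts mean little until they are adjusted for how much police activity each precinct sees. A minimal sketch of that step, with all precinct names and figures invented for illustration:

```python
# Toy sketch: normalize claim counts by reported crime to compare precincts.
# All names and figures below are invented for illustration.

precincts = {
    # name: (claims_filed, reported_crimes)
    "Precinct A": (410, 3_100),
    "Precinct B": (95, 3_800),
    "Precinct C": (280, 3_500),
}

def claims_per_1000_crimes(claims, crimes):
    """Claims filed per 1,000 reported crimes."""
    return 1000 * claims / crimes

rates = {name: claims_per_1000_crimes(c, r) for name, (c, r) in precincts.items()}
citywide = claims_per_1000_crimes(
    sum(c for c, _ in precincts.values()),
    sum(r for _, r in precincts.values()),
)

# Flag precincts whose adjusted rate is well above the citywide figure.
outliers = sorted(name for name, rate in rates.items() if rate > 1.5 * citywide)
print(outliers)
```

The 1.5× cutoff is an arbitrary choice for the sketch; a real analysis would pick a threshold statistically, as the comptroller's report presumably did.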

Reddit, Imgur and Twitch team up as 'Derp' for social data research


In The Guardian: “Academic researchers will be granted unprecedented access to the data of major social networks including Imgur, Reddit, and Twitch as part of a joint initiative: The Digital Ecologies Research Partnership (Derp).
Derp – and yes, that really is its name – will be offering data to universities including Harvard, MIT and McGill, to promote “open, publicly accessible, and ethical academic inquiry into the vibrant social dynamics of the web”.
It came about “as a result of Imgur talking with a number of other community platforms online trying to learn about how they work with academic researchers,” says Tim Hwang, the image-sharing site’s head of special initiatives.
“In most cases, the data provided through Derp will already be accessible through public APIs,” he says. “Our belief is that there are ways of doing research better, and in a way that strongly respects user privacy and responsible use of data.
“Derp is an alliance of platforms that all believe strongly in this. In working with academic researchers, we support projects that meet institutional review at their home institution, and all research supported by Derp will be released openly and made publicly available.”
Hwang points to a Stanford paper analysing the success of Reddit’s Random Acts of Pizza subforum as an example of the sort of research Derp hopes to foster. In the research, Tim Althoff, Niloufar Salehi and Tuan Nguyen found that the likelihood of getting a free pizza from the Reddit community depended on a number of factors, including how the request was phrased, how much the user posted on the site, and how many friends they had online. In the end, they were able to predict with 67% accuracy whether or not a given request would be fulfilled.
The grouping aims to solve two problems academic research faces. Researchers themselves find it hard to get data outside of the largest social media platforms, such as Twitter and Facebook. The major services at least have a vibrant community of developers and researchers working on ways to access and use data, but for smaller communities, there’s little help provided.
Yet smaller is relative: Reddit may be a shrimp compared to Facebook, but with 115 million unique visitors every month, it’s still a sizeable community. And so Derp aims to offer “a single point of contact for researchers to get in touch with relevant team members across a range of different community sites….”
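The Stanford pizza study described above is essentially a prediction task: score a request from a few features and classify it as fulfilled or not. A toy sketch of that kind of model follows; the features loosely mirror the ones the paper reports as predictive (the request's narrative, posting history, social ties), but the weights are invented for illustration, not the paper's fitted coefficients.

```python
import math

# Toy logistic-style scorer for a Random Acts of Pizza request.
# Feature names and weights are hypothetical, chosen for illustration only.

def fulfillment_probability(request):
    weights = {
        "mentions_need": 0.8,    # request narrative cites hardship (0 or 1)
        "prior_posts": 0.02,     # requester's activity on the site
        "friends_online": 0.05,  # requester's social connectedness
    }
    bias = -1.5
    score = bias + sum(weights[k] * request[k] for k in weights)
    return 1 / (1 + math.exp(-score))  # logistic link maps score to [0, 1]

request = {"mentions_need": 1, "prior_posts": 40, "friends_online": 10}
p = fulfillment_probability(request)
print(f"predicted fulfillment probability: {p:.2f}")
```

A fitted model of this shape, thresholded at 0.5, is the sort of classifier that could reach the 67% accuracy the study reports; the numbers here are purely illustrative.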

State Open Data Policies and Portals


New report by Laura Drees and Daniel Castro at the Center for Data Innovation: “This report provides a snapshot of states’ efforts to create open data policies and portals and ranks states on their progress. The six top-scoring states are Hawaii, Illinois, Maryland, New York, Oklahoma, and Utah. Each of these states has established an open data policy that requires basic government data, such as expenditure information, as well as other agency data, to be published on their open data portals in a machine-readable format. These portals contain extensive catalogs of open data, are relatively simple to navigate, and provide data in machine-readable formats as required. The next highest-ranked state, Connecticut, offers a similarly serviceable, machine-readable open data portal that provides wide varieties of information, but its policy does not require machine readability. Of the next three top-ranking states, Texas’s and Rhode Island’s policies require neither machine readability nor government data beyond expenditures; New Hampshire’s policy requires machine readability and many types of data, but its open data portal is not yet fully functional. States creating new open data policies or portals, or refreshing old ones, have many opportunities to learn from the experiences of early adopters in order to fully realize the benefits of data-driven innovation.”

Reprogramming Government: A Conversation With Mikey Dickerson


Q and A in The New York Times: “President Obama owes Mikey Dickerson two debts of gratitude. Mr. Dickerson was a crucial member of the team that, in just six weeks, fixed the HealthCare.gov website when the two-year, $400 million health insurance project failed almost as soon as it opened to the public in October.

Mr. Dickerson, 35, also oversaw the computers and wrote software for Mr. Obama’s 2012 re-election campaign, including crucial last-minute programs to figure out ad placement and plan “get out the vote” campaigns in critical areas. It was a good fit for him; since 2006, Mr. Dickerson had worked for Google on its computer systems, which have grown rapidly and are now among the world’s largest.

But last week Mr. Obama lured Mr. Dickerson away from Google. His new job at the White House will be to identify and fix other troubled government computer systems and websites. The engineer says he wants to change how citizens interact with the government as well as prevent catastrophes. He talked on Friday about his new role, in a conversation that has been condensed and edited….”

Open Data: Going Beyond Solving Problems to Making the Impossible Possible


In the Huffington Post: “As a global community, we are producing data at an astounding rate. The pace was recently described as a “new Google every four days” by the highly respected Andreessen Horowitz partner, Peter Levine, in a thought-provoking post addressing the challenge of making sense of this mountain of data.

“… we are now collecting more data each day, so much that 90 percent of the data in the world today has been created in the last two years alone. In fact, every day, we create 2.5 quintillion bytes of data — by some estimates that’s one new Google every four days, and the rate is only increasing. Our desire to use, interact, and learn from this data will become increasingly important and strategic to businesses and society as a whole.”

For the past year and a half, my cofounders and team have focused on what it will take to use, interact with, and learn from data being produced within the civic sector. It’s one thing to be able to build a civic app; it’s quite another to build a platform that can manage multiple apps across multiple platforms while addressing the challenges plaguing the “wild west” nature of growth in the quickly emerging market of open data. I was recently invited to share our lessons learned and the promise of the future in mobile open data at the TEDxABQ Technology Salon. Here is a bit of what I shared:
Beyond Solving Problems to New Possibilities
When we first looked at the opportunities for creating apps built on open data, our priority was finding pain points for both cities and the people who lived there. We focused on solving real problems, and it led to some early success. We worked with the City of Albuquerque to deploy their ABQ RIDE app on iOS and Android platforms, and the app not only solved real problems for riders, it also saved real money for the city. The app has grown to over 20,000 regular users and continues to be one of the highest downloaded apps on our platform.
But recently, we’ve started asking questions that go beyond the basic, that change the experience or make things possible in ways that never were before. Here are two I’m incredibly proud to be a part of…”

An Air-Quality Monitor You Take with You


MIT Technology Review: “A startup is building a wearable air-quality monitor using a sensing technology that can cheaply detect the presence of chemicals around you in real time. By reporting the information its sensors gather to an app on your smartphone, the technology could help people with respiratory conditions and those who live in highly polluted areas keep tabs on exposure.
Berkeley, California-based Chemisense also plans to crowdsource data from users to show places around town where certain compounds are identified.
Initially, the company plans to sell a $150 wristband geared toward kids with asthma—of which there are nearly 7 million in the U.S., according to data from the Centers for Disease Control and Prevention—to help them identify places and pollutants that tend to provoke attacks, and track their exposure to air pollution over time. The company hopes people with other respiratory conditions, and those who are just concerned about air pollution, will be interested, too.
In the U.S., air quality is monitored at thousands of stations across the country; maps and forecasts can be viewed online. But these monitors offer accurate readings only in their immediate surroundings.
Chemisense has not yet made its initial product, but it expects it will be a wristband using polymers treated with charged nanoparticles of carbon such that the polymers swell in the presence of certain chemical vapors, changing the resistance of a circuit.”
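The chemiresistive readout the article describes—vapor makes the treated polymer swell, which changes the resistance of a circuit—reduces, on the software side, to tracking the fractional shift from a clean-air baseline. A minimal sketch, with the baseline and alert threshold invented for illustration:

```python
# Toy sketch of a chemiresistive sensor readout: vapor-induced polymer
# swelling raises circuit resistance, so exposure shows up as a shift from
# the clean-air baseline. All constants are hypothetical.

BASELINE_OHMS = 10_000.0   # assumed clean-air resistance of the element
ALERT_THRESHOLD = 0.05     # fractional shift treated as an exposure event

def fractional_change(reading_ohms, baseline_ohms=BASELINE_OHMS):
    """(R - R0) / R0: relative resistance shift from the clean-air baseline."""
    return (reading_ohms - baseline_ohms) / baseline_ohms

def exposure_events(readings):
    """Return indices of samples whose shift exceeds the alert threshold."""
    return [i for i, r in enumerate(readings)
            if fractional_change(r) > ALERT_THRESHOLD]

samples = [10_020, 10_150, 10_700, 11_300, 10_400]  # ohms, sampled over time
print(exposure_events(samples))
```

A real device would also need per-chemical calibration curves to turn a resistance shift into a concentration estimate, which is presumably where the crowdsourced data comes in.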

Cell Phone Guide For US Protesters, Updated 2014 Edition


EFF: “With major protests in the news again, we decided it’s time to update our cell phone guide for protesters. A lot has changed since we last published this report in 2011, for better and for worse. On the one hand, we’ve learned more about the massive volume of law enforcement requests for cell phone data—ranging from location information to actual content—and widespread use of dedicated cell phone surveillance technologies. On the other hand, strong Supreme Court opinions have eliminated any ambiguity about the unconstitutionality of warrantless searches of phones incident to arrest, and a growing national consensus says location data, too, is private.”

Smart cities: moving beyond urban cybernetics to tackle wicked problems


Paper by Robert Goodspeed in the Cambridge Journal of Regions, Economy and Society: “This article makes three related arguments. First, that although many definitions of the smart city have been proposed, corporate promoters say a smart city uses information technology to pursue efficient systems through real-time monitoring and control. Second, this definition is not new, but equivalent to the idea of urban cybernetics debated in the 1970s. Third, drawing on a discussion of Rio de Janeiro’s Operations Center, I argue that viewing urban problems as wicked problems allows for more fundamental solutions than urban cybernetics, but requires local innovation and stakeholder participation. Therefore the last section describes institutions for municipal innovation and IT-enabled collaborative planning.”

Government opens up: 10k active government users on GitHub


GitHub: “In the summer of 2009, The New York Senate was the first government organization to post code to GitHub, and that fall, Washington DC quickly followed suit. By 2011, cities like Miami, Chicago, and New York; Australian, Canadian, and British government initiatives like Gov.uk; and US Federal agencies like the Federal Communications Commission, General Services Administration, NASA, and Consumer Financial Protection Bureau were all coding in the open as they began to reimagine government for the 21st century.
Fast forward to just last year: The White House Open Data Policy is published as a collaborative, living document, San Francisco laws are now forkable, and government agencies are accepting pull requests from everyday developers.
This is all part of a larger trend towards government adopting open source practices and workflows — a trend that spans not only software, but data, and policy as well — and the movement shows no signs of slowing, with government usage on GitHub nearly tripling in the past year, to exceed 10,000 active government users today.

How government uses GitHub

When government works in the open, it acknowledges the idea that government is the world’s largest and longest-running open source project. Open data efforts, like the City of Philadelphia’s open flu shot spec, release machine-readable data in open, immediately consumable formats, inviting feedback (and corrections) from the general public, and fundamentally exposing who made what change when, a necessary check on democracy.
Unlike the private sector, however, where open sourcing the “secret sauce” may hurt the bottom line, with government, we’re all on the same team. With the exception of, say, football, Illinois and Wisconsin don’t compete with one another, nor are the types of challenges they face unique. Shared code prevents reinventing the wheel and helps taxpayer dollars go further, with efforts like the White House’s recently released Digital Services Playbook, an effort which invites everyday citizens to play a role in making government better, one commit at a time.
However, not all government code is open source. We see that adopting these open source workflows for open collaboration within an agency (or with outside contractors) similarly breaks down bureaucratic walls, and gives like-minded teams the opportunity to work together on common challenges.

Government Today

It’s hard to believe that what started with a single repository just five years ago has blossomed into a movement where today, more than 10,000 government employees use GitHub to collaborate on code, data, and policy each day….
You can learn more about GitHub in government at government.github.com, and if you’re a government employee, be sure to join our semi-private peer group to learn best practices for collaborating on software, data, and policy in the open.”

Opening Health Data: What Do Researchers Want? Early Experiences With New York's Open Health Data Platform.


Paper by Martin, Erika G. PhD, MPH; Helbig, Natalie PhD, MPA; and Birkhead, Guthrie S. MD, MPH in the Journal of Public Health Management & Practice: “Governments are rapidly developing open data platforms to improve transparency and make information more accessible. New York is a leader, with currently the only state platform devoted to health. Although these platforms could build public health departments’ capabilities to serve more researchers, agencies have little guidance on releasing meaningful and usable data.

Objective: Structured focus groups with researchers and practitioners collected stakeholder feedback on potential uses of open health data and New York’s open data strategy….

Results: There was low awareness of open data, with 67% of researchers reporting never using open data portals prior to the workshop. Participants were interested in data sets that were geocoded, longitudinal, or aggregated to small area granularity and capabilities to link multiple data sets. Multiple environmental conditions and barriers hinder their capacity to use health data for research. Although open data platforms cannot address all barriers, they provide multiple opportunities for public health research and practice, and participants were overall positive about the state’s efforts to release open data.

Conclusions: Open data are not ideal for some researchers because they do not contain individually identifiable data, indicating a need for tiered data release strategies. However, they do provide important new opportunities to facilitate research and foster collaborations among agencies, researchers, and practitioners.”
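The data-set linkage the workshop participants asked for comes down to joining open, machine-readable tables on a shared geographic key. A minimal sketch, joining two hypothetical county-level tables on a FIPS code; all table contents are invented for illustration:

```python
# Toy sketch of linking two open health data sets on a shared key.
# Both tables and their values are hypothetical.

flu_cases = {          # county FIPS code -> reported flu cases
    "36001": 420,
    "36061": 1850,
    "36103": 760,
}
population = {         # county FIPS code -> population in thousands
    "36001": 315,
    "36061": 1630,
    "36103": 1490,
}

def link_and_rate(cases, pop):
    """Join the tables on FIPS and compute cases per 1,000 residents."""
    shared = sorted(cases.keys() & pop.keys())  # inner join on the key
    return {fips: round(cases[fips] / pop[fips], 2) for fips in shared}

rates = link_and_rate(flu_cases, population)
print(rates)
```

This only works because both tables carry the same standard geographic identifier; releasing geocoded, small-area data with consistent keys is precisely what makes the linkage participants requested possible.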