Open Data Privacy Playbook


A data privacy playbook by Ben Green, Gabe Cunningham, Ariel Ekblaw, Paul Kominers, Andrew Linzer, and Susan Crawford: “Cities today collect and store a wide range of data that may contain sensitive or identifiable information about residents. As cities embrace open data initiatives, more of this information is available to the public. While releasing data has many important benefits, sharing data comes with inherent risks to individual privacy: released data can reveal information about individuals that would otherwise not be public knowledge. In recent years, open data such as taxi trips, voter registration files, and police records have revealed information that many believe should not be released.

Effective data governance is a prerequisite for successful open data programs. The goal of this document is to codify responsible privacy-protective approaches and processes that could be adopted by cities and other government organizations that are publicly releasing data. Our report is organized around four recommendations:

  • Conduct risk-benefit analyses to inform the design and implementation of open data programs.
  • Consider privacy at each stage of the data lifecycle: collect, maintain, release, delete.
  • Develop operational structures and processes that codify privacy management widely throughout the City.
  • Emphasize public engagement and public priorities as essential aspects of data management programs.

Each chapter of this report is dedicated to one of these four recommendations, and provides fundamental context along with specific suggestions to carry them out. In particular, we provide case studies of best practices from numerous cities and a set of forms and tactics for cities to implement our recommendations. The Appendix synthesizes key elements of the report into an Open Data Privacy Toolkit that cities can use to manage privacy when releasing data….(More)”

Data in public health


Jeremy Berg in Science: “In 1854, physician John Snow helped curtail a cholera outbreak in a London neighborhood by mapping cases and identifying a central public water pump as the potential source. This event is considered by many to represent the founding of modern epidemiology. Data and analysis play an increasingly important role in public health today. This can be illustrated by examining the rise in the prevalence of autism spectrum disorders (ASDs), where data from varied sources highlight potential factors while ruling out others, such as childhood vaccines, facilitating wise policy choices…. A collaboration between the research community, a patient advocacy group, and a technology company (www.mss.ng) seeks to sequence the genomes of 10,000 well-phenotyped individuals from families affected by ASD, making the data freely available to researchers. Studies to date have confirmed that the genetics of autism are extremely complicated—a small number of genomic variations are closely associated with ASD, but many other variations have much lower predictive power. More than half of siblings, each of whom has ASD, have different ASD-associated variations. Future studies, facilitated by an open data approach, will no doubt help advance our understanding of this complex disorder….

A new data collection strategy was reported in 2013 to examine contagious diseases across the United States, including the impact of vaccines. Researchers digitized all available city and state notifiable disease data from 1888 to 2011, mostly from hard-copy sources. Information corresponding to nearly 88 million cases has been stored in a database that is open to interested parties without restriction (www.tycho.pitt.edu). Analyses of these data revealed that vaccine development and systematic vaccination programs have led to dramatic reductions in the number of cases. Overall, it is estimated that ∼100 million cases of serious childhood diseases have been prevented through these vaccination programs.
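The estimate of ∼100 million cases prevented is a counterfactual calculation: project the pre-vaccine incidence rate forward over the post-vaccine population and subtract the cases actually reported. The sketch below illustrates only that arithmetic; the rate, populations, and case counts are invented for illustration and are not values from the Tycho database.

```python
# Illustrative counterfactual: cases expected at the pre-vaccine incidence rate
# minus cases actually observed after vaccination began. All numbers are made up.

def cases_prevented(pre_vaccine_rate_per_100k: float,
                    post_vaccine_years: list[tuple[int, int]]) -> float:
    """post_vaccine_years holds (population, observed_cases) for each year."""
    prevented = 0.0
    for population, observed in post_vaccine_years:
        expected = pre_vaccine_rate_per_100k * population / 100_000
        prevented += max(expected - observed, 0.0)
    return prevented

# A hypothetical disease with a pre-vaccine rate of 400 cases per 100,000,
# tracked over three post-vaccine years.
years = [(200_000_000, 150_000), (205_000_000, 40_000), (210_000_000, 5_000)]
print(f"Estimated cases prevented: {cases_prevented(400, years):,.0f}")
```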

These examples illustrate how data collection and sharing through publication and other innovative means can drive research progress on major public health challenges. Such evidence, particularly on large populations, can help researchers and policy-makers move beyond anecdotes—which can be personally compelling, but often misleading—for the good of individuals and society….(More)”

Using Algorithms To Predict Gentrification


Tanvi Misra in CityLab: “I know it when I see it,” is as true for gentrification as it is for pornography. Usually, it’s when a neighborhood’s property values and demographics are already changing that the worries about displacement set in—rousing housing advocates and community organizers to action. But by that time, it’s often hard to pause and put in safeguards for the neighborhood’s most vulnerable residents.

But what if there were an early warning system that detected where price appreciation or decline was about to occur? Predictive tools like this have been developed around the country, most notably by researchers in San Francisco. Their value is clear: they let city leaders and non-profits pinpoint, ahead of time, where to preserve existing affordable housing, where to build more, and where to attract business investment. But they’re often too academic or too obscure, which is why it’s not yet clear how they’re being used by policymakers and planners.

That’s the problem Ken Steif, at the University of Pennsylvania, is working to solve, in partnership with Alan Mallach, from the Center for Community Progress.

Mallach’s non-profit focuses on revitalizing distressed neighborhoods, particularly in “legacy cities”: towns like St. Louis, Flint, Dayton, and Baltimore that have experienced population loss and economic contraction in recent years and suffer from property vacancies, blight, and unemployment. Mallach is interested in understanding which neighborhoods are likely to continue down that path, and which ones will do a 180-degree turn. Right now, he can make those predictions intuitively, based on his observations of neighborhood characteristics like housing stock, median income, and race. But an objective assessment can help confirm or refute his hypotheses.

That’s where Steif comes in. Having consulted with cities and non-profits on place-based data analytics, Steif has developed a number of algorithms that predict the movement of housing markets using expensive private data from entities like Zillow. Mallach suggested he try his algorithms on Census data, which is free and standardized.

The phenomenon he tested was ‘endogenous gentrification’: the idea that an increase in home prices spreads from wealthy neighborhoods to less expensive ones nearby, like a wave. …Steif used Census data from 1990 and 2000 to predict housing price change in 2010 in 29 big and small legacy cities. His algorithms took into account the relationship between the median home price of a census tract and those of the tracts around it, the proximity of census tracts to high-cost areas, and the spatial patterns in home price distribution. They also folded in variables like race, income and housing supply, among others.

After cross-checking the 2010 prediction with actual home prices, he projected the neighborhood change all the way to 2020. His algorithms were able to compute the speed and breadth of the wave of gentrification over time reasonably well, overall…(More)”.
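The article does not publish Steif’s code, but the ingredients it names (a tract’s own median price, a spatial-lag term built from neighboring tracts, proximity to high-cost areas, and demographic covariates, trained on one decade and checked against the next) can be sketched roughly as follows. The column names, the neighbor lookup, and the choice of model are assumptions made for illustration, not his actual implementation.

```python
# Rough sketch of a spatial-lag prediction of tract-level home prices:
# fit on the 1990 -> 2000 transition, cross-check against observed 2010 prices,
# then project to 2020. Column names, neighbor lists, and the model are assumed.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def add_spatial_features(tracts: pd.DataFrame,
                         neighbors: dict[str, list[str]]) -> pd.DataFrame:
    """Add each tract's exposure to the 'wave': mean price of adjacent tracts
    and a flag for adjacency to a high-cost (top-decile) tract."""
    price = tracts.set_index("tract_id")["median_price"]
    out = tracts.copy()
    out["neighbor_price"] = [
        price.loc[neighbors[t]].mean() if neighbors.get(t) else price.mean()
        for t in out["tract_id"]
    ]
    high_cost = set(out.loc[out["median_price"] >= price.quantile(0.9), "tract_id"])
    out["near_high_cost"] = [
        int(any(n in high_cost for n in neighbors.get(t, []))) for t in out["tract_id"]
    ]
    return out

FEATURES = ["median_price", "neighbor_price", "near_high_cost",
            "median_income", "pct_nonwhite", "housing_units"]

def fit_and_project(census_1990, census_2000, census_2010, neighbors):
    """Train: features at 1990 -> median price at 2000. Validate on 2010, project 2020."""
    train = add_spatial_features(census_1990, neighbors)
    target = census_2000.set_index("tract_id").loc[train["tract_id"], "median_price"]
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(train[FEATURES], target.values)

    predicted_2010 = model.predict(add_spatial_features(census_2000, neighbors)[FEATURES])
    projected_2020 = model.predict(add_spatial_features(census_2010, neighbors)[FEATURES])
    return predicted_2010, projected_2020
```

The spatial lag is the design point that lets a model express a “wave”: each tract’s prediction depends partly on its neighbors’ prices, so appreciation in a high-cost tract raises the predicted trajectory of the tracts around it.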

Why Big Data Is a Big Deal for Cities


John M. Kamensky in Governing: “We hear a lot about “big data” and its potential value to government. But is it really fulfilling the high expectations that advocates have assigned to it? Is it really producing better public-sector decisions? It may be years before we have definitive answers to those questions, but new research suggests that it’s worth paying a lot of attention to.

University of Kansas Prof. Alfred Ho recently surveyed 65 mid-size and large cities to learn what is going on, on the front line, with the use of big data in making decisions. He found that big data has made it possible to “change the time span of a decision-making cycle by allowing real-time analysis of data to instantly inform decision-making.” This decision-making occurs in areas as diverse as program management, strategic planning, budgeting, performance reporting and citizen engagement.

Cities are natural repositories of big data that can be integrated and analyzed for policy- and program-management purposes. These repositories include data from public safety, education, health and social services, environment and energy, culture and recreation, and community and business development. They include both structured data, such as financial and tax transactions, and unstructured data, such as recorded sounds from gunshots and videos of pedestrian movement patterns. And they include data supplied by the public, such as the Boston residents who use a phone app to measure road quality and report problems.

These data repositories, Ho writes, are “fundamental building blocks,” but the challenge is to shift the ownership of data from separate departments to an integrated platform where the data can be shared.

There’s plenty of evidence that cities are moving in that direction and that they already are systematically using big data to make operational decisions. Among the 65 cities that Ho examined, he found that 49 have “some form of data analytics initiatives or projects” and that 30 have established “a multi-departmental team structure to do strategic planning for these data initiatives.”… The effective use of big data can lead to dialogs that cut across school-district, city, county, business and nonprofit-sector boundaries. But more importantly, it provides city leaders with the capacity to respond to citizens’ concerns more quickly and effectively….(More)”

Mapping open data governance models: Who makes decisions about government data and how?


Ana Brandusescu, Danny Lämmerhirt and Stefaan Verhulst call for a systematic and comparative investigation of the different governance models for open data policy and publication….

“An important value proposition behind open data involves increased transparency and accountability of governance. Yet little is known about how open data itself is governed. Who decides and how? How accountable are data holders to both the demand side and policy makers? How do data producers and actors assure the quality of government data? Who, if any, are data stewards within government tasked to make its data open?

Getting a better understanding of open data governance is not only important from an accountability point of view. With better insight into the diversity of decision-making models and structures across countries, the implementation of common open data principles, such as those advocated by the International Open Data Charter, can be accelerated.

In what follows, we seek to develop the initial contours of a research agenda on open data governance models. We start from the premise that different countries have different models to govern and administer their activities – in short, different ‘governance models’. Some countries are more devolved in their decision making, while others organize “public administration” activities more centrally. These governance models clearly shape how open data is governed, producing a broad patchwork of open data governance arrangements across the world and making it difficult to identify who the open data decision makers, gatekeepers or stewards are within a given country.

For example, if one wants to accelerate the opening up of education data across borders, in some countries this may fall under the authority of sub-national government (such as states, provinces, territories or even cities), while in other countries education is governed by central government or implemented through public-private partnership arrangements. Similarly, transportation or water data may be privatised in some countries, while in other cases it may be the responsibility of municipal or regional government. Responsibilities are therefore often distributed across administrative levels and agencies, affecting how (open) government data is produced and published….(More)”

Dumpster diving made easier with food donation points


Springwise: “With food waste a substantial contributor to both environmental and social problems, communities around the world are trying to find ways to make better use of leftovers as well as reduce the overall production of unused foodstuffs. One of the biggest challenges in getting leftovers to the people who need them is the logistics of finding and connecting the relevant groups and transporting the food. Several on-demand apps, like this one that matches homeless shelters with companies that have leftover food, are taking the guesswork out of what to do with available food. And retailers are getting smarter, like this one in the United States, now selling produce that would previously have been rejected for aesthetic reasons only.

In Brazil, the Makers Society collective designed a campaign called Prato de Rua (Street Dish) to help link people in possession of edible leftovers with community members in need. The campaign centers on a sticker affixed to the side of city dumpsters, requesting that donated food be left at these designated points. By providing a more organized approach to getting rid of leftover food, the collective hopes to help people think more carefully about what they are discarding and why. At the same time, the initiative helps people who would otherwise be forced to go through the contents of a dumpster for edible remains to access good food more safely and swiftly.

The campaign sticker is available for download for communities globally to take on and adapt the idea….(More)”

Recovering from disasters: Social networks matter more than bottled water and batteries


Daniel P. Aldrich at The Conversation: “Almost six years ago, Japan faced a paralyzing triple disaster: a massive earthquake, tsunami, and nuclear meltdowns that forced 470,000 people to evacuate from more than 80 towns, villages and cities. My colleagues and I investigated how communities in the hardest-hit areas reacted to these shocks, and found that social networks – the horizontal and vertical ties that connect us to others – are our most important defense against disasters….

We studied more than 130 cities, towns and villages in Tohoku, looking at factors such as exposure to the ocean, seawall height, tsunami height, voting patterns, demographics, and social capital. We found that municipalities which had higher levels of trust and interaction had lower mortality levels after we controlled for all of those confounding factors.
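As a rough illustration of what “controlled for all of those confounding factors” means in practice, the sketch below regresses municipal mortality on a social-capital measure alongside physical-exposure and demographic controls. The file and variable names are hypothetical; this is not the study’s actual specification or data.

```python
# Minimal sketch of a regression with controls: the coefficient on social_capital
# is read as the association with mortality holding the other variables constant.
# The CSV and all column names are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("tohoku_municipalities.csv")  # ~130 municipalities (hypothetical file)

model = smf.ols(
    "mortality_rate ~ social_capital + tsunami_height + seawall_height"
    " + ocean_exposure + pct_elderly + population_density",
    data=df,
).fit()

# A negative, significant coefficient on social_capital would mirror the finding
# described above: higher trust and interaction, lower mortality, controls held fixed.
print(model.summary())
```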

The kind of social tie that mattered here was horizontal, between town residents. It was a surprising finding given that Japan has spent a tremendous amount of money on physical infrastructure such as seawalls, but invested very little in building social ties and cohesion.

Based on interviews with survivors and a review of the data, we believe that communities with more ties, interaction and shared norms worked effectively to provide help to kin, family and neighbors. In many cases only 40 minutes separated the earthquake and the arrival of the tsunami. During that time, residents literally picked up and carried many elderly people out of vulnerable, low-lying areas. In high-trust neighborhoods, people knocked on doors of those who needed help and escorted them out of harm’s way….

In another study, I worked to understand why some 40 cities, towns and villages across the Tohoku region had rebuilt, put children back into schools and restarted businesses at very different rates over a two-year period. Two years after the disasters, some communities seemed trapped in amber, struggling to restore even half of their utility service, operating businesses and clean streets. Other cities had managed to rebound completely, placing evacuees in temporary homes, restoring gas and water lines, and clearing debris.

To understand why some cities were struggling, I looked into explanations including the impact of the disaster, the size of the city, financial independence, horizontal ties between cities, and vertical ties from the community to power brokers in Tokyo. In this phase of the recovery, vertical ties were the best predictor of strong recoveries.

Communities that had sent more powerful senior representatives to Tokyo in the years before the disaster did the best. These politicians and local ambassadors helped to push the bureaucracy to send aid, reach out to foreign governments for assistance, and smooth the complex zoning and bureaucratic impediments to recovery…

As communities around the world face disasters more and more frequently, I hope that my research on Japan after 3.11 can provide guidance to residents facing challenges. While physical infrastructure is important for mitigating disaster, communities should also invest time and effort in building social ties….(More)”

Big data may be reinforcing racial bias in the criminal justice system


Laurel Eckhouse at the Washington Post: “Big data has expanded to the criminal justice system. In Los Angeles, police use computerized “predictive policing” to anticipate crimes and allocate officers. In Fort Lauderdale, Fla., machine-learning algorithms are used to set bond amounts. In states across the country, data-driven estimates of the risk of recidivism are being used to set jail sentences.

Advocates say these data-driven tools remove human bias from the system, making it more fair as well as more effective. But even as they have become widespread, we have little information about exactly how they work. Few of the organizations producing them have released the data and algorithms they use to determine risk.

We need to know more, because it’s clear that such systems face a fundamental problem: The data they rely on are collected by a criminal justice system in which race makes a big difference in the probability of arrest — even for people who behave identically. Inputs derived from biased policing will inevitably make black and Latino defendants look riskier than white defendants to a computer. As a result, data-driven decision-making risks exacerbating, rather than eliminating, racial bias in criminal justice.

Consider a judge tasked with making a decision about bail for two defendants, one black and one white. Our two defendants have behaved in exactly the same way prior to their arrest: They used drugs in the same amounts, committed the same traffic offenses, owned similar homes and took their two children to the same school every morning. But the criminal justice algorithms do not rely on all of a defendant’s prior actions to reach a bail assessment — just those actions for which he or she has been previously arrested and convicted. Because of racial biases in arrest and conviction rates, the black defendant is more likely to have a prior conviction than the white one, despite identical conduct. A risk assessment relying on racially compromised criminal-history data will unfairly rate the black defendant as riskier than the white defendant.

To make matters worse, risk-assessment tools typically evaluate their success in predicting a defendant’s dangerousness on rearrests — not on defendants’ overall behavior after release. If our two defendants return to the same neighborhood and continue their identical lives, the black defendant is more likely to be arrested. Thus, the tool will falsely appear to predict dangerousness effectively, because the entire process is circular: Racial disparities in arrests bias both the predictions and the justification for those predictions.
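A toy simulation makes that circularity concrete: two groups behave identically, one is policed more heavily, and a risk score built from arrest history both rates that group as riskier and appears to be vindicated by re-arrest data. Every parameter below is invented for illustration and describes no real tool.

```python
# Toy feedback loop: identical behavior, unequal arrest probabilities.
# The arrest-history "risk score" and its re-arrest "validation" both inherit the bias.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
offense_rate = 0.30                              # same underlying behavior in both groups
p_arrest = {"group_a": 0.10, "group_b": 0.25}    # unequal probability of arrest

for group, p in p_arrest.items():
    offended = rng.random(n) < offense_rate          # identical conduct
    prior_arrest = offended & (rng.random(n) < p)    # what the system observes
    risk_score = prior_arrest.astype(float)          # naive history-based score

    reoffended = rng.random(n) < offense_rate        # still identical conduct
    rearrested = reoffended & (rng.random(n) < p)    # biased outcome measure

    print(f"{group}: mean risk score {risk_score.mean():.3f}, "
          f"re-arrest rate {rearrested.mean():.3f}, "
          f"true re-offense rate {reoffended.mean():.3f}")
```

Both groups show the same true re-offense rate, yet the heavily policed group gets higher risk scores and a higher re-arrest rate, so evaluating the score against re-arrests appears to confirm a difference that exists only in the data collection.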

We know that a black person and a white person are not equally likely to be stopped by police: Evidence on New York’s stop-and-frisk policy, investigatory stops, vehicle searches and drug arrests shows that black and Latino civilians are more likely to be stopped, searched and arrested than whites. In 2012, a white attorney spent days trying to get himself arrested in Brooklyn for carrying graffiti stencils and spray paint, a Class B misdemeanor. Even when police saw him tagging the City Hall gateposts, they sped past him, ignoring a crime for which 3,598 people were arrested by the New York Police Department the following year.

Before adopting risk-assessment tools in the judicial decision-making process, jurisdictions should demand that any tool being implemented undergo a thorough and independent peer-review process. We need more transparency and better data to learn whether these risk assessments have disparate impacts on defendants of different races. Foundations and organizations developing risk-assessment tools should be willing to release the data used to build these tools to researchers to evaluate their techniques for internal racial bias and problems of statistical interpretation. Even better, with multiple sources of data, researchers could identify biases in data generated by the criminal justice system before the data is used to make decisions about liberty. Unfortunately, producers of risk-assessment tools — even nonprofit organizations — have not voluntarily released anonymized data and computational details to other researchers, as is now standard in quantitative social science research….(More)”.

Citizen Empowerment and Innovation in the Data-Rich City


Book edited by C. Certomà, M. Dyer, L. Pocatilu and F. Rizzi: “… analyzes the ongoing transformation in the “smart city” paradigm and explores the possibilities that technological innovations offer for the effective involvement of ordinary citizens in collective knowledge production and decision-making processes within the context of urban planning and management. To do so, it pursues an interdisciplinary approach, with contributions from a range of experts including city managers, public policy makers, Information and Communication Technology (ICT) specialists, and researchers. The first two parts of the book focus on the generation and use of data by citizens, with or without institutional support, and the professional management of data in city governance, highlighting the social connectivity and livability aspects essential to vibrant and healthy urban environments. In turn, the third part presents inspiring case studies that illustrate how data-driven solutions can empower people and improve urban environments, including enhanced sustainability. The book will appeal to all those who are interested in the required transformation in the planning, management, and operations of data-rich cities and the ways in which such cities can employ the latest technologies to use data efficiently, promoting data access, data sharing, and interoperability….(More)”.

A City Is Not a Computer


Shannon Mattern at Places Journal: “…Modernity is good at renewing metaphors, from the city as machine, to the city as organism or ecology, to the city as cyborgian merger of the technological and the organic. Our current paradigm, the city as computer, appeals because it frames the messiness of urban life as programmable and subject to rational order. Anthropologist Hannah Knox explains, “As technical solutions to social problems, information and communications technologies encapsulate the promise of order over disarray … as a path to an emancipatory politics of modernity.” And there are echoes of the pre-modern, too. The computational city draws power from an urban imaginary that goes back millennia, to the city as an apparatus for record-keeping and information management.

We’ve long conceived of our cities as knowledge repositories and data processors, and they’ve always functioned as such. Lewis Mumford observed that when the wandering rulers of the European Middle Ages settled in capital cities, they installed a “regiment of clerks and permanent officials” and established all manner of paperwork and policies (deeds, tax records, passports, fines, regulations), which necessitated a new urban apparatus, the office building, to house its bureaus and bureaucracy. The classic example is the Uffizi (Offices) in Florence, designed by Giorgio Vasari in the mid-16th century, which provided an architectural template copied in cities around the world. “The repetitions and regimentations of the bureaucratic system” — the work of data processing, formatting, and storage — left a “deep mark,” as Mumford put it, on the early modern city.

Yet the city’s informational role began even earlier than that. Writing and urbanization developed concurrently in the ancient world, and those early scripts — on clay tablets, mud-brick walls, and landforms of various types — were used to record transactions, mark territory, celebrate ritual, and embed contextual information in landscape. Mumford described the city as a fundamentally communicative space, rich in information:

Through its concentration of physical and cultural power, the city heightened the tempo of human intercourse and translated its products into forms that could be stored and reproduced. Through its monuments, written records, and orderly habits of association, the city enlarged the scope of all human activities, extending them backwards and forwards in time. By means of its storage facilities (buildings, vaults, archives, monuments, tablets, books), the city became capable of transmitting a complex culture from generation to generation, for it marshaled together not only the physical means but the human agents needed to pass on and enlarge this heritage. That remains the greatest of the city’s gifts. As compared with the complex human order of the city, our present ingenious electronic mechanisms for storing and transmitting information are crude and limited.

Mumford’s city is an assemblage of media forms (vaults, archives, monuments, physical and electronic records, oral histories, lived cultural heritage); agents (architectures, institutions, media technologies, people); and functions (storage, processing, transmission, reproduction, contextualization, operationalization). It is a large, complex, and varied epistemological and bureaucratic apparatus. It is an information processor, to be sure, but it is also more than that.

Were he alive today, Mumford would reject the creeping notion that the city is simply the internet writ large. He would remind us that the processes of city-making are more complicated than writing parameters for rapid spatial optimization. He would inject history and happenstance. The city is not a computer. This seems an obvious truth, but it is being challenged now (again) by technologists (and political actors) who speak as if they could reduce urban planning to algorithms.

Why should we care about debunking obviously false metaphors? It matters because the metaphors give rise to technical models, which inform design processes, which in turn shape knowledges and politics, not to mention material cities. The sites and systems where we locate the city’s informational functions — the places where we see information-processing, storage, and transmission “happening” in the urban landscape — shape larger understandings of urban intelligence….(More)”