Push, Pull, and Spill: A Transdisciplinary Case Study in Municipal Open Government


New paper by Jan Whittington et al: “Cities hold considerable information, including details about the daily lives of residents and employees, maps of critical infrastructure, and records of the officials’ internal deliberations. Cities are beginning to realize that this data has economic and other value: If done wisely, the responsible release of city information can also release greater efficiency and innovation in the public and private sector. New services are cropping up that leverage open city data to great effect.

Meanwhile, activist groups and individual residents are placing increasing pressure on state and local government to be more transparent and accountable, even as others sound an alarm over the privacy issues that inevitably attend greater data promiscuity. This takes the form of political pressure to release more information, as well as increased requests for information under the many public records acts across the country.

The result of these forces is that cities are beginning to open their data as never before. It turns out there is surprisingly little research to date into the important and growing area of municipal open data. This article is among the first sustained, cross-disciplinary assessments of an open municipal government system. We are a team of researchers in law, computer science, information science, and urban studies. We have worked hand-in-hand with the City of Seattle, Washington for the better part of a year to understand its current procedures from each disciplinary perspective. Based on this empirical work, we generate a set of recommendations to help the city manage risk latent in opening its data….(More)”

How understanding the ‘shape’ of data could change our world


Gurjeet Singh at the WEF: “We live in an extraordinary time. The capacity to generate and to store data has reached dizzying proportions. What lies within that data represents the chance for this generation to solve its most pressing problems – from disease and climate change, to healthcare and customer understanding.

The magnitude of the opportunity is defined by the magnitude of the data that is created – and it is astonishing….

Despite the technical advances in collection and storage, knowledge generation lags. This is a function of how organizations approach their data, how they conduct analyses, and how they automate learning through machine intelligence.

At its heart, it is a mathematical problem. For any dataset the total number of possible hypotheses/queries is exponential in the size of the data. Exponential functions are difficult enough for humans to comprehend; however, to further complicate matters, the size of the data itself is growing exponentially, and is about to hit another inflection point as the Internet of Things kicks in.


What that means is that we are facing double exponential growth in the number of questions that we can ask of our data. If we stick with the approaches that have served us until now – iteratively asking questions of the data until we get the right answer – we will lose the chance to grasp our generational opportunity.
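The arithmetic behind that claim can be sketched directly. If every subset of n data items can anchor a query, the query space scales like 2^n; and if n itself doubles each period, the space grows like 2^(2^t). The growth rates below are illustrative assumptions, not figures from the article:

```python
# Illustration of "double exponential" growth in the hypothesis space.
# Assumptions (for illustration only): the number of possible queries over
# n data items scales like 2**n (any subset could anchor a hypothesis),
# and the data volume n doubles every period.

def num_hypotheses(n_items: int) -> int:
    """Number of subsets of n items: a stand-in for the query space."""
    return 2 ** n_items

def data_size(t: int, n0: int = 4) -> int:
    """Data volume after t periods, doubling each period."""
    return n0 * 2 ** t

for t in range(4):
    n = data_size(t)
    print(t, n, num_hypotheses(n))
# By t = 3 there are 32 items but over 4 billion candidate queries --
# no team of analysts can iterate through them question by question.
```

Even at toy scale, the query count outruns the data count almost immediately, which is the article's point about human-driven, question-at-a-time analysis.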

There are not, and never will be, enough data scientists in the world for this approach to succeed. Nor can we arm enough citizen data scientists with new software to make it work. Software that makes question-asking or hypothesis development more accessible or more efficient misses the central point: its users will only fall further behind as new data arrives each millisecond.

To truly unlock the value that lies within our data, we need to turn our attention to the data itself, setting aside the questions for later. This, too, turns out to be a mathematical problem. Data has shape. That shape has meaning. The shape of data tells you everything you need to know about it, from its obvious features to its best-kept secrets.

We understand that regression produces lines.


We know that customer segmentation produces groups.


We know that economic growth and interest rates have a cyclical nature (diseases like malaria have this shape too).


By knowing the shape of the data, and where we are within that shape, we vastly improve our understanding of where we are, where we have been, and, perhaps more importantly, what might happen next. In understanding the shape of data we grasp every feature of the dataset, immediately seeing what is important in it, thus dramatically reducing the number of questions to ask and accelerating the discovery process.

By changing our thinking – and starting with the shape of the data, not a series of questions (which very often come with significant biases) – we can extract knowledge from these rapidly growing, massive and complex datasets.

The knowledge that lies hidden within electronic medical records, billing records and clinical records is enough to transform how we deliver healthcare and how we treat diseases. The knowledge that lies within the massive data stores of governments, universities and other institutions will illuminate the conversation on climate change and point the way to answers on what we need to do to protect the planet for future generations. The knowledge that is obscured by web, transaction, CRM, social and other data will inform a clearer, more meaningful picture of the customer and will, in turn define the optimal way to interact.

This is the opportunity for our generation to turn data into knowledge. To get there will require a different approach, but one with the ability to impact the entirety of humankind….(More)

IBM using Watson to build a “SIRI for Cities”


From FastCompany: “A new app that incorporates IBM’s Watson cognitive computing platform is like Siri for ordering city services.

IBM said today that the city of Surrey, in British Columbia, Canada, has rolled out the new app, which leverages Watson’s sophisticated language and data analysis system to let residents use natural language to ask, for example, why their trash wasn’t picked up or how to find a lost cat.

Watson is best known as the computer system that autonomously vanquished the world’s best Jeopardy players during a highly publicized competition in 2011. In the years since, IBM has applied the system to a wide range of computing problems in industries like health care, banking, retail, and education. The system is based on Watson’s ability to understand natural language queries and to analyze huge data sets.

Recently, Watson rolled out a tool designed to help people detect the tone in their writing.

Surrey worked with the developer Purple Forge to build the new city services app, which will be combined with the city’s existing “My Surrey” mobile and web tools. IBM said that residents can ask a wide range of questions on devices like smartphones, laptops, or even Apple Watches. Big Blue said Surrey’s app is the first time Watson has been utilized in a “citizen services” app.

The tool offers a series of frequently asked questions, but also allows residents in the city of nearly half a million to come up with their own. IBM said Surrey officials are hopeful that the app will help them be more responsive to residents’ concerns.

Among the services users can ask about are those provided by Surrey’s police and fire departments, animal control, parking enforcement, trash pickup, and others….(More)”

The Trouble With Disclosure: It Doesn’t Work


Jesse Eisinger at ProPublica: “Louis Brandeis was wrong. The lawyer and Supreme Court justice famously declared that sunlight is the best disinfectant, and we have unquestioningly embraced that advice ever since.

 Over the last century, disclosure and transparency have become our regulatory crutch, the answer to every vexing problem. We require corporations and government to release reams of information on food, medicine, household products, consumer financial tools, campaign finance and crime statistics. We have a booming “report card” industry for a range of services, including hospitals, public schools and restaurants.

All this sunlight is blinding. As new scholarship is demonstrating, the value of all this information is unproved. Paradoxically, disclosure can be useless — and sometimes actually harmful or counterproductive.

“We are doing disclosure as a regulatory move all over the board,” says Adam J. Levitin, a law professor at Georgetown, “The funny thing is, we are doing this despite very little evidence of its efficacy.”

Let’s start with something everyone knows about — the “terms of service” agreements for the likes of iTunes. Like everybody else, I click the “I agree” box, feeling a flash of resentment. I’m certain that in Paragraph 184 is a clause signing away my firstborn to a life of indentured servitude to Timothy D. Cook as his chief caviar spoon keeper.

Our legal theoreticians have determined these opaque monstrosities work because someone, somewhere reads the fine print in these contracts and keeps corporations honest. It turns out what we laymen intuit is true: No one reads them, according to research by a New York University law professor, Florencia Marotta-Wurgler.

In real life, there is no critical mass of readers policing the agreements. And if there were an eagle-eyed crew of legal experts combing through these agreements, what recourse would they have? Most people don’t even know that the Supreme Court has gutted their rights to sue in court, and they instead have to go into arbitration, which usually favors corporations.

The disclosure bonanza is easy to explain. Nobody is against it. It’s politically expedient. Companies prefer such rules, especially in lieu of actual regulations that would curtail bad products or behavior. The opacity lobby — the remora fish class of lawyers, lobbyists and consultants in New York and Washington — knows that disclosure requirements are no bar to dodgy practices. You just have to explain what you’re doing in sufficiently incomprehensible language, a task that earns those lawyers a hefty fee.

Of course, some disclosure works. Professor Levitin cites two examples. The first is an olfactory disclosure: methane doesn’t have any scent, so a foul smell is added to alert people to a gas leak. The second is ATM fees: a study in Australia showed that once fees were disclosed, people avoided the high-fee machines and took out more cash when they had to use them.

But to Omri Ben-Shahar, co-author of a recent book, “More Than You Wanted to Know: The Failure of Mandated Disclosure,” these are cherry-picked examples in a world awash in useless disclosures. Of course, information is valuable. But disclosure as a regulatory mechanism doesn’t work nearly well enough, he argues….(More)

Algorithms and Bias


Q. and A. With Cynthia Dwork in the New York Times: “Algorithms have become one of the most powerful arbiters in our lives. They make decisions about the news we read, the jobs we get, the people we meet, the schools we attend and the ads we see.

Yet there is growing evidence that algorithms and other types of software can discriminate. The people who write them incorporate their biases, and algorithms often learn from human behavior, so they reflect the biases we hold. For instance, research has shown that ad-targeting algorithms have shown ads for high-paying jobs to men but not women, and ads for high-interest loans to people in low-income neighborhoods.

Cynthia Dwork, a computer scientist at Microsoft Research in Silicon Valley, is one of the leading thinkers on these issues. In an Upshot interview, which has been edited, she discussed how algorithms learn to discriminate, who’s responsible when they do, and the trade-offs between fairness and privacy.

Q: Some people have argued that algorithms eliminate discrimination because they make decisions based on data, free of human bias. Others say algorithms reflect and perpetuate human biases. What do you think?

A: Algorithms do not automatically eliminate bias. Suppose a university, with admission and rejection records dating back for decades and faced with growing numbers of applicants, decides to use a machine learning algorithm that, using the historical records, identifies candidates who are more likely to be admitted. Historical biases in the training data will be learned by the algorithm, and past discrimination will lead to future discrimination.
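A toy model makes the mechanism visible. The admissions records below are fabricated; the point is only that a model fit to biased historical decisions reproduces the historical disparity:

```python
# Fabricated historical admissions: group "A" was favored, group "B" was not.
# Each record is (group, admitted?).
history = [("A", True)] * 80 + [("A", False)] * 20 \
        + [("B", True)] * 20 + [("B", False)] * 80

def admit_rate(records, group):
    """Historical admission rate for one group."""
    outcomes = [adm for grp, adm in records if grp == group]
    return sum(outcomes) / len(outcomes)

# A naive "model": predict the majority historical outcome per group.
model = {g: admit_rate(history, g) >= 0.5 for g in ("A", "B")}
print(model)   # the past bias becomes the rule for future applicants
```

No one programmed the model to discriminate; it simply learned the pattern present in its training data, which is exactly the failure mode Dwork describes.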

Q: Are there examples of that happening?

A: A famous example of a system that has wrestled with bias is the resident matching program that matches graduating medical students with residency programs at hospitals. The matching could be slanted to maximize the happiness of the residency programs, or to maximize the happiness of the medical students. Prior to 1997, the match was mostly about the happiness of the programs.

This changed in 1997 in response to “a crisis of confidence concerning whether the matching algorithm was unreasonably favorable to employers at the expense of applicants, and whether applicants could ‘game the system,’ ” according to a paper by Alvin Roth and Elliott Peranson published in The American Economic Review.
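The matching mechanism at issue is the Gale-Shapley deferred-acceptance algorithm, whose well-known property is that the proposing side obtains its best achievable stable outcome; that is why switching to an applicant-proposing match in 1997 mattered. Below is a compact sketch with toy preferences and one seat per program (not the production NRMP algorithm, which also handles couples and multi-seat programs):

```python
def deferred_acceptance(proposer_prefs, reviewer_prefs):
    """Gale-Shapley: proposers end up with their best stable partner.

    proposer_prefs: proposer -> ordered list of reviewers.
    reviewer_prefs: reviewer -> ordered list of proposers.
    Returns a dict mapping reviewer -> matched proposer.
    """
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in reviewer_prefs.items()}
    free = list(proposer_prefs)
    nxt = {p: 0 for p in proposer_prefs}        # next choice to try
    match = {}                                  # reviewer -> proposer
    while free:
        p = free.pop()
        r = proposer_prefs[p][nxt[p]]           # p's best untried reviewer
        nxt[p] += 1
        if r not in match:
            match[r] = p                        # tentative acceptance
        elif rank[r][p] < rank[r][match[r]]:
            free.append(match[r])               # r trades up to p
            match[r] = p
        else:
            free.append(p)                      # r rejects p
    return match

students = {"s1": ["h1", "h2"], "s2": ["h1", "h2"]}
hospitals = {"h1": ["s2", "s1"], "h2": ["s1", "s2"]}
print(deferred_acceptance(students, hospitals))
```

Running the same routine with hospitals as proposers answers "whose happiness is maximized?": whichever side proposes can do no better in any stable matching.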

Q: You have studied both privacy and algorithm design, and co-wrote a paper, “Fairness Through Awareness,” that came to some surprising conclusions about discriminatory algorithms and people’s privacy. Could you summarize those?

A: “Fairness Through Awareness” makes the observation that sometimes, in order to be fair, it is important to make use of sensitive information while carrying out the classification task. This may be a little counterintuitive: The instinct might be to hide information that could be the basis of discrimination….

Q: The law protects certain groups from discrimination. Is it possible to teach an algorithm to do the same?

A: This is a relatively new problem area in computer science, and there are grounds for optimism — for example, resources from the Fairness, Accountability and Transparency in Machine Learning workshop, which considers the role that machines play in consequential decisions in areas like employment, health care and policing. This is an exciting and valuable area for research. …(More)”

Citizen Science used in studying Seasonal Variation in India


Rohin Daswani at the Commons Lab, Woodrow Wilson International Center for Scholars: “Climate change has started affecting many countries around the world. While every country is susceptible to the risks of global warming, some countries, such as India, are especially vulnerable.

India’s sheer dependence on rainfall to irrigate its vast agricultural lands and to feed its economy makes it highly vulnerable to climate change. A report from the UN Intergovernmental Panel on Climate Change (IPCC) predicts global temperature will increase between 0.3 and 4.8 degrees Celsius and sea levels will rise 82cm (32 in) by the late 21st century. But what effect will changing rainfall patterns have on seasonal variation?

One way to study seasonal variation in India is to analyze changing patterns of flowering and fruiting in common trees such as Mango and Amaltas. SeasonWatch, a program of the National Center for Biological Sciences (NCBS), the biological wing of the Tata Institute of Fundamental Research, does exactly that. It is an India-wide program that tracks the changing seasons by monitoring the seasonal cycles of flowering, fruiting, and leaf flush in common trees. How does it do that? By harnessing citizen science. Anybody interested in trees and the effects of climate change, children and adults alike, can participate. All they have to do is register, select a tree near them, and monitor it every week. The data is uploaded to a central website, where it is analyzed for changing patterns of plant life and the effects of climate change on plant life cycles. The data is also open, so anyone who wishes can access it. With all this information one could answer questions that were previously impossible to answer, such as:

  • How does the flowering of Neem change across India?
  • Is fruiting of Tamarind different in different parts of the country depending on rainfall in the previous year?
  • Is year-to-year variation in the flowering and fruiting time of Mango related to winter temperatures?
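Given open observation records of the kind SeasonWatch collects, questions like these reduce to simple aggregations. The field names and rows below are invented for illustration; the real SeasonWatch schema may differ:

```python
from collections import defaultdict

# Hypothetical observation rows: (species, region, week_number, flowering?)
observations = [
    ("Neem", "North", 10, True),
    ("Neem", "North", 10, False),
    ("Neem", "South", 10, True),
    ("Neem", "South", 10, True),
]

def flowering_fraction(rows, species, week):
    """Fraction of trees of `species` seen flowering in `week`, by region."""
    seen = defaultdict(list)
    for sp, region, wk, flowering in rows:
        if sp == species and wk == week:
            seen[region].append(flowering)
    return {r: sum(v) / len(v) for r, v in seen.items()}

print(flowering_fraction(observations, "Neem", 10))
# regional differences in the same week hint at climatic variation
```

Comparing such fractions across weeks, years, and regions is exactly how "does Neem flowering change across India?" becomes answerable from crowdsourced data.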

Using Citizen Science and crowdsourcing, programs such as SeasonWatch have expanded the scope and work of conservation biology in various ecosystems across India….(More)”

Tools to Innovate: Data Analytics, Risk Management, and Shared Services


New report by The Business of Government Center: “Today, governments have access to a variety of tools to successfully implement agency programs. For example, Data Analytics—especially of financial data—can be used to better inform decision making by ensuring agencies have the information they need at the point in time when it can be most effective. In addition, governments at all levels can more effectively address risks using new Risk Management approaches. And finally, Shared Services can not only save money, but also stimulate innovation, improve decision making, and increase the quality of services expected by citizens.

The IBM Center has published a variety of reports related to these topics and accordingly, we have brought key findings on these topics together in the compilation that follows. We welcome your thoughts on these issues, and look forward to a continued dialogue with government leaders and stakeholders on actions to help agencies achieve their mission effectively and efficiently….(More)”

Designing Successful Governance Groups


The Berkman Center for Internet & Society, together with the Global Network of Internet and Society Research Centers (NoC), is pleased to announce the release of a new publication, “Designing Successful Governance Groups: Lessons for Leaders from Real-World Examples,” authored by Ryan Budish, Sarah Myers West, and Urs Gasser.

Solutions to many of the world’s most pressing governance challenges, ranging from natural resource management to the governance of the Internet, require leaders to engage in multistakeholder processes. Yet relatively little is known about how to lead such processes successfully. This paper outlines a set of useful, actionable steps for policymakers and other stakeholders charged with creating, convening, and leading governance groups. The tools for success described in this document are distilled from research published earlier this year by Berkman and the NoC, a comprehensive report entitled “Multistakeholder as Governance Groups: Observations From Case Studies,” which closely examines 12 examples of real-world governance structures from around the globe and draws new conclusions about how to successfully form and operate governance groups.

This new publication, “Designing Successful Governance Groups,” focuses on the operational recommendations drawn from the earlier case studies and their accompanying synthesis paper. It provides an actionable starting place for those interested in understanding some of the critical ingredients for successful multistakeholder governance.

At the core of this paper are three steps that have helped conveners of successful governance groups:

  1. Establish clear success criteria

  2. Set the initial framework conditions for the group

  3. Continually adjust steps 1 and 2 based on evolving contextual factors

The paper explores these three steps in greater detail and explains how they help implement one central idea: Governance groups work best when they are flexible and adaptive to new circumstances and needs and have conveners who understand how their decisions will affect the inclusiveness, transparency, accountability, and effectiveness of the group….(More)”

What We’ve Learned About Sharing Our Data Analysis


Jeremy Singer-Vine at Source: “Last Friday morning, Jessica Garrison, Ken Bensinger, and I published a BuzzFeed News investigation highlighting the ease with which American employers have exploited and abused a particular type of foreign worker—those on seasonal H–2 visas. The article drew on seven months’ worth of reporting, scores of interviews, hundreds of documents—and two large datasets maintained by the Department of Labor.

That same morning, we published the corresponding data, methodologies, and analytic code on GitHub. This isn’t the first time we’ve open-sourced our data and analysis; far from it. But the H–2 project represents our most ambitious effort yet. In this post, I’ll describe our current thinking on “reproducible data analyses,” and how the H–2 project reflects those thoughts.

What Is “Reproducible Data Analysis”?

It’s helpful to break down a couple of slightly oversimplified definitions. Let’s call “open-sourcing” the act of publishing the raw code behind a software project. And let’s call “reproducible data analysis” the act of open-sourcing the code and data required to reproduce a set of calculations.

Journalism has seen a mini-boom of reproducible data analysis in the past year or two. (It’s far from a novel concept, of course.) FiveThirtyEight publishes data and re-runnable computer code for many of their stories. You can download the brains and brawn behind Leo, the New York Times’ statistical model for forecasting the outcome of the 2014 midterm Senate elections. And if you want to re-run Barron’s magazine’s analysis of SEC Rule 605 reports, you can do that, too. The list goes on.
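In practice, a reproducible analysis is often just a script that runs end to end from the published raw data, with no hidden manual steps. A minimal hypothetical sketch follows; the data and file layout are invented, and a real project would read from a checked-in raw file rather than an inline string:

```python
import csv
import io

# In a real repo this would be open("data/raw/workers.csv"); it is inlined
# here only so the sketch is self-contained.
RAW = """employer,certified_workers
Acme Farms,120
Blue Crab Co,45
"""

def analyze(raw_csv: str) -> dict:
    """The entire analysis: raw data in, published numbers out."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    total = sum(int(r["certified_workers"]) for r in rows)
    return {"employers": len(rows), "workers": total}

result = analyze(RAW)
print(result)
```

Because every number in `result` is derived by code from the published raw file, a reader can re-run `analyze()` and check each figure cited in the story.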

….

Why Reproducible Data Analysis?

At BuzzFeed News, our main motivation is simple: transparency. If an article includes our own calculations (beyond a grade-schooler’s pen-and-paper arithmetic), then you should be able to see—and potentially criticize—how we did it…

There are reasons, of course, not to publish a fully reproducible analysis. The most obvious and defensible reason: Your data includes Social Security numbers, state secrets, or other sensitive information. Sometimes, you’ll be able to scrub these bits from your data. Other times, you won’t. (A detailed methodology is a good alternative.)
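When the obstacle is a few sensitive fields rather than the whole dataset, scrubbing before publication may suffice. A hypothetical sketch of pattern-based redaction follows; a regex pass alone is not adequate review for real data, only a first step:

```python
import re

# Anything shaped like a U.S. Social Security number (e.g. 123-45-6789).
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scrub(record: str) -> str:
    """Mask SSN-shaped values before a record is published."""
    return SSN.sub("XXX-XX-XXXX", record)

print(scrub("worker 123-45-6789, employer Acme"))
# worker XXX-XX-XXXX, employer Acme
```

The non-sensitive fields survive intact, so the scrubbed file can still support the published analysis.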

How To Publish Reproducible Data Analysis?

At BuzzFeed News, we’re still figuring out the best way to skin this cat. Other news organizations might arrive at entirely opposite conclusions. That said, here are some tips, based on our experience:

Describe the main data sources, and how you got them. Art appraisers and data-driven reporters agree: Provenance matters. Who collected the data? What universe of things does it quantify? How did you get it? … (More)”

Open Data and Sub-national Governments: Lessons from Developing Countries


WebFoundation: “Open government data (OGD) as a concept is gaining currency globally due to the strong advocacy of global organisations such as the Open Government Partnership. In recent years, there has been increased commitment on the part of national governments to proactively disclose information. However, much of the discussion on OGD is at the national level, especially in developing countries, where commitments to proactive disclosure are conditioned by the commitments of national governments as expressed through the OGP national action plans. Yet the local level is important in the context of open data. In decentralized contexts, the local level is where data is collected and stored, where publication is most feasible, and where data can generate the most impact when used. This synthesis paper refocuses the discussion of open government data on sub-national contexts by analysing nine country papers produced through the Open Data in Developing Countries research project.

Using a common research framework focused on context, governance setting, and open data initiatives, the study found that sub-national governments are making a substantial effort to proactively disclose data; however, the design of these initiatives limits citizen participation and, eventually, use. Second, context demands different roles for intermediaries and different types of initiatives to create an enabling environment for open data. Finally, data quality will remain a critical challenge for sub-national governments in developing countries, and it will temper the potential impact that open data can generate. Download the full research paper here