Push, Pull, and Spill: A Transdisciplinary Case Study in Municipal Open Government


New paper by Jan Whittington et al: “Cities hold considerable information, including details about the daily lives of residents and employees, maps of critical infrastructure, and records of the officials’ internal deliberations. Cities are beginning to realize that this data has economic and other value: If done wisely, the responsible release of city information can also release greater efficiency and innovation in the public and private sector. New services are cropping up that leverage open city data to great effect.

Meanwhile, activist groups and individual residents are placing increasing pressure on state and local government to be more transparent and accountable, even as others sound an alarm over the privacy issues that inevitably attend greater data promiscuity. This takes the form of political pressure to release more information, as well as increased requests for information under the many public records acts across the country.

The result of these forces is that cities are beginning to open their data as never before. It turns out there is surprisingly little research to date into the important and growing area of municipal open data. This article is among the first sustained, cross-disciplinary assessments of an open municipal government system. We are a team of researchers in law, computer science, information science, and urban studies. We have worked hand-in-hand with the City of Seattle, Washington for the better part of a year to understand its current procedures from each disciplinary perspective. Based on this empirical work, we generate a set of recommendations to help the city manage risk latent in opening its data….(More)”

Algorithms and Bias


Q. and A. With Cynthia Dwork in the New York Times: “Algorithms have become one of the most powerful arbiters in our lives. They make decisions about the news we read, the jobs we get, the people we meet, the schools we attend and the ads we see.

Yet there is growing evidence that algorithms and other types of software can discriminate. The people who write them incorporate their biases, and algorithms often learn from human behavior, so they reflect the biases we hold. For instance, research has shown that ad-targeting algorithms have shown ads for high-paying jobs to men but not women, and ads for high-interest loans to people in low-income neighborhoods.

Cynthia Dwork, a computer scientist at Microsoft Research in Silicon Valley, is one of the leading thinkers on these issues. In an Upshot interview, which has been edited, she discussed how algorithms learn to discriminate, who’s responsible when they do, and the trade-offs between fairness and privacy.

Q: Some people have argued that algorithms eliminate discrimination because they make decisions based on data, free of human bias. Others say algorithms reflect and perpetuate human biases. What do you think?

A: Algorithms do not automatically eliminate bias. Suppose a university, with admission and rejection records dating back for decades and faced with growing numbers of applicants, decides to use a machine learning algorithm that, using the historical records, identifies candidates who are more likely to be admitted. Historical biases in the training data will be learned by the algorithm, and past discrimination will lead to future discrimination.
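
To make that mechanism concrete, here is a small illustrative sketch (not from the interview): a classifier fitted to synthetic, historically biased admissions records carries the bias forward to new applicants. NumPy and scikit-learn are assumed to be available, and every number here is made up for illustration.

# Illustrative sketch: a model trained on biased historical decisions reproduces the bias.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
test_score = rng.normal(0, 1, n)              # a merit-related feature
group = rng.integers(0, 2, n)                 # 0/1 membership in a protected group

# Hypothetical historical decisions: admission depended on merit AND on group,
# i.e. past committees systematically penalised group 1.
admitted = (test_score - 0.8 * group + rng.normal(0, 0.5, n)) > 0

model = LogisticRegression().fit(np.column_stack([test_score, group]), admitted)

# Two equally qualified applicants, differing only in group membership:
applicant_a = [[1.0, 0]]
applicant_b = [[1.0, 1]]
print(model.predict_proba(applicant_a)[0, 1])  # higher predicted admission probability
print(model.predict_proba(applicant_b)[0, 1])  # lower, purely because of group membership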

Q: Are there examples of that happening?

A: A famous example of a system that has wrestled with bias is the resident matching program that matches graduating medical students with residency programs at hospitals. The matching could be slanted to maximize the happiness of the residency programs, or to maximize the happiness of the medical students. Prior to 1997, the match was mostly about the happiness of the programs.

This changed in 1997 in response to “a crisis of confidence concerning whether the matching algorithm was unreasonably favorable to employers at the expense of applicants, and whether applicants could ‘game the system,’ ” according to a paper by Alvin Roth and Elliott Peranson published in The American Economic Review.
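
For readers curious about the mechanics, the sketch below implements plain one-to-one deferred acceptance, the family of algorithms behind the match. It is not the NRMP's actual production code, but it shows the crux of the 1997 redesign: whichever side proposes ends up with its most-preferred stable outcome.

# Minimal deferred acceptance (one-to-one). The proposing side gets its best
# achievable stable match, which is why the choice of proposing side matters.
def deferred_acceptance(proposer_prefs, receiver_prefs):
    """Each argument maps an id to an ordered list of preferred ids on the other side."""
    rank = {r: {p: i for i, p in enumerate(prefs)} for r, prefs in receiver_prefs.items()}
    free = list(proposer_prefs)            # proposers not yet matched
    next_choice = {p: 0 for p in proposer_prefs}
    match = {}                             # receiver -> proposer
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = match.get(r)
        if current is None:
            match[r] = p                   # receiver tentatively accepts
        elif rank[r][p] < rank[r][current]:
            match[r] = p                   # receiver trades up, bumping the old proposer
            free.append(current)
        else:
            free.append(p)                 # proposal rejected
    return {p: r for r, p in match.items()}

students = {"s1": ["h1", "h2"], "s2": ["h2", "h1"]}
hospitals = {"h1": ["s2", "s1"], "h2": ["s1", "s2"]}
print(deferred_acceptance(students, hospitals))   # student-proposing: each student gets first choice
print(deferred_acceptance(hospitals, students))   # hospital-proposing: each hospital gets first choice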

Q: You have studied both privacy and algorithm design, and co-wrote a paper, “Fairness Through Awareness,” that came to some surprising conclusions about discriminatory algorithms and people’s privacy. Could you summarize those?

A: “Fairness Through Awareness” makes the observation that sometimes, in order to be fair, it is important to make use of sensitive information while carrying out the classification task. This may be a little counterintuitive: The instinct might be to hide information that could be the basis of discrimination….

Q: The law protects certain groups from discrimination. Is it possible to teach an algorithm to do the same?

A: This is a relatively new problem area in computer science, and there are grounds for optimism — for example, resources from the Fairness, Accountability and Transparency in Machine Learning workshop, which considers the role that machines play in consequential decisions in areas like employment, health care and policing. This is an exciting and valuable area for research. …(More)”

Open Data and Sub-national Governments: Lessons from Developing Countries


WebFoundation: “Open government data (OGD) as a concept is gaining currency globally, thanks to the strong advocacy of global organisations such as the Open Government Partnership. In recent years there has been increased commitment on the part of national governments to proactively disclose information. However, much of the discussion on OGD is at the national level, especially in developing countries, where commitments to proactive disclosure are conditioned by the commitments of national governments as expressed through OGP national action plans. Yet the local level is important in the context of open data. In decentralised contexts, the local level is where data is collected and stored, where there is a strong likelihood that data will be published, and where data can generate the most impact when used. This synthesis paper seeks to refocus the discussion of open government data on sub-national contexts by analysing nine country papers produced through the Open Data in Developing Countries research project.

Using a common research framework focused on context, governance setting, and open data initiatives, the study found that sub-national governments are making substantial efforts to proactively disclose data; however, the design of these initiatives delimits citizen participation and, eventually, use. Second, context demands different roles for intermediaries and different types of initiatives to create an enabling environment for open data. Finally, data quality will remain a critical challenge for sub-national governments in developing countries, and it will temper the potential impact that open data can generate. Download the full research paper here

100 parliaments as open data, ready for you to use


Myfanwy Nixon at mySociety’s blog and OpeningParliament: “If you need data on the people who make up your parliament, another country’s parliament, or indeed all parliaments, you may be in luck.

Every Politician, the latest Poplus project, aims to collect, store and share information about every parliament in the world, past and present—and it already contains 100 of them.

What’s more, it’s all provided as Open Data to anyone who would like to use it to power a civic tech project. We’re thinking parliamentary monitoring organisations, journalists, groups who run access-to-democracy sites like our own WriteToThem, and especially researchers who want to do analysis across multiple countries.

But isn’t that data already available?

Yes and no. There’s no doubt that you can find details of most parliaments online: on official government websites, on Wikipedia, or in a variety of other places.

But, as you might expect from data that’s coming from hundreds of different sources, it’s in a multitude of different formats. That makes it very hard to work with in any kind of consistent fashion.

Every Politician standardises all of its data into the Popolo standard and then provides it in two simple downloadable formats:

  • CSV, which contains basic data that’s easy to work with in spreadsheets
  • JSON, which contains richer data on each person and is ideal for developers

This standardisation means that it should now be a lot easier to work on projects across multiple countries, or to compare one country’s data with another. It also means that data works well with other Poplus Components….(More)”
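
As a rough illustration of how easy the standardised downloads are to work with, the short sketch below tallies party sizes from one of the CSV files. The column name "group" is an assumption based on the Popolo-style fields the project publishes, and "term-55.csv" is a hypothetical local filename; adjust both to the file you actually download.

# Summarise party ("group") sizes from an EveryPolitician-style CSV download.
import csv
from collections import Counter

def group_sizes(path):
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return Counter(row["group"] for row in reader)   # assumed column name

if __name__ == "__main__":
    for group, count in group_sizes("term-55.csv").most_common():
        print(f"{count:4d}  {group}")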

Beyond the Common Rule: Ethical Structures for Data Research in Non-Academic Settings


Future of Privacy Forum: “In the wake of last year’s news about the Facebook “emotional contagion” study and subsequent public debate about the role of A/B Testing and ethical concerns around the use of Big Data, FPF Senior Fellow Omer Tene participated in a December symposium on corporate consumer research hosted by Silicon Flatirons. This past month, the Colorado Technology Law Journal published a series of papers that emerged out of the symposium, including “Beyond the Common Rule: Ethical Structures for Data Research in Non-Academic Settings.”

“Beyond the Common Rule,” by Jules Polonetsky, Omer Tene, and Joseph Jerome, continues the Future of Privacy Forum’s effort to build on the notion of consumer subject review boards first advocated by Ryan Calo at FPF’s 2013 Big Data symposium. It explores how researchers, increasingly in corporate settings, are analyzing data and testing theories using often sensitive personal information. Many of these new uses of PII are simply natural extensions of current practices, and are either within the expectations of individuals or the bounds of the FIPPs. Yet many of these projects could involve surprising applications or uses of data, exceeding user expectations, and offering notice and obtaining consent may not be feasible.

This article expands on ideas and suggestions put forward around the recent discussion draft of the White House Consumer Privacy Bill of Rights, which espouses “Privacy Review Boards” as a safety valve for noncontextual data uses. It explores how existing institutional review boards within the academy and for human testing research could offer lessons for guiding principles, providing accountability and enhancing consumer trust, and offers suggestions for how companies — and researchers — can pursue both knowledge and data innovation responsibly and ethically….(More)”

Local open data ecosystems – a prototype map


Ed Parkes and Gail Dawes at Nesta: “It is increasingly recognised that some of the most important open data is published by local authorities (LAs) – data that matters to us, like bin collection days, planning applications and even where your local public toilet is. Given the likely move towards greater decentralisation, firstly through devolution to cities, the publication of local open data could arguably become more important over the next couple of years. In addition, as of 1st April there is a new transparency code for local government, requiring local authorities to publish further information on areas ranging from spending to local land assets. To pre-empt this likely renewed focus on local open data, we have begun to develop a prototype map to highlight the UK’s local open data ecosystem.

Already there is some great practice in the publication of open data at a local level – such as Leeds Data Mill, London Datastore, and Open Data Sheffield. This regional activity is characterised not just by high-quality data publication, but also by the communities interested in the power of open data that come together through hackdays, challenges and meetups. This creates an ecosystem of publishers and re-users at a local level. Some of the best practice in developing such an ecosystem was recognised by the last government through its announcement of a group of Local Authority Open Data Champions. Some of these also received project funding from the Cabinet Office and through the Open Data User Group.

Outside of this best practice, it isn’t always easy to understand how developed the open data agendas of smaller, less urban authorities are. Beyond looking at each council’s website, or increasingly at the data portals that forward-thinking councils are providing, there are a surprisingly large number of places where local authorities can make their open data available. The best known of these is the Openly Local project, but at the time of writing it appears to have been retired. Perhaps the best catalogue of local authority data is on Data.gov.uk itself, which has 1,449 datasets published by LAs across 200 different organisations. Following that there is the Open Data Communities website, which hosts links to LA linked datasets. Using data from the latter, Steve Peters has developed the local data dashboard (which was itself based on the UK Local Government Open Data resource map from Owen Boswarva). In addition, local authorities can also register their open data in the LGA’s Open Data Inventory Service and take it through the ODI’s data certification process.
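
By way of illustration, a catalogue like Data.gov.uk (which runs CKAN) can be queried programmatically for a given authority’s datasets. The sketch below uses CKAN’s standard package_search action; whether Data.gov.uk still exposes it at exactly this URL is an assumption worth checking, and the requests library is assumed to be installed.

# Query a CKAN catalogue for datasets matching a local authority's name.
import requests

def search_datasets(publisher_query, rows=10):
    url = "https://data.gov.uk/api/3/action/package_search"   # assumed endpoint
    resp = requests.get(url, params={"q": publisher_query, "rows": rows}, timeout=30)
    resp.raise_for_status()
    result = resp.json()["result"]
    return result["count"], [d["title"] for d in result["results"]]

count, titles = search_datasets("Leeds City Council")
print(count, "datasets found; first few:")
for title in titles:
    print(" -", title)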

Prototype map of local open data ecosystems

To try to highlight patterns in local authority open data publication we decided to make a map of activity around the country (although in the first instance we’ve focused on England)….(More)

Yelp’s Consumer Protection Initiative: ProPublica Partnership Brings Medical Info to Yelp


Yelp Official Blog: “…exists to empower and protect consumers, and we’re continually focused on how we can enhance our service while enhancing the ability for consumers to make smart transactional decisions along the way.

A few years ago, we partnered with local governments to launch the LIVES open data standard. Now, millions of consumers find restaurant inspection scores when that information is most relevant: while they’re in the middle of making a dining decision (instead of when they’re signing the check). Studies have shown that displaying this information more prominently has a positive impact.
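
For developers curious what ingesting a LIVES feed looks like, here is a hedged sketch that joins businesses to their most recent inspection score. The file and field names (businesses.csv, inspections.csv, business_id, date, score) are assumptions drawn from my reading of the spec, so verify them against the actual feed you work with.

# Join a LIVES-style businesses file to the latest inspection score per business.
import csv

def latest_scores(businesses_path, inspections_path):
    names = {}
    with open(businesses_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            names[row["business_id"]] = row["name"]

    latest = {}   # business_id -> (date, score)
    with open(inspections_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            seen = latest.get(row["business_id"])
            if seen is None or row["date"] > seen[0]:   # LIVES dates are YYYYMMDD strings
                latest[row["business_id"]] = (row["date"], row["score"])

    return {names.get(b, b): score for b, (_, score) in latest.items()}

print(latest_scores("businesses.csv", "inspections.csv"))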

Today we’re excited to announce we’ve joined forces with ProPublica to incorporate health care statistics and consumer opinion survey data onto the Yelp business pages of more than 25,000 medical treatment facilities. Read more in today’s Washington Post story.

We couldn’t be more excited to partner with ProPublica, the Pulitzer Prize-winning non-profit newsroom that produces investigative journalism in the public interest.

The information is compiled by ProPublica from their own research and from the Centers for Medicare and Medicaid Services (CMS) for 4,600 hospitals, 15,000 nursing homes, and 6,300 dialysis clinics in the US, and will be updated quarterly. Hover text on the business page will explain the statistics, which include the number of serious deficiencies and fines for each nursing home and emergency room wait times for hospitals. For example, West Kendall Baptist Hospital has better than average doctor communication and an average 33 minute ER wait time, Beachside Nursing Center currently has no deficiencies, and San Mateo Dialysis Center has a better than average patient survival rate.

Now the millions of consumers who use Yelp to find and evaluate everything from restaurants to retail will have even more information at their fingertips when they are in the midst of the most critical life decisions, like which hospital to choose for a sick child or which nursing home will provide the best care for aging parents….(More)

Print Wikipedia


Print Wikipedia is both a utilitarian visualization of the largest accumulation of human knowledge and a poetic gesture towards the futility of the scale of big data. Michael Mandiberg has written software that parses the entirety of the English-language Wikipedia database and programmatically lays out 7,600 volumes, complete with covers, and then uploads them to Lulu.com. In addition, he has compiled a Wikipedia Table of Contents and a Wikipedia Contributor Appendix…
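
Purely as an illustration of the volume-splitting step (this is not Mandiberg’s code), a loose sketch might stream article text and cut a new volume whenever a page budget is filled. The page and character counts below are illustrative guesses, not the project’s real numbers.

# Split a stream of (title, text) pairs into fixed-size "volumes" by page budget.
PAGES_PER_VOLUME = 700    # assumed
CHARS_PER_PAGE = 3000     # assumed

def split_into_volumes(articles):
    """articles: iterable of (title, text). Yields lists of article titles per volume."""
    budget = PAGES_PER_VOLUME * CHARS_PER_PAGE
    volume, used = [], 0
    for title, text in articles:
        volume.append(title)
        used += len(text)
        if used >= budget:
            yield volume
            volume, used = [], 0
    if volume:
        yield volume

sample = ((f"Article {i}", "x" * 40000) for i in range(200))   # toy input
for number, vol in enumerate(split_into_volumes(sample), start=1):
    print(f"Volume {number}: {len(vol)} articles")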

Michael Mandiberg is an interdisciplinary artist, scholar, and educator living in Brooklyn, New York. He received his M.F.A. from the California Institute of the Arts and his B.A. from Brown University. His work traces the lines of political and symbolic power online, working on the Internet in order to comment on and intercede in the real flows of information. His work lives at Mandiberg.com.

Print Wikipedia by Michael Mandiberg from Lulu.com on Vimeo.”

Four things policy-makers need to know about social media data and real time analytics


Ella McPherson at LSE’s Impact Blog: “I recently gave evidence to the House of Commons Science and Technology Select Committee. This was based on written evidence co-authored with my colleague, Anne Alexander, and submitted to their ongoing inquiry into social media data and real time analytics. Both Anne and I research the use of social media during contested times; Anne looks at its use by political activists and labour movement organisers in the Arab world, and I look at its use in human rights reporting. In both cases, the need to establish facticity is high, as is the potential for the deliberate or inadvertent falsification of information. Similarly to the case that Carruthers makes about war reporting, we believe that the political-economic, methodological, and ethical issues raised by media dynamics in the context of crisis are bellwethers for the dynamics in more peaceful and mundane contexts.

From our work we have learned four crucial lessons that policy-makers considering this issue should understand:

1.  Social media information is vulnerable to a variety of distortions – some typical of all information, and others more specific to the characteristics of social media communications….

2.  If social media information is used to establish events, it must be verified; while technology can hasten this process, it is unlikely to ever occur in real time due to the subjective, human element of judgment required….

3.  Verifying social media information may require identifying its source, which has ethical implications related to informed consent and anonymisation….

4.  Another way to think about social media information is as what Hermida calls an ‘awareness system,’ which reduces the need to collect source identities; under this approach, researchers look at volume rather than veracity to recognise information of interest… (More)

How We’re Changing the Way We Respond to Petitions


Jason Goldman (White House) at Medium: “…In 2011 (years before I arrived at the White House), the team here developed a petitions platform called We the People. It provided a clear and easy way for the American people to petition their government — along with a threshold for action: once a petition gains 100,000 signatures, it warrants an official response.

This was a new system for the United States government, announced as a flagship effort in the first U.S. Open Government National Action Plan. Right now it exists only for the White House (Hey, Congress! We have an open API! Get in touch!) Some other countries, including Germany and the United Kingdom, do online petitions, too. In fact, the European Parliament has even started its own online petitioning platform.
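
As a hedged sketch of what “getting in touch” with that open API might look like, the snippet below pulls open petitions from the Read API. The endpoint and field names reflect the documentation as I recall it and should be treated as assumptions; check the current developer docs before relying on them.

# Fetch open petitions from the We the People Read API (endpoint and fields assumed).
import requests

def open_petitions(limit=10):
    resp = requests.get(
        "https://api.whitehouse.gov/v1/petitions.json",   # assumed endpoint
        params={"status": "open", "limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])

for petition in open_petitions():
    # "title" and "signatureCount" are assumed field names from the published docs.
    print(petition.get("signatureCount"), petition.get("title"))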

For the most part, we’ve been pretty good about responding — before today, the Obama Administration had responded to 255 petitions that had collectively gathered more than 11 million signatures. That’s more than 91 percent of the petitions that have met our threshold requiring a response. Some responses have taken a little longer than others. But now, I’m happy to say, we have caught up.

Today, the White House is responding to every petition in our We the People backlog — 20 in all.

This means that nearly 2.5 million people who had petitioned us to take action on something heard back today. And it’s our goal to make that response the start of the conversation, not the final page. The White House is made up of offices that research and analyze the kinds of policy issues raised by these petitions, and leaders from those offices will be taking questions today, and in the weeks to come, from petition signers, on topics such as vaccination policy, community policing, and other petition subjects.

Take a look at more We the People stats here.

We’ll start the conversation on Twitter. Follow @WeThePeople, and join the conversation using hashtag #WeThePeople. (I’ll be personally taking your questions on @Goldman44 about how we’re changing the platform specifically at 3:30 p.m. Eastern.)

We the People, Moving Forward

We’re going to be changing a few things about We the People.

  1. First, from now on, if a petition meets the signature goal within a designated period of time, we will aim to respond to it — with an update or policy statement — within 60 days wherever possible. You can read about the details of our policy in the We the People Terms of Participation.
  2. Second, other outside petitions platforms are starting to tap into the We the People platform. We’re excited to announce today that Change.org is choosing to integrate with the We the People platform, meaning the future signatures of its 100 million users will count toward the threshold for getting an official response from the Administration. We’re also opening up the code behind petitions.whitehouse.gov on Drupal.org and GitHub, which empowers other governments and outside organizations to create their own versions of this platform to engage their own citizens and constituencies.
  3. Third, and most importantly, the process of hearing from us about your petition is going to look a little different. We’ve assembled a team of people responsible for taking your questions and requests and bringing them to the right people — whether within the White House or in an agency within the Administration — who may be in a position to say something about your request….(More)