Paper by Frank L. K. Ohemeng and Kwaku Ofosu-Adarkwa in Government Information Quarterly: “In recent years the necessity for governments to develop new public values of openness and transparency, and thereby increase their citizenries’ sense of inclusiveness, and their trust in and confidence about their governments, has risen to the point of urgency. The decline of trust in governments, especially in developing countries, has been unprecedented and continuous. A new paradigm that signifies a shift to citizen-driven initiatives over and above state- and market-centric ones calls for innovative thinking that requires openness in government. The need for this new synergy notwithstanding, Open Government cannot be considered truly open unless it also enhances citizen participation and engagement. The Ghana Open Data Initiative (GODI) project strives to create an open data community that will enable government (supply side) and civil society in general (demand side) to exchange data and information. We argue that the GODI is too narrowly focused on the supply side of the project, and suggest that it should generate an even platform to improve interaction between government and citizens to ensure a balance in knowledge sharing with and among all constituencies….(More)”
Big data algorithms can discriminate, and it’s not clear what to do about it
“This program had absolutely nothing to do with race…but multi-variable equations.”
That’s what Brett Goldstein, a former policeman for the Chicago Police Department (CPD) and current Urban Science Fellow at the University of Chicago’s School for Public Policy, said about a predictive policing algorithm he deployed at the CPD in 2010. His algorithm tells police where to look for criminals based on where people have been arrested previously. It’s a “heat map” of Chicago, and the CPD claims it helps them allocate resources more effectively.
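Goldstein hasn’t published the CPD’s code, but the generic form of such a “heat map” is easy to sketch. Below is a minimal illustration, assuming nothing more than historical arrest coordinates and an off-the-shelf kernel density estimate; it is emphatically not the CPD’s actual method:

```python
# Minimal sketch of an arrest-density "heat map" (illustrative only,
# not the CPD's algorithm). Input: historical arrests as (lat, lon) points.
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical arrest coordinates
arrests = np.array([
    [41.881, -87.623],
    [41.879, -87.629],
    [41.885, -87.621],
    [41.755, -87.586],
])

# Fit a kernel density estimate over the arrest locations
kde = gaussian_kde(arrests.T)

# Evaluate the density on a grid covering the city's bounding box
lats = np.linspace(41.64, 42.02, 100)
lons = np.linspace(-87.94, -87.52, 100)
grid_lat, grid_lon = np.meshgrid(lats, lons)
density = kde(np.vstack([grid_lat.ravel(), grid_lon.ravel()]))

# The densest cells are the "hot spots" where patrols get sent
for i in np.argsort(density)[-5:]:
    print(f"hot spot near ({grid_lat.ravel()[i]:.3f}, {grid_lon.ravel()[i]:.3f})")
```

Note what even this toy version makes plain: the model’s only input is where arrests have already happened, so it directs officers back to the most heavily policed areas, which is precisely the feedback loop at the heart of the questions that follow.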
Chicago police also recently collaborated with Miles Wernick, a professor of electrical engineering at the Illinois Institute of Technology, to algorithmically generate a “heat list” of 400 individuals they claim have the highest chance of committing a violent crime. In response to criticism, Wernick said the algorithm does not use “any racial, neighborhood, or other such information” and that the approach is “unbiased” and “quantitative.” By deferring decisions to poorly understood algorithms, industry professionals effectively shed accountability for any negative effects of their code.
But do these algorithms discriminate, treating low-income and black neighborhoods and their inhabitants unfairly? It’s the kind of question many researchers are starting to ask as more and more industries use algorithms to make decisions. It’s true that an algorithm itself is quantitative – it boils down to a sequence of arithmetic steps for solving a problem. The danger is that these algorithms, which are trained on data produced by people, may reflect the biases in that data, perpetuating structural racism and negative biases about minority groups.
There are a lot of challenges to figuring out whether an algorithm embodies bias. First and foremost, many practitioners and “computer experts” still don’t publicly admit that algorithms can easily discriminate. More and more evidence supports that not only is this possible, but it’s happening already. The law is unclear on the legality of biased algorithms, and even algorithms researchers don’t precisely understand what it means for an algorithm to discriminate….
While researchers clearly understand the theoretical dangers of algorithmic discrimination, it’s difficult to cleanly measure the scope of the issue in practice. No company or public institution is willing to publicize its data and algorithms for fear of being labeled racist or sexist, or maybe worse, having a great algorithm stolen by a competitor.
Even when the Chicago Police Department was hit with a Freedom of Information Act request, it did not release its algorithms or heat list, claiming a credible threat to police officers and the people on the list. This makes it difficult for researchers to identify problems and potentially provide solutions.
Legal hurdles
Existing discrimination law in the United States isn’t helping. At best, it’s unclear how it applies to algorithms; at worst, it’s a mess. Solon Barocas, a postdoc at Princeton, and Andrew Selbst, a law clerk for the Third Circuit US Court of Appeals, have argued that US hiring law fails to address claims about discriminatory hiring algorithms.
The crux of the argument is called the “business necessity” defense, in which the employer argues that a practice that has a discriminatory effect is justified by being directly related to job performance….(More)”
Making data open for everyone
Kathryn L.S. Pettit and Jonathan Schwabish at UrbanWire: “Over the past few years, there have been some exciting developments in open source tools and programming languages, business intelligence tools, big data, open data, and data visualization. These trends, and others, are changing the way we interact with and consume information and data. And that change is driving more organizations and governments to consider better ways to provide their data to more people.
The World Bank, for example, has a concerted effort underway to open its data in better and more visual ways. Google’s Public Data Explorer brings together large datasets from around the world into a single interface. For-profit providers like OpenGov and Socrata are helping local, state, and federal governments open their data (both internally and externally) in newer platforms.
We are firm believers in open data. (There are, of course, limitations to open data because of privacy or security, but that’s a discussion for another time.) But open data is not simply about putting more data on the Internet. It’s not just about posting files and telling people where to find them. To allow and encourage more people to use and interact with data, that data needs to be useful and readable not only by researchers, but also by the dad in northern Virginia or the student in rural Indiana who wants to know more about their public libraries.
Open data should be easy to access, analyze, and visualize
Many are working hard to provide more data in better ways, but we have a long way to go. Take, for example, the Congressional Budget Office (full disclosure, one of us used to work at CBO). Twice a year, CBO releases its Budget and Economic Outlook, which provides the 10-year budget projections for the federal government. Say you want to analyze 10-year budget projections for the Pell Grant program. You’d need to select “Get Data” and click on “Baseline Projections for Education” and then choose “Pell Grant Programs.” This brings you to a PDF report, where you can copy the data table you’re looking for into a format you can actually use (say, Excel). You would need to repeat the exercise to find projections for the 21 other programs for which the CBO provides data.
In another case, the Bureau of Labor Statistics has tried to provide users with query tools that avoid PDFs, but these still require extra processing steps. You can get the unemployment rate data through its Java applet (which doesn’t work on all browsers, by the way), select the various series you want, and click “Get Data.” On the subsequent screen, you are given some basic formatting options, but the default display shows all of your data series as separate Excel files. You can then copy and paste or download each one and piece them together.
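For contrast, here is what one-step, machine-readable access can look like. This sketch assumes the Bureau’s public JSON API (v2) and the seasonally adjusted unemployment-rate series LNS14000000; treat the endpoint and response fields as assumptions to check against the BLS documentation, not as gospel:

```python
# Hedged sketch: pulling unemployment data in one request, assuming the
# BLS public JSON API v2 and series LNS14000000 (unemployment rate, SA).
import requests

payload = {
    "seriesid": ["LNS14000000"],
    "startyear": "2013",
    "endyear": "2015",
}
resp = requests.post(
    "https://api.bls.gov/publicAPI/v2/timeseries/data/", json=payload
)
resp.raise_for_status()

# Flatten every series and observation into plain rows
for series in resp.json()["Results"]["series"]:
    for obs in series["data"]:
        print(series["seriesID"], obs["year"], obs["periodName"], obs["value"])
```

No applet, no per-series Excel files, and the same script re-runs unchanged next quarter.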
Taking a step closer to the ideal of open data, the Institute of Museum and Library Services (IMLS) followed President Obama’s May 2013 executive order to make its data open in a machine-readable format. That’s great, but it only goes so far. The IMLS platform, for example, allows you to explore information about your own public library. But the data are labeled with variable names such as BRANLIB and BKMOB that are neither intuitive nor clear. Users then have to find the data dictionary to understand what the data fields mean, how they’re defined, and how to use them.
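To make that concrete: once a user has located the data dictionary, the cryptic columns can at least be renamed in bulk. In this sketch the file name is hypothetical and the two column meanings are our reading of the dictionary (BRANLIB as branch libraries, BKMOB as bookmobiles), not official definitions:

```python
# Sketch: renaming cryptic IMLS columns with a data-dictionary mapping.
# Column meanings are assumed from our reading of the data dictionary:
#   BRANLIB -> number of branch libraries, BKMOB -> number of bookmobiles.
import pandas as pd

DATA_DICTIONARY = {
    "BRANLIB": "branch_libraries",
    "BKMOB": "bookmobiles",
}

df = pd.read_csv("imls_public_libraries.csv")  # hypothetical file name
df = df.rename(columns=DATA_DICTIONARY)
print(df[["branch_libraries", "bookmobiles"]].describe())
```

The point is not that the step is hard, but that every single user has to repeat it before the data become human-readable.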
These efforts to provide more data represent real progress, but often fail to be useful to the average person. They move from publishing data that are not readable (buried in PDFs or systems that allow the user to see only one record at a time) to data that are machine-readable (libraries of raw data files or APIs, from which data can be extracted using computer code). We now need to move from a world in which data are simply machine-readable to one in which data are human-readable….(More)”
New Privacy Research Has Implications for Design and Policy
Jedidiah Bracy at PrivacyTech: “Try visualizing the Internet’s basic architecture. Could you draw it? What would be your mental model for it?
Let’s be more specific: Say you just purchased shoes off a website using your mobile phone at work. How would you visualize that digital process? Would a deeper knowledge of this architecture make more apparent the myriad potential privacy risks in this transaction? Or to put it another way, what would your knowledge, or lack thereof, for these architectural underpinnings reveal about your understanding of privacy and security risks?
Whether you’re a Luddite or a tech wiz, creating these mental models of the Internet is not the easiest endeavor. Just try doing so yourself.
It is an exercise, however, that several individuals underwent for new research that has instructive implications for privacy and security pros.
“So everything I do on the Internet or that other people do on the Internet is basically asking the Internet for information, and the Internet is sending us to various places where the information is and then bringing us back.” – CO1
You’d think those who have a better understanding of how the Internet works would probably have a better understanding of the privacy and security risks, right? Most likely. Paradoxically, though, a better technological understanding may have very little influence on an individual’s response to potential privacy risks.
This is what a dedicated team of researchers from Carnegie Mellon University worked to discover recently in their award-winning paper, “My Data Just Goes Everywhere”: User Mental Models of the Internet and Implications for Privacy and Security—a culmination of research from Ruogu Kang, Laura Dabbish, Nathaniel Fruchter and Sara Kiesler—all from CMU’s Human-Computer Interaction Institute and the Heinz College in Pittsburgh, PA.
“I try to browse through the terms and conditions but there’s so much there I really don’t retain it.” – T11
Presented at the CyLab Usable Privacy and Security Laboratory’s (CUPS) 11th Symposium on Usable Privacy and Security (SOUPS), their research demonstrated that even though savvy and non-savvy users of the Internet have very different perceptions of its architecture, such knowledge was not predictive of whether a user would take the necessary steps to protect their privacy online. Experience, rather, appears to play a more determining role.
Kang, who led the team, said she was surprised by the results….(More)”
Open data can unravel the complex dealings of multinationals
Brett Scott in The Guardian: “…Just like we have complementary currencies to address shortcomings in national monetary systems, we now need to encourage an alternative accounting sector to address shortcomings in global accounting systems.
So what might this look like? We already are seeing the genesis of this in the corporate open data sector. OpenCorporates in London has been a pioneer in this field, creating a global unique identifier system to make it easier to map corporations. Groups like OpenOil in Berlin are now using the OpenCorporates classification system to map companies like BP. Under the tagline “Imagine an open oil industry”, they have also begun mapping ground-level contract and concession data, and are currently building tools to allow the public to model the economics of particular mines and oil fields. This could prove useful in situations where doubt is cast on the value of particular assets controlled by public companies in politically fragile states.
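To give a flavor of what OpenCorporates’ identifier system enables: every company resolves to a jurisdiction code plus a company number, and that pair can be retrieved programmatically. The sketch below assumes the public v0.4 search endpoint and its documented response shape; verify both against the current API docs before relying on them:

```python
# Hedged sketch: searching OpenCorporates by company name. The endpoint
# and response layout are assumed from the v0.4 API documentation.
import requests

resp = requests.get(
    "https://api.opencorporates.com/v0.4/companies/search",
    params={"q": "BP"},
)
resp.raise_for_status()

for result in resp.json()["results"]["companies"]:
    company = result["company"]
    # jurisdiction code + company number together form the unique identifier
    print(company["jurisdiction_code"],
          company["company_number"],
          company["name"])
```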
According to OpenOil’s Anton Rühling, a variety of parties have started to use their information. “During the recent conflicts in Yemen we had a sudden spike in downloads of our Yemeni oil contract information. We traced this to UAE, where a lot of financial lawyers and investors are based. They were clearly wanting to see how the contracts could be affected.” Their BP map has even attracted interest from senior BP officials. “We were contacted by finance executives who were eager to discuss the results.”
Open mapping
Another emerging pillar of the alternative accounting sector is supply chain mapping. The supply chain largely remains a mystery: in standard corporate accounts, suppliers appear as mere expenses, with no information about where they are based or what their standards are. In the absence of corporate management volunteering that information, Sourcemap has created an open platform for people to create supply chain maps themselves. Progressive-minded companies – such as Fairphone – have now begun to volunteer supply chain information on the platform.
One industry forum that is actively pondering alternative accounting is ICAEW’s AuditFutures programme. It recently teamed up with the Royal College of Art’s service design programme to build design thinking into accounting practice. AuditFutures’ Martin Martinoff wants accountants to see themselves as creative innovators for the public interest. “Imagine getting 10,000 auditors online together to develop an open crowdsourced audit platform.”…(More)
Push, Pull, and Spill: A Transdisciplinary Case Study in Municipal Open Government
New paper by Jan Whittington et al: “Cities hold considerable information, including details about the daily lives of residents and employees, maps of critical infrastructure, and records of the officials’ internal deliberations. Cities are beginning to realize that this data has economic and other value: If done wisely, the responsible release of city information can also release greater efficiency and innovation in the public and private sector. New services are cropping up that leverage open city data to great effect.
Meanwhile, activist groups and individual residents are placing increasing pressure on state and local government to be more transparent and accountable, even as others sound an alarm over the privacy issues that inevitably attend greater data promiscuity. This takes the form of political pressure to release more information, as well as increased requests for information under the many public records acts across the country.
The result of these forces is that cities are beginning to open their data as never before. It turns out there is surprisingly little research to date into the important and growing area of municipal open data. This article is among the first sustained, cross-disciplinary assessments of an open municipal government system. We are a team of researchers in law, computer science, information science, and urban studies. We have worked hand-in-hand with the City of Seattle, Washington for the better part of a year to understand its current procedures from each disciplinary perspective. Based on this empirical work, we generate a set of recommendations to help the city manage risk latent in opening its data….(More)”
Algorithms and Bias
Q. and A. With Cynthia Dwork in the New York Times: “Algorithms have become one of the most powerful arbiters in our lives. They make decisions about the news we read, the jobs we get, the people we meet, the schools we attend and the ads we see.
Yet there is growing evidence that algorithms and other types of software can discriminate. The people who write them incorporate their biases, and algorithms often learn from human behavior, so they reflect the biases we hold. For instance, research has shown that ad-targeting algorithms have shown ads for high-paying jobs to men but not women, and ads for high-interest loans to people in low-income neighborhoods.
Cynthia Dwork, a computer scientist at Microsoft Research in Silicon Valley, is one of the leading thinkers on these issues. In an Upshot interview, which has been edited, she discussed how algorithms learn to discriminate, who’s responsible when they do, and the trade-offs between fairness and privacy.
Q: Some people have argued that algorithms eliminate discrimination because they make decisions based on data, free of human bias. Others say algorithms reflect and perpetuate human biases. What do you think?
A: Algorithms do not automatically eliminate bias. Suppose a university, with admission and rejection records dating back for decades and faced with growing numbers of applicants, decides to use a machine learning algorithm that, using the historical records, identifies candidates who are more likely to be admitted. Historical biases in the training data will be learned by the algorithm, and past discrimination will lead to future discrimination.
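Dwork’s admissions scenario is easy to reproduce on synthetic data. In the toy sketch below (ours, not hers), historical decisions penalized group 1, the group label is withheld from the model, and a correlated proxy feature still lets the model learn the old bias:

```python
# Toy demonstration (synthetic data): a model trained on biased historical
# admissions reproduces the bias even when the group label is removed.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)            # 0 or 1; withheld from the model
merit = rng.normal(0.0, 1.0, n)          # true qualification
proxy = group + rng.normal(0.0, 0.5, n)  # e.g. neighborhood; tracks group

# Historical decisions penalized group 1 regardless of merit
admitted = (merit - 0.8 * group + rng.normal(0.0, 0.3, n)) > 0

X = np.column_stack([merit, proxy])      # note: group itself is excluded
pred = LogisticRegression().fit(X, admitted).predict(X)

for g in (0, 1):
    print(f"group {g}: predicted admit rate {pred[group == g].mean():.2f}")
# Group 1 is admitted at a visibly lower rate: past discrimination, learned.
```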
Q: Are there examples of that happening?
A: A famous example of a system that has wrestled with bias is the resident matching program that matches graduating medical students with residency programs at hospitals. The matching could be slanted to maximize the happiness of the residency programs, or to maximize the happiness of the medical students. Prior to 1997, the match was mostly about the happiness of the programs.
This changed in 1997 in response to “a crisis of confidence concerning whether the matching algorithm was unreasonably favorable to employers at the expense of applicants, and whether applicants could ‘game the system,’ ” according to a paper by Alvin Roth and Elliott Peranson published in The American Economic Review.
Q: You have studied both privacy and algorithm design, and co-wrote a paper, “Fairness Through Awareness,” that came to some surprising conclusions about discriminatory algorithms and people’s privacy. Could you summarize those?
A: “Fairness Through Awareness” makes the observation that sometimes, in order to be fair, it is important to make use of sensitive information while carrying out the classification task. This may be a little counterintuitive: The instinct might be to hide information that could be the basis of discrimination….
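The paper’s actual construction is a Lipschitz condition (similar individuals must receive similar outcomes), but the counterintuitive headline can be shown with something much cruder. In this illustration, ours rather than the paper’s, a score is systematically depressed for one group; a “blind” global threshold then discriminates, while using the sensitive attribute to adjust the threshold restores parity:

```python
# Illustration (ours, not the paper's construction): when a score is
# miscalibrated for one group, ignoring group membership discriminates,
# while using it to set group-specific thresholds restores parity.
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
quality_a = rng.normal(1.0, 1.0, n)   # group A: true quality
quality_b = rng.normal(1.0, 1.0, n)   # group B: identical true quality...
score_a = quality_a
score_b = quality_b - 0.7             # ...but its score runs 0.7 lower

threshold = 0.5                       # one "blind" cutoff for everyone
print("blind :", (score_a > threshold).mean(), (score_b > threshold).mean())

# Group-aware: shift group B's cutoff by its known score offset
print("aware :", (score_a > threshold).mean(),
      (score_b > threshold - 0.7).mean())
```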
Q: The law protects certain groups from discrimination. Is it possible to teach an algorithm to do the same?
A: This is a relatively new problem area in computer science, and there are grounds for optimism — for example, resources from the Fairness, Accountability and Transparency in Machine Learning workshop, which considers the role that machines play in consequential decisions in areas like employment, health care and policing. This is an exciting and valuable area for research. …(More)”
Open Data and Sub-national Governments: Lessons from Developing Countries
WebFoundation: “Open government data (OGD) as a concept is gaining currency globally thanks to the strong advocacy of global organisations such as the Open Government Partnership. In recent years, national governments have shown increased commitment to proactively disclosing information. However, much of the discussion on OGD stays at the national level, especially in developing countries, where proactive disclosure is conditioned by the commitments national governments express through their OGP national action plans. Yet the local level matters for open data: in decentralized contexts, the local is where data is collected and stored, where publication is most feasible, and where data can generate the most impact when used. This synthesis paper seeks to refocus the discussion of open government data on sub-national contexts by analysing nine country papers produced through the Open Data in Developing Countries research project.
Using a common research framework focused on context, governance setting, and open data initiatives, the study found that sub-national governments are making substantial efforts to proactively disclose data; however, the design of these initiatives delimits citizen participation and, eventually, use. Second, context demands different roles for intermediaries and different types of initiatives to create an enabling environment for open data. Finally, data quality will remain a critical challenge for sub-national governments in developing countries, and it will temper the potential impact that open data can generate. Download the full research paper here.”
100 parliaments as open data, ready for you to use
Myfanwy Nixon at mySociety’s blog and OpeningParliament: “If you need data on the people who make up your parliament, another country’s parliament, or indeed all parliaments, you may be in luck.
Every Politician, the latest Poplus project, aims to collect, store and share information about every parliament in the world, past and present—and it already contains 100 of them.
What’s more, it’s all provided as Open Data to anyone who would like to use it to power a civic tech project. We’re thinking parliamentary monitoring organisations, journalists, groups who run access-to-democracy sites like our own WriteToThem, and especially researchers who want to do analysis across multiple countries.
But isn’t that data already available?
Yes and no. There’s no doubt that you can find details of most parliaments online, whether on official government websites, on Wikipedia, or in a variety of other places.
But, as you might expect from data that’s coming from hundreds of different sources, it’s in a multitude of different formats. That makes it very hard to work with in any kind of consistent fashion.
Every Politician standardises all of its data into the Popolo standard and then provides it in two simple downloadable formats:
- CSV, which contains basic data that’s easy to work with in spreadsheets
- JSON, which contains richer data on each person and is ideal for developers
This standardisation means that it should now be a lot easier to work on projects across multiple countries, or to compare one country’s data with another. It also means that data works well with other Poplus Components….(More)”
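As a hedged illustration of what that consistency buys you, here is a minimal sketch of consuming one of the CSV downloads. The file name and the “group” column header are illustrative assumptions; the real headers follow the Popolo standard and can be checked against any downloaded file:

```python
# Minimal sketch of working with an EveryPolitician CSV download.
# "term-2015.csv" and the "group" header are assumptions for illustration.
import csv
from collections import Counter

with open("term-2015.csv", newline="", encoding="utf-8") as f:
    members = list(csv.DictReader(f))

# One schema for every country means the same two lines answer the same
# question anywhere: how many seats does each party ("group") hold?
for party, seats in Counter(m["group"] for m in members).most_common(5):
    print(party, seats)
```

Because the schema is shared, the identical script runs unchanged against the download for any of the 100 parliaments.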
Beyond the Common Rule: Ethical Structures for Data Research in Non-Academic Settings
Future of Privacy Forum: “In the wake of last year’s news about the Facebook “emotional contagion” study and subsequent public debate about the role of A/B Testing and ethical concerns around the use of Big Data, FPF Senior Fellow Omer Tene participated in a December symposium on corporate consumer research hosted by Silicon Flatirons. This past month, the Colorado Technology Law Journal published a series of papers that emerged out of the symposium, including “Beyond the Common Rule: Ethical Structures for Data Research in Non-Academic Settings.”
“Beyond the Common Rule,” by Jules Polonetsky, Omer Tene, and Joseph Jerome, continues the Future of Privacy Forum’s effort to build on the notion of consumer subject review boards first advocated by Ryan Calo at FPF’s 2013 Big Data symposium. It explores how researchers, increasingly in corporate settings, are analyzing data and testing theories using often sensitive personal information. Many of these new uses of PII are simply natural extensions of current practices, and are either within the expectations of individuals or the bounds of the FIPPs. Yet many of these projects could involve surprising applications or uses of data that exceed user expectations, and offering notice and obtaining consent may not be feasible.
This article expands on ideas and suggestions put forward around the recent discussion draft of the White House Consumer Privacy Bill of Rights, which espouses “Privacy Review Boards” as a safety valve for noncontextual data uses. It explores how existing institutional review boards within the academy and for human-subjects research could offer lessons for guiding principles, providing accountability and enhancing consumer trust, and offers suggestions for how companies — and researchers — can pursue both knowledge and data innovation responsibly and ethically….(More)”