Paper by Giuseppe D’Acquisto: “The legal framework on data protection in Europe sets a high standard with regard to the possible re-use of personal data. Principles like purpose limitation and data minimization challenge the emerging Big Data paradigm, where the “value” of data is linked to potential future uses that are still largely unpredictable. Nevertheless, the re-use of data is not impossible once the data are properly anonymized. In 2014, the EU’s Article 29 Working Party published an Opinion on the application of anonymization techniques, which can be implemented to enable the re-use of previously collected data within a framework of safeguards for individuals. The paper reviews the main elements of the Opinion, with a view to the widespread adoption of anonymization policies enabling fair re-use of data, and gives an overview of the legal and technical aspects of anonymization, pointing out the many misconceptions surrounding the issue…(More)”
Harnessing the Internet of Everything to Serve the Public Good
Brian Gill at Socrata: “…Thanks to sensor-based objects, big data is getting bigger, and that presents opportunities — and considerations — for government organizations.
Picture this: It’s a sunny summer’s day a few years from now, plants are in full bloom, and you’re strolling through a major city park. Unfortunately, your eyes are watering as the itchy onset of a pollen-induced allergy attack begins to spoil the experience.
Pulling out your phone, you consult a data visualization showing the park’s hour-by-hour pollen count densities. You then choose a new path, one with different vegetation and lighter pollen counts, and go about your way, barely noticing the egg-shaped nodes in the canopy monitoring everything from pollen to air quality to foot-traffic trends.
Welcome to the Internet of Everything, city edition.
….tapping into, and especially generating, IoT data streams is a natural fit for larger municipal governments, which not only have the fiscal resources needed to put IoT data to work but also an innate motivator: improving citizens’ lives.
Consider Chicago’s Array of Things project, an experimental network of modular sensor boxes installed around the city’s core. Think of it as an urban fitness-data tracker: The nodes collect real-time data on the city’s environment and infrastructure for research and public use, with the first units focusing on atmosphere, air quality, and environmental factors such as temperature, humidity, and light.
From this data alone, the potential applications are exciting, such as using air, sound, and vibration data to monitor vehicle traffic, or infrared sensors to measure street temperature to guide salting responses during winter storms. The thinkers behind Array of Things can even envision a downtown where street lamp poles alert pedestrians to icy sidewalk patches and apps guide people to safe nocturnal walking routes.
This is all cool stuff, but “outcome” is the key word here, says McInnis, who recommends a bottom-up approach when assessing IoT opportunities. Instead of worrying whether you currently possess the technical infrastructure to harness IoT data, he says, first determine what you want to achieve, be it water quality monitoring or winter sidewalk safety, and then work from there — you may even already have IoT data streams that can be redeployed.
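To make that concrete, here is a minimal sketch of that outcome-first logic in Python, using winter sidewalk safety as the goal. The node names, data schema, and readings are invented for illustration; they are not the Array of Things API.

```python
# Toy example: flag street segments for salting when the surface
# temperature reported by nearby sensor nodes drops to freezing or below.
# All node IDs, segments, and readings below are fabricated.

FREEZING_C = 0.0

readings = [
    {"node": "node-017", "segment": "State St 1200N", "surface_temp_c": -2.4},
    {"node": "node-021", "segment": "Clark St 800N", "surface_temp_c": 1.3},
    {"node": "node-035", "segment": "Lake Shore Dr 600N", "surface_temp_c": -0.7},
]

def segments_to_salt(readings, threshold=FREEZING_C):
    """Return the street segments whose reported surface temperature
    is at or below the threshold."""
    return sorted({r["segment"] for r in readings if r["surface_temp_c"] <= threshold})

for segment in segments_to_salt(readings):
    print("Dispatch salting crew:", segment)
```

The point is the direction of travel: the desired outcome (safe sidewalks) defines the data you need (surface temperature by segment), not the other way around.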
And as cities like Chicago are demonstrating, the Internet of Things not only has the potential to reshape how municipalities can harness a world of increasing object-driven data; it’s helping reshape how cities think about the nature of usable data.
In other words, all these new IoT data streams are actually like water, a natural resource. And just as water flows from many sources, government IoT data can also be collected, channeled, and processed like any utility — and serve as a powerful public good. …(More)”
The World of Indicators: The Making of Governmental Knowledge through Quantification
New Book by Richard Rottenburg et al: “The twenty-first century has seen a further dramatic increase in the use of quantitative knowledge for governing social life after its explosion in the 1980s. Indicators and rankings play an increasing role in the way governmental and non-governmental organizations distribute attention, make decisions, and allocate scarce resources. Quantitative knowledge promises to be more objective and straightforward as well as more transparent and open for public debate than qualitative knowledge, thus producing more democratic decision-making. However, we know little about the social processes through which this knowledge is constituted or about its effects. Understanding how such numeric knowledge is produced and used is increasingly important as proliferating technologies of quantification alter modes of knowing in subtle and often unrecognized ways. This book explores the implications of the global multiplication of indicators as a specific technology of numeric knowledge production used in governance. (More)”
When Big Data Becomes Bad Data
Lauren Kirchner at ProPublica: “A recent ProPublica analysis of The Princeton Review’s prices for online SAT tutoring shows that customers in areas with a high density of Asian residents are often charged more. When presented with this finding, The Princeton Review called it an “incidental” result of its geographic pricing scheme. The case illustrates how even a seemingly neutral price model could potentially lead to inadvertent bias — bias that’s hard for consumers to detect and even harder to challenge or prove.
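Mechanically, screening for this kind of disparity is simple once you have outcome data by group; the hard part is that consumers never see that data. Below is a minimal sketch of the comparison such analyses run. All counts are invented, and the 0.8 threshold is the "four-fifths" benchmark regulators commonly use as a rough screen for adverse impact.

```python
# Toy disparity screen: compare the share of customers quoted the
# lower price across two (hypothetical) groups of ZIP codes.

def rate(favorable, total):
    return favorable / total

# (customers quoted the lower price, total customers) -- fabricated counts
groups = {
    "high_asian_density_zips": (310, 1000),
    "all_other_zips": (520, 1000),
}

rates = {name: rate(*counts) for name, counts in groups.items()}
impact_ratio = min(rates.values()) / max(rates.values())

print(rates)                                # {'high_asian_density_zips': 0.31, 'all_other_zips': 0.52}
print(f"impact ratio: {impact_ratio:.2f}")  # 0.60
print("flags under the four-fifths screen:", impact_ratio < 0.8)
```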
Over the past several decades, an important tool for assessing and addressing discrimination has been the “disparate impact” theory. Attorneys have used this idea to successfully challenge policies that have a discriminatory effect on certain groups of people, whether or not the entity that crafted the policy was motivated by an intent to discriminate. It’s been deployed in lawsuits involving employment decisions, housing and credit. Going forward, the question is whether the theory can be applied to bias that results from new technologies that use algorithms….(More)”
Meaningful Consent: The Economics of Privity in Networked Environments
Paper by Jonathan Cave: “Recent work on privacy (e.g. WEIS 2013/4, Meaningful Consent in the Digital Economy project) recognises the unanticipated consequences of data-centred legal protections in a world of shifting relations between data and human actors. But the rules have not caught up with these changes, and the irreversible consequences of ‘make do and mend’ are not often taken into account when changing policy.
Many of the most-protected ‘personal’ data are not personal at all, but are created to facilitate the operation of larger (e.g. administrative, economic, transport) systems or inadvertently generated by using such systems. The protection given to such data typically rests on notions of informed consent even in circumstances where such consent may be difficult to define, harder to give and nearly impossible to certify in meaningful ways. Such protections typically involve a mix of data collection, access and processing rules that are either imposed on behalf of individuals or are to be exercised by them. This approach adequately protects some personal interests, but not all – and is definitely not future-proof. Boundaries between allowing individuals to discover and pursue their interests on one side and behavioural manipulation on the other are often blurred. The costs (psychological and behavioural as well as economic and practical) of exercising control over one’s data are rarely taken into account as some instances of the Right to be Forgotten illustrate. The purposes for which privacy rights were constructed are often forgotten, or have not been reinterpreted in a world of ubiquitous monitoring data, multi-person ‘private exchanges,’ and multiple pathways through which data can be used to create and to capture value. Moreover, the parties who should be involved in making decisions – those connected by a network of informational relationships – are often not in contractual, practical or legal contact. These developments, associated with e.g. the Internet of Things, Cloud computing and big data analytics, should be recognised as challenging privacy rules and, more fundamentally, the adequacy of informed consent (e.g. to access specified data for specified purposes) as a means of managing innovative, flexible, and complex informational architectures.
This paper presents a framework for organising these challenges and using them to evaluate proposed policies, specifically in relation to complex, automated, automatic or autonomous data collection, processing and use. It argues for a movement away from a system of property rights based on individual consent to a values-based ‘privity’ regime – a collection of differentiated (relational as well as property) rights and consents that may be better able to accommodate innovations. Privity regimes (see deFillipis 2006) bundle together rights regarding e.g. confidential disclosure with ‘standing’ or voice options in relation to informational linkages.
The impacts are examined through a game-theoretic comparison between the proposed privity regime and existing privacy rights in personal data markets that include: conventional ‘behavioural profiling’ and search; situations where third parties may have complementary roles or conflicting interests in such data, and where data have value in relation both to specific individuals and to larger groups (e.g. ‘real-world’ health data); n-sided markets on data platforms (including social and crowd-sourcing platforms with long and short memories); and the use of ‘privity-like’ rights inherited by data objects and by autonomous systems whose ownership may be shared among many people….(More)”
Big data algorithms can discriminate, and it’s not clear what to do about it
“This program had absolutely nothing to do with race…but multi-variable equations.”
That’s what Brett Goldstein, a former policeman for the Chicago Police Department (CPD) and current Urban Science Fellow at the University of Chicago’s School for Public Policy, said about a predictive policing algorithm he deployed at the CPD in 2010. His algorithm tells police where to look for criminals based on where people have been arrested previously. It’s a “heat map” of Chicago, and the CPD claims it helps them allocate resources more effectively.
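The core mechanic is easy to sketch: bin historical arrest coordinates into grid cells and rank the cells by count. The version below is a deliberately crude illustration with fabricated coordinates; any real deployment layers on much more (recency weighting, crime types, and so on), and nothing here is the CPD's actual method.

```python
# Toy "heat map": count arrests per grid cell and rank the hottest cells.
from collections import Counter

CELL = 0.01  # grid cell size in degrees (roughly 1 km); an arbitrary choice

arrests = [  # (latitude, longitude) -- fabricated sample points
    (41.881, -87.623), (41.882, -87.624), (41.883, -87.622),
    (41.901, -87.650), (41.779, -87.644), (41.780, -87.645),
]

def to_cell(lat, lon, cell=CELL):
    """Snap a coordinate to the south-west corner of its grid cell."""
    return (round(lat // cell * cell, 3), round(lon // cell * cell, 3))

heat = Counter(to_cell(lat, lon) for lat, lon in arrests)

for cell, count in heat.most_common(3):
    print(f"cell {cell}: {count} arrests")
```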
Chicago police also recently collaborated with Miles Wernick, a professor of electrical engineering at Illinois Institute of Technology, to algorithmically generate a “heat list” of 400 individuals it claims have the highest chance of committing a violent crime. In response to criticism, Wernick said the algorithm does not use “any racial, neighborhood, or other such information” and that the approach is “unbiased” and “quantitative.” By deferring decisions to poorly understood algorithms, industry professionals effectively shed accountability for any negative effects of their code.
But do these algorithms discriminate, treating low-income and black neighborhoods and their inhabitants unfairly? It’s the kind of question many researchers are starting to ask as more and more industries use algorithms to make decisions. It’s true that an algorithm itself is quantitative – it boils down to a sequence of arithmetic steps for solving a problem. The danger is that these algorithms, which are trained on data produced by people, may reflect the biases in that data, perpetuating structural racism and negative biases about minority groups.
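One stylized way to see the risk: if tomorrow's patrols are allocated in proportion to yesterday's arrests, and recorded arrests scale with patrol intensity, an arbitrary head start in the data never washes out, even when the true underlying crime rates are identical. The numbers below are invented purely to illustrate that dynamic.

```python
# Stylized feedback loop: patrols follow past arrests, new arrests follow
# patrols, so an initial 60/40 split in the records persists indefinitely
# despite identical true crime rates in both areas.

true_crime_rate = {"A": 0.05, "B": 0.05}  # identical by construction
arrests = {"A": 12, "B": 8}               # area A starts with more records

for week in range(1, 6):
    total = sum(arrests.values())
    for area in arrests:
        patrol_share = arrests[area] / total  # patrols mirror the data
        # expected new arrests: patrol intensity x (equal) true rate x scale
        arrests[area] += patrol_share * true_crime_rate[area] * 400
    print(f"week {week}: A={arrests['A']:.0f}, B={arrests['B']:.0f}")
```

Area A keeps looking 50 percent "hotter" than area B in every week's records, although by construction the two areas are identical.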
There are a lot of challenges to figuring out whether an algorithm embodies bias. First and foremost, many practitioners and “computer experts” still don’t publicly admit that algorithms can easily discriminate. More and more evidence supports that not only is this possible, but it’s happening already. The law is unclear on the legality of biased algorithms, and even algorithms researchers don’t precisely understand what it means for an algorithm to discriminate….
While researchers clearly understand the theoretical dangers of algorithmic discrimination, it’s difficult to cleanly measure the scope of the issue in practice. No company or public institution is willing to publicize its data and algorithms for fear of being labeled racist or sexist, or maybe worse, having a great algorithm stolen by a competitor.
Even when the Chicago Police Department was hit with a Freedom of Information Act request, it did not release its algorithms or heat list, claiming a credible threat to police officers and to the people on the list. This makes it difficult for researchers to identify problems and potentially provide solutions.
Legal hurdles
Existing discrimination law in the United States isn’t helping. At best, it’s unclear on how it applies to algorithms; at worst, it’s a mess. Solon Barocas, a postdoc at Princeton, and Andrew Selbst, a law clerk for the Third Circuit US Court of Appeals, have argued that US hiring law fails to address claims about discriminatory algorithms in hiring.
The crux of the argument is called the “business necessity” defense, in which the employer argues that a practice that has a discriminatory effect is justified by being directly related to job performance….(More)”
Making data open for everyone
Kathryn L.S. Pettit and Jonathan Schwabish at UrbanWire: “Over the past few years, there have been some exciting developments in open source tools and programming languages, business intelligence tools, big data, open data, and data visualization. These trends, and others, are changing the way we interact with and consume information and data. And that change is driving more organizations and governments to consider better ways to provide their data to more people.
The World Bank, for example, has a concerted effort underway to open its data in better and more visual ways. Google’s Public Data Explorer brings together large datasets from around the world into a single interface. For-profit providers like OpenGov and Socrata are helping local, state, and federal governments open their data (both internally and externally) in newer platforms.
We are firm believers in open data. (There are, of course, limitations to open data because of privacy or security, but that’s a discussion for another time.) But open data is not simply about putting more data on the Internet. It’s not only about posting files and telling people where to find them. To allow and encourage more people to use and interact with data, that data needs to be useful and readable not only by researchers, but also by the dad in northern Virginia or the student in rural Indiana who wants to know more about their public libraries.
Open data should be easy to access, analyze, and visualize
Many are working hard to provide more data in better ways, but we have a long way to go. Take, for example, the Congressional Budget Office (full disclosure, one of us used to work at CBO). Twice a year, CBO releases its Budget and Economic Outlook, which provides the 10-year budget projections for the federal government. Say you want to analyze 10-year budget projections for the Pell Grant program. You’d need to select “Get Data” and click on “Baseline Projections for Education” and then choose “Pell Grant Programs.” This brings you to a PDF report, where you can copy the data table you’re looking for into a format you can actually use (say, Excel). You would need to repeat the exercise to find projections for the 21 other programs for which the CBO provides data.
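In practice, getting those numbers out programmatically means scraping the PDF. Here is a sketch of the kind of workaround users resort to, assuming the tabula-py package (a Python wrapper around Tabula, which needs a Java runtime installed); the file name and page number are hypothetical stand-ins for a downloaded CBO baseline document.

```python
# Extract a budget-projection table from a (hypothetical) CBO baseline PDF.
import tabula  # pip install tabula-py; requires a Java runtime

tables = tabula.read_pdf("pell-grant-baseline.pdf", pages=3)  # list of DataFrames
pell = tables[0]
pell.to_csv("pell-grant-baseline.csv", index=False)  # finally, a usable format
```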
In another case, the Bureau of Labor Statistics has tried to provide users with query tools that avoid the use of PDFs, but still require extra steps to process. You can get the unemployment rate data through their Java Applet (which doesn’t work on all browsers, by the way), select the various series you want, and click “Get Data.” On the subsequent screen, you are given some basic formatting options, but the default display shows all of your data series as separate Excel files. You can then copy and paste or download each one and then piece them together.
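To be fair, BLS also publishes a public JSON API that sidesteps the applet entirely; a minimal sketch follows, assuming the v2 time-series endpoint and the standard series ID for the civilian unemployment rate (LNS14000000) work as documented. Even here, note how much ceremony stands between a citizen and a single number.

```python
# Fetch the monthly unemployment rate from the BLS public API (v2).
import json
import urllib.request

payload = json.dumps({
    "seriesid": ["LNS14000000"],  # civilian unemployment rate
    "startyear": "2013",
    "endyear": "2014",
}).encode("utf-8")

req = urllib.request.Request(
    "https://api.bls.gov/publicAPI/v2/timeseries/data/",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

for obs in result["Results"]["series"][0]["data"]:
    print(obs["year"], obs["periodName"], obs["value"])
```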
Taking a step closer to the ideal of open data, the Institute of Museum and Library Services (IMLS) followed President Obama’s May 2013 executive order to make their data open in a machine-readable format. That’s great, but it only goes so far. The IMLS platform, for example, allows you to explore information about your own public library. But the data are labeled with variable names such as BRANLIB and BKMOB that are not intuitive or clear. Users then have to find the data dictionary to understand what data fields mean, how they’re defined, and how to use them.
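Once you have the dictionary in hand, the fix is mechanical; the shame is that every user has to do it. A sketch with pandas, where the file name is hypothetical and the two mappings reflect our reading of the IMLS data dictionary (BRANLIB as the number of branch libraries, BKMOB as the number of bookmobiles):

```python
# Rename cryptic IMLS survey columns to human-readable ones.
import pandas as pd

READABLE = {
    "BRANLIB": "branch_libraries",  # per our reading of the data dictionary
    "BKMOB": "bookmobiles",
}

libraries = pd.read_csv("imls-public-libraries.csv")  # hypothetical export
libraries = libraries.rename(columns=READABLE)
print(libraries[["branch_libraries", "bookmobiles"]].describe())
```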
These efforts to provide more data represent real progress, but often fail to be useful to the average person. They move from publishing data that are not readable (buried in PDFs or systems that allow the user to see only one record at a time) to data that are machine-readable (libraries of raw data files or APIs, from which data can be extracted using computer code). We now need to move from a world in which data are simply machine-readable to one in which data are human-readable….(More)”
Beyond the Common Rule: Ethical Structures for Data Research in Non-Academic Settings
Future of Privacy Forum: “In the wake of last year’s news about the Facebook “emotional contagion” study and subsequent public debate about the role of A/B Testing and ethical concerns around the use of Big Data, FPF Senior Fellow Omer Tene participated in a December symposium on corporate consumer research hosted by Silicon Flatirons. This past month, the Colorado Technology Law Journal published a series of papers that emerged out of the symposium, including “Beyond the Common Rule: Ethical Structures for Data Research in Non-Academic Settings.”
“Beyond the Common Rule,” by Jules Polonetsky, Omer Tene, and Joseph Jerome, continues the Future of Privacy Forum’s effort to build on the notion of consumer subject review boards first advocated by Ryan Calo at FPF’s 2013 Big Data symposium. It explores how researchers, increasingly in corporate settings, are analyzing data and testing theories using often sensitive personal information. Many of these new uses of PII are simply natural extensions of current practices, falling either within the expectations of individuals or within the bounds of the FIPPs. Yet many of these projects could involve surprising applications or uses of data that exceed user expectations, and offering notice and obtaining consent may not be feasible.
This article expands on ideas and suggestions put forward around the recent discussion draft of the White House Consumer Privacy Bill of Rights, which espouses “Privacy Review Boards” as a safety valve for noncontextual data uses. It explores how existing institutional review boards, within the academy and in human-subjects research, could offer lessons for guiding principles, providing accountability and enhancing consumer trust, and offers suggestions for how companies — and researchers — can pursue both knowledge and data innovation responsibly and ethically….(More)”
Print Wikipedia
“Print Wikipedia is both a utilitarian visualization of the largest accumulation of human knowledge and a poetic gesture towards the futility of the scale of big data. Michael Mandiberg has written software that parses the entirety of the English-language Wikipedia database and programmatically lays out 7600 volumes, complete with covers, and then uploads them to Lulu.com. In addition, he has compiled a Wikipedia Table of Contents and a Wikipedia Contributor Appendix…
Michael Mandiberg is an interdisciplinary artist, scholar, and educator living in Brooklyn, New York. He received his M.F.A. from the California Institute of the Arts and his B.A. from Brown University. His work traces the lines of political and symbolic power online, working on the Internet in order to comment on and intercede in the real flows of information. His work lives at Mandiberg.com.
Print Wikipedia by Michael Mandiberg from Lulu.com on Vimeo.”
Transform Government From The Outside In
Review by GCN of a new report by Forrester: “Agencies struggle to match the customer experience available from the private sector, and that causes citizens to become dissatisfied with government. In fact, seven of the 10 worst organizations in Forrester’s U.S. Customer Experience Index are federal agencies, and only a third of Americans say their experience with the government meets expectations.
FINDINGS: To keep up with public expectations, Forrester found governments must embrace mobile, turn big data into actionable insights, improve the customer experience and accelerate digital government. Among the recommendations:
Agencies must shift their thinking to make mobile the primary platform for connection between citizens and government. Government staff should also have mobile access to the tools and resources needed to complete tasks in the field. Agencies should learn what mobile methods work best for citizens, ensure all citizen services are mobile-friendly and use the mobile platform for sharing information with the public and gathering incident reports and sentiments. By building mobile-friendly infrastructure and processes, like municipal Wi-Fi hotspots, the government (and its services) can be constantly connected to its citizens and businesses.
Governments must find ways to integrate, share and use the large amounts of data and analytics they collect. By aggregating citizen-driven data from precinct-level or agency-specific databases and data collected by systems already in place, governments can increase responsiveness, target areas in need and make better short-term decisions and long-term plans. Opening data to researchers, the private sector and citizens can also spark innovation across industries.
Better customer experience has a ripple effect through government, improving the efficacy of legislation, compliance, engagement and the effectiveness of government offices. This means making processes such as applying for healthcare, registering a car or paying taxes easier, with highly functional, user-friendly websites. Such improvements in communication and digital customer service will save citizens’ time, increase the use of government services and reduce agencies’ workloads….(More)”