Opening up government data for public benefit


Keiran Hardy at the Mandarin (Australia): “…This post explains the open data movement and considers the benefits and risks of releasing government data as open data. It then outlines the steps taken by the Labor and Liberal governments in accordance with this trend. It argues that the Prime Minister’s task, while admirably intentioned, is likely to prove difficult due to ongoing challenges surrounding the requirements of privacy law and a public service culture that remains reluctant to release government data into the public domain….

A key purpose of releasing government data is to improve the effectiveness and efficiency of services delivered by the government. For example, data on crops, weather and geography might be analysed to improve current approaches to farming and industry, or data on hospital admissions might be analysed alongside demographic and census data to improve the efficiency of health services in areas of need. It has been estimated that such innovation based on open data could benefit the Australian economy by up to $16 billion per year.

Another core benefit is that the open data movement is making gains in transparency and accountability, as a greater proportion of government decisions and operations are being shared with the public. These democratic values are made clear in the OGP’s Open Government Declaration, which aims to make governments ‘more open, accountable, and responsive to citizens’.

Open data can also improve democratic participation by allowing citizens to contribute to policy innovation. Events like GovHack, an annual Australian competition in which government, industry and the general public collaborate to find new uses for open government data, epitomise a growing trend towards service delivery informed by user input. The winner of the “Best Policy Insights Hack” at GovHack 2015 developed a software program for analysing which suburbs are best placed for rooftop solar investment.

At the same time, the release of government data poses significant risks to the privacy of Australian citizens. Much of the open data currently available is spatial (geographic or satellite) data, which is relatively unproblematic to post online as it poses minimal privacy risks. However, for the full benefits of open data to be gained, these kinds of data need to be supplemented with information on welfare payments, hospital admission rates and other potentially sensitive areas which could drive policy innovation.

Policy data in these areas would be de-identified — that is, all names, addresses and other obvious identifying information would be removed so that only aggregate or statistical data remains. However, debates continue as to the reliability of de-identification techniques, as there have been prominent examples of individuals being re-identified by cross-referencing datasets….
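
To make that risk concrete, here is a minimal sketch (Python, with entirely synthetic data; the datasets and column names are hypothetical) of how cross-referencing works: a plain join between a “de-identified” release and a public, named dataset on shared quasi-identifiers can be enough to restore identities.

```python
# Illustrative sketch with synthetic data: even after names are removed,
# joining a "de-identified" release to a public, named dataset on shared
# quasi-identifiers (postcode, birth year, sex) can restore identities.
import pandas as pd

# Hypothetical de-identified health release: names removed, quasi-identifiers kept.
health = pd.DataFrame({
    "postcode":   ["2600", "2600", "3000"],
    "birth_year": [1958, 1981, 1958],
    "sex":        ["F", "M", "F"],
    "diagnosis":  ["diabetes", "asthma", "hypertension"],
})

# Hypothetical public dataset (say, an electoral roll) that does carry names.
electoral_roll = pd.DataFrame({
    "name":       ["A. Citizen", "B. Resident"],
    "postcode":   ["2600", "3000"],
    "birth_year": [1958, 1958],
    "sex":        ["F", "F"],
})

# A plain join on the quasi-identifiers re-identifies unique combinations.
linked = electoral_roll.merge(health, on=["postcode", "birth_year", "sex"])
print(linked[["name", "diagnosis"]])
```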

With regard to open data, a culture resistant to releasing government information appears to be driven by several similar factors, including:

  • A generational preference amongst public service management for maintaining secrecy of information, whereas younger generations expect that data should be made freely available;
  • Concerns about the quality or accuracy of information being released;
  • Fear that mistakes or misconduct on the part of government employees might be exposed;
  • Limited understanding of the benefits that can be gained from open data; and
  • A lack of leadership to help drive the open data movement.

If open data policies have a similar effect on public service culture as FOI legislation, it may be that open data policies in fact hinder transparency by having a chilling effect on government decision-making for fear of what might be exposed….

These legal and cultural hurdles will pose ongoing challenges for the Turnbull government in seeking to release greater amounts of government data as open data….(More)

Big Data Before the Web


Evan Hepler-Smith in the Wall Street Journal: “Sometime in the early 1950s, on a reservation in Wisconsin, a Menominee Indian man looked at an ink blot. An anthropologist recorded the man’s reaction according to a standard Rorschach-test protocol. The researcher submitted a copy of these notes to an enormous cache of records collected over the course of decades by American social scientists working among various “societies ‘other than our own.’ ” This entire collection of social-scientific data was photographed and printed in arrays of microscopic images on 3-by-5-inch cards. Sets of these cards were shipped to research libraries around the world. They gathered dust.

In the results of this Rorschach test, the anthropologist saw evidence of a culture eroded by modernity. Sixty years later, these documents also testify to the aspirations and fate of the social-scientific project for which they were generated. Deep within this forgotten Ozymandian card file sits the Menominee man’s reaction to Rorschach card VI: “It is like a dead planet. It seems to tell the story of a people once great who have lost . . . like something happened. All that’s left is the symbol.”

In “Database of Dreams: The Lost Quest to Catalog Humanity,” Rebecca Lemov delves into the ambitious efforts of mid-20th-century social scientists to build a “capacious and reliable science of the varieties of the human being” by generating an archive of human experience through interviews and tests and by storing the information on the high-tech media of the day.

 For these psychologists and anthropologists, the key to a universal human science lay in studying members of cultures in transition between traditional and modern ways of life and in rendering their individuality as data. Interweaving stories of social scientists, Native American research subjects and information technologies, Ms. Lemov presents a compelling account of “what ‘humanness’ came to mean in an age of rapid change in technological and social conditions.” Ms. Lemov, an associate professor of the history of science at Harvard University, follows two contrasting threads through a story that she calls “a parable for our time.” She shows, first, how collecting data about human experience shapes human experience and, second, how a high-tech data repository of the 1950s became, as she puts it, a “data ruin.”…(More) – See also: Database of Dreams: The Lost Quest to Catalog Humanity

OpenFDA: an innovative platform providing access to a wealth of FDA’s publicly available data


Paper by Taha A Kass-Hout et al in JAMIA: “The objective of openFDA is to facilitate access and use of big, important Food and Drug Administration public datasets by developers, researchers, and the public through harmonization of data across disparate FDA datasets provided via application programming interfaces (APIs).

Materials and Methods: Using cutting-edge technologies deployed on FDA’s new public cloud computing infrastructure, openFDA provides open data for easier, faster (over 300 requests per second per process), and better access to FDA datasets; open source code and documentation shared on GitHub for open community contributions of examples, apps and ideas; and infrastructure that can be adopted for other public health big data challenges.
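
As an illustration of that access, a minimal query against the drug adverse event endpoint might look like the following sketch (the endpoint and search syntax follow the open.fda.gov documentation; the drug name and printed fields are just an example):

```python
# Minimal sketch of an openFDA call: query the drug adverse event endpoint
# for reports mentioning a given product. Endpoint and search syntax follow
# the open.fda.gov documentation; no API key is needed for light use.
import requests

url = "https://api.fda.gov/drug/event.json"
params = {
    "search": 'patient.drug.medicinalproduct:"ASPIRIN"',
    "limit": 5,
}
response = requests.get(url, params=params, timeout=30)
response.raise_for_status()

# Print the receipt date and reported reactions for each adverse event report.
for report in response.json().get("results", []):
    reactions = [r.get("reactionmeddrapt") for r in report["patient"]["reaction"]]
    print(report.get("receiptdate"), reactions)
```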

Results: Since its launch on June 2, 2014, openFDA has developed four APIs for drug and device adverse events, recall information for all FDA-regulated products, and drug labeling. There have been more than 20 million API calls (more than half from outside the United States), 6000 registered users, 20,000 connected Internet Protocol addresses, and dozens of new software (mobile or web) apps developed. A case study demonstrates a use of openFDA data to understand an apparent association of a drug with an adverse event.

Conclusion: With easier and faster access to these datasets, consumers worldwide can learn more about FDA-regulated products…(More)”

Data Science Ethics


Gov.uk blog: “If Tesco knows day-to-day how poorly the nation is, how can Government access similar insights so it can better plan health services? If Airbnb can give you a tailored service depending on your tastes, how can Government provide people with the right support to help them back into work in a way that is right for them? If companies are routinely using social media data to get feedback from their customers to improve their services, how can Government also use publicly available data to do the same?

Data science allows us to use new types of data and powerful tools to analyse this more quickly and more objectively than any human could. It can put us in the vanguard of policymaking – revealing new insights that lead to better and more tailored interventions. And it can help reduce costs, freeing up resources to spend on more serious cases.

But some of these data uses and machine-learning techniques are new and still relatively untested in Government. Of course, we operate within legal frameworks such as the Data Protection Act and Intellectual Property law. These are flexible but don’t always talk explicitly about the new challenges data science throws up. For example, how are you to explain the decision making process of a deep learning black box algorithm? And if you were able to, how would you do so in plain English and not a row of 0s and 1s?

We want data scientists to feel confident to innovate with data, alongside the policy makers and operational staff who make daily decisions on the data that the analysts provide. That’s why we are creating an ethical framework which brings together the relevant parts of the law and ethical considerations into a simple document that helps Government officials decide what it can do and what it should do. We have a moral responsibility to maximise the use of data – which is never more apparent than after incidents of abuse or crime are left undetected – as well as to pay heed to the potential risks of these new tools. The guidelines are draft and not formal government policy, but we want to share them more widely in order to help iterate and improve them further….

So what’s in the framework? There is more detail in the fuller document, but it is based around six key principles:

  1. Start with a clear user need and public benefit: this will help you justify the level of data sensitivity and method you use
  2. Use the minimum level of data necessary to fulfill the public benefit: there are many techniques for doing so, such as de-identification, aggregation or querying against data (see the sketch after this list)
  3. Build robust data science models: the model is only as good as the data it contains and while machines are less biased than humans they can get it wrong. It’s critical to be clear about the confidence of the model and think through unintended consequences and biases contained within the data
  4. Be alert to public perceptions: put simply, what would a normal person on the street think about the project?
  5. Be as open and accountable as possible: Transparency is the antiseptic for unethical behavior. Aim to be as open as possible (with explanations in plain English), although in certain public protection cases the ability to be transparent will be constrained.
  6. Keep data safe and secure: this is not restricted to data science projects but we know that the public are most concerned about losing control of their data….(More)”
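
As a minimal sketch of the aggregation technique mentioned in principle 2 (synthetic data; the suppression threshold of five is an arbitrary illustration), grouped counts can be released instead of raw records, with small cells suppressed so that no individual can be singled out:

```python
# Sketch of data minimisation by aggregation: release grouped counts rather
# than raw records, and suppress small cells that could single people out.
# The data and the threshold of 5 are illustrative only.
import pandas as pd

records = pd.DataFrame({
    "region": ["North"] * 7 + ["South"] * 3,
    "claimed_benefit": [True, True, False, True, True, False, True,
                        True, False, True],
})

counts = (records.groupby("region")["claimed_benefit"]
                 .agg(claimants="sum", total="count")
                 .reset_index()
                 .astype({"claimants": object, "total": object}))

# Suppress cells below the threshold instead of publishing exact small counts.
THRESHOLD = 5
small = counts["total"] < THRESHOLD
counts.loc[small, ["claimants", "total"]] = "<5"
print(counts)
```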

Open Data Index 2015


Open Knowledge: “….This year’s Index showed impressive gains from non-OECD countries, with Taiwan topping the Index and Colombia and Uruguay breaking into the top ten at fourth and seventh respectively. Overall, the Index evaluated 122 places and 1586 datasets and determined that only 9%, or 156 datasets, were both technically and legally open.

The Index ranks countries based on the availability and accessibility of data in thirteen key categories, including government spending, election results, procurement, and pollution levels. Over the summer, we held a public consultation, which saw contributions from individuals within the open data community as well as from key civil society organisations across an array of sectors. As a result of this consultation, we expanded the 2015 Index to include public procurement data, water quality data, land ownership data and weather data; we also decided to remove transport timetables due to the difficulties faced when comparing transport system data globally.

Open Knowledge International began to systematically track the release of open data by national governments in 2013, with the objective of measuring whether governments were releasing the key datasets of high social and democratic value as open data. That enables us to better understand the current state of play and in turn work with civil society actors to address the gaps in data release. Over the course of the last three years, the Global Open Data Index has become more than just a benchmark – we noticed that governments began to use the Index as a reference to inform their open data priorities, and civil society actors began to use the Index as an advocacy tool to encourage governments to improve their performance in releasing key datasets.

Nevertheless, indices such as the Global Open Data Index are not without their challenges. The Index measures the technical and legal openness of datasets deemed to be of critical democratic and social value – it does not measure the openness of a given government. It should be clear that the release of a few key datasets is not a sufficient measure of the openness of a government. The blurring of lines between open data and open government is nothing new and has been hotly debated by civil society groups and transparency organisations since the sharp rise in popularity of open data policies over the last decade. …Index at http://index.okfn.org/”

Big Data in the Policy Cycle: Policy Decision Making in the Digital Era


Paper by Johann Höchtl et al in the Journal of Organizational Computing and Electronic Commerce: “Although of high relevance to political science, the interaction between technological change and political change in the era of Big Data remains somewhat of a neglected topic. Most studies focus on the concept of e-government and e-governance, and on how already existing government activities performed through the bureaucratic body of public administration could be improved by technology. This paper attempts to build a bridge between the field of e-governance and theories of public administration that goes beyond the service delivery approach that dominates a large part of e-government research. Using the policy cycle as a generic model for policy processes and policy development, a new look at how policy decision making could be conducted on the basis of ICT and Big Data is presented in this paper….(More)”

Citizenship, Social Media, and Big Data: Current and Future Research in the Social Sciences


Homero Gil de Zúñiga at Social Science Computer Review: “This special issue of the Social Science Computer Review provides a sample of the latest strategies employing large data sets in social media and political communication research. The proliferation of information communication technologies, social media, and the Internet, alongside the ubiquity of high-performance computing and storage technologies, has ushered in the era of computational social science. However, in no way does the use of “big data” represent a standardized area of inquiry in any field. This article briefly summarizes pressing issues when employing big data for political communication research. Major challenges remain to ensure the validity and generalizability of findings. Strong theoretical arguments are still a central part of conducting meaningful research. In addition, ethical practices concerning how data are collected remain an area of open discussion. The article surveys studies that offer unique and creative ways to combine methods and introduce new tools while at the same time address some solutions to ethical questions….(More)”

The Upside of Slacktivism


In Pacific Standard: “When you think of meaningful political action, you probably think of the March on Washington for Jobs and Freedom, or perhaps ACT-UP’s 1990 protests in San Francisco. You probably don’t think of clicking “like” or “share” on Facetwitstagram—though a new study suggests that those likes and shares may be just as important as marching in the streets, singing songs, and carrying signs.

“The efficacy of online networks in disseminating timely information has been praised by many commentators; at the same time, users are often derided as ‘slacktivists’ because of the shallow commitment involved in clicking a forwarding button,” writes a team led by Pablo Barberá, a political scientist at New York University, in the journal PLoS One.

In other words, it’s easy to argue that sharing a post about climate change and whatnot has no value, since it involves no sacrifice—no standoffs with angry police, no going to jail over taxes you didn’t pay because you opposed the Mexican-American War, not even lost shoes.

On the other hand, maybe sacrifice isn’t the point. Maybe it’s getting attention, and, Barberá and colleagues suggest, slacktivism is actually pretty good at that part—a consequence of just how easy it is to spread the word with the click of a mouse.

The team reached that conclusion after analyzing tens of millions of tweets sent by nearly three million users during the May 2013 anti-government protests in Gezi Park, Istanbul. Among other things, the team identified which tweets were originals rather than retweets, who retweeted whom, and how many followers each user had. That meant Barberá and his team could identify not only how information flowed within the network of protesters, but also how many people that information reached.
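
A stylized sketch of that kind of core/periphery analysis (synthetic column names and an arbitrary core threshold, not the study’s actual pipeline) might classify users by how much original content they author and then compare the audience each group reaches:

```python
# Stylized sketch of a core/periphery split over a tweet dataset.
# Column names and the core threshold are illustrative, not the study's schema.
import pandas as pd

tweets = pd.DataFrame({
    "user":       ["a", "a", "a", "b", "c", "c", "d", "e"],
    "is_retweet": [False, False, False, True, True, False, True, True],
    "followers":  [9000, 9000, 9000, 120, 340, 340, 75, 210],
})

per_user = tweets.groupby("user").agg(
    originals=("is_retweet", lambda s: int((~s).sum())),
    followers=("followers", "max"),
)

# "Core" users author most of the original content; the rest are the periphery.
per_user["is_core"] = per_user["originals"] >= 2

# Peripheral retweeters reach few people each, but collectively they can
# extend the message well beyond the core's own audience.
print(per_user.groupby("is_core")["followers"].sum())
```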

Most original tweets came from a relatively small group of protesters using hashtags such as #gezipark, suggesting that information flowed from a core group of protesters toward a less-active periphery. Geographic data backed that up: Around 18 percent of core tweeters were physically present for the Gezi Park demonstrations, compared to a quarter of a percent of peripheral tweeters….(More)”

The Internet’s Loop of Action and Reaction Is Worsening


Farhad Manjoo in the New York Times: “Donald J. Trump and Hillary Clinton said this week that we should think about shutting down parts of the Internet to stop terrorist groups from inspiring and recruiting followers in distant lands. Mr. Trump even suggested an expert who’d be perfect for the job: “We have to go see Bill Gates and a lot of different people that really understand what’s happening, and we have to talk to them — maybe, in certain areas, closing that Internet up in some way,” he said on Monday in South Carolina.

Many online responded to Mr. Trump and Mrs. Clinton with jeers, pointing out both constitutional and technical limits to their plans. Mr. Gates, the Microsoft co-founder who now spends much of his time on philanthropy, has as much power to close down the Internet as he does to fix Mr. Trump’s hair.

Yet I had a different reaction to Mr. Trump and Mrs. Clinton’s fantasy of a world in which you could just shut down parts of the Internet that you didn’t like: Sure, it’s impossible, but just imagine if we could do it, just for a bit. Wouldn’t it have been kind of a pleasant dream world, in these overheated last few weeks, to have lived free of social media?

Hear me out. If you’ve logged on to Twitter and Facebook in the waning weeks of 2015, you’ve surely noticed that the Internet now seems to be on constant boil. Your social feed has always been loud, shrill, reflexive and ugly, but this year everything has been turned up to 11. The Islamic State’s use of the Internet is perhaps only the most dangerous manifestation of what, this year, became an inescapable fact of online life: The extremists of all stripes are ascendant, and just about everywhere you look, much of the Internet is terrible.

“The academic in me says that discourse norms have shifted,” said Susan Benesch, a faculty associate at Harvard’s Berkman Center for Internet & Society and the director of the Dangerous Speech Project, an effort to study speech that leads to violence. “It’s become so common to figuratively walk through garbage and violent imagery online that people have accepted it in a way. And it’s become so noisy that you have to shout more loudly, and more shockingly, to be heard.”

You might argue that the angst online is merely a reflection of the news. Terrorism, intractable warfare, mass shootings, a hyperpartisan presidential race, police brutality, institutional racism and the protests over it have dominated the headlines. It’s only natural that the Internet would get a little out of control over that barrage.

But there’s also a way in which social networks seem to be feeding a cycle of action and reaction. In just about every news event, the Internet’s reaction to the situation becomes a follow-on part of the story, so that much of the media establishment becomes trapped in escalating, infinite loops of 140-character, knee-jerk insta-reaction.

“Presidential elections have always been pretty nasty, but these days the mudslinging is omnipresent in a way that’s never been the case before,” said Whitney Phillips, an assistant professor of literary studies and writing at Mercer University, who is the author of “This Is Why We Can’t Have Nice Things,” a study of online “trolling.” “When Donald Trump says something that I would consider insane, it’s not just that it gets reported on by one or two or three outlets, but it becomes this wave of iterative content on top of content on top of content in your feed, taking over everything you see.”

The spiraling feedback loop is exhausting and rarely illuminating. The news brims with instantly produced “hot takes” and a raft of fact-free assertions. Everyone — yours truly included — is always on guard for the next opportunity to meme-ify outrage: What crazy thing did Trump/Obama/The New York Times/The New York Post/Rush Limbaugh/etc. say now, and what clever quip can you fit into a tweet to quickly begin collecting likes?

There is little room for indulging nuance, complexity, or flirting with the middle ground. In every issue, you are either with one aggrieved group or the other, and the more stridently you can express your disdain — short of hurling profanities at the president on TV, which will earn you a brief suspension — the better reaction you’ll get….(More)”

What Privacy Papers Should Policymakers be Reading in 2016?


Stacy Gray at the Future of Privacy Forum: “Each year, FPF invites privacy scholars and authors to submit articles and papers to be considered by members of our Advisory Board, with an aim toward showcasing those articles that should inform any conversation about privacy among policymakers in Congress, as well as at the Federal Trade Commission and in other government agencies. For our sixth annual Privacy Papers for Policymakers, we received submissions on topics ranging from mobile app privacy, to location tracking, to drone policy.

Our Advisory Board selected papers that describe the challenges and best practices of designing privacy notices; ways to minimize the risks of re-identification by focusing on process-based data release policy and taking a precautionary approach to data release; the relationship between privacy and markets; and ways to bring the concept of trust more strongly into privacy principles.

Our top privacy papers for 2015 are, in alphabetical order:

  • Florian Schaub, Rebecca Balebako, Adam L. Durity, and Lorrie Faith Cranor
  • Ira S. Rubinstein and Woodrow Hartzog
  • Arvind Narayanan, Joanna Huey, and Edward W. Felten
  • Ryan Calo
  • Neil Richards and Woodrow Hartzog

Our two papers selected for Notable Mention are:

  • Peter Swire (Testimony, Senate Judiciary Committee Hearing, July 8, 2015)
  • Joel R. Reidenberg
….(More)”