Exploration, Extraction and ‘Rawification’. The Shaping of Transparency in the Back Rooms of Open Data


Paper by Denis, Jerome and Goëta, Samuel: “With the advent of open data initiatives, raw data has been staged as a crucial element of government transparency. If the consequences of such data-driven transparency have already been discussed, we still don’t know much about its back rooms. What does it mean for an administration to open its data? Following information infrastructure studies, this communication aims to question the modes of existence of raw data in administrations. Drawing on an ethnography of open government data projects in several French administrations, it shows that data are not ready-at-hand resources. Indeed, three kinds of operations are conducted that progressively instantiate open data. The first one is exploration. Where the data are, and what they are, within the institution are tough questions, the response to which entails organizational and technical inquiries. The second one is extraction. Data are encapsulated in databases, and their release implies a sometimes complex disarticulation process. The third one is ‘rawification’. It consists of a series of tasks that transform what used to be indexical professional data into raw data. To be opened, data are (re)formatted, cleaned, ungrounded. Though largely invisible, these operations foreground specific ‘frictions’ that emerge during the sociotechnical shaping of transparency, even before data publication and reuse.”
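The ‘rawification’ the abstract describes, reformatting, cleaning and stripping away institutional context, can be pictured with a minimal sketch. All field names, formats and values below are invented for illustration, not taken from the paper:

```python
import csv
import io

# Hypothetical excerpt of an internal administrative file: abbreviated
# headers, mixed date formats and locale-specific number formatting are
# "indexical" to the issuing service, and opaque to outside reusers.
internal = """svc;date_maj;nb_usagers
Voirie ;12/03/2014;  1 204
Ecoles;2014-03-15;987
"""

def rawify(text):
    """Reformat, clean and 'unground' records: normalise headers,
    dates and numbers so the file stands on its own as open data."""
    rows = csv.reader(io.StringIO(text), delimiter=";")
    next(rows)  # drop the service's internal header
    out = [["service", "last_updated", "user_count"]]  # explicit names
    for svc, date, count in rows:
        svc = svc.strip()
        # Normalise the two date formats to ISO 8601.
        if "/" in date:
            d, m, y = date.split("/")
            date = f"{y}-{m}-{d}"
        count = int(count.replace("\u00a0", "").replace(" ", ""))
        out.append([svc, date, count])
    return out

for row in rawify(internal):
    print(row)
```

Even this toy version shows why the work is invisible but non-trivial: each normalisation decision erases a trace of how the data were produced and used inside the administration.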

Government Surveillance and Internet Search Behavior


New paper by Marthews, Alex and Tucker, Catherine: “This paper uses data from Google Trends on search terms from before and after the surveillance revelations of June 2013 to analyze whether Google users’ search behavior shifted as a result of an exogenous shock in information about how closely their internet searches were being monitored by the U.S. government. We use data from Google Trends on search volume for 282 search terms across eleven different countries. These search terms were independently rated for their degree of privacy-sensitivity along multiple dimensions. Using panel data, our results suggest that, cross-nationally, users were less likely to search using terms that they believed might get them in trouble with the U.S. government. In the U.S., this was the main subset of search terms that was affected. However, internationally there was also a drop in traffic for search terms that were rated as personally sensitive. These results have implications for policy makers in terms of understanding the actual effects on search behavior of disclosures relating to the scale of government surveillance on the Internet, and their potential effects on international competitiveness.”
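The panel logic behind the paper, comparing privacy-sensitive against neutral search terms, before against after June 2013, can be sketched as a simple difference-in-differences estimate. The volumes below are invented for illustration, not the paper’s data:

```python
# Synthetic weekly Google Trends-style volumes (illustrative numbers only).
# A negative estimate means sensitive terms fell relative to neutral ones
# after the June 2013 revelations.

def mean(xs):
    return sum(xs) / len(xs)

panel = {
    ("sensitive", "pre"):  [60, 62, 61, 59],
    ("sensitive", "post"): [54, 55, 53, 54],
    ("neutral",   "pre"):  [70, 71, 69, 70],
    ("neutral",   "post"): [70, 69, 71, 70],
}

# Difference-in-differences: (sensitive post - pre) - (neutral post - pre)
did = (mean(panel[("sensitive", "post")]) - mean(panel[("sensitive", "pre")])) \
    - (mean(panel[("neutral", "post")]) - mean(panel[("neutral", "pre")]))
print(round(did, 2))  # negative: sensitive searches dropped relative to neutral
```

The neutral terms act as a control for trends that affect all searching, so the estimate isolates the shift attributable to the surveillance shock, which is the comparison the authors run at much larger scale across countries and term ratings.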

Statistics and Open Data: Harvesting unused knowledge, empowering citizens and improving public services


House of Commons Public Administration Committee (Tenth Report):
“1. Open data is playing an increasingly important role in Government and society. It is data that is accessible to all, free of restrictions on use or redistribution and also digital and machine-readable so that it can be combined with other data, and thereby made more useful. This report looks at how the vast amounts of data generated by central and local Government can be used in open ways to improve accountability, make Government work better and strengthen the economy.

2. In this inquiry, we examined progress against a series of major government policy announcements on open data in recent years, and considered the prospects for further development. We heard of government open data initiatives going back some years, including the decision in 2009 to release some Ordnance Survey (OS) data as open data, and the Public Sector Mapping Agreement (PSMA) which makes OS data available for free to the public sector.  The 2012 Open Data White Paper ‘Unleashing the Potential’ says that transparency through open data is “at the heart” of the Government’s agenda and that opening up would “foster innovation and reform public services”. In 2013 the report of the independently-chaired review by Stephan Shakespeare, Chief Executive of the market research and polling company YouGov, of the use, re-use, funding and regulation of Public Sector Information urged Government to move fast to make use of data. He criticised traditional public service attitudes to data before setting out his vision:

    • To paraphrase the great retailer Sir Terry Leahy, to run an enterprise without data is like driving by night with no headlights. And yet that is what Government often does. It has a strong institutional tendency to proceed by hunch, or prejudice, or by the easy option. So the new world of data is good for government, good for business, and above all good for citizens. Imagine if we could combine all the data we produce on education and health, tax and spending, work and productivity, and use that to enhance the myriad decisions which define our future; well, we can, right now. And Britain can be first to make it happen for real.

3. This was followed by publication in October 2013 of a National Action Plan which sets out the Government’s view of the economic potential of open data as well as its aspirations for greater transparency.

4. This inquiry is part of our wider programme of work on statistics and their use in Government. A full description of the studies is set out under the heading “Statistics” in the inquiries section of our website, which can be found at www.parliament.uk/pasc. For this inquiry we received 30 pieces of written evidence and took oral evidence from 12 witnesses. We are grateful to all those who have provided evidence and to our Specialist Adviser on statistics, Simon Briscoe, for his assistance with this inquiry.”

Table of Contents:

Summary
1 Introduction
2 Improving accountability through open data
3 Open Data and Economic Growth
4 Improving Government through open data
5 Moving faster to make a reality of open data
6 A strategic approach to open data?
Conclusion
Conclusions and recommendations

How Twitter Could Help Police Departments Predict Crime


Eric Jaffe in Atlantic Cities: “Initially, Matthew Gerber didn’t believe Twitter could help predict where crimes might occur. For one thing, Twitter’s 140-character limit leads to slang and abbreviations and neologisms that are hard to analyze from a linguistic perspective. Beyond that, while criminals occasionally taunt law enforcement via Twitter, few are dumb or bold enough to tweet their plans ahead of time. “My hypothesis was there was nothing there,” says Gerber.
But then, that’s why you run the data. Gerber, a systems engineer at the University of Virginia’s Predictive Technology Lab, did indeed find something there. He reports in a new research paper that public Twitter data improved the predictions for 19 of 25 crimes that occurred early last year in metropolitan Chicago, compared with predictions based on historical crime patterns alone. Predictions for stalking, criminal damage, and gambling saw the biggest bump…
Of course, the method says nothing about why Twitter data improved the predictions. Gerber speculates that people are tweeting about plans that correlate highly with illegal activity, as opposed to tweeting about crimes themselves.
Let’s use criminal damage as an example. The algorithm identified 700 Twitter topics related to criminal damage; of these, one topic involved the words “united center blackhawks bulls” and so on. Gather enough sports fans with similar tweets and some are bound to get drunk enough to damage public property after the game. Again, this scenario extrapolates far beyond what the data tell us, but it offers a possible window into the algorithm’s predictive power.

The map on the left shows predicted crime threat based on historical patterns; the one on the right includes Twitter data. (Via Decision Support Systems)
From a logistical standpoint, it wouldn’t be too difficult for police departments to use this method in their own predictions; both the Twitter data and modeling software Gerber used are freely available. The big question, he says, is whether a department used the same historical crime “hot spot” data as a baseline for comparison. If not, a new round of tests would have to be done to show that the addition of Twitter data still offered a predictive upgrade.
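The two-model comparison the article describes, a historical “hot spot” baseline with and without a Twitter topic signal, can be sketched in miniature. The coordinates, bandwidth and weight below are all illustrative stand-ins, not Gerber’s fitted model:

```python
import math

# Toy sketch: a crude kernel-density "hot spot" baseline built from
# historical incident points, optionally augmented with a count of tweets
# near a grid cell that match a crime-correlated topic.

def kde_score(cell, incidents, bandwidth=1.0):
    """Gaussian-kernel density of past incidents at a grid cell."""
    x, y = cell
    total = 0.0
    for ix, iy in incidents:
        d2 = (x - ix) ** 2 + (y - iy) ** 2
        total += math.exp(-d2 / (2 * bandwidth ** 2))
    return total

def threat(cell, incidents, topic_hits=0, w_topic=0.3):
    """Combine the historical baseline with the Twitter topic signal."""
    return kde_score(cell, incidents) + w_topic * topic_hits

history = [(1, 1), (1, 2), (2, 1), (8, 8)]  # past criminal-damage points
quiet_cell, stadium_cell = (5, 5), (2, 2)

baseline = threat(stadium_cell, history)
with_twitter = threat(stadium_cell, history, topic_hits=12)
print(baseline, with_twitter)  # Twitter signal raises the cell's ranking
```

Ranking cells by such a combined score, rather than by history alone, is the shape of the improvement Gerber measured; his paper does the topic extraction with a proper statistical topic model rather than a raw hit count.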
There’s also the matter of public acceptance. Data-driven crime prediction tends to raise any number of civil rights concerns. In 2012, privacy advocates criticized the FBI for a similar plan to use Twitter for crime predictions. In recent months the Chicago Police Department’s own methods have been knocked as a high-tech means of racial profiling. Gerber says his algorithms don’t target any individuals and only cull data posted voluntarily to a public account.”

Protect the open web and the promise of the digital age


Richard Waters in the Financial Times:  “There is much to be lost if companies and nations put up fences around our digital open plains
For all the drawbacks, it is not hard to feel nostalgic about the early days of the web. Surfing between slow-loading, badly designed sites on a dial-up internet connection running at 56 kilobits per second could be frustrating. No wonder it was known as the “world wide wait”. But the “wow” factor was high. There was unparalleled access to free news and information, even if some of it was deeply untrustworthy. Then came that first, revelatory search on Google, which untangled the online jumble with almost miraculous speed.
Later, an uproarious outbreak of blogs converted what had been a passive medium into a global rant. And, with YouTube and Facebook, a mass audience found digital self-expression for the first time.
As the world wide web turns 25, it is easy to take all this for granted. For a generation that has grown up digital, it is part of the fabric of life.
It is also easy to turn away without too many qualms. More than 80 per cent of time spent on smartphones and tablets does not involve the web at all: it is whiled away in apps, which offer the instant gratification that comes from a tap or swipe of a finger.
Typing a URL on a small device, trying to stretch or shrink a web page to fit the small screen, browsing through Google links in a mobile browser: it is all starting to seem so, well, anachronistic.
But if the world wide web is coming to play a smaller part in everyday life, the significance of its relative decline should be kept in perspective. After all, the web is only one of the applications that rides on top of the internet: it is the standards and technologies of the internet itself that provide the main foundation for the modern, connected world. As long as all bits flow freely (and cheaply), the promise of the digital age will remain intact.
Before declaring the web era over and moving on, however, it is worth dwelling on what it represents – and what could be lost if this early manifestation of digital life were to be consigned to history.
Sir Tim Berners-Lee, who a quarter of a century ago wrote the technical paper that laid out the architecture of the web, certainly senses the threat. The open technical standards and open access that lie at the heart of the web – based on the freedom to link any online document to any other – are not guaranteed. What is needed, he argued this week, is nothing less than a digital bill of rights: a statement that would enshrine the ideals on which the medium was founded.
As this suggests, the web has always been much more than a technology. It is a state of mind, a dream of participation, a call to digital freedom that transcends geography. What place it finds in the connected world of tomorrow will help define what it means to be a digital citizen…”

The Parable of Google Flu: Traps in Big Data Analysis


David Lazer: “…big data last winter had its “Dewey beats Truman” moment, when the poster child of big data (at least for behavioral data), Google Flu Trends (GFT), went way off the rails in “nowcasting” the flu, overshooting the peak last winter by 130% (and indeed, it has been systematically overshooting by wide margins for three years). Tomorrow we (Ryan Kennedy, Alessandro Vespignani, and Gary King) have a paper out in Science dissecting why GFT went off the rails, how that could have been prevented, and the broader lessons to be learned regarding big data.
[We are posting The Parable of Google Flu (WP-Final).pdf, the version we submitted before acceptance. We have also posted an SSRN paper evaluating GFT for 2013-14, since it was reworked in the Fall.] Key lessons that I’d highlight:
1) Big data are typically not scientifically calibrated. This goes back to my post last month regarding measurement. This does not make them useless from a scientific point of view, but you do need to build into the analysis that the “measures” of behavior are being affected by unseen things. In this case, the likely culprit was the Google search algorithm, which was modified in various ways that we believe likely to have increased flu-related searches.
2) Big data + analytic code used in scientific venues with scientific claims need to be more transparent. This is a tricky issue, because there are both legitimate proprietary interests involved and privacy concerns, but much more can be done in this regard than has been done in the 3 GFT papers. [One of my aspirations over the next year is to work together with big data companies, researchers, and privacy advocates to figure out how this can be done.]
3) It’s about the questions, not the size of the data. In this particular case, one could have done a better job estimating current flu prevalence by ignoring GFT altogether and simply projecting 3-week-old CDC data forward to today (better still would have been to combine the two). That is, a synthesis would have been more effective than a pure “big data” approach. I think this is likely the general pattern.
4) More generally, I’d note that there is much more that the academy needs to do. First, the academy needs to build the foundation for collaborations around big data (e.g., secure infrastructures, legal understandings around data sharing, etc). Second, there needs to be MUCH more work done to build bridges between the computer scientists who work on big data and social scientists who think about deriving insights about human behavior from data more generally. We have moved perhaps 5% of the way that we need to in this regard.”
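Point 3 can be made concrete with a toy comparison. The figures below are invented for illustration (they are not CDC or GFT values), and the blend weights are arbitrary:

```python
# Toy illustration: during an overshoot, carrying 3-week-old CDC data
# forward can beat the "big data" nowcast, and a blend of the two can do
# better still. All numbers and weights are invented.

truth = 4.0        # actual ILI rate this week (illustrative)
gft_nowcast = 9.2  # GFT-style estimate overshooting by a wide margin
cdc_lagged = 3.4   # CDC rate as reported three weeks ago

def abs_error(estimate):
    return abs(estimate - truth)

# Lean on the calibrated-but-stale source, use the timely source lightly.
blend = 0.8 * cdc_lagged + 0.2 * gft_nowcast

print(abs_error(gft_nowcast), abs_error(cdc_lagged), abs_error(blend))
```

The stale-but-calibrated baseline already beats the overshooting nowcast, and the synthesis edges out both inputs, which is the pattern Lazer suggests generalises beyond flu.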

Intelligent Demand: Policy Rationale, Design and Potential Benefits


New OECD paper: “Policy interest in demand-side initiatives has grown in recent years. This may reflect an expectation that demand-side policy could be particularly effective in steering innovation to meet societal needs. In addition, owing to constrained public finances in most OECD countries, the possibility that demand-side policies might be less expensive than direct support measures is attractive. Interest may also reflect some degree of disappointment with the outcomes of traditional supply-side measures. This paper reviews demand-side innovation policies, their rationales and importance across countries, different approaches to their design, the challenges entailed in their implementation and evaluation, and good practices. Three main forms of demand-side policy are considered: innovation-oriented public procurement, innovation-oriented regulations, and standards. Emphasis is placed on innovation-oriented public procurement.”

A Framework for Benchmarking Open Government Data Efforts


DS Sayogo, TA Pardo, M Cook in the HICSS ’14 Proceedings of the 2014 47th Hawaii International Conference on System Sciences: “This paper presents a preliminary exploration of the status of open government data worldwide, as well as an in-depth evaluation of selected open government data portals. Using web content analysis of the open government data portals of 35 countries, this study outlines the progress of open government data efforts at the national government level. The paper also conducts an in-depth evaluation of selected cases to justify the application of a proposed framework for understanding the status of open government data initiatives. The findings of this exploration offer a new level of understanding of the depth, breadth, and impact of current open government data efforts. The review results also point to different stages of open government data portal development in terms of data content, data manipulation capability, and participatory and engagement capability. This finding suggests that the development of open government portals follows an incremental approach similar to that of e-government development stages in general. Subsequently, the paper offers several observations on the policy and practical implications of open government data portal development, drawn from the application of the proposed framework.”

Social media effects on fostering online civic engagement and building citizen trust and trust in institutions


Anne Marie Warren, Ainin Sulaiman and Noor Ismawati Jaafar in Government Information Quarterly: “This paper tests the extent to which social media is shaping civic engagement initiatives to build trust among people and increase trust in their institutions, particularly the government, police and justice systems. A survey of 502 citizens showed that using social media for civic engagement has a significant positive impact on trust propensity and that this trust had led to an increase in trust towards institutions. Interestingly, while group incentives encouraged citizens to engage online for civic matters, it is civic publications through postings on social media that intensify the urge of citizens for civic action to address social issues. Post-hoc analysis via ten interviews with social activists was conducted to further examine their perceptions of trust towards institutions. The overall findings suggest that institutions, in their effort to promote a meaningful and trusting citizen engagement, need to enhance trust among the public by fostering social capital via online civic engagement and closing the public–police disengagement gap.”

Revolutionising Digital Public Service Delivery: A UK Government Perspective


Paper by Alan W. Brown, Jerry Fishenden, and Mark Thompson: “For public sector organizations across the world, the pressures for improved efficiency during the past decades are now accompanied by an equally strong need to revolutionise service delivery to create solutions that better meet citizens’ needs; to develop channels that offer efficiency and increase inclusion to all citizens being served; and to re-invent supply chains to deliver services faster, cheaper, and more effectively. But how do government organisations ensure investment in digital transformation delivers the intended outcomes after earlier “online government” and “e-government” initiatives produced little in terms of significant, sustainable benefits? Here we focus on how digitisation, built on open standards, is transforming the public sector’s relationship with its citizens. This paper provides a perspective of digital change efforts across the UK government as an illustration of the improvements taking place more broadly in the public sector. It provides a vision for the future of our digital world, revealing the symbiotic relationship between organisational change and digitisation, and offering insights into public service delivery in the digital economy.”