Browser extension automates citations of online material


Springwise: “Plagiarism is a major concern for colleges today, meaning when it comes to writing a thesis or essay, college students can often spend an inordinate amount of time ensuring their bibliographies are up to scratch, to the detriment of the quality of the actual writing. In the past, services such as ReadCube have made it easier to annotate and search online articles, and now Citelighter automatically generates a citation for any web resource, along with a number of tools to help students organize their research.

The service is a toolbar that sits at the top of the user’s browser while they search for material for their paper. When they’ve found a fact or quote that’s useful, users simply highlight the text and click the Capture button, which saves the clipping to the project they’re working on. Citelighter automatically captures the bibliographic information necessary to create a citation that meets academic standards, and users can also add their own comments for when they come to use the quote in their essay. Citations can be re-ordered within each project to enable students to plot out a rough version of their paper before sitting down to write…”
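
The capture step essentially scrapes whatever bibliographic metadata a page exposes. Below is a minimal sketch of how such automatic citation capture might work, assuming the page publishes standard Highwire (citation_*) or Dublin Core (DC.*) meta tags; it illustrates the general technique and is not Citelighter’s actual implementation.

```python
# Hedged sketch: pull bibliographic metadata from a page's <meta> tags.
# Assumes Highwire ("citation_*") or Dublin Core ("DC.*") tags are present;
# this is not Citelighter's implementation.
import requests
from bs4 import BeautifulSoup

def capture_citation(url: str) -> dict:
    """Fetch a page and collect whatever citation metadata it advertises."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    def meta(*names):
        # Return the first matching <meta name=... content=...> value.
        for name in names:
            tag = soup.find("meta", attrs={"name": name})
            if tag and tag.get("content"):
                return tag["content"]
        return None

    return {
        "title": meta("citation_title", "DC.title")
                 or (soup.title.string if soup.title else None),
        "author": meta("citation_author", "DC.creator"),
        "date": meta("citation_publication_date", "DC.date"),
        "publisher": meta("citation_publisher", "DC.publisher"),
        "url": url,
    }

# Example: capture_citation("https://example.org/some-article")
```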

Infomediary Business Models for Connecting Open Data Providers and Users


Paper by Marijn Janssen and Anneke Zuiderwijk in Social Science Computer Review: “Many public organizations are opening their data to the general public and embracing social media in order to stimulate innovation. These developments have resulted in the rise of new, infomediary business models, positioned between open data providers and users. Yet the variation among types of infomediary business models is little understood. The aim of this article is to contribute to the understanding of the diversity of existing infomediary business models that are driven by open data and social media. Cases presenting different modes of open data utilization in the Netherlands are investigated and compared. Six types of business models are identified: single-purpose apps, interactive apps, information aggregators, comparison models, open data repositories, and service platforms. The investigated cases differ in their levels of access to raw data and in how much they stimulate dialogue between different stakeholders involved in open data publication and use. Apps often are easy to use and provide predefined views on data, whereas service platforms provide comprehensive functionality but are more difficult to use. In the various business models, social media is sometimes used for rating and discussion purposes, but it is rarely used for stimulating dialogue or as input to policy making. Hybrid business models were identified in which both public and private organizations contribute to value creation. Distinguishing between different types of open data users was found to be critical in explaining different business models.”

The Use of ICT for Open Government in U.S. Municipalities: Perceptions of Chief Administrative Officers


Paper by Sukumar Ganapati and Christopher G. Reddick in Public Performance & Management Review: “The extent to which U.S. municipal governments have adopted open e-government initiatives is examined through a survey and interviews with chief administrative officers (CAOs) along the three dimensions of open government: transparency, participation, and collaboration. A very high share of CAOs reported satisfaction with implementing open government overall. A majority indicated achievement along each of the open government dimensions. Whereas the CAOs had a significantly positive view of collaboration, the challenges they perceived were significantly negatively related to achievement of, and satisfaction with, open government. The interviews indicated that the CAOs do not view open government as a fad and place it high on their respective agendas.”

Sinkhole of bureaucracy


First article in a Washington Post series “examining the failures at the heart of troubled federal systems” by David A. Fahrenthold: “The trucks full of paperwork come every day, turning off a country road north of Pittsburgh and descending through a gateway into the earth. Underground, they stop at a metal door decorated with an American flag.

Behind the door, a room opens up as big as a supermarket, full of five-drawer file cabinets and people in business casual. About 230 feet below the surface, there is easy-listening music playing at somebody’s desk.
This is one of the weirdest workplaces in the U.S. government — both for where it is and for what it does.
Here, inside the caverns of an old Pennsylvania limestone mine, there are 600 employees of the Office of Personnel Management. Their task is nothing top-secret. It is to process the retirement papers of the government’s own workers.
But that system has a spectacular flaw. It still must be done entirely by hand, and almost entirely on paper.

The employees here pass thousands of case files from cavern to cavern and then key in retirees’ personal data, one line at a time. They work underground not for secrecy but for space. The old mine’s tunnels have room for more than 28,000 file cabinets of paper records.
This odd place is an example of how hard it is to get a time-wasting bug out of a big bureaucratic system.
Held up by all that paper, work in the mine runs as slowly now as it did in 1977….”
See also the accompanying Washington Post graphic, “Data mining, the old-fashioned way.”

Exploration, Extraction and ‘Rawification’. The Shaping of Transparency in the Back Rooms of Open Data


Paper by Jérôme Denis and Samuel Goëta: “With the advent of open data initiatives, raw data has been staged as a crucial element of government transparency. While the consequences of such data-driven transparency have already been discussed, we still know little about its back rooms. What does it mean for an administration to open its data? Following information infrastructure studies, this paper aims to question the modes of existence of raw data in administrations. Drawing on an ethnography of open government data projects in several French administrations, it shows that data are not ready-at-hand resources. Indeed, three kinds of operations are conducted that progressively instantiate open data. The first is exploration. Where the data are within the institution, and what they are, are tough questions, and answering them entails organizational and technical inquiries. The second is extraction. Data are encapsulated in databases, and their release implies a sometimes complex disarticulation process. The third is ‘rawification’. It consists of a series of tasks that transform what used to be indexical, professional data into raw data. To be opened, data are (re)formatted, cleaned, and ungrounded. Though largely invisible, these operations foreground specific ‘frictions’ that emerge during the sociotechnical shaping of transparency, even before data are published and reused.”
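
Purely as an illustration of what ‘rawification’ operations can look like in practice, the sketch below applies the kinds of steps the abstract names (reformatting, cleaning, removing internal, indexical detail) to a hypothetical administrative extract; the column and file names are invented, and this is not drawn from the cases the authors studied.

```python
# Hypothetical illustration of "rawification": stripping internal, indexical
# detail from an administrative extract before publication. Column names and
# the file path are invented for the example.
import pandas as pd

def rawify(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Reformat: normalise headers and dates to neutral, machine-readable forms.
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    if "date" in out.columns:
        out["date"] = pd.to_datetime(out["date"], errors="coerce").dt.strftime("%Y-%m-%d")
    # Clean: drop rows flagged as provisional or internal-only.
    if "status" in out.columns:
        out = out[out["status"] != "internal_draft"]
    # "Unground": remove columns that only make sense inside the organisation
    # (agent identifiers, free-text annotations, internal codes).
    out = out.drop(columns=[c for c in ("agent_id", "internal_note") if c in out.columns])
    return out

# raw_extract = pd.read_csv("service_extract.csv")   # hypothetical file
# open_dataset = rawify(raw_extract)
```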

Government Surveillance and Internet Search Behavior


New paper by Alex Marthews and Catherine Tucker: “This paper uses data from Google Trends on search terms from before and after the surveillance revelations of June 2013 to analyze whether Google users’ search behavior shifted as a result of an exogenous shock in information about how closely their internet searches were being monitored by the U.S. government. We use data from Google Trends on search volume for 282 search terms across eleven different countries. These search terms were independently rated for their degree of privacy-sensitivity along multiple dimensions. Using panel data, our results suggest that, cross-nationally, users were less likely to search using search terms that they believed might get them in trouble with the U.S. government. In the U.S., this was the main subset of search terms that were affected. However, internationally there was also a drop in traffic for search terms that were rated as personally sensitive. These results have implications for policy makers in terms of understanding the actual effects on search behavior of disclosures relating to the scale of government surveillance on the Internet and their potential effects on international competitiveness.”
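
As a rough illustration of the kind of before-and-after panel comparison described here, the sketch below fits a difference-in-differences style regression on a hypothetical weekly panel; the file name, the columns (volume, post_june_2013, sensitive, term, country), and the specification are assumptions, not the authors’ actual model.

```python
# Hedged sketch of a difference-in-differences comparison in the spirit of the
# paper: did search volume for privacy-sensitive terms fall after June 2013
# relative to other terms? "google_trends_panel.csv" and the columns volume,
# post_june_2013 (0/1), sensitive (0/1), term, country are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.read_csv("google_trends_panel.csv")  # hypothetical weekly panel

model = smf.ols(
    "volume ~ post_june_2013 * sensitive + C(country)",
    data=panel,
).fit(cov_type="cluster", cov_kwds={"groups": panel["term"]})

# A negative, significant interaction would indicate that privacy-sensitive
# terms lost search volume after the surveillance revelations.
print(model.params["post_june_2013:sensitive"],
      model.pvalues["post_june_2013:sensitive"])
```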

Statistics and Open Data: Harvesting unused knowledge, empowering citizens and improving public services


House of Commons Public Administration Committee (Tenth Report):
“1. Open data is playing an increasingly important role in Government and society. It is data that is accessible to all, free of restrictions on use or redistribution and also digital and machine-readable so that it can be combined with other data, and thereby made more useful. This report looks at how the vast amounts of data generated by central and local Government can be used in open ways to improve accountability, make Government work better and strengthen the economy.

2. In this inquiry, we examined progress against a series of major government policy announcements on open data in recent years, and considered the prospects for further development. We heard of government open data initiatives going back some years, including the decision in 2009 to release some Ordnance Survey (OS) data as open data, and the Public Sector Mapping Agreement (PSMA) which makes OS data available for free to the public sector. The 2012 Open Data White Paper ‘Unleashing the Potential’ says that transparency through open data is “at the heart” of the Government’s agenda and that opening up would “foster innovation and reform public services”. In 2013 the report of the independently chaired review by Stephan Shakespeare, Chief Executive of the market research and polling company YouGov, of the use, re-use, funding and regulation of Public Sector Information urged Government to move fast to make use of data. He criticised traditional public service attitudes to data before setting out his vision:

    • To paraphrase the great retailer Sir Terry Leahy, to run an enterprise without data is like driving by night with no headlights. And yet that is what Government often does. It has a strong institutional tendency to proceed by hunch, or prejudice, or by the easy option. So the new world of data is good for government, good for business, and above all good for citizens. Imagine if we could combine all the data we produce on education and health, tax and spending, work and productivity, and use that to enhance the myriad decisions which define our future; well, we can, right now. And Britain can be first to make it happen for real.

3. This was followed by publication in October 2013 of a National Action Plan which sets out the Government’s view of the economic potential of open data as well as its aspirations for greater transparency.

4. This inquiry is part of our wider programme of work on statistics and their use in Government. A full description of the studies is set out under the heading “Statistics” in the inquiries section of our website, which can be found at www.parliament.uk/pasc. For this inquiry we received 30 pieces of written evidence and took oral evidence from 12 witnesses. We are grateful to all those who have provided evidence and to our Specialist Adviser on statistics, Simon Briscoe, for his assistance with this inquiry.”

Table of Contents:

Summary
1 Introduction
2 Improving accountability through open data
3 Open Data and Economic Growth
4 Improving Government through open data
5 Moving faster to make a reality of open data
6 A strategic approach to open data?
Conclusion
Conclusions and recommendations

How Twitter Could Help Police Departments Predict Crime


Eric Jaffe in Atlantic Cities: “Initially, Matthew Gerber didn’t believe Twitter could help predict where crimes might occur. For one thing, Twitter’s 140-character limit leads to slang and abbreviations and neologisms that are hard to analyze from a linguistic perspective. Beyond that, while criminals occasionally taunt law enforcement via Twitter, few are dumb or bold enough to tweet their plans ahead of time. “My hypothesis was there was nothing there,” says Gerber.
But then, that’s why you run the data. Gerber, a systems engineer at the University of Virginia’s Predictive Technology Lab, did indeed find something there. He reports in a new research paper that public Twitter data improved the predictions for 19 of 25 crimes that occurred early last year in metropolitan Chicago, compared with predictions based on historical crime patterns alone. Predictions for stalking, criminal damage, and gambling saw the biggest bump…
Of course, the method says nothing about why Twitter data improved the predictions. Gerber speculates that people are tweeting about plans that correlate highly with illegal activity, as opposed to tweeting about crimes themselves.
Let’s use criminal damage as an example. The algorithm identified 700 Twitter topics related to criminal damage; of these, one topic involved the words “united center blackhawks bulls” and so on. Gather enough sports fans with similar tweets and some are bound to get drunk enough to damage public property after the game. Again this scenario extrapolates far more than the data tells, but it offers a possible window into the algorithm’s predictive power.
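
For readers curious how such a model might be assembled, here is a rough sketch that combines tweet-derived topic proportions with a historical hot-spot density in the spirit of the approach described above; the function, variable names, and parameters are hypothetical, and this is not Gerber’s actual pipeline.

```python
# Sketch: predict whether a grid cell sees a given crime type by combining
# topic proportions from local tweets with a historical hot-spot estimate.
# Inputs and names are hypothetical; this is not the paper's pipeline.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KernelDensity

def build_features(cell_tweets, cell_coords, past_crime_coords, n_topics=25):
    """cell_tweets: one concatenated tweet document per grid cell;
    cell_coords / past_crime_coords: arrays of (lat, lon) pairs."""
    # Topic proportions for the tweets observed in or near each grid cell.
    counts = CountVectorizer(stop_words="english").fit_transform(cell_tweets)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    topic_props = lda.fit_transform(counts)

    # Historical hot-spot log-density at each cell centre (the baseline model).
    kde = KernelDensity(bandwidth=0.01).fit(past_crime_coords)
    log_density = kde.score_samples(cell_coords).reshape(-1, 1)

    return np.hstack([topic_props, log_density])

# X = build_features(cell_tweets, cell_coords, past_crime_coords)
# model = LogisticRegression(max_iter=1000).fit(X, cell_had_crime)
```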

[Figure: the map on the left shows predicted crime threat based on historical patterns; the one on the right includes Twitter data. (Via Decision Support Systems)]
From a logistical standpoint, it wouldn’t be too difficult for police departments to use this method in their own predictions; both the Twitter data and modeling software Gerber used are freely available. The big question, he says, is whether a department used the same historical crime “hot spot” data as a baseline for comparison. If not, a new round of tests would have to be done to show that the addition of Twitter data still offered a predictive upgrade.
There’s also the matter of public acceptance. Data-driven crime prediction tends to raise any number of civil rights concerns. In 2012, privacy advocates criticized the FBI for a similar plan to use Twitter for crime predictions. In recent months the Chicago Police Department’s own methods have been knocked as a high-tech means of racial profiling. Gerber says his algorithms don’t target any individuals and only cull data posted voluntarily to a public account.”

Protect the open web and the promise of the digital age


Richard Waters in the Financial Times:  “There is much to be lost if companies and nations put up fences around our digital open plains
For all the drawbacks, it is not hard to feel nostalgic about the early days of the web. Surfing between slow-loading, badly designed sites on a dial-up internet connection running at 56 kilobits per second could be frustrating. No wonder it was known as the “world wide wait”. But the “wow” factor was high. There was unparalleled access to free news and information, even if some of it was deeply untrustworthy. Then came that first, revelatory search on Google, which untangled the online jumble with almost miraculous speed.
Later, an uproarious outbreak of blogs converted what had been a passive medium into a global rant. And, with YouTube and Facebook, a mass audience found digital self-expression for the first time.
As the world wide web turns 25, it is easy to take all this for granted. For a generation that has grown up digital, it is part of the fabric of life.
It is also easy to turn away without too many qualms. More than 80 per cent of time spent on smartphones and tablets does not involve the web at all: it is whiled away in apps, which offer the instant gratification that comes from a tap or swipe of a finger.
Typing a URL on a small device, trying to stretch or shrink a web page to fit the small screen, browsing through Google links in a mobile browser: it is all starting to seem so, well, anachronistic.
But if the world wide web is coming to play a smaller part in everyday life, the significance of its relative decline should be kept in perspective. After all, the web is only one of the applications that rides on top of the internet: it is the standards and technologies of the internet itself that provide the main foundation for the modern, connected world. As long as all bits flow freely (and cheaply), the promise of the digital age will remain intact.
Before declaring the web era over and moving on, however, it is worth dwelling on what it represents – and what could be lost if this early manifestation of digital life were to be consigned to history.
Sir Tim Berners-Lee, who wrote the technical paper a quarter of a century ago that laid out the architecture of the web, certainly senses the threat. The open technical standards and open access that lie at the heart of the web – based on the freedom to link any online document to any other – are not guaranteed. What is needed, he argued this week, is nothing less than a digital bill of rights: a statement that would enshrine the ideals on which the medium was founded.
As this suggests, the web has always been much more than a technology. It is a state of mind, a dream of participation, a call to digital freedom that transcends geography. What place it finds in the connected world of tomorrow will help define what it means to be a digital citizen…”

The Parable of Google Flu: Traps in Big Data Analysis


David Lazer: “…big data last winter had its “Dewey beats Truman” moment, when the poster child of big data (at least for behavioral data), Google Flu Trends (GFT), went way off the rails in “nowcasting” the flu, overshooting the peak last winter by 130% (and indeed, it has been systematically overshooting by wide margins for 3 years). Tomorrow we (Ryan Kennedy, Alessandro Vespignani, and Gary King) have a paper out in Science dissecting why GFT went off the rails, how that could have been prevented, and the broader lessons to be learned regarding big data.
[We are posting The Parable of Google Flu (WP-Final).pdf, the version we submitted before acceptance. We have also posted an SSRN paper evaluating GFT for 2013-14, since it was reworked in the Fall.]
Key lessons that I’d highlight:
1) Big data are typically not scientifically calibrated. This goes back to my post last month regarding measurement. This does not make them useless from a scientific point of view, but you do need to build into the analysis that the “measures” of behavior are being affected by unseen things. In this case, the likely culprit was the Google search algorithm, which was modified in various ways that we believe likely to have increased flu related searches.
2) Big data + analytic code used in scientific venues with scientific claims need to be more transparent. This is a tricky issue, because there are both legitimate proprietary interests involved and privacy concerns, but much more can be done in this regard than has been done in the 3 GFT papers. [One of my aspirations over the next year is to work together with big data companies, researchers, and privacy advocates to figure out how this can be done.]
3) It’s about the questions, not the size of the data. In this particular case, one could have done a better job stating the likely flu prevalence today by ignoring GFT altogether and just projecting 3-week-old CDC data forward to today (better still would have been to combine the two; see the sketch after this list). That is, a synthesis would have been more effective than a pure “big data” approach. I think this is likely the general pattern.
4) More generally, I’d note that there is much more that the academy needs to do. First, the academy needs to build the foundation for collaborations around big data (e.g., secure infrastructures, legal understandings around data sharing, etc.). Second, there needs to be MUCH more work done to build bridges between the computer scientists who work on big data and social scientists who think about deriving insights about human behavior from data more generally. We have moved perhaps 5% of the way that we need to in this regard.”
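
As a toy illustration of lesson 3 above, the sketch below projects lagged CDC ILI data forward and blends it with a GFT-style estimate; the series names, trend adjustment, and weights are arbitrary assumptions, not the authors’ model.

```python
# Toy illustration of lesson 3: project lagged CDC ILI data forward and blend
# it with the GFT estimate. Series names, the trend adjustment, and the weight
# are arbitrary; this is not the authors' model.
import pandas as pd

def nowcast_ili(cdc_ili: pd.Series, gft: pd.Series,
                lag_weeks: int = 3, w: float = 0.5) -> pd.Series:
    """Project lagged CDC ILI forward and blend it with the GFT estimate."""
    # Average week-over-week change over the previous four reports.
    trend = cdc_ili.diff().rolling(4).mean()
    # Carry the last value available at each date (lag_weeks old) forward,
    # adjusted by the recent trend.
    projected = cdc_ili.shift(lag_weeks) + lag_weeks * trend.shift(lag_weeks)
    # Blend: w is the (arbitrary) weight given to GFT.
    return (1 - w) * projected + w * gft

# combined = nowcast_ili(cdc_ili, gft)  # both weekly, sharing a DatetimeIndex
```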