Metadata Liberation Movement


Holman Jenkins in the Wall Street Journal: “The biggest problem, then, with metadata surveillance may simply be that the wrong agencies are in charge of it. One particular reason why this matters is that the potential of metadata surveillance might actually be quite large but is being squandered by secret agencies whose narrow interest is only looking for terrorists….
“Big data” is only as good as the algorithms used to find out things worth finding out. The efficacy and refinement of big-data techniques are advanced by repetition, by giving more chances to find something worth knowing. Bringing metadata out of its black box wouldn’t only be a way to improve public trust in what government is doing. It would be a way to get more real value for society out of techniques that are being squandered on a fairly minor threat.
Bringing metadata out of the black box would open up new worlds of possibility—from anticipating traffic jams to locating missing persons after a disaster. It would also create an opportunity to make big data more consistent with the constitutional prohibition of unwarranted search and seizure. In the first instance, with the computer withholding identifying details of the individuals involved, any red flag could be examined by a law-enforcement officer to see, based on accumulated experience, whether the indication is of interest.
If so, a warrant could be obtained to expose the identities involved. If not, the record could immediately be expunged. All this could take place in a reasonably aboveboard, legal fashion, open to inspection in court when and if charges are brought or—this would be a good idea—a court is informed of investigations that led to no action.
Our guess is that big data techniques would pop up way too many false positives at first, and only considerable learning and practice would allow such techniques to become a useful tool. At the same time, bringing metadata surveillance out of the shadows would help the Googles, Verizons and Facebooks defend themselves from a wholly unwarranted suspicion that user privacy is somehow better protected by French or British or (heavens) Chinese companies from their own governments than U.S. data is from the U.S. government.
Most of all, it would allow these techniques to be put to work on solving problems that are actual problems for most Americans, which terrorism isn’t.”
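To make the workflow Jenkins sketches more concrete, here is a minimal Python illustration (our own, not from the article): identifiers are replaced with a keyed hash before any analysis, a flagged record is handed to investigators with its identity intact only once a warrant is granted, and everything else is expunged. The key would be held by a separate custodian; all names and thresholds below are assumptions.

```python
# A minimal sketch (not from the article) of the workflow described above:
# analysts see only pseudonymous records, a warrant is required to expose an
# identity, and records that lead nowhere are expunged.
import hashlib
import hmac

SECRET_KEY = b"held-by-a-separate-custodian"  # key kept outside the analysis system

def pseudonymize(identifier: str) -> str:
    """Replace a phone number or account ID with a keyed hash."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

def review(records, flagger, warrant_granted):
    """Flag pseudonymous records; reveal identities only under a warrant."""
    exposed = []
    for rec in records:
        pseudo = {**rec, "who": pseudonymize(rec["who"])}
        if flagger(pseudo) and warrant_granted(pseudo):
            exposed.append(rec)   # warrant obtained: original identity goes to investigators
        # everything else is expunged rather than retained
    return exposed
```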

Metrics for Government Reform


Geoff Mulgan: “How do you measure a programme of government reform? What counts as evidence that it’s working or not? I’ve been asked this question many times, so this very brief note suggests some simple answers – mainly prompted by seeing a few writings on this question which I thought confused some basic points.”
“Any type of reform programme will combine elements at very different levels. These may include:

  • A new device – for example, adjusting the wording in an official letter or a call centre script to see what impact this has on such things as tax compliance.
  • A new kind of action – for example a new way of teaching maths in schools, treating patients with diabetes, handling prison leavers.
  • A new kind of policy – for example opening up planning processes to more local input; making welfare payments more conditional.
  • A new strategy – for example a scheme to cut carbon in cities, combining retrofitting of housing with promoting bicycle use; or a strategy for public health.
  • A new approach to strategy – for example making more use of foresight, scenarios or big data.
  • A new approach to governance – for example bringing hitherto excluded groups into political debate and decision-making.

This rough list hopefully shows just how different these levels are in their nature. Generally as we go down the list the following things rise:

  • The number of variables and the complexity of the processes involved
  • The timescales over which any judgements can be made
  • The difficulty involved in making judgements about causation
  • The importance of qualitative relative to quantitative assessment”

Big Data Comes To Boston’s Neighborhoods


WBUR: “In the spring of 1982, social scientists James Q. Wilson and George L. Kelling published a seminal article in The Atlantic Monthly titled “Broken Windows.”
The piece focused public attention on a long-simmering theory in urban sociology: that broken windows, graffiti and other signs of neighborhood decay are correlated with — and may even help cause — some of the biggest problems in America’s cities.
Wilson and Kelling focused on the link to crime, in particular; an abandoned car, they argued, signals that illicit behavior is acceptable on a given block….Some researchers have poked holes in the theory — arguing that broken windows, known in academic circles as “physical disorder,” are more symptom than cause. But there is no disputing the idea’s influence: it’s inspired reams of research and shaped big city policing from New York to Los Angeles…
But a new study out of the Boston Area Research Initiative, a Harvard University-based collaborative of academics and city officials, suggests a new possibility: a cheap, sprawling and easily updated map of the urban condition.
Mining data from Boston’s constituent relationship management (CRM) operation — a hotline, website and mobile app for citizens to report everything from abandoned bicycles to mouse-infested apartment buildings — researchers have created an almost real-time guide to what ails the city…
But a first-of-its-kind measure of civic engagement — how likely are residents of a given block to report a pothole or broken streetlight? — yields more meaningful results.
One early finding: language barriers seem to explain scant reporting in neighborhoods with large populations of Latino and Asian renters; that’s already prompted targeted flyering that’s yielded modest improvements.
The same engagement measure points to another, more hopeful phenomenon: clusters of citizen activists show up not just in wealthy enclaves, as expected, but in low-income areas.”
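As a rough illustration of how such a block-level engagement measure might be computed, the sketch below simply normalizes reports by block population. The actual BARI measure is more sophisticated; the field names and normalization here are our own assumptions.

```python
# A rough sketch of a per-block "civic engagement" rate: 311-style reports
# normalized by population. Field names and the normalization are assumptions.
from collections import Counter

def engagement_rate(reports, block_population):
    """Reports per 1,000 residents for each census block."""
    counts = Counter(r["block_id"] for r in reports)
    return {
        block: 1000 * counts.get(block, 0) / pop
        for block, pop in block_population.items()
        if pop > 0
    }

# Example: two blocks with equal report counts but different populations
reports = [{"block_id": "A"}, {"block_id": "A"}, {"block_id": "B"}, {"block_id": "B"}]
print(engagement_rate(reports, {"A": 500, "B": 2000}))  # {'A': 4.0, 'B': 1.0}
```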

Infoglut: How Too Much Information Is Changing the Way We Think and Know


New book by Mark Andrejevic: “Today, more mediated information is available to more people than at any other time in human history. New and revitalized sense-making strategies multiply in response to the challenges of “cutting through the clutter” of competing narratives and taming the avalanche of information. Data miners, “sentiment analysts,” and decision markets offer to help bodies of data “speak for themselves”—making sense of their own patterns so we don’t have to. Neuromarketers and body language experts promise to peer behind people’s words to see what their brains are really thinking and feeling. New forms of information processing promise to displace the need for expertise and even comprehension—at least for those with access to the data.
Infoglut explores the connections between these wide-ranging sense-making strategies for an era of information overload and “big data,” and the new forms of control they enable. Andrejevic critiques the popular embrace of deconstructive debunkery, calling into question the post-truth, post-narrative, and post-comprehension politics it underwrites, and tracing a way beyond them.”

Infographics: Winds of change


Book Review in the Economist:

  • Data Points: Visualisation That Means Something. By Nathan Yau. Wiley; 300 pages; $32 and £26.99.
  • Facts are Sacred. By Simon Rogers. Faber and Faber; 311 pages; £20.
  • The Infographic History of the World. By James Ball and Valentina D’Efilippo. Collins; 224 pages; £20.

“IN THE late 1700s William Playfair, a Scottish engineer, created the bar chart, pie chart and line graph. These amounted to visual breakthroughs, innovations that allowed people to see patterns in data that they would otherwise have missed if they just stared at long tables of numbers.
Big data, the idea that the world is replete with more information than ever, is now all the rage. And the search for fresh and enlightened ways to help people absorb it is causing a revolution. A new generation of statisticians and designers—often the same person—are working on computer technologies and visual techniques that will depict data at scales and in forms previously unimaginable. The simple line graph and pie chart are being supplemented by things like colourful, animated bubble charts, which can present more variables. Three-dimensional network diagrams show ratios and relationships that were impossible to depict before.”
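As a small illustration of how a bubble chart packs extra variables into a single view, the matplotlib sketch below encodes two variables by position, a third by bubble size and a fourth by colour. The data is invented purely for illustration.

```python
# A minimal bubble-chart sketch: x and y positions, bubble size, and colour
# each carry one variable. Values below are made up for illustration.
import matplotlib.pyplot as plt

gdp_per_capita = [5_000, 12_000, 30_000, 45_000]   # x
life_expectancy = [62, 70, 78, 82]                 # y
population_m = [40, 120, 65, 10]                   # bubble size
region_code = [0, 1, 2, 1]                         # colour

plt.scatter(gdp_per_capita, life_expectancy,
            s=[p * 5 for p in population_m],       # scale population to point area
            c=region_code, cmap="viridis", alpha=0.6)
plt.xlabel("GDP per capita ($)")
plt.ylabel("Life expectancy (years)")
plt.title("Four variables in one bubble chart")
plt.show()
```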

The Real-Time City? Big Data and Smart Urbanism


New paper by Rob Kitchin from the National University of Ireland, Maynooth (NUI Maynooth) – NIRSA: “‘Smart cities’ is a term that has gained traction in academia, business and government to describe cities that, on the one hand, are increasingly composed of and monitored by pervasive and ubiquitous computing and, on the other, whose economy and governance are being driven by innovation, creativity and entrepreneurship, enacted by smart people. This paper focuses on the former and how cities are being instrumented with digital devices and infrastructure that produce ‘big data’ which enable real-time analysis of city life, new modes of technocratic urban governance, and a re-imagining of cities. The paper details a number of projects that seek to produce a real-time analysis of the city and provides a critical reflection on the implications of big data and smart urbanism.”

City Data: Big, Open and Linked


Working Paper by Mark S. Fox (University of Toronto): “Cities are moving towards policymaking based on data. They are publishing data using Open Data standards, linking data from disparate sources, allowing the crowd to update their data with Smart Phone Apps that use Open APIs, and applying “Big Data” Techniques to discover relationships that lead to greater efficiencies.
One Big City Data example is from New York City (Schönberger & Cukier, 2013). Building owners were illegally converting their buildings into rooming houses that contained 10 times the number of people they were designed for. These buildings posed a number of problems, including fire hazards, drugs, crime, disease and pest infestations. There are over 900,000 properties in New York City and only 200 inspectors who received over 25,000 illegal conversion complaints per year. The challenge was to distinguish nuisance complaints from those worth investigating where current methods were resulting in only 13% of the inspections resulting in vacate orders.
New York’s Analytics team created a dataset that combined data from 19 agencies including buildings, preservation, police, fire, tax, and building permits. By combining data analysis with expertise gleaned from inspectors (e.g., buildings that recently received a building permit were less likely to be a problem as they were being well maintained), the team was able to develop a rating system for complaints. Based on their analysis of this data, they were able to rate complaints such that in 70% of their visits, inspectors issued vacate orders; a fivefold increase in efficiency…
This paper provides an introduction to the concepts that underlie Big City Data. It explains the concepts of Open, Unified, Linked and Grounded data that lie at the heart of the Semantic Web. It then builds on this by discussing Data Analytics, which includes Statistics, Pattern Recognition and Machine Learning. Finally we discuss Big Data as the extension of Data Analytics to the Cloud where massive amounts of computing power and storage are available for processing large data sets. We use city data to illustrate each.”
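To make the New York example concrete, here is a toy Python sketch of cross-agency complaint rating: each complaint is scored against signals joined from other city datasets, and inspectors visit the highest-scoring complaints first. The features, weights and field names are our own illustrative assumptions, not the city's actual model.

```python
# A toy sketch of complaint rating built from cross-agency signals.
# Features, weights, and field names are illustrative assumptions.
def risk_score(complaint):
    score = 0.0
    if complaint["tax_arrears"]:          # property behind on taxes
        score += 2.0
    if complaint["prior_fire_incident"]:  # fire call-outs at the address
        score += 3.0
    if complaint["recent_building_permit"]:
        score -= 2.0                      # permits suggest active maintenance (per inspectors)
    score += 0.5 * complaint["prior_complaints"]
    return score

def prioritize(complaints, top_n):
    """Return the complaints most worth an inspector's visit."""
    return sorted(complaints, key=risk_score, reverse=True)[:top_n]
```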

Microsensors help map crowdsourced pollution data


Elena Craft in GreenBiz: Michael Heimbinder, a Brooklyn entrepreneur, hopes to empower individuals with his small-scale air quality monitoring system, AirCasting. The AirCasting system uses a mobile, Bluetooth-enabled air monitor not much larger than a smartphone to measure carbon dioxide, carbon monoxide, nitrogen dioxide, particulate matter and other pollutants. An accompanying Android app records and formats the information into an emissions map.
Alternatively, another instrument, the Air Quality Egg, comes pre-assembled ready to use. Innovative air monitoring systems, such as AirCasting or the Air Quality Egg, empower ordinary citizens to monitor the pollution they encounter daily and proactively address problematic sources of pollution.
This technology is part of a growing movement to enable the use of small sensors. In response to inquiries about small-sensor data, the EPA is researching the next generation of air measuring technologies. EPA experts are working with sensor developers to evaluate data quality and understand useful sensor applications. Through this ongoing collaboration, the EPA hopes to bolster measurements from conventional, stationary air-monitoring systems with data collected from individuals’ air quality microsensors….
Like many technologies emerging from the big data revolution and innovations in the energy sector, microsensing technology provides a wealth of high-quality data at a relatively low cost. It allows us to track previously undetected air pollution from traditional sources of urban smog, such as highways, and unconventional sources of pollution. Microsensing technology not only educates the public, but also helps to enlighten regulators so that policymakers can work from the facts to protect citizens’ health and welfare.
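As a rough sketch of how crowdsourced readings like AirCasting's could be turned into a map, the Python snippet below averages particulate readings over small latitude/longitude grid cells. The record layout and cell size are assumptions for illustration.

```python
# A small sketch: bin geotagged readings onto a map grid and average each cell.
# Record layout and cell size are assumptions for illustration.
from collections import defaultdict

def grid_average(readings, cell_deg=0.01):
    """Average PM2.5 values over roughly 1 km lat/lon grid cells."""
    cells = defaultdict(list)
    for r in readings:
        key = (round(r["lat"] / cell_deg), round(r["lon"] / cell_deg))
        cells[key].append(r["pm25"])
    return {key: sum(vals) / len(vals) for key, vals in cells.items()}

readings = [
    {"lat": 40.6782, "lon": -73.9442, "pm25": 12.0},
    {"lat": 40.6785, "lon": -73.9440, "pm25": 18.0},  # same cell, averaged
    {"lat": 40.7000, "lon": -73.9000, "pm25": 7.0},
]
print(grid_average(readings))
```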

Capitol Words


About Capitol Words: “For every day Congress is in session, Capitol Words visualizes the most frequently used words in the Congressional Record, giving you an at-a-glance view of which issues lawmakers address on a daily, weekly, monthly and yearly basis. Capitol Words lets you see the most popular words spoken by lawmakers on the House and Senate floor.

Methodology

The contents of the Congressional Record are downloaded daily from the website of the Government Printing Office. The GPO distributes the Congressional Record in ZIP files containing the contents of the record in plain-text format.

Each text file is parsed and turned into an XML document, with things like the title and speaker marked up. The contents of each file are then split up into words and phrases — from one word to five.

The resulting data is saved to a search engine. Capitol Words has data from 1996 to the present.”
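Below is a simplified Python sketch of the n-gram step described in the methodology: each parsed speech is split into phrases of one to five words and counted, before the results are loaded into the search index. The tokenization details are our own assumptions.

```python
# A simplified sketch of the phrase-extraction step: split each speech into
# 1- to 5-word phrases and count them by day. Tokenization is an assumption.
import re
from collections import Counter

def ngrams(text, max_n=5):
    words = re.findall(r"[a-z']+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

def daily_counts(speeches):
    """Count every 1- to 5-word phrase for a day's Congressional Record."""
    counts = Counter()
    for speech in speeches:
        counts.update(ngrams(speech))
    return counts

counts = daily_counts(["The yield curve inverted today.",
                       "We must address the yield curve."])
print(counts.most_common(3))
```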