Orwell is drowning in data: the volume problem


Dom Shaw in OpenDemocracy: “During World War II, whilst Bletchley Park laboured in the front line of code breaking, the British Government was employing vast numbers of female operatives to monitor and report on telephone, mail and telegraph communications in and out of the country.
The biggest problem, of course, was volume. Without even the most primitive algorithm to detect key phrases that later were to cause such paranoia amongst the sixties and seventies counterculture, causing a whole generation of drug users to use a wholly unnecessary set of telephone synonyms for their desired substance, the army of women stationed in exchanges around the country was driven to report everything and then pass it on up to those whose job it was to analyse such content for significance.
Orwell’s vision of Big Brother’s omniscience was based upon the same model – vast armies of Winston Smiths monitoring data to ensure discipline and control. He saw a culture of betrayal where every citizen was held accountable for their fellow citizens’ political and moral conformity.
Up until the US Government’s Big Data Research and Development Initiative [12] and the NSA development of the Prism programme [13], the fault lines always lay in the technology used to collate or collect and the inefficiency or competing interests of the corporate systems and processes that interpreted the information. Not for the first time, the bureaucracy was the citizen’s best bulwark against intrusion.
Now that the algorithms have become more complex and the technology tilted towards passive surveillance through automation, the volume problem becomes less of an obstacle….
The technology for obtaining this information, and indeed the administration of it, is handled by corporations. The Government, driven by the creed that suggests private companies are better administrators than civil servants, has auctioned off the job to a dozen or more favoured corporate giants who are, as always, beholden not only to their shareholders, but to their patrons within the government itself….
The only problem the state had was managing the scale of the information gleaned from so many people in so many forms. Not any more. The volume problem has been overcome.”

Rulemaking 2.0: Understanding and Getting Better Public Participation


New Report from Cynthia Farina and Mary Newhart for The IBM Center for The Business of Government: “This report provides important insights into how governments can improve the rulemaking process by taking full advantage of Rulemaking 2.0 technology. The report’s findings and recommendations are based on five experiments with Rulemaking 2.0 conducted by CeRI researchers, four in partnership with the Department of Transportation and one with the Consumer Financial Protection Bureau.

While geared specifically to achieving better public participation in rulemaking, the concepts, findings, and recommendations contained in the report are applicable to all government agencies interested in enhancing public participation in a variety of processes. The report offers advice on how government organizations can increase both the quantity and quality of public participation from specific groups of citizens, including missing stakeholders, unaffiliated experts, and the general public.

The report describes three barriers to effective participation in rulemaking: lack of awareness, low participation literacy, and information overload. While the report focuses on rulemaking, these barriers also hinder public participation in other arenas. The report offers three strategies to overcome such barriers:

  • Outreach to alert and engage potential new participants
  • Converting newcomers into effective commenters
  • Making substantive rulemaking information accessible”

The Power of Hackathons


Woodrow Wilson International Center for Scholars: “The Commons Lab of the Science and Technology Innovation Program is proud to announce the release of The Power of Hackathons: A Roadmap for Sustainable Open Innovation. Hackathons are collaborative events that have long been part of programmer culture, where people gather in person, online or both to work together on a problem. This could involve creating an application, improving an existing one or testing a platform.
In recent years, government agencies at multiple levels have started holding hackathon events of their own. For this brief, author Zachary Bastian interviewed agency staff, hackathon planners and hackathon participants to better understand how these events can be structured. The fundamental lesson was that a hackathon is not a panacea, but instead should be part of a broader open data and innovation centric strategy.
The full brief can be found here”

Why you should never trust a data visualisation


In The Guardian: “An excellent blogpost has been receiving a lot of attention over the last week. Pete Warden, an experienced data scientist and author for O’Reilly on all things data, writes:

The wonderful thing about being a data scientist is that I get all of the credibility of genuine science, with none of the irritating peer review or reproducibility worries … I thought I was publishing an entertaining view of some data I’d extracted, but it was treated like a scientific study.

This is an important acknowledgement of a very real problem, but in my view Warden has the wrong target in his crosshairs. Data presented in any medium is a powerful tool and must be used responsibly, but it is when information is expressed visually that the risks are highest.
The central example Warden uses is his visualisation of Facebook friend networks across the United States, which proved extremely popular and was even cited in the New York Times as evidence for growing social division.
As he explains in his post, the methodology behind his underlying network graph is perfectly defensible, but the subsequent clustering process was “produced by me squinting at all the lines, coloring in some areas that seemed more connected in a paint program, and picking silly names for the areas”. The exercise was only ever intended as a bit of fun with a large and interesting dataset, so there really shouldn’t be any problem here.
But there is: humans are visual creatures. Peer-reviewed studies have shown that we can consume information more quickly when it is expressed in diagrams than when it is presented as text.
Even something as simple as colour scheme can have a marked impact on the perceived credibility of information presented visually – often a considerably more marked impact than the actual authority of the data source.
Another great example of this phenomenon was the Washington Post’s ‘map of the world’s most and least racially tolerant countries’, which went viral back in May of this year. It was widely accepted as an objective, scientific piece of work, despite a number of social scientists identifying flaws in the methodology and the underlying data itself.”

New York City liberates map data trove


The New York World: “New York City streets are free to walk, but until now getting the city’s master map database cost dearly.
That changed on Thursday, when the Department of City Planning made MapPLUTO — an extensive database full of information about each of the city’s parcels of land — available to the public on its website, free of charge.
Previously, the same files came at a steep price of $300 per borough, and a required license barred users from posting any of the data, including maps, on the Internet.
“We revised our policy on the sale of PLUTO and MapPLUTO data in keeping with the Mayor’s ongoing commitment to using technology to improve customer service and transparency,” a Department of Planning spokesperson wrote in an email on Thursday.
In April, The New York World reported that even as the city moved to put vital government data online for free download, MapPLUTO remained a glaring exception.”
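For a sense of what the free release makes possible, here is a minimal sketch of loading one borough’s MapPLUTO extract with geopandas and summarising a few parcel attributes. The file name and column names (e.g. “ZoneDist1”, “LotArea”) are assumptions based on typical MapPLUTO releases, not details taken from the article.

```python
# Minimal sketch of exploring a MapPLUTO borough shapefile with geopandas.
# The file path and column names are assumptions, not details from the article.
import geopandas as gpd

parcels = gpd.read_file("MN_MapPLUTO.shp")    # hypothetical Manhattan extract

print(f"{len(parcels)} tax lots loaded")
print(parcels.columns.tolist()[:10])          # peek at the attribute schema

# Example: total lot area grouped by primary zoning district,
# assuming the release includes ZoneDist1 and LotArea fields.
if {"ZoneDist1", "LotArea"} <= set(parcels.columns):
    by_zone = parcels.groupby("ZoneDist1")["LotArea"].sum().sort_values(ascending=False)
    print(by_zone.head())
```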

The Internet generation will learn to let go


Julian B. Gewirtz and Adam B. Kern in The Washington Post: “Ours is the first generation to have grown up with the Internet. The first generation that got suspended from school because of a photo of underage drinking posted online. The first generation that could talk in chat rooms to anyone, anywhere, without our parents knowing. The first generation that has been “tracked” and “followed” and “shared” since childhood.
All this data will remain available forever — both to the big players (tech companies, governments) and to our friends, our sort-of friends and the rest of civil society. This fact is not really new, but our generation will confront the latter on a scale beyond that experienced by previous generations…
Certainly there will be many uses for information, such as health data, that will wind up governed by law. But so many other uses cannot be predicted or legislated, and laws themselves have to be informed by values. It is therefore critical that people establish, with their actions and expectations, cultural norms that prevent their digital selves from imprisoning their real selves.
We see three possible paths: One, people become increasingly restrained about what they share and do online. Two, people become increasingly restrained about what they do, period. Three, we learn to care less about what people did when they were younger, less mature or otherwise different.
The first outcome seems unproductive. There is no longer much of an Internet without sharing, and one of the great benefits of the Internet has been its ability to nurture relationships and connections that previously had been impossible. Withdrawal is unacceptable. Fear of the digital future should not drive us apart.
The second option seems more deeply unsettling. Childhood, adolescence, college — the whole process of growing up — is, as thinkers from John Locke to Dr. Spock have written, a necessarily experimental time. Everyone makes at least one mistake, and we’d like to think that process continues into adulthood. Creativity should not be overwhelmed by the fear of what people might one day find unpalatable.
This leaves the third outcome: the idea that we must learn to care less about what people did when they were younger or otherwise different. In an area where regulations, privacy policies and treaties may take decades to catch up to reality, our generation needs to take the lead in negotiating a “cultural treaty” endorsing a new value, related to privacy, that secures our ability to have a past captured in data that is not held to be the last word but seen in light of our having grown up in a way that no one ever has before.
Growing up, that is, on the record.”

The Charitable-Industrial Complex


Peter Buffett in the New York Times: “It’s time for a new operating system. Not a 2.0 or a 3.0, but something built from the ground up. New code.

What we have is a crisis of imagination. Albert Einstein said that you cannot solve a problem with the same mind-set that created it. Foundation dollars should be the best “risk capital” out there.

There are people working hard at showing examples of other ways to live in a functioning society that truly creates greater prosperity for all (and I don’t mean more people getting to have more stuff).

Money should be spent trying out concepts that shatter current structures and systems that have turned much of the world into one vast market. Is progress really Wi-Fi on every street corner? No. It’s when no 13-year-old girl on the planet gets sold for sex. But as long as most folks are patting themselves on the back for charitable acts, we’ve got a perpetual poverty machine.

It’s an old story; we really need a new one.”

Big data + politics = open data: The case of health care data in England


New Paper in Policy & Internet: “There is a great deal of enthusiasm about the prospects for Big Data held in health care systems around the world. Health care appears to offer the ideal combination of circumstances for its exploitation, with a need to improve productivity on the one hand and the availability of data that can be used to identify opportunities for improvement on the other. The enthusiasm rests on two assumptions. First, that the data sets held by hospitals and other organizations, and the technological infrastructure needed for their acquisition, storage, and manipulation, are up to the task. Second, that organizations outside health care systems will be able to access detailed datasets. We argue that both assumptions can be challenged. The article uses the example of the National Health Service in England to identify data, technology, and information governance challenges. The public acceptability of third party access to detailed health care datasets is, at best, unclear.”

Sitegeist


“Sitegeist is a mobile application that helps you to learn more about your surroundings in seconds. Drawing on publicly available information, the app presents solid data in a simple at-a-glance format to help you tap into the pulse of your location. From demographics about people and housing to the latest popular spots or weather, Sitegeist presents localized information visually so you can get back to enjoying the neighborhood. The application draws on free APIs such as the U.S. Census, Yelp! and others to showcase what’s possible with access to data. Sitegeist was created by the Sunlight Foundation in consultation with design firm IDEO and with support from the John S. and James L. Knight Foundation. It is the third in a series of National Data Apps.”
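The public APIs Sitegeist draws on can be queried directly. Below is a minimal sketch (not Sitegeist’s actual code) that pulls a couple of local indicators from the Census Data API’s ACS 5-year endpoint; the dataset year, variable codes, and geography are illustrative assumptions, and heavier use may require a free API key.

```python
# Minimal sketch of querying the Census Data API for local demographics,
# in the spirit of what Sitegeist surfaces. Year, variables, and geography
# are illustrative assumptions, not Sitegeist's actual implementation.
import requests

BASE = "https://api.census.gov/data/2021/acs/acs5"  # ACS 5-year estimates

params = {
    "get": "NAME,B01003_001E,B19013_001E",  # name, total population, median household income
    "for": "county:061",                    # New York County (Manhattan)
    "in": "state:36",                       # New York State
    # "key": "YOUR_CENSUS_API_KEY",         # free to request; optional for light use
}

resp = requests.get(BASE, params=params, timeout=10)
resp.raise_for_status()
header, *rows = resp.json()                 # the API returns a list of lists, header row first

for row in rows:
    record = dict(zip(header, row))
    print(f"{record['NAME']}: population {record['B01003_001E']}, "
          f"median household income ${record['B19013_001E']}")
```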

Create a Crowd Competition That Works


Ahmad Ashkar in HBR Blog Network: “It’s no secret that people in business are turning to the crowd to solve their toughest challenges. Well-known sites like Kickstarter and Indiegogo allow people to raise money for new projects. Design platforms like Crowdspring and 99designs give people the tools needed to crowdsource graphic design ideas and feedback.
At the Hult Prize — a start-up accelerator that challenges Millennials to develop innovative social enterprises to solve our world’s most pressing issues (and rewards the top team with $1,000,000 in start-up capital) — we’ve learned that the crowd can also offer an unorthodox solution in developing innovative and disruptive ideas, particularly ones focused on tackling complex, large-scale social issues.
But to effectively harness the power of the crowd, you have to engage it carefully. Over the past four years, we’ve developed a well-defined set of principles that guide our annual “challenge” (lauded by Bill Clinton in TIME magazine as one of the top five initiatives changing the world for the better), which produces original and actionable ideas to solve social issues.
Companies like Netflix, General Electric, and Procter & Gamble have also started “challenging the crowd” and employing many of these principles to tackle their own business roadblocks. If you’re looking to spark disruptive and powerful ideas that benefit your company, follow these guidelines to launch an engaging competition:
1. Define the boundaries. …
2. Identify a specific and bold stretch target. …
3. Insist on low barriers to entry. …
4. Encourage teams and networks. …
5. Provide a toolkit. Once interested parties become participants in your challenge, provide tools to set them up for success. If you are working on a social problem, you can use IDEO’s human-centered design toolkit. If you have a private-sector challenge, consider posting it on an existing innovation platform. As an organizer, you don’t have to spend time recreating the wheel — use one of the many existing platforms and borrow materials from those willing to share.”