Big Data Is Not Our Master. Humans create technology. Humans can control it.


Chris Hughes in New Republic: “We’ve known for a long time that big companies can stalk our every digital move and customize our every Web interaction. Our movements are tracked by credit cards, Gmail, and tollbooths, and we haven’t seemed to care all that much.
That is, until this week’s news of government eavesdropping, with the help of these very same big companies—Verizon, Facebook, and Google, among others. For the first time, America is waking up to the realities of what all this information—known in the business as “big data”—enables governments and corporations to do….
We are suddenly wondering, Can the rise of enormous data systems that enable this surveillance be stopped or controlled? Is it possible to turn back the clock?
Technologists see the rise of big data as the inevitable march of history, impossible to prevent or alter. Viktor Mayer-Schönberger and Kenneth Cukier’s recent book Big Data is emblematic of this argument: They say that we must cope with the consequences of these changes, but they never really consider the role we play in creating and supporting these technologies themselves….
But these well-meaning technological advocates have forgotten that as a society, we determine our own future and set our own standards, norms, and policy. Talking about technological advancements as if they are pre-ordained science erases the role of human autonomy and decision-making in inventing our own future. Big data is not a Leviathan that must be coped with, but a technological trend that we have made possible and support through social and political policy.”

Smart Citizen Kit enables crowdsourced environmental monitoring


Emma Hutchings at PSFK: “The Smart Citizen Kit is a crowdsourced environmental monitoring platform. By scattering devices around the world, the creators hope to build a global network of sensors that report local environmental conditions like CO and NO2 levels, light, noise, temperature and humidity.
Organized by the Fab Lab at the Institute for Advanced Architecture of Catalonia, a team of scientists, architects, and engineers are paving the way to humanize environmental monitoring. The open-source platform consists of Arduino-compatible hardware, a data visualization web API and a mobile app. Users are invited to take part in the interactive global environmental database, visualizing their data and comparing it with others around the world.”

Why Big Data Is Not Truth


Quentin Hardy in the New York Times: “Kate Crawford, a researcher at Microsoft Research, calls the problem “Big Data fundamentalism — the idea with larger data sets, we get closer to objective truth.” Speaking at a conference in Berkeley, Calif., on Thursday, she identified what she calls “six myths of Big Data.”
Myth 1: Big Data is New
In 1997, there was a paper that discussed the difficulty of visualizing Big Data, and in 1999, a paper that discussed the problems of gaining insight from the numbers in Big Data. That indicates that two prominent issues today in Big Data, display and insight, had been around for a while….
Myth 2: Big Data Is Objective
Over 20 million Twitter messages about Hurricane Sandy were posted last year. … “These were very privileged urban stories.” And some people, privileged or otherwise, put information like their home addresses on Twitter in an effort to seek aid. That sensitive information is still out there, even though the threat is gone.
Myth 3: Big Data Doesn’t Discriminate
“Big Data is neither color blind nor gender blind,” Ms. Crawford said. “We can see how it is used in marketing to segment people.” …
Myth 4: Big Data Makes Cities Smart
…, moving cities toward digital initiatives like predictive policing, or creating systems where people are seen, whether they like it or not, can promote lots of tension between individuals and their governments.
Myth 5: Big Data Is Anonymous
A study published in Nature last March looked at 1.5 million phone records that had personally identifying information removed. It found that just four data points of when and where a call was made could identify 95 percent of individuals. …
Myth 6: You Can Opt Out
… given the ways that information can be obtained in these big systems, “what are the chances that your personal information will never be used?”
Before Big Data disappears into the background as another fact of life, Ms. Crawford said, “We need to think about how we will navigate these systems. Not just individually, but as a society.”
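
The re-identification claim under Myth 5 can be made concrete with a toy example. The sketch below is not the method of the study Hardy cites; it is a minimal illustration, assuming each "anonymized" record is just a set of (hour, antenna) points per user. The names `matching_users`, `unicity`, and `TRACES`, and all the data, are hypothetical.

```python
import random


def matching_users(traces, known_points):
    """Return the users whose trace contains every known (hour, antenna) point."""
    known = set(known_points)
    return sorted(user for user, points in traces.items() if known <= points)


def unicity(traces, k, trials=200, seed=0):
    """Estimate how often k points drawn from one user's trace match only that user."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    users = [u for u, pts in traces.items() if len(pts) >= k]
    if not users:
        return 0.0
    unique = 0
    for _ in range(trials):
        user = rng.choice(users)
        sample = rng.sample(sorted(traces[user]), k)
        if matching_users(traces, sample) == [user]:
            unique += 1
    return unique / trials


# Hypothetical toy traces: user -> set of (hour, antenna_id) observations.
TRACES = {
    "a": {(8, 1), (9, 2), (18, 3), (22, 1)},
    "b": {(8, 1), (9, 2), (18, 4), (23, 1)},
    "c": {(8, 1), (10, 5), (18, 3), (22, 6)},
}

print(matching_users(TRACES, [(8, 1)]))                    # all three users match
print(matching_users(TRACES, [(8, 1), (9, 2), (18, 3)]))   # only one user remains
```

Each extra known point shrinks the set of candidates, which is why a handful of time-and-place observations can be enough to single out one person even with names removed.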

Techs and the City


Alec Appelbaum, who teaches at Pratt Institute, in The New York Times: “THIS spring New York City is rolling out its much-ballyhooed bike-sharing program, which relies on a sophisticated set of smartphone apps and other digital tools to manage it. The city isn’t alone: across the country, municipalities are buying ever more complicated technological “solutions” for urban life.

But higher tech is not always essential tech. Cities could instead be making savvier investments in cheaper technology that may work better to stoke civic involvement than the more complicated, expensive products being peddled by information-technology developers….

To be sure, big tech can zap some city weaknesses. According to I.B.M., its predictive-analysis technology, which examines historical data to estimate the next crime hot spots, has helped Memphis lower its violent crime rate by 30 percent.

But many problems require a decidedly different approach. Take the seven-acre site in Lower Manhattan called the Seward Park Urban Renewal Area, where 1,000 mixed-income apartments are set to rise. A working-class neighborhood that fell to bulldozers in 1969, it stayed bare as co-ops nearby filled with affluent families, including my own.

In 2010, with the city ready to invite developers to bid for the site, long-simmering tensions between nearby public-housing tenants and wealthier dwellers like me turned suddenly — well, civil.

What changed? Was it some multimillion-dollar “open democracy” platform from Cisco, or a Big Data program to suss out the community’s real priorities? Nope. According to Dominic Pisciotta Berg, then the chairman of the local community board, it was plain old e-mail, and the dialogue it facilitated. “We simply set up an e-mail box dedicated to receiving e-mail comments” on the renewal project, and organizers would then “pull them together by comment type and then consolidate them for display during the meetings,” he said. “So those who couldn’t be there had their voices considered and those who were there could see them up on a screen and adopted, modified or rejected.”

Through e-mail conversations, neighbors articulated priorities — permanently affordable homes, a movie theater, protections for small merchants — that even a supercomputer wouldn’t necessarily have identified in the data.

The point is not that software is useless. But like anything else in a city, it’s only as useful as its ability to facilitate the messy clash of real human beings and their myriad interests and opinions. And often, it’s the simpler software, the technology that merely puts people in contact and steps out of the way, that works best.”

The Dictatorship of Data


Kenneth Cukier and Viktor Mayer-Schönberger in MIT Technology Review: “Big data is poised to transform society, from how we diagnose illness to how we educate children, even making it possible for a car to drive itself. Information is emerging as a new economic input, a vital resource. Companies, governments, and even individuals will be measuring and optimizing everything possible.
But there is a dark side. Big data erodes privacy. And when it is used to make predictions about what we are likely to do but haven’t yet done, it threatens freedom as well. Yet big data also exacerbates a very old problem: relying on the numbers when they are far more fallible than we think. Nothing underscores the consequences of data analysis gone awry more than the story of Robert McNamara.”

The Declassification Engine


Wired: “The CIA offers an electronic search engine that lets you mine about 11 million agency documents that have been declassified over the years. It’s called CREST, short for CIA Records Search Tool. But this represents only a portion of the CIA’s declassified materials, and if you want unfettered access to the search engine, you’ll have to physically visit the National Archives at College Park, Maryland….
a new project launched by a team of historians, mathematicians, and computer scientists at Columbia University in New York City. Led by Matthew Connelly — a Columbia professor trained in diplomatic history — the project is known as The Declassification Engine, and it seeks to provide a single online database for declassified documents from across the federal government, including the CIA, the State Department, and potentially any other agency.
The project is still in the early stages, but the team has already assembled a database of documents that stretches back to the 1940s, and it has begun building new tools for analyzing these materials. In aggregating all documents into a single database, the researchers hope to not only provide quicker access to declassified materials, but to glean far more information from these documents than we otherwise could.
In the parlance of the day, the project is tackling these documents with the help of Big Data. If you put enough of this declassified information in a single place, Connelly believes, you can begin to predict what government information is still being withheld.”

Deepbills project


Cato Institute: “The Deepbills project takes the raw XML of Congressional bills (available at FDsys and Thomas) and adds additional semantic information to them inside the text.

You can download the continuously-updated data at http://deepbills.cato.org/download

Congress already produces machine-readable XML of almost every bill it proposes, but that XML is designed primarily for formatting a paper copy, not for extracting information. For example, it’s not currently possible to find every mention of an Agency, every legal reference, or even every spending authorization in a bill without having a human being read it….
Currently the following information is tagged:

  • Legal citations…
  • Budget Authorities (both Authorizations of Appropriations and Appropriations)…
  • Agencies, bureaus, and subunits of the federal government.
  • Congressional committees
  • Federal elective officeholders (Congressmen)”
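
To show what inline tagging of this kind makes possible, here is a minimal sketch that pulls every tagged entity out of a bill and groups the results by type. The namespace URI, the `entity-ref` element, the `entity-type` attribute, and its values are assumptions about the Deepbills markup rather than confirmed details, and the bill fragment is invented for the example; check all of them against a real downloaded file.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed namespace for the added semantic tags (not confirmed by the excerpt).
CATO_NS = "http://namespaces.cato.org/catoxml"

# Hypothetical bill fragment, written for this example only.
SAMPLE_BILL = """<bill xmlns:cato="http://namespaces.cato.org/catoxml">
  <section>
    <text>The <cato:entity-ref entity-type="federal-body">Department of Energy</cato:entity-ref>
    shall carry out this section under
    <cato:entity-ref entity-type="uscode">section 101 of title 42</cato:entity-ref>.</text>
  </section>
</bill>"""


def collect_entities(xml_text):
    """Group the text of every tagged entity by its entity-type attribute."""
    root = ET.fromstring(xml_text)
    found = defaultdict(list)
    for ref in root.iter(f"{{{CATO_NS}}}entity-ref"):
        # Join nested text and collapse whitespace left over from line breaks.
        label = " ".join("".join(ref.itertext()).split())
        found[ref.get("entity-type", "unknown")].append(label)
    return dict(found)


for entity_type, labels in collect_entities(SAMPLE_BILL).items():
    print(entity_type, labels)
```

With tags like these in place, a question such as "which agencies does this bill mention?" becomes a short query instead of a manual read-through.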

Intel Fuels a Rebellion Around Your Data


Antonio Regalado and Jessica Leber in MIT Technology Review: “Intel Labs, the company’s R&D arm, is launching an initiative around what it calls the “data economy”—how consumers might capture more of the value of their personal information, like digital records of their location or work history. To make this possible, Intel is funding hackathons to urge developers to explore novel uses of personal data. It has also paid for a rebellious-sounding website called We the Data, featuring raised fists and stories comparing Facebook to Exxon Mobil.
Intel’s effort to stir a debate around “your data” is just one example of how some companies—and society more broadly—are grappling with a basic economic asymmetry of the big data age: they’ve got the data, and we don’t.”

Data Edge


Steven Weber, professor in the School of Information and Political Science department at UC Berkeley, in Policy by the Numbers: “It’s commonly said that most people overestimate the impact of technology in the short term, and underestimate its impact over the longer term.
Where is Big Data in 2013? Starting to get very real, in our view, and right on the cusp of underestimation in the long term. The short term hype cycle is (thankfully) burning itself out, and the profound changes that data science can and will bring to human life are just now coming into focus. It may be that Data Science is right now about where the Internet itself was in 1993 or so. That’s roughly when it became clear that the World Wide Web was a wind that would blow across just about every sector of the modern economy while transforming foundational things we thought were locked in about human relationships, politics, and social change. It’s becoming a reasonable bet that Data Science is set to do the same—again, and perhaps even more profoundly—over the next decade. Just possibly, more quickly than that….
Can data, no matter how big, change the world for the better? It may be the case that in some fields of human endeavor and behavior, the scientific analysis of big data by itself will create such powerful insights that change will simply have to happen, that businesses will deftly re-organize, that health care will remake itself for efficiency and better outcomes, that people will adopt new behaviors that make them happier, healthier, more prosperous and peaceful. Maybe. But almost everything we know about technology and society across human history argues that it won’t be so straightforward.
…join senior industry and academic leaders at DataEDGE at UC Berkeley on May 30-31 to engage in what will be a lively and important conversation aimed at answering today’s questions about the data science revolution—and formulating tomorrow’s.”

Wikipedia Recent Changes Map


The Verge: “By watching a new visualization, known plainly as the Wikipedia Recent Changes Map, viewers can see the location of every unregistered Wikipedia user who makes a change to the open encyclopedia. It provides a voyeuristic look at the rate that knowledge is contributed to the website, giving you the faintest impression of the Spaniard interested in the television show Jackass or the Brazilian who defaced the page on the Jersey Devil to feature a photograph of the new pope. Though the visualization moves quickly, it’s only displaying about one-fifth of the edits being made: Wikipedia doesn’t reveal location data for registered users, and unregistered users make up just 15 to 20 percent of all contributions, according to studies of the website.”