Kenneth Neil Cukier and Viktor Mayer-Schoenberger in Foreign Affairs: “Everyone knows that the Internet has changed how businesses operate, governments function, and people live. But a new, less visible technological trend is just as transformative: “big data.” Big data starts with the fact that there is a lot more information floating around these days than ever before, and it is being put to extraordinary new uses. Big data is distinct from the Internet, although the Web makes it much easier to collect and share data. Big data is about more than just communication: the idea is that we can learn from a large body of information things that we could not comprehend when we used only smaller amounts.”
Gideon Rose, editor of Foreign Affairs, sits down with Kenneth Cukier, data editor of The Economist (video).
Investigating Terror in the Age of Twitter
Michael Chertoff and Dallas Lawrence in WSJ: “A dozen years ago when the terrorists struck on 9/11, there was no Facebook or Twitter or i-anything on the market. Cellphones were relatively common, but when cell networks collapsed in 2001, many people were left disconnected and wanting for immediate answers. Last week in Boston, when mobile networks became overloaded following the bombings, the social-media-savvy Boston Police Department turned to Twitter, using the platform as a makeshift newsroom to alert media and concerned citizens to breaking news.
Law-enforcement agencies around the world will note how social media played a prominent role both in telling the story and writing its eventual conclusion. Some key lessons have emerged.”
Knowing Where to Focus the Wisdom of Crowds
Nick Bilton in NYT: “It looks as if the theory of the “wisdom of crowds” doesn’t apply to terrorist manhunts. Last week after the Boston Marathon bombings, the Internet quickly offered to help find the people responsible. In a scene metaphorically reminiscent of a movie in which vigilantes swarm the streets with pitchforks and lanterns, people took to Reddit, the popular community and social news Web site, and started scouring images posted online from the bombings.
One Reddit forum told users to search for “people carrying black bags,” and noted that “if they look suspicious, then post them. Then people will try and follow their movements using all the images.” In the process, each time a scrap of information was discovered — the color of a hat, the type of straps on a backpack, the weighted droop of a bag — it was passed out on Twitter like “Wanted” posters tacked to lampposts. It didn’t matter whether it was right, wrong or even completely made up (some images posted to forums had been manipulated) — off it went, fiction and fact indistinguishable. Some misinformation online landed on the front page of The New York Post, incorrectly identifying an innocent high school student as a suspect. Later in the week, the Web wrongly identified one of the suspects as a student from Brown University who went missing earlier this month…
Perhaps the scariest aspect of these crowd-like investigations is that when information is incorrect, no one is held responsible.
As my colleague David Carr noted in his column this week, “even good reporters with good sources can end up with stories that go bad.” But the difference between CNN, The Associated Press or The New York Post getting it wrong, is that those names are held accountable when they publish incorrect news. No one is going to remember, or punish, the users on Reddit or Twitter who incorrectly identify random high school runners and missing college students as terrorists.”
Crowd diagnosis could spot rare diseases doctors miss
New Scientist: “Diagnosing rare illnesses could get easier, thanks to new web-based tools that pool information from a wide variety of sources…CrowdMed, launched on 16 April at the TedMed conference in Washington DC, uses crowds to solve tough medical cases.
Anyone can join CrowdMed and analyse cases, regardless of their background or training. Participants are given points that they can then use to bet on the correct diagnosis from lists of suggestions. This creates a prediction market, with diagnoses falling and rising in value based on their popularity, like stocks in a stock market. Algorithms then calculate the probability that each diagnosis will be correct. In 20 initial test cases, around 700 participants identified each of the mystery diseases as one of their top three suggestions….
Frustrated patients and doctors can also turn to FindZebra, a recently launched search engine for rare diseases. It lets users search an index of rare disease databases looked after by a team of researchers. In initial trials, FindZebra returned more helpful results than Google on searches within this same dataset.”
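The betting mechanism described in the New Scientist excerpt can be illustrated with a minimal sketch. The article does not describe CrowdMed's actual algorithms, so the code below assumes the simplest possible rule: a diagnosis's estimated probability is its share of all points staked. The function names and the sample bets are hypothetical.

```python
from collections import defaultdict


def diagnosis_probabilities(bets):
    """Estimate a probability for each diagnosis from staked points.

    bets: iterable of (participant, diagnosis, points) tuples.
    Assumption: probability = share of all points staked on a diagnosis.
    This is an illustrative rule, not CrowdMed's published method.
    """
    totals = defaultdict(float)
    for _participant, diagnosis, points in bets:
        if points <= 0:
            raise ValueError("points must be positive")
        totals[diagnosis] += points
    pool = sum(totals.values())
    if pool == 0:
        return {}
    return {diagnosis: staked / pool for diagnosis, staked in totals.items()}


def top_suggestions(bets, k=3):
    """Return the k diagnoses with the highest estimated probability."""
    probabilities = diagnosis_probabilities(bets)
    return sorted(probabilities, key=probabilities.get, reverse=True)[:k]


# Hypothetical bets: (participant, diagnosis, points staked)
sample_bets = [
    ("alice", "lupus", 30),
    ("bob", "lyme disease", 60),
    ("carol", "lupus", 10),
]
print(diagnosis_probabilities(sample_bets))
print(top_suggestions(sample_bets))
```

As more points flow to a diagnosis its share rises and the shares of the others fall, which is the stock-like behavior the article describes. A real prediction market would also need a pricing and payout rule, which this sketch omits.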
White House: Unleashing the Power of Big Data
Tom Kalil, Deputy Director for Technology and Innovation at OSTP: “As we enter the second year of the Big Data Initiative, the Obama Administration is encouraging multiple stakeholders, including federal agencies, private industry, academia, state and local government, non-profits, and foundations to develop and participate in Big Data initiatives across the country. Of particular interest are partnerships designed to advance core Big Data technologies; harness the power of Big Data to advance national goals such as economic growth, education, health, and clean energy; use competitions and challenges; and foster regional innovation.
The National Science Foundation has issued a request for information encouraging stakeholders to identify Big Data projects they would be willing to support to achieve these goals. And, later this year, OSTP, NSF, and other partner agencies in the Networking and Information Technology R&D (NITRD) program plan to convene an event that highlights high-impact collaborations and identifies areas for expanded collaboration between the public and private sectors.”
Work-force Science and Big Data
Steve Lohr from the New York Times: “Work-force science, in short, is what happens when Big Data meets H.R….Today, every e-mail, instant message, phone call, line of written code and mouse-click leaves a digital signal. These patterns can now be inexpensively collected and mined for insights into how people work and communicate, potentially opening doors to more efficiency and innovation within companies.
Digital technology also makes it possible to conduct and aggregate personality-based assessments, often using online quizzes or games, in far greater detail and numbers than ever before. In the past, studies of worker behavior were typically based on observing a few hundred people at most. Today, studies can include thousands or hundreds of thousands of workers, an exponential leap ahead.
“The heart of science is measurement,” says Erik Brynjolfsson, director of the Center for Digital Business at the Sloan School of Management at M.I.T. “We’re seeing a revolution in measurement, and it will revolutionize organizational economics and personnel economics.”
The data-gathering technology, to be sure, raises questions about the limits of worker surveillance. “The larger problem here is that all these workplace metrics are being collected when you as a worker are essentially behind a one-way mirror,” says Marc Rotenberg, executive director of the Electronic Privacy Information Center, an advocacy group. “You don’t know what data is being collected and how it is used.”
Policy Modeling through Collaboration and Simulation
New paper on “Bridging narrative scenario texts and formal policy modeling through conceptual policy modeling” in Artificial Intelligence and Law.
Abstract: “Engaging stakeholders in policy making and supporting policy development with advanced information and communication technologies including policy simulation is currently high on the agenda of research. In order to involve stakeholders in providing their input to policy modeling via online means, simple techniques need to be employed such as scenario technique. Scenarios enable stakeholders to express their views in narrative text. At the other end of policy development, a frequently used approach to policy modeling is agent-based simulation. So far, effective support to transform narrative text input to formal simulation statements is not widely available. In this paper, we present a novel approach to support the transformation of narrative texts via conceptual modeling into formal simulation models. The approach also stores provenance information which is conveyed via annotations of texts to the conceptual model and further on to the simulation model. This way, traceability of information is provided, which contributes to better understanding and transparency, and therewith enables stakeholders and policy modelers to return to the sources that informed the conceptual and simulation model.”
Digital Public Library of America Launched
Press Release: “The Digital Public Library of America (DPLA) launched a beta of its discovery portal and open platform today. The portal delivers millions of materials found in American archives, libraries, museums, and cultural heritage institutions to students, teachers, scholars, and the public. Far more than a search engine, the portal provides innovative ways to search and scan through its united collection of distributed resources. Special features include a dynamic map, a timeline that allows users to visually browse by year or decade, and an app library that provides access to applications and tools created by external developers using DPLA’s open data…
With an application programming interface (API) and maximally open data, the DPLA can be used by software developers, researchers, and others to create novel environments for learning, tools for discovery, and engaging apps. The DPLA App Library features an initial slate of applications built on top of the platform; developers and hobbyists of all skill levels are freely able to make use of the data provided via the platform….
With its content partners, the DPLA has developed a number of diverse virtual exhibitions that tell the stories of people, places, and historical events both here in the US and abroad; all are available freely via the portal.”
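For developers curious about what building on the platform might look like, here is a minimal sketch of a keyword search against the DPLA API. The press release does not document the interface, so the endpoint (`https://api.dp.la/v2/items`) and the `q`, `page_size`, and `api_key` parameters are assumptions based on DPLA's public API documentation and should be checked against the current docs. The helper names and the placeholder key are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against DPLA's current API documentation.
DPLA_ITEMS_ENDPOINT = "https://api.dp.la/v2/items"


def build_search_url(query, api_key, page_size=5):
    """Build a keyword-search URL for the items endpoint."""
    params = {"q": query, "page_size": page_size, "api_key": api_key}
    return f"{DPLA_ITEMS_ENDPOINT}?{urlencode(params)}"


def search_items(query, api_key, page_size=5):
    """Fetch and decode the JSON response (requires network access and a valid key)."""
    with urlopen(build_search_url(query, api_key, page_size)) as response:
        return json.load(response)


# Only the URL is built here; no request is sent.
print(build_search_url("boston marathon", "YOUR_API_KEY"))
```

Separating URL construction from the network call keeps the query logic easy to inspect and test without an API key.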
Demystifying data centers
Wired: “If you walk into the lobby of the data center Facebook operates in the high desert in Prineville, Oregon, you’ll find a flatscreen display on the wall where you can check the pulse of this massive computing facility.
The display tracks the efficiency of the operation, which spans 333,400 square feet and tens of thousands of computer servers. Facebook built this data center in an effort to significantly reduce the power and dollars needed to serve up the world’s most popular social network, and — driven by CEO Mark Zuckerberg’s deep-seated belief in the free exchange of ideas — the company aims to push the computing world in a similar direction. The display — which shows much the same information Facebook engineers use to monitor the facility — is an advertisement for the Facebook way.
Now, the company is taking this idea a step further. On Thursday, Facebook uncloaked a pair of web services that let anyone in the world track the efficiency of the Prineville data center and its sister facility in Forest City, North Carolina. “We’re pulling back the curtain to share some of the same information that our data center technicians view every day,” Facebook’s Lyrica McTiernan said in a blog post. “We think it’s important to demystify data centers and share more about what our operations really look like.”
Newark's Cory Booker: Social Media Can Help Fix Broken Government
Internet Evolution on Cory Booker’s panel at Ad Age Digital Conference: “Social media have been a part of a transformation of the City of Newark from a butt of jokes to a community experiencing economic growth, Booker told the Ad Age conference. Newark has a population of 300,000 in a state with 9 million people, and yet, Newark has a third of the economic growth in the state. The city population is growing for the first time in 60 years.
Social media can be a big part of the cure for government that has become unresponsive to the needs of its citizens, Booker said. He quoted California Lt. Governor Gavin Newsom, who uses the phrase “vending machine government.” Citizens pay for government services, and get prepackaged offerings in return. “If you don’t like what you get, you shake the vending machine,” Booker said…
When people lean back and disengage, government becomes unresponsive. But social media provide the tools for citizens to collaborate with government. “We have all these tools pulling government away from citizens,” Booker said. These include special interest groups and moneyed corporate lobbies. “But social media brings us closer.”
Twitter helped Newark rebuild its reputation. The city had been a butt of jokes for years. When Conan O’Brien made a joke at Newark’s expense, Booker replied with an online video that said O’Brien was now on the no-fly list at Newark Airport. The TSA got into the act, issuing a statement that Booker didn’t have that power. Then-Secretary of State Hillary Clinton followed up with a plea for Booker and O’Brien to just get along.
And it’s not just a matter of public relations; social media have helped improve Newark in concrete ways — Newark’s government is more effective. For example, its inspectors are vastly more efficient at finding violations when citizens can use social media to point up problems, Booker said.
Video can be an even more powerful tool for getting a message out than microblogging services such as Twitter, Booker said. And that led to discussion of Booker’s startup, #waywire. The beta video service, updated this week to focus on video curation, is a place where people can collect and share online video.”