The Big Data Debate: Correlation vs. Causation


Gil Press: “In the first quarter of 2013, the stock of big data has experienced sudden declines followed by sporadic bouts of enthusiasm. The volatility—a new big data “V”—continues and Ted Cuzzillo summed up the recent negative sentiment in “Big data, big hype, big danger” on SmartDataCollective:
“A remarkable thing happened in Big Data last week. One of Big Data’s best friends poked fun at one of its cornerstones: the Three V’s. The well-networked and alert observer Shawn Rogers, vice president of research at Enterprise Management Associates, tweeted his eight V’s: ‘…Vast, Volumes of Vigorously, Verified, Vexingly Variable Verbose yet Valuable Visualized high Velocity Data.’ He was quick to explain to me that this is no comment on Gartner analyst Doug Laney’s three-V definition. Shawn’s just tired of people getting stuck on V’s.”…
Cuzzillo is joined by a growing chorus of critics that challenge some of the breathless pronouncements of big data enthusiasts. Specifically, it looks like the backlash theme-of-the-month is correlation vs. causation, possibly in reaction to the success of Viktor Mayer-Schönberger and Kenneth Cukier’s recent big data book in which they argued for dispensing “with a reliance on causation in favor of correlation”…
In “Steamrolled by Big Data,” The New Yorker’s Gary Marcus declares that “Big Data isn’t nearly the boundless miracle that many people seem to think it is.”…
Matti Keltanen at The Guardian agrees, explaining “Why ‘lean data’ beats big data.” Writes Keltanen: “…the lightest, simplest way to achieve your data analysis goals is the best one…The dirty secret of big data is that no algorithm can tell you what’s significant, or what it means. Data then becomes another problem for you to solve. A lean data approach suggests starting with questions relevant to your business and finding ways to answer them through data, rather than sifting through countless data sets. Furthermore, purely algorithmic extraction of rules from data is prone to creating spurious connections, such as false correlations… today’s big data hype seems more concerned with indiscriminate hoarding than helping businesses make the right decisions.”
In “Data Skepticism,” O’Reilly Radar’s Mike Loukides adds this gem to the discussion: “The idea that there are limitations to data, even very big data, doesn’t contradict Google’s mantra that more data is better than smarter algorithms; it does mean that even when you have unlimited data, you have to be very careful about the conclusions you draw from that data. It is in conflict with the all-too-common idea that, if you have lots and lots of data, correlation is as good as causation.”
Isn’t more-data-is-better the same as correlation-is-as-good-as-causation? Or, in the words of Chris Anderson, “with enough data, the numbers speak for themselves.”
“Can numbers actually speak for themselves?” non-believer Kate Crawford asks in “The Hidden Biases in Big Data” on the Harvard Business Review blog and answers: “Sadly, they can’t. Data and data sets are not objective; they are creations of human design…”
And David Brooks in The New York Times, while probing the limits of “the big data revolution,” takes the discussion to yet another level: “One limit is that correlations are actually not all that clear. A zillion things can correlate with each other, depending on how you structure the data and what you compare. To discern meaningful correlations from meaningless ones, you often have to rely on some causal hypothesis about what is leading to what. You wind up back in the land of human theorizing…”
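Brooks’s point that “a zillion things can correlate with each other” can be made concrete with a minimal Python sketch. It is an illustration, not a method from any of the quoted authors. It generates many series of pure random noise, so no series has any causal link to another, and then reports the strongest pairwise Pearson correlation it can find. The function names and parameters (`pearson`, `strongest_spurious_correlation`, `num_series`, `length`, `seed`) are hypothetical choices for this example. The number of pairs grows roughly with the square of the number of series, so the best chance correlation tends to climb as more variables are compared, even though every match is meaningless by construction.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_spurious_correlation(num_series, length, seed=0):
    """Return the largest |r| among all pairs of independent random series.

    Every series is Gaussian noise, so any correlation found is spurious.
    """
    rng = random.Random(seed)
    series = [[rng.gauss(0.0, 1.0) for _ in range(length)]
              for _ in range(num_series)]
    best = 0.0
    for a, b in combinations(series, 2):
        best = max(best, abs(pearson(a, b)))
    return best


if __name__ == "__main__":
    # More series means more pairs to compare, so the best chance match grows.
    for k in (2, 10, 50, 100):
        r = strongest_spurious_correlation(k, length=20)
        print(f"{k} series: strongest |r| = {r:.2f}")
```

Searching across more comparisons is exactly how spurious matches accumulate, which is why Brooks argues that a causal hypothesis is needed to tell meaningful correlations from meaningless ones.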

The Next Great Internet Disruption: Authority and Governance


An essay by David Bollier and John Clippinger, part of their ongoing work at ID3, the Institute for Data-Driven Design: As the Internet and digital technologies have proliferated over the past twenty years, incumbent enterprises nearly always resist open network dynamics with fierce determination, a narrow ingenuity and resistance…. But the inevitable rearguard actions to defend old forms are invariably overwhelmed by the new, network-based ones. The old business models, organizational structures, professional sinecures, cultural norms, etc., ultimately yield to open platforms.
When we look back on the past twenty years of Internet history, we can more fully appreciate the prescience of David P. Reed’s seminal 1999 paper on “Group Forming Networks” (GFNs). “Reed’s Law” posits that value in networks increases exponentially as interactions move from a broadcasting model that offers “best content” (in which value is described by n, the number of consumers) to a network of peer-to-peer transactions (where the network’s value is based on “most members” and mathematically described by n²). But by far the most valuable networks are based on those that facilitate group affiliations, Reed concluded. When users have tools for “free and responsible association for common purposes,” he found, the value of the network soars exponentially to 2ⁿ, a fantastically large number. This is the Group Forming Network. Reed predicted that “the dominant value in a typical network tends to shift from one category to another as the scale of the network increases.…”
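The three growth rates in Reed’s argument can be compared with a small Python sketch. It is an illustration, not code from Reed’s paper, and it ignores any constant factors, so it shows only how each count scales. The broadcast case counts consumers (n). The peer-to-peer case uses n². The group-forming case counts the possible subgroups with at least two members, 2ⁿ − n − 1, which grows on the order of 2ⁿ. The function names are hypothetical.

```python
def broadcast_value(n: int) -> int:
    """Broadcast ("best content") model: value scales with the n consumers."""
    return n


def peer_to_peer_value(n: int) -> int:
    """Peer-to-peer ("most members") model: value scales as n squared."""
    return n ** 2


def group_forming_value(n: int) -> int:
    """Group-forming model: number of possible subgroups of two or more members.

    Of the 2**n subsets of n members, exclude the empty set and the n
    single-member sets.
    """
    return 2 ** n - n - 1


if __name__ == "__main__":
    print(f"{'n':>4} {'n':>6} {'n^2':>8} {'2^n-n-1':>14}")
    for n in (2, 5, 10, 20, 30):
        print(f"{n:>4} {broadcast_value(n):>6} "
              f"{peer_to_peer_value(n):>8} {group_forming_value(n):>14}")
```

With these unweighted formulas the group count (26) already overtakes n² (25) at n = 5, and by n = 30 it exceeds a billion. This loosely echoes Reed’s remark that the dominant value shifts from one category to another as a network grows.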
What is really interesting about Reed’s analysis is that today’s world of GFNs, as embodied by Facebook, Twitter, Wikipedia and other Web 2.0 technologies, remains highly rudimentary. It is based on proprietary platforms (as opposed to open source, user-controlled platforms), and therefore provides only limited tools for members of groups to develop trust and confidence in each other. This suggests a huge, unmet opportunity to actualize greater value from open networks. Citing Francis Fukuyama’s book Trust, Reed points out that “there is a strong correlation between the prosperity of national economies and social capital, which [Fukuyama] defines culturally as the ease with which people in a particular culture can form new associations.”

Measuring Impact of Open and Transparent Governance


Mark Robinson @ OGP blog: “Eighteen months on from the launch of the Open Government Partnership in New York in September 2011, there is growing attention to what has been achieved to date.  In the recent OGP Steering Committee meeting in London, government and civil society members were unanimous in the view that the OGP must demonstrate results and impact to retain its momentum and wider credibility.  This will be a major focus of the annual OGP conference in London on 31 October and 1 November, with an emphasis on showcasing innovations, highlighting results and sharing lessons.
Much has been achieved in eighteen months.  Membership has grown from 8 founding governments to 58.  Many action plan commitments have been realised for the majority of OGP member countries. The Independent Reporting Mechanism has been approved and launched. Lesson learning and sharing experience is moving ahead….
The third type of results are the trickiest to measure: What has been the impact of openness and transparency on the lives of ordinary citizens?  In the two years since the OGP was launched it may be difficult to find many convincing examples of such impact, but it is important to make a start in collecting such evidence.
Impact on the lives of citizens would be evident in improvements in the quality of service delivery, by making information on quality, access and complaint redressal public. A related example would be efficiency savings realised from publishing government contracts. Misallocation of public funds exposed through enhanced budget transparency is another. Action on corruption arising from bribes for services, misuse of public funds, or illegal procurement practices would all be significant results from these transparency reforms. A final example relates to jobs and prosperity, where the private sector uses government data in the public domain to inform business investment decisions and create employment.
Generating convincing evidence on the impact of transparency reforms is critical to the longer-term success of the OGP. It is the ultimate test of whether lofty public ambitions announced in country action plans achieve real impacts to the benefit of citizens.”

Department of Better Technology


Next City reports: “…opening up government can get expensive. That’s why two developers this week launched the Department of Better Technology, an effort to make open government tools cheaper, more efficient and easier to engage with.

As founder Clay Johnson explains in a post on the site’s blog, a federal website that catalogues databases on government contracts, which launched last year, cost $181 million to build — $81 million more than a recent research initiative to map the human brain.

“I’d like to say that this is just a one-off anomaly, but government regularly pays millions of dollars for websites,” writes Johnson, the former director of Sunlight Labs at the Sunlight Foundation and author of the 2012 book The Information Diet.

The first undertaking of Johnson and his partner, GovHub co-founder Adam Becker, is a tool meant to make it simpler for businesses to find government projects to bid on, as well as help officials streamline the process of managing procurements. In a pilot experiment, Johnson writes, the pair found that not only were bids coming in faster and at a reduced price, but more people were doing the bidding.

Per Johnson, “many of the bids that came in were from businesses that had not ordinarily contracted with the federal government before.”
The Department of Better Technology will accept five cities to test a beta version of this tool, called Procure.io, in 2013.”

Visual Argumentation


Volta: “Visualising arguments helps people assemble their thoughts and get to grips with complex problems, according to The Argumentation Factory, based in Amsterdam. Their Argument Maps, constructed for government agencies, NGOs and commercial organizations, are designed to enable people to make better decisions and share and communicate information.
Dutch research organisation TNO, in association with The Argumentation Factory, has launched the European Shale Gas Argument Map detailing the pros and cons of the production of shale gas for EU member states with shale gas resources. Their map is designed to provide the foundation for an open discussion and help the user make a balanced assessment.”


The Dark Side of the Digital Revolution


Eric Schmidt, Google’s executive chairman and former CEO, and Jared Cohen, director of Google Ideas, in the WSJ: “…While technology has great potential to bring about change, there is a dark side to the digital revolution that is too often ignored. There is a turbulent transition ahead for autocratic regimes as more of their citizens come online, but technology doesn’t just help the good guys pushing for democratic reform—it can also provide powerful new tools for dictators to suppress dissent.
Fifty-seven percent of the world’s population still lives under some sort of autocratic regime. In the span of a decade, the world’s autocracies will go from having a minority of their citizens online to a majority. From Tehran to Beijing, autocrats are building the technology and training the personnel to suppress democratic dissent, often with the help of Western companies….
Dictators and autocrats in the years to come will attempt to build all-encompassing surveillance states, and they will have unprecedented technologies with which to do so. But they can never succeed completely. Dissidents will build tunnels out and bridges across. Citizens will have more ways to fight back than ever before—some of them anonymous, some courageously public.
The digital revolution will continue. For all the complications this revolution brings, no country is worse off because of the Internet. And with five billion people set to join us online in the coming decades—perhaps someday even the Pyongyang traffic police and the students in the Potemkin computer lab we visited in North Korea among them—the digital future can be bright indeed, despite its dark side.”
See also: The New Digital Age: Reshaping the Future of People, Nations and Business.

Work-force Science and Big Data


Steve Lohr from the New York Times: “Work-force science, in short, is what happens when Big Data meets H.R….Today, every e-mail, instant message, phone call, line of written code and mouse-click leaves a digital signal. These patterns can now be inexpensively collected and mined for insights into how people work and communicate, potentially opening doors to more efficiency and innovation within companies.

Digital technology also makes it possible to conduct and aggregate personality-based assessments, often using online quizzes or games, in far greater detail and numbers than ever before. In the past, studies of worker behavior were typically based on observing a few hundred people at most. Today, studies can include thousands or hundreds of thousands of workers, an exponential leap ahead.

“The heart of science is measurement,” says Erik Brynjolfsson, director of the Center for Digital Business at the Sloan School of Management at M.I.T. “We’re seeing a revolution in measurement, and it will revolutionize organizational economics and personnel economics.”

The data-gathering technology, to be sure, raises questions about the limits of worker surveillance. “The larger problem here is that all these workplace metrics are being collected when you as a worker are essentially behind a one-way mirror,” says Marc Rotenberg, executive director of the Electronic Privacy Information Center, an advocacy group. “You don’t know what data is being collected and how it is used.”

The New Digital Age: Reshaping the Future of People, Nations and Business


The New Digital Age: Reshaping the Future of People, Nations and Business by Eric Schmidt and Jared Cohen, Knopf, 2013
Scientific American: “Schmidt, executive chairman of Google, and Cohen, director of Google Ideas and a foreign policy wonk who has advised Hillary Clinton, deliver their vision of the future in this ambitious, fascinating account. For gadget geeks, the book is filled with tantalizing examples of futuristic goods and services: robotic plumbers; automated haircuts; computers that read body language; and 3-D holographs of weddings projected into the living rooms of relatives who couldn’t attend. Not surprisingly, the authors are bullish on how connectivity—access to the Internet that will soon be nearly universal—will transform education, terrorism, journalism, government, privacy and war. The result, they argue, though not perfect, will be “more egalitarian, more transparent and more interesting than we can even imagine.”