Samuel Arbesman, senior scholar at the Ewing Marion Kauffman Foundation and the author of “The Half-Life of Facts” in the Washington Post: “Big data holds the promise of harnessing huge amounts of information to help us better understand the world. But when talking about big data, there’s a tendency to fall into hyperbole. It is what compels contrarians to write such tweets as “Big Data, n.: the belief that any sufficiently large pile of s— contains a pony.” Let’s deflate the hype.
1. “Big data” has a clear definition.
The term “big data” has been in circulation since at least the 1990s, when it is believed to have originated in Silicon Valley. IBM offers a seemingly simple definition: Big data is characterized by the four V’s of volume, variety, velocity and veracity. But the term is thrown around so often, in so many contexts — science, marketing, politics, sports — that its meaning has become vague and ambiguous….
2. Big data is new.
By many accounts, big data exploded onto the scene quite recently. “If wonks were fashionistas, big data would be this season’s hot new color,” a Reuters report quipped last year. In a May 2011 report, the McKinsey Global Institute declared big data “the next frontier for innovation, competition, and productivity.”
It’s true that today we can mine massive amounts of data — textual, social, scientific and otherwise — using complex algorithms and computer power. But big data has been around for a long time. It’s just that exhaustive datasets were more exhausting to compile and study in the days when “computer” meant a person who performed calculations….
3. Big data is revolutionary.
In their new book, “Big Data: A Revolution That Will Transform How We Live, Work, and Think,” Viktor Mayer-Schonberger and Kenneth Cukier compare “the current data deluge” to the transformation brought about by the Gutenberg printing press.
If you want more precise advertising directed toward you, then yes, big data is revolutionary. Generally, though, it’s likely to have a modest and gradual impact on our lives….
4. Bigger data is better.
In science, some admittedly mind-blowing big-data analyses are being done. In business, companies are being told to “embrace big data before your competitors do.” But big data is not automatically better.
Really big datasets can be a mess. Unless researchers and analysts can reduce the number of variables and make the data more manageable, they get quantity without a whole lot of quality. Give me some quality medium data over bad big data any day…
5. Big data means the end of scientific theories.
Chris Anderson argued in a 2008 Wired essay that big data renders the scientific method obsolete: Throw enough data at an advanced machine-learning technique, and all the correlations and relationships will simply jump out. We’ll understand everything.
But you can’t just go fishing for correlations and hope they will explain the world. If you’re not careful, you’ll end up with spurious correlations. Even more important, to contend with the “why” of things, we still need ideas, hypotheses and theories. If you don’t have good questions, your results can be silly and meaningless.
Having more data won’t substitute for thinking hard, recognizing anomalies and exploring deep truths.”
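To make the point about fishing for correlations concrete, here is a minimal sketch (not from Arbesman's article) that generates pure random noise and then searches it for the strongest pairwise correlation. The sample sizes, the seed, and the helper names `pearson` and `strongest_pair` are illustrative assumptions. Because every series is independent noise, any strong correlation the search turns up is spurious by construction.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_pair(series):
    """Largest absolute correlation found among all pairs of series."""
    return max(abs(pearson(a, b)) for a, b in itertools.combinations(series, 2))


# Assumption: 200 unrelated "variables", each with only 10 observations.
rng = random.Random(0)
noise = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(200)]

few = strongest_pair(noise[:5])   # search 10 pairs
many = strongest_pair(noise)      # search 19,900 pairs

print(f"strongest |r| among 5 variables:   {few:.2f}")
print(f"strongest |r| among 200 variables: {many:.2f}")
```

The more variables the search covers, the stronger the best "relationship" it reports, even though none of the variables are related. That is the sense in which more data, without a hypothesis, can produce silly results.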
Paper by Stephanie McNulty for the 2013 Annual Meeting of the American Political Science Association (Aug. 29-Sept. 1, 2013): “Can a nationally mandated participatory budget process change the nature of local governance? Passed in 2003 to mandate participatory budgeting in all districts and regions of Peru, Peru’s National PB Law has garnered international attention from proponents of participatory governance. However, to date, the results of the process have not been widely documented. Presenting data that have been gathered through fieldwork, online databases, and primary documents, this paper explores the results of Peru’s PB after ten years of implementation. The paper finds that results are limited. While there are a significant number of actors engaged in the process, the PB is still dominated by elite actors that do not represent the diversity of the civil society sector in Peru. Participants approve important “pro-poor” projects, but they are not always executed. Finally, two important indicators of governance, sub-national conflict and trust in local institutions, have not improved over time. Until Peruvian politicians make a concerted effort to move beyond politics as usual, results will continue to be limited.”
DOD/DARPA Notice (See also Foreign Policy article): “OBJECTIVE: Investigate the national security threat posed by public data available either for purchase or through open sources. Based on principles of data science, develop tools to characterize and assess the nature, persistence, and quality of the data. Develop tools for the rapid anonymization and de-anonymization of data sources. Develop framework and tools to measure the national security impact of public data and to defend against the malicious use of public data against national interests.
DESCRIPTION: The vulnerabilities to individuals from a data compromise are well known and documented now as “identity theft.” These include regular stories published in the news and research journals documenting the loss of personally identifiable information by corporations and governments around the world. Current trends in social media and commerce, with voluntary disclosure of personal information, create other potential vulnerabilities for individuals participating heavily in the digital world. The Netflix Challenge in 2009 was launched with the goal of creating better customer pick prediction algorithms for the movie service. An unintended consequence of the Netflix Challenge was the discovery that it was possible to de-anonymize the entire contest data set with very little additional data. This de-anonymization led to a federal lawsuit and the cancellation of the sequel challenge. The purpose of this topic is to understand the national level vulnerabilities that may be exploited through the use of public data available in the open or for purchase.
Could a modestly funded group deliver nation-state type effects using only public data?…”
The official link for this solicitation is: www.acq.osd.mil/osbp/sbir/solicitations/sbir20133.
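The notice's remark that a data set can be de-anonymized "with very little additional data" can be illustrated with a toy linkage sketch. This is an assumption-laden illustration, not the method used on the Netflix data and not anything specified in the solicitation: the records, names, and the `link` helper are all hypothetical. The idea is only that a few publicly known (title, date) pairs can single out one pseudonymous record.

```python
# Hypothetical "anonymized" release: pseudonymous ID -> set of (title, date) ratings.
anonymized = {
    "user_17": {("Movie A", "2005-03-01"), ("Movie B", "2005-03-04"), ("Movie C", "2005-04-10")},
    "user_42": {("Movie A", "2005-03-02"), ("Movie D", "2005-05-20"), ("Movie E", "2005-06-11")},
    "user_88": {("Movie B", "2005-03-04"), ("Movie D", "2005-05-21"), ("Movie F", "2005-07-30")},
}

# Hypothetical public side information, e.g. reviews posted under a real name.
public_reviews = {
    "alice": {("Movie A", "2005-03-01"), ("Movie C", "2005-04-10")},
    "bob": {("Movie D", "2005-05-20"), ("Movie E", "2005-06-11")},
    "carol": {("Movie B", "2005-03-04")},
}


def link(aux, records, min_overlap=2):
    """Return the single pseudonymous ID sharing at least `min_overlap`
    (title, date) pairs with `aux`, or None if no ID or several IDs qualify."""
    matches = [pid for pid, rated in records.items() if len(aux & rated) >= min_overlap]
    return matches[0] if len(matches) == 1 else None


linked = {name: link(aux, anonymized) for name, aux in public_reviews.items()}
print(linked)
```

Two matching ratings are enough to re-identify "alice" and "bob" here, while "carol", with a single known rating, stays unlinked. Real data sets are far larger and noisier, so this only conveys the shape of the risk the notice describes.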
Mashable: “The digital rights conversation was thrust into the mainstream spotlight after news of ongoing, widespread mass surveillance programs leaked to the public. Always a hot topic, these revelations sparked a strong online debate among the Internet community.
It also made us here at Mashable reflect on the digital freedoms and protections we feel each user should be guaranteed as a citizen of the Internet. To highlight some of the great conversations taking place about digital rights online, we asked the digital community to collaborate with us on the creation of a crowdsourced Digital Bill of Rights.
After six weeks of public discussions, document updates and changes, as well as incorporating input from digital rights experts, Mashable is pleased to unveil its first-ever Digital Bill of Rights, made for the Internet, by the Internet.”
Tom Steinberg in mySociety Blog: “As I wrote in my last post, I am very concerned about the lack of comprehensible, consistent language to talk about the hugely diverse ways in which people are using the internet to bring about social and political change…. My approach to finding an appropriate name was to look at the way that other internet industry sectors are named, so that I could choose a name that sits nicely next to very familiar sectoral labels….
Segmenting the Civic Power sector
Choosing a single sectoral name – Civic Power – is not really the point of this exercise. The real benefit would come from being able to segment the many projects within this sector so that they are more easy to compare and contrast.
Here is my suggested four part segmentation of the Civic Power sector…:
- Decision influencing organisations try to directly shape or change particular decisions made by powerful individuals or organisations.
- Regime changing organisations try to replace decision makers, not persuade them.
- Citizen Empowering organisations try to give people the resources and the confidence required to exert power for whatever purpose those people see fit, both now and in the future.
- Digital Government organisations try to improve the ways in which governments acquire and use computers and networks. Strictly speaking this is just a sub-category of ‘decision influencing organisation’, on a par with an environmental group or a union, but more geeky.”
See also: Open Government – What’s in a Name?
New paper in Science: “Our society is increasingly relying on the digitized, aggregated opinions of others to make decisions. We therefore designed and analyzed a large-scale randomized experiment on a social news aggregation Web site to investigate whether knowledge of such aggregates distorts decision-making. Prior ratings created significant bias in individual rating behavior, and positive and negative social influences created asymmetric herding effects. Whereas negative social influence inspired users to correct manipulated ratings, positive social influence increased the likelihood of positive ratings by 32% and created accumulating positive herding that increased final ratings by 25% on average. This positive herding was topic-dependent and affected by whether individuals were viewing the opinions of friends or enemies. A mixture of changing opinion and greater turnout under both manipulations together with a natural tendency to up-vote on the site combined to create the herding effects. Such findings will help interpret collective judgment accurately and avoid social influence bias in collective intelligence in the future.”
See also: ‘Like’ This Article Online? Your Friends Will Probably Approve, Too, Scientists Say
New e-book: “Anita Estell has done it! She has published an easy-to-read handbook that promises to transform our individual and collective understanding of the federal government, how it really works, and most important, our own relevance in its operation. The Power of US is a must-have guide. It provides instruction for those possessing the audacity to seize the opportunities unfolding during one of the most transformational periods in American history. Estell shares insights, experiences, wisdom, and expertise, gained in more than twenty years of working at the federal level, in a way that not only invites and supports constructive engagement but also sheds light on the way forward. Estell provides an extraordinary panorama of information and instruction, melding a multidisciplinary suite of principles that underscore and bring texture to what Estell calls citizen-centricity, or citizen-centric engagement. The Power of US provides a profoundly creative approach relevant to policymakers and advocates. Estell’s treatment is a breath of fresh air in civic discourse—which can be stifled by stale approaches and potentially toxic hyperpartisan dynamics. In The Power of US, Estell establishes herself as a revolutionary thinker exhibiting the vision, knowledge, and personal power to move the compass of individual hope in the direction of collective freedom.”
Review by Martin Wolf of The Entrepreneurial State: Debunking Public vs Private Sector Myths, by Mariana Mazzucato, Anthem Press: “…what determines innovation? Conventional economics offers abstract models; conventional wisdom insists the answer lies with private entrepreneurship. In this brilliant book, Mariana Mazzucato, a Sussex university professor of economics who specialises in science and technology, argues that the former is useless and the latter incomplete. Yes, innovation depends on bold entrepreneurship. But the entity that takes the boldest risks and achieves the biggest breakthroughs is not the private sector; it is the much-maligned state…
Why is the state’s role so important? The answer lies in the huge uncertainties, time spans and costs associated with fundamental, science-based innovation. Private companies cannot and will not bear these costs, partly because they cannot be sure to reap the fruits and partly because these fruits lie so far in the future.
Indeed, the more competitive and finance-driven the economy, the less the private sector will be willing to bear such risks. Buying back shares is apparently a far more attractive way of using surplus cash than spending on fundamental innovation. The days of AT&T’s path-breaking Bell Labs are long gone. In any case, the private sector could not have created the internet or GPS. Only the US military had the resources to do so.
Arguably, the most important engines of innovation in the past five decades have been the US Defense Advanced Research Projects Agency and the NIH. Today, if the world is to make fundamental breakthroughs in energy technologies, states will play a big role. Indeed, the US government even helped drive the development of the hydraulic fracturing of shale rock.”
Paper by Cass Sunstein: “In recent years, social scientists have been incorporating empirical findings about human behavior into economic models. These findings offer important insights for thinking about regulation and its likely consequences. They also offer some suggestions about the appropriate design of effective, low-cost, choice-preserving approaches to regulatory problems, including disclosure requirements, default rules, and simplification. A general lesson is that small, inexpensive policy initiatives can have large and highly beneficial effects. In the United States, a large number of recent practices and reforms reflect an appreciation of this lesson. They also reflect an understanding of the need to ensure that regulations have strong empirical foundations, both through careful analysis of costs and benefits in advance and through retrospective review of what works and what does not.”
Wired: “I’m no neuroscientist, and yet, here I am at my computer attempting to reconstruct a neural circuit of a mouse’s retina. It’s not quite as difficult and definitely not as boring as it sounds. In fact, it’s actually pretty fun, which is a good thing considering I’m playing a videogame.
Called EyeWire, the browser-based game asks players to map the connections between retinal neurons by coloring in 3-D slices of the brain. Much like any other game out there, being good at EyeWire earns you points, but the difference is that the data you produce during gameplay doesn’t just get you on a leader board—it’s actually used by scientists to build a better picture of the human brain.
Created by neuroscientist Sebastian Seung’s lab at MIT, EyeWire basically gamifies the professional research Seung and his collaborators do on a daily basis. Seung is studying the connectome, the hyper-complex tangle of connections among neurons in the brain.”