Three Paradoxes of Big Data


New Paper by Neil M. Richards and Jonathan H. King in the Stanford Law Review Online: “Big data is all the rage. Its proponents tout the use of sophisticated analytics to mine large data sets for insight as the solution to many of our society’s problems. These big data evangelists insist that data-driven decisionmaking can now give us better predictions in areas ranging from college admissions to dating to hiring to medicine to national security and crime prevention. But much of the rhetoric of big data contains no meaningful analysis of its potential perils, only the promise. We don’t deny that big data holds substantial potential for the future, and that large dataset analysis has important uses today. But we would like to sound a cautionary note and pause to consider big data’s potential more critically. In particular, we want to highlight three paradoxes in the current rhetoric about big data to help move us toward a more complete understanding of the big data picture. First, while big data pervasively collects all manner of private information, the operations of big data itself are almost entirely shrouded in legal and commercial secrecy. We call this the Transparency Paradox. Second, though big data evangelists talk in terms of miraculous outcomes, this rhetoric ignores the fact that big data seeks to identify at the expense of individual and collective identity. We call this the Identity Paradox. And third, the rhetoric of big data is characterized by its power to transform society, but big data has power effects of its own, which privilege large government and corporate entities at the expense of ordinary individuals. We call this the Power Paradox. Recognizing the paradoxes of big data, which show its perils alongside its potential, will help us to better understand this revolution. It may also allow us to craft solutions to produce a revolution that will be as good as its evangelists predict.”

Vint Cerf: Freedom and the Social Contract


Vinton G. Cerf in the Communications of the ACM: “The last several weeks (as of this writing) have been filled with disclosures of intelligence practices in the U.S. and elsewhere. Edward Snowden’s unauthorized release of highly classified information has stirred a great deal of debate about national security and the means used to preserve it.
In the midst of all this, I looked to Jean-Jacques Rousseau’s well-known 18th-century writings on the Social Contract (Du Contrat Social, Ou Principes du Droit Politique) for insight. Distilled and interpreted through my perspective, I took away several notions. One is that in a society, to achieve a degree of safety and stability, we as individuals give up some absolute freedom of action to what Rousseau called the sovereign will of the people. He did not equate this to government, which he argued was distinct and derived its power from the sovereign people.
I think it may be fair to say that most of us would not want to live in a society that had no limits to individual behavior. In such a society, there would be no limit to the potential harm an individual could visit upon others. For some measure of stability and safety, we voluntarily give up absolute freedom in exchange for the rule of law. In Rousseau’s terms, however, the laws must come from the sovereign people, not from the government. We approximate this in most modern societies by creating representative government, using public elections to populate the key parts of the government.”

(Appropriate) Big Data for Climate Resilience?


Amy Luers at the Stanford Social Innovation Review: “The answer to whether big data can help communities build resilience to climate change is yes—there are huge opportunities, but there are also risks.

Opportunities

  • Feedback: Strong negative feedback is core to resilience. A simple example is our body’s response to heat stress—sweating, which is a natural feedback to cool down our body. In social systems, feedbacks are also critical for maintaining functions under stress. For example, communication by affected communities after a hurricane provides feedback for how and where organizations and individuals can provide help. While this kind of feedback used to rely completely on traditional communication channels, now crowdsourcing and data mining projects, such as Ushahidi and Twitter Earthquake detector, enable faster and more-targeted relief.
  • Diversity: Big data is enhancing diversity in a number of ways. Consider public health systems. Health officials are increasingly relying on digital detection methods, such as Google Flu Trends or Flu Near You, to augment and diversify traditional disease surveillance.
  • Self-Organization: A central characteristic of resilient communities is the ability to self-organize. This characteristic must exist within a community (see the National Research Council Resilience Report); it is not something that can be imposed from outside. However, social media and related data-mining tools (InfoAmazonia, Healthmap) can enhance situational awareness and facilitate collective action by helping people identify others with common interests, communicate with them, and coordinate efforts, as the sketch after this list illustrates.
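
To make the situational-awareness point concrete, here is a minimal Python sketch of the kind of filtering such tools perform. The message stream, keywords, and locations are invented for illustration; this is not how InfoAmazonia or Healthmap are actually implemented.

```python
from collections import defaultdict

# Hypothetical post-disaster message stream; all data invented for illustration.
messages = [
    {"text": "need water and medicine", "location": "Ward 3"},
    {"text": "road blocked by debris", "location": "Ward 7"},
    {"text": "shelter full, need blankets", "location": "Ward 3"},
    {"text": "power restored here", "location": "Ward 1"},
]

# Words that (in this toy example) signal an unmet need.
NEED_WORDS = {"need", "blocked", "trapped", "injured"}

# Group need-signalling messages by location so responders can see
# where requests cluster.
needs_by_location = defaultdict(list)
for msg in messages:
    if NEED_WORDS & set(msg["text"].split()):
        needs_by_location[msg["location"]].append(msg["text"])

for location, reports in sorted(needs_by_location.items()):
    print(f"{location}: {len(reports)} report(s) -> {reports}")
```

Even this trivial grouping shows the mechanism: individual reports become a map of where help is needed, which is the feedback loop the resilience literature emphasizes.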

Risks

  • Eroding trust: Trust is well established as a core feature of community resilience. Yet the NSA PRISM escapade made it clear that big data projects are raising privacy concerns and possibly eroding trust. And it is not just an issue in government. For example, Target analyzes shopping patterns and can fairly accurately guess if someone in your family is pregnant (which is awkward if they know your daughter is pregnant before you do). When our trust in government, business, and communities weakens, it can decrease a society’s resilience to climate stress.
  • Mistaking correlation for causation: Data mining seeks meaning in patterns that are completely independent of theory (suggesting to some that theory is dead). This approach can lead to erroneous conclusions when correlation is mistakenly taken for causation. For example, one study demonstrated that data mining techniques could show a strong (however spurious) correlation between the changes in the S&P 500 stock index and butter production in Bangladesh. While interesting, a decision support system based on this correlation would likely prove misleading; a short sketch of how such spurious correlations arise follows this list.
  • Failing to see the big picture: One of the biggest challenges with big data mining for building climate resilience is its overemphasis on the hyper-local and hyper-now. While this hyper-local, hyper-now information may be critical for business decisions, without a broader understanding of the longer-term and more-systemic dynamics of social and biophysical systems, big data provides no ability to understand future trends or anticipate vulnerabilities. We must not let our obsession with the here and now divert us from slower-changing variables such as declining groundwater, loss of biodiversity, and melting ice caps—all of which may silently define our future. A related challenge is the fact that big data mining tends to overlook the most vulnerable populations. We must not let the lure of the big data microscope on the “well-to-do” populations of the world make us blind to the less well-off populations within cities and communities that have more limited access to smart phones and the Internet.”
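
The butter-production example reflects a general fact: search enough candidate series and chance alone will supply a strong correlation. Here is a minimal Python sketch of how naive mining “discovers” a spurious predictor; all data is synthetic noise, not actual market or production figures.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random walk standing in for a target series such as a stock index.
target = rng.normal(size=250).cumsum()

# 1,000 candidate "predictors" that are also pure noise (random walks).
candidates = rng.normal(size=(1000, 250)).cumsum(axis=1)

# Naive mining: keep whichever candidate correlates best with the target.
corrs = np.array([np.corrcoef(target, c)[0, 1] for c in candidates])
best = np.abs(corrs).argmax()

# With this many tries, the winner is usually "strongly" correlated
# (|r| often above 0.8) despite having no causal link to the target.
print(f"best |correlation| among 1,000 noise series: {abs(corrs[best]):.2f}")
```

A decision support system built on the winning series would look impressive in backtests and fail out of sample, which is exactly the risk Luers describes.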

Frontiers in Massive Data Analysis


New report from the National Academy of Sciences: “Data mining of massive data sets is transforming the way we think about crisis response, marketing, entertainment, cybersecurity and national intelligence. Collections of documents, images, videos, and networks are being thought of not merely as bit strings to be stored, indexed, and retrieved, but as potential sources of discovery and knowledge, requiring sophisticated analysis techniques that go far beyond classical indexing and keyword counting, aiming to find relational and semantic interpretations of the phenomena underlying the data.
Frontiers in Massive Data Analysis examines the frontier of analyzing massive amounts of data, whether in a static database or streaming through a system. Data at that scale (terabytes and petabytes) is increasingly common in science (e.g., particle physics, remote sensing, genomics), Internet commerce, business analytics, national security, communications, and elsewhere. The tools that work to infer knowledge from data at smaller scales do not necessarily work, or work well, at such massive scale. New tools, skills, and approaches are necessary, and this report identifies many of them, plus promising research directions to explore. Frontiers in Massive Data Analysis discusses pitfalls in trying to infer knowledge from massive data, and it characterizes seven major classes of computation that are common in the analysis of massive data. Overall, this report illustrates the cross-disciplinary knowledge (from computer science, statistics, machine learning, and application disciplines) that must be brought to bear to make useful inferences from massive data.”

Political Scientists Acknowledge Need to Make Stronger Case for Their Field


Beth McMurtrie in The Chronicle of Higher Education: “Back in March, Congress limited federal support for political-science research by the National Science Foundation to projects that promote national security or American economic interests. That decision was a victory for Sen. Tom Coburn, a Republican from Oklahoma who has long aimed to eliminate all NSF grants for political science, arguing that, unlike the hard sciences, it rarely produces concrete benefits to society.
Congress’s action has led to soul searching within the discipline about how effective academics have been in conveying the value of their work to the public. It has also revived a longstanding debate among political scientists about the shift toward more statistically sophisticated, mathematically esoteric research, and its usefulness outside of academe. Those discussions were out front at the annual conference of the American Political Science Association, held here last week.
Rogers M. Smith, a political-science professor at the University of Pennsylvania, was one of 13 members of a panel that discussed the controversy over NSF money for political-science studies. He put the problem bluntly: “We need to make a better case for ourselves.”
Few on the panel, in fact, seemed to think that political science had done a good job on that front. The association has created a task force—led by Arthur Lupia, a political-science professor at the University of Michigan at Ann Arbor—to improve public perceptions of political science’s value. He said his colleagues could learn from organizations like the American Association for the Advancement of Science, which holds special sessions for the news media at its annual conference to explain the work of its members to the public.”

White House: "We Want Your Input on Building a More Open Government"


Nick Sinai at the White House Blog: “…We are proud of this progress, but recognize that there is always more we can do to build a more efficient, effective, and accountable government. In that spirit, the Obama Administration has committed to develop a second National Action Plan on Open Government: “NAP 2.0.”
In order to develop a Plan with the most creative and ambitious solutions, we need all-hands-on-deck. That’s why we are asking for your input on what should be in the NAP 2.0:

  1. How can we better encourage and enable the public to participate in government and increase public integrity? For example, in the first National Action Plan, we required Federal enforcement agencies to make publicly available compliance information easily accessible, downloadable and searchable online – helping the public to hold the government and regulated entities accountable.
  • What other kinds of government information should be made more available to help inform decisions in your communities or in your lives?
  • How would you like to be able to interact with Federal agencies making decisions which impact where you live?
  • How can the Federal government better ensure broad feedback and public participation when considering a new policy?
  2. The American people must be able to trust that their Government is doing everything in its power to stop wasteful practices and earn a high return on every tax dollar that is spent. How can the government better manage public resources?
  • What suggestions do you have to help the government achieve savings while also improving the way that government operates?
  • What suggestions do you have to improve transparency in government spending?
  3. The American people deserve a Government that is responsive to their needs, makes information readily accessible, and leverages Federal resources to help foster innovation in both the public and private sectors. How can the government more effectively work in collaboration with the public to improve services?
  • What are your suggestions for ways the government can better serve you when you are seeking information or help in trying to receive benefits?
  • In the past few years, the government has promoted the use of “grand challenges,” ambitious yet achievable goals to solve problems of national priority, and incentive prizes, where the government identifies challenging problems and provides prizes and awards to the best solutions submitted by the public. Are there areas of public services that you think could especially benefit from a grand challenge or incentive prize?
  • What information or data could the government make more accessible to help you start or improve your business?

Please think about these questions and send your thoughts to [email protected] by September 23. We will post a summary of your submissions online in the future.”

How Mechanical Turkers Crowdsourced a Huge Lexicon of Links Between Words and Emotion


The Physics arXiv Blog: “Sentiment analysis on the social web depends on how a person’s state of mind is expressed in words. Now a new database of the links between words and emotions could provide a better foundation for this kind of analysis.


One of the buzzphrases associated with the social web is sentiment analysis. This is the ability to determine a person’s opinion or state of mind by analysing the words they post on Twitter, Facebook or some other medium.
Much has been promised with this method—the ability to measure satisfaction with politicians, movies and products; the ability to better manage customer relations; the ability to create dialogue for emotion-aware games; the ability to measure the flow of emotion in novels; and so on.
The idea is to entirely automate this process—to analyse the firehose of words produced by social websites using advanced data mining techniques to gauge sentiment on a vast scale.
But all this depends on how well we understand the emotion and polarity (whether negative or positive) that people associate with each word or combinations of words.
Today, Saif Mohammad and Peter Turney at the National Research Council Canada in Ottawa unveil a huge database of words and their associated emotions and polarity, which they have assembled quickly and inexpensively using Amazon’s Mechanical Turk crowdsourcing website. They say this crowdsourcing mechanism makes it possible to increase the size and quality of the database quickly and easily…. The result is a comprehensive word-emotion lexicon for over 10,000 words or two-word phrases, which they call EmoLex….
The bottom line is that sentiment analysis can only ever be as good as the database on which it relies. With EmoLex, analysts have a new tool for their box of tricks.”
Ref: arxiv.org/abs/1308.6297: Crowdsourcing a Word-Emotion Association Lexicon
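
For readers unfamiliar with lexicon-based sentiment analysis, the basic mechanics are simple: look up each token of a text and tally its emotion and polarity labels. Here is a minimal Python sketch using a three-entry toy lexicon invented for illustration; it is not an excerpt of EmoLex itself.

```python
from collections import Counter

# Toy word-emotion lexicon in the spirit of EmoLex; these three entries are
# invented for illustration and are not taken from the actual lexicon.
LEXICON = {
    "delighted": {"joy", "positive"},
    "abandoned": {"sadness", "fear", "negative"},
    "victory": {"joy", "anticipation", "positive"},
}

def emotion_profile(text: str) -> Counter:
    """Tally the emotion and polarity labels of every lexicon word in the text."""
    counts = Counter()
    for token in text.lower().split():
        counts.update(LEXICON.get(token.strip(".,!?;:"), ()))
    return counts

profile = emotion_profile("Delighted by the victory, yet abandoned by allies.")
print(profile)  # e.g. Counter({'joy': 2, 'positive': 2, 'sadness': 1, ...})
```

The point the article makes follows directly from this mechanism: every score is only as good as the word-label pairs in the lexicon, so a larger, crowd-validated lexicon improves everything built on top of it.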

Nonsectarian Welfare Statements


New Paper by Cass Sunstein: “How can we measure whether national institutions in general, and regulatory institutions in particular, are dysfunctional? A central question is whether they are helping a nation’s citizens to live good lives. A full answer to that question would require a great deal of philosophical work, but it should be possible to achieve an incompletely theorized agreement on a kind of nonsectarian welfarism, emphasizing the importance of five variables: subjective well-being, longevity, health, educational attainment, and per capita income. In principle, it would be valuable to identify the effects of new initiatives (including regulations) on all of these variables. In practice, it is not feasible to do so; assessments of subjective well-being present particular challenges. In their ideal form, Regulatory Impact Statements should be seen as Nonsectarian Welfare Statements, seeking to identify the consequences of regulatory initiatives for various components of welfare. So understood, they provide reasonable measures of regulatory success or failure, and hence a plausible test of dysfunction. There is a pressing need for improved evaluations, including both randomized controlled trials and ex post assessments.”
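
Purely as an illustration of the bookkeeping such a statement implies, one might record a regulation’s estimated effects on the five variables like this. The structure, units, and numbers below are invented, not Sunstein’s.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class WelfareStatement:
    """Hypothetical record of a rule's estimated effects on the five welfare
    variables Sunstein names; the structure and units are invented here."""
    regulation: str
    subjective_well_being: Optional[float]  # often the hardest to assess
    longevity: float
    health: float
    educational_attainment: float
    per_capita_income: float

stmt = WelfareStatement(
    regulation="hypothetical air-quality rule",
    subjective_well_being=None,  # flagged as not yet measurable
    longevity=0.3,
    health=0.5,
    educational_attainment=0.0,
    per_capita_income=-0.1,
)
for variable, effect in asdict(stmt).items():
    print(f"{variable}: {effect}")
```

Note the explicit None for subjective well-being, mirroring Sunstein’s observation that assessments of that variable present particular challenges.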

Twitter’s activist roots: How Twitter’s past shapes its use as a protest tool


Radio Netherlands Worldwide: “Surprised when demonstrators from all over the world took to Twitter as a protest tool? Evan “Rabble” Henshaw-Plath, a member of Twitter’s founding team, was not. Rather, he sees it as a return to its roots: inspired by protest coordination tools like TXTMob, and shaped by the values and backgrounds of Twitter’s founders, he believes activist potential was built into the service from the start.

It took a few revolutions before Twitter was taken seriously. Critics claimed that its 140-character limit only provided space for the most trivial thoughts: neat for keeping track of Ashton Kutcher’s lunch choices, but not much else. It made the transition from Silicon Valley toy into Middle East protest tool seem all the more astonishing.
Unless, Twitter co-founder Evan Henshaw-Plath argues, you know the story of how Twitter came to be. Henshaw-Plath was the lead developer at Odeo, the company that started and eventually became Twitter. TXTMob, an activist tool deployed during the 2004 Republican National Convention in the US to coordinate protest efforts via SMS, was, says Henshaw-Plath, a direct inspiration for Twitter.
Protest 1.0
In 2004, while Henshaw-Plath was working at Odeo, he and a few other colleagues found a fun side-project in working on TXTMob, an initiative by what he describes as a “group of academic artist/prankster/hacker/makers” that operated under the ostensibly serious moniker of Institute for Applied Autonomy (IAA). Earlier IAA projects included small graffiti robots on wheels that spray painted slogans on pavements during demonstrations, and a pudgy talking robot with big puppy eyes made to distribute subversive literature to people who ignored less-cute human pamphleteers.
TXTMob was a more serious endeavor than these earlier projects: a tactical protest coordination tool. With TXTMob, users could quickly exchange text messages with large groups of other users about protest locations and police crackdowns….”
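
Henshaw-Plath’s description of TXTMob amounts to a publish-subscribe relay over SMS: subscribers join a group, and any message to the group fans out to every member. Below is a minimal Python sketch of that architecture; the group names, phone numbers, and gateway stub are hypothetical, and this is not TXTMob’s actual code.

```python
from collections import defaultdict

class GroupRelay:
    """TXTMob-style fan-out: a message sent to a group is relayed to every
    subscribed number. An illustrative sketch, not TXTMob's implementation."""

    def __init__(self) -> None:
        self.groups = defaultdict(set)  # group name -> subscriber numbers

    def subscribe(self, group: str, number: str) -> None:
        self.groups[group].add(number)

    def broadcast(self, group: str, sender: str, text: str) -> None:
        # Relay to everyone in the group except the sender.
        for number in self.groups[group] - {sender}:
            self._send_sms(number, f"[{group}] {text}")

    def _send_sms(self, number: str, text: str) -> None:
        # Stand-in for a real SMS gateway call.
        print(f"to {number}: {text}")

relay = GroupRelay()
relay.subscribe("legal-observers", "+15550001")
relay.subscribe("legal-observers", "+15550002")
relay.broadcast("legal-observers", "+15550001", "march rerouted north of 7th Ave")
```

The resemblance to Twitter’s core loop, short messages broadcast from one sender to many followers, is what makes the TXTMob lineage plausible.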

The Multistakeholder Model in Global Technology Governance: A Cross-Cultural Perspective


New APSA 2013 Annual Meeting Paper by Nanette S. Levinson: “This paper examines two key but often overlooked analytic dimensions related to global technology governance: the roles of culture and cross-cultural communication processes and the broader framework of Multistakeholderism. Each of these dimensions has a growing tradition of research/conceptual frameworks that can inform the analysis of Internet governance and related complex and power-related processes that may be present in a multistakeholder setting. The use of the term ‘multistakeholder’ itself has grown exponentially in discussing Internet governance and related governance domains; yet there are few rigorous studies within Internet governance that treat actual multistakeholder processes, especially from a cross-cultural and comparative perspective.
Using research on cross-cultural communication and related factors (at small group, occupational, organizational, interorganizational or cross-sector, national, and regional levels), this paper provides and uses an analytic framework, applied especially to the 2012 WCIT, the 2013 WSIS+10, and the World Telecommunications Policy Forum, that goes beyond the rhetoric of ‘multistakeholder’ as a term. It includes an examination of variables found to be important in studies from environmental governance, public administration, and private-sector partnership domains, including trust, absorptive capacity, and power in knowledge transfer processes.”