Why Big Data Is Not Truth


Quentin Hardy in the New York Times: “Kate Crawford, a researcher at Microsoft Research, calls the problem “Big Data fundamentalism — the idea with larger data sets, we get closer to objective truth.” Speaking at a conference in Berkeley, Calif., on Thursday, she identified what she calls “six myths of Big Data.”
Myth 1: Big Data is New
In 1997, there was a paper that discussed the difficulty of visualizing Big Data, and in 1999, a paper that discussed the problems of gaining insight from the numbers in Big Data. That indicates that two prominent issues today in Big Data, display and insight, had been around for a while…
Myth 2: Big Data Is Objective
Over 20 million Twitter messages about Hurricane Sandy were posted last year. … “These were very privileged urban stories.” And some people, privileged or otherwise, put information like their home addresses on Twitter in an effort to seek aid. That sensitive information is still out there, even though the threat is gone.
Myth 3: Big Data Doesn’t Discriminate
“Big Data is neither color blind nor gender blind,” Ms. Crawford said. “We can see how it is used in marketing to segment people.” …
Myth 4: Big Data Makes Cities Smart
…, moving cities toward digital initiatives like predictive policing, or creating systems where people are seen, whether they like it or not, can promote lots of tension between individuals and their governments.
Myth 5: Big Data Is Anonymous
A study published in Nature last March looked at 1.5 million phone records that had personally identifying information removed. It found that just four data points of when and where a call was made could identify 95 percent of individuals. …
Myth 6: You Can Opt Out
… given the ways that information can be obtained in these big systems, “what are the chances that your personal information will never be used?”
Before Big Data disappears into the background as another fact of life, Ms. Crawford said, “We need to think about how we will navigate these systems. Not just individually, but as a society.”
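The re-identification claim under Myth 5 can be made concrete with a small sketch. The code below is not the study's method or data: the traces are invented toy values, and the `unicity` function is a simplified, exhaustive stand-in for the kind of measurement described, namely how often a handful of (time, place) points matches exactly one person in a data set.

```python
from itertools import combinations

# Hypothetical toy traces (not from the study): user id -> set of (hour, antenna) points.
traces = {
    "u1": {(8, "A"), (9, "B"), (18, "C"), (22, "A")},
    "u2": {(8, "A"), (9, "B"), (18, "D"), (22, "A")},
    "u3": {(7, "E"), (9, "B"), (18, "C"), (23, "F")},
}

def unicity(traces, k):
    """Fraction of k-point subsets of users' own traces that match exactly one trace."""
    unique = total = 0
    for points in traces.values():
        for subset in combinations(sorted(points), k):
            # Count how many traces in the data set contain all k points.
            matches = sum(1 for other in traces.values() if set(subset) <= other)
            total += 1
            unique += matches == 1
    return unique / total if total else 0.0

for k in range(1, 5):
    print(k, unicity(traces, k))
```

Even in this toy set, single points rarely single anyone out, while all four points always do, which is the intuition behind the finding that removing names does not by itself make such records anonymous.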

Life and Death of Tweets Not so Random After All


MIT Technology Review: “MIT assistant professor Tauhid Zaman and two other researchers (Emily Fox at the University of Washington and Eric Bradlow at the University of Pennsylvania’s Wharton School) have come up with a model that can predict how many times a tweet will ultimately be retweeted, minutes after it is posted. The model was created by collecting retweets on a slew of topics and looking at the time when the original tweet was posted and how fast it spread. That provided knowledge used to predict how popular a new tweet will be by looking at how many times it was retweeted shortly after it was first posted.
The researchers’ findings were explained in a paper submitted to the Annals of Applied Statistics. In the paper, the authors note that “understanding retweet behavior could lead to a better understanding of how broader ideas spread in Twitter and in other social networks,” and such data may be helpful in a number of areas, like marketing and political campaigning.
You can check out the model here.”

Transparency, E-Government, and Accountability: Some Issues and Considerations


New Paper in Public Performance & Management Review: “Greater use of information and communications technology and e-government can increase governmental transparency. This, in turn, may invite citizen participation, foster e-governance, and facilitate e-democracy. However, beyond a certain point, more government openness may be dysfunctional if it reduces operational capacity. This article claims that in the real world, where the proverbial question is “Why can’t government be like business?,” many public managers are challenged by the need to perform a balancing act between the pursuit of greater openness and private-sector efficiency. The article concludes that there is a need to develop theories, models, and trainings to assist managers in addressing this balancing challenge.”

The Internet as Politicizing Instrument


New Issue of Transformations (Editorial): “This issue of Transformations presents essays responding to Marcus Breen’s recent book Uprising: The Internet’s Unintended Consequences. Breen asks whether the Internet can become a politicising instrument for the new online proletariat – the individualised users isolated by the monitor screen. He asks “if the proletariat can use the Internet, is it freed from the moral and social constraints of the past that were imposed by conventional media and its regulation of the public space?” (32) This question raises further issues. Does this freedom translate into an emancipatory politics where the proletariat is able to pursue its own ends, or does it simply reproduce the power relation between the user-subject and the Internet and those who control and manage it? The articles in this issue respond in various ways to these questions.
Marcus Breen’s own article “The Internet and Privatism: Reconstructing the Monitor Space” makes a case for privatism – the restriction of subjective life to isolated or privatised experience, especially in relation to the computer monitor – as the new modality of meaning making in the Internet era. Using approaches associated with cultural and media studies, the paper traces the way the Internet has influenced the shift in the culture towards values associated with the confluence of ideas around the private, best described by privatism.
Fidele Vlavo’s article investigates the central discourses that have constructed the internet as a democratic and public environment removed from state and corporate control. The aim is to call attention to the issues that have limited the development of the internet as a tool for socio-political empowerment. The paper first retraces the early discursive constructions that insist on representing the internet as a decentralised and open structure. It also questions the role played by the digerati (or cyber elite) in the formulation of contradictory demands for public interests, self-governance, and entrepreneurial rights. Finally, it examines the emergence of two early virtual communities and their attempts to facilitate free speech and self-regulation. In the context of activists advocating freedom of expression and government institutions re-organizing legislation to control the Internet, the examination of these discourses provides a useful starting point for the (re)assessment of the potential of direct online mobilization.
Emit Snake-Beings’ article examines the limits of the Internet as a politicising instrument by showing how Internet users are subject to the controls of the search engine algorithm, managed by elite groups whose purpose is to reproduce themselves in terms of neo-liberal capitalism. Invoking recent political events in the Middle East and in London in which a wired proletariat sought to resist and overturn political authorities through Internet communication, Snake-Beings argues that such events are compromised by the fact that they owe their possibility to Internet providers and their commercial imperatives. Snake-Beings’ article, as well as most of the other articles in this issue, offers a timely reminder not only of the possibilities, but of the limits of the Internet as a politicising instrument for progressive, emancipatory politics.
Frances Shaw’s paper concerns the way in which the logic of surveillance operates in contested sites in cities where live coverage of demonstrations against capitalism leads to confrontation between demonstrators and police. Through a detailed account of the “Occupy Sydney” demonstration in 2011, Shaw shows how both demonstrators and police engaged in tactics of surveillance and resistance to counter each other’s power and authority. In an age of instant communication and global surveillance, freedom of movement and freedom from surveillance in public spaces are drawn into the logics of power mediated by mobile phones and computer-based communication technology.
Karyl Ketchum’s paper offers detailed analysis of two Internet sites to show how the proletarianisation of the Internet is gendered in terms of male interests. Picking up on Breen’s argument that Internet proletarianisation leads to an open system that “supports both anything and anyone,” she argues that, in the domain of online pornography, this new-found freedom turns out to be “the power of computer analytics to harness and hone the shifting meanings of white Western Enlightenment masculinities in new globalising postcolonial contexts, economies and geopolitical struggles.” Furthermore, Ketchum shows how this default to male interests was also at work in American reporting of the Arab Spring revolutions in Egypt and other Middle Eastern countries. The YouTube video posted by a young Egyptian woman, Asmaa Mahfouz, which sparked the revolution in Egypt that eventually overthrew the Mubarak government, was not given due coverage by the Western media, so that “women like Mahfouz all but disappear from Western accounts of the Arab Spring.”
Liden and Giritli Nygren’s paper addresses the challenges to the theories of the political sphere posed by a digital society. It is suggested that this is most evident at the intersection between understandings of technology, performativities, and politics that combines empirical closeness with abstract understandings of socio-political and cultural contexts. The paper exemplifies this by reporting on a study of online citizen dialogue in the making, in this case concerning school planning in a Swedish municipality. Applying these theoretical perspectives to this case provides some key findings. The technological design is regarded as restricting the potential dialogue, as is outlined in different themes where the participants enact varying positions—taxpayers, citizen consumers, or local residents. The political analysis stresses a dialogue that lacks both polemic and public perspectives, and rather is characterized by the expression of different special interests. Together, these perspectives can provide the foundation for the development of applying theories in a digital society.
The Internet and Privatism: Reconstructing the Monitor Space (Marcus Breen)
The Digital Hysterias of Decentralisation, Entrepreneurship and Open Community (Fidele Vlavo)
From Ideology to Algorithm: the Opaque Politics of the Internet (Emit Snake-Beings)
“Walls of Seeing”: Protest Surveillance, Embodied Boundaries, and Counter-Surveillance at Occupy Sydney (Frances Shaw)
Gendered Uprisings: Desire, Revolution, and the Internet’s “Unintended Consequences” (Karyl E. Ketchum)
Analysing the Intersections between Technology, Performativity, and Politics: the Case of Local Citizen Dialogue (Gustav Lidén and Katarina Giritli Nygren)”

The Performativity of Data: Re-conceptualizing the Web of Data


New Paper by several authors of the Rensselaer Polytechnic Institute, including Jim Hendler, Marie Joan Kristine Gloria, Dominic DiFranzo and Marco Fernando Navarro: “As the discipline of Web Science matures, its interdisciplinary claim has many researchers unsure about its core theory and methodology. Instead, we often see results that are more multi-disciplinary than interdisciplinary. The following contribution attempts to recast our understanding of the current methodologies and tools leveraged by the Web Science community. Specifically, we review the Semantic Web and Linked Data technologies not just from a technical perspective, but through a critical reading of key social theories such as Goffman’s theory of performance. Our goal is to re-conceptualize the performativity of semantic web tools, their boundaries, and any potential avenues for future research.”

Deepbills project


Cato Institute: “The Deepbills project takes the raw XML of Congressional bills (available at FDsys and Thomas) and adds additional semantic information to them inside the text.

You can download the continuously-updated data at http://deepbills.cato.org/download

Congress already produces machine-readable XML of almost every bill it proposes, but that XML is designed primarily for formatting a paper copy, not for extracting information. For example, it’s not currently possible to find every mention of an Agency, every legal reference, or even every spending authorization in a bill without having a human being read it….
Currently the following information is tagged:

  • Legal citations…
  • Budget Authorities (both Authorizations of Appropriations and Appropriations)…
  • Agencies, bureaus, and subunits of the federal government.
  • Congressional committees
  • Federal elective officeholders (Congressmen)”
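To illustrate why inline semantic tags matter, here is a minimal sketch of pulling tagged entities out of bill text. The element name `entity-ref`, the attribute `entity-type`, the type labels and the sample text are all assumptions made up for this example; they are not taken from the project's published schema, so check the downloaded data before relying on any of them.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample markup; tag and attribute names are hypothetical.
SAMPLE = """<bill>
  <section>
    <text>There is authorized to be appropriated to the
      <entity-ref entity-type="federal-body">Example Agency</entity-ref>
      such sums as are necessary to carry out
      <entity-ref entity-type="law-citation">section 1 of the Example Act</entity-ref>.
    </text>
  </section>
</bill>"""

def entities_by_type(xml_text):
    """Group the text of every tagged entity by its declared type."""
    root = ET.fromstring(xml_text)
    found = defaultdict(list)
    for ref in root.iter("entity-ref"):
        # itertext() joins the element's own text and that of any nested children.
        found[ref.get("entity-type")].append("".join(ref.itertext()).strip())
    return dict(found)

print(entities_by_type(SAMPLE))
```

With tags like these in place, listing every agency or legal citation in a bill becomes a query over the markup rather than a job for a human reader.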

How We Imagined the Internet Before the Internet Even Existed


Matt Novak in Paleofuture: “In a few years, men will be able to communicate more effectively through a machine than face to face. Sounds obvious today. But in 1968, a full year before ARPANET made its first connection? It was downright clairvoyant…
The paper was written by J.C.R. Licklider and Robert Taylor, illustrated by Rowland B. Wilson, and appeared in the April 1968 issue of Science and Technology. The article includes some of the most amazingly accurate predictions for what networked computing would eventually allow….

The article rather boldly predicts that the computerized networks of the future will be even more important for communication than the “printing press and the picture tube”—another idea not taken for granted in 1968:

Creative, interactive communication requires a plastic or moldable medium that can be modeled, a dynamic medium in which premises will flow into consequences, and above all a common medium that can be contributed to and experimented with by all.
Such a medium is at hand—the programmed digital computer. Its presence can change the nature and value of communication even more profoundly than did the printing press and the picture tube, for, as we shall show, a well-programmed computer can provide direct access both to informational resources and to the processes for making use of the resources.

The paper predicts that the person-to-person interaction that a networked computer system allows for will not only build relationships between individuals, but will build communities.

What will on-line interactive communities be like? In most fields they will consist of geographically separated members, sometimes grouped in small clusters and sometimes working individually. They will be communities not of common location, but of common interest. In each field, the overall community of interest will be large enough to support a comprehensive system of field-oriented programs and data.

…In the end, Licklider and Taylor predict that all of this interconnectedness will make us happier and even make unemployment a thing of the past. Their vision of everyone sitting at a console, working “through the network” is stunningly accurate for an information-driven society that fifty years ago would’ve looked far less tech-obsessed.

When people do their informational work “at the console” and “through the network,” telecommunication will be as natural an extension of individual work as face-to-face communication is now. The impact of that fact, and of the marked facilitation of the communicative process, will be very great—both on the individual and on society.
First, life will be happier for the on-line individual because the people with whom one interacts most strongly will be selected more by commonality of interests and goals than by accidents of proximity. Second, communication will be more effective and productive, and therefore more enjoyable. Third, much communication and interaction will be with programs and programmed models, which will be (a) highly responsive, (b) supplementary to one’s own capabilities, rather than competitive, and (c) capable of representing progressively more complex ideas without necessarily displaying all the levels of their structure at the same time-and which will therefore be both challenging and rewarding. And, fourth, there will be plenty of opportunity for everyone (who can afford a console) to find his calling, for the whole world of information, with all its fields and disciplines, will be open to him—with programs ready to guide him or to help him explore.

(You can read the entire paper online [pdf].)”

Mapping the global Twitter heartbeat: The geography of Twitter


A new paper by Kalev Leetaru, Shaowen Wang, Guofeng Cao, Anand Padmanabhan, and Eric Shook in First Monday: “In just under seven years, Twitter has grown to count nearly 3% of the entire global population among its active users who have sent more than 170 billion 140-character messages. Today the service plays such a significant role in American culture that the Library of Congress has assembled a permanent archive of the site back to its first tweet, updated daily. With its open API, Twitter has become one of the most popular data sources for social research, yet the majority of the literature has focused on it as a text or network graph source, with only limited efforts to date focusing exclusively on the geography of Twitter, assessing the various sources of geographic information on the service and their accuracy. More than 3% of all tweets are found to have native location information available, while a naive geocoder based on a simple major cities gazetteer and relying on the user-provided Location and Profile fields is able to geolocate more than a third of all tweets with high accuracy when measured against the GPS-based baseline. Geographic proximity is found to play a minimal role both in who users communicate with and what they communicate about, providing evidence that social media is shifting the communicative landscape.”
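A naive gazetteer geocoder of the kind the abstract mentions can be sketched in a few lines. This is an illustrative guess at the general idea, not the authors' implementation: the three-city gazetteer, its approximate coordinates and the `geocode` function are all assumptions introduced here.

```python
import re

# Hypothetical tiny gazetteer: lowercase city name -> approximate (lat, lon).
GAZETTEER = {
    "new york": (40.71, -74.01),
    "london": (51.51, -0.13),
    "sydney": (-33.87, 151.21),
}

def geocode(location_field):
    """Return (city, (lat, lon)) for the first gazetteer city found, else None."""
    # Normalize: lowercase, replace anything that is not a letter or space, collapse spaces.
    text = re.sub(r"[^a-z ]+", " ", location_field.lower())
    text = " ".join(text.split())
    # Try longer names first so multi-word cities win over shorter partial names.
    for city in sorted(GAZETTEER, key=len, reverse=True):
        if re.search(rf"\b{re.escape(city)}\b", text):
            return city, GAZETTEER[city]
    return None

print(geocode("NYC / New York, NY"))
print(geocode("somewhere over the rainbow"))
```

A lookup this simple fails on nicknames, misspellings and ambiguous place names, which is why the paper's comparison against a GPS-based baseline matters for judging how far such a geocoder can be trusted.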

Social media, personality traits and civic engagement


New Paper on “Influence of Social Media Use on Discussion Network Heterogeneity and Civic Engagement: The Moderating Role of Personality Traits” in Journal of Communication: “Using original national survey data, we examine how social media use affects individuals’ discussion network heterogeneity and their level of civic engagement. We also investigate the moderating role of personality traits (i.e., extraversion and openness to experiences) in this association. Results support the notion that use of social media contributes to heterogeneity of discussion networks and activities in civic life. More importantly, personality traits such as extraversion and openness to experiences were found to moderate the influence of social media on discussion network heterogeneity and civic participation, indicating that the contributing role of social media in increasing network heterogeneity and civic engagement is greater for introverted and less open individuals.”

When the Crowd Fights Corruption


New Harvard Business School Research Paper by Paul Healy and Karthik Ramanna  (Harvard Business Review): “Corruption is the greatest impediment to conducting business in Russia, according to leaders recently surveyed by the World Economic Forum. Indeed, it’s a problem in many emerging markets, and businesses have a role to play in combating it, according to Healy and Ramanna. The authors focus on RosPil — an anticorruption entity in Russia set up by Alexey Navalny, a crusader against public and private malfeasance in that country. As of December 2011, RosPil claimed to have prevented the granting of dubious contracts worth US$1.3 billion. The organization holds corrupt politicians’ and bureaucrats’ feet to the fire largely through internet-based crowdsourcing, whereby often-anonymous people identify requests for government-issued tenders that are designed to generate kickbacks. Should entities like RosPil be supported, and should companies fashion their own responses to corruption? On the one hand, there are obvious public-relations and political risks; on the other hand, corruption can erode a firm’s competitiveness, the trust of customers and employees, and even the very legitimacy of capitalism. The authors argue that heads of many multinational companies are well positioned to combat corruption in emerging markets. Those leaders have the power to enforce policies in their organizations and networks, and they enjoy the ability to organize others in the industry against this pernicious threat.”