Big Data Challenge for Social Sciences: From Society and Opinion to Replications


Symposium Paper by Dominique Boullier: “When in 2007 Savage and Burrows pointed out ‘the coming crisis of empirical sociology’, they were not expecting to be so right. Their paper, however, became a landmark, signifying the social sciences’ reaction to the tremendous shock triggered by digital methods. As they frankly acknowledge in a more recent paper, they did not even imagine the extent to which their prediction might come true in an age of Big Data, where sources and models have to be revised in the light of extended computing power and radically innovative mathematical approaches. They signalled not just a debate about academic methods but also a momentum for ‘commercial sociology’ in which platforms acquire the capacity to add ‘another major nail in the coffin of academic sociology claims to jurisdiction over knowledge of the social’, because ‘research methods [are] an intrinsic feature of contemporary capitalist organisations’ (Burrows and Savage, 2014, p. 2). This call for a serious account of research methods is well attuned to the claims of Social Studies of Science, which should be applied to the social sciences as well.

I would like to build on these insights and principles of Burrows and Savage to propose an historical and systematic account of quantification over the last century, following in the footsteps of Alain Desrosières, in which Big Data and Machine Learning appear as a major shift in the way social science can be performed. And since, according to Burrows and Savage (2014, p. 5), ‘the use of new data sources involves a contestation over the social itself’, I will take the risk here of identifying and defining the entities that are supposed to encapsulate the social for each kind of method: beyond the reign of ‘society’ and ‘opinion’, I will point to the emergence of the ‘replications’ that are fabricated by digital platforms but are radically different from previous entities. This is a challenge to invent not only new methods but also a new process of reflexivity for societies, made available by new stakeholders (namely, the digital platforms) which transform reflexivity into reactivity (as operational quantifiers always tend to do)….(More)”.

Normative Challenges of Identification in the Internet of Things: Privacy, Profiling, Discrimination, and the GDPR


Paper by Sandra Wachter: “In the Internet of Things (IoT), identification and access control technologies provide essential infrastructure to link data between a user’s devices and unique identities, and to provide seamless, linked-up services. At the same time, profiling methods based on linked records can reveal unexpected details about users’ identity and private life, which can conflict with privacy rights and lead to economic, social, and other forms of discriminatory treatment. A balance must be struck between the identification and access control required for the IoT to function and users’ rights to privacy and identity. Striking this balance is not an easy task because of weaknesses in cybersecurity and anonymisation techniques.

The EU General Data Protection Regulation (GDPR), set to come into force in May 2018, may provide essential guidance to achieve a fair balance between the interests of IoT providers and users. Through a review of academic and policy literature, this paper maps the inherent tension between privacy and identifiability in the IoT.

It focuses on four challenges: (1) profiling, inference, and discrimination; (2) control and context-sensitive sharing of identity; (3) consent and uncertainty; and (4) honesty, trust, and transparency. The paper will then examine the extent to which several standards defined in the GDPR will provide meaningful protection for privacy and control over identity for users of IoT. The paper concludes that in order to minimise the privacy impact of the conflicts between data protection principles and identification in the IoT, GDPR standards urgently require further specification and implementation into the design and deployment of IoT technologies….(More)”.
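The profiling risk the paper describes (unexpected inferences from linked records) can be seen in miniature in code. Below is a minimal sketch, assuming hypothetical device logs joined on a shared user identifier; the devices, fields, and inference rule are illustrative inventions, not examples from the paper:

```python
# A minimal sketch of profiling via linked records: events from separate
# IoT devices, joined on a shared user ID, reveal a pattern that no
# single device exposes. All devices, fields, and data are hypothetical.
from collections import defaultdict

events = [
    # (user_id, device, hour_of_day)
    ("u42", "smart_lock", 8),     # front door locked in the morning
    ("u42", "car_tracker", 9),    # car leaves the neighbourhood
    ("u42", "smart_lock", 22),    # front door unlocked at night
    ("u42", "fitness_band", 23),  # sleep tracking begins
]

# The identity link: group every record by the shared user identifier.
profile = defaultdict(list)
for user_id, device, hour in events:
    profile[user_id].append((device, hour))

# Once linked, an observer can infer a daily routine (absence from home,
# bedtime) that none of the individual device logs discloses on its own.
for user_id, records in profile.items():
    devices = {device for device, _ in records}
    hours = sorted(hour for _, hour in records)
    print(f"{user_id}: active {hours[0]}:00 to {hours[-1]}:00, "
          f"linked across {len(devices)} devices")
```

The join on the shared identifier is where the exposure arises: each log is innocuous in isolation, which is why the tension between identification (needed for seamless services) and privacy is structural rather than incidental.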

How the Index Card Cataloged the World


Daniela Blei in the Atlantic: “…The index card was a product of the Enlightenment, conceived by one of its towering figures: Carl Linnaeus, the Swedish botanist, physician, and father of modern taxonomy. But like all information systems, the index card had unexpected political implications, too: It helped set the stage for categorizing people, and for the prejudice and violence that come along with such classification….

In 1780, two years after Linnaeus’s death, Vienna’s Court Library introduced a card catalog, the first of its kind. Describing all the books on the library’s shelves in one ordered system, it relied on a simple, flexible tool: paper slips. Around the same time that the library catalog appeared, says Krajewski, Europeans adopted banknotes as a universal medium of exchange. He believes this wasn’t a historical coincidence. Banknotes, like bibliographical slips of paper and the books they referred to, were material, representational, and mobile. Perhaps Linnaeus took the same mental leap from “free-floating banknotes” to “little paper slips” (or vice versa). Sweden’s great botanist was also a participant in an emerging capitalist economy.

Linnaeus never grasped the full potential of his paper technology. Born of necessity, his paper slips were “idiosyncratic,” say Charmantier and Müller-Wille. “There is no sign he ever tried to rationalize or advertise the new practice.” Like his taxonomical system, paper slips were both an idea and a method, designed to bring order to the chaos of the world.

The passion for classification, a hallmark of the Enlightenment, also had a dark side. From nature’s variety came an abiding preoccupation with the differences between people. As soon as anthropologists applied Linnaeus’s taxonomical system to humans, the category of race, together with the ideology of racism, was born.

It’s fitting, then, that the index card would have a checkered history. To take one example, the FBI’s J. Edgar Hoover used skills he burnished as a cataloger at the Library of Congress to assemble his notorious “Editorial Card Index.” By 1920, he had cataloged 200,000 subversive individuals and organizations in detailed, cross-referenced entries. Nazi ideologues compiled a deadlier index-card database to classify 500,000 Jewish Germans according to racial and genetic background. Other regimes have employed similar methods, relying on the index card’s simplicity and versatility to catalog enemies real and imagined.

The act of organizing information—even notes about plants—is never neutral or objective. Anyone who has used index cards to plan a project, plot a story, or study for an exam knows that hierarchies are inevitable. Forty years ago, Michel Foucault observed in a footnote that, curiously, historians had neglected the invention of the index card. The book was Discipline and Punish, which explores the relationship between knowledge and power. The index card was a turning point, Foucault believed, in the relationship between power and technology. Like the categories they cataloged, Linnaeus’s paper slips belong to the history of politics as much as the history of science….(More)”.

Behind the Screen: the Syrian Virtual Resistance


Billie Jeanne Brownlee at Cyber Orient: “Six years have gone by since the political upheaval that swept through many Middle East and North African (MENA) countries began. Syria was caught in the grip of this revolutionary moment, one that drove the country from peaceful popular mobilisation to a deadly fratricidal civil war with no apparent way out.

This paper provides an alternative approach to the study of the root causes of the Syrian uprising by examining the impact that the development of new media had in reconstructing forms of collective action and social mobilisation in pre-revolutionary Syria.

By providing evidence of a number of significant initiatives, campaigns and acts of contentious politics that occurred between 2000 and 2011, this paper shows that scholarly work on pre-2011 Syria has not given sufficient theoretical and empirical consideration to the expressions of dissent and resilience that developed in its cyberspace, or to the informal and hybrid civic engagement they produced….(More)”.

The Whys of Social Exclusion: Insights from Behavioral Economics


Paper by Karla Hoff and James Sonam Walsh: “All over the world, people are prevented from participating fully in society through mechanisms that go beyond the structural and institutional barriers identified by rational choice theory (poverty, exclusion by law or force, taste-based and statistical discrimination, and externalities from social networks).

This essay discusses four additional mechanisms that bounded rationality can explain: (i) implicit discrimination, (ii) self-stereotyping and self-censorship, (iii) “fast thinking” adapted to underclass neighborhoods, and (iv) “adaptive preferences” in which an oppressed group views its oppression as natural or even preferred.

Stable institutions have cognitive foundations — concepts, categories, social identities, and worldviews — that function like lenses through which individuals see themselves and the world. Abolishing or reforming a discriminatory institution may have little effect on these lenses. Groups previously discriminated against by law or policy may remain excluded through habits of the mind. Behavioral economics recognizes forces of social exclusion left out of rational choice theory, and identifies ways to overcome them. Some interventions have had highly consequential impacts….(More)”.

Accountability of AI Under the Law: The Role of Explanation


Paper by Finale Doshi-Velez and Mason Kortz: “The ubiquity of systems using artificial intelligence or “AI” has brought increasing attention to how those systems should be regulated. The choice of how to regulate AI systems will require care. AI systems have the potential to synthesize large amounts of data, allowing for greater levels of personalization and precision than ever before—applications range from clinical decision support to autonomous driving and predictive policing. That said, our AIs continue to lag in common sense reasoning [McCarthy, 1960], and thus there exist legitimate concerns about the intentional and unintentional negative consequences of AI systems [Bostrom, 2003, Amodei et al., 2016, Sculley et al., 2014]. How can we take advantage of what AI systems have to offer, while also holding them accountable?

In this work, we focus on one tool: explanation. Questions about a legal right to explanation from AI systems were recently debated in the context of the EU General Data Protection Regulation [Goodman and Flaxman, 2016, Wachter et al., 2017a], so thinking carefully about when and how explanation from AI systems might improve accountability is timely. Good choices about when to demand explanation can help prevent negative consequences from AI systems, while poor choices may not only fail to hold AI systems accountable but also hamper the development of much-needed beneficial AI systems.

Below, we briefly review current societal, moral, and legal norms around explanation, and then focus on the different contexts under which explanation is currently required under the law. We find great variation in when explanation is demanded, but also important consistencies: when demanding explanation from humans, what we typically want to know is whether and how certain input factors affected the final decision or outcome.

These consistencies allow us to list the technical considerations that must be addressed if we want AI systems to provide the kinds of explanations currently required of humans under the law. Contrary to the popular wisdom that AI systems are indecipherable black boxes, we find that this level of explanation should generally be technically feasible, though it may sometimes be practically onerous—there are certain aspects of explanation that may be simple for humans to provide but challenging for AI systems, and vice versa. As an interdisciplinary team of legal scholars, computer scientists, and cognitive scientists, we recommend that for the present, AI systems can and should be held to a similar standard of explanation as humans currently are; in the future we may wish to hold an AI to a different standard….(More)”
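The standard the authors identify (whether and how input factors affected the outcome) can be probed computationally. A minimal sketch, assuming a hypothetical scoring model and a simple perturb-one-input probe; the model, features, and deltas are illustrative, not the paper’s method:

```python
# A minimal sketch: ask which input factors drive a black-box decision
# by nudging one factor at a time and observing the output change.
# The model and features below are hypothetical illustrations.

def decide(applicant: dict) -> float:
    """Stand-in for an opaque AI system; returns a score in [0, 1]."""
    score = (0.4 * applicant["income"] / 100_000
             + 0.4 * (1 - applicant["debt_ratio"])
             + 0.2 * applicant["years_employed"] / 10)
    return min(1.0, score)

def influence(applicant: dict, factor: str, delta: float) -> float:
    """How much does changing one input factor move the decision?"""
    perturbed = dict(applicant)
    perturbed[factor] += delta
    return decide(perturbed) - decide(applicant)

applicant = {"income": 55_000, "debt_ratio": 0.35, "years_employed": 4}
for factor, delta in [("income", 5_000), ("debt_ratio", 0.05), ("years_employed", 1)]:
    shift = influence(applicant, factor, delta)
    print(f"{factor:>15}: change of {delta} shifts score by {shift:+.3f}")
```

A large shift flags a factor that materially affected the outcome, answering the “whether and how” question directly; the paper’s observation that some explanations are easy for humans but hard for AI systems, and vice versa, still applies.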

From #Resistance to #Reimagining governance


Stefaan G. Verhulst in Open Democracy: “…There is no doubt that #Resistance (and its associated movements) holds genuine transformative potential. But for the change it brings to be meaningful (and positive), we need to ask the question: What kind of government do we really want?

Working to maintain the status quo or simply returning to, for instance, a pre-Trump reality cannot provide for the change we need to counter the decline in trust, the rise of populism and the complex social, economic and cultural problems we face. We need a clear articulation of alternatives. Without such an articulation, there is a danger of a certain hollowness and dispersion of energies. The call for #Resistance requires a more concrete – and ultimately more productive – program that is concerned not just with rejecting or tearing down, but with building up new institutions and governance processes. What’s needed, in short, is not simply #Resistance.

Below, I suggest six shifts that can help us reimagine governance for the twenty-first century. Several of these shifts are enabled by recent technological changes (e.g., the advent of big data, blockchain and collective intelligence) as well as other emerging methods such as design thinking, behavioral economics, and agile development.

Some of the shifts I suggest have been experimented with, but they have often been developed in an ad hoc manner without a full understanding of how they could make a more systemic impact. Part of the purpose of this paper is to begin the process of a more systematic enquiry; the following amounts to a preliminary outline or blueprint for reimagined governance for the twenty-first century.

  • Shift 1: from gatekeeper to platform…
  • Shift 2: from inward to user-and-problem orientation…
  • Shift 3: from closed to open…
  • Shift 4: from deliberation to collaboration and co-creation…
  • Shift 5: from ideology to evidence-based…
  • Shift 6: from centralized to distributed… (More)

Factors Influencing Decisions about Crowdsourcing in the Public Sector: A Literature Review


Paper by Regina Lenart‑Gansiniec: “Crowdsourcing is a relatively new notion, nonetheless raising more and more interest among researchers. In short, it means selecting functions which have until now been performed by employees and transferring them, in the form of an open on‑line call, to an undefined virtual community. In economic practice it has become a megatrend that drives innovation and collaboration in scientific research, business, and society. More and more organisations reach for it, for instance in view of its potential business value (Rouse 2010; Whitla 2009).

The first paper dedicated to crowdsourcing appeared relatively recently, in 2006, with J. Howe’s article entitled “The Rise of Crowdsourcing”. Although crowdsourcing is increasingly the subject of scientific research, one may note many ambiguities in the literature, resulting from the proliferation of various research approaches and perspectives, which may lead to many misunderstandings (Hopkins, 2011). This especially concerns the key aspects and factors that influence organisations’ decisions about crowdsourcing, particularly in the public sector.

The aim of this article is to identify the factors that influence public organisations’ decisions to implement crowdsourcing in their activity, in particular municipal offices in Poland. The article is of a theoretical and review nature. To this end, a literature review was conducted and an analysis was made of the crowdsourcing initiatives used by self‑government units in Poland….(More)”.

Crowdsourcing Accurately and Robustly Predicts Supreme Court Decisions


Paper by Katz, Daniel Martin and Bommarito, Michael James and Blackman, Josh: “Scholars have increasingly investigated “crowdsourcing” as an alternative to expert-based judgment or purely data-driven approaches to predicting the future. Under certain conditions, scholars have found that crowdsourcing can outperform these other approaches. However, despite interest in the topic and a series of successful use cases, relatively few studies have applied empirical model thinking to evaluate the accuracy and robustness of crowdsourcing in real-world contexts.

In this paper, we offer three novel contributions. First, we explore a dataset of over 600,000 predictions from over 7,000 participants in a multi-year tournament to predict the decisions of the Supreme Court of the United States. Second, we develop a comprehensive crowd construction framework that allows for the formal description and application of crowdsourcing to real-world data. Third, we apply this framework to our data to construct more than 275,000 crowd models. We find that in out-of-sample historical simulations, crowdsourcing robustly outperforms the commonly-accepted null model, yielding the highest-known performance for this context at 80.8% case-level accuracy. To our knowledge, this dataset and analysis represent one of the largest explorations of recurring human prediction to date, and our results provide additional empirical support for the use of crowdsourcing as a prediction method….(More)”.
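To make the aggregation step concrete, here is a minimal sketch of a majority-vote crowd scored against a constant null model on made-up case outcomes; it illustrates the comparison in spirit only and is not the authors’ crowd construction framework:

```python
# A minimal sketch: compare a majority-vote crowd against a constant
# null model on historical outcomes. All predictions and outcomes
# below are made up for illustration.
from collections import Counter

# Each case: the individual forecasts plus the true outcome.
cases = [
    (["reverse", "reverse", "affirm", "reverse"], "reverse"),
    (["affirm", "affirm", "reverse"], "affirm"),
    (["affirm", "affirm", "reverse"], "affirm"),
    (["reverse", "affirm", "reverse"], "reverse"),
]

def majority_vote(predictions):
    """Aggregate individual forecasts into one crowd prediction."""
    return Counter(predictions).most_common(1)[0][0]

def accuracy(predict, cases):
    return sum(predict(preds) == truth for preds, truth in cases) / len(cases)

crowd_acc = accuracy(majority_vote, cases)
null_acc = accuracy(lambda _: "reverse", cases)  # constant-guess baseline
print(f"crowd: {crowd_acc:.0%}, null model: {null_acc:.0%}")
```

Real crowd construction involves participant selection, weighting, and out-of-sample testing; the sketch only shows why a sensible aggregate can beat a constant baseline.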

Big Data in Social and Psychological Science: Theoretical and Methodological Issues


Paper by Lin Qiu, Sarah Hian May Chan and David Chan in the Journal of Computational Social Science: “Big data presents unprecedented opportunities to understand human behavior on a large scale. It has been increasingly used in social and psychological research to reveal individual differences and group dynamics. There are a few theoretical and methodological challenges in big data research that require attention. In this paper, we highlight four issues, namely data-driven versus theory-driven approaches, measurement validity, multi-level longitudinal analysis, and data integration. They represent common problems that social scientists often face in using big data. We present examples of these problems and propose possible solutions….(More)”.
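As one concrete illustration of the third issue, multi-level longitudinal analysis treats repeated measurements as nested within individuals. A minimal sketch on synthetic data, assuming the statsmodels mixed-effects API; the variables and model below are illustrative, not the authors’ example:

```python
# A minimal sketch of multi-level longitudinal analysis: repeated
# measurements nested within persons, modelled with a random intercept
# per person. All data are synthetic; none of this is from the paper.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_people, n_waves = 50, 6
person = np.repeat(np.arange(n_people), n_waves)   # person index per row
wave = np.tile(np.arange(n_waves), n_people)       # measurement occasion

# True model: a common time trend plus stable between-person differences.
person_effect = rng.normal(0.0, 1.0, n_people)[person]
wellbeing = 2.0 + 0.3 * wave + person_effect + rng.normal(0.0, 0.5, len(person))

df = pd.DataFrame({"person": person, "wave": wave, "wellbeing": wellbeing})

# Random-intercept growth model: fixed effect for time, random intercept
# absorbing person-level heterogeneity that would otherwise bias inference.
model = smf.mixedlm("wellbeing ~ wave", df, groups=df["person"])
print(model.fit().summary())
```

The same nesting logic scales to big data settings with millions of users observed repeatedly, where ignoring the person level conflates within-person change with between-person differences.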