Big Data Challenge for Social Sciences: From Society and Opinion to Replications


Symposium Paper by Dominique Boullier: “When in 2007 Savage and Burrows pointed out ‘the coming crisis of empirical methods’, they were not expecting to be so right. Their paper however became a landmark, signifying the social sciences’ reaction to the tremendous shock triggered by digital methods. As they frankly acknowledge in a more recent paper, they did not even imagine the extent to which their prediction might become true, in an age of Big Data, where sources and models have to be revised in the light of extended computing power and radically innovative mathematical approaches. They signalled not just a debate about academic methods but also a momentum for ‘commercial sociology’ in which platforms acquire the capacity to add ‘another major nail in the coffin of academic sociology claims to jurisdiction over knowledge of the social’, because ‘research methods (are) an intrinsic feature of contemporary capitalist organisations’ (Burrows and Savage, 2014, p. 2). This need for a serious account of research methods is well tuned with the claims of Social Studies of Science that should be applied to the social sciences as well.

I would like to build on these insights and principles of Burrows and Savage to propose an historical and systematic account of quantification during the last century, following in the footsteps of Alain Desrosières, and in which we see Big Data and Machine Learning as a major shift in the way social science can be performed. And since, according to Burrows and Savage (2014, p. 5), ‘the use of new data sources involves a contestation over the social itself’, I will take the risk here of identifying and defining the entities that are supposed to encapsulate the social for each kind of method: beyond the reign of ‘society’ and ‘opinion’, I will point at the emergence of the ‘replications’ that are fabricated by digital platforms but are radically different from previous entities. This is a challenge to invent not only new methods but also a new process of reflexivity for societies, made available by new stakeholders (namely, the digital platforms) which transform reflexivity into reactivity (as operational quantifiers always tend to)….(More)”.

Could Bitcoin technology help science?


Andy Extance at Nature: “…The much-hyped technology behind Bitcoin, known as blockchain, has intoxicated investors around the world and is now making tentative inroads into science, spurred by broad promises that it can transform key elements of the research enterprise. Supporters say that it could enhance reproducibility and the peer review process by creating incorruptible data trails and securely recording publication decisions. But some also argue that the buzz surrounding blockchain often exceeds reality and that introducing the approach into science could prove expensive and introduce ethical problems.

A few collaborations, including Scienceroot and Pluto, are already developing pilot projects for science. Scienceroot aims to raise US$20 million, which will help pay both peer reviewers and authors within its electronic journal and collaboration platform. It plans to raise the funds in early 2018 by exchanging some of the science tokens it uses for payment for another digital currency known as ether. And the Wolfram Mathematica algebra program — which is widely used by researchers — is currently working towards offering support for an open-source blockchain platform called Multichain. Scientists could use this, for example, to upload data to a shared, open workspace that isn’t controlled by any specific party, according to Multichain….

Claudia Pagliari, who researches digital health-tracking technologies at the University of Edinburgh, UK, says that she recognizes the potential of blockchain, but researchers have yet to properly explore its ethical issues. What happens if a patient withdraws consent for a trial that is immutably recorded on a blockchain? And unscrupulous researchers could still add fake data to a blockchain, even if the process is so open that everyone can see who adds it, says Pagliari. Once added, no-one can change that information, although it’s possible they could label it as retracted….(More)”.

How the Index Card Cataloged the World


Daniela Blei in the Atlantic: “…The index card was a product of the Enlightenment, conceived by one of its towering figures: Carl Linnaeus, the Swedish botanist, physician, and the father of modern taxonomy. But like all information systems, the index card had unexpected political implications, too: It helped set the stage for categorizing people, and for the prejudice and violence that comes along with such classification….

In 1780, two years after Linnaeus’s death, Vienna’s Court Library introduced a card catalog, the first of its kind. Describing all the books on the library’s shelves in one ordered system, it relied on a simple, flexible tool: paper slips. Around the same time that the library catalog appeared, says Krajewski, Europeans adopted banknotes as a universal medium of exchange. He believes this wasn’t a historical coincidence. Banknotes, like bibliographical slips of paper and the books they referred to, were material, representational, and mobile. Perhaps Linnaeus took the same mental leap from “free-floating banknotes” to “little paper slips” (or vice versa). Sweden’s great botanist was also a participant in an emerging capitalist economy.

Linnaeus never grasped the full potential of his paper technology. Born of necessity, his paper slips were “idiosyncratic,” say Charmantier and Müller-Wille. “There is no sign he ever tried to rationalize or advertise the new practice.” Like his taxonomical system, paper slips were both an idea and a method, designed to bring order to the chaos of the world.

The passion for classification, a hallmark of the Enlightenment, also had a dark side. From nature’s variety came an abiding preoccupation with the differences between people. As soon as anthropologists applied Linnaeus’s taxonomical system to humans, the category of race, together with the ideology of racism, was born.

It’s fitting, then, that the index card would have a checkered history. To take one example, the FBI’s J. Edgar Hoover used skills he burnished as a cataloger at the Library of Congress to assemble his notorious “Editorial Card Index.” By 1920, he had cataloged 200,000 subversive individuals and organizations in detailed, cross-referenced entries. Nazi ideologues compiled a deadlier index-card database to classify 500,000 Jewish Germans according to racial and genetic background. Other regimes have employed similar methods, relying on the index card’s simplicity and versatility to catalog enemies real and imagined.

The act of organizing information—even notes about plants—is never neutral or objective. Anyone who has used index cards to plan a project, plot a story, or study for an exam knows that hierarchies are inevitable. Forty years ago, Michel Foucault observed in a footnote that, curiously, historians had neglected the invention of the index card. The book was Discipline and Punish, which explores the relationship between knowledge and power. The index card was a turning point, Foucault believed, in the relationship between power and technology. Like the categories they cataloged, Linnaeus’s paper slips belong to the history of politics as much as the history of science….(More)”.

Business Models For Sustainable Research Data Repositories


OECD Report: “In 2007, the OECD Principles and Guidelines for Access to Research Data from Public Funding were published, and in the intervening period there has been an increasing emphasis on open science. At the same time, the quantity and breadth of research data has massively expanded. So-called “Big Data” is no longer limited to areas such as particle physics and astronomy, but is ubiquitous across almost all fields of research. This is generating exciting new opportunities, but also challenges.

The promise of open research data is that they will not only accelerate scientific discovery and improve reproducibility, but they will also speed up innovation and improve citizen engagement with research. In short, they will benefit society as a whole. However, for the benefits of open science and open research data to be realised, these data need to be carefully and sustainably managed so that they can be understood and used by both present and future generations of researchers.

Data repositories – based in local and national research institutions and international bodies – are where the long-term stewardship of research data takes place and hence they are the foundation of open science. Yet good data stewardship is costly and research budgets are limited. So, the development of sustainable business models for research data repositories needs to be a high priority in all countries. Surprisingly, perhaps, little systematic analysis has been done on income streams, costs, value propositions, and business models for data repositories, and that is the gap this report attempts to address, from a science policy perspective…..

This project was designed to take up the challenge and to contribute to a better understanding of how research data repositories are funded, and what developments are occurring in their funding. Central questions included:

  • How are data repositories currently funded, and what are the key revenue sources?
  • What innovative revenue sources are available to data repositories?
  • How do revenue sources fit together into sustainable business models?
  • What incentives for, and means of, optimising costs are available?
  • What revenue sources and business models are most acceptable to key stakeholders?…(More)”

There’s more to evidence-based policies than data: why it matters for healthcare


 at The Conversation: “The big question is: how can countries strengthen their health systems to deliver accessible, affordable and equitable care when they are often under-financed and governed in complex ways?

One answer lies in governments developing policies and programmes that are informed by evidence of what works or doesn’t. This should include what we would call “traditional data”, but should also include a broader definition of evidence. This would mean including, for example, information from citizens and stakeholders as well as programme evaluations. In this way, policies can be made more relevant for the people they affect.

Globally there is an increasing appreciation for this sort of policymaking that relies on a broader definition of evidence. Countries such as South Africa, Ghana and Thailand provide good examples.

What is evidence?

Using evidence to inform the development of health care has grown out of the use of science to choose the best decisions. It is based on data being collected in a methodical way. This approach is useful but it can’t always be neatly applied to policymaking. There are several reasons for this.

The first is that there are many different types of evidence. Evidence is more than data, even though the terms are often used to mean the same thing. For example, there is statistical and administrative data, research evidence, citizen and stakeholder information as well as programme evaluations.

The challenge is that some of these are valued more than others. More often than not, statistical data is more valued in policymaking. But both researchers and policymakers must acknowledge that for policies to be sound and comprehensive, different phases of the policymaking process require different types of evidence.

Secondly, data-as-evidence is only one input into policymaking. Policymakers face a long list of pressures they must respond to, including time, resources, political obligations and unplanned events.

Researchers may push technically excellent solutions designed in research environments. But policymakers may have other priorities in mind: are the solutions being put to them practical and affordable? Policymakers also face the limitations of having to balance various constituents while straddling the constraints of the bureaucracies they work in.

Researchers must recognise that policymakers themselves are a source of evidence of what works or doesn’t. They are able to draw on their own experiences, those of their constituents, history and their contextual knowledge of the terrain.

What this boils down to is that for policies that are based on evidence to be effective, fewer ‘push/pull’ models of evidence need to be used. Instead, models in which evidence is jointly fashioned should be employed.

This means that policymakers, researchers and other key actors (like health managers or communities) must come together as soon as a problem is identified. They must first understand each other’s ideas of evidence and come to a joint conclusion of what evidence would be appropriate for the solution.

In South Africa, for example, the Department of Environmental Affairs has developed a four-phase process for policymaking. In the first phase, researchers and policymakers come together to set the agenda and agree on the needed solution. Their joint decision is then reviewed before research is undertaken and interpreted together….(More)”.

Big data in social and psychological science: theoretical and methodological issues


Paper by Lin Qiu, Sarah Hian May Chan and David Chan in the Journal of Computational Social Science: “Big data presents unprecedented opportunities to understand human behavior on a large scale. It has been increasingly used in social and psychological research to reveal individual differences and group dynamics. There are a few theoretical and methodological challenges in big data research that require attention. In this paper, we highlight four issues, namely data-driven versus theory-driven approaches, measurement validity, multi-level longitudinal analysis, and data integration. They represent common problems that social scientists often face in using big data. We present examples of these problems and propose possible solutions….(More)”.

Enhancing social impact through better monitoring, evaluation, and learning


Deloitte: “Social sector organizations tackle some of the world’s most difficult and complex challenges on a daily basis. And, just as in other industries, getting the right data and information at the right time is essential to understanding what an organization needs to achieve, whether it is doing what it set out to do, and what impact its efforts are actually having. Yet, despite marked advances in the tools and methods for monitoring, evaluation, and learning in the social sector, as well as a growing number of bright spots in practice emerging in the field, there is broad dissatisfaction across the sector about how data is—or is not—used….

Based on our interviews, the research team identified three characteristics that participants within and outside the social sector believe should be defining pillars of a better future for monitoring, evaluation, and learning. These three characteristics are purpose, perspective, and alignment with other actors….(More)”

The Engineers and the Political System


Aaron Timms at the Los Angeles Review of Books: “Engineers enjoy a prestige in China that connects them to political power far more directly than in the United States. ….America, by contrast, has historically been governed by lawyers. That remains true today: there are 218 lawyers in Congress and 208 former businesspeople, according to the Congressional Research Service, but only eight engineers. (Science is even more severely underrepresented, with just three members in the House.) It’s unlikely that that balance will tilt meaningfully in favor of STEM-ers in the near term. But in another sense, the growing cultural capital of the engineers will inevitably translate to political power, whatever its form.

The engineering profession today is broad, much broader than it was in 1921 when Thorstein Veblen published The Engineers and the Price System, his classic pamphlet on industrial sabotage and government by technocrats. Engineering has outgrown the four traditional branches (chemical, civil, electrical, mechanical) to include all the professions in which the laws of mathematics and science are applied to real-world problems….. In a way that was never the case for previous generations, engineering today is politics, and politics engineering. Power is coming for the engineers, but are the engineers ready for power?

…tech smarts do not port easily to politics. However violently Silicon Valley pushes the story that it’s here to fix things for all of us, building an algorithm and coming up with intelligent ways to improve society are not the same thing. The triumph of the engineers is that they’ve managed to convince so many people otherwise.

This victory is more than simply economic or mechanical; engineering has also come to permeate the language of politics itself. Zuckerberg’s doe-eyed both-sidesism is the latest expression of the idea, nourished through the Clinton years and the height of the evidence-based policy movement, that facts offer the surest solution to knotty political problems. This is, we already know, a temple built on sand, ignoring as it does the intractably political nature of politics; hence the failure of “figures” and “facts” and “evidence” to do anything to shift positions on gun reform or voter fraud. But it’s a temple with enduring bipartisan appeal, and the engineers have come along at the right moment to give it a fresh lick of paint. If thinking like an engineer is the new way to do business, engineerialism, in politics, is the new centrism — rule by experts remarketed for the innovation age. It might be generations before a Veblenian technocrat calls the White House home, but no presidency can match the power engineers already have — a power to define progress, a power without check….(More)”.

Solving Public Problems with Data


Dinorah Cantú-Pedraza and Sam DeJohn at The GovLab: “….To serve the goal of more data-driven and evidence-based governing, The GovLab at NYU Tandon School of Engineering this week launched “Solving Public Problems with Data,” a new online course developed with support from the Laura and John Arnold Foundation.

This online lecture series helps those working for the public sector, or simply in the public interest, learn to use data to improve decision-making. Through real-world examples and case studies — captured in 10 video lectures from leading experts in the field — the new course outlines the fundamental principles of data science and explores ways practitioners can develop a data analytical mindset. Lectures in the series include:

  1. Introduction to evidence-based decision-making (Quentin Palfrey, formerly of MIT)
  2. Data analytical thinking and methods, Part I (Julia Lane, NYU)
  3. Machine learning (Gideon Mann, Bloomberg LP)
  4. Discovering and collecting data (Carter Hewgley, Johns Hopkins University)
  5. Platforms and where to store data (Arnaud Sahuguet, Cornell Tech)
  6. Data analytical thinking and methods, Part II (Daniel Goroff, Alfred P. Sloan Foundation)
  7. Barriers to building a data practice (Beth Blauer, Johns Hopkins University and GovEx)
  8. Data collaboratives (Stefaan G. Verhulst, The GovLab)
  9. Strengthening a data analytic culture (Amen Ra Mashariki, ESRI)
  10. Data governance and sharing (Beth Simone Noveck, NYU Tandon/The GovLab)

The goal of the lecture series is to enable participants to define and leverage the value of data to achieve improved outcomes and equities, reduced cost and increased efficiency in how public policies and services are created. No prior experience with computer science or statistics is necessary or assumed. In fact, the course is designed precisely to serve public professionals seeking an introduction to data science….(More)”.

Science’s Next Frontier? It’s Civic Engagement


Louise Lief at Discover Magazine: “…As a lay observer who has explored scientists’ relationship to the public, I have often wondered why many scientists and scientific institutions continue to rely on what is known as the “deficit model” of science communication, despite its well-documented shortcomings and even a backfire effect. This approach views the public as “empty vessels” or “warped minds” ready to be set straight with facts. Perhaps many scientists continue to use it because it’s familiar and mimics classroom instruction. But it’s not doing the job.

Scientists spend much of their time with the public defending science, and little time building trust.

Many scientists also give low priority to trust building. At the 2016 American Association for the Advancement of Science conference, Michigan State University professor John C. Besley showed these results (right) of a survey of scientists’ priorities for engaging with the public online.

Scientists are focusing on the frustrating, reactive task of defending science, spending little time establishing bonds of trust with the public, which comes in last as a professional priority. How much more productive their interactions with the public – and through them, policymakers — would be if establishing trust was a top priority!

There is evidence that the public is hungry for such exchanges. When Research!America asked the public in 2016 how important is it for scientists to inform elected officials and the public about their research and its impact on society, 84 percent said it was very or somewhat important — a number that ironically mirrors the percentage of Americans who cannot name a scientist….

This means scientists need to go even further, venturing into unfamiliar local venues where science may not be mentioned but where communities gather to discuss their problems. Interesting new opportunities to do this are emerging nationwide. In 2014 the Chicago Community Trust, one of the nation’s largest community foundations, launched a series of dinners across the city through a program called On the Table, to discuss community problems and brainstorm possible solutions. In 2014, the first year, almost 10,000 city residents participated. In 2017, almost 100,000 Chicago residents took part. Recently the Trust added a grants component to the program, awarding more than $135,000 in small grants to help participants translate their ideas into action….(More)”.