The Why of the World


Book review by Tim Maudlin of The Book of Why: The New Science of Cause and Effect by Judea Pearl and Dana Mackenzie: “Correlation is not causation.” Though true and important, the warning has hardened into the familiarity of a cliché. Stock examples of so-called spurious correlations are now a dime a dozen. As one example goes, a Pacific island tribe believed flea infestations to be good for one’s health because they observed that healthy people had fleas while sick people did not. The correlation is real and robust, but fleas do not cause health, of course: they merely indicate it. Fleas on a fevered body abandon ship and seek a healthier host. One should not seek out and encourage fleas in the quest to ward off sickness.

The rub lies in another observation: that the evidence for causation seems to lie entirely in correlations. But for seeing correlations, we would have no clue about causation. The only reason we discovered that smoking causes lung cancer, for example, is that we observed correlations in that particular circumstance. And thus a puzzle arises: if causation cannot be reduced to correlation, how can correlation serve as evidence of causation?
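Maudlin's flea story is concrete enough to simulate. Below is a minimal sketch (our illustration, not the book's) in which underlying health drives flea presence; fleas and health then correlate strongly even though intervening on fleas would change nothing:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Latent common cause: each person's underlying health (True = healthy).
healthy = rng.random(n) < 0.7

# Fleas prefer healthy hosts and abandon fevered ones. Note the direction:
# health determines fleas here, never the other way around.
has_fleas = np.where(healthy, rng.random(n) < 0.8, rng.random(n) < 0.1)

# The observational correlation is real and robust...
r = np.corrcoef(has_fleas, healthy)[0, 1]
print(f"corr(fleas, healthy) = {r:.2f}")  # roughly 0.65

# ...yet the intervention do(fleas := True) would leave `healthy` untouched,
# because `healthy` is generated independently of any flea assignment.
```

The observed frequencies alone are equally consistent with "fleas cause health" and "health causes fleas"; only assumptions about the generating structure, the kind Pearl's calculus makes explicit, can tell the two apart.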

The Book of Why, co-authored by the computer scientist Judea Pearl and the science writer Dana Mackenzie, sets out to give a new answer to this old question, which has been around—in some form or another, posed by scientists and philosophers alike—at least since the Enlightenment. In 2011 Pearl won the Turing Award, computer science’s highest honor, for “fundamental contributions to artificial intelligence through the development of a calculus of probabilistic and causal reasoning,” and this book sets out to explain what all that means for a general audience, updating his more technical book on the same subject, Causality, published nearly two decades ago. Written in the first person, the new volume mixes theory, history, and memoir, detailing both the technical tools of causal reasoning Pearl has developed and the tortuous path by which he arrived at them—all along bucking a scientific establishment that, in his telling, had long ago contented itself with data-crunching analysis of correlations at the expense of investigation of causes. There are nuggets of wisdom and cautionary tales in both these aspects of the book, the scientific as well as the sociological…(More)”.

Computational Communication Science


Introduction to Special Issue of the International Journal of Communication: ”Over the past two decades, processes of digitalization and mediatization have shaped the communication landscape and have had a strong impact on various facets of communication. The digitalization of communication results in completely new forms of digital traces that make communication processes observable in new and unprecedented ways. Although many scholars in the social sciences acknowledge the chances and requirements of the digital revolution in communication, they are also facing fundamental challenges in implementing successful research programs, strategies, and designs that are based on computational methods and “big data.” This Special Section aims at bringing together seminal perspectives on challenges and chances of computational communication science (CCS). In this introduction, we highlight the impulses provided by the research presented in the Special Section, discuss the most pressing challenges in the context of CCS, and sketch a potential roadmap for future research in this field….(More)”.

How to Build Artificial Intelligence We Can Trust


Gary Marcus and Ernest Davis at the New York Times: “Artificial intelligence has a trust problem. We are relying on A.I. more and more, but it hasn’t yet earned our confidence.

Tesla cars driving in Autopilot mode, for example, have a troubling history of crashing into stopped vehicles. Amazon’s facial recognition system works great much of the time, but when asked to compare the faces of all 535 members of Congress with 25,000 public arrest photos, it found 28 matches, when in reality there were none. A computer program designed to vet job applicants for Amazon was discovered to systematically discriminate against women. Every month new weaknesses in A.I. are uncovered.

The problem is not that today’s A.I. needs to get better at what it does. The problem is that today’s A.I. needs to try to do something completely different.

In particular, we need to stop building computer systems that merely get better and better at detecting statistical patterns in data sets — often using an approach known as deep learning — and start building computer systems that from the moment of their assembly innately grasp three basic concepts: time, space and causality….

We face a choice. We can stick with today’s approach to A.I. and greatly restrict what the machines are allowed to do (lest we end up with autonomous-vehicle crashes and machines that perpetuate bias rather than reduce it). Or we can shift our approach to A.I. in the hope of developing machines that have a rich enough conceptual understanding of the world that we need not fear their operation. Anything else would be too risky….(More)”.

Raw data won’t solve our problems — asking the right questions will


Stefaan G. Verhulst in apolitical: “If I had only one hour to save the world, I would spend fifty-five minutes defining the questions, and only five minutes finding the answers,” is a famous aphorism attributed to Albert Einstein.

Behind this quote is an important insight about human nature: Too often, we leap to answers without first pausing to examine our questions. We tout solutions without considering whether we are addressing real or relevant challenges or priorities. We advocate fixes for problems, or for aspects of society, that may not be broken at all.

This misordering of priorities is especially acute — and represents a missed opportunity — in our era of big data. Today’s data has enormous potential to solve important public challenges.

However, policymakers often fail to invest in defining the questions that matter, focusing mainly on the supply side of the data equation (“What data do we have or must have access to?”) rather than the demand side (“What is the core question and what data do we really need to answer it?” or “What data can or should we actually use to solve those problems that matter?”).

As such, data initiatives often provide marginal insights while at the same time generating unnecessary privacy risks by accessing and exploring data that may not in fact be needed at all in order to address the root of our most important societal problems.

A new science of questions

So what are the truly vexing questions that deserve attention and investment today? Toward what end should we strategically seek to leverage data and AI?

The truth is that policymakers and other stakeholders currently don’t have a good way of defining questions or identifying priorities, nor a clear framework to help us leverage the potential of data and data science toward the public good.

This is a situation we seek to remedy at The GovLab, an action research center based at New York University.

Our most recent project, the 100 Questions Initiative, seeks to begin developing a new science and practice of questions — one that identifies the most urgent questions in a participatory manner. Launched last month, the project aims to develop a process that takes advantage of distributed and diverse expertise on a range of given topics or domains so as to identify and prioritize those questions that are high impact, novel and feasible.

Because we live in an age of data and much of our work focuses on the promises and perils of data, we seek to identify the 100 most pressing problems confronting the world that could be addressed by greater use of existing, often inaccessible, datasets through data collaboratives – new forms of cross-disciplinary collaboration beyond public-private partnerships focused on leveraging data for good….(More)”.

The hidden assumptions in public engagement: A case study of engaging on ethics in government data analysis


Paper by Emily S. Rempel, Julie Barnett and Hannah Durrant: “This study examines the hidden assumptions around running public-engagement exercises in government. We study an example of public engagement on the ethics of combining and analysing data in national government – often called data science ethics. We study hidden assumptions by drawing on hidden-curriculum theories in education research, as this lens allows us to identify conscious and unconscious underlying processes related to conducting public engagement that may impact results. Through participation in the 2016 Public Dialogue for Data Science Ethics in the UK, we identified four key themes that exposed underlying public-engagement norms. First, that organizers had constructed a strong imagined public as neither overly critical nor supportive, which they used to find and engage participants. Second, that official aims of the engagement, such as including publics in developing ethical data regulations, were overshadowed by underlying meta-objectives, such as counteracting public fears. Third, that advisory group members, organizers and publics understood the term ‘engagement’ in varying ways, from creating interest to public inclusion. And finally, that stakeholder interests, particularly government hopes for a positive report, influenced what was written in the final report. Reflection on these underlying mechanisms, such as the development of meta-objectives that seek to benefit government and technical stakeholders rather than publics, suggests that the practice of public engagement can, in fact, shut down opportunities for meaningful public dialogue….(More)”.

The 9 Pitfalls of Data Science


Book by Gary Smith and Jay Cordes: “Data science has never had more influence on the world. Large companies are now seeing the benefit of employing data scientists to interpret the vast amounts of data that now exist. However, the field is so new and is evolving so rapidly that the analysis produced can be haphazard at best.

The 9 Pitfalls of Data Science shows us real-world examples of what can go wrong. Written to be an entertaining read, this invaluable guide investigates the all too common mistakes of data scientists – who can be plagued by lazy thinking, whims, hunches, and prejudices – and indicates how they have been at the root of many disasters, including the Great Recession. 

Gary Smith and Jay Cordes emphasise how scientific rigor and critical thinking skills are indispensable in this age of Big Data, as machines often find meaningless patterns that can lead to dangerous false conclusions. The 9 Pitfalls of Data Science is loaded with entertaining tales of both successful and misguided approaches to interpreting data, from grand successes to epic failures. These cautionary tales will not only help data scientists be more effective, but also help the public distinguish between good and bad data science….(More)”.
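The pattern-hunting pitfall the authors warn about is easy to reproduce. In this minimal sketch (ours, not from the book), screening enough pure-noise features against a pure-noise target all but guarantees an impressive-looking correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_features = 100, 1_000

# Pure noise: neither the features nor the target carries any signal.
X = rng.normal(size=(n_obs, n_features))
y = rng.normal(size=n_obs)

# Screen every feature for correlation with the target and keep the "best".
corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
best = int(np.abs(corrs).argmax())
print(f"feature {best}: r = {corrs[best]:+.2f}")  # typically |r| > 0.3

# For a single pre-registered test with n = 100, |r| > 0.20 would already be
# "significant" at p < 0.05; after 1,000 tries, a larger value means nothing.
```

Held-out data, pre-registered hypotheses, and multiple-comparison corrections are the standard defenses; a procedure told only to maximize fit applies none of them.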

Introduction to Decision Intelligence


Blog post by Cassie Kozyrkov: “…Decision intelligence is a new academic discipline concerned with all aspects of selecting between options. It brings together the best of applied data science, social science, and managerial science into a unified field that helps people use data to improve their lives, their businesses, and the world around them. It’s a vital science for the AI era, covering the skills needed to lead AI projects responsibly and design objectives, metrics, and safety-nets for automation at scale.

Let’s take a tour of its basic terminology and concepts. The sections are designed to be friendly to skim-reading (and skip-reading too, that’s where you skip the boring bits… and sometimes skip the act of reading entirely).

What’s a decision?

Data are beautiful, but it’s decisions that are important. It’s through our decisions — our actions — that we affect the world around us.

We define the word “decision” to mean any selection between options by any entity, so the conversation is broader than MBA-style dilemmas (like whether to open a branch of your business in London).

In this terminology, labeling a photo as cat versus not-cat is a decision executed by a computer system, while figuring out whether to launch that system is a decision taken thoughtfully by the human leader (I hope!) in charge of the project.
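In code, such a machine-executed decision can be as small as a threshold rule. A hypothetical sketch (the names and numbers are ours, not the post's):

```python
def decide(model_score: float, threshold: float = 0.5) -> str:
    """Select between options: a 'decision' in the broad sense used here."""
    return "cat" if model_score >= threshold else "not-cat"

# The system executes millions of these selections, but the humans who chose
# the model, the threshold, and the launch are the ones bearing responsibility.
print(decide(0.91))  # cat
print(decide(0.12))  # not-cat
```

Picking `threshold` is itself a decision, and it belongs to the human in charge, which is where the next definition comes in.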

What’s a decision-maker?

In our parlance, a “decision-maker” is not that stakeholder or investor who swoops in to veto the machinations of the project team, but rather the person who is responsible for decision architecture and context framing. In other words, a creator of meticulously-phrased objectives as opposed to their destroyer.

What’s decision-making?

Decision-making is a word that is used differently by different disciplines, so it can refer to:

  • taking an action when there were alternative options (in this sense it’s possible to talk about decision-making by a computer or a lizard).
  • performing the function of a (human) decision-maker, part of which is taking responsibility for decisions. Even though a computer system can execute a decision, it will not be called a decision-maker because it does not bear responsibility for its outputs — that responsibility rests squarely on the shoulders of the humans who created it.

Decision intelligence taxonomy

One way to approach learning about decision intelligence is to break it along traditional lines into its quantitative aspects (largely overlapping with applied data science) and qualitative aspects (developed primarily by researchers in the social and managerial sciences)….(More)”.


Trust and Mistrust in Americans’ Views of Scientific Experts


Report by the Pew Research Center: “In an era when science and politics often appear to collide, public confidence in scientists is on the upswing, and six-in-ten Americans say scientists should play an active role in policy debates about scientific issues, according to a new Pew Research Center survey.

The survey finds public confidence in scientists on par with confidence in the military. It also exceeds the levels of public confidence in other groups and institutions, including the media, business leaders and elected officials.

At the same time, Americans are divided along party lines in terms of how they view the value and objectivity of scientists and their ability to act in the public interest. And, while political divides do not carry over to views of all scientists and scientific issues, there are particularly sizable gaps between Democrats and Republicans when it comes to trust in scientists whose work is related to the environment.

Higher levels of familiarity with the work of scientists are associated with more positive and more trusting views of scientists regarding their competence, credibility and commitment to the public, the survey shows….(More)”.

Innovation Beyond Technology: Science for Society and Interdisciplinary Approaches


Book edited by Sébastien Lechevalier: ”The major purpose of this book is to clarify the importance of non-technological factors in innovation to cope with contemporary complex societal issues while critically reconsidering the relations between science, technology, innovation (STI), and society. For a few decades now, innovation—mainly derived from technological advancement—has been considered a driving force of economic and societal development and prosperity.

With that in mind, the following questions are dealt with in this book: What are the non-technological sources of innovation? What can the progress of STI bring to humankind? What roles will society be expected to play in the new model of innovation? The authors argue that the majority of so-called technological innovations are actually socio-technical innovations, requiring huge resources for financing activities, adapting regulations, designing adequate policy frames, and shaping new uses and new users while interacting appropriately with society.

This book gathers multi- and trans-disciplinary approaches in innovation that go beyond technology and take into account the inter-relations with social and human phenomena. Illustrated by carefully chosen examples and based on broad and well-informed analyses, it is highly recommended to readers who seek an in-depth and up-to-date integrated overview of innovation in its non-technological dimensions….(More)”.

For academics, what matters more: journal prestige or readership?


Katie Langin at Science: “With more than 30,000 academic journals now in circulation, academics can have a hard time figuring out where to submit their work for publication. The decision is made all the more difficult by the sky-high pressure of today’s academic environment—including working toward tenure and trying to secure funding, which can depend on a researcher’s publication record. So, what does a researcher prioritize?

According to a new study posted on the bioRxiv preprint server, faculty members say they care most about whether the journal is read by the people they most want to reach—but they think their colleagues care most about journal prestige. Perhaps unsurprisingly, prestige also held more sway for untenured faculty members than for their tenured colleagues.

“I think that it is about the security that comes with being later in your career,” says study co-author Juan Pablo Alperin, an assistant professor in the publishing program at Simon Fraser University in Vancouver, Canada. “It means you can stop worrying so much about the specifics of what is being valued; there’s a lot less at stake.”

According to a different preprint that Alperin and his colleagues posted on PeerJ in April, 40% of research-intensive universities in the United States and Canada explicitly mention that journal impact factors can be considered in promotion and tenure decisions. Many more likely do so unofficially, with faculty members using journal names on a CV as a kind of shorthand for how “good” a candidate’s publication record is. “You can’t ignore the fact that journal impact factor is a reality that gets looked at,” Alperin says. But some argue that journal prestige and impact factor are overemphasized and harm science, and that academics should focus on the quality of individual work rather than journal-wide metrics.

In the new study, only 31% of the 338 faculty members who were surveyed—all from U.S. and Canadian institutions and from a variety of disciplines, including 38% in the life and physical sciences and math—said that journal prestige was “very important” to them when deciding where to submit a manuscript. The highest priority was journal readership, which half said was very important. Fewer respondents felt that publication costs (24%) and open access (10%) deserved the highest importance rating.

But, when those same faculty members were asked to assess how their colleagues make the same decision, journal prestige shot to the top of the list, with 43% of faculty members saying that it was very important to their peers when deciding where to submit a manuscript. Only 30% of faculty members thought the same thing about journal readership—a drop of 20 percentage points compared with how faculty members assessed their own motivations….(More)”.