Expecting the Unexpected: Effects of Data Collection Design Choices on the Quality of Crowdsourced User-Generated Content


Paper by Roman Lukyanenko: “As crowdsourced user-generated content becomes an important source of data for organizations, a pressing question is how to ensure that data contributed by ordinary people outside of traditional organizational boundaries is of suitable quality to be useful for both known and unanticipated purposes. This research examines the impact of different information quality management strategies, and corresponding data collection design choices, on key dimensions of information quality in crowdsourced user-generated content. We conceptualize a contributor-centric information quality management approach focusing on instance-based data collection. We contrast it with the traditional consumer-centric fitness-for-use conceptualization of information quality that emphasizes class-based data collection. We present laboratory and field experiments conducted in a citizen science domain that demonstrate trade-offs among the quality dimensions of accuracy, completeness (including discoveries), and precision across the two information management approaches and their corresponding data collection designs. Specifically, we show that instance-based data collection results in higher accuracy, dataset completeness, and number of discoveries, but this comes at the expense of lower precision. We further validate the practical value of the instance-based approach by conducting an applicability check with potential data consumers (scientists, in our context of citizen science). In a follow-up study, we show, using human experts and supervised machine learning techniques, that substantial precision gains on instance-based data can be achieved with post-processing. We conclude by discussing the benefits and limitations of different information quality management approaches and data collection design choices for information quality in crowdsourced user-generated content…(More)”.

The Promise of Community-Driven Science


Article by Louise Lief: “Powered by thousands of early-career scientists and students, a global movement to transform scientific practice has emerged in recent years. The objective is to “expand the boundaries of what we consider science,” says Rajul Pandya, senior director of Thriving Earth Exchange at the American Geophysical Union (AGU), “to fundamentally transform science and the way we use it.”

These scientists have joined forces with community leaders and members of the public to establish new protocols and methods for doing community-driven science in an effort to make civic science even more inclusive and accessible to the public. Community science is an outgrowth of two earlier movements that emerged in response to the democratizing forces of the internet: open science, the push to make scientific research accessible and to encourage sharing and collaboration throughout the research cycle; and open data, the support for data that anyone can freely use, reuse, and share.

For open-science advocates, a reset of scientific practice is long overdue. For decades, the field has been dominated by what some experts call the “science-push” model, a top-down approach in which scientists decide which investigations to pursue, what questions to ask, how to do the science, and which results are significant. If members of the public are involved at all, they serve as research subjects or passive consumers of knowledge curated and presented to them by scientists.

The traditional approach to science has resulted in the public’s increasing distrust of scientists—their motives, values, and business interests. Science is a process that explores the world through observation and experiment, looking for evidence that may reveal larger patterns, often producing new discoveries. However, science itself does not decide the effects or outcomes of these results. The devastating opioid epidemic—in which manufacturers have aggressively promoted the highly addictive drugs, downplaying risks and misinforming doctors—has shown that the values and motives of those who practice science make all the difference.

Instead, open-science advocates believe science should be a joint enterprise between scientists and the public to demonstrate the value of science in people’s lives. Such collaboration will change the way scientists, communities, regulatory agencies, policy makers, academia, and funders work individually and collectively. Each player will be able to integrate science more easily into civic decision-making and target problems more efficiently and at lower costs. This collaborative work will create new opportunities for civic action and give the public a greater sense of ownership—making it their science….(More)”.

Randomistas vs. Contestistas


Excerpt by Beth Simone Noveck: “Social scientists who either run experiments or conduct systematic reviews tend to be fervent proponents of the value of randomized controlled trials (RCTs). But that evidentiary hierarchy—what some people call the “RCT industrial complex”—may actually lead us to discount workable solutions just because there is no accompanying RCT.

A trawl of the solution space shows that successful interventions often come from more varied places: entrepreneurs in business, philanthropy, civil society, and social enterprise, as well as business schools that promote and study open innovation, often by designing competitions to source ideas. Uncovering these exciting social innovations lays bare the limitations of confining a definition of what works only to RCTs.

Many more entrepreneurial and innovative solutions are simply not tested with an RCT and are not the subject of academic study. As one public official said to me, you cannot saddle an entrepreneur with having to do a randomized controlled trial, which they do not have the time or know-how to do. They are busy helping real people, and we have to allow them “to get on with it.”

Consider, for example, MIT Solve, which describes itself as a marketplace for socially impactful innovation designed to identify lasting solutions to the world’s most pressing problems. It catalogs hundreds of innovations in use around the world, like Faircap, a chemical-free water filter used in Mozambique, or WheeLog!, an application that enables individuals and local governments to share accessibility information in Tokyo.

Research funding is also too limited (and too slow) for RCTs to assess every innovation in every domain. Many effective innovators do not have the time, resources, or know-how to partner with academic researchers to conduct a study, or they evaluate projects by some other means.

There are also significant limitations to RCTs. For a start, systematic evidence reviews are quite slow, frequently taking upward of two years, and despite published standards for review, there is a lack of transparency. Faster approaches are important. In addition, many solutions that have been tested with an RCT clearly do not work. Interestingly, the first RCT in an area tends to produce an inflated effect size….(More)”.

Academic Incentives and Research Impact: Developing Reward and Recognition Systems to Better People’s Lives


Report by Jonathan Grant: “…offers new strategies to increase the societal impact that health research can have on the community and critiques the existing academic reward structure that determines the career trajectories of so many academics—including tenure, peer-reviewed publication, citations, and grant funding, among others. The new assessment illustrates how these incentives can lead researchers to produce studies as an end goal, rather than pursuing impact by applying the work in real-world settings.

Dr. Grant also outlines new system-, institution-, and person-level changes to academic incentives that, if implemented, could make societal impact an integral part of the research process. Among the changes offered by Dr. Grant are tying a percentage of grant funding to the impact the research has on the community, breaking from the tenure model to incentivize ongoing development and quality research, and encouraging academics themselves to prioritize social impact when submitting or reviewing research and grant proposals…(More)”.

Data and Society: A Critical Introduction


Book by Anne Beaulieu and Sabina Leonelli: “Data and Society: A Critical Introduction investigates the growing importance of data as a technological, social, economic and scientific resource. It explains how data practices have come to underpin all aspects of human life and explores what this means for those directly involved in handling data. The book:

  • fosters informed debate over the role of data in contemporary society
  • explains the significance of data as evidence beyond the “Big Data” hype
  • spans the technical, sociological, philosophical and ethical dimensions of data
  • provides guidance on how to use data responsibly
  • includes data stories that provide concrete cases and discussion questions

Grounded in examples spanning genetics, sport and digital innovation, this book fosters insight into the deep interrelations between technical, social and ethical aspects of data work…(More)”.

Collective innovation is key to the lasting successes of democracies


Article by Kent Walker and Jared Cohen: “Democracies across the world have been through turbulent times in recent years, as polarization and gridlock have posed significant challenges to progress. The initial spread of COVID-19 spurred chaos at the global level, and governments scrambled to respond. With uncertainty and skepticism at an all-time high, few of us would have guessed a year ago that 66 percent of Americans would have received at least one vaccine dose by now. So what made that possible?

It turns out democracies, unlike their geopolitical competitors, have a secret weapon: collective innovation. The concept of collective innovation draws on democratic values of openness and pluralism. Free expression and free association allow for cooperation and scientific inquiry. Freedom to fail leaves room for risk-taking, while institutional checks and balances protect from state overreach.

Vaccine development and distribution offers a powerful case study. Within days of the coronavirus being first sequenced by Chinese researchers, research centers across the world had exchanged viral genome data through international data-sharing initiatives. The Organization for Economic Cooperation and Development found that 75 percent of COVID-19 research published after the outbreak relied on open data. In the United States and Europe, in universities and companies, scientists drew on open information, shared research, and debated alternative approaches to develop powerful vaccines in record-setting time.

Democracies’ self- and co-regulatory frameworks have played a critical role in advancing scientific and technological progress, leading to robust capital markets, talent-attracting immigration policies, world-class research institutions, and dynamic manufacturing sectors. The resulting world-leading productivity underpins democracies’ geopolitical influence….(More)”.

Manufacturing Consensus


Essay by M. Anthony Mills: “…Yet, the achievement of consensus within science, however rare and special, rarely translates into consensus in social and political contexts. Take nuclear physics, a well-established field of natural science if ever there were one, in which there is a high degree of consensus. But agreement on the physics of nuclear fission is not sufficient for answering such complex social, political, and economic questions as whether nuclear energy is a safe and viable alternative energy source, whether and where to build nuclear power plants, or how to dispose of nuclear waste. Expertise in nuclear physics and literacy in its consensus views is obviously important for answering such questions, but inadequate. That’s because answering them also requires drawing on various other kinds of technical expertise — from statistics to risk assessment to engineering to environmental science — within which there may or may not be disciplinary consensus, not to mention grappling with practical challenges and deep value disagreements and conflicting interests.

It is in these contexts — where multiple kinds of scientific expertise are necessary but not sufficient for solving controversial political problems — that the dependence of non-experts on scientific expertise becomes fraught, as our debates over pandemic policies amply demonstrate. Here scientific experts may disagree about the meaning, implications, or limits of what they know. As a result, their authority to say what they know becomes precarious, and the public may challenge or even reject it. To make matters worse, we usually do not have the luxury of a scientific consensus in such controversial contexts anyway, because political decisions often have to be made long before a scientific consensus can be reached — or because the sciences involved are those in which a consensus is simply not available, and may never be.

To be sure, scientific experts can and do weigh in on controversial political decisions. For instance, scientific institutions, such as the National Academies of Sciences, will sometimes issue “consensus reports” or similar documents on topics of social and political significance, such as risk assessment, climate change, and pandemic policies. These usually draw on existing bodies of knowledge from widely varied disciplines and take considerable time and effort to produce. Such documents can be quite helpful and are frequently used to aid policy and regulatory decision-making, although they are not always available when needed for making a decision.

Yet the kind of consensus expressed in these documents is importantly distinct from the kind we have been discussing so far, even though they are both often labeled as such. The difference is between what philosopher of science Stephen P. Turner calls a “scientific consensus” and a “consensus of scientists.” A scientific consensus, as described earlier, is a relatively stable paradigm that structures and organizes scientific research. By contrast, a consensus of scientists is an organized, professional opinion, created in response to an explicit political or social need, often an official government request…(More)”.

Open science, data sharing and solidarity: who benefits?


Report by Ciara Staunton et al: “Research, innovation, and progress in the life sciences are increasingly contingent on access to large quantities of data. This is one of the key premises behind the “open science” movement and the global calls for fostering the sharing of personal data, datasets, and research results. This paper reports on the outcomes of discussions by the panel “Open science, data sharing and solidarity: who benefits?” held at the 2021 Biennial conference of the International Society for the History, Philosophy, and Social Studies of Biology (ISHPSSB), and hosted by Cold Spring Harbor Laboratory (CSHL)….(More)”.

Thinking Clearly with Data: A Guide to Quantitative Reasoning and Analysis


Book by Ethan Bueno de Mesquita and Anthony Fowler: “An introduction to data science or statistics shouldn’t involve proving complex theorems or memorizing obscure terms and formulas, but that is exactly what most introductory quantitative textbooks emphasize. In contrast, Thinking Clearly with Data focuses, first and foremost, on critical thinking and conceptual understanding in order to teach students how to be better consumers and analysts of the kinds of quantitative information and arguments that they will encounter throughout their lives.

Among much else, the book teaches how to assess whether an observed relationship in data reflects a genuine relationship in the world and, if so, whether it is causal; how to make the most informative comparisons for answering questions; what questions to ask others who are making arguments using quantitative evidence; which statistics are particularly informative or misleading; how quantitative evidence should and shouldn’t influence decision-making; and how to make better decisions by using moral values as well as data. Filled with real-world examples, the book shows how its thinking tools apply to problems in a wide variety of subjects, including elections, civil conflict, crime, terrorism, financial crises, health care, sports, music, and space travel.

Above all else, Thinking Clearly with Data demonstrates why, despite the many benefits of our data-driven age, data can never be a substitute for thinking.

  • An ideal textbook for introductory quantitative methods courses in data science, statistics, political science, economics, psychology, sociology, public policy, and other fields
  • Introduces the basic toolkit of data analysis—including sampling, hypothesis testing, Bayesian inference, regression, experiments, instrumental variables, difference-in-differences, and regression discontinuity
  • Uses real-world examples and data from a wide variety of subjects
  • Includes practice questions and data exercises…(More)”.

AI Generates Hypotheses Human Scientists Have Not Thought Of


Robin Blades in Scientific American: “Electric vehicles have the potential to substantially reduce carbon emissions, but car companies are running out of materials to make batteries. One crucial component, nickel, is projected to cause supply shortages as early as the end of this year. Scientists recently discovered four new materials that could potentially help—and what may be even more intriguing is how they found these materials: the researchers relied on artificial intelligence to pick out useful chemicals from a list of more than 300 options. And they are not the only humans turning to A.I. for scientific inspiration.

Creating hypotheses has long been a purely human domain. Now, though, scientists are beginning to ask machine learning to produce original insights. They are designing neural networks (a type of machine-learning setup with a structure inspired by the human brain) that suggest new hypotheses based on patterns the networks find in data instead of relying on human assumptions. Many fields may soon turn to the muse of machine learning in an attempt to speed up the scientific process and reduce human biases.

In the case of new battery materials, scientists pursuing such tasks have typically relied on database search tools, modeling and their own intuition about chemicals to pick out useful compounds. Instead, a team at the University of Liverpool in England used machine learning to streamline the creative process. The researchers developed a neural network that ranked chemical combinations by how likely they were to result in a useful new material. Then the scientists used these rankings to guide their experiments in the laboratory. They identified four promising candidates for battery materials without having to test everything on their list, saving them months of trial and error…(More)”.