Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing


Book by Ron Kohavi, Diane Tang, and Ya Xu: “Getting numbers is easy; getting numbers you can trust is hard. This practical guide by experimentation leaders at Google, LinkedIn, and Microsoft will teach you how to accelerate innovation using trustworthy online controlled experiments, or A/B tests. Based on practical experiences at companies that each run more than 20,000 controlled experiments a year, the authors share examples, pitfalls, and advice for students and industry professionals getting started with experiments, plus deeper dives into advanced topics for practitioners who want to improve the way they make data-driven decisions.

Learn how to use the scientific method to evaluate hypotheses using controlled experiments; define key metrics and ideally an Overall Evaluation Criterion; test for trustworthiness of the results and alert experimenters to violated assumptions; build a scalable platform that lowers the marginal cost of experiments close to zero; avoid pitfalls like carryover effects and Twyman’s law; and understand how statistical issues play out in practice….(More)”.
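The book's central statistical question, whether a metric difference between control and treatment is larger than chance alone would explain, can be sketched in a few lines. This is a minimal illustration rather than code from the book, and the group sizes and conversion counts below are invented:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical A/B test results: conversions out of users exposed to each variant.
control_conv, control_n = 1_210, 50_000      # control (existing experience)
treatment_conv, treatment_n = 1_325, 50_000  # treatment (new experience)

p_c = control_conv / control_n
p_t = treatment_conv / treatment_n

# Two-proportion z-test using the pooled conversion rate under the null hypothesis.
p_pool = (control_conv + treatment_conv) / (control_n + treatment_n)
se = sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / treatment_n))
z = (p_t - p_c) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

print(f"lift: {p_t - p_c:+.4%}, z = {z:.2f}, p = {p_value:.4f}")
```

In practice an experimentation platform would layer trustworthiness checks (such as sample-ratio tests), variance reduction, and multiple-testing safeguards on top of a basic comparison like this.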

A controlled trial for reproducibility


Marc P. Raphael, Paul E. Sheehan & Gary J. Vora at Nature: “In 2016, the US Defense Advanced Research Projects Agency (DARPA) told eight research groups that their proposals had made it through the review gauntlet and would soon get a few million dollars from its Biological Technologies Office (BTO). Along with congratulations, the teams received a reminder that their award came with an unusual requirement — an independent shadow team of scientists tasked with reproducing their results.

Thus began an intense, multi-year controlled trial in reproducibility. Each shadow team consists of three to five researchers, who visit the ‘performer’ team’s laboratory and often host visits themselves. Between 3% and 8% of the programme’s total funds go to this independent validation and verification (IV&V) work. But DARPA has the flexibility and resources for such herculean efforts to assess essential techniques. In one unusual instance, an IV&V laboratory needed a sophisticated US$200,000 microscopy and microfluidic set-up to make an accurate assessment.

These costs are high, but we think they are an essential investment to avoid wasting taxpayers’ money and to advance fundamental research towards beneficial applications. Here, we outline what we’ve learnt from implementing this programme, and how it could be applied more broadly….(More)”.

Why resilience to online disinformation varies between countries


Edda Humprecht at the Democratic Audit: “The massive spread of online disinformation, understood as content intentionally produced to mislead others, has been widely discussed in the context of the UK Brexit referendum and the US general election in 2016. However, in many other countries online disinformation seems to be less prevalent. It seems certain countries are better equipped to face the problems of the digital era, demonstrating a resilience to manipulation attempts. In other words, citizens in these countries are better able to adapt to overcome challenges such as the massive spread of online disinformation and their exposure to it. So, do structural conditions render countries more or less resilient towards online disinformation?

As a first step to answering this question, in new research with Frank Esser and Peter Van Aelst, we identified the structural conditions that are theoretically linked to resilience to online disinformation, which relate to different political, media and economic environments. To test these expectations, we then identified quantifiable indicators for these theoretical conditions, which allowed us to measure their significance for 18 Western democracies. A cluster analysis then yielded three country groups: one group with high resilience to online disinformation (including the Northern European countries) and two country groups with low resilience (including Southern European countries and the US).
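For readers curious what such a clustering step looks like in code, the sketch below groups countries into three clusters from standardised indicator scores. The indicator values are invented and the choice of k-means is an illustrative assumption, not the study's data or necessarily its method:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical country-level indicators (rows = countries, columns = indicators),
# e.g. societal polarisation, populist communication, distrust in news media,
# weakness of public service media, social media use, advertising market size.
countries = ["Denmark", "Finland", "Spain", "Greece", "United States", "Canada"]
indicators = np.array([
    [0.2, 0.3, 0.2, 0.1, 0.5, 0.3],
    [0.2, 0.2, 0.2, 0.1, 0.6, 0.2],
    [0.7, 0.7, 0.6, 0.6, 0.7, 0.5],
    [0.8, 0.8, 0.7, 0.6, 0.6, 0.4],
    [0.9, 0.8, 0.6, 0.9, 0.8, 1.0],
    [0.4, 0.4, 0.4, 0.5, 0.6, 0.6],
])

# Standardise indicators so no single one dominates the distance metric,
# then group countries into three clusters, mirroring the study's three groups.
X = StandardScaler().fit_transform(indicators)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

for country, label in zip(countries, labels):
    print(f"{country}: cluster {label}")
```

Standardising first matters because the indicators are measured on different scales; without it, whichever variable has the largest numeric range would dominate the clustering.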

Conditions for resilience: political, media and economic environments

In polarised political environments, citizens are confronted with divergent representations of reality, and it therefore becomes increasingly difficult for them to distinguish between false and correct information. Thus, societal polarisation is likely to decrease resilience to online disinformation. Moreover, research has shown that both populism and partisan disinformation share a binary Manichaean worldview, comprising anti-elitism, mistrust of expert knowledge and a belief in conspiracy theories. As a consequence of these combined influences, citizens can form inaccurate perceptions of reality. Thus, in environments with high levels of populist communication, online users are exposed to more disinformation.

Another condition that has been linked to resilience to online disinformation is trust in news media. Previous research has shown that in environments in which distrust in news media is higher, people are less likely to be exposed to a variety of sources of political information and to evaluate them critically. In this vein, the level of knowledge that people gain is likely to play an important role when they are confronted with online disinformation. Research has shown that in countries with wide-reaching public service media, citizens’ knowledge about public affairs is higher than in countries with marginalised public service media. Therefore, it can be assumed that environments with weak public broadcasting services (PBS) are less resilient to online disinformation….

Looking at the economic environment, false social media content is often produced in pursuit of advertising revenue, as was the case with the Macedonian ‘fake news factories’ during the 2016 US presidential election. It is especially appealing for producers to publish this kind of content if the potential readership is large. Thus, large advertising markets with a high number of potential users are less resistant to disinformation than smaller markets….

Disinformation is particularly prevalent on social media, and in countries with very many social media users it is easier for rumour-spreaders to build partisan follower networks. Moreover, it has been found that a media diet consisting mainly of news from social media limits political learning and leads to less knowledge of public affairs than other media sources. It follows that societies with a high share of social media users are more vulnerable to the rapid spread of online disinformation than other societies…(More)”.

Now Is the Time for Open Access Policies—Here’s Why



Victoria Heath and Brigitte Vézina at Creative Commons: “Over the weekend, news emerged that upset even the most ardent skeptics of open access. Under the headline “Trump vs Berlin,” the German newspaper Welt am Sonntag reported that President Trump offered $1 billion USD to the German biopharmaceutical company CureVac to secure their COVID-19 vaccine “only for the United States.”

In response, Jens Spahn, the German health minister, said such a deal was completely “off the table,” and Peter Altmaier, the German economic minister, replied, “Germany is not for sale.” Open science advocates were especially infuriated. Professor Lorraine Leeson of Trinity College Dublin, for example, tweeted, “This is NOT the time for this kind of behavior—it flies in the face of the #OpenScience work that is helping us respond meaningfully right now. This is the time for solidarity, not exclusivity.” The White House and CureVac have since denied the report.

Today, we find ourselves at a pivotal moment in history—we must cooperate effectively to respond to an unprecedented global health emergency. The mantra, “when we share, everyone wins” applies now more than ever. With this in mind, we felt it imperative to underscore the importance of open access, specifically open science, in times of crisis.

Why open access matters, especially during a global health emergency 

One of the most important components of maintaining global health, specifically in the face of urgent threats, is the creation and dissemination of reliable, up-to-date scientific information to the public, government officials, humanitarian and health workers, as well as scientists.

Several scientific research funders like the Gates Foundation, the Hewlett Foundation, and the Wellcome Trust have long-standing open access policies and some have now called for increased efforts to share COVID-19 related research rapidly and openly to curb the outbreak. By licensing material under a CC BY-NC-SA license, the World Health Organization (WHO) is adopting a more conservative approach to open access that falls short of what the scientific community urgently needs in order to access and build upon critical information….(More)”.

Crowdsourcing hypothesis tests: making transparent how design choices shape research results


Paper by J.F. Landy and Leonid Tiokhin: “To what extent are research results influenced by subjective decisions that scientists make as they design studies?

Fifteen research teams independently designed studies to answer five original research questions related to moral judgments, negotiations, and implicit cognition. Participants from two separate large samples (total N > 15,000) were then randomly assigned to complete one version of each study. Effect sizes varied dramatically across different sets of materials designed to test the same hypothesis: materials from different teams rendered statistically significant effects in opposite directions for four out of five hypotheses, with the narrowest range in estimates being d = -0.37 to +0.26. Meta-analysis and a Bayesian perspective on the results revealed overall support for two hypotheses, and a lack of support for three hypotheses.

Overall, practically none of the variability in effect sizes was attributable to the skill of the research team in designing materials, while considerable variability was attributable to the hypothesis being tested. In a forecasting survey, predictions of other scientists were significantly correlated with study results, both across and within hypotheses. Crowdsourced testing of research hypotheses helps reveal the true consistency of empirical support for a scientific claim….(More)”.
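A random-effects meta-analysis of the kind referred to above can be sketched as follows. The per-team effect sizes and variances are invented for illustration (only the endpoints echo the range quoted in the abstract), and the paper's own analysis is more elaborate, including Bayesian models:

```python
import numpy as np

# Hypothetical per-team effect sizes (Cohen's d) and their variances for one hypothesis.
d = np.array([-0.37, -0.10, 0.02, 0.08, 0.15, 0.26])
v = np.array([0.010, 0.012, 0.009, 0.011, 0.010, 0.013])

# DerSimonian-Laird random-effects meta-analysis.
w = 1 / v
theta_fixed = np.sum(w * d) / np.sum(w)
Q = np.sum(w * (d - theta_fixed) ** 2)           # heterogeneity statistic
C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - (len(d) - 1)) / C)          # between-team variance

w_star = 1 / (v + tau2)
theta_re = np.sum(w_star * d) / np.sum(w_star)   # pooled effect across teams
se_re = np.sqrt(1 / np.sum(w_star))

print(f"pooled d = {theta_re:.3f} ± {1.96 * se_re:.3f} (95% CI), tau² = {tau2:.3f}")
```

The between-team variance tau² is what captures the paper's key point: how much of the spread in results comes from the materials themselves rather than from sampling error.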

The Power of Experiments: Decision Making in a Data-Driven World


Book by Michael Luca and Max H. Bazerman: “Have you logged into Facebook recently? Searched for something on Google? Chosen a movie on Netflix? If so, you’ve probably been an unwitting participant in a variety of experiments—also known as randomized controlled trials—designed to test the impact of different online experiences. Once an esoteric tool for academic research, the randomized controlled trial has gone mainstream. No tech company worth its salt (or its share price) would dare make major changes to its platform without first running experiments to understand how they would influence user behavior. In this book, Michael Luca and Max Bazerman explain the importance of experiments for decision making in a data-driven world.

Luca and Bazerman describe the central role experiments play in the tech sector, drawing lessons and best practices from the experiences of such companies as StubHub, Alibaba, and Uber. Successful experiments can save companies money—eBay, for example, discovered how to cut $50 million from its yearly advertising budget—or bring to light something previously ignored, as when Airbnb was forced to confront rampant discrimination by its hosts. Moving beyond tech, Luca and Bazerman consider experimenting for the social good—different ways that governments are using experiments to influence or “nudge” behavior ranging from voter apathy to school absenteeism. Experiments, they argue, are part of any leader’s toolkit. With this book, readers can become part of “the experimental revolution.”…(More)”.

Beyond Randomized Controlled Trials


Iqbal Dhaliwal, John Floretta & Sam Friedlander at SSIR: “…In its post-Nobel phase, one of J-PAL’s priorities is to unleash the treasure troves of big digital data in the hands of governments, nonprofits, and private firms. Primary data collection is by far the most time-, money-, and labor-intensive component of the vast majority of experiments that evaluate social policies. Randomized evaluations have been constrained by simple numbers: Some questions are just too big or expensive to answer. Leveraging administrative data has the potential to dramatically expand the types of questions we can ask and the experiments we can run, as well as to implement quicker, less expensive, larger, and more reliable RCTs: an invaluable opportunity to scale up evidence-informed policymaking massively without dramatically increasing evaluation budgets.

Although administrative data hasn’t always been of the highest quality, recent advances have significantly increased the reliability and accuracy of GPS coordinates, biometrics, and digital methods of collection. But despite good intentions, many implementers—governments, businesses, and big NGOs—aren’t currently using the data they already collect on program participants and outcomes to improve anti-poverty programs and policies. This may be because they aren’t aware of its potential, don’t have the in-house technical capacity necessary to create use and privacy guidelines or analyze the data, or don’t have established partnerships with researchers who can collaborate to design innovative programs and run rigorous experiments to determine which are the most impactful. 

At J-PAL, we are leveraging this opportunity through a new global research initiative we are calling the “Innovations in Data and Experiments for Action” Initiative (IDEA). IDEA supports implementers to make their administrative data accessible, analyze it to improve decision-making, and partner with researchers in using this data to design innovative programs, evaluate impact through RCTs, and scale up successful ideas. IDEA will also build the capacity of governments and NGOs to conduct these types of activities with their own data in the future….(More)”.

Invest 5% of research funds in ensuring data are reusable


Barend Mons at Nature: “It is irresponsible to support research but not data stewardship…

Many of the world’s hardest problems can be tackled only with data-intensive, computer-assisted research. And I’d speculate that the vast majority of research data are never published. Huge sums of taxpayer funds go to waste because such data cannot be reused. Policies for data reuse are falling into place, but fixing the situation will require more resources than the scientific community is willing to face.

In 2013, I was part of a group of Dutch experts from many disciplines that called on our national science funder to support data stewardship. Seven years later, policies that I helped to draft are starting to be put into practice. These require data created by machines and humans to meet the FAIR principles (that is, they are findable, accessible, interoperable and reusable). I now direct an international Global Open FAIR office tasked with helping communities to implement the guidelines, and I am convinced that doing so will require a large cadre of professionals, about one for every 20 researchers.

Even when data are shared, the metadata, expertise, technologies and infrastructure necessary for reuse are lacking. Most published data sets are scattered into ‘supplemental files’ that are often impossible for machines or even humans to find. These and other sloppy data practices keep researchers from building on each other’s work. In cases of disease outbreaks, for instance, this might even cost lives….(More)”.

Facial Recognition Software requires Checks and Balances


David Eaves and Naeha Rashid in Policy Options: “A few weeks ago, members of the Nexus traveller identification program were notified that Canadian Border Services is upgrading its automated system, from iris scanners to facial recognition technology. This is meant to simplify identification and increase efficiency without compromising security. But it also raises profound questions concerning how we discuss and develop public policies around such technology – questions that may not be receiving sufficiently open debate in the rush toward promised greater security.

Analogous to the U.S. Customs and Border Protection (CBP) program Global Entry, Nexus is a joint Canada-US border control system designed for low-risk, pre-approved travellers. Nexus does provide a public good, and there are valid reasons to improve surveillance at airports. Even before 9/11, border surveillance was an accepted annoyance and since then, checkpoint operations have become more vigilant and complex in response to the public demand for safety.

Nexus is one of the first North American government-sponsored services to adopt facial recognition, and as such it could be a pilot program that other services will follow. Left unchecked, the technology will likely become ubiquitous at North American border crossings within the next decade, and it will probably be adopted by governments to solve domestic policy challenges.

Facial recognition software is imperfect and has documented bias, but it will continue to improve and become superior to humans in identifying individuals. Given this, questions arise: what policies guide the use of this technology? What policies should inform future government use? In our headlong rush toward enhanced security, we risk replicating the justification used by the private sector in its attempt to balance effectiveness, efficiency and privacy.

One key question involves citizens’ capacity to consent. Previously, Nexus members submitted to fingerprint and retinal scans – biometric markers that are relatively unique and enable government to verify identity at the border. Facial recognition technology uses visual data and seeks, analyzes, and stores identifying facial information in a database, which is then compared against new images and video….(More)”.
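At a technical level, systems like this typically reduce each face image to a numeric embedding and compare it against the embeddings stored at enrolment. The sketch below shows only that comparison step; the embeddings, the 128-dimensional size, and the decision threshold are all made-up stand-ins for what a trained face-recognition model and a tuned system would supply:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between two face embeddings (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical enrolled embeddings (in practice produced by a trained face-recognition model).
rng = np.random.default_rng(0)
database = {traveller: rng.normal(size=128) for traveller in ["traveller_A", "traveller_B"]}

# Embedding extracted from a new camera frame (here just a noisy copy of traveller_A's).
probe = database["traveller_A"] + rng.normal(scale=0.1, size=128)

THRESHOLD = 0.8  # made-up decision threshold; real systems tune this to balance error rates
for traveller, enrolled in database.items():
    score = cosine_similarity(probe, enrolled)
    print(f"{traveller}: similarity {score:.3f} -> {'match' if score >= THRESHOLD else 'no match'}")
```

Where that threshold is set trades false matches against missed matches, which is precisely the kind of design decision the policy questions above are asking about.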

Mapping Wikipedia


Michael Mandiberg at The Atlantic: “Wikipedia matters. In a time of extreme political polarization, algorithmically enforced filter bubbles, and fact patterns dismissed as fake news, Wikipedia has become one of the few places where we can meet to write a shared reality. We treat it like a utility, and the U.S. and U.K. trust it about as much as the news.

But we know very little about who is writing the world’s encyclopedia. We do know that just because anyone can edit, doesn’t mean that everyone does: The site’s editors are disproportionately cis white men from the global North. We also know that, as with most of the internet, a small number of the editors do a large amount of the editing. But that’s basically it: In the interest of improving retention, the Wikimedia Foundation’s own research focuses on the motivations of people who do edit, not on those who don’t. The media, meanwhile, frequently focus on Wikipedia’s personality stories, even when covering the bigger questions. And Wikipedia’s own culture pushes back against granular data harvesting: The Wikimedia Foundation’s strong data-privacy rules guarantee users’ anonymity and limit the modes and duration of their own use of editor data.

But as part of my research in producing Print Wikipedia, I discovered a data set that can offer an entry point into the geography of Wikipedia’s contributors. Every time anyone edits Wikipedia, the software records the text added or removed, the time of the edit, and the username of the editor. (This edit history is part of Wikipedia’s ethos of radical transparency: Everyone is anonymous, and you can see what everyone is doing.) When an editor isn’t logged in with a username, the software records that user’s IP address. I parsed all of the 884 million edits to English Wikipedia to collect and geolocate the 43 million IP addresses that have edited English Wikipedia. I also counted 8.6 million username editors who have made at least one edit to an article.
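The split between logged-in and anonymous (IP) edits that makes this geolocation possible can be illustrated with a short sketch. It is not the author's pipeline; it assumes the usual structure of a MediaWiki XML history dump, in which each revision's contributor element contains either an ip or a username child, and the file name below is a placeholder:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Placeholder name for a MediaWiki XML history dump.
DUMP = "enwiki-pages-meta-history-sample.xml"

ip_edits = Counter()       # edits per IP address (anonymous editors)
username_editors = set()   # distinct logged-in editors seen at least once

# Stream the dump instead of loading it: full history dumps are far too large for memory.
for _, elem in ET.iterparse(DUMP, events=("end",)):
    tag = elem.tag.rsplit("}", 1)[-1]  # strip the XML namespace prefix
    if tag == "contributor":
        ip = elem.find("{*}ip")
        username = elem.find("{*}username")
        if ip is not None and ip.text:
            ip_edits[ip.text] += 1
        elif username is not None and username.text:
            username_editors.add(username.text)
    elif tag == "page":
        elem.clear()  # release each finished page's subtree as we go

print(f"{len(ip_edits):,} distinct IP addresses, {len(username_editors):,} logged-in editors")
```

Geolocating the collected IP addresses against a GeoIP database would then turn these counts into the kind of map described below.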

The result is a set of maps that offer, for the first time, insight into where the millions of volunteer editors who build and maintain English Wikipedia’s 5 million pages are—and, maybe more important, where they aren’t….

Like the Enlightenment itself, the modern encyclopedia has a history entwined with colonialism. Encyclopédie aimed to collect and disseminate all the world’s knowledge—but in the end, it could not escape the biases of its colonial context. Likewise, Napoleon’s Description de l’Égypte augmented an imperial military campaign with a purportedly objective study of the nation, which was itself an additional form of conquest. If Wikipedia wants to break from the past and truly live up to its goal to compile the sum of all human knowledge, it requires the whole world’s participation….(More)”.