Synthetic data offers advanced privacy for the Census Bureau, business


Kate Kaye at IAPP: “In the early 2000s, internet accessibility made risks of exposing individuals from population demographic data more likely than ever. So, the U.S. Census Bureau turned to an emerging privacy approach: synthetic data.

Some argue the algorithmic techniques used to develop privacy-secure synthetic datasets go beyond traditional deidentification methods. Today, along with the Census Bureau, clinical researchers, autonomous vehicle system developers and banks use these fake datasets that mimic statistically valid data.

In many cases, synthetic data is built from existing data by filtering it through machine learning models. Real data representing real individuals flows in, and fake data mimicking individuals with corresponding characteristics flows out.
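To make that pipeline concrete, here is a minimal, hypothetical sketch of the general idea: fit a generative model to confidential records, then sample fake records that mimic their joint distribution. The data, model choice, and parameters below are illustrative assumptions, not the Census Bureau's actual method.

```python
# Sketch only: "real" microdata flows in, synthetic microdata flows out.
# The real records here are simulated; a production system would use
# confidential data plus formal disclosure-avoidance checks.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical microdata columns: age, household size, annual income.
real = np.column_stack([
    rng.normal(45, 15, 1000),
    rng.poisson(2.5, 1000) + 1,
    rng.lognormal(10.8, 0.6, 1000),
])

# Fit a generative model to the confidential records...
model = GaussianMixture(n_components=5, random_state=0).fit(real)

# ...then sample synthetic records with similar statistical structure.
synthetic, _ = model.sample(n_samples=1000)

print("real means:     ", real.mean(axis=0).round(1))
print("synthetic means:", synthetic.mean(axis=0).round(1))
```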

When data scientists at the Census Bureau began exploring synthetic data methods, adoption of the internet had made deidentified, open-source data on U.S. residents, their households and businesses more accessible than in the past.

Especially concerning, census-block-level information was now widely available. Because a census block in a rural area could represent data associated with as few as one house, simply stripping names, addresses and phone numbers from that information might not be enough to prevent the exposure of individuals.

“There was pretty widespread angst” among statisticians, said John Abowd, the bureau’s associate director for research and methodology and chief scientist. The hand-wringing led to a “gradual awakening” that prompted the agency to begin developing synthetic data methods, he said.

Synthetic data built from the real data preserves privacy while providing information that is still relevant for research purposes, Abowd said: “The basic idea is to try to get a model that accurately produces an image of the confidential data.”

The plan for the 2020 census is to produce a synthetic image of that original data. The bureau also produces On the Map, a web-based mapping and reporting application that provides synthetic data showing where workers are employed and where they live along with reports on age, earnings, industry distributions, race, ethnicity, educational attainment and sex.

Of course, the real census data is still locked away, too, Abowd said: “We have a copy and the national archives have a copy of the confidential microdata.”…(More)”.

Birth of Intelligence: From RNA to Artificial Intelligence


Book by Daeyeol Lee: “What is intelligence? How did it begin and evolve to human intelligence? Does a high level of biological intelligence require a complex brain? Can man-made machines be truly intelligent? Is AI fundamentally different from human intelligence? In Birth of Intelligence, distinguished neuroscientist Daeyeol Lee tackles these pressing fundamental issues. To better prepare for future society and its technology, including how the use of AI will impact our lives, it is essential to understand the biological root and limits of human intelligence. After systematically reviewing biological and computational underpinnings of decision making and intelligent behaviors, Birth of Intelligence proposes that true intelligence requires life…(More)”.

The Rules of Contagion: Why Things Spread–And Why They Stop


Book by Adam Kucharski: “From ideas and infections to financial crises and “fake news,” why the science of outbreaks is the science of modern life.

These days, whenever anything spreads, whether it’s a YouTube fad or a political rumor, we say it went viral. But how does virality actually work? In The Rules of Contagion, epidemiologist Adam Kucharski explores topics including gun violence, online manipulation, and, of course, outbreaks of disease to show how much we get wrong about contagion, and how astonishing the real science is.

Why did the president retweet a Mussolini quote as his own? Why do financial bubbles take off so quickly? Why are disinformation campaigns so effective? And what makes the emergence of new illnesses–such as MERS, SARS, or the coronavirus disease COVID-19–so challenging? By uncovering the crucial factors driving outbreaks, we can see how things really spread — and what we can do about it….(More)”.
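For a taste of the underlying science, the workhorse model is worth seeing in miniature. The sketch below is a bare-bones SIR (susceptible-infected-recovered) simulation with made-up parameters, not code from the book: the key quantity is the reproduction number R0, and whether it sits above or below 1 largely decides whether something spreads or stops.

```python
# Minimal SIR sketch: whether an outbreak takes off hinges on
# R0 = beta / gamma. All parameter values are illustrative.
def sir(beta, gamma, s=0.999, i=0.001, r=0.0, days=180):
    """Simulate susceptible/infected/recovered fractions day by day."""
    peak = i
    for _ in range(days):
        new_infections = beta * s * i
        recoveries = gamma * i
        s = s - new_infections
        i = i + new_infections - recoveries
        r = r + recoveries
        peak = max(peak, i)
    return peak, r  # peak prevalence, final share ever infected

for beta in (0.15, 0.30):      # transmission rate per day
    gamma = 0.20               # recovery rate per day (~5-day infection)
    peak, total = sir(beta, gamma)
    print(f"R0={beta/gamma:.2f}: peak={peak:.1%}, ever infected={total:.1%}")
```

With R0 below 1 the outbreak fizzles; above 1 it grows until depletion of susceptibles stops it, which is the basic threshold logic the book builds on.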

Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing


Book by Ron Kohavi, Diane Tang, and Ya Xu: “Getting numbers is easy; getting numbers you can trust is hard. This practical guide by experimentation leaders at Google, LinkedIn, and Microsoft will teach you how to accelerate innovation using trustworthy online controlled experiments, or A/B tests. Based on practical experiences at companies that each run more than 20,000 controlled experiments a year, the authors share examples, pitfalls, and advice for students and industry professionals getting started with experiments, plus deeper dives into advanced topics for practitioners who want to improve the way they make data-driven decisions.

Learn how to:

- Use the scientific method to evaluate hypotheses using controlled experiments (a minimal sketch follows below)
- Define key metrics and, ideally, an Overall Evaluation Criterion
- Test for trustworthiness of the results and alert experimenters to violated assumptions
- Build a scalable platform that lowers the marginal cost of experiments close to zero
- Avoid pitfalls like carryover effects and Twyman’s law
- Understand how statistical issues play out in practice….(More)”.
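As a minimal illustration of the first item on that list, the sketch below runs a two-proportion z-test on a hypothetical A/B test. The counts are invented and this is generic statistics rather than the book's own code; real experimentation platforms layer on the trustworthiness checks the authors describe.

```python
# Sketch of evaluating one A/B test: a two-proportion z-test on
# conversion rates. All counts are hypothetical.
from math import sqrt, erf

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Return (lift, z, two-sided p-value) for treatment B vs control A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_b - p_a, z, p_value

lift, z, p = two_proportion_ztest(conv_a=1000, n_a=50_000,
                                  conv_b=1100, n_b=50_000)
print(f"lift={lift:.4%}  z={z:.2f}  p={p:.4f}")
```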

A controlled trial for reproducibility


Marc P. Raphael, Paul E. Sheehan & Gary J. Vora at Nature: “In 2016, the US Defense Advanced Research Projects Agency (DARPA) told eight research groups that their proposals had made it through the review gauntlet and would soon get a few million dollars from its Biological Technologies Office (BTO). Along with congratulations, the teams received a reminder that their award came with an unusual requirement — an independent shadow team of scientists tasked with reproducing their results.

Thus began an intense, multi-year controlled trial in reproducibility. Each shadow team consists of three to five researchers, who visit the ‘performer’ team’s laboratory and often host visits themselves. Between 3% and 8% of the programme’s total funds go to this independent validation and verification (IV&V) work. But DARPA has the flexibility and resources for such herculean efforts to assess essential techniques. In one unusual instance, an IV&V laboratory needed a sophisticated US$200,000 microscopy and microfluidic set-up to make an accurate assessment.

These costs are high, but we think they are an essential investment to avoid wasting taxpayers’ money and to advance fundamental research towards beneficial applications. Here, we outline what we’ve learnt from implementing this programme, and how it could be applied more broadly….(More)”.

Why resilience to online disinformation varies between countries


Edda Humprecht at the Democratic Audit: “The massive spread of online disinformation, understood as content intentionally produced to mislead others, has been widely discussed in the context of the UK Brexit referendum and the US general election in 2016. However, in many other countries online disinformation seems to be less prevalent. It seems certain countries are better equipped to face the problems of the digital era, demonstrating a resilience to manipulation attempts. In other words, citizens in these countries are better able to adapt to overcome challenges such as the massive spread of online disinformation and their exposure to it. So, do structural conditions render countries more or less resilient towards online disinformation?

As a first step to answering this question, in new research with Frank Esser and Peter Van Aelst, we identified the structural conditions that are theoretically linked to resilience to online disinformation, which relate to different political, media and economic environments. To test these expectations, we then identified quantifiable indicators for these theoretical conditions, which allowed us to measure their significance for 18 Western democracies. A cluster analysis then yielded three country groups: one group with high resilience to online disinformation (including the Northern European countries) and two country groups with low resilience (including Southern European countries and the US).
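The clustering step can be illustrated with a deliberately simplified sketch. All indicator values below are invented, only a handful of countries are shown, and k-means stands in for whatever algorithm the paper actually used; the point is just the mechanics of grouping countries by standardized structural indicators.

```python
# Hypothetical sketch of clustering countries on structural indicators.
# Numbers are made up for illustration; the study measured real
# indicators across 18 Western democracies.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

countries = ["Denmark", "Finland", "Italy", "Greece", "US", "Canada"]
# Columns: polarisation, populist communication, trust in news media,
# public-broadcasting reach, advertising-market size.
indicators = np.array([
    [0.2, 0.3, 0.8, 0.9, 0.3],
    [0.2, 0.2, 0.8, 0.9, 0.2],
    [0.7, 0.8, 0.4, 0.4, 0.5],
    [0.8, 0.7, 0.3, 0.3, 0.4],
    [0.9, 0.7, 0.4, 0.1, 1.0],
    [0.4, 0.3, 0.6, 0.6, 0.5],
])

X = StandardScaler().fit_transform(indicators)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for country, label in zip(countries, labels):
    print(f"{country}: cluster {label}")
```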

Conditions for resilience: political, media and economic environments

In polarised political environments, citizens are confronted with competing, divergent representations of reality, and it therefore becomes increasingly difficult for them to distinguish between false and correct information. Thus, societal polarisation is likely to decrease resilience to online disinformation. Moreover, research has shown that both populism and partisan disinformation share a binary, Manichaean worldview, comprising anti-elitism, mistrust of expert knowledge and a belief in conspiracy theories. As a consequence of these combined influences, citizens can form inaccurate perceptions of reality. Thus, in environments with high levels of populist communication, online users are exposed to more disinformation.

Another condition that has been linked to resilience to online disinformation is trust in news media. Previous research has shown that in environments where distrust in news media is higher, people are less likely to be exposed to a variety of sources of political information and to critically evaluate them. In this vein, the level of knowledge that people gain is likely to play an important role when they are confronted with online disinformation. Research has shown that in countries with wide-reaching public service media, citizens’ knowledge about public affairs is higher than in countries with marginalised public service media. Therefore, it can be assumed that environments with weak public broadcasting services (PBS) are less resilient to online disinformation….

Looking at the economic environment, false social media content is often produced in pursuit of advertising revenue, as was the case with the Macedonian ‘fake news factories’ during the 2016 US presidential election. It is especially appealing for producers to publish this kind of content if the potential readership is large. Thus, large advertising markets with a high number of potential users are less resistant to disinformation than smaller markets….

Disinformation is particularly prevalent on social media, and in countries with very many social media users it is easier for rumour-spreaders to build partisan follower networks. Moreover, it has been found that a media diet consisting mainly of news from social media limits political learning and leads to less knowledge of public affairs compared to other media sources. It follows that societies with a high rate of social media users are more vulnerable to online disinformation spreading rapidly than other societies…(More)”.

Now Is the Time for Open Access Policies—Here’s Why


Victoria Heath and Brigitte Vézina at Creative Commons: “Over the weekend, news emerged that upset even the most ardent skeptics of open access. Under the headline “Trump vs Berlin,” the German newspaper Welt am Sonntag reported that President Trump offered $1 billion USD to the German biopharmaceutical company CureVac to secure their COVID-19 vaccine “only for the United States.”

In response, Jens Spahn, the German health minister said such a deal was completely “off the table” and Peter Altmaier, the German economic minister replied, “Germany is not for sale.” Open science advocates were especially infuriated. Professor Lorraine Leeson of Trinity College Dublin, for example, tweeted, “This is NOT the time for this kind of behavior—it flies in the face of the #OpenScience work that is helping us respond meaningfully right now. This is the time for solidarity, not exclusivity.” The White House and CureVac have since denied the report. 

Today, we find ourselves at a pivotal moment in history—we must cooperate effectively to respond to an unprecedented global health emergency. The mantra, “when we share, everyone wins” applies now more than ever. With this in mind, we felt it imperative to underscore the importance of open access, specifically open science, in times of crisis.

Why open access matters, especially during a global health emergency 

One of the most important components of maintaining global health, specifically in the face of urgent threats, is the creation and dissemination of reliable, up-to-date scientific information to the public, government officials, humanitarian and health workers, as well as scientists.

Several scientific research funders like the Gates Foundation, the Hewlett Foundation, and the Wellcome Trust have long-standing open access policies and some have now called for increased efforts to share COVID-19 related research rapidly and openly to curb the outbreak. By licensing material under a CC BY-NC-SA license, the World Health Organization (WHO) is adopting a more conservative approach to open access that falls short of what the scientific community urgently needs in order to access and build upon critical information….(More)”.

Crowdsourcing hypothesis tests: making transparent how design choices shape research results


Paper by J.F. Landy and Leonid Tiokhin: “To what extent are research results influenced by subjective decisions that scientists make as they design studies?

Fifteen research teams independently designed studies to answer five original research questions related to moral judgments, negotiations, and implicit cognition. Participants from two separate large samples (total N > 15,000) were then randomly assigned to complete one version of each study. Effect sizes varied dramatically across different sets of materials designed to test the same hypothesis: materials from different teams rendered statistically significant effects in opposite directions for four out of five hypotheses, with the narrowest range in estimates being d = -0.37 to +0.26. Meta-analysis and a Bayesian perspective on the results revealed overall support for two hypotheses, and a lack of support for three hypotheses.
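For readers unfamiliar with the d metric behind that range, the sketch below computes Cohen's d, the standardized mean difference, on simulated data. The numbers are invented to echo the reported endpoints and are not the study's data.

```python
# Sketch of Cohen's d on simulated data: two hypothetical teams'
# materials testing the same hypothesis yield opposite-sign effects.
import numpy as np

def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    nt, nc = len(treatment), len(control)
    pooled_var = ((nt - 1) * treatment.var(ddof=1) +
                  (nc - 1) * control.var(ddof=1)) / (nt + nc - 2)
    return (treatment.mean() - control.mean()) / np.sqrt(pooled_var)

rng = np.random.default_rng(1)
# Team A's materials produce a small negative effect...
d_a = cohens_d(rng.normal(-0.37, 1, 500), rng.normal(0, 1, 500))
# ...while Team B's materials produce a small positive one.
d_b = cohens_d(rng.normal(+0.26, 1, 500), rng.normal(0, 1, 500))
print(f"Team A: d = {d_a:+.2f}, Team B: d = {d_b:+.2f}")
```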

Overall, practically none of the variability in effect sizes was attributable to the skill of the research team in designing materials, while considerable variability was attributable to the hypothesis being tested. In a forecasting survey, predictions of other scientists were significantly correlated with study results, both across and within hypotheses. Crowdsourced testing of research hypotheses helps reveal the true consistency of empirical support for a scientific claim….(More)”.

The Power of Experiments: Decision Making in a Data-Driven World


Book by Michael Luca and Max H. Bazerman: “Have you logged into Facebook recently? Searched for something on Google? Chosen a movie on Netflix? If so, you’ve probably been an unwitting participant in a variety of experiments—also known as randomized controlled trials—designed to test the impact of different online experiences. Once an esoteric tool for academic research, the randomized controlled trial has gone mainstream. No tech company worth its salt (or its share price) would dare make major changes to its platform without first running experiments to understand how they would influence user behavior. In this book, Michael Luca and Max Bazerman explain the importance of experiments for decision making in a data-driven world.

Luca and Bazerman describe the central role experiments play in the tech sector, drawing lessons and best practices from the experiences of such companies as StubHub, Alibaba, and Uber. Successful experiments can save companies money—eBay, for example, discovered how to cut $50 million from its yearly advertising budget—or bring to light something previously ignored, as when Airbnb was forced to confront rampant discrimination by its hosts. Moving beyond tech, Luca and Bazerman consider experimenting for the social good—different ways that governments are using experiments to influence or “nudge” behavior ranging from voter apathy to school absenteeism. Experiments, they argue, are part of any leader’s toolkit. With this book, readers can become part of “the experimental revolution.”…(More)”.

Beyond Randomized Controlled Trials


Iqbal Dhaliwal, John Floretta & Sam Friedlander at SSIR: “…In its post-Nobel phase, one of J-PAL’s priorities is to unleash the treasure troves of big digital data in the hands of governments, nonprofits, and private firms. Primary data collection is by far the most time-, money-, and labor-intensive component of the vast majority of experiments that evaluate social policies. Randomized evaluations have been constrained by simple numbers: Some questions are just too big or expensive to answer. Leveraging administrative data has the potential to dramatically expand the types of questions we can ask and the experiments we can run, as well as implement quicker, less expensive, larger, and more reliable RCTs, an invaluable opportunity to scale up evidence-informed policymaking massively without dramatically increasing evaluation budgets.

Although administrative data hasn’t always been of the highest quality, recent advances have significantly increased the reliability and accuracy of GPS coordinates, biometrics, and digital methods of collection. But despite good intentions, many implementers—governments, businesses, and big NGOs—aren’t currently using the data they already collect on program participants and outcomes to improve anti-poverty programs and policies. This may be because they aren’t aware of its potential, don’t have the in-house technical capacity necessary to create use and privacy guidelines or analyze the data, or don’t have established partnerships with researchers who can collaborate to design innovative programs and run rigorous experiments to determine which are the most impactful. 

At J-PAL, we are leveraging this opportunity through a new global research initiative we are calling the “Innovations in Data and Experiments for Action” Initiative (IDEA). IDEA supports implementers to make their administrative data accessible, analyze it to improve decision-making, and partner with researchers in using this data to design innovative programs, evaluate impact through RCTs, and scale up successful ideas. IDEA will also build the capacity of governments and NGOs to conduct these types of activities with their own data in the future….(More)”.