A data sharing method in the open web environment: Data sharing in hydrology


Paper by Jin Wang et al: “Data sharing plays a fundamental role in providing data resources for geographic modeling and simulation. Although there are many successful cases of data sharing through the web, current practices for sharing data mostly focus on data publication using metadata at the file level, which requires identifying, restructuring and synthesizing raw data files for further usage. In hydrology, because the same hydrological information is often stored in data files with different formats, modelers should identify the required information from multisource data sets and then customize data requirements for their applications. However, these data customization tasks are difficult to repeat, which leads to repetitive labor. This paper presents a data sharing method that provides a solution for data manipulation based on a structured data description model rather than raw data files. With the structured data description model, multisource hydrological data can be accessed and processed in a unified way and published as data services using a designed data server. This study also proposes a data configuration manager to customize data requirements through an interactive programming tool, which can help in using the data services. In addition, a component-based data viewer is developed for the visualization of multisource data in a sharable visualization scheme. A case study that involves sharing and applying hydrological data is designed to examine the applicability and feasibility of the proposed data sharing method….(More)”.

Epistemic Humility—Knowing Your Limits in a Pandemic


Essay by Erik Angner: “Ignorance,” wrote Charles Darwin in 1871, “more frequently begets confidence than does knowledge.”

Darwin’s insight is worth keeping in mind when dealing with the current coronavirus crisis. That includes those of us who are behavioral scientists. Overconfidence—and a lack of epistemic humility more broadly—can cause real harm.

In the middle of a pandemic, knowledge is in short supply. We don’t know how many people are infected, or how many people will be. We have much to learn about how to treat the people who are sick—and how to help prevent infection in those who aren’t. There’s reasonable disagreement on the best policies to pursue, whether about health care, economics, or supply distribution. Although scientists worldwide are working hard and in concert to address these questions, final answers are some ways away.

Another thing that’s in short supply is the realization of how little we know. Even a quick glance at social or traditional media will reveal many people who express themselves with way more confidence than they should…

Frequent expressions of supreme confidence might seem odd in light of our obvious and inevitable ignorance about a new threat. The thing about overconfidence, though, is that it afflicts most of us much of the time. That’s according to cognitive psychologists, who’ve studied the phenomenon systematically for half a century. Overconfidence has been called “the mother of all psychological biases.” The research has led to findings that are at the same time hilarious and depressing. In one classic study, for example, 93 percent of U.S. drivers claimed to be more skillful than the median—which is not possible.

“But surely,” you might object, “overconfidence is only for amateurs—experts would not behave like this.” Sadly, being an expert in some domain does not protect against overconfidence. Some research suggests that the more knowledgeable are more prone to overconfidence. In a famous study of clinical psychologists and psychology students, researchers asked a series of questions about a real person described in psychological literature. As the participants received more and more information about the case, their confidence in their judgment grew—but the quality of their judgment did not. And psychologists with a Ph.D. did no better than the students….(More)”.
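
The arithmetic behind the drivers example in the essay can be checked directly: by definition, no more than half of any group can sit strictly above its own median, so a 93 percent share is impossible. Below is a minimal sketch with made-up skill scores; the scores and the helper name `share_above_median` are illustrative assumptions, not data from the study.

```python
import statistics


def share_above_median(scores):
    """Return the fraction of values strictly greater than the group's median."""
    median = statistics.median(scores)
    return sum(1 for s in scores if s > median) / len(scores)


# Hypothetical skill scores for ten drivers (illustrative only).
skill_scores = [52, 61, 47, 70, 66, 58, 49, 73, 55, 64]

# Whatever the scores are, this share can never exceed 0.5.
print(share_above_median(skill_scores))
```

With ties at the median the share drops below one half, which only widens the gap to the 93 percent who claimed above-median skill.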

We Have the Power to Destroy Ourselves Without the Wisdom to Ensure That We Don’t


EdgeCast by Toby Ord: “Lately, I’ve been asking myself questions about the future of humanity, not just about the next five years or even the next hundred years, but about everything humanity might be able to achieve in the time to come.

The past of humanity is about 200,000 years. That’s how long Homo sapiens have been around according to our current best guess (it might be a little bit longer). Maybe we should even include some of our other hominid ancestors and think about humanity somewhat more broadly. If we play our cards right, we could live hundreds of thousands of years more. In fact, there’s not much stopping us living millions of years. The typical species lives about a million years. Our 200,000 years so far would put us about in our adolescence, just old enough to be getting ourselves in trouble, but not wise enough to have thought through how we should act.

But a million years isn’t an upper bound for how long we could live. The horseshoe crab, for example, has lived for 450 million years so far. The Earth should remain habitable for at least that long. So, if we can survive as long as the horseshoe crab, we could have a future stretching millions of centuries from now. That’s millions of centuries of human progress, human achievement, and human flourishing. And if we could learn over that time how to reach out a little bit further into the cosmos to get to the planets around other stars, then we could have longer yet. If we went seven light-years at a time just making jumps of that distance, we could reach almost every star in the galaxy by continually spreading out from the new location. There are already plans in progress to send spacecraft these types of distances. If we could do that, the whole galaxy would open up to us….

Humanity is not a typical species. One of the things that most worries me is the way in which our technology might put us at risk. If we look back at the history of humanity these 2000 centuries, we see this initially gradual accumulation of knowledge and power. If you think back to the earliest humans, they weren’t that remarkable compared to the other species around them. An individual human is not that remarkable on the Savanna compared to a cheetah, or lion, or gazelle, but what set us apart was our ability to work together, to cooperate with other humans to form something greater than ourselves. It was teamwork, the ability to work together with those of us in the same tribe that let us expand to dozens of humans working together in cooperation. But much more important than that was our ability to cooperate across time, across the generations. By making small innovations and passing them on to our children, we were able to set a chain in motion wherein generations of people worked across time, slowly building up these innovations and technologies and accumulating power….(More)”.

Covid-19 Changed How the World Does Science, Together


Matt Apuzzo and David D. Kirkpatrick at The New York Times: “…Normal imperatives like academic credit have been set aside. Online repositories make studies available months ahead of journals. Researchers have identified and shared hundreds of viral genome sequences. More than 200 clinical trials have been launched, bringing together hospitals and laboratories around the globe.

“I never hear scientists — true scientists, good quality scientists — speak in terms of nationality,” said Dr. Francesco Perrone, who is leading a coronavirus clinical trial in Italy. “My nation, your nation. My language, your language. My geographic location, your geographic location. This is something that is really distant from true top-level scientists.”

On a recent morning, for example, scientists at the University of Pittsburgh discovered that a ferret exposed to Covid-19 particles had developed a high fever — a potential advance toward animal vaccine testing. Under ordinary circumstances, they would have started work on an academic journal article.

“But you know what? There is going to be plenty of time to get papers published,” said Paul Duprex, a virologist leading the university’s vaccine research. Within two hours, he said, he had shared the findings with scientists around the world on a World Health Organization conference call. “It is pretty cool, right? You cut the crap, for lack of a better word, and you get to be part of a global enterprise.”…

Several scientists said the closest comparison to this moment might be the height of the AIDS epidemic in the 1990s, when scientists and doctors locked arms to combat the disease. But today’s technology and the pace of information-sharing dwarf what was possible three decades ago.

As a practical matter, medical scientists today have little choice but to study the coronavirus if they want to work at all. Most other laboratory research has been put on hold because of social distancing, lockdowns or work-from-home restrictions.

The pandemic is also eroding the secrecy that pervades academic medical research, said Dr. Ryan Carroll, a Harvard Medical professor who is involved in the coronavirus trial there. Big, exclusive research can lead to grants, promotions and tenure, so scientists often work in secret, suspiciously hoarding data from potential competitors, he said.

“The ability to work collaboratively, setting aside your personal academic progress, is occurring right now because it’s a matter of survival,” he said….(More)”.

Synthetic data offers advanced privacy for the Census Bureau, business


Kate Kaye at IAPP: “In the early 2000s, internet accessibility made risks of exposing individuals from population demographic data more likely than ever. So, the U.S. Census Bureau turned to an emerging privacy approach: synthetic data.

Some argue the algorithmic techniques used to develop privacy-secure synthetic datasets go beyond traditional deidentification methods. Today, along with the Census Bureau, clinical researchers, autonomous vehicle system developers and banks use these fake datasets that mimic statistically valid data.

In many cases, synthetic data is built from existing data by filtering it through machine learning models. Real data representing real individuals flows in, and fake data mimicking individuals with corresponding characteristics flows out.

When data scientists at the Census Bureau began exploring synthetic data methods, adoption of the internet had made deidentified, open-source data on U.S. residents, their households and businesses more accessible than in the past.

Especially concerning, census-block-level information was now widely available. Because in rural areas, a census block could represent data associated with as few as one house, simply stripping names, addresses and phone numbers from that information might not be enough to prevent exposure of individuals.

“There was pretty widespread angst” among statisticians, said John Abowd, the bureau’s associate director for research and methodology and chief scientist. The hand-wringing led to a “gradual awakening” that prompted the agency to begin developing synthetic data methods, he said.

Synthetic data built from the real data preserves privacy while providing information that is still relevant for research purposes, Abowd said: “The basic idea is to try to get a model that accurately produces an image of the confidential data.”

The plan for the 2020 census is to produce a synthetic image of that original data. The bureau also produces On the Map, a web-based mapping and reporting application that provides synthetic data showing where workers are employed and where they live along with reports on age, earnings, industry distributions, race, ethnicity, educational attainment and sex.

Of course, the real census data is still locked away, too, Abowd said: “We have a copy and the national archives have a copy of the confidential microdata.”…(More)”.
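
The "real data flows in, fake data flows out" idea from the excerpt can be illustrated with a toy sketch. This is not the Census Bureau's method: it fits an independent normal distribution to each column of a few made-up records and samples new records from that model. The records, the column names, and the helpers `fit_model` and `sample_synthetic` are all hypothetical.

```python
import random
import statistics

# Made-up "confidential" records (hypothetical values, for illustration only).
REAL_RECORDS = [
    {"age": 34, "earnings": 52000},
    {"age": 45, "earnings": 61000},
    {"age": 29, "earnings": 43000},
    {"age": 51, "earnings": 78000},
    {"age": 38, "earnings": 56000},
]


def fit_model(records):
    """Fit a (mean, standard deviation) pair to each column independently."""
    model = {}
    for col in records[0]:
        values = [r[col] for r in records]
        model[col] = (statistics.mean(values), statistics.stdev(values))
    return model


def sample_synthetic(model, n, seed=0):
    """Draw n fake records from the fitted per-column normal distributions."""
    rng = random.Random(seed)
    return [
        {col: rng.gauss(mu, sigma) for col, (mu, sigma) in model.items()}
        for _ in range(n)
    ]


model = fit_model(REAL_RECORDS)
synthetic = sample_synthetic(model, n=1000)
print(synthetic[0])
```

A sketch like this ignores correlations between columns and offers no formal privacy guarantee; production systems would need a richer model and an explicit disclosure-risk analysis.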

Birth of Intelligence: From RNA to Artificial Intelligence


Book by Daeyeol Lee: “What is intelligence? How did it begin and evolve to human intelligence? Does a high level of biological intelligence require a complex brain? Can man-made machines be truly intelligent? Is AI fundamentally different from human intelligence? In Birth of Intelligence, distinguished neuroscientist Daeyeol Lee tackles these pressing fundamental issues. To better prepare for future society and its technology, including how the use of AI will impact our lives, it is essential to understand the biological root and limits of human intelligence. After systematically reviewing biological and computational underpinnings of decision making and intelligent behaviors, Birth of Intelligence proposes that true intelligence requires life…(More)”.

The Rules of Contagion: Why Things Spread–And Why They Stop


Book by Adam Kucharski: “From ideas and infections to financial crises and “fake news,” why the science of outbreaks is the science of modern life.


These days, whenever anything spreads, whether it’s a YouTube fad or a political rumor, we say it went viral. But how does virality actually work? In The Rules of Contagion, epidemiologist Adam Kucharski explores topics including gun violence, online manipulation, and, of course, outbreaks of disease to show how much we get wrong about contagion, and how astonishing the real science is.
Why did the president retweet a Mussolini quote as his own? Why do financial bubbles take off so quickly? Why are disinformation campaigns so effective? And what makes the emergence of new illnesses–such as MERS, SARS, or the coronavirus disease COVID-19–so challenging? By uncovering the crucial factors driving outbreaks, we can see how things really spread — and what we can do about it….(More)”.

Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing


Book by Ron Kohavi, Diane Tang, and Ya Xu: “Getting numbers is easy; getting numbers you can trust is hard. This practical guide by experimentation leaders at Google, LinkedIn, and Microsoft will teach you how to accelerate innovation using trustworthy online controlled experiments, or A/B tests. Based on practical experiences at companies that each run more than 20,000 controlled experiments a year, the authors share examples, pitfalls, and advice for students and industry professionals getting started with experiments, plus deeper dives into advanced topics for practitioners who want to improve the way they make data-driven decisions.

Learn how to use the scientific method to evaluate hypotheses using controlled experiments. Define key metrics and ideally an Overall Evaluation Criterion. Test for trustworthiness of the results and alert experimenters to violated assumptions. Build a scalable platform that lowers the marginal cost of experiments close to zero. Avoid pitfalls like carryover effects and Twyman’s law. Understand how statistical issues play out in practice….(More)”.
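
To make "evaluate hypotheses using controlled experiments" and "test for trustworthiness" concrete, here is a minimal sketch of two standard checks. It is not code from the book: the two-proportion z-test and the sample ratio mismatch check are swapped in as common textbook techniques, and the function names and all numbers are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))


def sample_ratio_mismatch_p(n_a, n_b, expected_share_a=0.5):
    """Chi-square (1 degree of freedom) p-value for observed vs designed split."""
    total = n_a + n_b
    exp_a = total * expected_share_a
    exp_b = total - exp_a
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical experiment: 1,000 users per arm, 100 vs 150 conversions.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")

# Hypothetical traffic split that was designed to be 50/50.
print(f"SRM p-value = {sample_ratio_mismatch_p(5000, 5500):.6f}")
```

A very small p-value from the split check suggests the assignment itself is broken, in which case the conversion result should not be trusted regardless of how significant it looks.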

A controlled trial for reproducibility


Marc P. Raphael, Paul E. Sheehan & Gary J. Vora at Nature: “In 2016, the US Defense Advanced Research Projects Agency (DARPA) told eight research groups that their proposals had made it through the review gauntlet and would soon get a few million dollars from its Biological Technologies Office (BTO). Along with congratulations, the teams received a reminder that their award came with an unusual requirement — an independent shadow team of scientists tasked with reproducing their results.

Thus began an intense, multi-year controlled trial in reproducibility. Each shadow team consists of three to five researchers, who visit the ‘performer’ team’s laboratory and often host visits themselves. Between 3% and 8% of the programme’s total funds go to this independent validation and verification (IV&V) work. But DARPA has the flexibility and resources for such herculean efforts to assess essential techniques. In one unusual instance, an IV&V laboratory needed a sophisticated US$200,000 microscopy and microfluidic set-up to make an accurate assessment.

These costs are high, but we think they are an essential investment to avoid wasting taxpayers’ money and to advance fundamental research towards beneficial applications. Here, we outline what we’ve learnt from implementing this programme, and how it could be applied more broadly….(More)”.

Why resilience to online disinformation varies between countries


Edda Humprecht at the Democratic Audit: “The massive spread of online disinformation, understood as content intentionally produced to mislead others, has been widely discussed in the context of the UK Brexit referendum and the US general election in 2016. However, in many other countries online disinformation seems to be less prevalent. It seems certain countries are better equipped to face the problems of the digital era, demonstrating a resilience to manipulation attempts. In other words, citizens in these countries are better able to adapt to overcome challenges such as the massive spread of online disinformation and their exposure to it. So, do structural conditions render countries more or less resilient towards online disinformation?

As a first step to answering this question, in new research with Frank Esser and Peter Van Aelst, we identified the structural conditions that are theoretically linked to resilience to online disinformation, which relate to different political, media and economic environments. To test these expectations, we then identified quantifiable indicators for these theoretical conditions, which allowed us to measure their significance for 18 Western democracies. A cluster analysis then yielded three country groups: one group with high resilience to online disinformation (including the Northern European countries) and two country groups with low resilience (including Southern European countries and the US).

Conditions for resilience: political, media and economic environments

In polarised political environments, citizens are confronted with different deviating representations of reality and therefore it becomes increasingly difficult for them to distinguish between false and correct information. Thus, societal polarisation is likely to decrease resilience to online disinformation. Moreover, research has shown that both populism and partisan disinformation share a binary Manichaean worldview, comprising anti-elitism, mistrust of expert knowledge and a belief in conspiracy theories. As a consequence of these combined influences, citizens can obtain inaccurate perceptions of reality. Thus, in environments with high levels of populist communication, online users are exposed to more disinformation.

Another condition that has been linked to resilience to online disinformation in previous research is trust in news media. Previous research has shown that in environments in which distrust in news media is higher, people are less likely to be exposed to a variety of sources of political information and to critically evaluate those. In this vein, the level of knowledge that people gain is likely to play an important role when confronted with online disinformation. Research has shown that in countries with wide-reaching public service media, citizens’ knowledge about public affairs is higher compared to countries with marginalised public service media. Therefore, it can be assumed that environments with weak public broadcasting services (PBS) are less resilient to online disinformation….

Looking at the economic environment, false social media content is often produced in pursuit of advertising revenue, as was the case with the Macedonian ‘fake news factories’ during the 2016 US presidential election. It is especially appealing for producers to publish this kind of content if the potential readership is large. Thus, large-size advertising markets with a high number of potential users are less resistant to disinformation than smaller-size markets….

Disinformation is particularly prevalent on social media, and in countries with very many social media users it is easier for rumour-spreaders to build partisan follower networks. Moreover, it has been found that a media diet mainly consisting of news from social media limits political learning and leads to less knowledge of public affairs compared to other media sources. From this, societies with a high rate of social media users are more vulnerable to online disinformation spreading rapidly than other societies…(More)”.