The imperative of interpretable machines


Julia Stoyanovich, Jay J. Van Bavel & Tessa V. West at Nature: “As artificial intelligence becomes prevalent in society, a framework is needed to connect interpretability and trust in algorithm-assisted decisions, for a range of stakeholders.

We are in the midst of a global trend to regulate the use of algorithms, artificial intelligence (AI) and automated decision systems (ADS). As reported by the One Hundred Year Study on Artificial Intelligence: “AI technologies already pervade our lives. As they become a central force in society, the field is shifting from simply building systems that are intelligent to building intelligent systems that are human-aware and trustworthy.” Major cities, states and national governments are establishing task forces, passing laws and issuing guidelines about responsible development and use of technology, often starting with its use in government itself, where there is, at least in theory, less friction between organizational goals and societal values.

In the United States, New York City has made a public commitment to opening the black box of the government’s use of technology: in 2018, an ADS task force was convened, the first such task force in the nation, and charged with providing recommendations to New York City’s government agencies for how to become transparent and accountable in their use of ADS. In a 2019 report, the task force recommended using ADS where they are beneficial, reduce potential harm and promote fairness, equity, accountability and transparency. Can these principles become policy in the face of the apparent lack of trust in the government’s ability to manage AI in the interest of the public? We argue that overcoming this mistrust hinges on our ability to engage in substantive multi-stakeholder conversations around ADS, bringing with it the imperative of interpretability — allowing humans to understand and, if necessary, contest the computational process and its outcomes.

Remarkably little is known about how humans perceive and evaluate algorithms and their outputs, what makes a human trust or mistrust an algorithm, and how we can empower humans to exercise agency — to adopt or challenge an algorithmic decision. Consider, for example, scoring and ranking — data-driven algorithms that prioritize entities such as individuals, schools, or products and services. These algorithms may be used to determine creditworthiness and desirability for college admissions or employment. Scoring and ranking are as ubiquitous and powerful as they are opaque. Despite their importance, members of the public often know little about why one person is ranked higher than another by a résumé screening or a credit scoring tool, how the ranking process is designed and whether its results can be trusted.

As an interdisciplinary team of scientists in computer science and social psychology, we propose a framework that forms connections between interpretability and trust, and develops actionable explanations for a diversity of stakeholders, recognizing their unique perspectives and needs. We focus on three questions (Box 1) about making machines interpretable: (1) what are we explaining, (2) to whom are we explaining and for what purpose, and (3) how do we know that an explanation is effective? By asking — and charting the path towards answering — these questions, we can promote greater trust in algorithms, and improve fairness and efficiency of algorithm-assisted decision making…(More)”.

How Facebook and Google are helping the CDC forecast coronavirus


Karen Hao at MIT Technology Review: “When it comes to predicting the spread of an infectious disease, it’s crucial to understand what Ryan Tibshirani, an associate professor at Carnegie Mellon University, calls “the pyramid of severity.” The bottom of the pyramid is asymptomatic carriers (those who have the infection but feel fine); the next level is symptomatic carriers (those who are feeling ill); then come hospitalizations, critical hospitalizations, and finally deaths.

Every level of the pyramid has a clear relationship to the next: “For example, sadly, it’s pretty predictable how many people will die once you know how many people are under critical care,” says Tibshirani, who is part of CMU’s Delphi research group, one of the best flu-forecasting teams in the US. The goal, therefore, is to have a clear measure of the lower levels of the pyramid, as the foundation for forecasting the higher ones.
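
To make the pyramid concrete, here is a minimal cascade forecast in Python. The conversion rates between levels are invented placeholders (not figures from the Delphi group); the point is only that an estimate of the base propagates upward level by level:

```python
# A toy "pyramid of severity" cascade. Each rate is the assumed fraction
# of one level that appears at the next level up; all values are
# illustrative placeholders, not real COVID-19 parameters.
PYRAMID_RATES = [
    ("symptomatic", 0.55),   # of all carriers, fraction with symptoms
    ("hospitalized", 0.10),  # of symptomatic, fraction hospitalized
    ("critical", 0.25),      # of hospitalized, fraction in critical care
    ("deaths", 0.40),        # of critical, fraction who die
]

def forecast_pyramid(carriers: float) -> dict[str, float]:
    """Propagate an estimate of the pyramid's base up through each level."""
    levels = {"carriers": carriers}
    count = carriers
    for name, rate in PYRAMID_RATES:
        count *= rate
        levels[name] = count
    return levels

print(forecast_pyramid(1_000_000))
# {'carriers': 1000000, 'symptomatic': 550000.0, 'hospitalized': 55000.0,
#  'critical': 13750.0, 'deaths': 5500.0}
```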

But in the US, building such a model is a Herculean task. A lack of testing makes it impossible to assess the number of asymptomatic carriers. The results also don’t accurately reflect how many symptomatic carriers there are. Different counties have different testing requirements—some choosing only to test patients who require hospitalization. Test results also often take upwards of a week to return.

The remaining option is to measure symptomatic carriers through a large-scale, self-reported survey. But such an initiative won’t work unless it covers a big enough cross section of the entire population. Now the Delphi group, which has been working with the Centers for Disease Control and Prevention to help it coordinate the national pandemic response, has turned to the largest platforms in the US: Facebook and Google.

Facebook will help the CMU Delphi research group gather data about Covid symptoms

In a new partnership with Delphi, both tech giants have agreed to help gather data from those who voluntarily choose to report whether they’re experiencing covid-like symptoms. Facebook will target a fraction of its US users with a CMU-run survey, while Google has thus far been using its Opinion Rewards app, which lets users respond to questions for app store credit. The hope is that this new information will allow the lab to produce county-by-county projections that will help policymakers allocate resources more effectively.

Neither company will ever actually see the survey results; they’re merely pointing users to the questions administered and processed by the lab. The lab will also never share any of the raw data back to either company. Still, the agreements represent a major deviation from typical data-sharing practices, which could raise privacy concerns. “If this wasn’t a pandemic, I don’t know that companies would want to take the risk of being associated with or asking directly for such a personal piece of information as health,” Tibshirani says.

Without such cooperation, the researchers would have been hard-pressed to find the data anywhere else. Several other apps allow users to self-report symptoms, including a popular one in the UK known as the Covid Symptom Tracker that has been downloaded over 1.5 million times. But none of them offer the same systematic and expansive coverage as a Facebook- or Google-administered survey, says Tibshirani. He hopes the project will collect millions of responses each week….(More)”.

Tracking coronavirus: big data and the challenge to privacy


Nic Fildes and Javier Espinoza at the Financial Times: “When the World Health Organization launched a 2007 initiative to eliminate malaria on Zanzibar, it turned to an unusual source to track the spread of the disease between the island and mainland Africa: mobile phones sold by Tanzania’s telecoms groups including Vodafone, the UK mobile operator.

Working together with researchers at Southampton university, Vodafone began compiling sets of location data from mobile phones in the areas where cases of the disease had been recorded. 

Mapping how populations move between locations has proved invaluable in tracking and responding to epidemics. The Zanzibar project has been replicated by academics across the continent to monitor other deadly diseases, including Ebola in west Africa….

With much of Europe at a standstill as a result of the coronavirus pandemic, politicians want the telecoms operators to provide similar data from smartphones. Thierry Breton, the former chief executive of France Telecom who is now the European commissioner for the internal market, has called on operators to hand over aggregated location data to track how the virus is spreading and to identify spots where help is most needed.

Both politicians and the industry insist that the data sets will be “anonymised”, meaning that customers’ individual identities will be scrubbed out. Mr Breton told the Financial Times: “In no way are we going to track individuals. That’s absolutely not the case. We are talking about fully anonymised, aggregated data to anticipate the development of the pandemic.”

But the use of such data to track the virus has triggered fears of growing surveillance, including questions about how the data might be used once the crisis is over and whether such data sets are ever truly anonymous….(More)”.

Coronavirus: country comparisons are pointless unless we account for these biases in testing


Norman Fenton, Magda Osman, Martin Neil, and Scott McLachlan at The Conversation: “Suppose we wanted to estimate how many car owners there are in the UK and how many of those own a Ford Fiesta, but we only have data on those people who visited Ford car showrooms in the last year. If 10% of the showroom visitors owned a Fiesta, then, because of the bias in the sample, this would certainly overestimate the proportion of Ford Fiesta owners in the country.

Estimating death rates for people with COVID-19 is currently done largely along the same lines. In the UK, for example, almost all testing of COVID-19 is performed on people already hospitalised with COVID-19 symptoms. At the time of writing, there are 29,474 confirmed COVID-19 cases (analogous to car owners visiting a showroom), of whom 2,352 have died (Ford Fiesta owners who visited a showroom). But this approach misses all the people with mild or no symptoms.

Concluding that the death rate from COVID-19 is on average 8% (2,352 out of 29,474) ignores the many people with COVID-19 who are not hospitalised and have not died (analogous to car owners who did not visit a Ford showroom and who do not own a Ford Fiesta). It is therefore equivalent to making the mistake of concluding that 10% of all car owners own a Fiesta.
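
The arithmetic of this mistake is easy to simulate. In the toy Python sketch below, every rate is an invented assumption, chosen only to show how testing only hospitalised patients inflates the apparent death rate:

```python
# Toy population in which only hospitalised cases are tested. All rates
# are invented for illustration; none are real COVID-19 parameters.
POPULATION = 1_000_000
INFECTION_RATE = 0.05   # P(infected)
P_HOSPITALISED = 0.02   # P(hospitalised | infected)
P_DEATH = 0.25          # P(death | hospitalised)

infected = POPULATION * INFECTION_RATE
confirmed = infected * P_HOSPITALISED   # only hospitalised cases are tested
deaths = confirmed * P_DEATH

print(f"death rate among confirmed cases: {deaths / confirmed:.1%}")  # 25.0%
print(f"death rate among all infected:    {deaths / infected:.1%}")   # 0.5%
```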

There are many prominent examples of this sort of conclusion. The Oxford COVID-19 Evidence Service has undertaken a thorough statistical analysis. Its authors acknowledge potential selection bias, and add confidence intervals showing how big the error may be for the (potentially highly misleading) proportion of deaths among confirmed COVID-19 patients.
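
A normal-approximation interval for the quoted UK figures is a one-liner; note that it captures sampling error only, and does nothing to correct the selection bias itself:

```python
from math import sqrt

# 95% normal-approximation CI for the proportion of deaths among
# confirmed UK cases (2,352 of 29,474, the figures quoted above).
deaths, confirmed = 2352, 29474
p = deaths / confirmed
se = sqrt(p * (1 - p) / confirmed)
print(f"{p:.3f} (95% CI: {p - 1.96 * se:.3f} to {p + 1.96 * se:.3f})")
# 0.080 (95% CI: 0.077 to 0.083)
```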

They note various factors that can result in wide national differences – for example, the UK’s 8% (mean) “death rate” is very high compared with Germany’s 0.74%. These factors include different demographics – for example, the number of elderly people in a population – as well as how deaths are reported. In some countries, everybody who dies after having been diagnosed with COVID-19 is recorded as a COVID-19 death, even if the disease was not the actual cause, while other people may die from the virus without ever having been diagnosed with COVID-19.

However, the models fail to incorporate explicit causal explanations in their modelling that might enable us to make more meaningful inferences from the available data, including data on virus testing.

[Figure: What a causal model would look like. Author provided.]

We have developed an initial prototype “causal model” whose structure is shown in the figure above. The links between the named variables in a model like this show how they are dependent on each other. These links, along with other unknown variables, are captured as probabilities. As data are entered for specific, known variables, all of the unknown variable probabilities are updated using a method called Bayesian inference. The model shows that the COVID-19 death rate is as much a function of sampling methods, testing and reporting, as it is determined by the underlying rate of infection in a vulnerable population….(More)”
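
As an illustration of the kind of inference such a model supports, the sketch below applies Bayes’ rule to a toy two-severity version: the death rate observed among tested cases is driven as much by the testing policy as by the underlying fatality rates. All probabilities are invented for illustration and are not taken from the prototype model:

```python
# Toy two-severity causal model: observed death rate among *tested*
# cases, as a function of the testing policy. Probabilities are
# illustrative assumptions only.
P_SEVERE = 0.02        # P(severe | infected)
P_DEATH_SEVERE = 0.25  # P(death | severe)
P_DEATH_MILD = 0.001   # P(death | mild)

def observed_death_rate(p_test_severe: float, p_test_mild: float) -> float:
    """Apply Bayes' rule to weight severities among the tested population."""
    w_severe = P_SEVERE * p_test_severe
    w_mild = (1 - P_SEVERE) * p_test_mild
    total = w_severe + w_mild
    return (w_severe * P_DEATH_SEVERE + w_mild * P_DEATH_MILD) / total

print(observed_death_rate(1.0, 0.0))  # test only severe cases -> 0.25
print(observed_death_rate(1.0, 1.0))  # test everyone          -> ~0.006
```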

Covid-19 Changed How the World Does Science, Together


Matt Apuzzo and David D. Kirkpatrick at The New York Times: “…Normal imperatives like academic credit have been set aside. Online repositories make studies available months ahead of journals. Researchers have identified and shared hundreds of viral genome sequences. More than 200 clinical trials have been launched, bringing together hospitals and laboratories around the globe.

“I never hear scientists — true scientists, good quality scientists — speak in terms of nationality,” said Dr. Francesco Perrone, who is leading a coronavirus clinical trial in Italy. “My nation, your nation. My language, your language. My geographic location, your geographic location. This is something that is really distant from true top-level scientists.”

On a recent morning, for example, scientists at the University of Pittsburgh discovered that a ferret exposed to Covid-19 particles had developed a high fever — a potential advance toward animal vaccine testing. Under ordinary circumstances, they would have started work on an academic journal article.

“But you know what? There is going to be plenty of time to get papers published,” said Paul Duprex, a virologist leading the university’s vaccine research. Within two hours, he said, he had shared the findings with scientists around the world on a World Health Organization conference call. “It is pretty cool, right? You cut the crap, for lack of a better word, and you get to be part of a global enterprise.”…

Several scientists said the closest comparison to this moment might be the height of the AIDS epidemic in the 1990s, when scientists and doctors locked arms to combat the disease. But today’s technology and the pace of information-sharing dwarfs what was possible three decades ago.

As a practical matter, medical scientists today have little choice but to study the coronavirus if they want to work at all. Most other laboratory research has been put on hold because of social distancing, lockdowns or work-from-home restrictions.

The pandemic is also eroding the secrecy that pervades academic medical research, said Dr. Ryan Carroll, a Harvard Medical professor who is involved in the coronavirus trial there. Big, exclusive research can lead to grants, promotions and tenure, so scientists often work in secret, suspiciously hoarding data from potential competitors, he said.

“The ability to work collaboratively, setting aside your personal academic progress, is occurring right now because it’s a matter of survival,” he said….(More)”.

The 9/11 Playbook for Protecting Privacy


Adam Klein and Edward Felten at Politico: “Geolocation data—precise GPS coordinates or records of proximity to other devices, often collected by smartphone apps—is emerging as a critical tool for tracking potential spread. But other, more novel types of surveillance are already being contemplated for this first pandemic of the digital age. Body temperature readings from internet-connected thermometers are already being used at scale, but there are more exotic possibilities. Could smart-home devices be used to identify coughs of a timbre associated with Covid-19? Can facial recognition and remote temperature sensing be harnessed to identify likely carriers at a distance?

Weigh the benefits of each collection and use of data against the risks.

Each scenario will present a different level of privacy sensitivity, different collection mechanisms, different technical options affecting privacy, and varying potential value to health professionals, meaning there is no substitute for case-by-case judgment about whether the benefits of a particular use of data outweigh the risks.

The various ways to use location data, for example, present vastly different levels of concern for privacy. Aggregated location data, which combines many individualized location trails to show broader trends, can be released with few privacy risks, using methods that ensure no individual’s location trail is reconstructable from released data. For that reason, governments should not seek individualized location trails for any application where aggregated data would suffice—for example, analyzing travel trends to predict future epidemic hotspots.
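
As a sketch of what safely aggregated release can look like in practice, the snippet below counts origin-to-destination flows and suppresses any flow observed for fewer than a minimum number of people, so no individual trail survives in the output. The threshold and region names are our illustrative assumptions:

```python
from collections import Counter

# Release aggregated mobility data with a minimum-count threshold so
# that no individual's location trail is recoverable from the release.
K_THRESHOLD = 50  # suppress any flow observed for fewer than K people

def aggregate_trips(trips: list[tuple[str, str]]) -> dict[tuple[str, str], int]:
    """Count origin->destination flows, dropping small, re-identifiable cells."""
    counts = Counter(trips)
    return {flow: n for flow, n in counts.items() if n >= K_THRESHOLD}

# Each tuple is one person's (origin, destination) movement for a day.
trips = [("Brooklyn", "Manhattan")] * 120 + [("Queens", "Bronx")] * 3
print(aggregate_trips(trips))  # {('Brooklyn', 'Manhattan'): 120}
```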

If authorities need to trace the movements of identifiable people, their location trails should be obtained on the basis of an individualized showing. Gathering from companies the location trails for all users—as the Israeli government does, according to news reports—would raise far greater privacy concerns.

Establish clear rules for how data can be used, retained, and shared.

Once data is collected, the focus shifts to what the government can do with it. In counterterrorism programs, detailed rules seek to reduce the effect on individual privacy by limiting how different types of data can be used, stored, and shared.

The most basic safeguard is deleting data when it is no longer needed. Keeping data longer than needed unnecessarily exposes it to data breaches, leaks, and other potential privacy harms. Any individualized location tracking should cease, and the data should be deleted, once the individual no longer presents a danger to public health.
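
A minimal sketch of such a retention rule, assuming a fixed window (the 21-day figure is an illustrative assumption, not a recommendation from the article):

```python
from datetime import datetime, timedelta, timezone

# Retention rule: individualized records are dropped once they fall
# outside a fixed window. The 21-day window is an assumed example value.
RETENTION = timedelta(days=21)

def purge_expired(records: list[dict]) -> list[dict]:
    """Keep only records newer than the retention window."""
    cutoff = datetime.now(timezone.utc) - RETENTION
    return [r for r in records if r["timestamp"] >= cutoff]

# Example: a record from 30 days ago is purged; yesterday's is kept.
now = datetime.now(timezone.utc)
records = [{"timestamp": now - timedelta(days=30)},
           {"timestamp": now - timedelta(days=1)}]
print(len(purge_expired(records)))  # 1
```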

Poland’s new tracking app for those exposed to the coronavirus illustrates why reasonable limits are essential. The Polish government plans to retain location data collected by the app for six years. It is hard to see a public-health justification for keeping the data that long. But the story also shows how a failure to consider users’ privacy can undermine a program’s efficacy: the app’s onerous terms led at least one Polish citizen to refuse to download it….(More)”.

The War on Coronavirus Is Also a War on Paperwork


Article by Cass Sunstein: “As part of the war on coronavirus, U.S. regulators are taking aggressive steps against “sludge” – paperwork burdens and bureaucratic obstacles. This new battle front is aimed at eliminating frictions, or administrative barriers, that have been badly hurting doctors, nurses, hospitals, patients, and beneficiaries of essential public and private programs. 

Increasingly used in behavioral science, the term sludge refers to everything from form-filling requirements to time spent waiting in line to rules mandating in-person interviews, imposed by both the private and public sectors. Sometimes those burdens are justified – as, for example, when the Social Security Administration takes steps to ensure that those who receive benefits actually qualify for them. But far too often, sludge is imposed with little thought about its potentially devastating impact.

The coronavirus pandemic is concentrating the bureaucratic mind – and leading to impressive and brisk reforms. Consider a few examples. 

Under the Supplemental Nutrition Assistance Program (formerly known as food stamps), would-be beneficiaries have had to complete interviews before they are approved for benefits. In late March, the Department of Agriculture waived that requirement – and now gives states “blanket approval” to provide benefits to people who are entitled to them.

Early last week, the Internal Revenue Service announced that in order to qualify for payments under the Families First Coronavirus Response Act, people would have to file tax returns – even if they are Social Security recipients who typically don’t do that. The sludge would have ensured that many people never got money to which they were legally entitled. Under public pressure, the Department of the Treasury reversed course – and said that Social Security recipients would receive the money automatically.

Some of the most aggressive sludge reduction efforts have come from the Department of Health and Human Services. Paperwork, reporting and auditing requirements are being eliminated. Importantly, dozens of medical services can now be provided through “telehealth.” 

In the department’s own words, the government “is allowing telehealth to fulfill many face-to-face visit requirements for clinicians to see their patients in inpatient rehabilitation facilities, hospice and home health.” 

In addition, Medicare will now pay laboratory technicians to travel to people’s homes to collect specimens for testing – thus eliminating the need for people to travel to health-care facilities for tests (and risk exposure to themselves or others). There are many other examples….(More)”.

Experts warn of privacy risk as US uses GPS to fight coronavirus spread


Alex Hern at The Guardian: “A transatlantic divide on how to use location data to fight coronavirus risks highlighting the lack of safeguards for Americans’ personal data, academics and data scientists have warned.

The US Centers for Disease Control and Prevention (CDC) has turned to data provided by the mobile advertising industry to analyse population movements in the midst of the pandemic.

Owing to a lack of systematic privacy protections in the US, data collected by advertising companies is often extremely detailed: companies with access to GPS location data, such as weather apps or some e-commerce sites, have been known to sell that data on for ad targeting purposes. That data provides much more granular information on the location and movement of individuals than the mobile network data received by the UK government from carriers including O2 and BT.

While both datasets track individuals at the collection level, GPS data is accurate to within five metres, according to Yves-Alexandre de Montjoye, a data scientist at Imperial College, while mobile network data is accurate to 0.1 km² in city centres and much less accurate in less dense areas – the difference between locating an individual to a specific room in their home and to their street…

But, warns de Montjoye, such data is never truly anonymous. “The original data is pseudonymised, yet it is quite easy to reidentify someone. Knowing where someone was is enough to reidentify them 95% of the time, using mobile phone data. So there’s the privacy concern: you need to process the pseudonymised data, but the pseudonymised data can be reidentified. Most of the time, if done properly, the aggregates are aggregated, and cannot be de-anonymised.”
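
This re-identification risk can be demonstrated on synthetic data. The sketch below, loosely modeled on de Montjoye’s published uniqueness studies, checks how many users in a pseudonymised dataset are uniquely pinned down by just four known (place, hour) points; the grid size, trace length, and user count are invented:

```python
import random

random.seed(42)

# How many users in a pseudonymised location dataset are *uniquely*
# identified by a handful of known (place, hour) points?

def fraction_unique(traces: dict, k: int) -> float:
    """Fraction of users whose k sampled points match no other user's trace."""
    unique = 0
    for user, points in traces.items():
        sample = set(random.sample(sorted(points), min(k, len(points))))
        matches = [u for u, pts in traces.items() if sample <= pts]
        if matches == [user]:
            unique += 1
    return unique / len(traces)

# Synthetic traces: 200 users, ~30 random (cell_id, hour_of_week) points each.
traces = {
    f"user{i}": {(random.randrange(500), random.randrange(168)) for _ in range(30)}
    for i in range(200)
}
print(f"uniquely identified by 4 points: {fraction_unique(traces, 4):.0%}")
```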

The data scientist points to successful attempts to use location data in tracking outbreaks of malaria in Kenya or dengue in Pakistan as proof that location data has use in these situations, but warns that trust will be hurt if data collected for modelling purposes is then “surreptitiously used to crack down on individuals not respecting quarantines or kept and used for unrelated purposes”….(More)”.

Why isn’t the government publishing more data about coronavirus deaths?


Article by Jeni Tennison: “Studying the past is futile in an unprecedented crisis. Science is the answer – and open-source information is paramount…Data is a necessary ingredient in day-to-day decision-making – but in this rapidly evolving situation, it’s especially vital. Everything has changed, almost overnight. Demands for food, transport, and energy have been overhauled as more people stop travelling and work from home. Jobs have been lost in some sectors, and workers are desperately needed in others. Historic experience can no longer tell us how our society or economy is working. Past models hold little predictive power in an unprecedented situation. To know what is happening right now, we need up-to-date information….

This data is also crucial for scientists, who can use it to replicate and build upon each other’s work. Yet no open data has been published alongside the evidence for the UK government’s coronavirus response. While a model that informed the US government’s response is freely available as a Google spreadsheet, the Imperial College London model that prompted the current lockdown has still not been published as open-source code. Making data open – publishing it on the web, in spreadsheets, without restrictions on access – is the best way to ensure it can be used by the people who need it most.

There is currently no open data available on UK hospitalisation rates; no regional, age or gender breakdown of daily deaths. The more granular breakdown of registered deaths provided by the Office for National Statistics is only published on a weekly basis, and with a delay. It is hard to tell whether this data does not exist or whether the NHS has prioritised creating dashboards for government decision makers rather than informing the rest of the country. But the UK is making progress with regard to data: potential Covid-19 cases identified through online and call-centre triage are now being published daily by NHS Digital.

Of course, not all data should be open. Singapore has been publishing detailed data about every infected person, including their age, gender, workplace, where they have visited and whether they had contact with other infected people. This can both harm the people who are documented and incentivise others to lie to authorities, undermining the quality of data.

When people are concerned about how data about them is handled, they demand transparency. To retain our trust, governments need to be open about how data is collected and used, how it’s being shared, with whom, and for what purpose. Openness about the use of personal data to help tackle the Covid-19 crisis will become more pressing as governments seek to develop contact tracing apps and immunity passports….(More)”.

The Fate of the News in the Age of the Coronavirus


Michael Luo at the New Yorker: “The shift to paywalls has been a boon for quality journalism. Instead of chasing trends on search engines and social media, subscription-based publications can focus on producing journalism worth paying for, which has meant investments in original reporting of all kinds. A small club of élite publications has now found a sustainable way to support its journalism, through readers instead of advertisers. The Times and the Post, in particular, have thrived in the Trump era. So have subscription-driven startups, such as The Information, which covers the tech industry and charges three hundred and ninety-nine dollars a year. Meanwhile, many of the free-to-read outlets still dependent on ad revenue—including former darlings of the digital-media revolution, such as BuzzFeed, Vice, HuffPost, Mic, Mashable, and the titles under Vox Media—have labored to find viable business models.

Many of these companies attracted hundreds of millions of dollars in venture funding, and built sizable newsrooms. Even so, they’ve struggled to succeed as businesses, in part because Google and Facebook take in the bulk of the revenue derived from digital advertising. Some sites have been forced to shutter; others have slashed their staffs and scaled back their journalistic ambitions. There are free digital news sites that continue to attract outsized audiences: CNN and Fox News, for instance, each draw well over a hundred million visitors a month. But the news on these sites tends to be commodified. Velocity is the priority, not complexity and depth.

A robust, independent press is widely understood to be an essential part of a functioning democracy. It helps keep citizens informed; it also serves as a bulwark against the rumors, half-truths, and propaganda that are rife on digital platforms. It’s a problem, therefore, when the majority of the highest-quality journalism is behind a paywall. In recent weeks, recognizing the value of timely, fact-based news during a pandemic, the Times, The Atlantic, the Wall Street Journal, the Washington Post, and other publications—including The New Yorker—have lowered their paywalls for portions of their coronavirus coverage. But it’s unclear how long publishers will stay committed to keeping their paywalls down, as the state of emergency stretches on. The coronavirus crisis promises to engulf every aspect of society, leading to widespread economic dislocations and social disruptions that will test our political processes and institutions in ways far beyond the immediate public-health threat. With the misinformation emanating from the Trump White House, the need for reliable, widely accessible information and facts is more urgent than ever. Yet the economic shutdown created by the spread of covid-19 promises to decimate advertising revenue, which could doom more digital news outlets and local newspapers.

It’s easy to underestimate the information imbalance in American society. After all, “information” has never felt more easily available. A few keyboard strokes on an Internet search engine instantly connects us to unlimited digital content. On Facebook, Instagram, and other social-media platforms, people who might not be intentionally looking for news encounter it, anyway. And yet the apparent ubiquity of news and information is misleading. Between 2004 and 2018, nearly one in five American newspapers closed; in that time, print newsrooms have shed nearly half of their employees. Digital-native publishers employ just a fraction of the diminished number of journalists who still remain at legacy outlets, and employment in broadcast-TV newsrooms trails that of newspapers. On some level, news is a product manufactured by journalists. Fewer journalists means less news. The tributaries that feed the river of information have been drying up. There are a few mountain springs of quality journalism; most sit behind a paywall.

A report released last year by the Reuters Institute for the Study of Journalism maps the divide that is emerging among news readers. The proportion of people in the United States who pay for online news remains small: just sixteen per cent. Those readers tend to be wealthier, and are more likely to have college degrees; they are also significantly more likely to find news trustworthy. Disparities in the level of trust that people have in their news diets, the data suggests, are likely driven by the quality of the news they are consuming….(More)”.