Tear down this wall: Microsoft embraces open data


The Economist: “Two decades ago Microsoft was a byword for a technological walled garden. One of its bosses called free open-source programs a “cancer”. That was then. On April 21st the world’s most valuable tech firm joined a fledgling movement to liberate the world’s data. Among other things, the company plans to launch 20 data-sharing groups by 2022 and give away some of its digital information, including data it has aggregated on covid-19.

Microsoft is not alone in its newfound fondness for sharing in the age of the coronavirus. “The world has faced pandemics before, but this time we have a new superpower: the ability to gather and share data for good,” Mark Zuckerberg, the boss of Facebook, a social-media conglomerate, wrote in the Washington Post on April 20th. Despite the EU’s strict privacy rules, some Eurocrats now argue that data-sharing could speed up efforts to fight the coronavirus. 

But the argument for sharing data is much older than the virus. The OECD, a club mostly of rich countries, reckons that if data were more widely exchanged, many countries could enjoy gains worth between 1% and 2.5% of GDP. The estimate is based on heroic assumptions (such as putting a number on business opportunities created for startups). But economists agree that readier access to data is broadly beneficial, because data are “non-rivalrous”: unlike oil, say, they can be used and re-used without being depleted, for instance to power various artificial-intelligence algorithms at once. 

Many governments have recognised the potential. Cities from Berlin to San Francisco have “open data” initiatives. Companies have been cagier, says Stefaan Verhulst, who heads the Governance Lab at New York University, which studies such things. Firms worry about losing intellectual property, imperilling users’ privacy and hitting technical obstacles. Standard data formats (eg, JPEG images) can be shared easily, but much that a Facebook collects with its software would be meaningless to a Microsoft, even after reformatting. Less than half of the 113 “data collaboratives” identified by the lab involve corporations. Those that do, including initiatives by BBVA, a Spanish bank, and GlaxoSmithKline, a British drugmaker, have been small or limited in scope. 

Microsoft’s campaign is the most consequential by far. Besides encouraging more non-commercial sharing, the firm is developing software, licences and (with the Governance Lab and others) governance frameworks that permit firms to trade data or provide access to them without losing control. Optimists believe that the giant’s move could be to data what IBM’s embrace in the late 1990s of the Linux operating system was to open-source software. Linux went on to become a serious challenger to Microsoft’s own Windows and today underpins Google’s Android mobile software and much of cloud-computing…(More)”.

The global pandemic has spawned new forms of activism – and they’re flourishing


Erica Chenoweth, Austin Choi-Fitzpatrick, Jeremy Pressman, Felipe G Santos and Jay Ulfelder at The Guardian: “Before the Covid-19 pandemic, the world was experiencing unprecedented levels of mass mobilization. The decade from 2010 to 2019 saw more mass movements demanding radical change around the world than in any period since World War II. Since the pandemic struck, however, street mobilization – mass demonstrations, rallies, protests, and sit-ins – has largely ground to an abrupt halt in places as diverse as India, Lebanon, Chile, Hong Kong, Iraq, Algeria, and the United States.

The near cessation of street protests does not mean that people power has dissipated. We have been collecting data on the various methods that people have used to express solidarity or adapted to press for change in the midst of this crisis. In just several weeks’ time, we’ve identified nearly 100 distinct methods of nonviolent action that include physical, virtual and hybrid actions – and we’re still counting. Far from condemning social movements to obsolescence, the pandemic – and governments’ responses to it – are spawning new tools, new strategies, and new motivation to push for change.

In terms of new tools, all across the world, people have turned to methods like car caravans, cacerolazos (collectively banging pots and pans inside the home), and walkouts from workplaces with health and safety challenges to voice personal concerns, make political claims, and express social solidarity. Activists have developed alternative institutions such as coordinated mask-sewing, community mutual aid pods, and crowdsourced emergency funds. Communities have placed teddy bears in their front windows for children to find during scavenger hunts, authors have posted live-streamed readings, and musicians have performed from their balconies and rooftops. Technologists are experimenting with drones adapted to deliver supplies, disinfect common areas, check individual temperatures, and monitor high-risk areas. And, of course, many movements are moving their activities online, with digital rallies, teach-ins, and information-sharing.

Such activities have had important impacts. Perhaps the most immediate and life-saving efforts have been those where movements have begun to coordinate and distribute critical resources to people in need. Local mutual aid pods, like those in Massachusetts, have emerged to highlight urgent needs and provide for crowdsourced and volunteer rapid response. Pop-up food banks, reclaiming vacant housing, crowdsourced hardship funds, free online medical-consultation clinics, mass donations of surgical masks, gloves, gowns, goggles and sanitizer, and making masks at home are all methods that people have developed in the past several weeks. Most people have made these items by hand. Others have even used 3D printers to make urgently-needed medical supplies. These actions of movements and communities have already saved countless lives….(More)”.

The imperative of interpretable machines


Julia Stoyanovich, Jay J. Van Bavel & Tessa V. West at Nature: “As artificial intelligence becomes prevalent in society, a framework is needed to connect interpretability and trust in algorithm-assisted decisions, for a range of stakeholders.

We are in the midst of a global trend to regulate the use of algorithms, artificial intelligence (AI) and automated decision systems (ADS). As reported by the One Hundred Year Study on Artificial Intelligence: “AI technologies already pervade our lives. As they become a central force in society, the field is shifting from simply building systems that are intelligent to building intelligent systems that are human-aware and trustworthy.” Major cities, states and national governments are establishing task forces, passing laws and issuing guidelines about responsible development and use of technology, often starting with its use in government itself, where there is, at least in theory, less friction between organizational goals and societal values.

In the United States, New York City has made a public commitment to opening the black box of the government’s use of technology: in 2018, an ADS task force was convened, the first of its kind in the nation, and charged with providing recommendations to New York City’s government agencies for how to become transparent and accountable in their use of ADS. In a 2019 report, the task force recommended using ADS where they are beneficial, reduce potential harm and promote fairness, equity, accountability and transparency [2]. Can these principles become policy in the face of the apparent lack of trust in the government’s ability to manage AI in the interest of the public? We argue that overcoming this mistrust hinges on our ability to engage in substantive multi-stakeholder conversations around ADS, bringing with it the imperative of interpretability — allowing humans to understand and, if necessary, contest the computational process and its outcomes.

Remarkably little is known about how humans perceive and evaluate algorithms and their outputs, what makes a human trust or mistrust an algorithm [3], and how we can empower humans to exercise agency — to adopt or challenge an algorithmic decision. Consider, for example, scoring and ranking — data-driven algorithms that prioritize entities such as individuals, schools, or products and services. These algorithms may be used to determine creditworthiness and desirability for college admissions or employment. Scoring and ranking are as ubiquitous and powerful as they are opaque. Despite their importance, members of the public often know little about why one person is ranked higher than another by a résumé screening or a credit scoring tool, how the ranking process is designed and whether its results can be trusted.

As an interdisciplinary team of scientists in computer science and social psychology, we propose a framework that forms connections between interpretability and trust, and develops actionable explanations for a diversity of stakeholders, recognizing their unique perspectives and needs. We focus on three questions (Box 1) about making machines interpretable: (1) what are we explaining, (2) to whom are we explaining and for what purpose, and (3) how do we know that an explanation is effective? By asking — and charting the path towards answering — these questions, we can promote greater trust in algorithms, and improve fairness and efficiency of algorithm-assisted decision making…(More)”.

How Facebook and Google are helping the CDC forecast coronavirus


Karen Hao at MIT Technology Review: “When it comes to predicting the spread of an infectious disease, it’s crucial to understand what Ryan Tibshirani, an associate professor at Carnegie Mellon University, calls “the pyramid of severity.” The bottom of the pyramid is asymptomatic carriers (those who have the infection but feel fine); the next level is symptomatic carriers (those who are feeling ill); then come hospitalizations, critical hospitalizations, and finally deaths.

Every level of the pyramid has a clear relationship to the next: “For example, sadly, it’s pretty predictable how many people will die once you know how many people are under critical care,” says Tibshirani, who is part of CMU’s Delphi research group, one of the best flu-forecasting teams in the US. The goal, therefore, is to have a clear measure of the lower levels of the pyramid, as the foundation for forecasting the higher ones.

But in the US, building such a model is a Herculean task. A lack of testing makes it impossible to assess the number of asymptomatic carriers. The results also don’t accurately reflect how many symptomatic carriers there are. Different counties have different testing requirements—some choosing only to test patients who require hospitalization. Test results also often take upwards of a week to return.

The remaining option is to measure symptomatic carriers through a large-scale, self-reported survey. But such an initiative won’t work unless it covers a big enough cross section of the entire population. Now the Delphi group, which has been working with the Centers for Disease Control and Prevention to help it coordinate the national pandemic response, has turned to the largest platforms in the US: Facebook and Google.

[Image caption: Facebook will help the CMU Delphi research group gather data about covid symptoms.]

In a new partnership with Delphi, both tech giants have agreed to help gather data from those who voluntarily choose to report whether they’re experiencing covid-like symptoms. Facebook will target a fraction of its US users with a CMU-run survey, while Google has thus far been using its Opinion Rewards app, which lets users respond to questions for app store credit. The hope is that this new information will allow the lab to produce county-by-county projections that will help policymakers allocate resources more effectively.

Neither company will ever actually see the survey results; they’re merely pointing users to the questions administered and processed by the lab. The lab will also never share any of the raw data back to either company. Still, the agreements represent a major deviation from typical data-sharing practices, which could raise privacy concerns. “If this wasn’t a pandemic, I don’t know that companies would want to take the risk of being associated with or asking directly for such a personal piece of information as health,” Tibshirani says.

Without such cooperation, the researchers would’ve been hard-pressed to find the data anywhere else. Several other apps allow users to self-report symptoms, including a popular one in the UK known as the Covid Symptom Tracker that has been downloaded over 1.5 million times. But none of them offer the same systematic and expansive coverage as a Facebook- or Google-administered survey, says Tibshirani. He hopes the project will collect millions of responses each week….(More)”.

Tracking coronavirus: big data and the challenge to privacy


Nic Fildes and Javier Espinoza at the Financial Times: “When the World Health Organization launched a 2007 initiative to eliminate malaria on Zanzibar, it turned to an unusual source to track the spread of the disease between the island and mainland Africa: mobile phones sold by Tanzania’s telecoms groups including Vodafone, the UK mobile operator.

Working together with researchers at Southampton university, Vodafone began compiling sets of location data from mobile phones in the areas where cases of the disease had been recorded. 

Mapping how populations move between locations has proved invaluable in tracking and responding to epidemics. The Zanzibar project has been replicated by academics across the continent to monitor other deadly diseases, including Ebola in west Africa….

With much of Europe at a standstill as a result of the coronavirus pandemic, politicians want the telecoms operators to provide similar data from smartphones. Thierry Breton, the former chief executive of France Telecom who is now the European commissioner for the internal market, has called on operators to hand over aggregated location data to track how the virus is spreading and to identify spots where help is most needed.

Both politicians and the industry insist that the data sets will be “anonymised”, meaning that customers’ individual identities will be scrubbed out. Mr Breton told the Financial Times: “In no way are we going to track individuals. That’s absolutely not the case. We are talking about fully anonymised, aggregated data to anticipate the development of the pandemic.”

But the use of such data to track the virus has triggered fears of growing surveillance, including questions about how the data might be used once the crisis is over and whether such data sets are ever truly anonymous….(More)”.

Coronavirus: country comparisons are pointless unless we account for these biases in testing


Norman Fenton, Magda Osman, Martin Neil, and Scott McLachlan at The Conversation: “Suppose we wanted to estimate how many car owners there are in the UK and how many of those own a Ford Fiesta, but we only have data on those people who visited Ford car showrooms in the last year. If 10% of the showroom visitors owned a Fiesta, then, because of the bias in the sample, this would certainly overestimate the proportion of Ford Fiesta owners in the country.

Estimating death rates for people with COVID-19 is currently undertaken largely along the same lines. In the UK, for example, almost all testing of COVID-19 is performed on people already hospitalised with COVID-19 symptoms. At the time of writing, there are 29,474 confirmed COVID-19 cases (analogous to car owners visiting a showroom) of whom 2,352 have died (Ford Fiesta owners who visited a showroom). But it misses out all the people with mild or no symptoms.

Concluding that the death rate from COVID-19 is on average 8% (2,352 out of 29,474) ignores the many people with COVID-19 who are not hospitalised and have not died (analogous to car owners who did not visit a Ford showroom and who do not own a Ford Fiesta). It is therefore equivalent to making the mistake of concluding that 10% of all car owners own a Fiesta.
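The arithmetic behind this bias is easy to reproduce. Below is a minimal sketch, using assumed illustrative numbers rather than real epidemiological estimates, of how a fatality rate computed only from hospitalised, test-confirmed cases can end up far higher than the rate across all infections.

```python
# Illustrative sketch of the selection bias described above.
# All numbers are assumptions for demonstration, not real estimates.

total_infections = 1_000_000           # everyone infected, including mild and asymptomatic cases
hospitalisation_rate = 0.03            # assumed share of infections that reach hospital (and get tested)
true_infection_fatality_rate = 0.005   # assumed deaths per infection across all severities

confirmed_cases = total_infections * hospitalisation_rate   # analogous to showroom visitors
deaths = total_infections * true_infection_fatality_rate    # nearly all occur among the hospitalised

naive_case_fatality_rate = deaths / confirmed_cases         # the "8%"-style calculation

print(f"Confirmed (hospitalised) cases: {confirmed_cases:,.0f}")
print(f"Deaths:                         {deaths:,.0f}")
print(f"Rate among confirmed cases:     {naive_case_fatality_rate:.1%}")      # ~16.7%
print(f"Rate among all infections:      {true_infection_fatality_rate:.1%}")  # 0.5%
```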

There are many prominent examples of this sort of conclusion. The Oxford COVID-19 Evidence Service have undertaken a thorough statistical analysis. They acknowledge potential selection bias, and add confidence intervals showing how big the error may be for the (potentially highly misleading) proportion of deaths among confirmed COVID-19 patients.

They note various factors that can result in wide national differences – for example the UK’s 8% (mean) “death rate” is very high compared to Germany’s 0.74%. These factors include different demographics, for example the number of elderly in a population, as well as how deaths are reported. For example, in some countries everybody who dies after having been diagnosed with COVID-19 is recorded as a COVID-19 death, even if the disease was not the actual cause, while other people may die from the virus without actually having been diagnosed with COVID-19.

However, the models fail to incorporate explicit causal explanations in their modelling that might enable us to make more meaningful inferences from the available data, including data on virus testing.

[Figure: What a causal model would look like. Author provided]

We have developed an initial prototype “causal model” whose structure is shown in the figure above. The links between the named variables in a model like this show how they are dependent on each other. These links, along with other unknown variables, are captured as probabilities. As data are entered for specific, known variables, all of the unknown variable probabilities are updated using a method called Bayesian inference. The model shows that the COVID-19 death rate is as much a function of sampling methods, testing and reporting, as it is determined by the underlying rate of infection in a vulnerable population….(More)”
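The article does not reproduce the model itself, but the updating step it describes can be illustrated with a toy calculation. The sketch below is our own simplified assumption, not the authors' model: it performs a grid-based Bayesian update of a single unknown (the infection prevalence) from an observed number of positive tests; the article's full causal model would additionally link this unknown to sampling, testing and reporting.

```python
# Toy Bayesian update (an illustration, not the authors' model): infer an
# unknown infection prevalence from test results over a grid of candidate values.
import math

def binom_pmf(k, n, p):
    """Probability of k positives in n tests if the true prevalence is p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

grid = [i / 1000 for i in range(1, 201)]   # candidate prevalences: 0.1% .. 20%
prior = [1 / len(grid)] * len(grid)        # flat prior: no prior knowledge

n_tested, n_positive = 100, 18             # assumed observations

# Bayes' rule: posterior is proportional to prior times likelihood, then normalised.
likelihood = [binom_pmf(n_positive, n_tested, p) for p in grid]
unnormalised = [pr * li for pr, li in zip(prior, likelihood)]
posterior = [u / sum(unnormalised) for u in unnormalised]

most_probable = grid[posterior.index(max(posterior))]
print(f"Most probable prevalence given the data: {most_probable:.1%}")   # ~18%
```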

Covid-19 Changed How the World Does Science, Together


Matt Apuzzo and David D. Kirkpatrick at The New York Times: “…Normal imperatives like academic credit have been set aside. Online repositories make studies available months ahead of journals. Researchers have identified and shared hundreds of viral genome sequences. More than 200 clinical trials have been launched, bringing together hospitals and laboratories around the globe.

“I never hear scientists — true scientists, good quality scientists — speak in terms of nationality,” said Dr. Francesco Perrone, who is leading a coronavirus clinical trial in Italy. “My nation, your nation. My language, your language. My geographic location, your geographic location. This is something that is really distant from true top-level scientists.”

On a recent morning, for example, scientists at the University of Pittsburgh discovered that a ferret exposed to Covid-19 particles had developed a high fever — a potential advance toward animal vaccine testing. Under ordinary circumstances, they would have started work on an academic journal article.

“But you know what? There is going to be plenty of time to get papers published,” said Paul Duprex, a virologist leading the university’s vaccine research. Within two hours, he said, he had shared the findings with scientists around the world on a World Health Organization conference call. “It is pretty cool, right? You cut the crap, for lack of a better word, and you get to be part of a global enterprise.”…

Several scientists said the closest comparison to this moment might be the height of the AIDS epidemic in the 1990s, when scientists and doctors locked arms to combat the disease. But today’s technology and the pace of information-sharing dwarfs what was possible three decades ago.

As a practical matter, medical scientists today have little choice but to study the coronavirus if they want to work at all. Most other laboratory research has been put on hold because of social distancing, lockdowns or work-from-home restrictions.

The pandemic is also eroding the secrecy that pervades academic medical research, said Dr. Ryan Carroll, a Harvard Medical School professor who is involved in the coronavirus trial there. Big, exclusive research can lead to grants, promotions and tenure, so scientists often work in secret, suspiciously hoarding data from potential competitors, he said.

“The ability to work collaboratively, setting aside your personal academic progress, is occurring right now because it’s a matter of survival,” he said….(More)”.

The 9/11 Playbook for Protecting Privacy


Adam Klein and Edward Felten at Politico: “Geolocation data—precise GPS coordinates or records of proximity to other devices, often collected by smartphone apps—is emerging as a critical tool for tracking potential spread. But other, more novel types of surveillance are already being contemplated for this first pandemic of the digital age. Body temperature readings from internet-connected thermometers are already being used at scale, but there are more exotic possibilities. Could smart-home devices be used to identify coughs of a timbre associated with Covid-19? Can facial recognition and remote temperature sensing be harnessed to identify likely carriers at a distance?

Weigh the benefits of each collection and use of data against the risks.

Each scenario will present a different level of privacy sensitivity, different collection mechanisms, different technical options affecting privacy, and varying potential value to health professionals, meaning there is no substitute for case-by-case judgment about whether the benefits of a particular use of data outweigh the risks.

The various ways to use location data, for example, present vastly different levels of concern for privacy. Aggregated location data, which combines many individualized location trails to show broader trends, is possible with few privacy risks, using methods that ensure no individual’s location trail is reconstructable from released data. For that reason, governments should not seek individualized location trails for any application where aggregated data would suffice—for example, analyzing travel trends to predict future epidemic hotspots.
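One common way to produce such aggregates is to release only per-cell device counts and to suppress any cell whose count is too small to hide an individual trail. The sketch below illustrates this under our own simplifying assumptions (coordinates snapped to a coarse grid, hourly time buckets, and an assumed minimum-count threshold).

```python
# Minimal sketch of location-data aggregation under illustrative assumptions:
# individual GPS pings are reduced to coarse per-cell, per-hour device counts,
# and sparsely populated cells are suppressed before anything is released.
from collections import defaultdict

MIN_COUNT = 10   # assumed suppression threshold (not any operator's real policy)

def to_cell(lat, lon):
    """Snap a coordinate to a coarse grid cell (two decimal places, roughly 1 km)."""
    return (round(lat, 2), round(lon, 2))

def aggregate(pings):
    """pings: iterable of (device_id, hour, lat, lon) tuples; returns releasable counts."""
    devices_per_cell = defaultdict(set)
    for device_id, hour, lat, lon in pings:
        devices_per_cell[(hour, to_cell(lat, lon))].add(device_id)

    # Release only counts, and only where enough distinct devices were observed.
    return {key: len(devices)
            for key, devices in devices_per_cell.items()
            if len(devices) >= MIN_COUNT}

pings = [("a", 9, 52.520, 13.405), ("b", 9, 52.521, 13.406), ("c", 9, 52.519, 13.404)]
print(aggregate(pings))   # {} because only three devices were seen, below the threshold
```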

If authorities need to trace the movements of identifiable people, their location trails should be obtained on the basis of an individualized showing. Gathering from companies the location trails for all users—as the Israeli government does, according to news reports—would raise far greater privacy concerns.

Establish clear rules for how data can be used, retained, and shared.

Once data is collected, the focus shifts to what the government can do with it. In counterterrorism programs, detailed rules seek to reduce the effect on individual privacy by limiting how different types of data can be used, stored, and shared.

The most basic safeguard is deleting data when it is no longer needed. Keeping data longer than needed unnecessarily exposes it to data breaches, leaks, and other potential privacy harms. Any individualized location tracking should cease, and the data should be deleted, once the individual no longer presents a danger to public health.
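A retention rule of that kind is simple to encode. The sketch below is a generic illustration; the 14-day window and the record layout are assumptions made for the example, not any agency's actual policy.

```python
# Illustrative retention sweep: individualized location records are dropped once
# they exceed an assumed retention window. Window and schema are assumptions.
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=14)   # assumed maximum retention period

def purge_expired(records, now=None):
    """Keep only records whose 'collected_at' timestamp is still within the window."""
    now = now or datetime.now(timezone.utc)
    return [r for r in records if now - r["collected_at"] <= RETENTION]

records = [
    {"device": "x", "collected_at": datetime.now(timezone.utc) - timedelta(days=3)},
    {"device": "y", "collected_at": datetime.now(timezone.utc) - timedelta(days=30)},
]
print(len(purge_expired(records)))   # 1, since the 30-day-old record is discarded
```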

Poland’s new tracking app for those exposed to the coronavirus illustrates why reasonable limits are essential. The Polish government plans to retain location data collected by the app for six years. It is hard to see a public-health justification for keeping the data that long. But the story also illustrates well how a failure to consider users’ privacy can undermine a program’s efficacy: the app’s onerous terms led at least one Polish citizen to refuse to download it….(More)”.

The War on Coronavirus Is Also a War on Paperwork


Article by Cass Sunstein: “As part of the war on coronavirus, U.S. regulators are taking aggressive steps against “sludge” – paperwork burdens and bureaucratic obstacles. This new battle front is aimed at eliminating frictions, or administrative barriers, that have been badly hurting doctors, nurses, hospitals, patients, and beneficiaries of essential public and private programs. 

Increasingly used in behavioral science, the term sludge refers to everything from form-filling requirements to time spent waiting in line to rules mandating in-person interviews imposed by both private and public sectors. Sometimes those burdens are justified – as, for example, when the Social Security Administration takes steps to ensure that those who receive benefits actually qualify for them. But far too often, sludge is imposed with little thought about its potentially devastating impact.

The coronavirus pandemic is concentrating the bureaucratic mind – and leading to impressive and brisk reforms. Consider a few examples. 

Under the Supplemental Nutrition Assistance Program (formerly known as food stamps), would-be beneficiaries have had to complete interviews before they are approved for benefits. In late March, the Department of Agriculture waived that requirement – and now gives states “blanket approval” to give out benefits to people who are entitled to them.

Early last week, the Internal Revenue Service announced that in order to qualify for payments under the Families First Coronavirus Response Act, people would have to file tax returns – even if they are Social Security recipients who typically don’t do that. The sludge would have ensured that many people never got money to which they were legally entitled. Under public pressure, the Treasury Department reversed course – and said that Social Security recipients would receive the money automatically.

Some of the most aggressive sludge reduction efforts have come from the Department of Health and Human Services. Paperwork, reporting and auditing requirements are being eliminated. Importantly, dozens of medical services can now be provided through “telehealth.” 

In the department’s own words, the government “is allowing telehealth to fulfill many face-to-face visit requirements for clinicians to see their patients in inpatient rehabilitation facilities, hospice and home health.” 

In addition, Medicare will now pay laboratory technicians to travel to people’s homes to collect specimens for testing – thus eliminating the need for people to travel to health-care facilities for tests (and risk exposure to themselves or others). There are many other examples….(More)”.

Experts warn of privacy risk as US uses GPS to fight coronavirus spread


Alex Hern at The Guardian: “A transatlantic divide on how to use location data to fight coronavirus risks highlights the lack of safeguards for Americans’ personal data, academics and data scientists have warned.

The US Centers for Disease Control and Prevention (CDC) has turned to data provided by the mobile advertising industry to analyse population movements in the midst of the pandemic.

Owing to a lack of systematic privacy protections in the US, data collected by advertising companies is often extremely detailed: companies with access to GPS location data, such as weather apps or some e-commerce sites, have been known to sell that data on for ad targeting purposes. That data provides much more granular information on the location and movement of individuals than the mobile network data received by the UK government from carriers including O2 and BT.

Both datasets track individuals at the collection level, but GPS data is accurate to within five metres, according to Yves-Alexandre de Montjoye, a data scientist at Imperial College, while mobile network data is only accurate to about 0.1 km² in city centres and is far less precise in less dense areas – the difference between locating an individual to a specific room in their home and to their street…

But, warns de Montjoye, such data is never truly anonymous. “The original data is pseudonymised, yet it is quite easy to reidentify someone. Knowing where someone was is enough to reidentify them 95% of the time, using mobile phone data. So there’s the privacy concern: you need to process the pseudonymised data, but the pseudonymised data can be reidentified. Most of the time, if done properly, the aggregates are aggregated, and cannot be de-anonymised.”
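The re-identification risk de Montjoye describes can be made concrete with a small sketch. Under our own toy assumptions (a tiny set of pseudonymised trails and an attacker who already knows a few place-and-time points about a target), re-identification is just a matter of checking which trails contain those points.

```python
# Toy illustration of re-identifying a pseudonymised location trail from a few
# known (place, hour) observations. The data and matching rule are assumptions.

# Pseudonymised dataset: pseudonym -> set of (cell, hour) points
trails = {
    "user_0412": {("cell_A", 8), ("cell_B", 13), ("cell_C", 19)},
    "user_0977": {("cell_A", 8), ("cell_D", 13), ("cell_C", 19)},
    "user_1630": {("cell_E", 8), ("cell_B", 12), ("cell_F", 20)},
}

def matching_pseudonyms(known_points, trails):
    """Return every pseudonym whose trail contains all the points the attacker knows."""
    return [p for p, trail in trails.items() if known_points <= trail]

# The attacker knows only that the target was in cell_B at 13:00 and cell_C at 19:00.
known = {("cell_B", 13), ("cell_C", 19)}
print(matching_pseudonyms(known, trails))   # ['user_0412'], a unique match, so the pseudonym is broken
```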

The data scientist points to successful attempts to use location data in tracking outbreaks of malaria in Kenya or dengue in Pakistan as proof that location data has use in these situations, but warns that trust will be hurt if data collected for modelling purposes is then “surreptitiously used to crack down on individuals not respecting quarantines or kept and used for unrelated purposes”….(More)”.