Experts warn of privacy risk as US uses GPS to fight coronavirus spread


Alex Hern at The Guardian: “A transatlantic divide on how to use location data to fight coronavirus risks highlights the lack of safeguards for Americans’ personal data, academics and data scientists have warned.

The US Centers for Disease Control and Prevention (CDC) has turned to data provided by the mobile advertising industry to analyse population movements in the midst of the pandemic.

Owing to a lack of systematic privacy protections in the US, data collected by advertising companies is often extremely detailed: companies with access to GPS location data, such as weather apps or some e-commerce sites, have been known to sell that data on for ad targeting purposes. That data provides much more granular information on the location and movement of individuals than the mobile network data received by the UK government from carriers including O2 and BT.

While both datasets track individuals at the collection level, GPS data is accurate to within five metres, according to Yves-Alexandre de Montjoye, a data scientist at Imperial College, while mobile network data is accurate to 0.1km² in city centres and much less in less dense areas – the difference between locating an individual to their street and to a specific room in their home…
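To put the two accuracy figures on the same scale, the following sketch compares them as areas. It is an editorial illustration, not part of the quoted article, and it assumes the GPS uncertainty can be treated as a circle of radius five metres; the variable names are ours.

```python
import math

GPS_RADIUS_M = 5.0     # "accurate to within five metres" (from the text)
CELL_AREA_KM2 = 0.1    # "accurate to 0.1km² in city centres" (from the text)

# Assumption: model the GPS uncertainty as a circle of radius 5 m.
gps_area_m2 = math.pi * GPS_RADIUS_M ** 2

# 1 km² = 1,000,000 m².
cell_area_m2 = CELL_AREA_KM2 * 1_000_000

# Radius of a circle with the same area as the network cell, for intuition.
cell_equiv_radius_m = math.sqrt(cell_area_m2 / math.pi)

# How many GPS-sized patches fit into one network-sized patch.
ratio = cell_area_m2 / gps_area_m2

print(f"GPS uncertainty area:     {gps_area_m2:.1f} m^2")
print(f"Network uncertainty area: {cell_area_m2:.0f} m^2")
print(f"Equivalent radius:        {cell_equiv_radius_m:.0f} m")
print(f"Area ratio:               {ratio:.0f}x")
```

Under that assumption the network estimate covers an area more than a thousand times larger than the GPS estimate, which is consistent with the street-versus-room contrast drawn in the article.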

But, warns de Montjoye, such data is never truly anonymous. “The original data is pseudonymised, yet it is quite easy to reidentify someone. Knowing where someone was is enough to reidentify them 95% of the time, using mobile phone data. So there’s the privacy concern: you need to process the pseudonymised data, but the pseudonymised data can be reidentified. Most of the time, if done properly, the aggregates are aggregated, and cannot be de-anonymised.”
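The reidentification risk described above can be illustrated with a toy measure: given k known (place, hour) points from one person's trace, how often do they match exactly one pseudonymised trace in the dataset? The sketch below is an editorial illustration only. The dataset is synthetic, the function name `unicity` and the trace format are assumptions, and it does not reproduce de Montjoye's method or the 95% figure.

```python
from itertools import combinations

def unicity(traces, k):
    """Fraction of (user, k-point subset) pairs that match exactly one trace.

    traces: dict mapping a pseudonym to a set of (place, hour) points.
    """
    unique = 0
    total = 0
    for user, points in traces.items():
        # Enumerate every way an observer could know k points of this trace.
        for known in combinations(sorted(points), k):
            known = set(known)
            # Count the traces that contain all of the known points.
            matches = sum(1 for other in traces.values() if known <= other)
            total += 1
            if matches == 1:
                unique += 1
    return unique / total if total else 0.0

# Synthetic, hypothetical traces: pseudonym -> {(place, hour), ...}
traces = {
    "a": {("home1", 8), ("work", 9), ("gym", 18)},
    "b": {("home1", 8), ("work", 9), ("bar", 20)},
    "c": {("home2", 8), ("work", 9), ("gym", 18)},
}

for k in (1, 2, 3):
    print(f"k={k}: {unicity(traces, k):.2f} of point sets are unique")
```

Even in this three-person example, the share of uniquely identifying point sets rises quickly as k grows, which is the intuition behind the warning that pseudonymised location data is easy to reidentify.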

The data scientist points to successful attempts to use location data in tracking outbreaks of malaria in Kenya or dengue in Pakistan as proof that location data has use in these situations, but warns that trust will be hurt if data collected for modelling purposes is then “surreptitiously used to crack down on individuals not respecting quarantines or kept and used for unrelated purposes”….(More)”.

Privacy Protection Key for Using Patient Data to Develop AI Tools


Article by Jessica Kent: “Clinical data should be treated as a public good when used for research or artificial intelligence algorithm development, so long as patients’ privacy is protected, according to a report from the Radiological Society of North America (RSNA).

As artificial intelligence and machine learning are increasingly applied to medical imaging, bringing the potential for streamlined analysis and faster diagnoses, the industry still lacks a broad consensus on an ethical framework for sharing this data.

“Now that we have electronic access to clinical data and the data processing tools, we can dramatically accelerate our ability to gain understanding and develop new applications that can benefit patients and populations,” said study lead author David B. Larson, MD, MBA, from the Stanford University School of Medicine. “But unsettled questions regarding the ethical use of the data often preclude the sharing of that information.”

To offer solutions around data sharing for AI development, RSNA developed a framework that highlights how to ethically use patient data for secondary purposes.

“Medical data, which are simply recorded observations, are acquired for the purposes of providing patient care,” Larson said….(More)”

Unpredictable Residency during the COVID-19 Pandemic Spells Trouble for the 2020 Census Count


Blog by Diana Elliott and Robert Santos: “Social distancing measures to curtail the community spread of COVID-19 have upended daily life. Just before lockdowns were implemented across the country, there was tremendous movement and migration of people relocating to different residences to shelter in place. This makes sense for the people involved but could be disastrous for the communities they fled and the final 2020 Census counts.

Pandemic-based migration undermines an accurate count

The 2020 Census, like most data collected by the US Census Bureau, is residence based. In the years leading up to 2020, the US Census Bureau worked diligently on the quality of the Master Address File, or the catalog of all residential addresses in the country. Staff account for newly built housing developments and buildings, apartment units or accessory dwelling units that are used as permanent residences, and the demolition of homes and apartments in the past decade. Census materials are sent to an address, rather than a person.

Most residences across America have already received their 2020 Census invitation. Whether completed online, by paper, by phone, or in person, the first official question on the 2020 Census questionnaire is “How many people were living or staying in this house, apartment, or mobile home on April 1, 2020?” Households are expected to answer this based on the concept of “usual residence,” or the place where a person lives and sleeps most of the time.

Despite written guidance provided on the 2020 Census on how to answer this question, doing so may be fraught with complexities and nuance from the pandemic.

First, research reveals that respondents do not often read questionnaire instructions; they dive in and start answering. With many people scrambling to other counties, cities, and states to hunker down for the long haul with loved ones, this will lead to incorrect counts when people are counted at temporary addresses.

Second, for many, the concept of “usual residence” has little relevance in the uncertainty unfolding during the COVID-19 pandemic. What if your temporary address becomes your permanent address? What does “usual residence” mean during a global epidemic that could stretch for 18 months or more? And perhaps more importantly, what should it mean?

Finally, there is the added complication of census operational delays (PDF). Self-response to the 2020 Census has been extended into August, as have the nonresponse follow-up efforts, when enumerators knock on the doors of those who haven’t yet answered the census. Additional delays seem unavoidable. The longer the delay, the more time there is for people who have not yet completed a census form to realize their temporary plan has evolved into a state of permanence….(More)”.

Researchers Develop Faster Way to Replace Bad Data With Accurate Information


NCSU Press Release: “Researchers from North Carolina State University and the Army Research Office have demonstrated a new model of how competing pieces of information spread in online social networks and the Internet of Things (IoT). The findings could be used to disseminate accurate information more quickly, displacing false information about anything from computer security to public health….

In their paper, the researchers show that a network’s size plays a significant role in how quickly “good” information can displace “bad” information. However, a large network is not necessarily better or worse than a small one. Instead, the speed at which good data travels is primarily affected by the network’s structure.

A highly interconnected network can disseminate new data very quickly. And the larger the network, the faster the new data will travel.

However, in networks that are connected primarily by a limited number of key nodes, those nodes serve as bottlenecks. As a result, the larger this type of network is, the slower the new data will travel.

The researchers also identified an algorithm that can be used to assess which point in a network would allow you to spread new data throughout the network most quickly.
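As a rough illustration of why bottleneck nodes matter and how a good starting point might be chosen, the sketch below uses a deliberately simplified model: in each round, every node holding the new data passes it to all of its neighbours. This is an editorial assumption, not the model or algorithm from the paper, and the graph and the function names `rounds_to_cover` and `best_seed` are hypothetical.

```python
from collections import deque

def rounds_to_cover(graph, seed):
    """Rounds needed for data starting at `seed` to reach every node (BFS depth)."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    if len(dist) < len(graph):
        raise ValueError("graph is not connected")
    return max(dist.values())

def best_seed(graph):
    """Node from which flooding covers the graph in the fewest rounds."""
    return min(graph, key=lambda n: (rounds_to_cover(graph, n), n))

def clique(names):
    """Fully interconnected cluster over the given node names."""
    return {n: {m for m in names if m != n} for n in names}

# Two tightly connected clusters joined only through one key node, "hub".
left = [f"a{i}" for i in range(4)]
right = [f"b{i}" for i in range(4)]
graph = {**clique(left), **clique(right), "hub": {"a0", "b0"}}
graph["a0"].add("hub")
graph["b0"].add("hub")

for node in ("hub", "a0", "a1"):
    print(node, rounds_to_cover(graph, node))
print("best seed:", best_seed(graph))
```

In this toy graph, seeding at the bridging node covers the network in half as many rounds as seeding deep inside one cluster, because everything crossing between clusters has to pass through that node.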

“Practically speaking, this could be used to ensure that an IoT network purges old data as quickly as possible and is operating with new, accurate data,” Wenye Wang says.

“But these findings are also applicable to online social networks, and could be used to facilitate the spread of accurate information regarding subjects that affect the public,” says Jie Wang. “For example, we think it could be used to combat misinformation online.”…(More)”

Full paper: “Modeling and Analysis of Conflicting Information Propagation in a Finite Time Horizon”

The Fate of the News in the Age of the Coronavirus


Michael Luo at the New Yorker: “The shift to paywalls has been a boon for quality journalism. Instead of chasing trends on search engines and social media, subscription-based publications can focus on producing journalism worth paying for, which has meant investments in original reporting of all kinds. A small club of élite publications has now found a sustainable way to support its journalism, through readers instead of advertisers. The Times and the Post, in particular, have thrived in the Trump era. So have subscription-driven startups, such as The Information, which covers the tech industry and charges three hundred and ninety-nine dollars a year. Meanwhile, many of the free-to-read outlets still dependent on ad revenue—including former darlings of the digital-media revolution, such as BuzzFeed, Vice, HuffPost, Mic, Mashable, and the titles under Vox Media—have labored to find viable business models.

Many of these companies attracted hundreds of millions of dollars in venture funding, and built sizable newsrooms. Even so, they’ve struggled to succeed as businesses, in part because Google and Facebook take in the bulk of the revenue derived from digital advertising. Some sites have been forced to shutter; others have slashed their staffs and scaled back their journalistic ambitions. There are free digital news sites that continue to attract outsized audiences: CNN and Fox News, for instance, each draw well over a hundred million visitors a month. But the news on these sites tends to be commodified. Velocity is the priority, not complexity and depth.

A robust, independent press is widely understood to be an essential part of a functioning democracy. It helps keep citizens informed; it also serves as a bulwark against the rumors, half-truths, and propaganda that are rife on digital platforms. It’s a problem, therefore, when the majority of the highest-quality journalism is behind a paywall. In recent weeks, recognizing the value of timely, fact-based news during a pandemic, the Times, The Atlantic, the Wall Street Journal, the Washington Post, and other publications—including The New Yorker—have lowered their paywalls for portions of their coronavirus coverage. But it’s unclear how long publishers will stay committed to keeping their paywalls down, as the state of emergency stretches on. The coronavirus crisis promises to engulf every aspect of society, leading to widespread economic dislocations and social disruptions that will test our political processes and institutions in ways far beyond the immediate public-health threat. With the misinformation emanating from the Trump White House, the need for reliable, widely-accessible information and facts is more urgent than ever. Yet the economic shutdown created by the spread of covid-19 promises to decimate advertising revenue, which could doom more digital news outlets and local newspapers.

It’s easy to underestimate the information imbalance in American society. After all, “information” has never felt more easily available. A few keyboard strokes on an Internet search engine instantly connect us to unlimited digital content. On Facebook, Instagram, and other social-media platforms, people who might not be intentionally looking for news encounter it, anyway. And yet the apparent ubiquity of news and information is misleading. Between 2004 and 2018, nearly one in five American newspapers closed; in that time, print newsrooms have shed nearly half of their employees. Digital-native publishers employ just a fraction of the diminished number of journalists who still remain at legacy outlets, and employment in broadcast-TV newsrooms trails that of newspapers. On some level, news is a product manufactured by journalists. Fewer journalists means less news. The tributaries that feed the river of information have been drying up. There are a few mountain springs of quality journalism; most sit behind a paywall.

A report released last year by the Reuters Institute for the Study of Journalism maps the divide that is emerging among news readers. The proportion of people in the United States who pay for online news remains small: just sixteen per cent. Those readers tend to be wealthier, and are more likely to have college degrees; they are also significantly more likely to find news trustworthy. Disparities in the level of trust that people have in their news diets, the data suggests, are likely driven by the quality of the news they are consuming….(More)”.

A Closer Look at Location Data: Privacy and Pandemics


Assessment by Stacey Gray: “In light of COVID-19, there is heightened global interest in harnessing location data held by major tech companies to track individuals affected by the virus, better understand the effectiveness of social distancing, or send alerts to individuals who might be affected based on their previous proximity to known cases. Governments around the world are considering whether and how to use mobile location data to help contain the virus: Israel’s government passed emergency regulations to address the crisis using cell phone location data; the European Commission requested that mobile carriers provide anonymized and aggregate mobile location data; and South Korea has created a publicly available map of location data from individuals who have tested positive. 

Public health agencies and epidemiologists have long been interested in analyzing device location data to track diseases. In general, the movement of devices effectively mirrors movement of people (with some exceptions discussed below). However, its use comes with a range of ethical and privacy concerns. 

In order to help policymakers address these concerns, we provide below a brief explainer guide of the basics: (1) what is location data, (2) who holds it, and (3) how is it collected? Finally we discuss some preliminary ethical and privacy considerations for processing location data. Researchers and agencies should consider: how and in what context location data was collected; the fact and reasoning behind location data being classified as legally “sensitive” in most jurisdictions; challenges to effective “anonymization”; representativeness of the location dataset (taking into account potential bias and lack of inclusion of low-income and elderly subpopulations who do not own phones); and the unique importance of purpose limitation, or not re-using location data for other civil or law enforcement purposes after the pandemic is over….(More)”.

A controlled trial for reproducibility


Marc P. Raphael, Paul E. Sheehan & Gary J. Vora at Nature: “In 2016, the US Defense Advanced Research Projects Agency (DARPA) told eight research groups that their proposals had made it through the review gauntlet and would soon get a few million dollars from its Biological Technologies Office (BTO). Along with congratulations, the teams received a reminder that their award came with an unusual requirement — an independent shadow team of scientists tasked with reproducing their results.

Thus began an intense, multi-year controlled trial in reproducibility. Each shadow team consists of three to five researchers, who visit the ‘performer’ team’s laboratory and often host visits themselves. Between 3% and 8% of the programme’s total funds go to this independent validation and verification (IV&V) work. But DARPA has the flexibility and resources for such herculean efforts to assess essential techniques. In one unusual instance, an IV&V laboratory needed a sophisticated US$200,000 microscopy and microfluidic set-up to make an accurate assessment.

These costs are high, but we think they are an essential investment to avoid wasting taxpayers’ money and to advance fundamental research towards beneficial applications. Here, we outline what we’ve learnt from implementing this programme, and how it could be applied more broadly….(More)”.

The US lacks health information technologies to stop COVID-19 epidemic


Niam Yaraghi at Brookings: “The COVID-19 pandemic highlights the crucial importance of health information technology and data interoperability. The pandemic has shattered our common beliefs about the type and scope of health information exchange. It has shown us that the definition of health data should no longer be limited to medical data of patients and instead should encompass a much wider variety of data types from individuals’ online and offline activity. Moreover, the pandemic has proven that healthcare is not local. In an interconnected world, with more individuals traveling long distances than ever before, it is naïve to look at regions in isolation from each other and try to manage public health independently. To efficiently manage a pandemic like this, the scope of health information exchange efforts should not be limited to small geographical regions and instead should be done at least nationally, if not internationally.

HEALTH DATA SHOULD GO BEYOND MEDICAL RECORDS

A wide variety of factors affect one’s overall well-being, a very small fraction of which could be quantified via medical records. We tend to ignore this fact, and try to explain and predict a patient’s condition only based on medical data. Previously, we did not have the technology and knowledge to collect huge amounts of non-medical data and analyze it for healthcare purposes. Now, privacy concerns and outdated regulations have exacerbated the situation and have led to a fragmented data ecosystem. Interoperability, even among healthcare providers, remains a major challenge where exchange and analysis of non-medical data for healthcare purposes almost never happens….(More)”.

Will This Year’s Census Be the Last?


Jill Lepore at The New Yorker: “People have been counting people for thousands of years. Count everyone, beginning with babies who have teeth, decreed census-takers in China in the first millennium B.C.E., under the Zhou dynasty. “Take ye the sum of all the congregation of the children of Israel, after their families, by the house of their fathers, with the number of their names, every male by their polls,” God commands Moses in the Book of Numbers, describing a census, taken around 1500 B.C.E., that counted only men “twenty years old and upward, all that are able to go forth to war in Israel”—that is, potential conscripts.

Ancient rulers took censuses to measure and gather their strength: to muster armies and levy taxes. Who got counted depended on the purpose of the census. In the United States, which counts “the whole number of persons in each state,” the chief purpose of the census is to apportion representation in Congress. In 2018, Secretary of Commerce Wilbur Ross sought to add a question to the 2020 U.S. census that would have read, “Is this person a citizen of the United States?” Ross is a banker who specialized in bankruptcy before joining the Trump Administration; earlier, he had handled cases involving the insolvency of Donald Trump’s casinos. The Census Bureau objected to the question Ross proposed. Eighteen states, the District of Columbia, fifteen cities and counties, the United States Conference of Mayors, and a coalition of non-governmental organizations filed a lawsuit, alleging that the question violated the Constitution.

Last year, United States District Court Judge Jesse Furman, in an opinion for the Southern District, found Ross’s attempt to add the citizenship question to be not only unlawful, and quite possibly unconstitutional, but also, given the way Ross went about trying to get it added to the census, an abuse of power. Furman wrote, “To conclude otherwise and let Secretary Ross’s decision stand would undermine the proposition—central to the rule of law—that ours is a ‘government of laws, and not of men.’ ” There is, therefore, no citizenship question on the 2020 census.

All this, though, may be by the bye, because the census, like most other institutions of democratic government, is under threat. Google and Facebook, after all, know a lot more about you, and about the population of the United States, or any other state, than does the U.S. Census Bureau or any national census agency. This year may be the last time that a census is taken door by door, form by form, or even click by click….

In the ancient world, rulers counted and collected information about people in order to make use of them, to extract their labor or their property. Facebook works the same way. “It was the great achievement of eighteenth- and nineteenth-century census-takers to break that nexus and persuade people—the public on one side and their colleagues in government on the other—that states could collect data on their citizens without using it against them,” Whitby writes. It is among the tragedies of the past century that this trust has been betrayed. But it will be the error of the next if people agree to be counted by unregulated corporations, rather than by democratic governments….(More)”.

Scraping the Web for Public Health Gains: Ethical Considerations from a ‘Big Data’ Research Project on HIV and Incarceration


Stuart Rennie, Mara Buchbinder, Eric Juengst, Lauren Brinkley-Rubinstein, and David L Rosen at Public Health Ethics: “Web scraping involves using computer programs for automated extraction and organization of data from the Web for the purpose of further data analysis and use. It is frequently used by commercial companies, but also has become a valuable tool in epidemiological research and public health planning. In this paper, we explore ethical issues in a project that “scrapes” public websites of U.S. county jails as part of an effort to develop a comprehensive database (including individual-level jail incarcerations, court records and confidential HIV records) to enhance HIV surveillance and improve continuity of care for incarcerated populations. We argue that the well-known framework of Emanuel et al. (2000) provides only partial ethical guidance for the activities we describe, which lie at a complex intersection of public health research and public health practice. We suggest some ethical considerations from the ethics of public health practice to help fill gaps in this relatively unexplored area….(More)”.
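For readers unfamiliar with the technique, the sketch below shows the "automated extraction and organization" step in miniature. It is an editorial illustration with no connection to the project described: it parses a made-up HTML table held in a string, makes no network requests, and the class name `TableRows` and the sample fields are hypothetical.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of each table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None outside a row
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Synthetic page content; a real scraper would fetch this from a website.
SAMPLE_HTML = """
<table>
  <tr><th>id</th><th>status</th></tr>
  <tr><td>001</td><td>open</td></tr>
  <tr><td>002</td><td>closed</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)

# Organize the extracted rows into records keyed by the header row.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

Turning page markup into structured records is what makes the resulting data easy to link with other sources, which is where the ethical questions raised in the paper arise.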