A Closer Look at Location Data: Privacy and Pandemics


Assessment by Stacey Gray: “In light of COVID-19, there is heightened global interest in harnessing location data held by major tech companies to track individuals affected by the virus, better understand the effectiveness of social distancing, or send alerts to individuals who might be affected based on their previous proximity to known cases. Governments around the world are considering whether and how to use mobile location data to help contain the virus: Israel’s government passed emergency regulations to address the crisis using cell phone location data; the European Commission requested that mobile carriers provide anonymized and aggregate mobile location data; and South Korea has created a publicly available map of location data from individuals who have tested positive. 

Public health agencies and epidemiologists have long been interested in analyzing device location data to track diseases. In general, the movement of devices effectively mirrors movement of people (with some exceptions discussed below). However, its use comes with a range of ethical and privacy concerns. 

In order to help policymakers address these concerns, we provide below a brief explainer guide of the basics: (1) what is location data, (2) who holds it, and (3) how is it collected? Finally, we discuss some preliminary ethical and privacy considerations for processing location data. Researchers and agencies should consider: how and in what context location data was collected; the fact and reasoning behind location data being classified as legally “sensitive” in most jurisdictions; challenges to effective “anonymization”; representativeness of the location dataset (taking into account potential bias and lack of inclusion of low-income and elderly subpopulations who do not own phones); and the unique importance of purpose limitation, or not re-using location data for other civil or law enforcement purposes after the pandemic is over….(More)”.

A controlled trial for reproducibility


Marc P. Raphael, Paul E. Sheehan & Gary J. Vora at Nature: “In 2016, the US Defense Advanced Research Projects Agency (DARPA) told eight research groups that their proposals had made it through the review gauntlet and would soon get a few million dollars from its Biological Technologies Office (BTO). Along with congratulations, the teams received a reminder that their award came with an unusual requirement — an independent shadow team of scientists tasked with reproducing their results.

Thus began an intense, multi-year controlled trial in reproducibility. Each shadow team consists of three to five researchers, who visit the ‘performer’ team’s laboratory and often host visits themselves. Between 3% and 8% of the programme’s total funds go to this independent validation and verification (IV&V) work. But DARPA has the flexibility and resources for such herculean efforts to assess essential techniques. In one unusual instance, an IV&V laboratory needed a sophisticated US$200,000 microscopy and microfluidic set-up to make an accurate assessment.

These costs are high, but we think they are an essential investment to avoid wasting taxpayers’ money and to advance fundamental research towards beneficial applications. Here, we outline what we’ve learnt from implementing this programme, and how it could be applied more broadly….(More)”.

The US lacks health information technologies to stop COVID-19 epidemic


Niam Yaraghi at Brookings: “The COVID-19 pandemic highlights the crucial importance of health information technology and data interoperability. The pandemic has shattered our common beliefs about the type and scope of health information exchange. It has shown us that the definition of health data should no longer be limited to medical data of patients and instead should encompass a much wider variety of data types from individuals’ online and offline activity. Moreover, the pandemic has proven that healthcare is not local. In an interconnected world, with more individuals traveling long distances than ever before, it is naïve to look at regions in isolation from each other and try to manage public health independently. To efficiently manage a pandemic like this, the scope of health information exchange efforts should not be limited to small geographical regions and instead should be done at least nationally, if not internationally.

HEALTH DATA SHOULD GO BEYOND MEDICAL RECORDS

A wide variety of factors affect one’s overall well-being, only a very small fraction of which can be quantified via medical records. We tend to ignore this fact and try to explain and predict a patient’s condition based only on medical data. Previously, we did not have the technology and knowledge to collect huge amounts of non-medical data and analyze it for healthcare purposes. Now, privacy concerns and outdated regulations have exacerbated the situation and have led to a fragmented data ecosystem. Interoperability remains a major challenge even among healthcare providers, and the exchange and analysis of non-medical data for healthcare purposes almost never happens….(More)”.

Will This Year’s Census Be the Last?


Jill Lepore at The New Yorker: “People have been counting people for thousands of years. Count everyone, beginning with babies who have teeth, decreed census-takers in China in the first millennium B.C.E., under the Zhou dynasty. “Take ye the sum of all the congregation of the children of Israel, after their families, by the house of their fathers, with the number of their names, every male by their polls,” God commands Moses in the Book of Numbers, describing a census, taken around 1500 B.C.E., that counted only men “twenty years old and upward, all that are able to go forth to war in Israel”—that is, potential conscripts.

Ancient rulers took censuses to measure and gather their strength: to muster armies and levy taxes. Who got counted depended on the purpose of the census. In the United States, which counts “the whole number of persons in each state,” the chief purpose of the census is to apportion representation in Congress. In 2018, Secretary of Commerce Wilbur Ross sought to add a question to the 2020 U.S. census that would have read, “Is this person a citizen of the United States?” Ross is a banker who specialized in bankruptcy before joining the Trump Administration; earlier, he had handled cases involving the insolvency of Donald Trump’s casinos. The Census Bureau objected to the question Ross proposed. Eighteen states, the District of Columbia, fifteen cities and counties, the United States Conference of Mayors, and a coalition of non-governmental organizations filed a lawsuit, alleging that the question violated the Constitution.

Last year, United States District Court Judge Jesse Furman, in an opinion for the Southern District, found Ross’s attempt to add the citizenship question to be not only unlawful, and quite possibly unconstitutional, but also, given the way Ross went about trying to get it added to the census, an abuse of power. Furman wrote, “To conclude otherwise and let Secretary Ross’s decision stand would undermine the proposition—central to the rule of law—that ours is a ‘government of laws, and not of men.’ ” There is, therefore, no citizenship question on the 2020 census.

All this, though, may be by the bye, because the census, like most other institutions of democratic government, is under threat. Google and Facebook, after all, know a lot more about you, and about the population of the United States, or any other state, than does the U.S. Census Bureau or any national census agency. This year may be the last time that a census is taken door by door, form by form, or even click by click….

In the ancient world, rulers counted and collected information about people in order to make use of them, to extract their labor or their property. Facebook works the same way. “It was the great achievement of eighteenth- and nineteenth-century census-takers to break that nexus and persuade people—the public on one side and their colleagues in government on the other—that states could collect data on their citizens without using it against them,” Whitby writes. It is among the tragedies of the past century that this trust has been betrayed. But it will be the error of the next if people agree to be counted by unregulated corporations, rather than by democratic governments….(More)”.

Scraping the Web for Public Health Gains: Ethical Considerations from a ‘Big Data’ Research Project on HIV and Incarceration


Stuart Rennie, Mara Buchbinder, Eric Juengst, Lauren Brinkley-Rubinstein, and David L Rosen at Public Health Ethics: “Web scraping involves using computer programs for automated extraction and organization of data from the Web for the purpose of further data analysis and use. It is frequently used by commercial companies, but also has become a valuable tool in epidemiological research and public health planning. In this paper, we explore ethical issues in a project that “scrapes” public websites of U.S. county jails as part of an effort to develop a comprehensive database (including individual-level jail incarcerations, court records and confidential HIV records) to enhance HIV surveillance and improve continuity of care for incarcerated populations. We argue that the well-known framework of Emanuel et al. (2000) provides only partial ethical guidance for the activities we describe, which lie at a complex intersection of public health research and public health practice. We suggest some ethical considerations from the ethics of public health practice to help fill gaps in this relatively unexplored area….(More)”.
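As a concrete illustration of the extraction step the authors describe, here is a minimal sketch using only Python's standard library. The roster snippet, field names, and page layout are hypothetical; a real scraper would fetch live pages (e.g. with `urllib.request`), respect each site's terms of use, and handle far messier HTML:

```python
from html.parser import HTMLParser

# Hypothetical fragment of a public jail-roster page. A real scraper
# would download each county's page on a schedule instead of using a
# hard-coded string.
SAMPLE_HTML = """
<table id="roster">
  <tr><th>Name</th><th>Booking Date</th></tr>
  <tr><td>DOE, JOHN</td><td>2020-01-15</td></tr>
  <tr><td>ROE, JANE</td><td>2020-02-03</td></tr>
</table>
"""

class RosterParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []          # start a fresh row
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)  # header rows (th-only) stay empty

    def handle_data(self, data):
        if self._in_td:
            self._row.append(data.strip())

parser = RosterParser()
parser.feed(SAMPLE_HTML)

# Organize the scraped cells into records suitable for later analysis.
records = [{"name": r[0], "booking_date": r[1]} for r in parser.rows]
print(records)
```

The "organization" half of the definition is the final step: turning loose HTML cells into structured records that can be linked (under appropriate safeguards) with other data sources.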

Techlash? America’s Growing Concern with Major Technology Companies


Press Release: “Just a few years ago, Americans were overwhelmingly optimistic about the power of new technologies to foster an informed and engaged society. More recently, however, that confidence has been challenged by emerging concerns over the role that internet and technology companies — especially social media — now play in our democracy.

A new Knight Foundation and Gallup study explores how much the landscape has shifted. This wide-ranging study confirms that, for Americans, the techlash is real, widespread, and bipartisan. From concerns about the spread of misinformation to election interference and data privacy, we’ve documented the deep pessimism of folks across the political spectrum who believe tech companies have too much power — and that they do more harm than good. 

Despite their shared misgivings, Americans are deeply divided on how best to address these challenges. This report explores the contours of the techlash in the context of the issues currently animating policy debates in Washington and Silicon Valley. Below are the main findings from the executive summary….

  • 77% of Americans say major internet and technology companies like Facebook, Google, Amazon and Apple have too much power.
  • Americans are equally divided among those who favor (50%) and oppose (49%) government intervention that would require internet and technology companies to break up into smaller companies.
  • Americans do not trust social media companies much (44%) or at all (40%) to make the right decisions about what content should or should not be allowed on online platforms.
  • However, they would still prefer the companies (55%) to make those decisions rather than the government (44%). …(More)”.

Milwaukee’s Amani Neighborhood Uses Data to Target Traffic Safety and Build Trust


Article by Kassie Scott: “People in Milwaukee’s Amani neighborhood are using data to identify safety issues and build relationships with the police. It’s a story of community-engaged research at its best.

In 2017, the Milwaukee Police Department received a grant under the federal Byrne Criminal Justice Innovation program, now called the Community Based Crime Reduction Program, whose purpose is to bridge the gap between practitioners and researchers and advance the use of data in making communities safer. Because of its close ties in the Amani neighborhood, the Dominican Center was selected to lead this initiative, known as the Amani Safety Initiative, and they partnered with local churches, the district attorney’s office, LISC-Milwaukee, and others. To support the effort with data and coaching, the police department contracted with Data You Can Use.

Together with Data You Can Use, the Amani Safety Initiative team first implemented a survey to gauge perceptions of public safety and police legitimacy. Neighborhood ambassadors were trained (and paid) to conduct the survey themselves, going door to door to gather the information from nearly 300 of their neighbors. The ambassadors shared these results with their neighborhood during what they called “data chats.” They also printed summary survey results on door hangers, which they distributed throughout the neighborhood.

Neighbors and community organizations were surprised by the survey results. Though violent crime and mistrust in the police were commonly thought to be the biggest issues, the data showed that residents were most concerned about traffic safety. Ultimately, residents decided to post slow-down signs at intersections.

This project stands out for letting the people in the neighborhood lead the way. Neighbors collected data, shared results, and took action. The partnership between neighbors, police, and local organizations shows how people can drive decision-making for their neighborhood.

The larger story is one of social cohesion and mutual trust. Through participating in the initiative and learning more about their neighborhood, Amani neighbors built stronger relationships with the police. The police began coming to neighborhood community meetings, which helped them build relationships with people in the community and understand the challenges they face….(More)”.

Is Your Data Being Collected? These Signs Will Tell You Where


Flavie Halais at Wired: “Alphabet’s Sidewalk Labs is testing icons that provide “digital transparency” when information is collected in public spaces….

As cities incorporate digital technologies into their landscapes, they face the challenge of informing people of the many sensors, cameras, and other smart technologies that surround them. Few people have the patience to read through the lengthy privacy notice on a website or smartphone app. So how can a city let them know how they’re being monitored?

Sidewalk Labs, the Google sister company that applies technology to urban problems, is taking a shot. Through a project called Digital Transparency in the Public Realm, or DTPR, the company is demonstrating a set of icons, to be displayed in public spaces, that shows where and what kinds of data are being collected. The icons are being tested as part of Sidewalk Labs’ flagship project in Toronto, where it plans to redevelop a 12-acre stretch of the city’s waterfront. The signs would be displayed at each location where data would be collected—streets, parks, businesses, and courtyards.

Data collection is a core feature of the project, called Sidewalk Toronto, and the source of much of the controversy surrounding it. In 2017, Waterfront Toronto, the organization in charge of administering the redevelopment of the city’s eastern waterfront, awarded Sidewalk Labs the contract to develop the waterfront site. The project has ambitious goals: It says it could create 44,000 direct jobs by 2040 and has the potential to be the largest “climate-positive” community—removing more CO2 from the atmosphere than it produces—in North America. It will make use of new urban technology like modular street pavers and underground freight delivery. Sensors, cameras, and Wi-Fi hotspots will monitor and control traffic flows, building temperature, and crosswalk signals.

All that monitoring raises inevitable concerns about privacy, which Sidewalk aims to address—at least partly—by posting signs in the places where data is being collected.

The signs display a set of icons in the form of stackable hexagons, derived in part from a set of design rules developed by Google in 2014. Some describe the purpose for collecting the data (mobility, energy efficiency, or waste management, for example). Others refer to the type of data that’s collected, such as photos, air quality, or sound. When the data is identifiable, meaning it can be associated with a person, the hexagon is yellow. When the information is stripped of personal identifiers, the hexagon is blue…(More)”.

Accelerating AI with synthetic data


Essay by Khaled El Emam: “The application of artificial intelligence and machine learning to solve today’s problems requires access to large amounts of data. One of the key obstacles faced by analysts is access to this data (for example, these issues were reflected in reports from the Government Accountability Office and the McKinsey Global Institute).

Synthetic data can help solve this data problem in a privacy preserving manner.

What is synthetic data?

Data synthesis is an emerging privacy-enhancing technology that can enable access to realistic data: information that is synthetic, yet has the properties of an original dataset. It also ensures that such information can be used and disclosed with reduced obligations under contemporary privacy statutes. Because synthetic data retains the statistical properties of the original data, there is a growing number of use cases where it can serve as a proxy for real data.

Synthetic data is created by taking an original (real) dataset and then building a model to characterize the distributions and relationships in that data — this is called the “synthesizer.” The synthesizer is typically an artificial neural network or other machine learning technique that learns these (original) data characteristics. Once that model is created, it can be used to generate synthetic data. The data is generated from the model and does not have a 1:1 mapping to real data, meaning that the likelihood of mapping the synthetic records to real individuals would be very small; hence it is generally not considered personal information.
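The fit-then-sample loop described above can be sketched in a few lines. This toy uses a bivariate Gaussian as the "synthesizer"; the dataset, column names, and model choice are illustrative only, since real synthesizers are usually neural networks or copula models:

```python
import math
import random
import statistics

random.seed(0)

# Toy "real" dataset: (age, systolic blood pressure) pairs, positively
# correlated. In practice the original data would be a full table.
real = [(25, 112), (31, 118), (42, 126), (55, 135), (63, 141), (70, 150)]
ages = [r[0] for r in real]
bps = [r[1] for r in real]

# The "synthesizer" here is a bivariate Gaussian fit: means, standard
# deviations, and the correlation between the two columns.
mu_a, mu_b = statistics.mean(ages), statistics.mean(bps)
sd_a, sd_b = statistics.stdev(ages), statistics.stdev(bps)
rho = sum((a - mu_a) * (b - mu_b) for a, b in real) / (
    (len(real) - 1) * sd_a * sd_b
)

def sample_synthetic(n):
    """Draw n synthetic (age, bp) pairs from the fitted model, not the data."""
    out = []
    for _ in range(n):
        z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
        a = mu_a + sd_a * z1
        # Cholesky factor of the 2x2 correlation matrix preserves rho:
        b = mu_b + sd_b * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        out.append((a, b))
    return out

synthetic = sample_synthetic(1000)
# Each record is sampled from the model, so there is no 1:1 mapping
# back to any real individual's row.
```

The generated pairs reproduce the means, spreads, and age-pressure correlation of the original table without copying any of its rows, which is exactly the "retains the statistical properties" claim in miniature.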

Many different types of data can be synthesized, including images, video, audio, text and structured data. The main focus in this article is on the synthesis of structured data.

Even though data can be generated in this manner, that does not mean it cannot be personal information. If the synthesizer is overfit to real data, the generated data will replicate the original real data. Therefore, the synthesizer has to be constructed in a manner that avoids such overfitting. A formal privacy assurance should also be performed on the synthesized data to validate that there is only a weak mapping between synthetic records and real individuals….(More)”.
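A crude stand-in for such a privacy assurance (an illustrative heuristic, not a formal guarantee, and with made-up numbers) is to compare nearest-neighbour distances: if synthetic records sit much closer to real records than real records sit to each other, the synthesizer may have memorized its training data:

```python
import math

# Toy numeric records (age, blood pressure); in practice these would be
# the real and synthesized tables with many more columns and rows.
real = [(25.0, 112.0), (31.0, 118.0), (42.0, 126.0), (55.0, 135.0)]
synthetic = [(35.0, 121.0), (48.0, 130.0), (60.0, 140.0)]

def nn_dist(point, others):
    """Euclidean distance from `point` to its nearest neighbour in `others`."""
    return min(math.dist(point, o) for o in others)

# How close each synthetic record is to the closest real record...
synth_to_real = [nn_dist(s, real) for s in synthetic]
# ...compared with how close real records are to one another.
real_to_real = [nn_dist(r, [o for o in real if o != r]) for r in real]

# If synthetic records hug real ones much more tightly than real records
# hug each other, the synthesizer may have memorized (overfit to) the data.
overfit = (sum(synth_to_real) / len(synth_to_real)
           < 0.5 * sum(real_to_real) / len(real_to_real))
print("possible memorization:", overfit)
```

Here the synthetic points fall between real ones rather than on top of them, so the check passes; an overfit synthesizer that emitted near-copies of real rows would drive the synthetic-to-real distances toward zero and trip the flag.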

Monitoring of the Venezuelan exodus through Facebook’s advertising platform


Paper by Palotti et al: “Venezuela is going through the worst economic, political and social crisis in its modern history. Basic products like food and medicine are scarce, and hyperinflation is compounded by economic depression. This situation is creating an unprecedented refugee and migrant crisis in the region. Governments and international agencies have not been able to consistently leverage reliable information using traditional methods. Therefore, to organize and deploy any kind of humanitarian response, it is crucial to evaluate new methodologies to measure the number and location of Venezuelan refugees and migrants across Latin America.

In this paper, we propose to use Facebook’s advertising platform as an additional data source for monitoring the ongoing crisis. We estimate and validate national and sub-national numbers of refugees and migrants and break down their socio-economic profiles to further understand the complexity of the phenomenon. Although limitations exist, we believe that the presented methodology can be of value for real-time assessment of refugee and migrant crises worldwide….(More)”.