A Closer Look at Location Data: Privacy and Pandemics


Assessment by Stacey Gray: “In light of COVID-19, there is heightened global interest in harnessing location data held by major tech companies to track individuals affected by the virus, better understand the effectiveness of social distancing, or send alerts to individuals who might be affected based on their previous proximity to known cases. Governments around the world are considering whether and how to use mobile location data to help contain the virus: Israel’s government passed emergency regulations to address the crisis using cell phone location data; the European Commission requested that mobile carriers provide anonymized and aggregate mobile location data; and South Korea has created a publicly available map of location data from individuals who have tested positive. 

Public health agencies and epidemiologists have long been interested in analyzing device location data to track diseases. In general, the movement of devices effectively mirrors movement of people (with some exceptions discussed below). However, its use comes with a range of ethical and privacy concerns. 

In order to help policymakers address these concerns, we provide below a brief explainer guide of the basics: (1) what is location data, (2) who holds it, and (3) how is it collected? Finally, we discuss some preliminary ethical and privacy considerations for processing location data. Researchers and agencies should consider: how and in what context location data was collected; the fact and reasoning behind location data being classified as legally “sensitive” in most jurisdictions; challenges to effective “anonymization”; representativeness of the location dataset (taking into account potential bias and lack of inclusion of low-income and elderly subpopulations who do not own phones); and the unique importance of purpose limitation, or not re-using location data for other civil or law enforcement purposes after the pandemic is over….(More)”.

A controlled trial for reproducibility


Marc P. Raphael, Paul E. Sheehan & Gary J. Vora at Nature: “In 2016, the US Defense Advanced Research Projects Agency (DARPA) told eight research groups that their proposals had made it through the review gauntlet and would soon get a few million dollars from its Biological Technologies Office (BTO). Along with congratulations, the teams received a reminder that their award came with an unusual requirement — an independent shadow team of scientists tasked with reproducing their results.

Thus began an intense, multi-year controlled trial in reproducibility. Each shadow team consists of three to five researchers, who visit the ‘performer’ team’s laboratory and often host visits themselves. Between 3% and 8% of the programme’s total funds go to this independent validation and verification (IV&V) work. But DARPA has the flexibility and resources for such herculean efforts to assess essential techniques. In one unusual instance, an IV&V laboratory needed a sophisticated US$200,000 microscopy and microfluidic set-up to make an accurate assessment.

These costs are high, but we think they are an essential investment to avoid wasting taxpayers’ money and to advance fundamental research towards beneficial applications. Here, we outline what we’ve learnt from implementing this programme, and how it could be applied more broadly….(More)”.

Will This Year’s Census Be the Last?


Jill Lepore at The New Yorker: “People have been counting people for thousands of years. Count everyone, beginning with babies who have teeth, decreed census-takers in China in the first millennium B.C.E., under the Zhou dynasty. “Take ye the sum of all the congregation of the children of Israel, after their families, by the house of their fathers, with the number of their names, every male by their polls,” God commands Moses in the Book of Numbers, describing a census, taken around 1500 B.C.E., that counted only men “twenty years old and upward, all that are able to go forth to war in Israel”—that is, potential conscripts.

Ancient rulers took censuses to measure and gather their strength: to muster armies and levy taxes. Who got counted depended on the purpose of the census. In the United States, which counts “the whole number of persons in each state,” the chief purpose of the census is to apportion representation in Congress. In 2018, Secretary of Commerce Wilbur Ross sought to add a question to the 2020 U.S. census that would have read, “Is this person a citizen of the United States?” Ross is a banker who specialized in bankruptcy before joining the Trump Administration; earlier, he had handled cases involving the insolvency of Donald Trump’s casinos. The Census Bureau objected to the question Ross proposed. Eighteen states, the District of Columbia, fifteen cities and counties, the United States Conference of Mayors, and a coalition of non-governmental organizations filed a lawsuit, alleging that the question violated the Constitution.

Last year, United States District Court Judge Jesse Furman, in an opinion for the Southern District, found Ross’s attempt to add the citizenship question to be not only unlawful, and quite possibly unconstitutional, but also, given the way Ross went about trying to get it added to the census, an abuse of power. Furman wrote, “To conclude otherwise and let Secretary Ross’s decision stand would undermine the proposition—central to the rule of law—that ours is a ‘government of laws, and not of men.’ ” There is, therefore, no citizenship question on the 2020 census.

All this, though, may be by the bye, because the census, like most other institutions of democratic government, is under threat. Google and Facebook, after all, know a lot more about you, and about the population of the United States, or any other state, than does the U.S. Census Bureau or any national census agency. This year may be the last time that a census is taken door by door, form by form, or even click by click….

In the ancient world, rulers counted and collected information about people in order to make use of them, to extract their labor or their property. Facebook works the same way. “It was the great achievement of eighteenth- and nineteenth-century census-takers to break that nexus and persuade people—the public on one side and their colleagues in government on the other—that states could collect data on their citizens without using it against them,” Whitby writes. It is among the tragedies of the past century that this trust has been betrayed. But it will be the error of the next if people agree to be counted by unregulated corporations, rather than by democratic governments….(More)”.

Scraping the Web for Public Health Gains: Ethical Considerations from a ‘Big Data’ Research Project on HIV and Incarceration


Stuart Rennie, Mara Buchbinder, Eric Juengst, Lauren Brinkley-Rubinstein, and David L Rosen at Public Health Ethics: “Web scraping involves using computer programs for automated extraction and organization of data from the Web for the purpose of further data analysis and use. It is frequently used by commercial companies, but also has become a valuable tool in epidemiological research and public health planning. In this paper, we explore ethical issues in a project that “scrapes” public websites of U.S. county jails as part of an effort to develop a comprehensive database (including individual-level jail incarcerations, court records and confidential HIV records) to enhance HIV surveillance and improve continuity of care for incarcerated populations. We argue that the well-known framework of Emanuel et al. (2000) provides only partial ethical guidance for the activities we describe, which lie at a complex intersection of public health research and public health practice. We suggest some ethical considerations from the ethics of public health practice to help fill gaps in this relatively unexplored area….(More)”.
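To make the technique concrete, here is a minimal, hypothetical sketch of what scraping a public roster page can look like in Python. The URL, table layout, and field names are placeholder assumptions for illustration; this is not the actual code or the actual jail websites used in the project.

```python
# Hypothetical example: fetch a public page and pull records out of an
# HTML table. The URL and page structure are placeholders, not the real
# county jail sites used in the study.
import requests
from bs4 import BeautifulSoup

def scrape_roster(url: str) -> list[dict]:
    """Return one dict per row of the first HTML table on the page."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    table = soup.find("table")
    if table is None:
        return []
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    records = []
    for row in table.find_all("tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) == len(headers):
            records.append(dict(zip(headers, cells)))
    return records

# Usage (hypothetical URL):
# roster = scrape_roster("https://example-county.gov/jail/roster")
```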

Techlash? America’s Growing Concern with Major Technology Companies


Press Release: “Just a few years ago, Americans were overwhelmingly optimistic about the power of new technologies to foster an informed and engaged society. More recently, however, that confidence has been challenged by emerging concerns over the role that internet and technology companies — especially social media — now play in our democracy.

A new Knight Foundation and Gallup study explores how much the landscape has shifted. This wide-ranging study confirms that, for Americans, the techlash is real, widespread, and bipartisan. From concerns about the spread of misinformation to election interference and data privacy, we’ve documented the deep pessimism of folks across the political spectrum who believe tech companies have too much power — and that they do more harm than good. 

Despite their shared misgivings, Americans are deeply divided on how best to address these challenges. This report explores the contours of the techlash in the context of the issues currently animating policy debates in Washington and Silicon Valley. Below are the main findings from the executive summary….

  • 77% of Americans say major internet and technology companies like Facebook, Google, Amazon and Apple have too much power.
  • Americans are equally divided among those who favor (50%) and oppose (49%) government intervention that would require internet and technology companies to break up into smaller companies.
  • Americans do not trust social media companies much (44%) or at all (40%) to make the right decisions about what content should or should not be allowed on online platforms.
  • However, they would still prefer the companies (55%) to make those decisions rather than the government (44%). …(More)”.

Milwaukee’s Amani Neighborhood Uses Data to Target Traffic Safety and Build Trust


Article by Kassie Scott: “People in Milwaukee’s Amani neighborhood are using data to identify safety issues and build relationships with the police. It’s a story of community-engaged research at its best.

In 2017, the Milwaukee Police Department received a grant under the federal Byrne Criminal Justice Innovation program, now called the Community Based Crime Reduction Program, whose purpose is to bridge the gap between practitioners and researchers and advance the use of data in making communities safer. Because of its close ties to the Amani neighborhood, the Dominican Center was selected to lead this initiative, known as the Amani Safety Initiative, and it partnered with local churches, the district attorney’s office, LISC-Milwaukee, and others. To support the effort with data and coaching, the police department contracted with Data You Can Use.

Together with Data You Can Use, the Amani Safety Initiative team first implemented a survey to gauge perceptions of public safety and police legitimacy. Neighborhood ambassadors were trained (and paid) to conduct the survey themselves, going door to door to gather the information from nearly 300 of their neighbors. The ambassadors shared these results with their neighborhood during what they called “data chats.” They also printed summary survey results on door hangers, which they distributed throughout the neighborhood.

Neighbors and community organizations were surprised by the survey results. Though violent crime and mistrust in the police were commonly thought to be the biggest issues, the data showed that residents were most concerned about traffic safety. Ultimately, residents decided to post slow-down signs in intersections.

This project stands out for letting the people in the neighborhood lead the way. Neighbors collected data, shared results, and took action. The partnership between neighbors, police, and local organizations shows how people can drive decision-making for their neighborhood.

The larger story is one of social cohesion and mutual trust. Through participating in the initiative and learning more about their neighborhood, Amani neighbors built stronger relationships with the police. The police began coming to neighborhood community meetings, which helped them build relationships with people in the community and understand the challenges they face….(More)”.

Is Your Data Being Collected? These Signs Will Tell You Where


Flavie Halais at Wired: “Alphabet’s Sidewalk Labs is testing icons that provide “digital transparency” when information is collected in public spaces….

As cities incorporate digital technologies into their landscapes, they face the challenge of informing people of the many sensors, cameras, and other smart technologies that surround them. Few people have the patience to read through the lengthy privacy notice on a website or smartphone app. So how can a city let them know how they’re being monitored?

Sidewalk Labs, the Google sister company that applies technology to urban problems, is taking a shot. Through a project called Digital Transparency in the Public Realm, or DTPR, the company is demonstrating a set of icons, to be displayed in public spaces, that shows where and what kinds of data are being collected. The icons are being tested as part of Sidewalk Labs’ flagship project in Toronto, where it plans to redevelop a 12-acre stretch of the city’s waterfront. The signs would be displayed at each location where data would be collected—streets, parks, businesses, and courtyards.

Data collection is a core feature of the project, called Sidewalk Toronto, and the source of much of the controversy surrounding it. In 2017, Waterfront Toronto, the organization in charge of administering the redevelopment of the city’s eastern waterfront, awarded Sidewalk Labs the contract to develop the waterfront site. The project has ambitious goals: It says it could create 44,000 direct jobs by 2040 and has the potential to be the largest “climate-positive” community—removing more CO2 from the atmosphere than it produces—in North America. It will make use of new urban technology like modular street pavers and underground freight delivery. Sensors, cameras, and Wi-Fi hotspots will monitor and control traffic flows, building temperature, and crosswalk signals.

All that monitoring raises inevitable concerns about privacy, which Sidewalk aims to address—at least partly—by posting signs in the places where data is being collected.

The signs display a set of icons in the form of stackable hexagons, derived in part from a set of design rules developed by Google in 2014. Some describe the purpose for collecting the data (mobility, energy efficiency, or waste management, for example). Others refer to the type of data that’s collected, such as photos, air quality, or sound. When the data is identifiable, meaning it can be associated with a person, the hexagon is yellow. When the information is stripped of personal identifiers, the hexagon is blue…(More)”.
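As a rough illustration of how such a signage scheme might be encoded, the sketch below models a sign as a stack of hexagon icons, using the purposes, data types, and color rule described in the article. The class and value names are hypothetical assumptions; DTPR's actual taxonomy is richer than this.

```python
# Hypothetical encoding of the hexagon taxonomy described above. The
# purposes and data types come from the article's examples; the names
# and structure are assumptions, not DTPR's actual schema.
from dataclasses import dataclass
from enum import Enum

class Purpose(Enum):
    MOBILITY = "mobility"
    ENERGY_EFFICIENCY = "energy efficiency"
    WASTE_MANAGEMENT = "waste management"

class DataType(Enum):
    PHOTOS = "photos"
    AIR_QUALITY = "air quality"
    SOUND = "sound"

@dataclass
class Hexagon:
    data_type: DataType
    purpose: Purpose
    identifiable: bool  # can the data be associated with a person?

    @property
    def color(self) -> str:
        # Yellow flags identifiable data; blue means de-identified.
        return "yellow" if self.identifiable else "blue"

# A sign for a hypothetical camera counting de-identified pedestrian flows:
sign = [Hexagon(DataType.PHOTOS, Purpose.MOBILITY, identifiable=False)]
for icon in sign:
    print(icon.data_type.value, icon.purpose.value, icon.color)  # photos mobility blue
```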

Accelerating AI with synthetic data


Essay by Khaled El Emam: “The application of artificial intelligence and machine learning to solve today’s problems requires access to large amounts of data. One of the key obstacles faced by analysts is access to this data (for example, these issues were reflected in reports from the Government Accountability Office and the McKinsey Global Institute).

Synthetic data can help solve this data problem in a privacy preserving manner.

What is synthetic data?

Data synthesis is an emerging privacy-enhancing technology that can enable access to realistic data, which is information that may be synthetic but has the properties of an original dataset. It also simultaneously ensures that such information can be used and disclosed with reduced obligations under contemporary privacy statutes. Synthetic data retains the statistical properties of the original data. Therefore, there is an increasing number of use cases where it would serve as a proxy for real data.

Synthetic data is created by taking an original (real) dataset and then building a model to characterize the distributions and relationships in that data — this is called the “synthesizer.” The synthesizer is typically an artificial neural network or other machine learning technique that learns these (original) data characteristics. Once that model is created, it can be used to generate synthetic data. The data is generated from the model and does not have a 1:1 mapping to real data, meaning that the likelihood of mapping the synthetic records to real individuals would be very small — it is not considered personal information.
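As a minimal sketch of this fit-then-sample workflow, the example below uses a Gaussian mixture model as a stand-in synthesizer for numeric tabular data. As the essay notes, real synthesizers are typically neural networks; the "real" dataset here is simulated purely for illustration.

```python
# Minimal sketch of fit-then-sample data synthesis for numeric tabular
# data. A Gaussian mixture stands in for the synthesizer; the "real"
# dataset is simulated here purely for illustration.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in "real" dataset: 1,000 records with 3 numeric attributes.
real = rng.normal(loc=[50.0, 120.0, 0.3], scale=[10.0, 25.0, 0.1],
                  size=(1000, 3))

# Build the "synthesizer": a model of the distributions and
# relationships in the original data.
synthesizer = GaussianMixture(n_components=5, random_state=0).fit(real)

# Generate synthetic records from the model rather than from the real
# rows, so there is no 1:1 mapping back to real individuals.
synthetic, _ = synthesizer.sample(n_samples=1000)
```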

Many different types of data can be synthesized, including images, video, audio, text and structured data. The main focus in this article is on the synthesis of structured data.

Even though data can be generated in this manner, that does not mean it cannot be personal information. If the synthesizer is overfit to real data, then the generated data will replicate the original real data. Therefore, the synthesizer has to be constructed in a manner to avoid such overfitting. A formal privacy assurance should also be performed on the synthesized data to validate that there is a weak mapping between synthetic records and individuals….(More)”.
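Continuing the sketch above, one common form of such a privacy check (an illustrative assumption here, not a method prescribed by the essay) is to compare each synthetic record's distance to its closest real record against the spacing among the real records themselves; synthetic points that sit unusually close to real ones suggest the synthesizer has memorized its training data.

```python
# Continues from the sketch above (reuses `real`, `synthetic`, and `np`).
# Distance-to-closest-record check: if synthetic points sit much closer
# to real records than real records sit to each other, the synthesizer
# has likely memorized (overfit to) its training data.
from sklearn.neighbors import NearestNeighbors

# Each synthetic record's distance to its nearest real record.
nn_real = NearestNeighbors(n_neighbors=1).fit(real)
synth_to_real, _ = nn_real.kneighbors(synthetic)

# Baseline: each real record's distance to its nearest *other* real
# record (k=2 because a real point's nearest neighbor is itself).
real_to_real, _ = NearestNeighbors(n_neighbors=2).fit(real).kneighbors(real)

print("median synthetic-to-real distance:", float(np.median(synth_to_real[:, 0])))
print("median real-to-real distance:    ", float(np.median(real_to_real[:, 1])))
```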

Government by Algorithm: Artificial Intelligence in Federal Administrative Agencies


The Administrative Conference of the United States: “Artificial intelligence (AI) promises to transform how government agencies do their work. Rapid developments in AI have the potential to reduce the cost of core governance functions, improve the quality of decisions, and unleash the power of administrative data, thereby making government performance more efficient and effective. Agencies that use AI to realize these gains will also confront important questions about the proper design of algorithms and user interfaces, the respective scope of human and machine decision-making, the boundaries between public actions and private contracting, their own capacity to learn over time using AI, and whether the use of AI is even permitted.

These are important issues for public debate and academic inquiry. Yet little is known about how agencies are currently using AI systems beyond a few headline-grabbing examples or surface-level descriptions. Moreover, even amidst growing public and scholarly discussion about how society might regulate government use of AI, little attention has been devoted to how agencies acquire such tools in the first place or oversee their use. In an effort to fill these gaps, the Administrative Conference of the United States (ACUS) commissioned this report from researchers at Stanford University and New York University. The research team included a diverse set of lawyers, law students, computer scientists, and social scientists with the capacity to analyze these cutting-edge issues from technical, legal, and policy angles. The resulting report offers three cuts at federal agency use of AI:

  • a rigorous canvass of AI use at the 142 most significant federal departments, agencies, and sub-agencies (Part I);
  • a series of in-depth but accessible case studies of specific AI applications at seven leading agencies covering a range of governance tasks (Part II); and
  • a set of cross-cutting analyses of the institutional, legal, and policy challenges raised by agency use of AI (Part III)….(More)”.

How Philanthropy Can Help Lead on Data Justice


Louise Lief at Stanford Social Innovation Review: “Today, data governs almost every aspect of our lives, shaping the opportunities we have, how we perceive reality and understand problems, and even what we believe to be possible. Philanthropy is particularly data driven, relying on it to inform decision-making, define problems, and measure impact. But what happens when data design and collection methods are flawed, lack context, or contain critical omissions and misdirected questions? With bad data, data-driven strategies can misdiagnose problems and worsen inequities with interventions that don’t reflect what is needed.

Data justice begins by asking who controls the narrative. Who decides what data is collected and for which purpose? Who interprets what it means for a community? Who governs it? In recent years, affected communities, social justice philanthropists, and academics have all begun looking deeper into the relationship between data and social justice in our increasingly data-driven world. But philanthropy can play a game-changing role in developing practices of data justice to more accurately reflect the lived experience of communities being studied. Simply incorporating data justice principles into everyday foundation practice—and requiring the same of grantees—would be transformative: It would not only revitalize research, strengthen communities, influence policy, and accelerate social change; it would also help address deficiencies in current government data sets.

When Data Is Flawed

Some of the most pioneering work on data justice has been done by Native American communities, who have suffered more than most from problems with bad data. A 2017 analysis of American Indian data challenges—funded by the W.K. Kellogg Foundation and the Morris K. Udall and Stewart L. Udall Foundation—documented how much data on Native American communities is of poor quality, inaccurate, inadequate, inconsistent, irrelevant, and/or inaccessible. The National Congress of American Indians even described Native American communities as “The Asterisk Nation,” because in many government data sets they are represented only by an asterisk denoting sampling errors instead of data points.

Where it concerns Native Americans, data is often not standardized and different government databases identify tribal members at least seven different ways using different criteria; federal and state statistics often misclassify race and ethnicity; and some data collection methods don’t allow tribes to count tribal citizens living off the reservation. For over a decade the Department of the Interior’s Bureau of Indian Affairs has struggled to capture the data it needs for a crucial labor force report it is legally required to produce; methodology errors and reporting problems have been so extensive that at times it prevented the report from even being published. But when the Department of the Interior changed several reporting requirements in 2014 and combined data submitted by tribes with US Census data, it only compounded the problem, making historical comparisons more difficult. Moreover, Native Americans have charged that the Census Bureau significantly undercounts both the American Indian population and key indicators like joblessness….(More)”.