Augmented Reality and the Surveillance Society


Mark Pesce at IEEE Spectrum: “First articulated in a 1965 white paper by Ivan Sutherland, titled ‘The Ultimate Display,’ augmented reality (AR) lay beyond our technical capacities for 50 years. That changed when smartphones began providing people with a combination of cheap sensors, powerful processors, and high-bandwidth networking—the trifecta needed for AR to generate its spatial illusions. Among today’s emerging technologies, AR stands out as particularly demanding—for computational power, for sensed data, and, I’d argue, for attention to the danger it poses.

Unlike virtual-reality (VR) gear, which creates a completely synthetic experience for the user, AR gear adds to the user’s perception of her environment. To do that effectively, AR systems need to know where in space the user is located. VR systems originally relied on expensive and fragile outside-in tracking, often requiring external sensors to be set up around the room. But the new generation of gear accomplishes this through a set of techniques collectively known as simultaneous localization and mapping (SLAM). These systems harvest a rich stream of observational data—mostly from cameras affixed to the user’s headgear, but sometimes also from sonar, lidar, structured light, and time-of-flight sensors—using those measurements to update a continuously evolving model of the user’s spatial environment.
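
To make the SLAM idea concrete, here is a heavily simplified toy sketch: a user’s estimated position drifts under noisy odometry, while repeated noisy observations of a landmark are used both to build a map estimate and to correct the pose. All numbers, noise levels, and update rules below are illustrative assumptions; real AR trackers fuse camera, lidar, and inertial data with far more sophisticated filters.

```python
# Toy sketch of the SLAM loop: alternate between mapping (refine a landmark
# estimate from observations) and localization (correct the pose against the
# map). Purely illustrative; not how a production AR tracker works.
import numpy as np

rng = np.random.default_rng(0)
true_landmark = np.array([5.0, 3.0])   # a fixed point in the room
true_pose = np.array([0.0, 0.0])       # where the user actually is
est_pose = np.array([0.0, 0.0])        # where the system thinks the user is
landmark_est = None                    # the "map" starts empty

for step in range(50):
    # Motion: the user walks +0.1 m in x; odometry reports it with noise.
    true_pose = true_pose + np.array([0.1, 0.0])
    est_pose = est_pose + np.array([0.1, 0.0]) + rng.normal(0, 0.02, 2)

    # Sensing: a noisy measurement of the vector from user to landmark.
    measurement = true_landmark - true_pose + rng.normal(0, 0.05, 2)

    # Mapping: fold the new observation into the landmark estimate.
    observed = est_pose + measurement
    landmark_est = (observed if landmark_est is None
                    else 0.9 * landmark_est + 0.1 * observed)

    # Localization: nudge the pose toward what the map implies.
    est_pose += 0.2 * ((landmark_est - measurement) - est_pose)

print("landmark estimate:", landmark_est.round(2), "truth:", true_landmark)
print("pose estimate:", est_pose.round(2), "truth:", true_pose.round(2))
```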

For safety’s sake, VR systems must be restricted to certain tightly constrained areas, lest someone blinded by VR goggles tumble down a staircase. AR doesn’t hide the real world, though, so people can use it anywhere. That’s important because the purpose of AR is to add helpful (or perhaps just entertaining) digital illusions to the user’s perceptions. But AR has a second, less appreciated, facet: It also functions as a sophisticated mobile surveillance system.

This second quality is what makes Facebook’s recent Project Aria experiment so unnerving. Nearly four years ago, Mark Zuckerberg announced Facebook’s goal to create AR “spectacles”—consumer-grade devices that could one day rival the smartphone in utility and ubiquity. That’s a substantial technical ask, so Facebook’s research team has taken an incremental approach. Project Aria packs the sensors necessary for SLAM within a form factor that resembles a pair of sunglasses. Wearers collect copious amounts of data, which is fed back to Facebook for analysis. This information will presumably help the company to refine the design of an eventual Facebook AR product.

The concern here is obvious: When these glasses come to market in a few years, they will transform their wearers into data-gathering minions for Facebook. Tens, then hundreds of millions of these AR spectacles will be mapping the contours of the world, along with all of its people, pets, possessions, and peccadilloes. The prospect of such intensive surveillance at planetary scale poses some tough questions about who will be doing all this watching and why….(More)”.

Inside India’s booming dark data economy


Snigdha Poonam and Samarath Bansal at the Rest of the World: “…The black market for data, as it exists online in India, resembles those for wholesale vegetables or smuggled goods. Customers are encouraged to buy in bulk, and the variety of what’s on offer is mind-boggling: There are databases about parents, cable customers, pregnant women, pizza eaters, mutual funds investors, and almost any niche group one can imagine. A typical database consists of a spreadsheet with row after row of names and key details: Sheila Gupta, 35, lives in Kolkata, runs a travel agency, and owns a BMW; Irfaan Khan, 52, lives in Greater Noida, and has a son who just applied to engineering college. The databases are usually updated every three months (the older one is, the less it is worth), and if you buy several at the same time, you’ll get a discount. Business is always brisk, and transactions are conducted quickly. No one will ask you for your name, let alone inquire why you want the phone numbers of five million people who have applied for bank loans.

There isn’t a reliable estimate of the size of India’s data economy or of how much money it generates annually. Regarding the former, each broker we spoke to had a different guess: One said only about one or two hundred professionals make up the top tier, another that every big Indian city has at least a thousand people trading data. To find them, potential customers need only look for their ads on social media or run searches with industry keywords and hashtags — “data,” “leads,” “database” — combined with detailed information about the kind of data they want and the city they want it from.

Privacy experts believe that the data-brokering industry has existed since the early days of the internet’s arrival in India. “Databases have been bought and sold in India for at least 15 years now. I remember a case from way back in 2006 of leaked employee data from Naukri.com (one of India’s first online job portals) being sold on CDs,” says Nikhil Pahwa, the editor and publisher of MediaNama, which covers technology policy. By 2009, data brokers were running SMS-marketing companies that offered complementary services: procuring targeted data and sending text messages in bulk. Back then, there was simply less data, “and those who had it could sell it at whatever price,” says Himanshu Bhatt, a data broker who claims to be retired. That is no longer the case: “Today, everyone has every kind of data,” he said.

No broker we contacted would openly discuss their methods of hunting, harvesting, and selling data. But the day-to-day work generally consists of following the trails that people leave during their travels around the internet. Brokers trawl data storage websites armed with a digital fishing net. “I was shocked when I was surfing [cloud-hosted data sites] one day and came across Aadhaar cards,” Bhatt remarked, referring to India’s state-issued biometric ID cards. Images of them were available to download in bulk, alongside completed loan applications and salary sheets.

Again, the legal boundaries here are far from clear. Anybody who has ever filled out a form on a coupon website or requested a refund for a movie ticket has effectively entered their information into a database that can be sold without their consent by the company it belongs to. A neighborhood cell phone store can sell demographic information to a political party for hyperlocal campaigning, and a fintech company can stealthily transfer an individual’s details from an astrology app onto its own server, to gauge that person’s creditworthiness. When somebody shares employment history on LinkedIn or contact details on a public directory, brokers can use basic software such as web scrapers to extract that data.
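
For illustration, the “basic software” the passage mentions can amount to a dozen lines of code. The sketch below is hypothetical in every particular (URL, page markup, CSS selectors) and uses the common requests and BeautifulSoup libraries; it shows why publicly posted details are so easy to aggregate. Scraping personal data this way may violate laws and almost certainly breaches terms of service.

```python
# Minimal sketch of a directory scraper. The URL and selectors are assumed,
# not real; the point is only how little machinery bulk extraction requires.
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/directory?page=1", timeout=10)
soup = BeautifulSoup(resp.text, "html.parser")

records = []
for entry in soup.select("div.listing"):       # hypothetical page structure
    name = entry.select_one(".name")
    phone = entry.select_one(".phone")
    if name and phone:
        records.append({
            "name": name.get_text(strip=True),
            "phone": phone.get_text(strip=True),
        })

print(f"extracted {len(records)} records")
```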

But why bother hacking into a database when you can buy it outright? More often, “brokers will directly approach a bank employee and tell them, ‘I need the high-end database’,” Bhatt said. And as demand for information increases, so, too, does data vulnerability. A 2019 survey found that 69% of Indian companies haven’t set up reliable data security systems; 44% have experienced at least one breach already. “In the past 12 months, we have seen an increasing trend of Indians’ data [appearing] on the dark web,” says Beenu Arora, the CEO of the global cyberintelligence firm Cyble….(More)”.

Consumer Bureau To Decide Who Owns Your Financial Data


Article by Jillian S. Ambroz: “A federal agency is gearing up to make wide-ranging policy changes on consumers’ access to their financial data.

The Consumer Financial Protection Bureau (CFPB) is looking to implement the provision of the 2010 Dodd-Frank Wall Street Reform and Consumer Protection Act that pertains to consumers’ rights to their own financial data, detailed in Section 1033.

The agency has been laying the groundwork for this move for years, from requesting information from financial institutions in 2016 to hosting a symposium earlier this year on the problems of screen scraping, a risky but common method of collecting consumer data.

Now the agency, which was established by the Dodd-Frank Act, is asking for comments on this critical and controversial topic ahead of the proposed rulemaking. Unlike other regulations that affect single industries, this could be all-encompassing because the consumer data rule touches almost every market the agency covers, according to the story in American Banker.

The Trump administration all but ‘systematically neutered’ the agency.

With the rulemaking, the agency seeks to clarify its compliance expectations and help establish market practices that ensure consumers have access to their financial data. The agency sees an opportunity here to help shape this evolving area of financial technology, or fintech, recognizing both the opportunities and the risks to consumers as more fintechs become enmeshed with their data and day-to-day lives.

Its goal is “to better effectuate consumer access to financial records,” as stated in the regulatory filing….(More)”.

Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation


Paper by Khaled El Emam et al: “There has been growing interest in data synthesis for enabling the sharing of data for secondary analysis; however, there is a need for a comprehensive privacy risk model for fully synthetic data: If the generative models have been overfit, then it is possible to identify individuals from synthetic data and learn something new about them.

Objective: The purpose of this study is to develop and apply a methodology for evaluating the identity disclosure risks of fully synthetic data.

Methods: A full risk model is presented, which evaluates both identity disclosure and the ability of an adversary to learn something new if there is a match between a synthetic record and a real person. We term this “meaningful identity disclosure risk.” The model is applied on samples from the Washington State Hospital discharge database (2007) and the Canadian COVID-19 cases database. Both of these datasets were synthesized using a sequential decision tree process commonly used to synthesize health and social science data.
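
To give a flavour of what such a risk model measures, here is a toy illustration only, not the authors’ full methodology: synthetic records are matched to real ones on quasi-identifiers, each match is weighted by the size of its matching group, and it counts only when the adversary would also learn a new sensitive value. All column names and records below are made up.

```python
# Toy "meaningful identity disclosure" calculation (illustrative only).
# A match on quasi-identifiers contributes 1/k (k = matching group size),
# and only if the adversary would learn the sensitive attribute correctly.
import pandas as pd

real = pd.DataFrame({
    "age":       [34, 34, 51, 51, 51],
    "zip":       ["98101", "98101", "98102", "98102", "98102"],
    "diagnosis": ["flu", "asthma", "covid", "flu", "covid"],
})
synthetic = pd.DataFrame({
    "age":       [34, 51, 29],
    "zip":       ["98101", "98102", "98103"],
    "diagnosis": ["asthma", "covid", "flu"],
})

quasi_ids = ["age", "zip"]
risk = 0.0
for _, s in synthetic.iterrows():
    matches = real[(real[quasi_ids] == s[quasi_ids]).all(axis=1)]
    if len(matches) == 0:
        continue  # no real person shares these quasi-identifiers
    learns_something = (matches["diagnosis"] == s["diagnosis"]).any()
    risk += (1.0 / len(matches)) * learns_something

risk /= len(synthetic)
print(f"toy disclosure risk: {risk:.3f}")  # compare to a threshold like 0.09
```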

Results: The meaningful identity disclosure risk for both synthesized samples was below the commonly used 0.09 risk threshold (0.0198 and 0.0086, respectively), and 4 and 5 times lower than the risk values for the corresponding original datasets.

Conclusions: We have presented a comprehensive identity disclosure risk model for fully synthetic data. The results for this synthesis method on 2 datasets demonstrate that synthesis can reduce meaningful identity disclosure risks considerably. The risk model can be applied in the future to evaluate the privacy of fully synthetic data….(More)”.

How the U.S. Military Buys Location Data from Ordinary Apps


Joseph Cox at Vice: “The U.S. military is buying the granular movement data of people around the world, harvested from innocuous-seeming apps, Motherboard has learned. The most popular app among a group Motherboard analyzed connected to this sort of data sale is a Muslim prayer and Quran app that has more than 98 million downloads worldwide. Others include a Muslim dating app, a popular Craigslist app, an app for following storms, and a “level” app that can be used to help, for example, install shelves in a bedroom.

Through public records, interviews with developers, and technical analysis, Motherboard uncovered two separate, parallel data streams that the U.S. military uses, or has used, to obtain location data. One relies on a company called Babel Street, which creates a product called Locate X. U.S. Special Operations Command (USSOCOM), a branch of the military tasked with counterterrorism, counterinsurgency, and special reconnaissance, bought access to Locate X to assist on overseas special forces operations. The other stream is through a company called X-Mode, which obtains location data directly from apps, then sells that data to contractors, and by extension, the military.

The news highlights the opaque location data industry and the fact that the U.S. military, which has infamously used other location data to target drone strikes, is purchasing access to sensitive data. Many of the users of apps involved in the data supply chain are Muslim, which is notable considering that the United States has waged a decades-long war on predominantly Muslim terror groups in the Middle East, and has killed hundreds of thousands of civilians during its military operations in Pakistan, Afghanistan, and Iraq. Motherboard does not know of any specific operations in which this type of app-based location data has been used by the U.S. military.

The apps sending data to X-Mode include Muslim Pro, an app that reminds users when to pray and what direction Mecca is in relation to the user’s current location. The app has been downloaded over 50 million times on Android, according to the Google Play Store, and over 98 million times in total across all platforms, including iOS, according to Muslim Pro’s website….(More)”.

The responsible use of data for and about children: treading carefully and ethically


Q&A with Stefaan G. Verhulst and Andrew Young: “…working in collaboration with UNICEF on an initiative called Responsible Data for Children (RD4C). Its focus is on data – the risks it poses to children, as well as the opportunities it offers.

You have been working with UNICEF on the Responsible Data for Children initiative (RD4C). What is this and why do we need to be talking more about ‘responsible data’?

To date, the relationship between the datafication of everyday life and child welfare has been under-explored, both by researchers in data ethics and those who work to advance the rights of children. This neglect is a lost opportunity, and also poses a risk to children.

Today’s children are the first generation to grow up amid the rapid datafication of virtually every aspect of social, cultural, political and economic life. This alone calls for greater scrutiny of the role played by data. An entire generation is being datafied, often starting before birth: every year, the average child will have more data collected about them over their lifetime than a similar child born any year prior. Ironically, humanitarian and development organizations working with children are themselves among the key actors contributing to the increased collection of data. These organizations rely on a wide range of technologies, including biometrics, digital identity systems, remote-sensing technologies, mobile and social media messaging apps, and administrative data systems. The data generated by these tools and platforms inevitably includes potentially sensitive personally identifiable information (PII) and demographically identifiable information (DII). All of this demands much closer scrutiny, and a more systematic framework to guide how child-related data is collected, stored, and used.

Towards this aim, we have also been working with the Data for Children Collaborative, based in Edinburgh, to establish innovative and ethical practices around the use of data to improve the lives of children worldwide….(More)”.

Federated Learning for Privacy-Preserving Data Access


Paper by Małgorzata Śmietanka, Hirsh Pithadia and Philip Treleaven: “Federated learning is a pioneering privacy-preserving data technology and also a new machine learning model trained on distributed data sets.

Companies collect huge amounts of historic and real-time data to drive their business and collaborate with other organisations. However, data privacy is becoming increasingly important because of regulations (e.g. EU GDPR) and the need to protect their sensitive and personal data. Companies need to manage data access: firstly within their organisations (so they can control staff access), and secondly by protecting raw data when collaborating with third parties. What is more, companies are increasingly looking to ‘monetize’ the data they’ve collected. However, under new legislation, the use of data across different organisations is becoming increasingly difficult (Yu, 2016).

Federated learning, pioneered by Google, is an emerging privacy-preserving data technology and a new class of distributed machine learning model. This paper discusses federated learning as a solution for privacy-preserving data access and distributed machine learning applied to distributed data sets. It also presents a privacy-preserving federated learning infrastructure….(More)”.
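
As a rough illustration of the core idea — this is a toy numpy sketch of federated averaging (FedAvg), the canonical algorithm, not the authors’ infrastructure — each organisation fits a model on its own data and shares only parameters, never raw records, with a coordinator that averages them.

```python
# Minimal federated averaging (FedAvg) sketch on toy linear regression.
# Each "company" takes a few local gradient steps on its private data;
# the coordinator sees only model parameters, never the raw X or y.
import numpy as np

rng = np.random.default_rng(42)
true_w = np.array([2.0, -1.0])

# Three organisations, each holding a private local dataset.
local_data = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(0, 0.1, 100)
    local_data.append((X, y))

w_global = np.zeros(2)
for round_ in range(20):
    local_models = []
    for X, y in local_data:
        w = w_global.copy()
        for _ in range(5):                       # local gradient steps
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= 0.05 * grad
        local_models.append(w)                   # only parameters leave
    w_global = np.mean(local_models, axis=0)     # coordinator aggregates

print("federated estimate:", w_global.round(3), "truth:", true_w)
```

A real deployment would add secure aggregation and differential privacy on top of this loop, since model updates themselves can leak information.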

Not fit for Purpose: A critical analysis of the ‘Five Safes’


Paper by Chris Culnane, Benjamin I. P. Rubinstein, and David Watts: “Adopted by government agencies in Australia, New Zealand, and the UK as a policy instrument or as embodied into legislation, the ‘Five Safes’ framework aims to manage risks of releasing data derived from personal information. Despite its popularity, the Five Safes has undergone little legal or technical critical analysis. We argue that the Five Safes is fundamentally flawed: it is disconnected from existing legal protections; it appropriates notions of safety without providing any means to prefer strong technical measures; and it views disclosure risk as static through time, requiring no repeat assessment. The Five Safes provides little confidence that resulting data sharing is performed using ‘safety’ best practice or for purposes in service of the public interest….(More)”.

Third Wave of Open Data


Paper (and site) by Stefaan G. Verhulst, Andrew Young, Andrew J. Zahuranec, Susan Ariel Aaronson, Ania Calderon, and Matt Gee on “How To Accelerate the Re-Use of Data for Public Interest Purposes While Ensuring Data Rights and Community Flourishing”: “The paper begins with a description of earlier waves of open data. Emerging from freedom of information laws adopted over the last half century, the First Wave of Open Data brought about newfound transparency, albeit one only available on request to an audience largely composed of journalists, lawyers, and activists. 

The Second Wave of Open Data, seeking to go beyond access to public records and inspired by the open source movement, called upon national governments to make their data open by default. Yet this approach, too, had its limitations, leaving many data silos at the subnational level and in the private sector untouched.

The Third Wave of Open Data seeks to build on earlier successes and take into account lessons learned to help open data realize its transformative potential. Incorporating insights from various data experts, the paper describes the emergence of a Third Wave driven by the following goals:

  1. Publishing with Purpose by matching the supply of data with the demand for it, providing assets that match public interests;
  2. Fostering Partnerships and Data Collaboration by forging relationships with community-based organizations, NGOs, small businesses, local governments, and others who understand how data can be translated into meaningful real-world action;
  3. Advancing Open Data at the Subnational Level by providing resources to cities, municipalities, states, and provinces to address the lack of subnational information in many regions;
  4. Prioritizing Data Responsibility and Data Rights by understanding the risks of using (and not using) data to promote and preserve the public’s general welfare.

Riding the Wave

Achieving these goals will not be an easy task and will require investments and interventions across the data ecosystem. The paper highlights eight actions that decision makers and policymakers can take to foster more equitable, impactful benefits…(More) (PDF)”.

Consumer Reports Study Finds Marketplace Demand for Privacy and Security


Press Release: “American consumers are increasingly concerned about privacy and data security when purchasing new products and services, which may give a competitive advantage to companies that act on these consumer values, a new Consumer Reports study finds. 

The new study, “Privacy Front and Center” from CR’s Digital Lab with support from Omidyar Network, looks at the commercial benefits for companies that differentiate their products based on privacy and data security. The study draws from a nationally representative CR survey of 5,085 adult U.S. residents conducted in February 2020, a meta-analysis of 25 years of public opinion studies, and a conjoint analysis that seeks to quantify how consumers weigh privacy and security in their hardware and software purchasing decisions. 
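
For readers unfamiliar with conjoint analysis, the toy sketch below simulates the approach: respondents choose among product profiles that vary on price and a privacy attribute, and the coefficients of a choice model (“part-worths”) yield an implied willingness to pay for privacy. The data, attribute levels, and the $50 “true” privacy value are simulated assumptions, not CR’s survey results.

```python
# Toy conjoint analysis: simulate purchase choices over product profiles,
# then recover how much price a privacy feature is "worth". Simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
price = rng.choice([100.0, 150.0, 200.0], n)   # product price in dollars
privacy = rng.choice([0.0, 1.0], n)            # 1 = strong privacy/security

# Assumed choice model: privacy is worth 1.0 / 0.02 = $50 to respondents.
utility = 3.0 - 0.02 * price + 1.0 * privacy
prob_buy = 1.0 / (1.0 + np.exp(-utility))
chose = rng.binomial(1, prob_buy)

X = np.column_stack([price, privacy])
fit = LogisticRegression().fit(X, chose)
b_price, b_privacy = fit.coef_[0]
# Part-worth ratio converts the privacy coefficient into dollars.
print(f"implied willingness to pay for privacy: ${-b_privacy / b_price:.0f}")
```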

“This study shows that raising the standard for privacy and security is a win-win for consumers and the companies,” said Ben Moskowitz, the director of the Digital Lab at Consumer Reports. “Given the rapid proliferation of internet connected devices, the rise in data breaches and cyber attacks, and the demand from consumers for heightened privacy and security measures, there’s an undeniable business case for companies to invest in creating more private and secure products.” 

Here are some of the key findings from the study:

  • According to CR’s February 2020 nationally representative survey, 74% of consumers are at least moderately concerned about the privacy of their personal data.
  • Nearly all Americans (96%) agree that more should be done to ensure that companies protect the privacy of consumers.
  • A majority of smart product owners (62%) worry about potential loss of privacy when buying them for their home or family.
  • The privacy/security conscious consumer class seems to include more men and people of color.
  • Experiencing a data breach correlates with a higher willingness to pay for privacy, and 30% of Americans have experienced one.
  • Of the Android users who switched to iPhones, 32% indicated doing so because of Apple’s perceived privacy or security benefits relative to Android….(More)”.