The Great Scrape: The Clash Between Scraping and Privacy


Paper by Daniel J. Solove and Woodrow Hartzog: “Artificial intelligence (AI) systems depend on massive quantities of data, often gathered by “scraping” – the automated extraction of large amounts of data from the internet. A great deal of scraped data is about people. This personal data provides the grist for AI tools such as facial recognition, deep fakes, and generative AI. Although scraping enables web searching, archival, and meaningful scientific research, scraping for AI can also be objectionable or even harmful to individuals and society.

Organizations are scraping at an escalating pace and scale, even though many privacy laws are seemingly incongruous with the practice. In this Article, we contend that scraping must undergo a serious reckoning with privacy law. Scraping violates nearly all of the key principles in privacy laws, including fairness; individual rights and control; transparency; consent; purpose specification and secondary use restrictions; data minimization; onward transfer; and data security. With scraping, data protection laws built around these requirements are ignored.

Scraping has evaded a reckoning with privacy law largely because scrapers act as if all publicly available data were free for the taking. But the public availability of scraped data shouldn’t give scrapers a free pass. Privacy law regularly protects publicly available data, and privacy principles are implicated even when personal data is accessible to others.

This Article explores the fundamental tension between scraping and privacy law. With the zealous pursuit and astronomical growth of AI, we are in the midst of what we call the “great scrape.” There must now be a great reconciliation…(More)”.
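
The mechanics behind the paper’s definition of scraping are worth making concrete. The sketch below is a hypothetical illustration only (the profile URL pattern, HTML structure, and extracted fields are assumptions, not drawn from the Article): it shows how little code “the automated extraction of large amounts of data from the internet” actually requires once a page is publicly reachable, which is part of why it scales to the volumes the authors describe.

```typescript
// Hypothetical scraping sketch: the URL, CSS class, and field names are invented.
// Fetch a publicly reachable profile page and pull out name- and email-like strings,
// i.e., the kind of "publicly available" personal data the Article discusses.
async function scrapeProfile(url: string): Promise<{ names: string[]; emails: string[] }> {
  const res = await fetch(url); // automated retrieval of a public page
  const html = await res.text();

  // Naive pattern matching stands in for a real HTML parser; good enough for a sketch.
  const names = [...html.matchAll(/<h1 class="profile-name">([^<]+)<\/h1>/g)].map((m) => m[1]);
  const emails = [...html.matchAll(/[\w.+-]+@[\w.-]+\.\w{2,}/g)].map((m) => m[0]);

  return { names, emails };
}

// Repeated across millions of pages, the same few lines become mass collection:
// for (const url of publicProfileUrls) { results.push(await scrapeProfile(url)); }
```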

Everyone Has A Price — And Corporations Know Yours


Article by David Dayen: “Six years ago, I was at a conference at the University of Chicago, the intellectual heart of corporate-friendly capitalism, when my eyes found the cover of the Chicago Booth Review, the business school’s flagship publication. “Are You Ready for Personalized Pricing?” the headline asked. I wasn’t, so I started reading.

The story looked at how online shopping, persistent data collection, and machine-learning algorithms could combine to generate the stuff of economists’ dreams: individual prices for each customer. It even recounted an experiment in 2015, where online employment website ZipRecruiter essentially outsourced its pricing strategy to two University of Chicago economists, Sanjog Misra and Jean-Pierre Dubé…(More)”.
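
The mechanism Dayen describes can be sketched in a few lines. The example below is purely hypothetical (the customer features, model weights, and candidate prices are invented and have nothing to do with ZipRecruiter’s actual experiment): a toy demand model predicts each customer’s probability of buying at a candidate price, and the seller quotes whichever price maximizes expected revenue for that particular customer.

```typescript
// Hypothetical personalized-pricing sketch: all features and weights are invented.
// A predicted purchase probability at each candidate price drives a per-customer price.

type Customer = { income: number; priorVisits: number; usedCoupon: boolean };

// Toy demand model: probability of purchase falls as price rises,
// but less steeply for customers the model thinks are less price-sensitive.
function purchaseProbability(c: Customer, price: number): number {
  const sensitivity = 0.05 - 0.0000002 * c.income + (c.usedCoupon ? 0.02 : 0);
  const z = 2.0 - sensitivity * price + 0.1 * c.priorVisits;
  return 1 / (1 + Math.exp(-z)); // logistic link
}

// Choose the candidate price with the highest expected revenue for this customer.
function personalizedPrice(c: Customer, candidates: number[]): number {
  let best = candidates[0] ?? 0;
  let bestRevenue = -Infinity;
  for (const p of candidates) {
    const expectedRevenue = p * purchaseProbability(c, p);
    if (expectedRevenue > bestRevenue) {
      bestRevenue = expectedRevenue;
      best = p;
    }
  }
  return best;
}

// Two customers, same product, different quoted prices:
console.log(personalizedPrice({ income: 40000, priorVisits: 1, usedCoupon: true }, [99, 149, 199, 249]));
console.log(personalizedPrice({ income: 160000, priorVisits: 8, usedCoupon: false }, [99, 149, 199, 249]));
```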

How the Rise of the Camera Launched a Fight to Protect Gilded Age Americans’ Privacy


Article by Sohini Desai: “In 1904, a widow named Elizabeth Peck had her portrait taken at a studio in a small Iowa town. The photographer sold the negatives to Duffy’s Pure Malt Whiskey, a company that avoided liquor taxes for years by falsely advertising its product as medicinal. Duffy’s ads claimed the fantastical: that it cured everything from influenza to consumption, that it was endorsed by clergymen, that it could help you live until the age of 106. The portrait of Peck ended up in one of these dubious ads, published in newspapers across the country alongside what appeared to be her unqualified praise: “After years of constant use of your Pure Malt Whiskey, both by myself and as given to patients in my capacity as nurse, I have no hesitation in recommending it.”

Duffy’s lies were numerous. Peck (misleadingly identified as “Mrs. A. Schuman”) was not a nurse, and she had not spent years constantly slinging back malt beverages. In fact, she fully abstained from alcohol. Peck never consented to the ad.

The camera’s first great age—which began in 1888 when George Eastman debuted the Kodak—is full of stories like this one. Beyond the wonders of a quickly developing art form and technology lay widespread lack of control over one’s own image, perverse incentives to make a quick buck, and generalized fear at the prospect of humiliation and the invasion of privacy…(More)”.

Cryptographers Discover a New Foundation for Quantum Secrecy


Article by Ben Brubaker: “…Say you want to send a private message, cast a secret vote or sign a document securely. If you do any of these tasks on a computer, you’re relying on encryption to keep your data safe. That encryption needs to withstand attacks from codebreakers with their own computers, so modern encryption methods rely on assumptions about what mathematical problems are hard for computers to solve.

But as cryptographers laid the mathematical foundations for this approach to information security in the 1980s, a few researchers discovered that computational hardness wasn’t the only way to safeguard secrets. Quantum theory, originally developed to understand the physics of atoms, turned out to have deep connections to information and cryptography. Researchers found ways to base the security of a few specific cryptographic tasks directly on the laws of physics. But these tasks were strange outliers — for all others, there seemed to be no alternative to the classical computational approach.

By the end of the millennium, quantum cryptography researchers thought that was the end of the story. But in just the past few years, the field has undergone another seismic shift.

“There’s been this rearrangement of what we believe is possible with quantum cryptography,” said Henry Yuen, a quantum information theorist at Columbia University.

In a string of recent papers, researchers have shown that most cryptographic tasks could still be accomplished securely even in hypothetical worlds where practically all computation is easy. All that matters is the difficulty of a special computational problem about quantum theory itself.

“The assumptions you need can be way, way, way weaker,” said Fermi Ma, a quantum cryptographer at the Simons Institute for the Theory of Computing in Berkeley, California. “This is giving us new insights into computational hardness itself.”…(More)”.

Uganda’s Sweeping Surveillance State Is Built on National ID Cards


Article by Olivia Solon: “Uganda has spent hundreds of millions of dollars in the past decade on biometric tools that document a person’s unique physical characteristics, such as their face, fingerprints and irises, to form the basis of a comprehensive identification system. While the system is central to many of the state’s everyday functions, as Museveni has grown increasingly authoritarian over nearly four decades in power, it has also become a powerful mechanism for surveilling politicians, journalists, human rights advocates and ordinary citizens, according to dozens of interviews and hundreds of pages of documents obtained and analyzed by Bloomberg and nonprofit investigative newsroom Lighthouse Reports.

It’s a cautionary tale for any country considering establishing a biometric identity system without rigorous checks and balances and input from civil society. Dozens of global south countries have adopted this approach as part of an effort to meet sustainable development goals from the UN, which considers having a legal identity to be a fundamental human right. But, despite billions of dollars of investment, with backing from organizations including the World Bank, those identity systems haven’t always lived up to expectations. In many cases, the key problem is the failure to register large swathes of the population, leading to exclusion from public services. But in other places, like Uganda, inclusion in the system has been weaponized for surveillance purposes.

A year-long investigation by Bloomberg and Lighthouse Reports sheds new light on the ways in which Museveni’s regime has built and deployed this system to target opponents and consolidate power. It shows how the underlying software and data sets are easily accessed by individuals at all levels of law enforcement, despite official claims to the contrary. It also highlights, in some cases for the first time, how senior government and law enforcement officials have used these tools to target individuals deemed to pose a political threat…(More)”.

What are location services and how do they work?


Article by Douglas Crawford: “Location services refer to a combination of technologies used in devices like smartphones and computers that use data from your device’s GPS, WiFi, mobile (cellular networks), and sometimes even Bluetooth connections to determine and track your geographic location.

This information can be accessed by your operating system (OS) and the apps installed on your device. In many cases, this allows them to perform their purpose correctly or otherwise deliver useful content and features. 

For example, navigation/map, weather, ridesharing (such as Uber or Lyft), and health and fitness tracking apps require location services to perform their functions, while dating, travel, and social media apps can offer additional functionality with access to your device’s location services (such as being able to locate a Tinder match or see recommendations for nearby restaurants).

There’s no doubt location services (and the apps that use them) can be useful. However, the technology can also be (and is) abused by apps to track your movements. The apps then usually sell this information to advertising and analytics companies that combine it with other data to create a profile of you, which they can then use to sell ads.

Unfortunately, this behavior is not limited to “rogue” apps. Apps usually regarded as legitimate, including almost all Google apps, Facebook, Instagram, and others, routinely send detailed and highly sensitive location details back to their developers by default. And it’s not just apps — operating systems themselves, such as Google’s Android and Microsoft Windows, also closely track your movements using location services.

This makes weighing the undeniable usefulness of location services against the need to maintain a basic level of privacy a tricky balancing act. However, because location services are so easy to abuse, all operating systems include built-in safeguards that give you some control over their use.

In this article, we’ll look at how location services work…(More)”.
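
For a concrete sense of how an app taps into a device’s location services, the sketch below uses the web’s standard Geolocation API in the browser; the callback structure and options are part of that API, though the restaurant lookup is just a placeholder. The platform, not the app, decides whether the fix comes from GPS, WiFi, cell towers, or Bluetooth, and the permission prompt this call triggers is one of the built-in safeguards the article refers to.

```typescript
// Browser sketch: an app requesting the device's position through the platform's
// location services. The platform decides whether to use GPS, Wi-Fi, cell towers,
// or Bluetooth beacons to produce the fix; the app only ever sees the result.
// (showNearbyRestaurants is a placeholder for whatever the app does with the data.)

function showNearbyRestaurants(lat: number, lon: number): void {
  console.log(`Looking up restaurants near ${lat.toFixed(4)}, ${lon.toFixed(4)}`);
}

navigator.geolocation.getCurrentPosition(
  (position) => {
    // Success: the user granted access and the OS produced a position estimate.
    showNearbyRestaurants(position.coords.latitude, position.coords.longitude);
  },
  (error) => {
    // The built-in safeguard in action: the user (or OS policy) can simply say no.
    console.warn(`Location unavailable or denied: ${error.message}`);
  },
  { enableHighAccuracy: false, timeout: 10_000, maximumAge: 60_000 }
);
```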

The not-so-silent type: Vulnerabilities across keyboard apps reveal keystrokes to network eavesdroppers


Report by Jeffrey Knockel, Mona Wang, and Zoë Reichert: “Typing logographic languages such as Chinese is more difficult than typing alphabetic languages, where each letter can be represented by one key. There is no way to fit the tens of thousands of Chinese characters that exist onto a single keyboard. Despite this obvious challenge, technologies have developed which make typing in Chinese possible. To enable the input of Chinese characters, a writer will generally use a keyboard app with an “Input Method Editor” (IME). IMEs offer a variety of approaches to inputting Chinese characters, including via handwriting, voice, and optical character recognition (OCR). One popular phonetic input method is Zhuyin, and shape or stroke-based input methods such as Cangjie or Wubi are commonly used as well. However, used by nearly 76% of mainland Chinese keyboard users, the most popular way of typing in Chinese is the pinyin method, which is based on the pinyin romanization of Chinese characters.

All of the keyboard apps we analyze in this report fall into the category of input method editors (IMEs) that offer pinyin input. These keyboard apps are particularly interesting because they have grown to accommodate the challenge of allowing users to type Chinese characters quickly and easily. While many keyboard apps operate locally, solely within a user’s device, IME-based keyboard apps often have cloud features which enhance their functionality. Because of the complexities of predicting which characters a user may want to type next, especially in logographic languages like Chinese, IMEs often offer “cloud-based” prediction services which reach out over the network. Enabling “cloud-based” features in these apps means that longer strings of syllables that users type will be transmitted to servers elsewhere. As many have previously pointed out, “cloud-based” keyboards and input methods can function as vectors for surveillance and essentially behave as keyloggers. While the content of what users type is traveling from their device to the cloud, it is additionally vulnerable to network attackers if not properly secured. This report is not about how operators of cloud-based IMEs read users’ keystrokes, which is a phenomenon that has already been extensively studied and documented. This report is primarily concerned with the issue of protecting this sensitive data from network eavesdroppers…(More)”.
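
The data flow the report describes is straightforward to picture. The sketch below is a hypothetical illustration (the endpoint, field names, and response shape are invented, and real IMEs each use their own protocols): each partial pinyin string is shipped off-device to a cloud prediction service, which is why the transport matters; if the request travels over plain HTTP or a weak home-grown cipher rather than a well-studied channel such as TLS, a passive network eavesdropper can reconstruct the user’s keystrokes.

```typescript
// Hypothetical cloud-prediction request from a pinyin IME. The endpoint, field names,
// and response shape are invented; real keyboard apps each use their own protocols.
// Every partial syllable string the user types is sent off-device, so the transport
// matters: over plain HTTP or a weak home-grown cipher, a network eavesdropper can
// log keystrokes; over TLS, the same request is protected in transit.

interface PredictionRequest {
  partialPinyin: string; // e.g. "nihaoshijie" as the user types
  sessionId: string;
}

async function fetchCloudCandidates(req: PredictionRequest): Promise<string[]> {
  const res = await fetch("https://ime.example.com/v1/predict", { // https://, not http://
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) return []; // fall back to the on-device dictionary
  const data = (await res.json()) as { candidates: string[] };
  return data.candidates;
}

// Called on nearly every keystroke, which is why the channel itself has to be trustworthy:
// fetchCloudCandidates({ partialPinyin: "zhong", sessionId: "abc123" });
```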

On the Meaning of Community Consent in a Biorepository Context


Article by Astha Kapoor, Samuel Moore, and Megan Doerr: “Biorepositories, vital for medical research, collect and store human biological samples and associated data for future use. However, our reliance solely on the individual consent of data contributors for biorepository data governance is becoming inadequate. Big data analysis focuses on large-scale behaviors and patterns, shifting focus from singular data points to identifying data “journeys” relevant to a collective. The individual becomes a small part of the analysis, with the harms and benefits emanating from the data occurring at an aggregated level.

Community refers to a particular qualitative aspect of a group of people that is not well captured by quantitative measures in biorepositories. This is not an excuse to dodge the question of how to account for communities in a biorepository context; rather, it shows that a framework is needed for defining different types of community that may be approached from a biorepository perspective. 

Engaging with communities in biorepository governance presents several challenges. Moving away from a purely individualized understanding of governance towards a more collectivizing approach necessitates an appreciation of the messiness of group identity, its ephemerality, and the conflicts entailed therein. So while community implies a certain degree of homogeneity (i.e., that all members of a community share something in common), it is important to understand that people can simultaneously consider themselves a member of a community while disagreeing with many of its members, the values the community holds, or the positions for which it advocates. The complex nature of community participation therefore requires proper treatment for it to be useful in a biorepository governance context…(More)”.

Murky Consent: An Approach to the Fictions of Consent in Privacy Law


Paper by Daniel J. Solove: “Consent plays a profound role in nearly all privacy laws. As Professor Heidi Hurd aptly said, consent works “moral magic” – it transforms things that would be illegal and immoral into lawful and legitimate activities. As to privacy, consent authorizes and legitimizes a wide range of data collection and processing.

There are generally two approaches to consent in privacy law. In the United States, the notice-and-choice approach predominates; organizations post a notice of their privacy practices and people are deemed to consent if they continue to do business with the organization or fail to opt out. In the European Union, the General Data Protection Regulation (GDPR) uses the express consent approach, where people must voluntarily and affirmatively consent.

Both approaches fail. The evidence of actual consent is non-existent under the notice-and-choice approach. Individuals are often pressured or manipulated, undermining the validity of their consent. The express consent approach also suffers from these problems – people are ill-equipped to decide about their privacy, and even experts cannot fully understand what algorithms will do with personal data. Express consent also is highly impractical; it inundates individuals with consent requests from thousands of organizations. Express consent cannot scale.

In this Article, I contend that most of the time, privacy consent is fictitious. Privacy law should take a new approach to consent that I call “murky consent.” Traditionally, consent has been binary – an on/off switch – but murky consent exists in the shadowy middle ground between full consent and no consent. Murky consent embraces the fact that consent in privacy is largely a set of fictions and is at best highly dubious….(More)”. See also: The Urgent Need to Reimagine Data Consent

The Secret Life of Data


Book by Aram Sinnreich and Jesse Gilbert: “…explore the many unpredictable, and often surprising, ways in which data surveillance, AI, and the constant presence of algorithms impact our culture and society in the age of global networks. The authors build on this basic premise: no matter what form data takes, and what purpose we think it’s being used for, data will always have a secret life. How this data will be used, by other people in other times and places, has profound implications for every aspect of our lives—from our intimate relationships to our professional lives to our political systems.

With the secret uses of data in mind, Sinnreich and Gilbert interview dozens of experts to explore a broad range of scenarios and contexts—from the playful to the profound to the problematic. Unlike most books about data and society that focus on the short-term effects of our immense data usage, The Secret Life of Data focuses primarily on the long-term consequences of humanity’s recent rush toward digitizing, storing, and analyzing every piece of data about ourselves and the world we live in. The authors advocate for “slow fixes” regarding our relationship to data, such as creating new laws and regulations, ethics and aesthetics, and models of production for our datafied society.

Cutting through the hype and hopelessness that so often inform discussions of data and society, The Secret Life of Data clearly and straightforwardly demonstrates how readers can play an active part in shaping how digital technology influences their lives and the world at large…(More)”