Big Data for the Greater Good


Book edited by Ali Emrouznejad and Vincent Charles: “This book highlights some of the most fascinating current uses, thought-provoking changes, and biggest challenges that Big Data means for our society. The explosive growth of data and advances in Big Data analytics have created a new frontier for innovation, competition, productivity, and well-being in almost every sector of our society, as well as a source of immense economic and societal value. From the derivation of customer feedback-based insights to fraud detection and preserving privacy; better medical treatments; agriculture and food management; and establishing low-voltage networks – many innovations for the greater good can stem from Big Data. Given the insights it provides, this book will be of interest to both researchers in the field of Big Data, and practitioners from various fields who intend to apply Big Data technologies to improve their strategic and operational decision-making processes….(More)”.

Balancing Act: Innovation vs. Privacy in the Age of Data Portability


Thursday, July 12, 2018 @ 2 MetroTech Center, Brooklyn, NY 11201

RSVP here.

The ability of people to move or copy data about themselves from one service to another — data portability — has been hailed as a way of increasing competition and driving innovation. In many areas, such as through the Open Banking initiative in the United Kingdom, the practice of data portability is fully underway and spreading. The launch of the GDPR in Europe has also elevated the issue among companies and individuals alike. But recent online security breaches and other incidents of personal data being transferred surreptitiously from private companies (e.g., Cambridge Analytica’s appropriation of Facebook data) highlight how data portability can also undermine people’s privacy.

The GovLab at the NYU Tandon School of Engineering is pleased to present Jeni Tennison, CEO of the Open Data Institute, for its next Ideas Lunch, where she will discuss how data portability has been regulated in the UK and Europe, and what governments, businesses and people need to do to strike the balance between its risks and benefits.

Jeni Tennison is the CEO of the Open Data Institute. She gained her PhD from the University of Nottingham, then worked as an independent consultant, specialising in open data publishing and consumption, before joining the ODI in 2012. Jeni was awarded an OBE for services to technology and open data in the 2014 New Year Honours.

Before joining the ODI, Jeni was the technical architect and lead developer for legislation.gov.uk. She contributed to the early linked-data work on data.gov.uk, including helping to engineer new standards for publishing statistics as linked data. She continues her work within the UK’s public sector as a member of the Open Standards Board.

Jeni also works on international web standards. She served on the W3C’s Technical Architecture Group from 2011 to 2015, and in 2014 she began co-chairing the W3C’s CSV on the Web Working Group. She also sits on the advisory boards of the Open Contracting Partnership and the Data Transparency Lab.

Twitter handle: @JeniT

Personal Data v. Big Data: Challenges of Commodification of Personal Data


Maria Bottis and George Bouchagiar in the Open Journal of Philosophy: “Any firm today may, at little or no cost, build its own infrastructure to process personal data for commercial, economic, political, technological or any other purposes. Society has, therefore, turned into a privacy-unfriendly environment. The processing of personal data is essential for multiple economically and socially useful purposes, such as health care, education or terrorism prevention. But firms view personal data as a commodity, as a valuable asset, and heavily invest in processing for private gains. This article studies the potential to subject personal data to trade secret rules, so as to ensure the users’ control over their data without limiting the data’s free movement, and examines some positive scenarios of attributing commercial value to personal data….(More)”.

Data Protection and e-Privacy: From Spam and Cookies to Big Data, Machine Learning and Profiling


Chapter by Lilian Edwards in L. Edwards (ed.), Law, Policy and the Internet (Hart, 2018): “In this chapter, I examine in detail how data subjects are tracked, profiled and targeted by their activities online and, increasingly, in the “offline” world as well. Tracking is part of both commercial and state surveillance, but in this chapter I concentrate on the former. The European law relating to spam, cookies, online behavioural advertising (OBA), machine learning (ML) and the Internet of Things (IoT) is examined in detail, using both the GDPR and the forthcoming draft ePrivacy Regulation. The chapter concludes by examining both code and law solutions which might find a way forward to protect user privacy and still enable innovation, by looking to paradigms not based around consent, and less likely to rely on a “transparency fallacy”. Particular attention is drawn to the new work around Personal Data Containers (PDCs) and distributed ML analytics….(More)”.

Why Do We Care So Much About Privacy?


Louis Menand in The New Yorker: “…Possibly the discussion is using the wrong vocabulary. “Privacy” is an odd name for the good that is being threatened by commercial exploitation and state surveillance. Privacy implies “It’s nobody’s business,” and that is not really what Roe v. Wade is about, or what the E.U. regulations are about, or even what Katz and Carpenter are about. The real issue is the one that Pollak and Martin, in their suit against the District of Columbia in the Muzak case, said it was: liberty. This means the freedom to choose what to do with your body, or who can see your personal information, or who can monitor your movements and record your calls—who gets to surveil your life and on what grounds.

As we are learning, the danger of data collection by online companies is not that they will use it to try to sell you stuff. The danger is that that information can so easily fall into the hands of parties whose motives are much less benign. A government, for example. A typical reaction to worries about the police listening to your phone conversations is the one Gary Hart had when it was suggested that reporters might tail him to see if he was having affairs: “You’d be bored.” They were not, as it turned out. We all may underestimate our susceptibility to persecution. “We were just talking about hardwood floors!” we say. But authorities who feel emboldened by the promise of a Presidential pardon or by a Justice Department that looks the other way may feel less inhibited about invading the spaces of people who belong to groups that the government has singled out as unpatriotic or undesirable. And we now have a government that does that….(More)”.

I want your (anonymized) social media data


Anthony Sanford at The Conversation: “Social media sites’ responses to the Facebook-Cambridge Analytica scandal and new European privacy regulations have given users much more control over who can access their data, and for what purposes. To me, as a social media user, these are positive developments: It’s scary to think what these platforms could do with the troves of data available about me. But as a researcher, I worry about increased restrictions on data sharing.

I am among the many scholars who depend on data from social media to gain insights into people’s actions. In the rush to protect individuals’ privacy, I worry that an unintended casualty could be knowledge about human nature. My most recent work, for example, analyzes feelings people express on Twitter to explain why the stock market fluctuates so much over the course of a single day. There are applications well beyond finance. Other scholars have studied mass transit rider satisfaction, emergency alert systems’ function during natural disasters, and how online interactions influence people’s desire to lead healthy lifestyles.

This poses a dilemma – not just for me personally, but for society as a whole. Most people don’t want social media platforms to share or sell their personal information, unless specifically authorized by the individual user. But as members of a collective society, it’s useful to understand the social forces at work influencing everyday life and long-term trends. Before the recent crises, Facebook and other companies had already been making it hard for legitimate researchers to use their data, including by making it more difficult and more expensive to download and access data for analysis. The renewed public pressure for privacy means it’s likely to get even tougher….

It’s true – and concerning – that some presumably unethical people have tried to use social media data for their own benefit. But the data are not the actual problem, and cutting researchers’ access to data is not the solution. Doing so would also deprive society of the benefits of social media analysis.

Fortunately, there is a way to resolve this dilemma. Anonymization of data can keep people’s individual privacy intact, while giving researchers access to collective data that can yield important insights.

There’s even a strong model for how to strike that balance efficiently: the U.S. Census Bureau. For decades, that government agency has collected extremely personal data from households all across the country: ages, employment status, income levels, Social Security numbers and political affiliations. The results it publishes are very rich, but also not traceable to any individual.

It often is technically possible to reverse anonymity protections on data, using multiple pieces of anonymized information to identify the person they all relate to. The Census Bureau takes steps to prevent this.
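
To make that risk concrete, here is a minimal sketch of a linkage attack, the kind of re-identification the Bureau guards against: two datasets that each look harmless on their own are joined on shared quasi-identifiers to put a name back on an “anonymized” record. All names, fields and values below are invented for illustration.

```python
# A minimal sketch of a linkage attack: an "anonymized" dataset is joined
# to a public one on shared quasi-identifiers. All records are invented.

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

anonymized_health = [  # names stripped, quasi-identifiers retained
    {"zip": "11201", "birth_date": "1980-04-02", "sex": "F", "diagnosis": "asthma"},
    {"zip": "11205", "birth_date": "1975-09-17", "sex": "M", "diagnosis": "diabetes"},
]

public_voter_roll = [  # openly available, names attached
    {"name": "Jane Roe", "zip": "11201", "birth_date": "1980-04-02", "sex": "F"},
    {"name": "John Doe", "zip": "11205", "birth_date": "1975-09-17", "sex": "M"},
]

def link(anonymized, roll):
    """Re-identify anonymized records whose quasi-identifiers match the roll."""
    index = {tuple(r[k] for k in QUASI_IDENTIFIERS): r["name"] for r in roll}
    for record in anonymized:
        key = tuple(record[k] for k in QUASI_IDENTIFIERS)
        if key in index:
            yield index[key], record["diagnosis"]

for name, diagnosis in link(anonymized_health, public_voter_roll):
    print(name, "->", diagnosis)  # the "anonymous" record now has a name
```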

For instance, when members of the public access census data, the Census Bureau restricts information that is likely to identify specific individuals, such as reporting there is just one person in a community with a particularly high- or low-income level.
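
The logic of that restriction can be written down in a few lines. Below is a minimal sketch of threshold-based cell suppression; the threshold of five and the sample figures are assumptions for illustration, not actual Census Bureau policy.

```python
# A minimal sketch of threshold-based cell suppression: an aggregate is
# published only when it covers enough people to hide any one of them.
# The threshold and figures are illustrative, not Census Bureau policy.

from collections import defaultdict

SUPPRESSION_THRESHOLD = 5  # publish a cell only if it covers >= 5 respondents

incomes = defaultdict(list)
for community, income in [
    ("Downtown", 52_000), ("Downtown", 61_000), ("Downtown", 48_000),
    ("Downtown", 57_000), ("Downtown", 75_000),
    ("Riverside", 410_000),  # a lone, highly identifiable respondent
]:
    incomes[community].append(income)

for community, values in sorted(incomes.items()):
    if len(values) >= SUPPRESSION_THRESHOLD:
        print(f"{community}: mean income ${sum(values) / len(values):,.0f}")
    else:
        print(f"{community}: suppressed (fewer than {SUPPRESSION_THRESHOLD} respondents)")
```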

For researchers the process is somewhat different, but provides significant protections both in law and in practice. Scholars have to pass the Census Bureau’s vetting process to make sure they are legitimate, and must undergo training about what they can and cannot do with the data. The penalties for violating the rules include not only being barred from using census data in the future, but also civil fines and even criminal prosecution.

Even then, what researchers get comes without a name or Social Security number. Instead, the Census Bureau uses what it calls “protected identification keys,” a random number that replaces data that would allow researchers to identify individuals.

Each person’s data is labeled with his or her own identification key, allowing researchers to link information of different types. For instance, a researcher wanting to track how long it takes people to complete a college degree could follow individuals’ education levels over time, thanks to the identification keys.
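
In code, the pattern behind such keys is simple pseudonymization: replace the real identifier with a random key, release only the keyed records, and keep the mapping locked away. The sketch below illustrates the general idea, not the Bureau’s actual implementation; all identifiers and statuses are invented.

```python
# A minimal sketch of pseudonymous identification keys: a random key stands
# in for each real identifier, yet records can still be linked across years.
# All identifiers and statuses below are invented for illustration.

import secrets

def assign_keys(identifiers):
    """Map each real identifier (e.g., an SSN) to a fresh random key."""
    return {real_id: secrets.token_hex(8) for real_id in identifiers}

enrollment_2010 = {"123-45-6789": "enrolled", "987-65-4321": "enrolled"}
completion_2016 = {"123-45-6789": "degree completed"}

key_for = assign_keys(enrollment_2010.keys() | completion_2016.keys())

# Researchers receive only the keyed versions; the mapping stays locked away.
keyed_2010 = {key_for[ssn]: status for ssn, status in enrollment_2010.items()}
keyed_2016 = {key_for[ssn]: status for ssn, status in completion_2016.items()}

# Longitudinal linkage still works: follow the same key across datasets.
for key, start in keyed_2010.items():
    print(f"person {key}: 2010={start}, 2016={keyed_2016.get(key, 'no record')}")
```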

Social media platforms could implement a similar anonymization process instead of increasing hurdles – and cost – to access their data…(More)”.

User Perceptions of Privacy in Smart Homes


Paper by Serena Zheng, Marshini Chetty, and Nick Feamster: “Despite the increasing presence of Internet of Things (IoT) devices inside the home, we know little about how users feel about their privacy living with Internet-connected devices that continuously monitor and collect data in their homes. To gain insight into this state of affairs, we conducted eleven semi-structured interviews with owners of smart homes, investigating privacy values and expectations.

In this paper, we present the findings that emerged from our study: First, users prioritize the convenience and connectedness of their smart homes, and these values dictate their privacy opinions and behaviors. Second, user opinions about who should have access to their smart home data depend on the perceived benefit. Third, users assume their privacy is protected because they trust the manufacturers of their IoT devices. Our findings bring up several implications for IoT privacy, which include the need for design for privacy and evaluation standards….(More)”.

Data Detectives: More data and surveillance are transforming justice systems


Special issue by The Economist: “…the relationship between information and crime has changed in two ways, one absolute, one relative. In absolute terms, people generate more searchable information than they used to. Smartphones passively track and record where people go, who they talk to and for how long; their apps reveal subtler personal information, such as their political views, what they like to read and watch and how they spend their money. As more appliances and accoutrements become networked, so the amount of information people inadvertently create will continue to grow.

To track a suspect’s movements and conversations, police chiefs no longer need to allocate dozens of officers for round-the-clock stakeouts. They just need to seize the suspect’s phone and bypass its encryption. If he drives, police cars, streetlights and car parks equipped with automatic number-plate readers (ANPRs, known in America as automatic licence-plate readers or ALPRs) can track all his movements.

In relative terms, the gap between information technology and policy gapes ever wider. Most privacy laws were written for the age of postal services and fixed-line telephones. Courts give citizens protection from governments entering their homes or rifling through their personal papers. The law on people’s digital presence is less clear. In most liberal countries, police still must convince a judge to let them eavesdrop on phone calls.

But mobile-phone “metadata”—not the actual conversations, but data about who was called and when—enjoy less stringent protections. In 2006 the European Union issued a directive requiring telecom firms to retain customer metadata for up to two years for use in potential crime investigations. The European Court of Justice invalidated that law in 2014, after numerous countries challenged it in court, saying that it interfered with “the fundamental rights to respect for private life”. Today data-retention laws vary widely in Europe. Laws, and their interpretation, are changing in America, too. A case before the Supreme Court will determine whether police need a warrant to obtain metadata.

Less shoe leather

If you drive in a city anywhere in the developed world, ANPRs are almost certainly tracking you. This is not illegal. Police do not generally need a warrant to follow someone in public. However, people not suspected of committing a crime do not usually expect authorities to amass terabytes of data on every person they have met and every business visited. ANPRs offer a lot of that.

To some people, this may not matter. Toplines, an Israeli ANPR firm, wants to add voice- and facial-recognition to its Bluetooth-enabled cameras, and install them on private vehicles, turning every car on the road into a “mobile broadcast system” that collects and transmits data to a control centre that security forces can access. Its founder posits that insurance-rate discounts could incentivise drivers to become, in effect, freelance roving crime-detection units for the police, subjecting unwitting citizens to constant surveillance. In answer to a question about the implications of such data for privacy, a Toplines employee shrugs: Facebook and WhatsApp are spying on us anyway, he says. If the stream of information keeps people safer, who could object? “Privacy is dead.”

It is not. But this dangerously complacent attitude brings its demise ever closer….(More)”.

Data Pollution


Paper by Omri Ben-Shahar: “Digital information is the fuel of the new economy. But like the old economy’s carbon fuel, it also pollutes. Harmful “data emissions” are leaked into the digital ecosystem, disrupting social institutions and public interests. This article develops a novel framework, “data pollution,” to rethink the harms the data economy creates and the way they have to be regulated. It argues that social intervention should focus on the external harms from the collection and misuse of personal data. The article challenges the hegemony of the prevailing view that the harm from the digital data enterprise is to the privacy of the people whose information is used. It claims that a central problem has been largely ignored: how the information individuals give affects others, and how it undermines and degrades public goods and interests. The data pollution metaphor offers a novel perspective on why existing regulatory tools (torts, contracts, and disclosure law) are ineffective, mirroring their historical futility in curbing the external social harms from environmental pollution. The data pollution framework also opens up a rich roadmap for new regulatory devices, an environmental law for data protection, that focus on controlling these external effects. The article examines whether the general tools society has long used to control industrial pollution (production restrictions, carbon taxes, and emissions liability) could be adapted to govern data pollution….(More)”.

The Open Revolution: Rewriting the rules of the information age


Book by Rufus Pollock: “Forget everything you think you know about the digital age. It’s not about privacy, surveillance, AI or blockchain—it’s about ownership. Because, in a digital age, who owns information controls the future.

In this urgent and provocative book, Rufus Pollock shows how today’s “Closed” digital economy is the source of problems ranging from growing inequality, to unaffordable medicines, to the power of a handful of tech monopolies to control how we think and vote. He proposes a solution that charts a path to a more equitable, innovative and profitable future for all….(More)”.