An AI That Reads Privacy Policies So That You Don’t Have To


Andy Greenberg at Wired: “…Today, researchers at Switzerland’s Federal Institute of Technology at Lausanne (EPFL), the University of Wisconsin and the University of Michigan announced the release of Polisis—short for “privacy policy analysis”—a new website and browser extension that uses their machine-learning-trained app to automatically read and make sense of any online service’s privacy policy, so you don’t have to.

In about 30 seconds, Polisis can read a privacy policy it’s never seen before and extract a readable summary, displayed in a graphic flow chart, of what kind of data a service collects, where that data could be sent, and whether a user can opt out of that collection or sharing. Polisis’ creators have also built a chat interface they call Pribot that’s designed to answer questions about any privacy policy, intended as a sort of privacy-focused paralegal advisor. Together, the researchers hope those tools can unlock the secrets of how tech firms use your data that have long been hidden in plain sight….
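
At its core, a tool like this is a text classifier: segments of a privacy policy go in, data-practice labels come out. As a minimal sketch of that underlying idea — using a generic bag-of-words classifier on made-up training examples, not Polisis’s actual model, taxonomy, or data — it might look like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny, made-up training set: policy segments labeled with hypothetical
# data-practice categories (not Polisis's actual label scheme).
segments = [
    "We collect your email address and browsing history.",
    "We may share your information with advertising partners.",
    "You can opt out of data collection in your account settings.",
    "Device identifiers are gathered when you use our app.",
    "Your data may be disclosed to third-party analytics providers.",
    "Users may request deletion of their personal data at any time.",
]
labels = [
    "first-party-collection", "third-party-sharing", "user-choice",
    "first-party-collection", "third-party-sharing", "user-choice",
]

# Bag-of-words features plus a simple linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(segments, labels)

# Classify an unseen policy sentence.
print(model.predict(["We share usage data with our marketing partners."]))
```

A production system would need a large annotated corpus and a far richer model, but the pipeline shape — segment the policy, classify each segment, aggregate the labels into a summary — is the same.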

Polisis isn’t actually the first attempt to use machine learning to pull human-readable information out of privacy policies. Both Carnegie Mellon University and Columbia have made their own attempts at similar projects in recent years, points out NYU Law Professor Florencia Marotta-Wurgler, who has focused her own research on user interactions with terms of service contracts online. (One of her own studies showed that only 0.07 percent of users actually click on a terms of service link before clicking “agree.”) The Usable Privacy Policy Project, a collaboration that includes both Columbia and CMU, released its own automated tool to annotate privacy policies just last month. But Marotta-Wurgler notes that Polisis’ visual and chat-bot interfaces haven’t been tried before, and says the latest project is also more detailed in how it defines different kinds of data. “The granularity is really nice,” Marotta-Wurgler says. “It’s a way of communicating this information that’s more interactive.”…(More)”.

Building Trust in Data and Statistics


Shaida Badiee at UN World Data Forum: …What do we want for a 2030 data ecosystem?

Hope to achieve: A world where data are part of the DNA and culture of decision-making, used by all and valued as an important public good. A world where citizens trust the systems that produce data and have the skills and means to use and verify their quality and accuracy. A world where there are safeguards in place to protect privacy, while bringing the benefits of open data to all. In this world, countries value their national statistical systems, which are working independently with trusted partners in the public and private sectors and citizens to continuously meet the changing and expanding demands from data users and policy makers. Private sector data generators are generously sharing their data with the public sector. And gaps in data are closing, making the dream of “leaving no one behind” come true, with SDG goals on the path to being met by 2030.

Hope to avoid: A world where large corporations control the bulk of national and international data and statistics, with only limited sharing with the public sector, academics, and citizens. A culture of every man for himself, where whoever pays wins, dominates data-sharing practices. National statistical systems are under-resourced and under-valued, with low trust from users, further weakening them and undermining their independence from political interference and their ability to control quality. The divide between those who have and those who do not have access, skills, and the ability to use data for decision-making and policy has widened. Data systems and their promise to count the uncounted and “leave no one behind” are falling behind due to low capacity and poor standards and institutions, and the hope of the 2030 agenda is fading.

With this vision in mind, are we on the right path? An optimist would say we are closer to the data ecosystem that we want to achieve. However, there are also some examples of movement in the wrong direction. There is no magic wand to make our wish come true, but a powerful enabler would be building trust in data and statistics. Therefore, this should be included as a goal in all our data strategies and action plans.

Here are some important building blocks underlying trust in data and statistics:

  1. Building strong organizational infrastructure, governance, and partnerships;
  2. Following sound data standards and principles for production, sharing, interoperability, and dissemination; and
  3. Addressing the last mile in the data value chain to meet users’ needs, create value with data, and ensure meaningful impacts…(More)”.

Self-Tracking: Empirical and Philosophical Investigations


Book edited by Btihaj Ajana: “…provides an empirical and philosophical investigation of self-tracking practices. In recent years, there has been an explosion of apps and devices that enable the capture and monitoring of data about everyday activities, behaviours and habits. Encouraged by movements such as the Quantified Self, a growing number of people are embracing this culture of quantification and tracking in the spirit of improving their health and wellbeing.
The aim of this book is to enhance understanding of this fast-growing trend, bringing together scholars who are working at the forefront of the critical study of self-tracking practices. Each chapter provides a different conceptual lens through which one can examine these practices, while grounding the discussion in relevant empirical examples.
From phenomenology to discourse analysis, from questions of identity, privacy and agency to issues of surveillance and tracking at the workplace, this edited collection takes on a wide, and yet focused, approach to the timely topic of self-tracking. It constitutes a useful companion for scholars, students and everyday users interested in the Quantified Self phenomenon…(More)”.

A Really Bad Blockchain Idea: Digital Identity Cards for Rohingya Refugees


Wayan Vota at ICTworks: “The Rohingya Project claims to be a grassroots initiative that will empower Rohingya refugees with a blockchain-leveraged financial ecosystem tied to digital identity cards….

What Could Possibly Go Wrong?

Concerns about Rohingya data collection are not new, so Linda Raftree’s Facebook post about blockchain for biometrics started a spirited discussion on this escalation of techno-utopia. Several people put forth great points about the Rohingya Project’s potential failings. For me, there were four key questions arising from the discussion that we should all be debating:

1. Who Determines Ethnicity?

Ethnicity isn’t a scientific way to categorize humans. Ethnic groups are based on human constructs such as common ancestry, language, society, culture, or nationality. Who are the Rohingya Project to be the ones determining who is Rohingya or not? And what is this rigorous assessment they have that will do what science cannot?

Might it be better not to perpetuate the very divisions that cause these issues? Or at the very least, let people self-determine their own ethnicity.

2. Why Digitally Identify Refugees?

Let’s say that we could group a people based on objective metrics. Should we? Especially if that group is persecuted where it currently lives and in many of its surrounding countries? Wouldn’t making a list of who is persecuted be a handy reference for those who seek to persecute more?

Instead, shouldn’t we focus on changing the mindset of the persecutors and stop the persecution?

3. Why Blockchain for Biometrics?

How could linking a highly persecuted people’s biometric information, such as fingerprints, iris scans, and photographs, to a public, universal, and immutable distributed ledger be a good thing?

Might it be highly irresponsible to digitize all that information? Couldn’t that data be used by nefarious actors to perpetuate new and worse exploitation of Rohingya? India has already lost Aadhaar data, and Equifax lost Americans’ data. How will the small, lightly funded Rohingya Project do better?

Could it be possible that old-fashioned paper forms are a better solution than digital identity cards? Maybe laminate them for greater durability, but paper identity cards can be hidden, even destroyed if needed, to conceal information that could be used against the owner.

4. Why Experiment on the Powerless?

Rohingya refugees already suffer from massive power imbalances, and now they’ll be asked to give up their digital privacy and use experimental technology, as part of an NGO’s experiment, in order to get needed services.

It’s not like they’ll have the agency to say no. They are homeless, often penniless refugees who will probably have no realistic way to opt out of digital identity cards, even if they don’t want to be experimented on while they flee persecution….(More)”

Artificial intelligence and privacy


Report by the Norwegian Data Protection Authority (DPA): “…If people cannot trust that information about them is being handled properly, it may limit their willingness to share information – for example with their doctor, or on social media. If we find ourselves in a situation in which sections of the population refuse to share information because they feel that their personal integrity is being violated, we will be faced with major challenges to our freedom of speech and to people’s trust in the authorities.

A refusal to share personal information will also represent a considerable challenge with regard to the commercial use of such data in sectors such as the media, retail trade and finance services.

About the report

This report elaborates on the legal opinions and the technologies described in the 2014 report «Big Data – privacy principles under pressure». In this report we will provide greater technical detail in describing artificial intelligence (AI), while also taking a closer look at four relevant AI challenges associated with the data protection principles embodied in the GDPR:

  • Fairness and discrimination
  • Purpose limitation
  • Data minimisation
  • Transparency and the right to information

This represents a selection of the data protection concerns that, in our opinion, are most relevant to the use of AI today.
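
To make the first of those challenges concrete: one simple check for discrimination is to compare a model’s decision rates across groups (demographic parity). The following is a toy sketch with entirely made-up data, not drawn from the report:

```python
# Minimal sketch: measuring one simple fairness notion (demographic parity)
# on a model's decisions. All data here is invented for illustration.

decisions = [1, 0, 1, 1, 0, 1, 0, 0, 0, 1]   # 1 = e.g. loan approved
groups    = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

def approval_rate(group):
    """Share of positive decisions for one group."""
    picked = [d for d, g in zip(decisions, groups) if g == group]
    return sum(picked) / len(picked)

gap = abs(approval_rate("a") - approval_rate("b"))
print(f"demographic parity gap: {gap:.2f}")  # a large gap may signal discrimination
```

Demographic parity is only one of several competing fairness definitions, which is part of why the report treats fairness as an open challenge rather than a solved problem.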

The target group for this report consists of people who work with, or who for other reasons are interested in, artificial intelligence. We hope that engineers, social scientists, lawyers and other specialists will find this report useful….(More) (Download Report)”.

A Roadmap to a Nationwide Data Infrastructure for Evidence-Based Policymaking


Introduction by Julia Lane and Andrew Reamer of a Special Issue of the Annals of the American Academy of Political and Social Science: “Throughout the United States, there is broad interest in expanding the nation’s capacity to design and implement public policy based on solid evidence. That interest has been stimulated by the new types of data now available, which can transform the way in which policy is designed and implemented. Yet progress in making use of sensitive data has been hindered by the legal, technical, and operational obstacles to access for research and evaluation. Progress has also been hindered by an almost exclusive focus on the interest and needs of the data users, rather than the interest and needs of the data providers. In addition, data stewardship is largely artisanal in nature.

There are very real consequences that result from lack of action. State and local governments are often hampered in their capacity to effectively mount and learn from innovative efforts. Although jurisdictions often have treasure troves of data from existing programs, the data are stove-piped, underused, and poorly maintained. The experience reported by one large city public health commissioner is too common: “We commissioners meet periodically to discuss specific childhood deaths in the city. In most cases, we each have a thick file on the child or family. But the only time we compare notes is after the child is dead.”1 In reality, most localities lack the technical, analytical, staffing, and legal capacity to make effective use of existing and emerging resources.

It is our sense that fundamental changes are necessary and a new approach must be taken to building data infrastructures. In particular,

  1. Privacy and confidentiality issues must be addressed at the beginning—not added as an afterthought.
  2. Data providers must be involved as key stakeholders throughout the design process.
  3. Workforce capacity must be developed at all levels.
  4. The scholarly community must be engaged to identify the value to research and policy….

To develop a roadmap for the creation of such an infrastructure, the Bill and Melinda Gates Foundation, together with the Laura and John Arnold Foundation, hosted a day-long workshop of more than sixty experts to discuss the findings of twelve commissioned papers and their implications for action. This volume of The ANNALS showcases those twelve articles. The workshop papers were grouped into three thematic areas: privacy and confidentiality, the views of data producers, and comprehensive strategies that have been used to build data infrastructures in other contexts. The authors and the attendees included computer scientists, social scientists, practitioners, and data producers.

This introductory article places the research in both an historical and a current context. It also provides a framework for understanding the contribution of the twelve articles….(More)”.

How the Data That Internet Companies Collect Can Be Used for the Public Good


Stefaan G. Verhulst and Andrew Young at Harvard Business Review: “…In particular, the vast streams of data generated through social media platforms, when analyzed responsibly, can offer insights into societal patterns and behaviors. These insights are hard to generate with existing social science methods. All this information poses its own problems, of complexity and noise, of risks to privacy and security, but it also represents tremendous potential for mobilizing new forms of intelligence.

In a recent report, we examine ways to harness this potential while limiting and addressing the challenges. Developed in collaboration with Facebook, the report seeks to understand how public and private organizations can join forces to use social media data — through data collaboratives — to mitigate and perhaps solve some of our most intractable policy dilemmas.

Data Collaboratives: Public-Private Partnerships for Our Data Age 

For all of data’s potential to address public challenges, most data generated today is collected by the private sector. Typically ensconced in corporate databases, and tightly held in order to maintain competitive advantage, this data contains tremendous possible insights and avenues for policy innovation. But because the analytical expertise brought to bear on it is narrow, and limited by private ownership and access restrictions, its vast potential often goes untapped.

Data collaboratives offer a way around this limitation. They represent an emerging public-private partnership model, in which participants from different areas, including the private sector, government, and civil society, can come together to exchange data and pool analytical expertise in order to create new public value. While still an emerging practice, examples of such partnerships now exist around the world, across sectors and public policy domains….

Professionalizing the Responsible Use of Private Data for Public Good

For all its promise, the practice of data collaboratives remains ad hoc and limited. In part, this is a result of the lack of a well-defined, professionalized concept of data stewardship within corporations. Today, each attempt to establish a cross-sector partnership built on the analysis of social media data requires significant and time-consuming efforts, and businesses rarely have personnel tasked with undertaking such efforts and making relevant decisions.

As a consequence, the process of establishing data collaboratives and leveraging privately held data for evidence-based policy making and service delivery is onerous, generally one-off, not informed by best practices or any shared knowledge base, and prone to dissolution when the champions involved move on to other functions.

By establishing data stewardship as a corporate function, recognized within corporations as a valued responsibility, and by creating the methods and tools needed for responsible data-sharing, the practice of data collaboratives can become regularized, predictable, and de-risked.

If early efforts toward this end — from initiatives such as Facebook’s Data for Good efforts in the social media space and MasterCard’s Data Philanthropy approach around finance data — are meaningfully scaled and expanded, data stewards across the private sector can act as change agents responsible for determining what data to share and when, how to protect data, and how to act on insights gathered from the data.

Still, many companies (and others) continue to balk at the prospect of sharing “their” data, which is an understandable response given the reflex to guard corporate interests. But our research has indicated that many benefits can accrue not only to data recipients but also to those who share it. Data collaboration is not a zero-sum game.

With support from the Hewlett Foundation, we are embarking on a two-year project toward professionalizing data stewardship (and the use of data collaboratives) and establishing well-defined data responsibility approaches. We invite others to join us in working to transform this practice into a widespread, impactful means of leveraging private-sector assets, including social media data, to create positive public-sector outcomes around the world….(More)”.

Open Data Risk Assessment


Report by the Future of Privacy Forum: “The transparency goals of the open data movement serve important social, economic, and democratic functions in cities like Seattle. At the same time, some municipal datasets about the city and its citizens’ activities carry inherent risks to individual privacy when shared publicly. In 2016, the City of Seattle declared in its Open Data Policy that the city’s data would be “open by preference,” except when doing so may affect individual privacy. To ensure its Open Data Program effectively protects individuals, Seattle committed to performing an annual risk assessment and tasked the Future of Privacy Forum (FPF) with creating and deploying an initial privacy risk assessment methodology for open data.

This Report provides tools and guidance to the City of Seattle and other municipalities navigating the complex policy, operational, technical, organizational, and ethical standards that support privacy-protective open data programs. Although there is a growing body of research regarding open data privacy, open data managers and departmental data owners need to be able to employ a standardized methodology for assessing the privacy risks and benefits of particular datasets internally, without access to a bevy of expert statisticians, privacy lawyers, or philosophers. By optimizing its internal processes and procedures, developing and investing in advanced statistical disclosure control strategies, and following a flexible, risk-based assessment process, the City of Seattle – and other municipalities – can build mature open data programs that maximize the utility and openness of civic data while minimizing privacy risks to individuals and addressing community concerns about ethical challenges, fairness, and equity.
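
One widely used disclosure-control check in this space is k-anonymity: every combination of quasi-identifier values in a released table should be shared by at least k records. The sketch below shows what such a check might look like on a toy table; the column names and data are hypothetical, not taken from the report or from Seattle’s datasets:

```python
import pandas as pd

# Toy stand-in for a municipal open-data table; columns are hypothetical.
df = pd.DataFrame({
    "zip":   ["98101", "98101", "98101", "98102", "98102"],
    "age":   [34, 35, 34, 61, 62],
    "issue": ["noise", "noise", "parking", "noise", "parking"],
})

quasi_identifiers = ["zip", "age"]

# k-anonymity: every quasi-identifier combination must be shared by at
# least k records, otherwise individuals may be re-identifiable.
k = df.groupby(quasi_identifiers).size().min()
print(f"dataset is {k}-anonymous over {quasi_identifiers}")

# A common mitigation before release: generalize (coarsen) the identifiers.
df["age"] = (df["age"] // 10) * 10            # bucket ages into decades
k = df.groupby(quasi_identifiers).size().min()
print(f"after generalizing age: {k}-anonymous")
```

Checks like this are only one ingredient of the Model Analysis described below, which also weighs the benefits of release and the community context of the data.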

This Report first describes inherent privacy risks in an open data landscape, with an emphasis on potential harms related to re-identification, data quality, and fairness. To address these risks, the Report includes a Model Open Data Benefit-Risk Analysis (“Model Analysis”). The Model Analysis evaluates the types of data contained in a proposed open dataset, the potential benefits – and concomitant risks – of releasing the dataset publicly, and strategies for effective de-identification and risk mitigation. This holistic assessment guides city officials to determine whether to release the dataset openly, in a limited access environment, or to withhold it from publication (absent countervailing public policy considerations). …(More)”.

They Are Watching You—and Everything Else on the Planet


Cover article by Robert Draper for a special issue of National Geographic: “Technology and our increasing demand for security have put us all under surveillance. Is privacy becoming just a memory?…

In 1949, amid the specter of European authoritarianism, the British novelist George Orwell published his dystopian masterpiece 1984, with its grim admonition: “Big Brother is watching you.” As unsettling as this notion may have been, “watching” was a quaintly circumscribed undertaking back then. That very year, 1949, an American company released the first commercially available CCTV system. Two years later, in 1951, Kodak introduced its Brownie portable movie camera to an awestruck public.

Today more than 2.5 trillion images are shared or stored on the Internet annually—to say nothing of the billions more photographs and videos people keep to themselves. By 2020, one telecommunications company estimates, 6.1 billion people will have phones with picture-taking capabilities. Meanwhile, in a single year an estimated 106 million new surveillance cameras are sold. More than three million ATMs around the planet stare back at their customers. Tens of thousands of cameras known as automatic number plate recognition devices, or ANPRs, hover over roadways—to catch speeding motorists or parking violators but also, in the case of the United Kingdom, to track the comings and goings of suspected criminals. The untallied but growing number of people wearing body cameras now includes not just police but also hospital workers and others who aren’t law enforcement officers. Proliferating as well are personal monitoring devices—dash cams, cyclist helmet cameras to record collisions, doorbells equipped with lenses to catch package thieves—that are fast becoming a part of many a city dweller’s everyday arsenal. Even less quantifiable, but far more vexing, are the billions of images of unsuspecting citizens captured by facial-recognition technology and stored in law enforcement and private-sector databases over which our control is practically nonexistent.

Those are merely the “watching” devices that we’re capable of seeing. Presently the skies are cluttered with drones—2.5 million of which were purchased in 2016 by American hobbyists and businesses. That figure doesn’t include the fleet of unmanned aerial vehicles used by the U.S. government not only to bomb terrorists in Yemen but also to help stop illegal immigrants entering from Mexico, monitor hurricane flooding in Texas, and catch cattle thieves in North Dakota. Nor does it include the many thousands of airborne spying devices employed by other countries—among them Russia, China, Iran, and North Korea.

We’re being watched from the heavens as well. More than 1,700 satellites monitor our planet. From a distance of about 300 miles, some of them can discern a herd of buffalo or the stages of a forest fire. From outer space, a camera clicks and a detailed image of the block where we work can be acquired by a total stranger….

This is—to lift the title from another British futurist, Aldous Huxley—our brave new world. That we can see it coming is cold comfort since, as Carnegie Mellon University professor of information technology Alessandro Acquisti says, “in the cat-and-mouse game of privacy protection, the data subject is always the weaker side of the game.” Simply submitting to the game is a dispiriting proposition. But to actively seek to protect one’s privacy can be even more demoralizing. University of Texas American studies professor Randolph Lewis writes in his new book, Under Surveillance: Being Watched in Modern America, “Surveillance is often exhausting to those who really feel its undertow: it overwhelms with its constant badgering, its omnipresent mysteries, its endless tabulations of movements, purchases, potentialities.”

The desire for privacy, Acquisti says, “is a universal trait among humans, across cultures and across time. You find evidence of it in ancient Rome, ancient Greece, in the Bible, in the Quran. What’s worrisome is that if all of us at an individual level suffer from the loss of privacy, society as a whole may realize its value only after we’ve lost it for good.”…(More)”.

Extracting crowd intelligence from pervasive and social big data


Introduction by Leye Wang, Vincent Gauthier, Guanling Chen and Luis Moreira-Matias of a Special Issue of the Journal of Ambient Intelligence and Humanized Computing: “With the prevalence of ubiquitous computing devices (smartphones, wearable devices, etc.) and social network services (Facebook, Twitter, etc.), humans are generating massive digital traces continuously in their daily life. Considering the invaluable crowd intelligence residing in these pervasive and social big data, a spectrum of opportunities is emerging to enable promising smart applications for easing individual life, increasing company profit, and facilitating city development. However, the nature of big data also poses fundamental challenges for the techniques and applications that rely on pervasive and social big data, from multiple perspectives such as algorithm effectiveness, computation speed, energy efficiency, user privacy, server security, data heterogeneity and system scalability. This special issue presents state-of-the-art research achievements in addressing these challenges. After a rigorous review process by the reviewers and guest editors, eight papers were accepted, as follows.

The first paper “Automated recognition of hypertension through overnight continuous HRV monitoring” by Ni et al. proposes a non-invasive way to differentiate hypertension patients from healthy people using pervasive sensors such as a waist belt. To this end, the authors train a machine learning model on heart rate data sensed from waist belts worn by a crowd of people, and the experiments show that detection accuracy is around 93%.
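
The summary doesn’t spell out the paper’s exact features or model, but the general recipe — derive heart-rate-variability (HRV) features from beat-to-beat intervals, then train an off-the-shelf classifier — can be sketched roughly as follows, on synthetic stand-in data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def hrv_features(rr_intervals):
    """Two classic HRV features from a night of RR intervals (ms)."""
    sdnn = np.std(rr_intervals)                           # overall variability
    rmssd = np.sqrt(np.mean(np.diff(rr_intervals) ** 2))  # beat-to-beat variability
    return [sdnn, rmssd]

# Synthetic cohort: hypertensive subjects tend to show reduced HRV.
# label 0 = healthy (wider spread), label 1 = hypertensive (narrower spread).
X, y = [], []
for label, (mu, sigma) in enumerate([(850, 60), (820, 35)]):
    for _ in range(50):
        X.append(hrv_features(rng.normal(mu, sigma, size=1000)))
        y.append(label)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # accuracy on the synthetic cohort
```

The accuracy printed here reflects only the invented data; the 93% figure above comes from the authors’ real-world experiments.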

The second paper “The workforce analyzer: group discovery among LinkedIn public profiles” by Dai et al. describes two methods for discovering user groups among LinkedIn public profiles, one based on K-means and the other on SVM. The authors contrast the results of both methods and provide insights about the trending professional orientations of the workforce from an online perspective.
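
As a rough illustration of the K-means variant — the paper’s actual features and preprocessing are not described in this summary — one might cluster profile text like this, with invented headlines standing in for LinkedIn data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical public-profile headlines standing in for LinkedIn data.
profiles = [
    "software engineer, distributed systems",
    "backend developer, cloud infrastructure",
    "marketing manager, brand strategy",
    "digital marketing, social media campaigns",
    "data scientist, machine learning",
    "ML engineer, deep learning platforms",
]

# Vectorize the text, then group similar profiles into k clusters.
X = TfidfVectorizer().fit_transform(profiles)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
for profile, group in zip(profiles, km.labels_):
    print(group, profile)
```

Each cluster then approximates one professional orientation; the SVM variant would instead learn group boundaries from labeled examples.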

The third paper “Tweet and followee personalized recommendations based on knowledge graphs” by Pla Karidi et al. present an efficient semantic recommendation method that helps users filter the Twitter stream for interesting content. The foundation of this method is a knowledge graph that can represent all user topics of interest as a variety of concepts, objects, events, persons, entities, locations and the relations between them. An important advantage of the authors’ method is that it reduces the effects of problems such as over-recommendation and over-specialization.
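
The intuition — a tweet is relevant when its entities sit close to the user’s interests in the knowledge graph — can be sketched on a toy graph. The proximity-based scoring rule below is a simplification assumed for illustration, not the authors’ actual method:

```python
import networkx as nx

# Toy knowledge graph: nodes are concepts/entities, edges semantic relations.
kg = nx.Graph()
kg.add_edges_from([
    ("machine learning", "artificial intelligence"),
    ("artificial intelligence", "robotics"),
    ("football", "sports"),
    ("champions league", "football"),
])

user_interests = {"machine learning"}
tweets = {                      # entities already extracted from each tweet
    "tweet_1": {"robotics"},
    "tweet_2": {"champions league"},
}

def relevance(entities, interests, max_hops=2):
    """Score a tweet by graph proximity of its entities to user interests."""
    best = 0.0
    for e in entities:
        for i in interests:
            try:
                d = nx.shortest_path_length(kg, e, i)
            except (nx.NetworkXNoPath, nx.NodeNotFound):
                continue
            if d <= max_hops:
                best = max(best, 1.0 / (1 + d))
    return best

for tid, ents in tweets.items():
    print(tid, relevance(ents, user_interests))
```

Because unrelated entities are unreachable (or too distant) in the graph, scoring by proximity rather than by past clicks is one way such a system can avoid over-recommendation and over-specialization.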

The fourth paper “CrowdTravel: scenic spot profiling by using heterogeneous crowdsourced data” by Guo et al. proposes CrowdTravel, a multi-source social media data fusion approach for multi-aspect tourism information perception, which can provide travelling assistance for tourists through crowd intelligence mining. Experiments over a dataset of several popular scenic spots in Beijing and Xi’an, China, indicate that the authors’ approach attains fine-grained characterization of the scenic spots and delivers excellent performance.

The fifth paper “Internet of Things based activity surveillance of defence personnel” by Bhatia et al. presents a comprehensive IoT-based framework for analyzing the integrity of defence personnel in light of their daily activities. Specifically, an Integrity Index Value is defined for each individual, based on various social engagements and activities, to detect vulnerabilities to national security. In addition, probabilistic decision-tree-based automated decision making is presented to aid defence officials in analyzing a person’s activities for integrity assessment.

The sixth paper “Recommending property with short days-on-market for estate agency” by Mou et al. proposes an appraisal framework for estates with short days-on-market, which automatically recommends such estates using transaction data and profile information crawled from websites. Both the spatial and temporal characteristics of an estate are integrated into the framework. The results show that the proposed framework accurately identifies about 78% of such estates.

The seventh paper “An anonymous data reporting strategy with ensuring incentives for mobile crowd-sensing” by Li et al. proposes a system and strategy that ensure anonymous data reporting while preserving incentives. The proposed protocol is arranged in five stages that mainly leverage three concepts: (1) slot reservation based on shuffling, (2) data submission based on bulk transfer and multi-player dc-nets, and (3) an incentive mechanism based on blind signatures.

The last paper “Semantic place prediction from crowd-sensed mobile phone data” by Celik et al. semantically classifies places visited by smartphone users, applying machine learning algorithms to data collected from the sensors and wireless interfaces available on the phones, as well as phone usage patterns such as battery level and time-related information. For this study, the authors collected data from 15 participants at Galatasaray University for one month, and tried different classification algorithms such as decision tree, random forest, k-nearest neighbour, naive Bayes, and multi-layer perceptron….(More)”.
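
As a rough sketch of this kind of semantic place classification — with invented features and labels, not the authors’ dataset — using one of the algorithms they list:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature vectors per visit: [hour_of_day, battery_level,
# is_charging, wifi_networks_seen]; labels are semantic place types.
X = [
    [3, 90, 1, 1], [23, 80, 1, 1],    # late night, charging, few networks
    [8, 60, 0, 12], [14, 55, 0, 10],  # working hours, many office networks
    [20, 40, 0, 25], [21, 35, 0, 30], # evenings, dense public Wi-Fi
]
y = ["home", "home", "work", "work", "social", "social"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([[2, 85, 1, 2]]))  # likely "home": late, charging, few networks
```

The paper’s contribution lies in engineering such features from raw crowd-sensed data at scale and comparing the listed classifiers on real visits; the sketch above only shows the shape of the learning problem.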