An AI That Reads Privacy Policies So That You Don’t Have To


Andy Greenberg at Wired: “…Today, researchers at Switzerland’s Federal Institute of Technology at Lausanne (EPFL), the University of Wisconsin and the University of Michigan announced the release of Polisis—short for “privacy policy analysis”—a new website and browser extension that uses their machine-learning-trained app to automatically read and make sense of any online service’s privacy policy, so you don’t have to.

In about 30 seconds, Polisis can read a privacy policy it’s never seen before and extract a readable summary, displayed in a graphic flow chart, of what kind of data a service collects, where that data could be sent, and whether a user can opt out of that collection or sharing. Polisis’ creators have also built a chat interface they call Pribot that’s designed to answer questions about any privacy policy, intended as a sort of privacy-focused paralegal advisor. Together, the researchers hope those tools can unlock the secrets of how tech firms use your data that have long been hidden in plain sight….

Polisis isn’t actually the first attempt to use machine learning to pull human-readable information out of privacy policies. Both Carnegie Mellon University and Columbia have made their own attempts at similar projects in recent years, points out NYU Law Professor Florencia Marotta-Wurgler, who has focused her own research on user interactions with terms of service contracts online. (One of her own studies showed that only .07 percent of users actually click on a terms of service link before clicking “agree.”) The Usable Privacy Policy Project, a collaboration that includes both Columbia and CMU, released its own automated tool to annotate privacy policies just last month. But Marotta-Wurgler notes that Polisis’ visual and chat-bot interfaces haven’t been tried before, and says the latest project is also more detailed in how it defines different kinds of data. “The granularity is really nice,” Marotta-Wurgler says. “It’s a way of communicating this information that’s more interactive.”…(More)”.

How AI Could Help the Public Sector


Emma Martinho-Truswell in the Harvard Business Review: “A public school teacher grading papers faster is a small example of the wide-ranging benefits that artificial intelligence could bring to the public sector. AI could be used to make government agencies more efficient, to improve the job satisfaction of public servants, and to increase the quality of services offered. Talent and motivation are wasted doing routine tasks when they could be doing more creative ones.

Applications of artificial intelligence to the public sector are broad and growing, with early experiments taking place around the world. In addition to education, public servants are using AI to help them make welfare payments and immigration decisions, detect fraud, plan new infrastructure projects, answer citizen queries, adjudicate bail hearings, triage health care cases, and establish drone paths.  The decisions we are making now will shape the impact of artificial intelligence on these and other government functions. Which tasks will be handed over to machines? And how should governments spend the labor time saved by artificial intelligence?

So far, the most promising applications of artificial intelligence use machine learning, in which a computer program learns and improves its own answers to a question by creating and iterating algorithms from a collection of data. This data is often in enormous quantities and from many sources, and a machine learning algorithm can find new connections among data that humans might not have expected. IBM’s Watson, for example, is a treatment recommendation-bot, sometimes finding treatments that human doctors might not have considered or known about.

A machine learning program may be better, cheaper, faster, or more accurate than humans at tasks that involve lots of data, complicated calculations, or repetitive work with clear rules. Those in public service, and in many other big organizations, may recognize part of their job in that description. The very fact that government workers are often following a set of rules — a policy or set of procedures — already presents many opportunities for automation.

To be useful, a machine learning program does not need to be better than a human in every case. In my work, we expect that much of the “low hanging fruit” of government use of machine learning will be as a first line of analysis or decision-making. Human judgment will then be critical to interpret results, manage harder cases, or hear appeals.

When the work of public servants can be done in less time, a government might reduce its staff numbers, and return money saved to taxpayers — and I am sure that some governments will pursue that option. But it’s not necessarily the one I would recommend. Governments could instead choose to invest in the quality of their services. They can redeploy workers’ time towards more rewarding work that requires lateral thinking, empathy, and creativity — all things at which humans continue to outperform even the most sophisticated AI program….(More)”.

Improving refugee integration through data-driven algorithmic assignment


Kirk Bansak et al. in Science Magazine: “Developed democracies are settling an increased number of refugees, many of whom face challenges integrating into host societies. We developed a flexible data-driven algorithm that assigns refugees across resettlement locations to improve integration outcomes. The algorithm uses a combination of supervised machine learning and optimal matching to discover and leverage synergies between refugee characteristics and resettlement sites.
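
The excerpt does not reproduce the authors' code, but the general recipe it describes (use supervised learning to predict each refugee's employment probability at each resettlement site, then solve an optimal matching over those predictions) can be sketched roughly as follows; the features, the model choice, and the one-refugee-per-site simplification are illustrative assumptions, not the published method:

```python
# Illustrative sketch only: supervised prediction + optimal matching, loosely
# following the approach described above. Feature names, the model choice, and
# the one-refugee-per-site simplification are assumptions for a compact example.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_hist, n_new, n_sites = 2000, 5, 5

# Historical registry data (synthetic): refugee covariates, the site they were
# assigned to, and whether they found employment.
X_hist = rng.normal(size=(n_hist, 3))             # e.g. age, education, language score
site_hist = rng.integers(0, n_sites, size=n_hist)
employed = (rng.random(n_hist) < 0.3).astype(int)

# Supervised step: learn P(employment | covariates, site).
model = GradientBoostingClassifier()
model.fit(np.column_stack([X_hist, site_hist]), employed)

# For each new arrival, predict employment probability at every site.
X_new = rng.normal(size=(n_new, 3))
prob = np.zeros((n_new, n_sites))
for s in range(n_sites):
    site_col = np.full((n_new, 1), s)
    prob[:, s] = model.predict_proba(np.column_stack([X_new, site_col]))[:, 1]

# Optimal matching step: maximise total predicted employment, here with a simple
# one-refugee-per-site assignment (real systems respect location capacities).
rows, cols = linear_sum_assignment(-prob)
for r, c in zip(rows, cols):
    print(f"refugee {r} -> site {c} (predicted employment prob {prob[r, c]:.2f})")
```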

The algorithm was tested on historical registry data from two countries with different assignment regimes and refugee populations, the United States and Switzerland. Our approach led to gains of roughly 40 to 70%, on average, in refugees’ employment outcomes relative to current assignment practices. This approach can provide governments with a practical and cost-efficient policy tool that can be immediately implemented within existing institutional structures….(More)”.

Extracting crowd intelligence from pervasive and social big data


Introduction by Leye Wang, Vincent Gauthier, Guanling Chen and Luis Moreira-Matias to a Special Issue of the Journal of Ambient Intelligence and Humanized Computing: “With the prevalence of ubiquitous computing devices (smartphones, wearable devices, etc.) and social network services (Facebook, Twitter, etc.), humans are continuously generating massive digital traces in their daily lives. Considering the invaluable crowd intelligence residing in these pervasive and social big data, a spectrum of opportunities is emerging to enable promising smart applications for easing individual life, increasing company profit, and facilitating city development. However, the nature of big data also poses fundamental challenges to the techniques and applications that rely on pervasive and social big data, from multiple perspectives such as algorithm effectiveness, computation speed, energy efficiency, user privacy, server security, data heterogeneity and system scalability. This special issue presents state-of-the-art research achievements in addressing these challenges. After a rigorous review process by the reviewers and guest editors, eight papers were accepted, as follows.

The first paper “Automated recognition of hypertension through overnight continuous HRV monitoring” by Ni et al. proposes a non-invasive way to differentiate hypertension patients from healthy people using pervasive sensors such as a waist belt. To this end, the authors train a machine learning model on heart rate data sensed from waist belts worn by a crowd of people, and the experiments show that the detection accuracy is around 93%.
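
As a rough illustration of that kind of pipeline, a classifier can be trained on per-participant HRV summary features; the specific features (SDNN, RMSSD, LF/HF) and the random-forest model below are assumptions made for the sketch, not the paper's actual design:

```python
# Illustrative sketch: classifying hypertension from overnight HRV summaries.
# The feature names (SDNN, RMSSD, LF/HF) and the random-forest model are
# assumptions for illustration, not the authors' actual design.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 500

# Synthetic overnight HRV summaries per participant: SDNN, RMSSD, LF/HF ratio.
X = rng.normal(loc=[50.0, 40.0, 1.5], scale=[15.0, 12.0, 0.5], size=(n, 3))
# Synthetic labels: 1 = hypertensive, 0 = healthy (lower HRV loosely linked to risk).
y = (X[:, 0] + X[:, 1] + rng.normal(scale=20.0, size=n) < 85).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```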

The second paper “The workforce analyzer: group discovery among LinkedIn public profiles” by Dai et al. describes two methods for discovering user groups among LinkedIn public profiles, one based on K-means clustering and the other on SVM. The authors contrast the results of both methods and provide insights about the trending professional orientations of the workforce from an online perspective.
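
A compact sketch of the contrast between the two approaches, using invented profile text and labels (the paper's real features and data are of course different): K-means discovers groups without any labels, while the SVM learns group boundaries from labelled examples.

```python
# Illustrative sketch of the two contrasted approaches: unsupervised K-means
# versus a supervised SVM, both over bag-of-words profile text. The toy
# profiles and labels are invented; the paper's actual features may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

profiles = [
    "software engineer python machine learning",
    "data scientist statistics deep learning",
    "account manager sales negotiation crm",
    "marketing brand strategy social media",
]
labels = ["tech", "tech", "business", "business"]  # only needed for the SVM

X = TfidfVectorizer().fit_transform(profiles)

# Method 1: discover groups with no labels at all.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("k-means cluster ids:", km.labels_)

# Method 2: learn group boundaries from labelled examples.
svm = LinearSVC().fit(X, labels)
print("svm predictions:    ", list(svm.predict(X)))
```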

The third paper “Tweet and followee personalized recommendations based on knowledge graphs” by Pla Karidi et al. presents an efficient semantic recommendation method that helps users filter the Twitter stream for interesting content. The foundation of this method is a knowledge graph that can represent all user topics of interest as a variety of concepts, objects, events, persons, entities and locations, together with the relations between them. An important advantage of the authors’ method is that it reduces the effects of problems such as over-recommendation and over-specialization.
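
A toy sketch of the knowledge-graph idea, with an invented graph and a deliberately simple scoring rule (rank candidate tweets by how close their entities sit to the user's interest concepts); the published method is far richer than this:

```python
# Illustrative sketch: ranking tweets by how close their entities sit to a
# user's interest concepts in a small knowledge graph. The graph, the entity
# annotations, and the scoring rule are invented for illustration only.
import networkx as nx

kg = nx.Graph()
kg.add_edges_from([
    ("machine_learning", "artificial_intelligence"),
    ("artificial_intelligence", "robotics"),
    ("machine_learning", "data_science"),
    ("football", "sports"),
    ("sports", "olympics"),
])

user_interests = {"machine_learning"}
tweets = {
    "New results in deep learning": {"data_science"},
    "Robot arm demo at the lab":    {"robotics"},
    "Olympics opening ceremony":    {"olympics"},
}

def distance(concepts):
    """Shortest graph distance from any user interest to any tweet concept."""
    best = float("inf")
    for u in user_interests:
        for c in concepts:
            try:
                best = min(best, nx.shortest_path_length(kg, u, c))
            except nx.NetworkXNoPath:
                pass
    return best

# Recommend tweets whose concepts are nearest to the user's interests.
for text, concepts in sorted(tweets.items(), key=lambda kv: distance(kv[1])):
    print(f"{distance(concepts)}  {text}")
```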

The fourth paper “CrowdTravel: scenic spot profiling by using heterogeneous crowdsourced data” by Guo et al. proposes CrowdTravel, a multi-source social media data fusion approach for perceiving multiple aspects of tourism information, which can provide travel assistance to tourists through crowd intelligence mining. Experiments on a dataset covering several popular scenic spots in Beijing and Xi’an, China, indicate that the approach attains a fine-grained characterization of the scenic spots and delivers excellent performance.

The fifth paper “Internet of Things based activity surveillance of defence personnel” by Bhatia et al. presents a comprehensive IoT-based framework for analyzing the integrity of defence personnel in light of their daily activities. Specifically, an Integrity Index Value is defined for each member of the defence personnel based on their social engagements and activities, in order to detect vulnerabilities to national security. In addition, a probabilistic decision-tree-based automated decision-making procedure is presented to aid defence officials in analyzing a person’s activities for integrity assessment.
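
The summary does not give the actual index or tree, but the two ingredients it names could be sketched, very loosely, as a weighted index over activity signals feeding a decision tree that outputs a vulnerability probability; every weight, feature, and threshold below is invented:

```python
# Toy sketch of the two ingredients named above: a weighted index over activity
# signals and a decision tree that outputs a probability of vulnerability.
# Weights, features, labels, and thresholds are all invented for illustration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
n = 300

# Synthetic per-person activity signals scaled 0..1: social media exposure,
# contact with unknown parties, irregular movement patterns.
activity = rng.random((n, 3))
weights = np.array([0.5, 0.3, 0.2])           # assumed importance weights
integrity_index = 1.0 - activity @ weights    # higher index, lower assumed risk

# Synthetic "vulnerable" labels for training the tree.
vulnerable = (integrity_index + rng.normal(scale=0.1, size=n) < 0.45).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(np.column_stack([activity, integrity_index]), vulnerable)

# Probabilistic output intended to support a human decision, not replace it.
new_activity = np.array([0.8, 0.6, 0.4])
new_index = 1.0 - new_activity @ weights
new_row = np.concatenate([new_activity, [new_index]]).reshape(1, -1)
print("P(vulnerable):", tree.predict_proba(new_row)[0, 1])
```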

The sixth paper “Recommending property with short days-on-market for estate agency” by Mou et al. proposes an appraisal framework for identifying estates with a short days-on-market, which automatically recommends such estates using transaction data and profile information crawled from websites. Both the spatial and temporal characteristics of an estate are integrated into the framework. The results show that the proposed framework assesses about 78% of estates accurately.
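
As a rough sketch of that setup, spatial and temporal features can be combined into one matrix and fed to a classifier that predicts whether a listing will sell quickly; the features, the 30-day cut-off, and the logistic-regression model are illustrative assumptions:

```python
# Illustrative sketch: predicting whether a property will have a short
# days-on-market from simple spatial and temporal features. The features,
# the 30-day threshold, and the model are assumptions for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)
n = 1000

# Spatial features: distance to city centre (km), distance to nearest metro (km).
# Temporal features: listing month, asking price relative to the area average.
X = np.column_stack([
    rng.uniform(0, 20, n),
    rng.uniform(0, 5, n),
    rng.integers(1, 13, n),
    rng.normal(1.0, 0.15, n),
])
# Synthetic label: 1 = sold within 30 days ("short days-on-market").
y = (0.05 * X[:, 0] + X[:, 3] + rng.normal(0, 0.2, n) < 1.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", round(accuracy_score(y_te, clf.predict(X_te)), 2))
```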

The seventh paper “An anonymous data reporting strategy with ensuring incentives for mobile crowd-sensing” by Li et al. proposes a system and a strategy that ensure anonymous data reporting while preserving incentives. The proposed protocol is arranged in five stages that mainly leverage three concepts: (1) slot reservation based on shuffling, (2) data submission based on bulk transfer and multi-player DC-nets, and (3) an incentive mechanism based on blind signatures.
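
Of the three concepts, the DC-net used for data submission is the easiest to illustrate in miniature. In the toy round below, three participants each share a random pad with every other participant; XOR-ing all broadcasts recovers one participant's report without revealing who sent it. The slot-reservation, bulk-transfer, and blind-signature stages of the actual protocol are not shown.

```python
# Toy sketch of one DC-net round, the anonymity primitive named in stage (2).
# Three participants, each pair sharing a random pad; XOR-ing every broadcast
# recovers the sender's report without revealing which participant sent it.
import secrets

BITS = 32

# Pairwise shared secrets (in practice agreed via key exchange).
pads = {
    frozenset(pair): secrets.randbits(BITS)
    for pair in [("A", "B"), ("A", "C"), ("B", "C")]
}

def broadcast(name, message=0):
    """XOR of all pads this participant shares, plus its message (0 if none)."""
    out = message
    for pair, pad in pads.items():
        if name in pair:
            out ^= pad
    return out

report = 0x1234ABCD  # the sensed value participant B wants to submit anonymously
outputs = [broadcast("A"), broadcast("B", report), broadcast("C")]

combined = 0
for o in outputs:
    combined ^= o
# Every pad appears exactly twice across the broadcasts, so the pads cancel out.
assert combined == report
print(hex(combined))
```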

The last paper “Semantic place prediction from crowd-sensed mobile phone data” by Celik et al. uses machine learning algorithms to semantically classify the places visited by smartphone users, drawing on data collected from the sensors and wireless interfaces available on the phones as well as usage patterns such as battery level and time-related information. For this study, the authors collect data from 15 participants at Galatasaray University over one month and try different classification algorithms such as decision tree, random forest, k-nearest neighbour, naive Bayes, and multi-layer perceptron….(More)”.
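
A minimal sketch of that model comparison, with synthetic stand-ins for the sensor and usage features (the study's real data, collected from participants' phones, is naturally much richer):

```python
# Illustrative sketch of the model comparison described above, using synthetic
# stand-ins for the sensor and usage features. The real study's features,
# labels, and data are different; only the comparison pattern is shown.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
n = 600

# Synthetic per-visit features: battery level, hour of day, number of WiFi
# networks in range, and a motion level.
X = np.column_stack([
    rng.uniform(0, 100, n),
    rng.integers(0, 24, n),
    rng.integers(0, 30, n),
    rng.random(n),
])
# Semantic place labels: 0 = home, 1 = work/school, 2 = other.
y = np.where(X[:, 1] < 8, 0, np.where(X[:, 2] > 15, 1, 2))

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "k-nearest neighbour": KNeighborsClassifier(),
    "naive Bayes": GaussianNB(),
    "multi-layer perceptron": MLPClassifier(max_iter=1000, random_state=0),
}
for name, model in models.items():
    # Scale features first; this matters for k-NN and the MLP in particular.
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name:24s} cross-validated accuracy: {scores.mean():.2f}")
```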

Advanced Design for the Public Sector


Essay by Kristofer Kelly-Frere & Jonathan Veale: “…It might surprise some, but it is now common for governments across Canada to employ in-house designers to work on very complex and public issues.

There are design teams giving shape to experiences, services, processes, programs, infrastructure and policies. The Alberta CoLab, the Ontario Digital Service, BC’s Government Digital Experience Division, the Canadian Digital Service, Calgary’s Civic Innovation YYC, and, in partnership with government, MaRS Solutions Lab stand out. The Government of Nova Scotia recently launched the NS CoLab. There are many, many more. Perhaps hundreds.

Design-thinking. Service Design. Systemic Design. Strategic Design. They are part of the same story. Connected by their ability to focus and shape a transformation of some kind. Each is an advanced form of design oriented directly at humanizing legacy systems — massive services built by a culture that increasingly appears out-of-sorts with our world. We don’t need a new design pantheon, we need a unifying force.

We have no shortage of systems that require reform. And no shortage of challenges. Among them, the inability to assemble a common understanding of the problems in the first place, and then a lack of agency over these unwieldy systems. We have fanatics and nativists who believe in simple, regressive and violent solutions. We have a social economy that elevates these marginal voices. We have well-vested interests who benefit from maintaining the status quo and who lack actionable migration paths to new models. The median public may no longer see themselves in liberal democracy. Populism and dogmatism are rampant. The government, in some spheres, is not credible or trusted.

The traditional designer’s niche is narrowing at the same time government itself is becoming fragile. It is already cliche to point out that private wealth and resources allow broad segments of the population to “opt out.” This is quite apparent at the municipal level, where privatized sources of security, water, fire protection and even sidewalks effectively produce private shadow governments. Scaling up, the most wealthy may simply purchase residency or citizenship or invest in emerging nation states. Without re-invention this erosion will continue. At the same time, artificial intelligence, machine learning and automation are already displacing frontline design and creative work. This is the opportunity: building systems awareness and agency on the foundations of craft and empathy that are core to human-centered design. Time is of the essence. Transitions from one era to the next are historically tumultuous times. Moreover, these changes proceed faster than expected and in unexpected directions….(More)”.

Big Data and medicine: a big deal?


V. Mayer-Schönberger and E. Ingelsson in the Journal of Internal Medicine: “Big Data promises huge benefits for medical research. Looking beyond superficial increases in the amount of data collected, we identify three key areas where Big Data differs from conventional analyses of data samples: (i) data are captured more comprehensively relative to the phenomenon under study; this reduces some bias but surfaces important trade-offs, such as between data quantity and data quality; (ii) data are often analysed using machine learning tools, such as neural networks rather than conventional statistical methods resulting in systems that over time capture insights implicit in data, but remain black boxes, rarely revealing causal connections; and (iii) the purpose of the analyses of data is no longer simply answering existing questions, but hinting at novel ones and generating promising new hypotheses. As a consequence, when performed right, Big Data analyses can accelerate research.

Because Big Data approaches differ so fundamentally from small data ones, research structures, processes and mindsets need to adjust. The latent value of data is being reaped through repeated reuse of data, which runs counter to existing practices not only regarding data privacy, but data management more generally. Consequently, we suggest a number of adjustments such as boards reviewing responsible data use, and incentives to facilitate comprehensive data sharing. As data’s role changes to a resource of insight, we also need to acknowledge the importance of collecting and making data available as a crucial part of our research endeavours, and reassess our formal processes from career advancement to treatment approval….(More)”.

Artificial intelligence and smart cities


Essay by Michael Batty at Urban Analytics and City Sciences: “…The notion of the smart city of course conjures up these images of such an automated future. Much of our thinking about this future, certainly in the more popular press, is about everything from the latest app on our smartphones to driverless cars, while somewhat deeper concerns are about efficiency gains due to the automation of services ranging from transit to the delivery of energy. There is no doubt that routine and repetitive processes – algorithms if you like – are improving at an exponential rate in terms of the data they can process and the speed of execution, faithfully following Moore’s Law.

Pattern recognition techniques that lie at the basis of machine learning are highly routinized iterative schemes where the pattern in question – be it a signature, a face, the environment around a driverless car and so on – is computed as an elaborate averaging procedure which takes a series of elements of the pattern and weights them in such a way that the pattern can be reproduced perfectly by the combinations of elements of the original pattern and the weights. This is in essence the way neural networks work. When one says that they ‘learn’ and that the current focus is on ‘deep learning’, all that is meant is that with complex patterns and environments, many layers of neurons (elements of the pattern) are defined and the iterative procedures are run until there is a convergence with the pattern that is to be explained. Such processes are iterative, additive and not much more than sophisticated averaging, but they use machines that can operate virtually at the speed of light and thus process vast volumes of big data. When these kinds of algorithms can be run in real time, and many already can be, there is the prospect of many kinds of routine behaviour being displaced. It is in this sense that AI might usher in an era of truly disruptive processes. This, according to Brynjolfsson and McAfee, is beginning to happen as we reach the second half of the chess board.
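
A toy version of that weighted, iterative scheme is a single artificial neuron: it forms a weighted combination of the pattern's elements and nudges the weights repeatedly until its output converges on the target pattern. The sketch below is only an illustration of the idea, not a deep network:

```python
# Toy illustration of the weighted, iterative scheme described above: a single
# artificial neuron learns weights so that a weighted combination of the
# pattern's elements reproduces the target labels. Real deep networks stack
# many such layers, but the iterate-until-convergence idea is the same.
import numpy as np

rng = np.random.default_rng(5)

# A tiny "pattern": points in 2-D, labelled by which side of a line they fall on.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
b = 0.0
lr = 0.1

for step in range(500):                 # iterate until (near) convergence
    z = X @ w + b                       # weighted combination of pattern elements
    p = 1.0 / (1.0 + np.exp(-z))        # squashed to a 0..1 activation
    grad_w = X.T @ (p - y) / len(y)     # adjust the weights toward the pattern
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean((p > 0.5) == y)
print("weights:", np.round(w, 2), "accuracy:", accuracy)
```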

The real issue in terms of AI involves problems that are peculiarly human. Much of our work is highly routinized and many of our daily actions and decisions are based on relatively straightforward patterns of stimulus and response. The big questions involve the extent to which those of our behaviours which are not straightforward can be automated. In fact, although machines are able to beat human players in many board games and there is now the prospect of machines beating the very machines that were originally designed to play against humans, the real power of AI may well come from collaboratives of man and machine, working together, rather than ever more powerful machines working by themselves. In the last 10 years, some of my editorials have tracked what is happening in the real-time city – the smart city as it is popularly called – which has become key to many new initiatives in cities. In fact, cities – particularly big cities, world cities – have become the flavour of the month but the focus has not been on their long-term evolution but on how we use them on a minute by minute to week by week basis.

Many of the patterns that define the smart city on these short-term cycles can be predicted using AI largely because they are highly routinized but even for highly routine patterns, there are limits on the extent to which we can explain them and reproduce them. Much advancement in AI within the smart city will come from automation of the routine, such as the use of energy, the delivery of location-based services, transit using information being fed to operators and travellers in real time and so on. I think we will see some quite impressive advances in these areas in the next decade and beyond. But the key issue in urban planning is not just this short term but the long term and it is here that the prospects for AI are more problematic….(More)”.

AI System Sorts News Articles By Whether or Not They Contain Actual Information


Michael Byrne at Motherboard: “…in a larger sense it’s worth wondering to what degree the larger news feed is being diluted by news stories that are not “content dense.” That is, what’s the real ratio between signal and noise, objectively speaking? To start, we’d need a reasonably objective metric of content density and a reasonably objective mechanism for evaluating news stories in terms of that metric.

In a recent paper published in the Journal of Artificial Intelligence Research, computer scientists Ani Nenkova and Yinfei Yang, of Google and the University of Pennsylvania, respectively, describe a new machine learning approach to classifying written journalism according to a formalized idea of “content density.” With an average accuracy of around 80 percent, their system was able to accurately classify news stories across a wide range of domains, spanning from international relations and business to sports and science journalism, when evaluated against a ground truth dataset of already correctly classified news articles.

At a high level this works like most any other machine learning system. Start with a big batch of data—news articles, in this case—and then give each item an annotation saying whether or not that item falls within a particular category. In particular, the study focused on article leads, the first paragraph or two in a story traditionally intended to summarize its contents and engage the reader. Articles were drawn from an existing New York Times linguistic dataset consisting of original articles combined with metadata and short informative summaries written by researchers….(More)”.
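
A baseline version of that pipeline (annotated leads in, a content-density label out) can be sketched with off-the-shelf text classification tools; the toy leads, labels, and TF-IDF-plus-logistic-regression model below are stand-ins, not the system described in the paper, whose roughly 80 percent accuracy comes from a much larger annotated corpus:

```python
# Baseline sketch of the pipeline described above: annotated article leads go
# in, a classifier predicting "content dense" vs. not comes out. The toy leads,
# labels, and TF-IDF + logistic-regression model are stand-ins for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

leads = [
    "The central bank raised interest rates by 0.5 points on Tuesday.",
    "Researchers reported a 12 percent drop in infection rates in the trial.",
    "Here are five things you won't believe happened this week.",
    "What our readers are saying about the weekend's big game.",
]
labels = [1, 1, 0, 0]   # 1 = content dense, 0 = not

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(leads, labels)

print(clf.predict(["The city council approved a 2 percent budget increase."]))
```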

Big Data Challenge for Social Sciences: From Society and Opinion to Replications


Symposium Paper by Dominique Boullier: “When in 2007 Savage and Burrows pointed out ‘the coming crisis of empirical methods’, they were not expecting to be so right. Their paper however became a landmark, signifying the social sciences’ reaction to the tremendous shock triggered by digital methods. As they frankly acknowledge in a more recent paper, they did not even imagine the extent to which their prediction might become true, in an age of Big Data, where sources and models have to be revised in the light of extended computing power and radically innovative mathematical approaches. They signalled not just a debate about academic methods but also a momentum for ‘commercial sociology’ in which platforms acquire the capacity to add ‘another major nail in the coffin of academic sociology claims to jurisdiction over knowledge of the social’, because ‘research methods (are) an intrinsic feature of contemporary capitalist organisations’ (Burrows and Savage, 2014, p. 2). This need for a serious account of research methods is well tuned with the claims of Social Studies of Science that should be applied to the social sciences as well.

I would like to build on these insights and principles of Burrows and Savage to propose an historical and systematic account of quantification during the last century, following in the footsteps of Alain Desrosières, and in which we see Big Data and Machine Learning as a major shift in the way social science can be performed. And since, according to Burrows and Savage (2014, p. 5), ‘the use of new data sources involves a contestation over the social itself’, I will take the risk here of identifying and defining the entities that are supposed to encapsulate the social for each kind of method: beyond the reign of ‘society’ and ‘opinion’, I will point at the emergence of the ‘replications’ that are fabricated by digital platforms but are radically different from previous entities. This is a challenge to invent not only new methods but also a new process of reflexivity for societies, made available by new stakeholders (namely, the digital platforms) which transform reflexivity into reactivity (as operational quantifiers always tend to)….(More)”.

Blockchain: Unpacking the disruptive potential of blockchain technology for human development.


IDRC white paper: “In the scramble to harness new technologies to propel innovation around the world, artificial intelligence, robotics, machine learning, and blockchain technologies are being explored and deployed in a wide variety of contexts globally.

Although blockchain is one of the most hyped of these new technologies, it is also perhaps the least understood. Blockchain is the distributed ledger — a database that is shared across multiple sites or institutions to furnish a secure and transparent record of events occurring during the provision of a service or contract — that supports cryptocurrencies (digital assets designed to work as mediums of exchange).
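
The hash-chained ledger idea at the core of that definition can be sketched in a few lines: each block commits to the previous block's hash, so altering any earlier record invalidates every later link. Consensus, signatures, and distribution across many nodes, which make a real blockchain, are deliberately left out of this toy:

```python
# Minimal sketch of the hash-chained ledger idea behind a blockchain: each
# block commits to the previous block's hash, so tampering with any earlier
# record breaks every later link. Real blockchains add consensus, signatures,
# and distribution across many nodes, none of which is shown here.
import hashlib
import json
import time

def block_hash(block):
    """Deterministic SHA-256 hash of a block's contents."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def new_block(prev, record):
    return {
        "index": prev["index"] + 1,
        "timestamp": time.time(),
        "record": record,
        "prev_hash": block_hash(prev),
    }

def valid(chain):
    """Check that every block really commits to its predecessor."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

genesis = {"index": 0, "timestamp": 0.0, "record": "genesis", "prev_hash": ""}
chain = [genesis]
chain.append(new_block(chain[-1], {"parcel": "A-17", "owner": "Alice"}))
chain.append(new_block(chain[-1], {"parcel": "A-17", "owner": "Bob"}))

print("chain valid?", valid(chain))        # True
chain[1]["record"]["owner"] = "Mallory"    # tamper with an earlier record
print("after tampering:", valid(chain))    # False
```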

Blockchain is now underpinning applications such as land registries and identity services, but as its popularity grows, it is critical to unpack its relevance for addressing socio-economic gaps and supporting development targets like the globally recognized UN Sustainable Development Goals. Moreover, for countries in the global South that want to be more than just end users or consumers, the complex infrastructure requirements and operating costs of blockchain could prove challenging. For the purposes of real development, we need to understand not only how blockchain works, but also who is able to harness it to foster social inclusion and promote democratic governance.

This white paper explores the potential of blockchain technology to support human development. It provides a non-technical overview, illustrates a range of applications, and offers a series of conclusions and recommendations for additional research and potential development programming….(More)”.