Artificial intelligence could identify gang crimes—and ignite an ethical firestorm


Matthew Hutson at Science: “When someone roughs up a pedestrian, robs a store, or kills in cold blood, police want to know whether the perpetrator was a gang member: Do they need to send in a special enforcement team? Should they expect a crime in retaliation? Now, a new algorithm is trying to automate the process of identifying gang crimes. But some scientists warn that far from reducing gang violence, the program could do the opposite by eroding trust in communities, or it could brand innocent people as gang members.

That has created some tensions. At a presentation of the new program this month, one audience member grew so upset he stormed out of the talk, and some of the creators of the program have been tight-lipped about how it could be used….

For years, scientists have been using computer algorithms to map criminal networks, or to guess where and when future crimes might take place, a practice known as predictive policing. But little work has been done on labeling past crimes as gang-related.

In the new work, researchers developed a system that can identify a crime as gang-related based on only four pieces of information: the primary weapon, the number of suspects, and the neighborhood and location (such as an alley or street corner) where the crime took place. Such analytics, which can help characterize crimes before they’re fully investigated, could change how police respond, says Doug Haubert, city prosecutor for Long Beach, California, who has authored strategies on gang prevention.

To classify crimes, the researchers invented something called a partially generative neural network. A neural network is made of layers of small computing elements that process data in a way reminiscent of the brain’s neurons. A form of machine learning, it improves based on feedback—whether its judgments were right. In this case, researchers trained their algorithm using data from the Los Angeles Police Department (LAPD) in California from 2014 to 2016 on more than 50,000 gang-related and non–gang-related homicides, aggravated assaults, and robberies.

The researchers then tested their algorithm on another set of LAPD data. The network was “partially generative,” because even when it did not receive an officer’s narrative summary of a crime, it could use the four factors noted above to fill in that missing information and then use all the pieces to infer whether a crime was gang-related. Compared with a stripped-down version of the network that didn’t use this novel approach, the partially generative algorithm reduced errors by close to 30%, the team reported at the Artificial Intelligence, Ethics, and Society (AIES) conference this month in New Orleans, Louisiana. The researchers have not yet tested their algorithm’s accuracy against trained officers.

It’s an “interesting paper,” says Pete Burnap, a computer scientist at Cardiff University who has studied crime data. But although the predictions could be useful, it’s possible they would be no better than officers’ intuitions, he says. Haubert agrees, but he says that having the assistance of data modeling could sometimes produce “better and faster results.” Such analytics, he says, “would be especially useful in large urban areas where a lot of data is available.”…(More)”.
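
The "fill in the missing narrative, then classify" idea described in the excerpt above can be sketched in a few lines. This is a loose illustration only, not the researchers' partially generative neural network: it swaps in match-weighted averaging for the generative step and a small logistic regression for the classifier. The records, field values, and two-number "narrative" vectors are invented toy data, and every function name is hypothetical.

```python
import math

FIELDS = ("weapon", "suspects", "neighborhood", "premise")

# Invented toy records (assumption, not LAPD data). "narrative" stands in for
# features extracted from an officer's summary; None means it is missing.
TRAIN = [
    {"weapon": "handgun", "suspects": "2+", "neighborhood": "area_a",
     "premise": "street", "narrative": [1.0, 0.0], "label": 1},
    {"weapon": "handgun", "suspects": "2+", "neighborhood": "area_a",
     "premise": "alley", "narrative": [1.0, 0.0], "label": 1},
    {"weapon": "knife", "suspects": "1", "neighborhood": "area_b",
     "premise": "store", "narrative": [0.0, 1.0], "label": 0},
    {"weapon": "none", "suspects": "1", "neighborhood": "area_b",
     "premise": "street", "narrative": [0.0, 1.0], "label": 0},
]


def build_vocab(records):
    """Collect every (field, value) pair so each gets its own one-hot slot."""
    return sorted({(f, r[f]) for r in records for f in FIELDS})


def one_hot(record, vocab):
    return [1.0 if record[f] == v else 0.0 for f, v in vocab]


def impute_narrative(record, train):
    """Step 1: estimate a missing narrative vector from the four known fields.
    Training records sharing more fields with `record` get more weight."""
    weights = [sum(record[f] == t[f] for f in FIELDS) for t in train]
    total = sum(weights)
    if total == 0:  # nothing matches: fall back to the plain average
        weights, total = [1] * len(train), len(train)
    dim = len(train[0]["narrative"])
    return [sum(w * t["narrative"][i] for w, t in zip(weights, train)) / total
            for i in range(dim)]


def features(record, vocab, train):
    narrative = record["narrative"]
    if narrative is None:
        narrative = impute_narrative(record, train)
    return one_hot(record, vocab) + narrative


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=500, lr=0.5):
    """Step 2: fit a logistic regression by stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(record, model, vocab, train):
    """Return the model's probability that the toy record has label 1."""
    w, b = model
    x = features(record, vocab, train)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


VOCAB = build_vocab(TRAIN)
MODEL = train_logreg([features(r, VOCAB, TRAIN) for r in TRAIN],
                     [r["label"] for r in TRAIN])

# A record with no narrative: the four fields alone drive the estimate.
QUERY = {"weapon": "handgun", "suspects": "2+", "neighborhood": "area_a",
         "premise": "street", "narrative": None}
print(impute_narrative(QUERY, TRAIN))
print(round(predict(QUERY, MODEL, VOCAB, TRAIN), 3))
```

Because the imputed narrative is derived entirely from the four fields, any bias in how those fields correlate with past labels carries straight through to the prediction, which is one concrete form of the concern raised in the excerpt.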

Your Data Is Crucial to a Robotic Age. Shouldn’t You Be Paid for It?


The New York Times: “The idea has been around for a bit. Jaron Lanier, the tech philosopher and virtual-reality pioneer who now works for Microsoft Research, proposed it in his 2013 book, “Who Owns the Future?,” as a needed corrective to an online economy mostly financed by advertisers’ covert manipulation of users’ consumer choices.

It is being picked up in “Radical Markets,” a book due out shortly from Eric A. Posner of the University of Chicago Law School and E. Glen Weyl, principal researcher at Microsoft. And it is playing into European efforts to collect tax revenue from American internet giants.

In a report obtained last month by Politico, the European Commission proposes to impose a tax on the revenue of digital companies based on their users’ location, on the grounds that “a significant part of the value of a business is created where the users are based and data is collected and processed.”

Users’ data is a valuable commodity. Facebook offers advertisers precisely targeted audiences based on user profiles. YouTube, too, uses users’ preferences to tailor its feed. Still, this pales in comparison with how valuable data is about to become, as the footprint of artificial intelligence extends across the economy.

Data is the crucial ingredient of the A.I. revolution. Training systems to perform even relatively straightforward tasks like voice translation, voice transcription or image recognition requires vast amounts of data — like tagged photos, to identify their content, or recordings with transcriptions.

“Among leading A.I. teams, many can likely replicate others’ software in, at most, one to two years,” notes the technologist Andrew Ng. “But it is exceedingly difficult to get access to someone else’s data. Thus data, rather than software, is the defensible barrier for many businesses.”

We may think we get a fair deal, offering our data as the price of sharing puppy pictures. By other metrics, we are being victimized: In the largest technology companies, the share of income going to labor is only about 5 to 15 percent, Mr. Posner and Mr. Weyl write. That’s way below Walmart’s 80 percent. Consumer data amounts to work they get free….

The big question, of course, is how we get there from here. My guess is that it would be naïve to expect Google and Facebook to start paying for user data of their own accord, even if that improved the quality of the information. Could policymakers step in, somewhat the way the European Commission did, demanding that technology companies compute the value of consumer data?…(More)”.

Journalism and artificial intelligence


Notes by Charlie Beckett (at LSE’s Media Policy Project Blog): “…AI and machine learning is a big deal for journalism and news information. Possibly as important as the other developments we have seen in the last 20 years such as online platforms, digital tools and social media. My 2008 book on how journalism was being revolutionised by technology was called SuperMedia because these technologies offered extraordinary opportunities to make journalism much more efficient and effective – but also to transform what we mean by news and how we relate to it as individuals and communities. Of course, that can be super good or super bad.

Artificial intelligence and machine learning can help the news media with its three core problems:

  1. The overabundance of information and sources that leave the public confused
  2. The credibility of journalism in a world of disinformation and falling trust and literacy
  3. The business model crisis – how can journalism become more efficient – avoiding duplication; be more engaged, add value and be relevant to the individual’s and communities’ need for quality, accurate information and informed, useful debate.

But like any technology they can also be used by bad people or for bad purposes: in journalism that can mean clickbait, misinformation, propaganda, and trolling.

Some caveats about using AI in journalism:

  1. Narratives are difficult to program. Trusted journalists are needed to understand and write meaningful stories.
  2. Artificial Intelligence needs human inputs. Skilled journalists are required to double check results and interpret them.
  3. Artificial Intelligence increases quantity, not quality. It’s still up to the editorial team and developers to decide what kind of journalism the AI will help create….(More)”.

Global Fishing Watch And The Power Of Data To Understand Our Natural World


A year and a half ago I wrote about the public debut of the Global Fishing Watch project as a showcase of what becomes possible when massive datasets are made accessible to the general public through easy-to-use interfaces that allow them to explore the planet they inhabit. At the time I noted how the project drove home the divide between the “glittering technological innovation of Silicon Valley and the technological dark ages of the development community” and what becomes possible when technologists and development organizations come together to apply incredible technology not for commercial gain, but rather to save the world itself. Continuing those efforts, last week Global Fishing Watch launched what it describes as “the first ever dataset of global industrial fishing activities (all countries, all gears),” making the entire dataset freely accessible to seed new scientific, activist, governmental, journalistic and citizen understanding of the state of global fishing.

The Global Fishing Watch project stands as a powerful model for data-driven development work done right and hopefully, the rise of notable efforts like it will eventually catalyze the broader development community to emerge from the stone age of technology and more openly embrace the technological revolution. While it has a very long way to go, there are signs of hope for the development community as pockets of innovation begin to infuse the power of data-driven decision making and situational awareness into everything from disaster response to proactive planning to shaping legislative action.

Bringing technologists and development organizations together is not always that easy and the most creative solutions aren’t always to be found among the “usual suspects.” Open data and open challenges built upon them offer the potential for organizations to reach beyond the usual communities they interact with and identify innovative new approaches to the grand challenges of their fields. Just last month a collaboration of the World Bank, WeRobotics and OpenAerialMap launched a data challenge to apply deep learning to assess aerial imagery in the immediate aftermath of disasters to determine the impact to food producing trees and to road networks. By launching the effort as an open AI challenge, the goal is to reach the broader AI and open development communities at the forefront of creative and novel algorithmic approaches….(More)”.

The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation


Report by Miles Brundage et al: “Artificial intelligence and machine learning capabilities are growing at an unprecedented rate. These technologies have many widely beneficial applications, ranging from machine translation to medical image analysis. Countless more such applications are being developed and can be expected over the long term. Less attention has historically been paid to the ways in which artificial intelligence can be used maliciously. This report surveys the landscape of potential security threats from malicious uses of artificial intelligence technologies, and proposes ways to better forecast, prevent, and mitigate these threats. We analyze, but do not conclusively resolve, the question of what the long-term equilibrium between attackers and defenders will be. We focus instead on what sorts of attacks we are likely to see soon if adequate defenses are not developed.

In response to the changing threat landscape we make four high-level recommendations:

1. Policymakers should collaborate closely with technical researchers to investigate, prevent, and mitigate potential malicious uses of AI.

2. Researchers and engineers in artificial intelligence should take the dual-use nature of their work seriously, allowing misuse-related considerations to influence research priorities and norms, and proactively reaching out to relevant actors when harmful applications are foreseeable.

3. Best practices should be identified in research areas with more mature methods for addressing dual-use concerns, such as computer security, and imported where applicable to the case of AI.

4. Actively seek to expand the range of stakeholders and domain experts involved in discussions of these challenges….(More)”.

How AI-Driven Insurance Could Reduce Gun Violence


Jason Pontin at WIRED: “As a political issue, guns have become part of America’s endless, arid culture wars, where Red and Blue tribes skirmish for political and cultural advantage. But what if there were a compromise? Economics and machine learning suggest an answer, potentially acceptable to Americans in both camps.

Economists sometimes talk about “negative externalities,” market failures where the full costs of transactions are borne by third parties. Pollution is an externality, because society bears the costs of environmental degradation. The 20th-century British economist Arthur Pigou, who formally described externalities, also proposed their solution: so-called “Pigovian taxes,” where governments charge producers or customers, reducing the quantity of the offending products and sometimes paying for ameliorative measures. Pigovian taxes have been used to fight cigarette smoking or improve air quality, and are the favorite prescription of economists for reducing greenhouse gases. But they don’t work perfectly, because it’s hard for governments to estimate the costs of externalities.

Gun violence is a negative externality too. The choices of millions of Americans to buy guns overflow into uncaptured costs for society in the form of crimes, suicides, murders, and mass shootings. A flat gun tax would be a blunt instrument: It could only reduce gun violence by raising the costs of gun ownership so high that almost no one could legally own a gun, which would swell the black market for guns and probably increase crime. But insurers are very good at estimating the risks and liabilities of individual choices; insurance could capture the externalities of gun violence in a smarter, more responsive fashion.

Here’s the proposed compromise: States should require gun owners to be licensed and pay insurance, just as car owners must be licensed and insured today….

The actuaries who research risk have always considered a wide variety of factors when helping insurers price the cost of a policy. Car, home, and life insurance can vary according to a policy holder’s age, health, criminal record, employment, residence, and many other variables. But in recent years, machine learning and data analytics have provided actuaries with new predictive powers. According to Yann LeCun, the director of artificial intelligence at Facebook and the primary inventor of an important technique in deep learning called convolution, “Deep learning systems provide better statistical models with enough data. They can be advantageously applied to risk evaluation, and convolutional neural nets can be very good at prediction, because they can take into account a long window of past values.”

State Farm, Liberty Mutual, Allstate, and Progressive Insurance have all used algorithms to improve their predictive analysis and to more accurately distribute risk among their policy holders. For instance, in late 2015, Progressive created a telematics app called Snapshot that individual drivers used to collect information on their driving. In the subsequent two years, 14 billion miles of driving data were collected all over the country and analyzed on Progressive’s machine learning platform, H2O.ai, resulting in discounts of $600 million for their policy holders. On average, machine learning produced a $130 discount for Progressive customers.

When the financial writer John Wasik popularized gun insurance in a series of posts in Forbes in 2012 and 2013, the NRA’s argument about prior constraints was a reasonable objection. Wasik proposed charging different rates to different types of gun owners, but there were too many factors that would have to be tracked over too long a period to drive down costs for low-risk policy holders. Today, using deep learning, the idea is more practical: Insurers could measure the interaction of dozens or hundreds of factors, predicting the risks of gun ownership and controlling costs for low-risk gun owners. Other, more risky bets might pay more. Some very risky would-be gun owners might be unable to find insurance at all. Gun insurance could even be dynamically priced, changing as the conditions of the policy holders’ lives altered, and the gun owners proved themselves better or worse risks.

Requiring gun owners to buy insurance wouldn’t eliminate gun violence in America. But a political solution to the problem of gun violence is chimerical….(More)”.
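
The excerpt's contrast between a flat tax and individually priced insurance comes down to expected-loss arithmetic, sketched below. This is a minimal illustration, not an actuarial model: the three owner profiles, their probabilities and costs, and the 20% loading are all invented for the example, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Owner:
    name: str
    annual_claim_probability: float  # hypothetical chance of a claim per year
    expected_claim_cost: float       # hypothetical cost if a claim occurs


def expected_loss(owner):
    """Probability of a claim times its cost: the owner's expected annual loss."""
    return owner.annual_claim_probability * owner.expected_claim_cost


def flat_charge(owners):
    """A flat tax set to the pool's average expected loss: everyone pays the same."""
    return sum(expected_loss(o) for o in owners) / len(owners)


def risk_based_premium(owner, loading=0.2):
    """Expected loss plus a proportional loading for expenses and margin."""
    return expected_loss(owner) * (1.0 + loading)


# Invented profiles (assumption): only the relative ordering matters here.
OWNERS = [
    Owner("low-risk", 0.001, 100_000.0),
    Owner("medium-risk", 0.005, 100_000.0),
    Owner("high-risk", 0.030, 100_000.0),
]

flat = flat_charge(OWNERS)
for owner in OWNERS:
    print(f"{owner.name}: flat {flat:.2f}, risk-based {risk_based_premium(owner):.2f}")
```

Under the flat charge the low-risk owner subsidizes the high-risk one; under risk-based pricing each pays roughly in proportion to the cost they are expected to impose, which is the property the article attributes to insurance.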

Data-Driven Regulation and Governance in Smart Cities


Chapter by Sofia Ranchordas and Abram Klop in Berlee, V. Mak, E. Tjong Tjin Tai (Eds), Research Handbook on Data Science and Law (Edward Elgar, 2018): “This paper discusses the concept of data-driven regulation and governance in the context of smart cities by describing how these urban centres harness these technologies to collect and process information about citizens, traffic, urban planning or waste production. It describes how several smart cities throughout the world currently employ data science, big data, AI, Internet of Things (‘IoT’), and predictive analytics to improve the efficiency of their services and decision-making.

Furthermore, this paper analyses the legal challenges of employing these technologies to influence or determine the content of local regulation and governance. It explores in particular three specific challenges: the disconnect between traditional administrative law frameworks and data-driven regulation and governance; the effects of the privatization of public services and citizen needs due to the growing outsourcing of smart cities technologies to private companies; and the limited transparency and accountability that characterize data-driven administrative processes. This paper draws on a review of interdisciplinary literature on smart cities and offers illustrations of data-driven regulation and governance practices from different jurisdictions….(More)”.

Prediction, Judgment and Complexity


NBER Working Paper by Agrawal, Ajay and Gans, Joshua S. and Goldfarb, Avi: “We interpret recent developments in the field of artificial intelligence (AI) as improvements in prediction technology. In this paper, we explore the consequences of improved prediction in decision-making. To do so, we adapt existing models of decision-making under uncertainty to account for the process of determining payoffs. We label this process of determining the payoffs ‘judgment.’ There is a risky action, whose payoff depends on the state, and a safe action with the same payoff in every state. Judgment is costly; for each potential state, it requires thought on what the payoff might be. Prediction and judgment are complements as long as judgment is not too difficult. We show that in complex environments with a large number of potential states, the effect of improvements in prediction on the importance of judgment depends a great deal on whether the improvements in prediction enable automated decision-making. We discuss the implications of improved prediction in the face of complexity for automation, contracts, and firm boundaries….(More)”.
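
The abstract's setup, a risky action whose payoff depends on the state and a safe action with a fixed payoff, can be illustrated with a toy expected-payoff calculation. This is a simplified sketch and not the paper's model: it treats judgment as already done (the payoffs are known) and varies only the quality of the prediction. The state names, payoffs, and probabilities are invented.

```python
def expected_payoff(probabilities, payoffs):
    """Weight each state's payoff by its predicted probability."""
    return sum(p * payoffs[state] for state, p in probabilities.items())


def choose_action(probabilities, risky_payoffs, safe_payoff):
    """Take the risky action only if its expected payoff beats the safe one."""
    risky = expected_payoff(probabilities, risky_payoffs)
    if risky > safe_payoff:
        return "risky", risky
    return "safe", safe_payoff


# Judgment (assumed already applied): what the risky action pays in each state.
RISKY_PAYOFFS = {"good": 10.0, "bad": -12.0}
SAFE_PAYOFF = 0.0

# Prediction: three hypothetical forecasts of the state, from vague to sharp.
vague = {"good": 0.5, "bad": 0.5}
sharper = {"good": 0.9, "bad": 0.1}
certain_bad = {"good": 0.0, "bad": 1.0}

for label, forecast in [("vague", vague), ("sharper", sharper), ("certain bad", certain_bad)]:
    print(label, choose_action(forecast, RISKY_PAYOFFS, SAFE_PAYOFF))
```

With the vague forecast the safe action wins; the sharper forecast makes the risky action worthwhile. The payoffs supplied by judgment only become valuable once prediction is good enough to act on them, which is the sense in which the two are complements.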

The future of statistics and data science


Paper by Sofia C. Olhede and Patrick J. Wolfe in Statistics & Probability Letters: “The Danish physicist Niels Bohr is said to have remarked: “Prediction is very difficult, especially about the future”. Predicting the future of statistics in the era of big data is not so very different from prediction about anything else. Ever since we started to collect data to predict cycles of the moon, seasons, and hence future agriculture yields, humankind has worked to infer information from indirect observations for the purpose of making predictions.

Even while acknowledging the momentous difficulty in making predictions about the future, a few topics stand out clearly as lying at the current and future intersection of statistics and data science. Not all of these topics are of a strictly technical nature, but all have technical repercussions for our field. How might these repercussions shape the still relatively young field of statistics? And what can sound statistical theory and methods bring to our understanding of the foundations of data science? In this article we discuss these issues and explore how new open questions motivated by data science may in turn necessitate new statistical theory and methods now and in the future.

Together, the ubiquity of sensing devices, the low cost of data storage, and the commoditization of computing have led to a volume and variety of modern data sets that would have been unthinkable even a decade ago. We see four important implications for statistics.

First, many modern data sets are related in some way to human behavior. Data might have been collected by interacting with human beings, or personal or private information traceable back to a given set of individuals might have been handled at some stage. Mathematical or theoretical statistics traditionally does not concern itself with the finer points of human behavior, and indeed many of us have only had limited training in the rules and regulations that pertain to data derived from human subjects. Yet inevitably in a data-rich world, our technical developments cannot be divorced from the types of data sets we can collect and analyze, and how we can handle and store them.

Second, the importance of data to our economies and civil societies means that the future of regulation will look not only to protect our privacy, and how we store information about ourselves, but also to include what we are allowed to do with that data. For example, as we collect high-dimensional vectors about many family units across time and space in a given region or country, privacy will be limited by that high-dimensional space, but our wish to control what we do with data will go beyond that….

Third, the growing complexity of algorithms is matched by an increasing variety and complexity of data. Data sets now come in a variety of forms that can be highly unstructured, including images, text, sound, and various other new forms. These different types of observations have to be understood together, resulting in multimodal data, in which a single phenomenon or event is observed through different types of measurement devices. Rather than having one phenomenon corresponding to single scalar values, a much more complex object is typically recorded. This could be a three-dimensional shape, for example in medical imaging, or multiple types of recordings such as functional magnetic resonance imaging and simultaneous electroencephalography in neuroscience. Data science therefore challenges us to describe these more complex structures, modeling them in terms of their intrinsic patterns.

Finally, the types of data sets we now face are far from satisfying the classical statistical assumptions of identically distributed and independent observations. Observations are often “found” or repurposed from other sampling mechanisms, rather than necessarily resulting from designed experiments….

Our field will either meet these challenges and become increasingly ubiquitous, or risk rapidly becoming irrelevant to the future of data science and artificial intelligence….(More)”.

What if technology could help improve conversations online?


Introduction to “Perspective”: “Discussing things you care about can be difficult. The threat of abuse and harassment online means that many people stop expressing themselves and give up on seeking different opinions….Perspective is an API that makes it easier to host better conversations. The API uses machine learning models to score the perceived impact a comment might have on a conversation. Developers and publishers can use this score to give realtime feedback to commenters or help moderators do their job, or allow readers to more easily find relevant information, as illustrated in two experiments below. We’ll be releasing more machine learning models later in the year, but our first model identifies whether a comment could be perceived as “toxic” to a discussion….(More)”.
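
How a publisher might act on such a score can be sketched as follows. This is a hedged illustration: the request and response field names reflect my reading of the public API documentation and should be verified against the current version, the sample response is made up, and the thresholds and function names are hypothetical choices, not recommendations from the Perspective team. No network call is made.

```python
def build_request(comment_text):
    """Body for a scoring request asking only for the TOXICITY attribute.
    Field names are an assumption based on the public docs; verify before use."""
    return {
        "comment": {"text": comment_text},
        "requestedAttributes": {"TOXICITY": {}},
    }


def toxicity_score(response):
    """Pull the summary score (a value between 0 and 1) out of a response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def triage(score, warn_at=0.5, review_at=0.8):
    """Map a score to an action. Both thresholds are hypothetical."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= review_at:
        return "send to moderator"
    if score >= warn_at:
        return "show feedback to commenter"
    return "publish"


# Made-up response for illustration; real scores come from the API.
SAMPLE_RESPONSE = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.62, "type": "PROBABILITY"}}
    }
}

print(build_request("You make a fair point."))
print(triage(toxicity_score(SAMPLE_RESPONSE)))
```

Keeping the thresholds as parameters reflects the two uses named in the introduction: a lower bar for real-time feedback to the commenter and a higher one for routing a comment to human moderators.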