Artificial intelligence could identify gang crimes—and ignite an ethical firestorm


Matthew Hutson at Science: “When someone roughs up a pedestrian, robs a store, or kills in cold blood, police want to know whether the perpetrator was a gang member: Do they need to send in a special enforcement team? Should they expect a crime in retaliation? Now, a new algorithm is trying to automate the process of identifying gang crimes. But some scientists warn that far from reducing gang violence, the program could do the opposite by eroding trust in communities, or it could brand innocent people as gang members.

That has created some tensions. At a presentation of the new program this month, one audience member grew so upset he stormed out of the talk, and some of the creators of the program have been tight-lipped about how it could be used….

For years, scientists have been using computer algorithms to map criminal networks, or to guess where and when future crimes might take place, a practice known as predictive policing. But little work has been done on labeling past crimes as gang-related.

In the new work, researchers developed a system that can identify a crime as gang-related based on only four pieces of information: the primary weapon, the number of suspects, and the neighborhood and location (such as an alley or street corner) where the crime took place. Such analytics, which can help characterize crimes before they’re fully investigated, could change how police respond, says Doug Haubert, city prosecutor for Long Beach, California, who has authored strategies on gang prevention.

To classify crimes, the researchers invented something called a partially generative neural network. A neural network is made of layers of small computing elements that process data in a way reminiscent of the brain’s neurons. A form of machine learning, it improves based on feedback—whether its judgments were right. In this case, researchers trained their algorithm using data from the Los Angeles Police Department (LAPD) in California from 2014 to 2016 on more than 50,000 gang-related and non–gang-related homicides, aggravated assaults, and robberies.

The researchers then tested their algorithm on another set of LAPD data. The network was “partially generative,” because even when it did not receive an officer’s narrative summary of a crime, it could use the four factors noted above to fill in that missing information and then use all the pieces to infer whether a crime was gang-related. Compared with a stripped-down version of the network that didn’t use this novel approach, the partially generative algorithm reduced errors by close to 30%, the team reported at the Artificial Intelligence, Ethics, and Society (AIES) conference this month in New Orleans, Louisiana. The researchers have not yet tested their algorithm’s accuracy against trained officers.
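
To make the setup concrete, here is a minimal sketch of a classifier trained on just those four structured features. It uses a plain feed-forward network on synthetic data; the paper's actual “partially generative” architecture, which also reconstructs the missing narrative text, is more involved, and the feature values below are illustrative assumptions rather than real LAPD categories.

```python
# Illustrative sketch only: a plain feed-forward classifier over the four
# structured features mentioned above (weapon, number of suspects,
# neighbourhood, premise type). The paper's "partially generative" network,
# which also fills in the missing officer narrative, is more involved.
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)

# Synthetic stand-in data; the real training set was LAPD records, 2014-2016.
n = 1000
weapons = rng.choice(["handgun", "knife", "none"], size=n)
suspects = rng.integers(1, 6, size=n).astype(str)
neighbourhoods = rng.choice(["A", "B", "C", "D"], size=n)
premises = rng.choice(["street", "alley", "residence"], size=n)
gang_related = rng.integers(0, 2, size=n)  # labels would come from investigators

X = np.column_stack([weapons, suspects, neighbourhoods, premises])

model = Pipeline([
    ("encode", ColumnTransformer([("onehot", OneHotEncoder(), [0, 1, 2, 3])])),
    ("net", MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500)),
])
model.fit(X, gang_related)
print("training accuracy:", model.score(X, gang_related))
```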

It’s an “interesting paper,” says Pete Burnap, a computer scientist at Cardiff University who has studied crime data. But although the predictions could be useful, it’s possible they would be no better than officers’ intuitions, he says. Haubert agrees, but he says that having the assistance of data modeling could sometimes produce “better and faster results.” Such analytics, he says, “would be especially useful in large urban areas where a lot of data is available.”…(More).

Infection forecasts powered by big data


Michael Eisenstein at Nature: “…The good news is that the present era of widespread access to the Internet and digital health has created a rich reservoir of valuable data for researchers to dive into….By harvesting and combining these streams of big data with conventional ways of monitoring infectious diseases, the public-health community could gain fresh powers to catch and curb emerging outbreaks before they rage out of control.

Going viral

Data scientists at Google were the first to make a major splash using data gathered online to track infectious diseases. The Google Flu Trends algorithm, launched in November 2008, combed through hundreds of billions of users’ queries on the popular search engine to look for small increases in flu-related terms such as symptoms or vaccine availability. Initial data suggested that Google Flu Trends could accurately map the incidence of flu with a lag of roughly one day. “It was a very exciting use of these data for the purpose of public health,” says Brownstein. “It really did start a whole revolution and new field of work in query data.”

Unfortunately, Google Flu Trends faltered when it mattered the most, completely missing the onset in April 2009 of the H1N1 pandemic. The algorithm also ran into trouble later on in the pandemic. It had been trained against seasonal fluctuations of flu, says Viboud, but people’s behaviour changed in the wake of panic fuelled by media reports — and that threw off Google’s data. …

Nevertheless, its work with Internet usage data was inspirational for infectious-disease researchers. A subsequent study from a team led by Cecilia Marques-Toledo at the Federal University of Minas Gerais in Belo Horizonte, Brazil, used Twitter to get high-resolution data on the spread of dengue fever in the country. The researchers could quickly map new cases to specific cities and even predict where the disease might spread to next (C. A. Marques-Toledo et al. PLoS Negl. Trop. Dis. 11, e0005729; 2017). Similarly, Brownstein and his colleagues were able to use search data from Google and Twitter to project the spread of Zika virus in Latin America several weeks before formal outbreak declarations were made by public-health officials. Both Internet services are used widely, which makes them data-rich resources. But they are also proprietary systems for which access to data is controlled by a third party; for that reason, Generous and his colleagues have opted instead to make use of search data from Wikipedia, which is open source. “You can get the access logs, and how many people are viewing articles, which serves as a pretty good proxy for search interest,” he says.
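
The underlying nowcasting idea — whether the signal is search queries, tweets or Wikipedia access logs — can be sketched in a few lines: regress officially reported case counts on the weekly volume of the online signal, then use the latest signal to estimate the current week before official figures arrive. The numbers below are made up purely for illustration.

```python
# Toy nowcasting sketch with made-up numbers: regress reported case counts
# on the weekly volume of disease-related searches or Wikipedia page views,
# then estimate the current week before official figures are published.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical weekly signal (e.g. flu-related page views, in thousands)
# and the corresponding officially reported case counts.
page_views = np.array([12, 15, 21, 30, 44, 61, 80]).reshape(-1, 1)
reported_cases = np.array([300, 380, 520, 760, 1100, 1500, 1950])

model = LinearRegression().fit(page_views, reported_cases)

# Official data for the current week are not in yet, but the online signal is.
current_week_views = np.array([[95]])
print("nowcast for current week:", int(model.predict(current_week_views)[0]))
```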

However, the problems that sank Google Flu Trends still exist….Additionally, online activity differs for infectious conditions with a social stigma such as syphilis or AIDS, because people who are or might be affected are more likely to be concerned about privacy. Appropriate search-term selection is essential: Generous notes that initial attempts to track flu on Twitter were confounded by irrelevant tweets about ‘Bieber fever’ — a decidedly non-fatal condition affecting fans of Canadian pop star Justin Bieber.

Alternatively, researchers can go straight to the source — by using smartphone apps to ask people directly about their health. Brownstein’s team has partnered with the Skoll Global Threats Fund to develop an app called Flu Near You, through which users can voluntarily report symptoms of infection and other information. “You get more detailed demographics about age and gender and vaccination status — things that you can’t get from other sources,” says Brownstein. Ten European Union member states are involved in a similar surveillance programme known as Influenzanet, which has generally maintained 30,000–40,000 active users for seven consecutive flu seasons. These voluntary reporting systems are particularly useful for diseases such as flu, for which many people do not bother going to the doctor — although it can be hard to persuade people to participate for no immediate benefit, says Brownstein. “But we still get a good signal from the people that are willing to be a part of this.”…(More)”.

Your Data Is Crucial to a Robotic Age. Shouldn’t You Be Paid for It?


The New York Times: “The idea has been around for a bit. Jaron Lanier, the tech philosopher and virtual-reality pioneer who now works for Microsoft Research, proposed it in his 2013 book, “Who Owns the Future?,” as a needed corrective to an online economy mostly financed by advertisers’ covert manipulation of users’ consumer choices.

It is being picked up in “Radical Markets,” a book due out shortly from Eric A. Posner of the University of Chicago Law School and E. Glen Weyl, principal researcher at Microsoft. And it is playing into European efforts to collect tax revenue from American internet giants.

In a report obtained last month by Politico, the European Commission proposes to impose a tax on the revenue of digital companies based on their users’ location, on the grounds that “a significant part of the value of a business is created where the users are based and data is collected and processed.”

Users’ data is a valuable commodity. Facebook offers advertisers precisely targeted audiences based on user profiles. YouTube, too, tailors its feed to users’ preferences. Still, this pales in comparison with how valuable data is about to become, as the footprint of artificial intelligence extends across the economy.

Data is the crucial ingredient of the A.I. revolution. Training systems to perform even relatively straightforward tasks like voice translation, voice transcription or image recognition requires vast amounts of data — like tagged photos, to identify their content, or recordings with transcriptions.

“Among leading A.I. teams, many can likely replicate others’ software in, at most, one to two years,” notes the technologist Andrew Ng. “But it is exceedingly difficult to get access to someone else’s data. Thus data, rather than software, is the defensible barrier for many businesses.”

We may think we get a fair deal, offering our data as the price of sharing puppy pictures. By other metrics, we are being victimized: In the largest technology companies, the share of income going to labor is only about 5 to 15 percent, Mr. Posner and Mr. Weyl write. That’s way below Walmart’s 80 percent. Consumer data amounts to work they get free….

The big question, of course, is how we get there from here. My guess is that it would be naïve to expect Google and Facebook to start paying for user data of their own accord, even if that improved the quality of the information. Could policymakers step in, somewhat the way the European Commission did, demanding that technology companies compute the value of consumer data?…(More)”.

Trustworthy data will transform the world


At the Financial Times: “The internet’s original sin was identified as early as 1993 in a New Yorker cartoon. “On the internet, nobody knows you’re a dog,” the caption ran beneath an illustration of a pooch at a keyboard. That anonymity has brought some benefits. But it has also created myriad problems, injecting distrust into the digital world. If you do not know the provenance and integrity of information and data, how can you trust their veracity?

That has led to many of the scourges of our times, such as cyber crime, identity theft and fake news. In his Alan Turing Institute lecture in London last week, the American computer scientist Sandy Pentland outlined the massive gains that could result from trusted data.

The MIT professor argued that the explosion of such information would give us the capability to understand our world in far more detail than ever before. Most of what we know in the fields of sociology, psychology, political science and medicine is derived from tiny experiments in controlled environments. But the data revolution enables us to observe behaviour as it happens at mass scale in the real world. That feedback could provide invaluable evidence about which theories are most valid and which policies and products work best.

The promise is that we make soft social science harder and more predictive. That, in turn, could lead to better organisations, fairer government, and more effective monitoring of our progress towards achieving collective ambitions, such as the UN’s sustainable development goals. To take one small example, Mr Pentland illustrated the strong correlation between connectivity and wealth. By studying the telephone records of 100,000 users in south-east Asia, researchers have plotted social connectivity against income. The conclusion: “The more diverse your connections, the more money you have.” This is not necessarily a causal relationship but it does have a strong causal element, he suggested.

Similar studies of European cities have shown an almost total segregation between groups of different socio-economic status. That lack of connectivity has to be addressed if our politics is not to descend further into a meaningless dialogue.

Data give us a new way to measure progress.

For years, the Open Data movement has been working to create public data sets that can better inform decision making. This worldwide movement is prising open anonymised public data sets, such as transport records, so that they can be used by academics, entrepreneurs and civil society groups. However, much of the most valuable data is held by private entities, notably the consumer tech companies, telecoms operators, retailers and banks. “The big win would be to include private data as a public good,” Mr Pentland said….(More)”.

Using Open Data for Public Services


New report by the Open Data Institute:  “…Today we’re publishing our initial findings based on examining 8 examples where open data supports the delivery of a public service. We have defined 3 high-level ‘patterns’ for how open data is used in public services. We think these could be helpful for others looking to redesign and deliver better services.

The three patterns are summarised below:

The first pattern is perhaps the model everyone is most familiar with, as it’s used by the likes of Citymapper and other citizen-focused apps, which use open transport data from Transport for London to inform passengers about routes and timings. Data is released by a public sector organisation about a public service, and a third organisation uses this data to provide a complementary service, online or face-to-face, that helps citizens use the public service.

The second pattern involves the release of open data in the service delivery chain. Open data is used to plan public service delivery and make service delivery chains more efficient. Examples provided in the report include local authorities’ release of open spending, contract and tender data, which is used by Spend Network to support better value for money in public expenditure.

In the third pattern, public sector organisations commissioning services and external organisations involved in service delivery make strategic decisions based on insights and patterns revealed by open data. Visualisations of open data can inform policies on Jobseeker’s Allowance, as shown in the example from the Department for Work and Pensions in the report.

As well as identifying these patterns, we have created ecosystem maps of the public services we have examined to help understand the relationships and the mechanisms by which open data supports each of them….

Having compared the ecosystems of the examples we have considered so far, the report sets out practical recommendations for those involved in the delivery of public services and for Central Government for the better use of open data in the delivery of public services.

The recommendations are focused on organisational collaboration; technology infrastructure, digital skills and literacy; open standards for data; senior level championing; peer networks; intermediaries; and problem focus….(More)”.

Journalism and artificial intelligence


Notes by Charlie Beckett (at LSE’s Media Policy Project Blog): “…AI and machine learning are a big deal for journalism and news information – possibly as important as the other developments we have seen in the last 20 years, such as online platforms, digital tools and social media. My 2008 book on how journalism was being revolutionised by technology was called SuperMedia because these technologies offered extraordinary opportunities to make journalism much more efficient and effective – but also to transform what we mean by news and how we relate to it as individuals and communities. Of course, that can be super good or super bad.

Artificial intelligence and machine learning can help the news media with its three core problems:

  1. The overabundance of information and sources that leave the public confused
  2. The credibility of journalism in a world of disinformation and falling trust and literacy
  3. The business-model crisis – how can journalism become more efficient, avoiding duplication; be more engaged, adding value; and be relevant to individuals’ and communities’ need for quality, accurate information and informed, useful debate.

But like any technology they can also be used by bad people or for bad purposes: in journalism that can mean clickbait, misinformation, propaganda, and trolling.

Some caveats about using AI in journalism:

  1. Narratives are difficult to program. Trusted journalists are needed to understand and write meaningful stories.
  2. Artificial Intelligence needs human inputs. Skilled journalists are required to double check results and interpret them.
  3. Artificial Intelligence increases quantity, not quality. It’s still up to the editorial team and developers to decide what kind of journalism the AI will help create….(More)”.

Citicafe: conversation-based intelligent platform for citizen engagement


Paper by Amol Dumrewal et al in the Proceedings of the ACM India Joint International Conference on Data Science and Management of Data: “Community civic engagement is a new and emerging trend in urban cities driven by the mission of developing responsible citizenship. The recognition of civic potential in every citizen goes a long way in creating sustainable societies. Technology is playing a vital role in helping this mission and over the last couple of years, there have been a plethora of social media avenues to report civic issues. Sites like Twitter, Facebook, and other online portals help citizens to report issues and register complaints. These complaints are analyzed by the public services to help understand and, in turn, address these issues. However, once the complaint is registered, often no formal or informal feedback is given back from these sites to the citizens. This de-motivates citizens and may deter them from registering further complaints. In addition, these sites offer no holistic information about a neighborhood to the citizens. It is useful for people to know whether there are similar complaints posted by other people in the same area, the profile of all complaints, and how and when these complaints will be addressed.

In this paper, we create a conversation-based platform, CitiCafe, for enhancing citizen engagement, front-ended by a virtual agent with a Twitter interface. The platform’s back end stores and processes information pertaining to civic complaints in a city. A Twitter-based conversation service allows citizens to have a direct correspondence with CitiCafe via “tweets” and direct messages. The platform also helps citizens to (a) report problems and (b) gather information related to civic issues in different neighborhoods. This can also help, in the long run, to develop civic conversations among citizens and also between citizens and public services….(More)”.
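
The paper does not spell out its implementation, but the neighbourhood-level view it describes — letting a citizen see whether others nearby have reported the same issue — can be sketched with a small in-memory complaint store. The field names and example data here are assumptions, not the CitiCafe schema.

```python
# Illustrative sketch only: a tiny in-memory complaint store that supports the
# neighbourhood-level view described above. Field names are assumptions, not
# the CitiCafe schema.
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass
class Complaint:
    user: str
    neighbourhood: str
    category: str      # e.g. "pothole", "streetlight", "garbage"
    status: str = "open"

class ComplaintStore:
    def __init__(self):
        self._by_area = defaultdict(list)

    def register(self, complaint: Complaint):
        self._by_area[complaint.neighbourhood].append(complaint)

    def area_profile(self, neighbourhood: str) -> Counter:
        """Profile of all complaints in an area, e.g. to answer a citizen's
        'are others reporting the same issue near me?' question."""
        return Counter(c.category for c in self._by_area[neighbourhood])

store = ComplaintStore()
store.register(Complaint("alice", "Indiranagar", "pothole"))
store.register(Complaint("bob", "Indiranagar", "pothole"))
store.register(Complaint("carol", "Indiranagar", "streetlight"))
print(store.area_profile("Indiranagar"))  # Counter({'pothole': 2, 'streetlight': 1})
```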

Global Fishing Watch And The Power Of Data To Understand Our Natural World


A year and a half ago I wrote about the public debut of the Global Fishing Watch project as a showcase of what becomes possible when massive datasets are made accessible to the general public through easy-to-use interfaces that allow them to explore the planet they inhabit. At the time I noted how the project drove home the divide between the “glittering technological innovation of Silicon Valley and the technological dark ages of the development community” and what becomes possible when technologists and development organizations come together to apply incredible technology not for commercial gain, but rather to save the world itself. Continuing those efforts, last week Global Fishing Watch launched what it describes as “the first ever dataset of global industrial fishing activities (all countries, all gears),” making the entire dataset freely accessible to seed new scientific, activist, governmental, journalistic and citizen understanding of the state of global fishing.

The Global Fishing Watch project stands as a powerful model for data-driven development work done right and hopefully, the rise of notable efforts like it will eventually catalyze the broader development community to emerge from the stone age of technology and more openly embrace the technological revolution. While it has a very long way to go, there are signs of hope for the development community as pockets of innovation begin to infuse the power of data-driven decision making and situational awareness into everything from disaster response to proactive planning to shaping legislative action.

Bringing technologists and development organizations together is not always that easy and the most creative solutions aren’t always to be found among the “usual suspects.” Open data and open challenges built upon them offer the potential for organizations to reach beyond the usual communities they interact with and identify innovative new approaches to the grand challenges of their fields. Just last month a collaboration of the World Bank, WeRobotics and OpenAerialMap launched a data challenge to apply deep learning to assess aerial imagery in the immediate aftermath of disasters to determine the impact to food producing trees and to road networks. By launching the effort as an open AI challenge, the goal is to reach the broader AI and open development communities at the forefront of creative and novel algorithmic approaches….(More)”.

The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation


Report by Miles Brundage et al: “Artificial intelligence and machine learning capabilities are growing at an unprecedented rate. These technologies have many widely beneficial applications, ranging from machine translation to medical image analysis. Countless more such applications are being developed and can be expected over the long term. Less attention has historically been paid to the ways in which artificial intelligence can be used maliciously. This report surveys the landscape of potential security threats from malicious uses of artificial intelligence technologies, and proposes ways to better forecast, prevent, and mitigate these threats. We analyze, but do not conclusively resolve, the question of what the long-term equilibrium between attackers and defenders will be. We focus instead on what sorts of attacks we are likely to see soon if adequate defenses are not developed.

In response to the changing threat landscape we make four high-level recommendations:

1. Policymakers should collaborate closely with technical researchers to investigate, prevent, and mitigate potential malicious uses of AI.

2. Researchers and engineers in artificial intelligence should take the dual-use nature of their work seriously, allowing misuse-related considerations to influence research priorities and norms, and proactively reaching out to relevant actors when harmful applications are foreseeable.

3. Best practices should be identified in research areas with more mature methods for addressing dual-use concerns, such as computer security, and imported where applicable to the case of AI.

4. Actively seek to expand the range of stakeholders and domain experts involved in discussions of these challenges….(More)”.

Liquid democracy uses blockchain to fix politics, and now you can vote for it


Danny Crichton at TechCrunch: “…Confidence in Congress remains pitifully low, driven by perceived low ethical standards and an increasing awareness that politics is bought by the highest bidder.

Now, a group of technologists and blockchain enthusiasts are asking whether a new approach could reform the system, bringing citizens closer to their representatives and holding congressmen accountable to their voters in a public, verifiable way. And if you live in western San Francisco, you can actually vote to put this system into office.

The concept is known as liquid democracy, and it’s a solid choice for fixing a broken system. The idea is that every person should have the right to give feedback on a policy issue or a piece of new legislation, but often people don’t have the time to do so. Using a liquid democracy platform, however, that voter can select a personal representative who has the authority to be a proxy for their vote. That proxy can be changed at will as a voter’s interests change.

Here is where the magic happens. Those proxies can themselves proxy their votes to other people, creating a directed network graph, ideally connecting every voter to politicians, and all publicly verified on a blockchain. While there may be 700,000 people in a congressional district, potentially only a few hundred or a few thousand “super proxies” would need to be deeply engaged in the system for better representation to take place.
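
A rough sketch of how such delegated votes might be resolved — following each voter's chain of proxies until it reaches someone who voted directly, and discarding chains that loop — is shown below. The details are illustrative assumptions, not United.vote's actual implementation, which would also record results on a blockchain.

```python
# Sketch of how delegated ("proxied") votes could be resolved in a liquid
# democracy graph. Details (cycle handling, tie rules) are assumptions, not
# United.vote's actual implementation.
def resolve_vote(voter, delegations, direct_votes):
    """Follow a voter's chain of proxies until someone has voted directly.
    Returns None if the chain loops back on itself or ends without a vote."""
    seen = set()
    current = voter
    while current not in direct_votes:
        if current in seen or current not in delegations:
            return None  # cycle or dangling delegation: vote is not counted
        seen.add(current)
        current = delegations[current]
    return direct_votes[current]

def tally(voters, delegations, direct_votes):
    """Count one vote per voter, routed through the delegation graph."""
    counts = {}
    for v in voters:
        choice = resolve_vote(v, delegations, direct_votes)
        if choice is not None:
            counts[choice] = counts.get(choice, 0) + 1
    return counts

# Example: dana votes directly; alice -> bob -> dana all inherit her choice.
delegations = {"alice": "bob", "bob": "dana", "carol": "dana"}
direct_votes = {"dana": "yes", "erin": "no"}
print(tally(["alice", "bob", "carol", "dana", "erin"], delegations, direct_votes))
# {'yes': 4, 'no': 1}
```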

David Ernst is a leader of the liquid democracy movement and now a candidate for California Assembly District 19, which centers on the western half of San Francisco. He is ardently committed to the concept, and despite its novelty, believes that this is the path forward for improving governance….

Following college (which he began at age 16) and a few startup jobs, Ernst began working as CTO of a startup called Numerai, a crypto-backed decentralized hedge fund that allows data scientists to earn money when they solve data challenges. “The idea was that we can include many more people to participate in the system who weren’t able to before,” Ernst explained. That’s when it hit him that the decentralized nature of blockchain could allow for more participation in politics, fusing his two passions.

Ernst followed the campaign of the Flux Party in Australia in 2016, which is trying to implement what it calls “issue-based direct democracy” in that country’s legislature. “That was when something clicked,” he said. A congressman for example could commit to voting the calculated liquid democracy position, and “We could elect these sort of remote-controlled politicians as a way to graft this new system onto the old system.”

He built a platform called United.vote to handle the logistics of selecting personal representatives and voting on issues. More importantly, the app then tracks how those votes compare to the votes of congressmen and provides a scorecard….(More)”.