Predictive Analytics


Revised book by Eric Siegel: “Prediction is powered by the world’s most potent, flourishing unnatural resource: data. Accumulated in large part as the by-product of routine tasks, data is the unsalted, flavorless residue deposited en masse as organizations churn away. Surprise! This heap of refuse is a gold mine. Big data embodies an extraordinary wealth of experience from which to learn.

Predictive analytics unleashes the power of data. With this technology, the computer literally learns from data how to predict the future behavior of individuals. Perfect prediction is not possible, but putting odds on the future drives millions of decisions more effectively, determining whom to call, mail, investigate, incarcerate, set up on a date, or medicate.

In this lucid, captivating introduction — now in its Revised and Updated edition — former Columbia University professor and Predictive Analytics World founder Eric Siegel reveals the power and perils of prediction:

    • What type of mortgage risk Chase Bank predicted before the recession.
    • Predicting which people will drop out of school, cancel a subscription, or get divorced before they even know it themselves.
    • Why early retirement predicts a shorter life expectancy and vegetarians miss fewer flights.
    • Five reasons why organizations predict death — including one health insurance company.
    • How U.S. Bank and Obama for America calculated — and Hillary for America 2016 plans to calculate — the way to most strongly persuade each individual.
    • Why the NSA wants all your data: machine learning supercomputers to fight terrorism.
    • How IBM’s Watson computer used predictive modeling to answer questions and beat the human champs on TV’s Jeopardy!
    • How companies ascertain untold, private truths — how Target figures out you’re pregnant and Hewlett-Packard deduces you’re about to quit your job.
    • How judges and parole boards rely on crime-predicting computers to decide how long convicts remain in prison.
    • 183 examples from Airbnb, the BBC, Citibank, ConEd, Facebook, Ford, Google, the IRS, LinkedIn, Match.com, MTV, Netflix, PayPal, Pfizer, Spotify, Uber, UPS, Wikipedia, and more….(More)”
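Siegel's claim that imperfect odds still improve decisions, such as whom to mail, can be made concrete with a simple expected-value rule. The sketch below is hypothetical and not taken from the book. The `Prospect` type, the response probabilities, the values and the mailing cost are all invented for illustration. In practice the probabilities would come from a trained predictive model. A prospect is mailed only when the predicted probability of a response, multiplied by the value of that response, exceeds the cost of mailing.

```python
from dataclasses import dataclass

# Hypothetical cost of one mailing (illustrative assumption, not from the book)
MAIL_COST = 1.50


@dataclass
class Prospect:
    name: str
    p_response: float          # predicted probability of responding (assumed model output)
    value_if_response: float   # revenue if the prospect does respond


def should_mail(prospect: Prospect, cost: float = MAIL_COST) -> bool:
    """Mail only when the expected value of mailing exceeds its cost."""
    return prospect.p_response * prospect.value_if_response > cost


# Invented prospects; the comments give p_response * value_if_response
prospects = [
    Prospect("A", 0.02, 50.0),  # expected value 1.00, below cost
    Prospect("B", 0.10, 50.0),  # expected value 5.00
    Prospect("C", 0.05, 40.0),  # expected value 2.00
]

to_mail = [p.name for p in prospects if should_mail(p)]
print(to_mail)  # ['B', 'C']
```

None of the predicted probabilities is anywhere near certain, yet thresholding on them still separates the mailings worth paying for from the one that is not.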

 

Daedalus Issue on “The Internet”


Press release: “Thirty years ago, the Internet was a network that primarily delivered email among academic and government employees. Today, it is rapidly evolving into a control system for our physical environment through the Internet of Things, as mobile and wearable technology more tightly integrate the Internet into our everyday lives.

How will the future Internet be shaped by the design choices that we are making today? Could the Internet evolve into a fundamentally different platform than the one to which we have grown accustomed? As an alternative to big data, what would it mean to make ubiquitously collected data safely available to individuals as small data? How could we attain both security and privacy in the face of trends that seem to offer neither? And what role do public institutions, such as libraries, have in an environment that becomes more privatized by the day?

These are some of the questions addressed in the Winter 2016 issue of Daedalus on “The Internet.” As guest editors David D. Clark (Senior Research Scientist at the MIT Computer Science and Artificial Intelligence Laboratory) and Yochai Benkler (Berkman Professor of Entrepreneurial Legal Studies at Harvard Law School and Faculty Co-Director of the Berkman Center for Internet and Society at Harvard University) have observed, the Internet “has become increasingly privately owned, commercial, productive, creative, and dangerous.”

Some of the themes explored in the issue include:

  • The conflicts that emerge among governments, corporate stakeholders, and Internet users through choices that are made in the design of the Internet
  • The challenges—including those of privacy and security—that materialize in the evolution from fixed terminals to ubiquitous computing
  • The role of public institutions in shaping the Internet’s privately owned open spaces
  • The ownership and security of data used for automatic control of connected devices, and
  • Consumer demand for “free” services—developed and supported through the sale of user data to advertisers….

Essays in the Winter 2016 issue of Daedalus include:

  • The Contingent Internet by David D. Clark (MIT)
  • Degrees of Freedom, Dimensions of Power by Yochai Benkler (Harvard Law School)
  • Edge Networks and Devices for the Internet of Things by Peter T. Kirstein (University College London)
  • Reassembling Our Digital Selves by Deborah Estrin (Cornell Tech and Weill Cornell Medical College) and Ari Juels (Cornell Tech)
  • Choices: Privacy and Surveillance in a Once and Future Internet by Susan Landau (Worcester Polytechnic Institute)
  • As Pirates Become CEOs: The Closing of the Open Internet by Zeynep Tufekci (University of North Carolina at Chapel Hill)
  • Design Choices for Libraries in the Digital-Plus Era by John Palfrey (Phillips Academy)…(More)”

See also: Introduction

Big Data Analysis: New Algorithms for a New Society


Book edited by Nathalie Japkowicz and Jerzy Stefanowski: “This edited volume is devoted to Big Data Analysis from a Machine Learning standpoint as presented by some of the most eminent researchers in this area.

It demonstrates that Big Data Analysis opens up new research problems which were either never considered before, or were only considered within a limited range. In addition to providing methodological discussions on the principles of mining Big Data and the difference between traditional statistical data analysis and newer computing frameworks, this book presents recently developed algorithms affecting such areas as business, financial forecasting, human mobility, the Internet of Things, information networks, bioinformatics, medical systems and life science. It explores, through a number of specific examples, how the study of Big Data Analysis has evolved and how it has started and will most likely continue to affect society. While the benefits brought upon by Big Data Analysis are underlined, the book also discusses some of the warnings that have been issued concerning the potential dangers of Big Data Analysis along with its pitfalls and challenges….(More)”

OpenAI won’t benefit humanity without data-sharing


 at the Guardian: “There is a common misconception about what drives the digital-intelligence revolution. People seem to have the idea that artificial intelligence researchers are directly programming an intelligence; telling it what to do and how to react. There is also the belief that when we interact with this intelligence we are processed by an “algorithm” – one that is subject to the whims of the designer and encodes his or her prejudices.

OpenAI, a new non-profit artificial intelligence company that was founded on Friday, wants to develop digital intelligence that will benefit humanity. By sharing its sentient algorithms with all, the venture, backed by a host of Silicon Valley billionaires, including Elon Musk and Peter Thiel, wants to avoid the existential risks associated with the technology.

OpenAI’s launch announcement was timed to coincide with this year’s Neural Information Processing Systems conference: the main academic outlet for scientific advances in machine learning, which I chaired. Machine learning is the technology that underpins the new generation of AI breakthroughs.

One of OpenAI’s main ideas is to collaborate openly, publishing code and papers. This is admirable and the wider community is already excited by what the company could achieve.

OpenAI is not the first company to target digital intelligence, and certainly not the first to publish code and papers. Both Facebook and Google have already shared code. They were also present at the same conference. All three companies hosted parties with open bars, aiming to entice the latest and brightest minds.

However, the way machine learning works means that making algorithms available isn’t necessarily as useful as one might think. A machine-learning algorithm is subtly different from popular perception.

Just as in baking we don’t have control over how the cake will emerge from the oven, in machine learning we don’t control every decision that the computer will make. In machine learning the quality of the ingredients, the quality of the data provided, has a massive impact on the intelligence that is produced.

For intelligent decision-making the recipe needs to be carefully applied to the data: this is the process we refer to as learning. The result is the combination of our data and the recipe. We need both to make predictions.
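The baking analogy can be illustrated with a toy example. In the hypothetical sketch below, the "recipe" is an ordinary least-squares line fit, a deliberately simple stand-in rather than any algorithm released by OpenAI, Facebook or Google, and the two tiny datasets are made up. Running the identical recipe on different ingredients yields models that make different predictions, which is the article's point: the shared code alone does not determine the result.

```python
def fit_line(xs, ys):
    """The 'recipe': ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a learned model (slope, intercept) to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Two hypothetical sets of "ingredients" for the same recipe
data_a = ([1, 2, 3, 4], [2, 4, 6, 8])  # follows y = 2x
data_b = ([1, 2, 3, 4], [3, 3, 3, 3])  # flat at y = 3

model_a = fit_line(*data_a)
model_b = fit_line(*data_b)

print(predict(model_a, 10))  # 20.0
print(predict(model_b, 10))  # 3.0
```

Only the data changed between the two runs; the learned models, and therefore their predictions, come out entirely different.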

By sharing their algorithms, Facebook and Google are merely sharing the recipe. Someone has to provide the eggs and flour and provide the baking facilities (which in Google and Facebook’s case are vast data-computation facilities, often located near hydroelectric power stations for cheaper electricity).

So even before they start, an open question for OpenAI is: how will it ensure it has access to data on the necessary scale to make progress?…(More)”

China’s Biggest Polluters Face Wrath of Data-Wielding Citizens


Bloomberg News: “Besides facing hefty fines, criminal punishments and the possibility of closing, the worst emitters in China risk additional public anger as new smartphone applications and lower-cost monitoring devices widen access to data on pollution sources.

The Blue Map app, developed by the Institute of Public & Environmental Affairs with support from the SEE Foundation and the Alibaba Foundation, provides pollution data from more than 3,000 large coal-power, steel, cement and petrochemical production plants. Origins Technology Ltd. in July began sale of the Laser Egg, a palm-sized air quality monitor used to track indoor and outdoor air quality by measuring fine particulate matter in the air.

“Letting people know the sources of regional pollution will help the push for control over emissions of every chimney,” said Ma Jun, the founder and director of the Beijing-based IPE.

The phone map and Laser Egg are the latest levers in prying control over information on air quality from the hands of the few to the many, and they’re beginning to weigh on how officials respond to the issue. Numerous smartphone applications, including those developed by SINA Corp. and Moji Fengyun (Beijing) Software Technology Development Co., now provide people in China with real-time access to air quality readings, essentially democratizing what was once an information pipeline available only to the government.

“China’s continuing struggle to control and reduce air pollution exemplifies the government’s fear that lifestyle issues will mutate into demands for political change,” said Mary Gallagher, an associate professor of political science at the University of Michigan.

Even the government is getting in on the act. The Ministry of Environmental Protection rolled out a smartphone application called “Nationwide Air Quality” with the help of Wuhan Juzheng Environmental Science & Technology Co. at the end of 2013.

“As citizens know more about air pollution, more pressure will be put on the government,” said Xu Qinxiang, a technology manager at Wuhan Juzheng. “This will urge the government to control pollutant sources and upgrade heavy industries.”


Sources of air quality data come from the China National Environment Monitoring Center, local environmental protection bureaus and non-Chinese sources such as the U.S. Embassy’s website in Beijing, Xu said.

Air quality is a controversial subject in China. Since 2012, the public has pushed the government to move more quickly than planned to begin releasing data measuring pollution levels — especially of PM2.5, the particulates most harmful to human health.

The reading was 267 micrograms per cubic meter at 10 a.m. Monday near Tiananmen Square, according to the Beijing Municipal Environmental Monitoring Center. The World Health Organization cautions against 24-hour exposure to concentrations higher than 25.
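The gap between the two figures in that paragraph is easy to quantify by dividing the Beijing reading by the WHO figure. This is a rough indication of scale only, because it compares a single 10 a.m. reading with a guideline for 24-hour exposure rather than a 24-hour average with the same guideline.

```python
# PM2.5 figures quoted in the article, in micrograms per cubic meter
beijing_reading = 267    # near Tiananmen Square, 10 a.m. Monday
who_24h_guideline = 25   # WHO caution threshold for 24-hour exposure

# Single reading vs. a 24-hour guideline: a rough comparison, not a formal exceedance
ratio = beijing_reading / who_24h_guideline
print(f"{ratio:.1f}x the guideline")  # 10.7x the guideline
```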

The availability of data appears to be filling a need, especially with the arrival of colder temperatures and the associated smog that blanketed Beijing and northern China recently….

“With more disclosure of the data, everyone becomes more sensitive, hoping the government can do something,” Li Yajuan, a 27-year-old office secretary, said in an interview in Beijing’s Fuchengmen area. “It’s our own living environment after all.”

Efforts to make products linked to air data continue. IBM has been developing artificial intelligence to help fight Beijing’s toxic air pollution, and plans to work with other municipalities in China and India on similar projects to manage air quality….(More)”

Decoding the Future for National Security


George I. Seffers at Signal: “U.S. intelligence agencies are in the business of predicting the future, but no one has systematically evaluated the accuracy of those predictions—until now. The intelligence community’s cutting-edge research and development agency uses a handful of predictive analytics programs to measure and improve the ability to forecast major events, including political upheavals, disease outbreaks, insider threats and cyber attacks.

The Office for Anticipating Surprise at the Intelligence Advanced Research Projects Activity (IARPA) is a place where crystal balls come in the form of software, tournaments and throngs of people. The office sponsors eight programs designed to improve predictive analytics, which uses a variety of data to forecast events. The programs all focus on incidents outside of the United States, and the information is anonymized to protect privacy. The programs are in different stages, some having recently ended as others are preparing to award contracts.

But they all have one more thing in common: They use tournaments to advance the state of the predictive analytic arts. “We decided to run a series of forecasting tournaments in which people from around the world generate forecasts about, now, thousands of real-world events,” says Jason Matheny, IARPA’s new director. “All of our programs on predictive analytics do use this tournament style of funding and evaluating research.” The Open Source Indicators program used a crowdsourcing technique in which people across the globe offered their predictions on such events as political uprisings, disease outbreaks and elections.

The data analyzed included social media trends, Web search queries and even cancelled dinner reservations—an indication that people are sick. “The methods applied to this were all automated. They used machine learning to comb through billions of pieces of data to look for that signal, that leading indicator, that an event was about to happen,” Matheny explains. “And they made amazing progress. They were able to predict disease outbreaks weeks earlier than traditional reporting.”

The recently completed Aggregative Contingent Estimation (ACE) program also used a crowdsourcing competition in which people predicted events, including whether weapons would be tested, treaties would be signed or armed conflict would break out along certain borders. Volunteers were asked to provide information about their own background and what sources they used. IARPA also tested participants’ cognitive reasoning abilities. Volunteers provided their forecasts every day, and IARPA personnel kept score.

Interestingly, they discovered the “deep domain” experts were not the best at predicting events. Instead, people with a certain style of thinking came out the winners. “They read a lot, not just from one source, but from multiple sources that come from different viewpoints. They have different sources of data, and they revise their judgments when presented with new information. They don’t stick to their guns,” Matheny reveals. …
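The excerpt does not say how IARPA kept score. A common rule for scoring probabilistic forecasts, and one widely associated with forecasting tournaments, is the Brier score: the mean squared difference between each forecast probability and the outcome (1 if the event happened, 0 if not), where lower is better. Treat its use here as an assumption. The questions and forecasts below are hypothetical, comparing a forecaster who commits to evidence-based probabilities with one who always answers 50%.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes; lower is better."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical questions: did each event occur? (1 = yes, 0 = no)
outcomes = [1, 0, 0, 1]

# Hypothetical forecasters
committed = [0.8, 0.2, 0.1, 0.7]   # leans toward what the evidence suggests
always_half = [0.5, 0.5, 0.5, 0.5]  # never commits either way

print(brier_score(committed, outcomes))   # approximately 0.045
print(brier_score(always_half, outcomes))  # 0.25
```

Because the score is averaged over many questions, a tournament can rank thousands of participants on the same footing and reward those whose probabilities track what actually happens.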

The ACE research also contributed to a recently released book, Superforecasting: The Art and Science of Prediction, according to the IARPA director. The book was co-authored, along with Dan Gardner, by Philip Tetlock, the Annenberg University professor of psychology and management at the University of Pennsylvania who also served as a principal investigator for the ACE program. Like ACE, the Crowdsourcing Evidence, Argumentation, Thinking and Evaluation program uses the forecasting tournament format, but it also requires participants to explain and defend their reasoning. The initiative aims to improve analytic thinking by combining structured reasoning techniques with crowdsourcing.

Meanwhile, the Foresight and Understanding from Scientific Exposition (FUSE) program forecasts science and technology breakthroughs….(More)”

Artificial Intelligence Aims to Make Wikipedia Friendlier and Better


Tom Simonite in MIT Technology Review: “Software trained to know the difference between an honest mistake and intentional vandalism is being rolled out in an effort to make editing Wikipedia less psychologically bruising. It was developed by the Wikimedia Foundation, the nonprofit organization that supports Wikipedia.

One motivation for the project is a significant decline in the number of people considered active contributors to the flagship English-language Wikipedia: it has fallen by 40 percent over the past eight years, to about 30,000. Research indicates that the problem is rooted in Wikipedians’ complex bureaucracy and their often hard-line responses to newcomers’ mistakes, enabled by semi-automated tools that make deleting new changes easy (see “The Decline of Wikipedia”).

Aaron Halfaker, a senior research scientist at Wikimedia Foundation who helped diagnose that problem, is now leading the project trying to fight it, which relies on algorithms with a sense for human fallibility. His ORES system, for “Objective Revision Evaluation Service,” can be trained to score the quality of new changes to Wikipedia and judge whether an edit was made in good faith or not….

ORES can allow editing tools to direct people to review the most damaging changes. The software can also help editors treat rookie or innocent mistakes more appropriately, says Halfaker. “I suspect the aggressive behavior of Wikipedians doing quality control is because they’re making judgments really fast and they’re not encouraged to have a human interaction with the person,” he says. “This enables a tool to say, ‘If you’re going to revert this, maybe you should be careful and send the person who made the edit a message.’”

…Earlier efforts to make Wikipedia more welcoming to newcomers have been stymied by the very community that’s supposed to benefit. Wikipedians rose up in 2013 when Wikimedia made a word-processor-style editing interface the default, forcing the foundation to make it opt-in instead. To this day, the default editor uses a complicated markup language called Wikitext…(More)”

Robots Will Make Leeds the First Self-Repairing City


Emiko Jozuka at Motherboard: “Researchers in Britain want to make the first “self-repairing” city by 2035. How will they do this? By creating autonomous repair robots that patrol the streets and drainage systems, making sure your car doesn’t dip into a pothole, and that you don’t experience any gas leaks.

“The idea is to create a city that behaves almost like a living organism,” said Raul Fuentes, a researcher at the School of Civil Engineering at Leeds University, who is working on the project. “The robots will act like white cells that are able to identify bacteria or viruses and attack them. It’s kind of like an immune system.”

The £4.2 million ($6.4 million) national infrastructure project is in collaboration with Leeds City Council and the UK Collaboration for Research in Infrastructures and Cities (UKCRIC). The aim is to create, by 2035, a fleet of robot repair workers that will live in the city of Leeds, spot problems, and sort them out before they become bigger ones, said Fuentes. The project is set to launch officially in January 2016, he added.

For their five-year project—which has a vision that extends until 2050—the researchers will develop robot designs and technologies that focus on three main areas. The first is to create drones that can perch on high structures and repair things like street lamps; the second is to develop drones that can autonomously spot when a pothole is about to form and zone in and patch that up before it worsens; and the third is to develop robots that will live in utility pipes so they can inspect, repair, and report back to humans when they spot an issue.

“The robots will be living permanently in the city, and they’ll be able to identify issues before they become real problems,” explained Fuentes. The researchers are working on making the robots autonomous, and want them to be living in swarms or packs where they can communicate with one another on how best they could get the repair job done….(More)”

How Satellite Data and Artificial Intelligence could help us understand poverty better


Maya Craig at Fast Company: “Governments and development organizations currently measure poverty levels by conducting door-to-door surveys. The new partnership will test the use of AI to supplement these surveys and increase the accuracy of poverty data. Orbital said its AI software will analyze satellite images to see if characteristics such as building height and rooftop material can effectively indicate wealth.

The pilot study will be conducted in Sri Lanka. If successful, the World Bank hopes to scale it worldwide. A recent study conducted by the organization found that more than 50 countries lack legitimate poverty estimates, which limits the ability of the development community to support the world’s poorest populations.

“Data deprivation is a serious issue, especially in many of the countries where we need it most,” says David Newhouse, senior economist at the World Bank. “This technology has the potential to help us get that data more frequently and at a finer level of detail than is currently possible.”

The announcement is the latest in an emerging industry of AI analysis of satellite photos. A growing number of investors and entrepreneurs are betting that the convergence of these fields will have far-reaching impacts on business, policy, resource management and disaster response.

Wall Street’s biggest hedge-fund businesses have begun using the technology to improve investment strategies. The Pew Charitable Trusts employs the method to monitor oceans for illegal fishing activities. And startups like San Francisco-based Mavrx use similar analytics to optimize crop harvest.

The commercial earth-imaging satellite market, valued at $2.7 billion in 2014, is predicted to grow by 14% each year through the decade, according to a recent report.
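Because the forecast growth compounds, its effect is larger than "14% a year" may suggest. The sketch below uses only the $2.7 billion and 14% figures from the article. It assumes, since the excerpt does not say, that "through the decade" means annual compounding from the 2014 base to 2020. The resulting values are an illustration of that assumption, not figures from the report.

```python
# Figures from the article; the 2014-2020 horizon is an assumption
BASE_YEAR = 2014
BASE_VALUE_BILLIONS = 2.7
ANNUAL_GROWTH = 0.14


def project(value, rate, years):
    """Compound `value` at `rate` per year for `years` years."""
    return value * (1 + rate) ** years


# Projected market size (billions of dollars) for each year 2014..2020
projected = {
    year: project(BASE_VALUE_BILLIONS, ANNUAL_GROWTH, year - BASE_YEAR)
    for year in range(BASE_YEAR, 2021)
}

for year, value in projected.items():
    print(year, f"${value:.1f}B")
```

Under that assumption the projected market in 2020 is about $5.9 billion, a little more than double the 2014 value.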

As recently as two years ago, there were just four commercial earth imaging satellites operated in the U.S., and government contracts accounted for about 70% of imagery sales. By 2020, there will be hundreds of private-sector “smallsats” in orbit capturing imagery that will be easily accessible online. Companies like Skybox Imaging and Planet Labs have the first of these smallsats already active, with plans for more.

The images generated by these companies will be among the world’s largest data sets. And recent breakthroughs in AI research have made it possible to analyze these images to inform decision-making…(More)”

The importance of human innovation in A.I. ethics


John C. Havens at Mashable: “….While welcoming the feedback that sensors, data and Artificial Intelligence provide, we’re at a critical inflection point. Demarcating the parameters between assistance and automation has never been more central to human well-being. But today, beauty is in the AI of the beholder. Desensitized to the value of personal data, we hemorrhage precious insights regarding our identity that define the moral nuances necessary to navigate algorithmic modernity.

If no values-based standards exist for Artificial Intelligence, then the biases of its manufacturers will define our universal code of human ethics. But this should not be their cross to bear alone. It’s time to stop vilifying the AI community and start defining in concert with their creations what the good life means surrounding our consciousness and code.

The intention of the ethics

“Begin as you mean to go forward.” Michael Stewart is founder, chairman & CEO of Lucid, an Artificial Intelligence company based in Austin that recently announced the formation of the industry’s first Ethics Advisory Panel (EAP). While Google claimed creation of a similar board when acquiring AI firm DeepMind in January 2014, no public realization of its efforts currently exists (as confirmed by a PR rep from Google for this piece). Lucid’s Panel, by comparison, has already begun functioning as a separate organization from the analytics side of the business and provides oversight for the company and its customers. “Our efforts,” Stewart says, “are guided by the principle that our ethics group is obsessed with making sure the impact of our technology is good.”

Kay Firth-Butterfield is chief officer of the EAP, and is charged with being on the vanguard of the ethical issues affecting the AI industry and society as a whole. Internally, the EAP provides the hub of ethical behavior for the company. Someone from Firth-Butterfield’s office even sits on all core product development teams. “Externally,” she notes, “we plan to apply Cyc intelligence (shorthand for ‘encyclopedia,’ Lucid’s AI causal reasoning platform) for research to demonstrate the benefits of AI and to advise Lucid’s leadership on key decisions, such as the recent signing of the LAWS letter and the end use of customer applications.”

Ensuring the impact of AI technology is positive doesn’t happen by default. But as Lucid is demonstrating, ethics doesn’t have to stymie innovation by dwelling solely in the realm of risk mitigation. Ethical processes aligning with a company’s core values can provide more deeply relevant products and increased public trust. Transparently including your customer’s values in these processes puts the person back into personalization….(Mashable)”