Ways to think about machine learning


Benedict Evans: “We’re now four or five years into the current explosion of machine learning, and pretty much everyone has heard of it. It’s not just that startups are forming every day or that the big tech platform companies are rebuilding themselves around it – everyone outside tech has read the Economist or BusinessWeek cover story, and many big companies have some projects underway. We know this is a Next Big Thing.

Going a step further, we mostly understand what neural networks might be, in theory, and we get that this might be about patterns and data. Machine learning lets us find patterns or structures in data that are implicit and probabilistic (hence ‘inferred’) rather than explicit, that previously only people and not computers could find. They address a class of questions that were previously ‘hard for computers and easy for people’, or, perhaps more usefully, ‘hard for people to describe to computers’. And we’ve seen some cool (or worrying, depending on your perspective) speech and vision demos.

I don’t think, though, that we yet have a settled sense of quite what machine learning means – what it will mean for tech companies or for companies in the broader economy, how to think structurally about what new things it could enable, or what machine learning means for all the rest of us, and what important problems it might actually be able to solve.

This isn’t helped by the term ‘artificial intelligence’, which tends to end any conversation as soon as it’s begun. As soon as we say ‘AI’, it’s as though the black monolith from the beginning of 2001 has appeared, and we all become apes screaming at it and shaking our fists. You can’t analyze ‘AI’.

Indeed, I think one could propose a whole list of unhelpful ways of talking about current developments in machine learning. For example:

  • Data is the new oil
  • Google and China (or Facebook, or Amazon, or BAT) have all the data
  • AI will take all the jobs
  • And, of course, saying AI itself.

More useful things to talk about, perhaps, might be:

  • Automation
  • Enabling technology layers
  • Relational databases. …(More).

We Need to Save Ignorance From AI


Christina Leuker and Wouter van den Bos in Nautilus:  “After the fall of the Berlin Wall, East German citizens were offered the chance to read the files kept on them by the Stasi, the much-feared Communist-era secret police service. To date, it is estimated that only 10 percent have taken the opportunity.

In 2007, James Watson, the co-discoverer of the structure of DNA, asked that he not be given any information about his APOE gene, one allele of which is a known risk factor for Alzheimer’s disease.

Most people tell pollsters that, given the choice, they would prefer not to know the date of their own death—or even the future dates of happy events.

Each of these is an example of willful ignorance. Socrates may have made the case that the unexamined life is not worth living, and Hobbes may have argued that curiosity is mankind’s primary passion, but many of our oldest stories actually describe the dangers of knowing too much. From Adam and Eve and the tree of knowledge to Prometheus stealing the secret of fire, they teach us that real-life decisions need to strike a delicate balance between choosing to know, and choosing not to.

But what if a technology came along that shifted this balance unpredictably, complicating how we make decisions about when to remain ignorant? That technology is here: It’s called artificial intelligence.

AI can find patterns and make inferences using relatively little data. Only a handful of Facebook likes are necessary to predict your personality, race, and gender, for example. Another computer algorithm claims it can distinguish between homosexual and heterosexual men with 81 percent accuracy, and homosexual and heterosexual women with 71 percent accuracy, based on their picture alone. An algorithm named COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) can predict criminal recidivism from data like juvenile arrests, criminal records in the family, education, social isolation, and leisure activities with 65 percent accuracy….

Recently, though, the psychologist Ralph Hertwig and legal scholar Christoph Engel have published an extensive taxonomy of motives for deliberate ignorance. They identified two sets of motives, in particular, that have a particular relevance to the need for ignorance in the face of AI.

The first set of motives revolves around impartiality and fairness. Simply put, knowledge can sometimes corrupt judgment, and we often choose to remain deliberately ignorant in response. For example, peer reviews of academic papers are usually anonymous. Insurance companies in most countries are not permitted to know all the details of their client’s health before they enroll; they only know general risk factors. This type of consideration is particularly relevant to AI, because AI can produce highly prejudicial information….(More)”.

Against the Dehumanisation of Decision-Making – Algorithmic Decisions at the Crossroads of Intellectual Property, Data Protection, and Freedom of Information


Paper by Guido Noto La Diega: “Nowadays algorithms can decide if one can get a loan, is allowed to cross a border, or must go to prison. Artificial intelligence techniques (natural language processing and machine learning in the first place) enable private and public decision-makers to analyse big data in order to build profiles, which are used to make decisions in an automated way.

This work presents ten arguments against algorithmic decision-making. These revolve around the concepts of ubiquitous discretionary interpretation, holistic intuition, algorithmic bias, the three black boxes, psychology of conformity, power of sanctions, civilising force of hypocrisy, pluralism, empathy, and technocracy.

The lack of transparency of the algorithmic decision-making process does not stem merely from the characteristics of the relevant techniques used, which can make it impossible to access the rationale of the decision. It depends also on the abuse of and overlap between intellectual property rights (the “legal black box”). In the US, nearly half a million patented inventions concern algorithms; more than 67% of the algorithm-related patents were issued over the last ten years and the trend is increasing.

To counter the increased monopolisation of algorithms by means of intellectual property rights (with trade secrets leading the way), this paper presents three legal routes that enable citizens to ‘open’ the algorithms.

First, copyright and patent exceptions, as well as trade secrets are discussed.

Second, the GDPR is critically assessed. In principle, data controllers are not allowed to use algorithms to take decisions that have legal effects on the data subject’s life or similarly significantly affect them. However, when they are allowed to do so, the data subject still has the right to obtain human intervention, to express their point of view, as well as to contest the decision. Additionally, the data controller shall provide meaningful information about the logic involved in the algorithmic decision.

Third, this paper critically analyses the first known case of a court using the access right under the freedom of information regime to grant an injunction to release the source code of the computer program that implements an algorithm.

Only an integrated approach – which takes into account intellectual property, data protection, and freedom of information – may provide the citizen affected by an algorithmic decision of an effective remedy as required by the Charter of Fundamental Rights of the EU and the European Convention on Human Rights….(More)”.

Ontario is trying a wild experiment: Opening access to its residents’ health data


Dave Gershorn at Quartz: “The world’s most powerful technology companies have a vision for the future of healthcare. You’ll still go to your doctor’s office, sit in a waiting room, and explain your problem to someone in a white coat. But instead of relying solely on their own experience and knowledge, your doctor will consult an algorithm that’s been trained on the symptoms, diagnoses, and outcomes of millions of other patients. Instead of a radiologist reading your x-ray, a computer will be able to detect minute differences and instantly identify a tumor or lesion. Or at least that’s the goal.

AI systems like these, currently under development by companies including Google and IBM, can’t read textbooks and journals, attend lectures, and do rounds—they need millions of real life examples to understand all the different variations between one patient and another. In general, AI is only as good as the data it’s trained on, but medical data is exceedingly private—most developed countries have strict health data protection laws, such as HIPAA in the United States….

These approaches, which favor companies with considerable resources, are pretty much the only way to get large troves of health data in the US because the American health system is so disparate. Healthcare providers keep personal files on each of their patients, and can only transmit them to other accredited healthcare workers at the patient’s request. There’s no single place where all health data exists. It’s more secure, but less efficient for analysis and research.

Ontario, Canada, might have a solution, thanks to its single-payer healthcare system. All of Ontario’s health data exists in a few enormous caches under government control. (After all, the government needs to keep track of all the bills its paying.) Similar structures exist elsewhere in Canada, such as Quebec, but Toronto, which has become a major hub for AI research, wants to lead the charge in providing this data to businesses.

Until now, the only people allowed to study this data were government organizations or researchers who partnered with the government to study disease. But Ontario has now entrusted the MaRS Discovery District—a cross between a tech incubator and WeWork—to build a platform for approved companies and researchers to access this data, dubbed Project Spark. The project, initiated by MaRS and Canada’s University Health Network, began exploring how to share this data after both organizations expressed interest to the government about giving broader health data access to researchers and companies looking to build healthcare-related tools.

Project Spark’s goal is to create an API, or a way for developers to request information from the government’s data cache. This could be used to create an app for doctors to access the full medical history of a new patient. Ontarians could access their health records at any time through similar software, and catalog health issues as they occur. Or researchers, like the ones trying to build AI to assist doctors, could request a different level of access that provides anonymized data on Ontarians who meet certain criteria. If you wanted to study every Ontarian who had Alzheimer’s disease over the last 40 years, that data would only be authorization and a few lines of code away.

There are currently 100 companies lined up to get access to data, comprised of health records from Ontario’s 14 million residents. (MaRS won’t say who the companies are). …(More)”

AI Nationalism


Blog by Ian Hogarth: “The central prediction I want to make and defend in this post is that continued rapid progress in machine learning will drive the emergence of a new kind of geopolitics; I have been calling it AI Nationalism. Machine learning is an omni-use technology that will come to touch all sectors and parts of society.

The transformation of both the economy and the military by machine learning will create instability at the national and international level forcing governments to act. AI policy will become the single most important area of government policy. An accelerated arms race will emerge between key countries and we will see increased protectionist state action to support national champions, block takeovers by foreign firms and attract talent. I use the example of Google, DeepMind and the UK as a specific example of this issue.

This arms race will potentially speed up the pace of AI development and shorten the timescale for getting to AGI. Although there will be many common aspects to this techno-nationalist agenda, there will also be important state specific policies. There is a difference between predicting that something will happen and believing this is a good thing. Nationalism is a dangerous path, particular when the international order and international norms will be in flux as a result and in the concluding section I discuss how a period of AI Nationalism might transition to one of global cooperation where AI is treated as a global public good….(More)”.

Big Data and AI – A transformational shift for government: So, what next for research?


Irina Pencheva, Marc Esteve and Slava Jenkin Mikhaylov in Public Policy and Administration: “Big Data and artificial intelligence will have a profound transformational impact on governments around the world. Thus, it is important for scholars to provide a useful analysis on the topic to public managers and policymakers. This study offers an in-depth review of the Policy and Administration literature on the role of Big Data and advanced analytics in the public sector. It provides an overview of the key themes in the research field, namely the application and benefits of Big Data throughout the policy process, and challenges to its adoption and the resulting implications for the public sector. It is argued that research on the subject is still nascent and more should be done to ensure that the theory adds real value to practitioners. A critical assessment of the strengths and limitations of the existing literature is developed, and a future research agenda to address these gaps and enrich our understanding of the topic is proposed…(More)”.

Our Infant Information Revolution


Joseph Nye at Project Syndicate: “…When people are overwhelmed by the volume of information confronting them, it is hard to know what to focus on. Attention, not information, becomes the scarce resource. The soft power of attraction becomes an even more vital power resource than in the past, but so does the hard, sharp power of information warfare. And as reputation becomes more vital, political struggles over the creation and destruction of credibility multiply. Information that appears to be propaganda may not only be scorned, but may also prove counterproductive if it undermines a country’s reputation for credibility.

During the Iraq War, for example, the treatment of prisoners at Abu Ghraib and Guantanamo Bay in a manner inconsistent with America’s declared values led to perceptions of hypocrisy that could not be reversed by broadcasting images of Muslims living well in America. Similarly, President Donald Trump’s tweets that prove to be demonstrably false undercut American credibility and reduce its soft power.

The effectiveness of public diplomacy is judged by the number of minds changed (as measured by interviews or polls), not dollars spent. It is interesting to note that polls and the Portland index of the Soft Power 30show a decline in American soft power since the beginning of the Trump administration. Tweets can help to set the global agenda, but they do not produce soft power if they are not credible.

Now the rapidly advancing technology of artificial intelligence or machine learning is accelerating all of these processes. Robotic messages are often difficult to detect. But it remains to be seen whether credibility and a compelling narrative can be fully automated….(More)”.

Data Protection and e-Privacy: From Spam and Cookies to Big Data, Machine Learning and Profiling


Chapter by Lilian Edwards in L Edwards ed Law, Policy and the Internet (Hart , 2018): “In this chapter, I examine in detail how data subjects are tracked, profiled and targeted by their activities on line and, increasingly, in the “offline” world as well. Tracking is part of both commercial and state surveillance, but in this chapter I concentrate on the former. The European law relating to spam, cookies, online behavioural advertising (OBA), machine learning (ML) and the Internet of Things (IoT) is examined in detail, using both the GDPR and the forthcoming draft ePrivacy Regulation. The chapter concludes by examining both code and law solutions which might find a way forward to protect user privacy and still enable innovation, by looking to paradigms not based around consent, and less likely to rely on a “transparency fallacy”. Particular attention is drawn to the new work around Personal Data Containers (PDCs) and distributed ML analytics….(More)”.

The Open Revolution: Rewriting the rules of the information age


Book by Rufus Pollock: “Forget everything you think you know about the digital age. It’s not about privacy, surveillance, AI or blockchain—it’s about ownership. Because, in a digital age, who owns information controls the future.

In this urgent and provocative book, Rufus Pollock shows how today’s “Closed” digital economy is the source of problems ranging from growing inequality, to unaffordable medicines, to the power of a handful of tech monopolies to control how we think and vote. He proposes a solution that charts a path to a more equitable, innovative and profitable future for all….(More)”.

The distributed power of smartphones for medical research


Adi Gaskell: “One of the more significant areas of promise in health technology is the ability for data to be generated by us as individuals, and for AI to provide insights based upon this live stream of lifestyle data.  An example of what’s possible comes via a project researchers at Imperial College London have undertaken with the Vodafone Foundation.

The project aims to tap into the power of users smartphones to crunch cancer related data whilst they sleep.  Such distributed computing projects have been popular for some time, but this is one of the first to utilize the power in our smartphones.

The rationale for the project is identical to that of the early distributed computing ventures, such as SETI@Home, which utilized spare computing resources to process data from space.  The average smartphone contains a huge amount of computing power that generally lies dormant over night.

Dream Lab

Users participate by downloading the DreamLab app onto their phone and run it for six hours overnight as the phone charges.  The sleep downloads a small packet of data overnight, with the processors in the phone then running millions of calculations, uploading the results to a central server, and clearing the data from the phone.

The app has already been used in Australia, with researchers using it to crunch data for pancreatic cancer, and is now ready to be used for the first time in Europe.  If they can secure 100,000 users running the app each night, the team can process as much data as a single desktop computer could process in 100 years.

“Through harnessing distributed computing power, DreamLab is helping to make personalised medicine a reality,” the researchers say.  “This project demonstrates how Imperial’s innovative research partnerships with corporate partners and members of the public are working together to tackle some of the biggest problems we face today, generating real societal impact.”…(More)”.