We Need Transparency in Algorithms, But Too Much Can Backfire


Kartik Hosanagar and Vivian Jair at Harvard Business Review: “In 2013, Stanford professor Clifford Nass faced a student revolt. Nass’s students claimed that those in one section of his technology interface course received higher grades on the final exam than counterparts in another. Unfortunately, they were right: two different teaching assistants had graded the two different sections’ exams, and one had been more lenient than the other. Students with similar answers had ended up with different grades.

Nass, a computer scientist, recognized the unfairness and created a technical fix: a simple statistical model to adjust scores, where students got a certain percentage boost on their final mark when graded by a TA known to give grades that percentage lower than average. In the spirit of openness, Nass sent out emails to the class with a full explanation of his algorithm. Further complaints poured in, some even angrier than before. Where had he gone wrong?…
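
To make the fix concrete, here is a minimal sketch of the kind of per-TA correction described above; the data layout and the exact percentage rule are assumptions for illustration, not Nass’s actual model.

```python
# Minimal sketch of a per-TA grade correction (illustrative only; the real
# model, data layout, and exact percentage rule are assumptions).
from statistics import mean

def adjust_scores(scores_by_ta):
    """Boost each section's scores by the percentage its TA graded below the
    overall average; a lenient section is scaled down correspondingly."""
    overall = mean(s for scores in scores_by_ta.values() for s in scores)
    adjusted = {}
    for ta, scores in scores_by_ta.items():
        boost = overall / mean(scores)  # >1 for a harsh grader, <1 for a lenient one
        adjusted[ta] = [round(s * boost, 1) for s in scores]
    return adjusted

print(adjust_scores({"strict_ta": [70, 75, 80], "lenient_ta": [85, 90, 95]}))
# {'strict_ta': [77.0, 82.5, 88.0], 'lenient_ta': [77.9, 82.5, 87.1]}
```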

Kizilcec had in fact tested three levels of transparency: low and medium but also high, where the students got not only a paragraph explaining the grading process but also their raw peer-graded scores and how these were each precisely adjusted by the algorithm to get to a final grade. And this is where the results got more interesting. In the experiment, while medium transparency increased trust significantly, high transparency eroded it completely, to the point where trust levels were either equal to or lower than among students experiencing low transparency.

Making Modern AI Transparent: A Fool’s Errand?

What are businesses to take home from this experiment? It suggests that technical transparency – revealing the source code, inputs, and outputs of the algorithm – can build trust in many situations. But most algorithms in the world today are created and managed by for-profit companies, and many businesses regard their algorithms as highly valuable forms of intellectual property that must remain in a “black box.” Some lawmakers have proposed a compromise, suggesting that the source code be revealed to regulators or auditors in the event of a serious problem, and this adjudicator would assure consumers that the process is fair.

This approach merely shifts the burden of belief from the algorithm itself to the regulators. This may be a palatable solution in many arenas: for example, few of us fully understand financial markets, so we trust the SEC to take on oversight. But in a world where decisions large and small, personal and societal, are being handed over to algorithms, this becomes less acceptable.

Another problem with technical transparency is that it makes algorithms vulnerable to gaming. If an instructor releases the complete source code for an algorithm grading student essays, it becomes easy for students to exploit loopholes in the code: maybe, for example, the algorithm seeks evidence that the students have done research by looking for phrases such as “according to published research.” A student might then deliberately use this language at the start of every paragraph in her essay.
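
To see how easily such a rule can be gamed once it is public, consider a toy scorer; the phrase check and length bonus below are hypothetical, not the grading code the article has in mind.

```python
# Toy essay scorer (hypothetical): it rewards a "research" phrase plus length.
# Once the rule is known, padding paragraphs with the phrase inflates the
# score without improving the essay.
RESEARCH_PHRASE = "according to published research"

def score_essay(essay):
    score = essay.lower().count(RESEARCH_PHRASE)  # 1 point per occurrence
    score += len(essay.split()) // 100            # small bonus for length
    return score

honest = "The results suggest a modest effect on learning outcomes. " * 20
gamed = ("According to published research, this is true. " + "Filler. " * 5) * 20
print(score_essay(honest), score_essay(gamed))    # the padded essay scores higher
```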

But the biggest problem is that modern AI is making source code – transparent or not – less relevant compared with other factors in algorithmic functioning. Specifically, machine learning algorithms – and deep learning algorithms in particular – are usually built on just a few hundred lines of code. The algorithm’s logic is mostly learned from training data and is rarely reflected in its source code. Which is to say, some of today’s best-performing algorithms are often the most opaque. High transparency might involve getting our heads around reams and reams of data – and then still only being able to guess at what lessons the algorithm has learned from it.
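
The point is easy to demonstrate. In the illustrative scikit-learn sketch below, the source code is a handful of lines, yet every decision the model makes is determined by weights learned from the training data, which the code itself never spells out.

```python
# Illustrative only: a few lines of code, but the decision logic lives in
# parameters fitted to the training data, not in the source itself.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000).fit(X, y)

# Nothing above says which features matter or how; that knowledge sits in
# model.coef_, and it changes whenever the training data changes.
print(model.coef_.shape)    # (1, 30) learned weights
print(model.coef_[0, :5])   # reading them still doesn't "explain" a decision
```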

This is where Kizilcec’s work becomes relevant – a way to embrace rather than despair over deep learning’s impenetrability. His work shows that users will not trust black box models, but they don’t need – or even want – extremely high levels of transparency. That means responsible companies need not fret over what percentage of source code to reveal, or how to help users “read” massive datasets. Instead, they should work to provide basic insights on the factors driving algorithmic decisions….(More)”

Doing good data science


Mike Loukides, Hilary Mason and DJ Patil at O’Reilly: “(This post is the first in a series on data ethics) The hard thing about being an ethical data scientist isn’t understanding ethics. It’s the junction between ethical ideas and practice. It’s doing good data science.

There has been a lot of healthy discussion about data ethics lately. We want to be clear: that discussion is good, and necessary. But it’s also not the biggest problem we face. We already have good standards for data ethics. The ACM’s code of ethics, which dates back to 1993, is clear, concise, and surprisingly forward-thinking; 25 years later, it’s a great start for anyone thinking about ethics. The American Statistical Association has a good set of ethical guidelines for working with data. So, we’re not working in a vacuum.

And, while there are always exceptions, we believe that most people want to be fair. Data scientists and software developers don’t want to harm the people using their products. There are exceptions, of course; we call them criminals and con artists. Defining “fairness” is difficult, and perhaps impossible, given the many crosscutting layers of “fairness” that we might be concerned with. But we don’t have to solve that problem in advance, and it’s not going to be solved in a simple statement of ethical principles, anyway.

The problem we face is different: how do we put ethical principles into practice? We’re not talking about an abstract commitment to being fair. Ethical principles are worse than useless if we don’t allow them to change our practice, if they don’t have any effect on what we do day-to-day. For data scientists, whether you’re doing classical data analysis or leading-edge AI, that’s a big challenge. We need to understand how to build the software systems that implement fairness. That’s what we mean by doing good data science.

Any code of data ethics will tell you that you shouldn’t collect data from experimental subjects without informed consent. But that code won’t tell you how to implement “informed consent.” Informed consent is easy when you’re interviewing a few dozen people in person for a psychology experiment. Informed consent means something different when someone clicks on an item in an online catalog (hello, Amazon), and ads for that item start following them around ad infinitum. Do you use a pop-up to ask for permission to use their choice in targeted advertising? How many customers would you lose? Informed consent means something yet again when you’re asking someone to fill out a profile for a social site, and you might (or might not) use that data for any number of experimental purposes. Do you pop up a consent form in impenetrable legalese that basically says “we will use your data, but we don’t know for what”? Do you phrase this agreement as an opt-out, and hide it somewhere on the site where nobody will find it?…

To put ethical principles into practice, we need space to be ethical. We need the ability to have conversations about what ethics means, what it will cost, and what solutions to implement. As technologists, we frequently share best practices at conferences, write blog posts, and develop open source technologies—but we rarely discuss problems such as how to obtain informed consent.

There are several facets to this space that we need to think about.

First, we need corporate cultures in which discussions about fairness, about the proper use of data, and about the harm that can be done by inappropriate use of data can be considered. In turn, this means that we can’t rush products out the door without thinking about how they’re used. We can’t allow “internet time” to mean ignoring the consequences. Indeed, computer security has shown us the consequences of ignoring the consequences: many companies that have never taken the time to implement good security practices and safeguards are now paying with damage to their reputations and their finances. We need to do the same when thinking about issues like fairness, accountability, and unintended consequences….(More)”.

Let’s make private data into a public good


Article by Mariana Mazzucato: “The internet giants depend on our data. A new relationship between us and them could deliver real value to society….We should ask how the value of these companies has been created, how that value has been measured, and who benefits from it. If we go by national accounts, the contribution of internet platforms to national income (as measured, for example, by GDP) is represented by the advertisement-related services they sell. But does that make sense? It’s not clear that ads really contribute to the national product, let alone to social well-being—which should be the aim of economic activity. Measuring the value of a company like Google or Facebook by the number of ads it sells is consistent with standard neoclassical economics, which interprets any market-based transaction as signaling the production of some kind of output—in other words, no matter what the thing is, as long as a price is received, it must be valuable. But in the case of these internet companies, that’s misleading: if online giants contribute to social well-being, they do it through the services they provide to users, not through the accompanying advertisements.

This way we have of ascribing value to what the internet giants produce is completely confusing, and it’s generating a paradoxical result: their advertising activities are counted as a net contribution to national income, while the more valuable services they provide to users are not.

Let’s not forget that a large part of the technology and necessary data was created by all of us, and should thus belong to all of us. The underlying infrastructure that all these companies rely on was created collectively (via the tax dollars that built the internet), and it also feeds off network effects that are produced collectively. There is indeed no reason why the public’s data should not be owned by a public repository that sells the data to the tech giants, rather than vice versa. But the key issue here is not just sending a portion of the profits from data back to citizens but also allowing them to shape the digital economy in a way that satisfies public needs. Using big data and AI to improve the services provided by the welfare state—from health care to social housing—is just one example.

Only by thinking about digital platforms as collective creations can we construct a new model that offers something of real value, driven by public purpose. We’re never far from a media story that stirs up a debate about the need to regulate tech companies, which creates a sense that there’s a war between their interests and those of national governments. We need to move beyond this narrative. The digital economy must be subject to the needs of all sides; it’s a partnership of equals where regulators should have the confidence to be market shapers and value creators….(More)”.

Artificial Intelligence


Stanford Encyclopedia of Philosophy: “Artificial intelligence (AI) is the field devoted to building artificial animals (or at least artificial creatures that – in suitable contexts – appear to be animals) and, for many, artificial persons (or at least artificial creatures that – in suitable contexts – appear to be persons).[1] Such goals immediately ensure that AI is a discipline of considerable interest to many philosophers, and this has been confirmed (e.g.) by the energetic attempt, on the part of numerous philosophers, to show that these goals are in fact un/attainable. On the constructive side, many of the core formalisms and techniques used in AI come out of, and are indeed still much used and refined in, philosophy: first-order logic and its extensions; intensional logics suitable for the modeling of doxastic attitudes and deontic reasoning; inductive logic, probability theory, and probabilistic reasoning; practical reasoning and planning, and so on. In light of this, some philosophers conduct AI research and development as philosophy.

In the present entry, the history of AI is briefly recounted, proposed definitions of the field are discussed, and an overview of the field is provided. In addition, both philosophical AI (AI pursued as and out of philosophy) and philosophy of AI are discussed, via examples of both. The entry ends with some de rigueur speculative commentary regarding the future of AI….(More)”.

Health Insurers Are Vacuuming Up Details About You — And It Could Raise Your Rates


Marshall Allen at ProPublica: “With little public scrutiny, the health insurance industry has joined forces with data brokers to vacuum up personal details about hundreds of millions of Americans, including, odds are, many readers of this story. The companies are tracking your race, education level, TV habits, marital status, net worth. They’re collecting what you post on social media, whether you’re behind on your bills, what you order online. Then they feed this information into complicated computer algorithms that spit out predictions about how much your health care could cost them.

Are you a woman who recently changed your name? You could be newly married and have a pricey pregnancy pending. Or maybe you’re stressed and anxious from a recent divorce. That, too, the computer models predict, may run up your medical bills.

Are you a woman who’s purchased plus-size clothing? You’re considered at risk of depression. Mental health care can be expensive.

Low-income and a minority? That means, the data brokers say, you are more likely to live in a dilapidated and dangerous neighborhood, increasing your health risks.

“We sit on oceans of data,” said Eric McCulley, director of strategic solutions for LexisNexis Risk Solutions, during a conversation at the data firm’s booth. And he isn’t apologetic about using it. “The fact is, our data is in the public domain,” he said. “We didn’t put it out there.”

Insurers contend they use the information to spot health issues in their clients — and flag them so they get services they need. And companies like LexisNexis say the data shouldn’t be used to set prices. But as a research scientist from one company told me: “I can’t say it hasn’t happened.”

At a time when every week brings a new privacy scandal and worries abound about the misuse of personal information, patient advocates and privacy scholars say the insurance industry’s data gathering runs counter to its touted, and federally required, allegiance to patients’ medical privacy. The Health Insurance Portability and Accountability Act, or HIPAA, only protects medical information.

“We have a health privacy machine that’s in crisis,” said Frank Pasquale, a professor at the University of Maryland Carey School of Law who specializes in issues related to machine learning and algorithms. “We have a law that only covers one source of health information. They are rapidly developing another source.”…(More)”.

How Charities Are Using Artificial Intelligence to Boost Impact


Nicole Wallace at the Chronicle of Philanthropy: “The chaos and confusion of conflict often separate family members fleeing for safety. The nonprofit Refunite uses advanced technology to help loved ones reconnect, sometimes across continents and after years of separation.

Refugees register with the service by providing basic information — their name, age, birthplace, clan and subclan, and so forth — along with similar facts about the people they’re trying to find. Powerful algorithms search for possible matches among the more than 1.1 million individuals in the Refunite system. The analytics are further refined using the more than 2,000 searches that the refugees themselves do daily.
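
Refunite’s matching system is proprietary, but the underlying idea can be sketched as a weighted, field-by-field similarity score; the record format, weights, and names below are entirely hypothetical.

```python
# Hypothetical sketch of fuzzy record matching on a few registration fields;
# Refunite's actual algorithms and data schema are not public.
from difflib import SequenceMatcher

WEIGHTS = {"name": 0.5, "birthplace": 0.3, "clan": 0.2}

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(query, record):
    return sum(w * similarity(query[f], record[f]) for f, w in WEIGHTS.items())

registry = [
    {"name": "Abdi Hassan", "birthplace": "Mogadishu", "clan": "Hawiye"},
    {"name": "Abdi Hasan",  "birthplace": "Mogadisho", "clan": "Hawiye"},
    {"name": "Fatima Ali",  "birthplace": "Kismayo",   "clan": "Darod"},
]
query = {"name": "Abdi Hassan", "birthplace": "Mogadishu", "clan": "Hawiye"}
for record in sorted(registry, key=lambda r: match_score(query, r), reverse=True):
    print(round(match_score(query, record), 2), record["name"])
```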

The goal: find loved ones or those connected to them who might help in the hunt. Since Refunite introduced the first version of the system in 2010, it has helped more than 40,000 people reconnect.

One factor complicating the work: Cultures define family lineage differently. Refunite co-founder Christopher Mikkelsen confronted this problem when he asked a boy in a refugee camp if he knew where his mother was. “He asked me, ‘Well, what mother do you mean?’ ” Mikkelsen remembers. “And I went, ‘Uh-huh, this is going to be challenging.’ ”

Fortunately, artificial intelligence is well suited to learn and recognize different family patterns. But the technology struggles with some simple things like distinguishing the image of a chicken from that of a car. Mikkelsen believes refugees in camps could offset this weakness by tagging photographs — “car” or “not car” — to help train algorithms. Such work could earn them badly needed cash: The group hopes to set up a system that pays refugees for doing such work.

“To an American, earning $4 a day just isn’t viable as a living,” Mikkelsen says. “But to the global poor, getting an access point to earning this is revolutionizing.”

Another group, Wild Me, a nonprofit created by scientists and technologists, has created an open-source software platform that combines artificial intelligence and image recognition to identify and track individual animals. Using the system, scientists can better estimate the number of endangered animals and follow them over large expanses without using invasive techniques….

To fight sex trafficking, police officers often go undercover and interact with people trying to buy sex online. Sadly, demand is high, and there are never enough officers.

Enter Seattle Against Slavery. The nonprofit’s tech-savvy volunteers created chatbots designed to disrupt sex trafficking significantly. Using input from trafficking survivors and law-enforcement agencies, the bots can conduct simultaneous conversations with hundreds of people, engaging them in multiple, drawn-out conversations, and arranging rendezvous that don’t materialize. The group hopes to frustrate buyers so much that they give up their hunt for sex online….

A Philadelphia charity is using machine learning to adapt its services to clients’ needs.

Benefits Data Trust helps people enroll for government-assistance programs like food stamps and Medicaid. Since 2005, the group has helped more than 650,000 people access $7 billion in aid.

The nonprofit has data-sharing agreements with jurisdictions to access more than 40 lists of people who likely qualify for government benefits but do not receive them. The charity contacts those who might be eligible and encourages them to call the Benefits Data Trust for help applying….(More)”.

What is a data trust?


Essay by Jack Hardinges at ODI: “There are different interpretations of what a data trust is, or should be…

There’s not a well-used definition of ‘a data trust’, or even consensus on what one is. Much of the recent interest in data trusts in the UK has been fuelled by them being recommended as a way to ‘share data in a fair, safe and equitable way’ by a UK government-commissioned independent review into Artificial Intelligence (AI) in 2017. However, there has been wider international interest in the concept for some time.

At a very high level, the aim of data trusts appears to be to give people and organisations confidence when enabling access to data in ways that provide them with some value (either directly or indirectly) in return. Beyond that high-level goal, there are a variety of thoughts about what form they should take. In our work so far, we’ve found different interpretations of the term ‘data trust’:

  • A data trust as a repeatable framework of terms and mechanisms.
  • A data trust as a mutual organisation.
  • A data trust as a legal structure.
  • A data trust as a store of data.
  • A data trust as public oversight of data access….(More)”

What if people were paid for their data?


The Economist: “‘Data slavery’. Jennifer Lyn Morone, an American artist, thinks this is the state in which most people now live. To get free online services, she laments, they hand over intimate information to technology firms. “Personal data are much more valuable than you think,” she says. To highlight this sorry state of affairs, Ms Morone has resorted to what she calls “extreme capitalism”: she registered herself as a company in Delaware in an effort to exploit her personal data for financial gain. She created dossiers containing different subsets of data, which she displayed in a London gallery in 2016 and offered for sale, starting at £100 ($135). The entire collection, including her health data and social-security number, can be had for £7,000.

Only a few buyers have taken her up on this offer and she finds “the whole thing really absurd”. …Given the current state of digital affairs, in which the collection and exploitation of personal data is dominated by big tech firms, Ms Morone’s approach, in which individuals offer their data for sale, seems unlikely to catch on. But what if people really controlled their data—and the tech giants were required to pay for access? What would such a data economy look like?…

Labour, like data, is a resource that is hard to pin down. Workers were not properly compensated for labour for most of human history. Even once people were free to sell their labour, it took decades for wages to reach liveable levels on average. History won’t repeat itself, but chances are that it will rhyme, Mr Weyl predicts in “Radical Markets”, a provocative new book he has co-written with Eric Posner of the University of Chicago. He argues that in the age of artificial intelligence, it makes sense to treat data as a form of labour.

To understand why, it helps to keep in mind that “artificial intelligence” is something of a misnomer. Messrs Weyl and Posner call it “collective intelligence”: most AI algorithms need to be trained using reams of human-generated examples, in a process called machine learning. Unless they know what the right answers (provided by humans) are meant to be, algorithms cannot translate languages, understand speech or recognise objects in images. Data provided by humans can thus be seen as a form of labour which powers AI. As the data economy grows up, such data work will take many forms. Much of it will be passive, as people engage in all kinds of activities—liking social-media posts, listening to music, recommending restaurants—that generate the data needed to power new services. But some people’s data work will be more active, as they make decisions (such as labelling images or steering a car through a busy city) that can be used as the basis for training AI systems….
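
A minimal supervised-learning sketch makes that dependence on human judgment visible; the example posts and labels below are made up.

```python
# Illustrative: without the human-supplied labels there is nothing to learn;
# the model simply generalizes people's judgments to new examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

posts = ["great food, lovely staff", "terrible service, cold food",
         "would happily visit again", "never going back"]
labels = [1, 0, 1, 0]                   # human judgments: 1 = positive

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(posts)     # bag-of-words features
model = MultinomialNB().fit(X, labels)  # learns only from the labeled examples

print(model.predict(vectorizer.transform(["cold food and terrible staff"])))
```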

But much still needs to happen for personal data to be widely considered as labour, and paid for as such. For one thing, the right legal framework will be needed to encourage the emergence of a new data economy. The European Union’s new General Data Protection Regulation, which came into effect in May, already gives people extensive rights to check, download and even delete personal data held by companies. Second, the technology to keep track of data flows needs to become much more capable. Research to calculate the value of particular data to an AI service is in its infancy.

Third, and most important, people will have to develop a “class consciousness” as data workers. Most people say they want their personal information to be protected, but then trade it away for nearly nothing, something known as the “privacy paradox”. Yet things may be changing: more than 90% of Americans think being in control of who can get data on them is important, according to the Pew Research Centre, a think-tank….(More)”.

Ways to think about machine learning


Benedict Evans: “We’re now four or five years into the current explosion of machine learning, and pretty much everyone has heard of it. It’s not just that startups are forming every day or that the big tech platform companies are rebuilding themselves around it – everyone outside tech has read the Economist or BusinessWeek cover story, and many big companies have some projects underway. We know this is a Next Big Thing.

Going a step further, we mostly understand what neural networks might be, in theory, and we get that this might be about patterns and data. Machine learning lets us find patterns or structures in data that are implicit and probabilistic (hence ‘inferred’) rather than explicit, that previously only people and not computers could find. It addresses a class of questions that were previously ‘hard for computers and easy for people’, or, perhaps more usefully, ‘hard for people to describe to computers’. And we’ve seen some cool (or worrying, depending on your perspective) speech and vision demos.

I don’t think, though, that we yet have a settled sense of quite what machine learning means – what it will mean for tech companies or for companies in the broader economy, how to think structurally about what new things it could enable, or what machine learning means for all the rest of us, and what important problems it might actually be able to solve.

This isn’t helped by the term ‘artificial intelligence’, which tends to end any conversation as soon as it’s begun. As soon as we say ‘AI’, it’s as though the black monolith from the beginning of 2001 has appeared, and we all become apes screaming at it and shaking our fists. You can’t analyze ‘AI’.

Indeed, I think one could propose a whole list of unhelpful ways of talking about current developments in machine learning. For example:

  • Data is the new oil
  • Google and China (or Facebook, or Amazon, or BAT) have all the data
  • AI will take all the jobs
  • And, of course, saying AI itself.

More useful things to talk about, perhaps, might be:

  • Automation
  • Enabling technology layers
  • Relational databases. …(More)”.

We Need to Save Ignorance From AI


Christina Leuker and Wouter van den Bos in Nautilus:  “After the fall of the Berlin Wall, East German citizens were offered the chance to read the files kept on them by the Stasi, the much-feared Communist-era secret police service. To date, it is estimated that only 10 percent have taken the opportunity.

In 2007, James Watson, the co-discoverer of the structure of DNA, asked that he not be given any information about his APOE gene, one allele of which is a known risk factor for Alzheimer’s disease.

Most people tell pollsters that, given the choice, they would prefer not to know the date of their own death—or even the future dates of happy events.

Each of these is an example of willful ignorance. Socrates may have made the case that the unexamined life is not worth living, and Hobbes may have argued that curiosity is mankind’s primary passion, but many of our oldest stories actually describe the dangers of knowing too much. From Adam and Eve and the tree of knowledge to Prometheus stealing the secret of fire, they teach us that real-life decisions need to strike a delicate balance between choosing to know, and choosing not to.

But what if a technology came along that shifted this balance unpredictably, complicating how we make decisions about when to remain ignorant? That technology is here: It’s called artificial intelligence.

AI can find patterns and make inferences using relatively little data. Only a handful of Facebook likes are necessary to predict your personality, race, and gender, for example. Another computer algorithm claims it can distinguish between homosexual and heterosexual men with 81 percent accuracy, and homosexual and heterosexual women with 71 percent accuracy, based on their picture alone. An algorithm named COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) can predict criminal recidivism from data like juvenile arrests, criminal records in the family, education, social isolation, and leisure activities with 65 percent accuracy….
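
Those claims are about statistical inference from sparse behavioural traces. A toy version, trained on entirely synthetic data and far simpler than the models in the cited studies, looks like this:

```python
# Toy illustration: predicting a binary trait from a handful of binary "likes".
# The data is synthetic; the studies cited above used much richer models.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_users, n_pages = 1000, 20
likes = rng.integers(0, 2, size=(n_users, n_pages))  # 1 = user liked the page
# Synthetic "ground truth": the trait correlates with liking the first 3 pages.
trait = (likes[:, :3].sum(axis=1) + rng.normal(0, 0.5, n_users) > 1.5).astype(int)

model = LogisticRegression(max_iter=1000).fit(likes[:800], trait[:800])
print("held-out accuracy:", model.score(likes[800:], trait[800:]))
```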

Recently, though, the psychologist Ralph Hertwig and legal scholar Christoph Engel have published an extensive taxonomy of motives for deliberate ignorance. They identified two sets of motives that have particular relevance to the need for ignorance in the face of AI.

The first set of motives revolves around impartiality and fairness. Simply put, knowledge can sometimes corrupt judgment, and we often choose to remain deliberately ignorant in response. For example, peer reviews of academic papers are usually anonymous. Insurance companies in most countries are not permitted to know all the details of their client’s health before they enroll; they only know general risk factors. This type of consideration is particularly relevant to AI, because AI can produce highly prejudicial information….(More)”.