How Can We Overcome the Challenge of Biased and Incomplete Data?


Knowledge@Wharton: “Data analytics and artificial intelligence are transforming our lives. Be it in health care, in banking and financial services, or in times of humanitarian crises — data determine the way decisions are made. But often, the way data is collected and measured can result in biased and incomplete information, and this can significantly impact outcomes.  

In a conversation with Knowledge@Wharton at the SWIFT Institute Conference on the Impact of Artificial Intelligence and Machine Learning in the Financial Services Industry, Alexandra Olteanu, a post-doctoral researcher at Microsoft Research, U.S. and Canada, discussed the ethical and people considerations in data collection and artificial intelligence and how we can work towards removing the biases….

….Knowledge@Wharton: Bias is a big issue when you’re dealing with humanitarian crises, because it can influence who gets help and who doesn’t. When you translate that into the business world, especially in financial services, what implications do you see for algorithmic bias? What might be some of the consequences?

Olteanu: A good example is from a new law in the New York state according to which insurance companies can now use social media to decide the level for your premiums. But, they could in fact end up using incomplete information. For instance, you might be buying your vegetables from the supermarket or a farmer’s market, but these retailers might not be tracking you on social media. So nobody knows that you are eating vegetables. On the other hand, a bakery that you visit might post something when you buy from there. Based on this, the insurance companies may conclude that you only eat cookies all the time. This shows how even incomplete data can affect you….(More)”.

107 Years Later, The Titanic Sinking Helps Train Problem-Solving AI


Kiona N. Smith at Forbes: “What could the 107-year-old tragedy of the Titanic possibly have to do with modern problems like sustainable agriculture, human trafficking, or health insurance premiums? Data turns out to be the common thread. The modern world, for better or or worse, increasingly turns to algorithms to look for patterns in the data and and make predictions based on those patterns. And the basic methods are the same whether the question they’re trying to answer is “Would this person survive the Titanic sinking?” or “What are the most likely routes for human trafficking?”

An Enduring Problem

Predicting survival at sea based on the Titanic dataset is a standard practice problem for aspiring data scientists and programmers. Here’s the basic challenge: feed your algorithm a portion of the Titanic passenger list, which includes some basic variables describing each passenger and their fate. From that data, the algorithm (if you’ve programmed it well) should be able to draw some conclusions about which variables made a person more likely to live or die on that cold April night in 1912. To test its success, you then give the algorithm the rest of the passenger list (minus the outcomes) and see how well it predicts their fates.

Online communities like Kaggle.com have held competitions to see who can develop the algorithm that predicts survival most accurately, and it’s also a common problem presented to university classes. The passenger list is big enough to be useful, but small enough to be manageable for beginners. There’s a simple set out of outcomes — life or death — and around a dozen variables to work with, so the problem is simple enough for beginners to tackle but just complex enough to be interesting. And because the Titanic’s story is so famous, even more than a century later, the problem still resonates.

“It’s interesting to see that even in such a simple problem as the Titanic, there are nuggets,” said Sagie Davidovich, Co-Founder & CEO of SparkBeyond, who used the Titanic problem as an early test for SparkBeyond’s AI platform and still uses it as a way to demonstrate the technology to prospective customers….(More)”.

Africa must reap the benefits of its own data


Tshilidzi Marwala at Business Insider: “Twenty-two years ago when I was a doctoral student in artificial intelligence (AI) at the University of Cambridge, I had to create all the AI algorithms I needed to understand the complex phenomena related to this field.

For starters, AI is a computer software that performs intelligent tasks that normally require human beings, while an algorithm is a set of rules that instruct a computer to execute specific tasks. In that era, the ability to create AI algorithms was more important than the ability to acquire and use data.

Google has created an open-source library called TensorFlow, which contains all the developed AI algorithms. This way Google wants people to develop applications (apps) using their software, with the payoff being that Google will collect data on any individual using the apps developed with TensorFlow.

Today, an AI algorithm is not a competitive advantage but data is. The World Economic Forum calls data the new “oxygen”, while Chinese AI specialist Kai-Fu Lee calls it the new “oil”.

Africa’s population is increasing faster than in any region in the world. The continent has a population of 1.3-billion people and a total nominal GDP of $2.3-trillion. This increase in the population is in effect an increase in data, and if data is the new oil, it is akin to an increase in oil reserve.

Even oil-rich countries such as Saudi Arabia do not experience an increase in their oil reserve. How do we as Africans take advantage of this huge amount of data?

There are two categories of data in Africa: heritage and personal. Heritage data resides in society, whereas personal data resides in individuals. Heritage data includes data gathered from our languages, emotions and accents. Personal data includes health, facial and fingerprint data.

Facebook, Amazon, Apple, Netflix and Google are data companies. They trade data to advertisers, banks and political parties, among others. For example, the controversial company Cambridge Analytica harvested Facebook data to influence the presidential election that potentially contributed to Donald Trump’s victory in the US elections.

The company Google collects language data to build an application called Google Translate that translates from one language to another. This app claims to cover African languages such as Zulu, Yoruba and Swahili. Google Translate is less effective in handling African languages than it is in handling European and Asian languages.

Now, how do we capitalise on our language heritage to create economic value? We need to build our own language database and create our own versions of Google Translate.

An important area is the creation of an African emotion database. Different cultures exhibit emotions differently. These are very important in areas such as safety of cars and aeroplanes. If we can build a system that can read pilots’ emotions, this would enable us to establish if a pilot is in a good state of mind to operate an aircraft, which would increase safety.

To capitalise on the African emotion database, we should create a data bank that captures emotions of African people in various parts of the continent, and then use this database to create AI apps to read people’s emotions. Mercedes-Benz has already implemented the “Attention Assist”, which alerts drivers to fatigue.

Another important area is the creation of an African health database. AI algorithms are able to diagnose diseases better than human doctors. However, these algorithms depend on the availability of data. To capitalise on this, we need to collect such data and use it to build algorithms that will be able to augment medical care….(More)”.

Beyond Bias: Re-Imagining the Terms of ‘Ethical AI’ in Criminal Law


Paper by Chelsea Barabas: “Data-driven decision-making regimes, often branded as “artificial intelligence,” are rapidly proliferating across the US criminal justice system as a means of predicting and managing the risk of crime and addressing accusations of discriminatory practices. These data regimes have come under increased scrutiny, as critics point out the myriad ways that they can reproduce or even amplify pre-existing biases in the criminal justice system. This essay examines contemporary debates regarding the use of “artificial intelligence” as a vehicle for criminal justice reform, by closely examining two general approaches to, what has been widely branded as, “algorithmic fairness” in criminal law: 1) the development of formal fairness criteria and accuracy measures that illustrate the trade-offs of different algorithmic interventions and 2) the development of “best practices” and managerialist standards for maintaining a baseline of accuracy, transparency and validity in these systems.

The essay argues that attempts to render AI-branded tools more accurate by addressing narrow notions of “bias,” miss the deeper methodological and epistemological issues regarding the fairness of these tools. The key question is whether predictive tools reflect and reinforce punitive practices that drive disparate outcomes, and how data regimes interact with the penal ideology to naturalize these practices. The article concludes by calling for an abolitionist understanding of the role and function of the carceral state, in order to fundamentally reformulate the questions we ask, the way we characterize existing data, and how we identify and fill gaps in existing data regimes of the carceral state….(More)”

MegaPixels


About: “…MegaPixels is an art and research project first launched in 2017 for an installation at Tactical Technology Collective’s GlassRoom about face recognition datasets. In 2018 MegaPixels was extended to cover pedestrian analysis datasets for a commission by Elevate Arts festival in Austria. Since then MegaPixels has evolved into a large-scale interrogation of hundreds of publicly-available face and person analysis datasets, the first of which launched on this site in April 2019.

MegaPixels aims to provide a critical perspective on machine learning image datasets, one that might otherwise escape academia and industry funded artificial intelligence think tanks that are often supported by the several of the same technology companies who have created datasets presented on this site.

MegaPixels is an independent project, designed as a public resource for educators, students, journalists, and researchers. Each dataset presented on this site undergoes a thorough review of its images, intent, and funding sources. Though the goals are similar to publishing an academic paper, MegaPixels is a website-first research project, with an academic publication to follow.

One of the main focuses of the dataset investigations presented on this site is to uncover where funding originated. Because of our emphasis on other researcher’s funding sources, it is important that we are transparent about our own….(More)”.

San Francisco becomes the first US city to ban facial recognition by government agencies


Colin Lecher at The Verge: “In a first for a city in the United States, San Francisco has voted to ban its government agencies from using facial recognition technology.

The city’s Board of Supervisors voted eight to one to approve the proposal, set to take effect in a month, that would bar city agencies, including law enforcement, from using the tool. The ordinance would also require city agencies to get board approval for their use of surveillance technology, and set up audits of surveillance tech already in use. Other cities have approved similar transparency measures.“

The plan, called the Stop Secret Surveillance Ordinance, was spearheaded by Supervisor Aaron Peskin. In a statement read ahead of the vote, Peskin said it was “an ordinance about having accountability around surveillance technology.”

“This is not an anti-technology policy,” he said, stressing that many tools used by law enforcement are still important to the city’s security. Still, he added, facial recognition is “uniquely dangerous and oppressive.”

The ban comes amid a broader debate over facial recognition, which can be used to rapidly identify people and has triggered new questions about civil liberties. Experts have raised specific concerns about the tools, as studies have demonstrated instances of troubling bias and error rates.

Microsoft, which offers facial recognition tools, has called for some form of regulation for the technology — but how, exactly, to regulate the tool has been contested. Proposals have ranged from light regulation to full moratoriums. Legislation has largely stalled, however.

San Francisco’s decision will inevitably be used as an example as the debate continues and other cities and states decide whether and how to regulate facial recognition. Civil liberties groups like the ACLU of Northern California have already thrown their support behind the San Francisco plan, while law enforcement in the area has pushed back….(More)”.

How AI could save lives without spilling medical secrets


Will Knight at MIT Technology Review: “The potential for artificial intelligence to transform health care is huge, but there’s a big catch.

AI algorithms will need vast amounts of medical data on which to train before machine learning can deliver powerful new ways to spot and understand the cause of disease. That means imagery, genomic information, or electronic health records—all potentially very sensitive information.

That’s why researchers are working on ways to let AI learn from large amounts of medical data while making it very hard for that data to leak.

One promising approach is now getting its first big test at Stanford Medical School in California. Patients there can choose to contribute their medical data to an AI system that can be trained to diagnose eye disease without ever actually accessing their personal details.

Participants submit ophthalmology test results and health record data through an app. The information is used to train a machine-learning model to identify signs of eye disease in the images. But the data is protected by technology developed by Oasis Labs, a startup spun out of UC Berkeley, which guarantees that the information cannot be leaked or misused. The startup was granted permission by regulators to start the trial last week.

The sensitivity of private patient data is a looming problem. AI algorithms trained on data from different hospitals could potentially diagnose illness, prevent disease, and extend lives. But in many countries medical records cannot easily be shared and fed to these algorithms for legal reasons. Research on using AI to spot disease in medical images or data usually involves relatively small data sets, which greatly limits the technology’s promise….

Oasis stores the private patient data on a secure chip, designed in collaboration with other researchers at Berkeley. The data remains within the Oasis cloud; outsiders are able to run algorithms on the data, and receive the results, without its ever leaving the system. A smart contractsoftware that runs on top of a blockchain—is triggered when a request to access the data is received. This software logs how the data was used and also checks to make sure the machine-learning computation was carried out correctly….(More)”.

Artificial Intelligence and Digital Repression: Global Challenges to Governance


Paper by Steven Feldstein: “Across the world, artificial intelligence (AI) is showing its potential for abetting repressive regimes and upending the relationship between citizen and state, thereby exacerbating a global resurgence of authoritarianism. AI is a component in a broader ecosystem of digital repression, but it is relevant to several different techniques, including surveillance, censorship, disinformation, and cyber attacks. AI offers three distinct advantages to autocratic leaders: it helps solve principal-agent loyalty problems, it offers substantial cost-efficiencies over traditional means of surveillance, and it is particularly effective against external regime challenges. China is a key proliferator of AI technology to authoritarian and illiberal regimes; such proliferation is an important component of Chinese geopolitical strategy. To counter the spread of high-tech repression abroad, as well as potential abuses at home, policy makers in democratic states must think seriously about how to mitigate harms and to shape better practices….(More)”

The future of work? Work of the future!


European Commission: “While historical evidence suggests that previous waves of automation have been overwhelmingly positive for the economy and society, AI is in a different league, with the potential to be much more disruptive. It builds upon other digital technologies but also brings about and amplifies major socioeconomic changes of its own.

What do recent technological developments in AI and robotisation mean for the economy, businesses and jobs? Should we be worried or excited? Which jobs will be destroyed and which new ones created? What should education systems, businesses, governments and social partners do to manage the coming transition successfully?
These are some of the questions considered by Michel Servoz, Senior Adviser on Artificial Intelligence, Robotics and the Future of Labour, in this in-depth study requested by European Commission President Jean-Claude Juncker….(More)”.

Digital inequalities in the age of artificial intelligence and big data


Paper by Christoph Lutz: “In this literature review, I summarize key concepts and findings from the rich academic literature on digital inequalities. I propose that digital inequalities research should look more into labor‐ and big data‐related questions such as inequalities in online labor markets and the negative effects of algorithmic decision‐making for vulnerable population groups.

The article engages with the sociological literature on digital inequalities and explains the general approach to digital inequalities, based on the distinction of first‐, second‐, and third‐level digital divides. First, inequalities in access to digital technologies are discussed. This discussion is extended to emerging technologies, including the Internet‐of‐things and artificial intelligence‐powered systems such as smart speakers. Second, inequalities in digital skills and technology use are reviewed and connected to the discourse on new forms of work such as the sharing economy or gig economy. Third and finally, the discourse on the outcomes, in the form of benefits or harms, from digital technology use is taken up.

Here, I propose to integrate the digital inequalities literature more strongly with critical algorithm studies and recent discussions about datafication, digital footprints, and information privacy….(More)”.