Data & Policy: A new venue to study and explore policy–data interaction


Opening editorial by Stefaan G. Verhulst, Zeynep Engin and Jon Crowcroft: “…Policy–data interactions or governance initiatives that use data have been the exception rather than the norm, isolated prototypes and trials rather than an indication of real, systemic change. There are various reasons for the generally slow uptake of data in policymaking, and several factors will have to change if the situation is to improve. ….

  • Despite the number of successful prototypes and small-scale initiatives, policy makers’ understanding of data’s potential and its value proposition generally remains limited (Lutes, 2015). There is also limited appreciation of the advances data science has made the last few years. This is a major limiting factor; we cannot expect policy makers to use data if they do not recognize what data and data science can do.
  • The recent (and justifiable) backlash against how certain private companies handle consumer data has had something of a reverse halo effect: There is a growing lack of trust in the way data is collected, analyzed, and used, and this often leads to a certain reluctance (or simply risk-aversion) on the part of officials and others (Engin, 2018).
  • Despite several high-profile open data projects around the world, much (probably the majority) of data that could be helpful in governance remains either privately held or otherwise hidden in silos (Verhulst and Young, 2017b). There remains a shortage not only of data but, more specifically, of high-quality and relevant data.
  • With few exceptions, the technical capacities of officials remain limited, and this has obviously negative ramifications for the potential use of data in governance (Giest, 2017).
  • It’s not just a question of limited technical capacities. There is often a vast conceptual and values gap between the policy and technical communities (Thompson et al., 2015; Uzochukwu et al., 2016); sometimes it seems as if they speak different languages. Compounding this difference in world views is the fact that the two communities rarely interact.
  • Yet, data about the use and evidence of the impact of data remain sparse. The impetus to use more data in policy making is stymied by limited scholarship and a weak evidential basis to show that data can be helpful and how. Without such evidence, data advocates are limited in their ability to make the case for more data initiatives in governance.
  • Data are not only changing the way policy is developed, but they have also reopened the debate around theory- versus data-driven methods in generating scientific knowledge (Lee, 1973; Kitchin, 2014; Chivers, 2018; Dreyfuss, 2017) and thus directly questioning the evidence base to utilization and implementation of data within policy making. A number of associated challenges are being discussed, such as: (i) traceability and reproducibility of research outcomes (due to “black box processing”); (ii) the use of correlation instead of causation as the basis of analysis, biases and uncertainties present in large historical datasets that cause replication and, in some cases, amplification of human cognitive biases and imperfections; and (iii) the incorporation of existing human knowledge and domain expertise into the scientific knowledge generation processes—among many other topics (Castelvecchi, 2016; Miller and Goodchild, 2015; Obermeyer and Emanuel, 2016; Provost and Fawcett, 2013).
  • Finally, we believe that there should be a sound under-pinning a new theory of what we call Policy–Data Interactions. To date, in reaction to the proliferation of data in the commercial world, theories of data management,1 privacy,2 and fairness3 have emerged. From the Human–Computer Interaction world, a manifesto of principles of Human–Data Interaction (Mortier et al., 2014) has found traction, which intends reducing the asymmetry of power present in current design considerations of systems of data about people. However, we need a consistent, symmetric approach to consideration of systems of policy and data, how they interact with one another.

All these challenges are real, and they are sticky. We are under no illusions that they will be overcome easily or quickly….

During the past four conferences, we have hosted an incredibly diverse range of dialogues and examinations by key global thought leaders, opinion leaders, practitioners, and the scientific community (Data for Policy, 2015201620172019). What became increasingly obvious was the need for a dedicated venue to deepen and sustain the conversations and deliberations beyond the limitations of an annual conference. This leads us to today and the launch of Data & Policy, which aims to confront and mitigate the barriers to greater use of data in policy making and governance.

Data & Policy is a venue for peer-reviewed research and discussion about the potential for and impact of data science on policy. Our aim is to provide a nuanced and multistranded assessment of the potential and challenges involved in using data for policy and to bridge the “two cultures” of science and humanism—as CP Snow famously described in his lecture on “Two Cultures and the Scientific Revolution” (Snow, 1959). By doing so, we also seek to bridge the two other dichotomies that limit an examination of datafication and is interaction with policy from various angles: the divide between practice and scholarship; and between private and public…

So these are our principles: scholarly, pragmatic, open-minded, interdisciplinary, focused on actionable intelligence, and, most of all, innovative in how we will share insight and pushing at the boundaries of what we already know and what already exists. We are excited to launch Data & Policy with the support of Cambridge University Press and University College London, and we’re looking for partners to help us build it as a resource for the community. If you’re reading this manifesto it means you have at least a passing interest in the subject; we hope you will be part of the conversation….(More)”.

Introducing ‘AI Commons’: A framework for collaboration to achieve global impact


Press Release: “Last week’s 3rd annual AI for Good Global Summit once again showcased the growing number of Artificial Intelligence (AI) projects with promise to advance the United Nations Sustainable Development Goals (SDGs).

Now, using the Summit’s momentum, AI innovators and humanitarian leaders are prepared to take the ‘AI for Good’ movement to the next level.

They are working together to launch an ‘AI Commons’ that aims to scale AI for Good projects and maximize their impact across the world.

The AI Commons will enable AI adopters to connect with AI specialists and data owners to align incentives for innovation and develop AI solutions to precisely defined problems.

“The concept of AI Commons has developed over three editions of the Summit and is now motivating implementation,” said ITU Secretary-General Houlin Zhao in closing remarks to the summit. “AI and data need to be a shared resource if we are serious about scaling AI for good. The community supporting the Summit is creating infrastructure to scale-up their collaboration − to convert the principles underlying the Summit into global impact.”…

The AI Commons will provide an open framework for collaboration, a decentralized system to democratize problem solving with AI.

It aims to be a “knowledge space”, says Banifatemi, answering a key question: “How can problem solving with AI become common knowledge?”

“The goal is to be an open initiative, like a Linux effort, like an open-source network, where everyone can participate and we jointly share and we create an abundance of knowledge, knowledge of how we can solve problems with AI,” said Banifatemi.

AI development and application will build on the state of the art, enabling AI solutions to scale with the help of shared datasets, testing and simulation environments, AI models and associated software, and storage and computing resources….(More)”.

AI and the Global South: Designing for Other Worlds


Chapter by Chinmayi Arun in Markus D. Dubber, Frank Pasquale, and Sunit Das (eds.), The Oxford Handbook of Ethics of AI: “This chapter is about the ways in which AI affects, and will continue to affect, the Global South. It highlights why the design and deployment of AI in the South should concern us. 

Towards this, it discusses what is meant by the South. The term has a history connected with the ‘Third World’ and has referred to countries that share post-colonial history and certain development goals. However scholars have expanded and refined on it to include different kinds of marginal, disenfranchised populations such that the South is now a plural concept – there are Souths. 

The risks of the ways in which AI affects Southern populations include concerns of discrimination, bias, oppression, exclusion and bad design. These can be exacerbated in the context of vulnerable populations, especially those without access to human rights law or institutional remedies. This Chapter outlines these risks as well as the international human rights law that is applicable. It argues that a human rights, centric, inclusive, empowering context-driven approach is necessary….(More)”.

The Tricky Ethics of Using YouTube Videos for Academic Research


Jane C.Hu in P/S Magazine: “…But just because something is legal doesn’t mean it’s ethical. That doesn’t mean it’s necessarily unethical, either, but it’s worth asking questions about how and why researchers use social media posts, and whether those uses could be harmful. I was once a researcher who had to obtain human-subjects approval from a university institutional review board, and I know it can be a painstaking application process with long wait times. Collecting data from individuals takes a long time too. If you could just sub in YouTube videos in place of collecting your own data, that saves time, money, and effort. But that could be at the expense of the people whose data you’re scraping.

But, you might say, if people don’t want to be studied online, then they shouldn’t post anything. But most people don’t fully understand what “publicly available” really means or its ramifications. “You might know intellectually that technically anyone can see a tweet, but you still conceptualize your audience as being your 200 Twitter followers,” Fiesler says. In her research, she’s found that the majority of people she’s polled have no clue that researchers study public tweets.

Some may disagree that it’s researchers’ responsibility to work around social media users’ ignorance, but Fiesler and others are calling for their colleagues to be more mindful about any work that uses publicly available data. For instance, Ashley Patterson, an assistant professor of language and literacy at Penn State University, ultimately decided to use YouTube videos in her dissertation work on biracial individuals’ educational experiences. That’s a decision she arrived at after carefully considering her options each step of the way. “I had to set my own levels of ethical standards and hold myself to it, because I knew no one else would,” she says. One of Patterson’s first steps was to ask herself what YouTube videos would add to her work, and whether there were any other ways to collect her data. “It’s not a matter of whether it makes my life easier, or whether it’s ‘just data out there’ that would otherwise go to waste. The nature of my question and the response I was looking for made this an appropriate piece [of my work],” she says.

Researchers may also want to consider qualitative, hard-to-quantify contextual cues when weighing ethical decisions. What kind of data is being used? Fiesler points out that tweets about, say, a television show are way less personal than ones about a sensitive medical condition. Anonymized written materials, like Facebook posts, could be less invasive than using someone’s face and voice from a YouTube video. And the potential consequences of the research project are worth considering too. For instance, Fiesler and other critics have pointed out that researchers who used YouTube videos of people documenting their experience undergoing hormone replacement therapy to train an artificial intelligence to identify trans people could be putting their unwitting participants in danger. It’s not obvious how the results of Speech2Face will be used, and, when asked for comment, the paper’s researchers said they’d prefer to quote from their paper, which pointed to a helpful purpose: providing a “representative face” based on the speaker’s voice on a phone call. But one can also imagine dangerous applications, like doxing anonymous YouTubers.

One way to get ahead of this, perhaps, is to take steps to explicitly inform participants their data is being used. Fiesler says that, when her team asked people how they’d feel after learning their tweets had been used for research, “not everyone was necessarily super upset, but most people were surprised.” They also seemed curious; 85 percent of participants said that, if their tweet were included in research, they’d want to read the resulting paper. “In human-subjects research, the ethical standard is informed consent, but inform and consent can be pulled apart; you could potentially inform people without getting their consent,” Fiesler suggests….(More)”.

How Can We Overcome the Challenge of Biased and Incomplete Data?


Knowledge@Wharton: “Data analytics and artificial intelligence are transforming our lives. Be it in health care, in banking and financial services, or in times of humanitarian crises — data determine the way decisions are made. But often, the way data is collected and measured can result in biased and incomplete information, and this can significantly impact outcomes.  

In a conversation with Knowledge@Wharton at the SWIFT Institute Conference on the Impact of Artificial Intelligence and Machine Learning in the Financial Services Industry, Alexandra Olteanu, a post-doctoral researcher at Microsoft Research, U.S. and Canada, discussed the ethical and people considerations in data collection and artificial intelligence and how we can work towards removing the biases….

….Knowledge@Wharton: Bias is a big issue when you’re dealing with humanitarian crises, because it can influence who gets help and who doesn’t. When you translate that into the business world, especially in financial services, what implications do you see for algorithmic bias? What might be some of the consequences?

Olteanu: A good example is from a new law in the New York state according to which insurance companies can now use social media to decide the level for your premiums. But, they could in fact end up using incomplete information. For instance, you might be buying your vegetables from the supermarket or a farmer’s market, but these retailers might not be tracking you on social media. So nobody knows that you are eating vegetables. On the other hand, a bakery that you visit might post something when you buy from there. Based on this, the insurance companies may conclude that you only eat cookies all the time. This shows how even incomplete data can affect you….(More)”.

107 Years Later, The Titanic Sinking Helps Train Problem-Solving AI


Kiona N. Smith at Forbes: “What could the 107-year-old tragedy of the Titanic possibly have to do with modern problems like sustainable agriculture, human trafficking, or health insurance premiums? Data turns out to be the common thread. The modern world, for better or or worse, increasingly turns to algorithms to look for patterns in the data and and make predictions based on those patterns. And the basic methods are the same whether the question they’re trying to answer is “Would this person survive the Titanic sinking?” or “What are the most likely routes for human trafficking?”

An Enduring Problem

Predicting survival at sea based on the Titanic dataset is a standard practice problem for aspiring data scientists and programmers. Here’s the basic challenge: feed your algorithm a portion of the Titanic passenger list, which includes some basic variables describing each passenger and their fate. From that data, the algorithm (if you’ve programmed it well) should be able to draw some conclusions about which variables made a person more likely to live or die on that cold April night in 1912. To test its success, you then give the algorithm the rest of the passenger list (minus the outcomes) and see how well it predicts their fates.

Online communities like Kaggle.com have held competitions to see who can develop the algorithm that predicts survival most accurately, and it’s also a common problem presented to university classes. The passenger list is big enough to be useful, but small enough to be manageable for beginners. There’s a simple set out of outcomes — life or death — and around a dozen variables to work with, so the problem is simple enough for beginners to tackle but just complex enough to be interesting. And because the Titanic’s story is so famous, even more than a century later, the problem still resonates.

“It’s interesting to see that even in such a simple problem as the Titanic, there are nuggets,” said Sagie Davidovich, Co-Founder & CEO of SparkBeyond, who used the Titanic problem as an early test for SparkBeyond’s AI platform and still uses it as a way to demonstrate the technology to prospective customers….(More)”.

Africa must reap the benefits of its own data


Tshilidzi Marwala at Business Insider: “Twenty-two years ago when I was a doctoral student in artificial intelligence (AI) at the University of Cambridge, I had to create all the AI algorithms I needed to understand the complex phenomena related to this field.

For starters, AI is a computer software that performs intelligent tasks that normally require human beings, while an algorithm is a set of rules that instruct a computer to execute specific tasks. In that era, the ability to create AI algorithms was more important than the ability to acquire and use data.

Google has created an open-source library called TensorFlow, which contains all the developed AI algorithms. This way Google wants people to develop applications (apps) using their software, with the payoff being that Google will collect data on any individual using the apps developed with TensorFlow.

Today, an AI algorithm is not a competitive advantage but data is. The World Economic Forum calls data the new “oxygen”, while Chinese AI specialist Kai-Fu Lee calls it the new “oil”.

Africa’s population is increasing faster than in any region in the world. The continent has a population of 1.3-billion people and a total nominal GDP of $2.3-trillion. This increase in the population is in effect an increase in data, and if data is the new oil, it is akin to an increase in oil reserve.

Even oil-rich countries such as Saudi Arabia do not experience an increase in their oil reserve. How do we as Africans take advantage of this huge amount of data?

There are two categories of data in Africa: heritage and personal. Heritage data resides in society, whereas personal data resides in individuals. Heritage data includes data gathered from our languages, emotions and accents. Personal data includes health, facial and fingerprint data.

Facebook, Amazon, Apple, Netflix and Google are data companies. They trade data to advertisers, banks and political parties, among others. For example, the controversial company Cambridge Analytica harvested Facebook data to influence the presidential election that potentially contributed to Donald Trump’s victory in the US elections.

The company Google collects language data to build an application called Google Translate that translates from one language to another. This app claims to cover African languages such as Zulu, Yoruba and Swahili. Google Translate is less effective in handling African languages than it is in handling European and Asian languages.

Now, how do we capitalise on our language heritage to create economic value? We need to build our own language database and create our own versions of Google Translate.

An important area is the creation of an African emotion database. Different cultures exhibit emotions differently. These are very important in areas such as safety of cars and aeroplanes. If we can build a system that can read pilots’ emotions, this would enable us to establish if a pilot is in a good state of mind to operate an aircraft, which would increase safety.

To capitalise on the African emotion database, we should create a data bank that captures emotions of African people in various parts of the continent, and then use this database to create AI apps to read people’s emotions. Mercedes-Benz has already implemented the “Attention Assist”, which alerts drivers to fatigue.

Another important area is the creation of an African health database. AI algorithms are able to diagnose diseases better than human doctors. However, these algorithms depend on the availability of data. To capitalise on this, we need to collect such data and use it to build algorithms that will be able to augment medical care….(More)”.

Beyond Bias: Re-Imagining the Terms of ‘Ethical AI’ in Criminal Law


Paper by Chelsea Barabas: “Data-driven decision-making regimes, often branded as “artificial intelligence,” are rapidly proliferating across the US criminal justice system as a means of predicting and managing the risk of crime and addressing accusations of discriminatory practices. These data regimes have come under increased scrutiny, as critics point out the myriad ways that they can reproduce or even amplify pre-existing biases in the criminal justice system. This essay examines contemporary debates regarding the use of “artificial intelligence” as a vehicle for criminal justice reform, by closely examining two general approaches to, what has been widely branded as, “algorithmic fairness” in criminal law: 1) the development of formal fairness criteria and accuracy measures that illustrate the trade-offs of different algorithmic interventions and 2) the development of “best practices” and managerialist standards for maintaining a baseline of accuracy, transparency and validity in these systems.

The essay argues that attempts to render AI-branded tools more accurate by addressing narrow notions of “bias,” miss the deeper methodological and epistemological issues regarding the fairness of these tools. The key question is whether predictive tools reflect and reinforce punitive practices that drive disparate outcomes, and how data regimes interact with the penal ideology to naturalize these practices. The article concludes by calling for an abolitionist understanding of the role and function of the carceral state, in order to fundamentally reformulate the questions we ask, the way we characterize existing data, and how we identify and fill gaps in existing data regimes of the carceral state….(More)”

MegaPixels


About: “…MegaPixels is an art and research project first launched in 2017 for an installation at Tactical Technology Collective’s GlassRoom about face recognition datasets. In 2018 MegaPixels was extended to cover pedestrian analysis datasets for a commission by Elevate Arts festival in Austria. Since then MegaPixels has evolved into a large-scale interrogation of hundreds of publicly-available face and person analysis datasets, the first of which launched on this site in April 2019.

MegaPixels aims to provide a critical perspective on machine learning image datasets, one that might otherwise escape academia and industry funded artificial intelligence think tanks that are often supported by the several of the same technology companies who have created datasets presented on this site.

MegaPixels is an independent project, designed as a public resource for educators, students, journalists, and researchers. Each dataset presented on this site undergoes a thorough review of its images, intent, and funding sources. Though the goals are similar to publishing an academic paper, MegaPixels is a website-first research project, with an academic publication to follow.

One of the main focuses of the dataset investigations presented on this site is to uncover where funding originated. Because of our emphasis on other researcher’s funding sources, it is important that we are transparent about our own….(More)”.

San Francisco becomes the first US city to ban facial recognition by government agencies


Colin Lecher at The Verge: “In a first for a city in the United States, San Francisco has voted to ban its government agencies from using facial recognition technology.

The city’s Board of Supervisors voted eight to one to approve the proposal, set to take effect in a month, that would bar city agencies, including law enforcement, from using the tool. The ordinance would also require city agencies to get board approval for their use of surveillance technology, and set up audits of surveillance tech already in use. Other cities have approved similar transparency measures.“

The plan, called the Stop Secret Surveillance Ordinance, was spearheaded by Supervisor Aaron Peskin. In a statement read ahead of the vote, Peskin said it was “an ordinance about having accountability around surveillance technology.”

“This is not an anti-technology policy,” he said, stressing that many tools used by law enforcement are still important to the city’s security. Still, he added, facial recognition is “uniquely dangerous and oppressive.”

The ban comes amid a broader debate over facial recognition, which can be used to rapidly identify people and has triggered new questions about civil liberties. Experts have raised specific concerns about the tools, as studies have demonstrated instances of troubling bias and error rates.

Microsoft, which offers facial recognition tools, has called for some form of regulation for the technology — but how, exactly, to regulate the tool has been contested. Proposals have ranged from light regulation to full moratoriums. Legislation has largely stalled, however.

San Francisco’s decision will inevitably be used as an example as the debate continues and other cities and states decide whether and how to regulate facial recognition. Civil liberties groups like the ACLU of Northern California have already thrown their support behind the San Francisco plan, while law enforcement in the area has pushed back….(More)”.