Bernhard Rieder at Digital Culture & Society: “This paper develops a critique of Big Data and associated analytical techniques by focusing not on errors – skewed or imperfect datasets, false positives, underrepresentation, and so forth – but on data mining that works. After a quick framing of these practices as interested readings of reality, I address the question of how data analytics and, in particular, machine learning reveal and operate on the structured and unequal character of contemporary societies, installing “economic morality” (Allen 2012) as the central guiding principle. Rather than critiquing the methods behind Big Data, I inquire into the way these methods make the many differences in decentred, non-traditional societies knowable and, as a consequence, ready for profitable distinction and decision-making. The objective, in short, is to add to our understanding of the “profound ideological role at the intersection of sociality, research, and commerce” (van Dijck 2014: 201) the collection and analysis of large quantities of multifarious data have come to play. Such an understanding needs to embed Big Data in a larger, more fundamental critique of the societal context it operates in….(More)”.
Discrimination by algorithm: scientists devise test to detect AI bias
Hannah Devlin at the Guardian: “There was the voice recognition software that struggled to understand women, the crime prediction algorithm that targeted black neighbourhoods and the online ad platform which was more likely to show men highly paid executive jobs.
Concerns have been growing about AI’s so-called “white guy problem” and now scientists have devised a way to test whether an algorithm is introducing gender or racial biases into decision-making.
Moritz Hardt, a senior research scientist at Google and a co-author of the paper, said: “Decisions based on machine learning can be both incredibly useful and have a profound impact on our lives … Despite the need, a vetted methodology in machine learning for preventing this kind of discrimination based on sensitive attributes has been lacking.”
The paper was one of several on detecting discrimination by algorithms to be presented at the Neural Information Processing Systems (NIPS) conference in Barcelona this month, indicating a growing recognition of the problem.
Nathan Srebro, a computer scientist at the Toyota Technological Institute at Chicago and co-author, said: “We are trying to enforce that you will not have inappropriate bias in the statistical prediction.”
The test is aimed at machine learning programs, which learn to make predictions about the future by crunching through vast quantities of existing data. Since the decision-making criteria are essentially learnt by the computer, rather than being pre-programmed by humans, the exact logic behind decisions is often opaque, even to the scientists who wrote the software….“Our criteria does not look at the innards of the learning algorithm,” said Srebro. “It just looks at the predictions it makes.”
Their approach, called Equality of Opportunity in Supervised Learning, works on the basic principle that when an algorithm makes a decision about an individual – be it to show them an online ad or award them parole – the decision should not reveal anything about the individual’s race or gender beyond what might be gleaned from the data itself.
For instance, if men were on average twice as likely to default on bank loans as women, and if you knew that a particular individual in a dataset had defaulted on a loan, you could reasonably conclude they were more likely (but not certain) to be male.
However, if an algorithm calculated that the most profitable strategy for a lender was to reject all loan applications from men and accept all female applications, the decision would precisely confirm a person’s gender.
“This can be interpreted as inappropriate discrimination,” said Srebro….(More)”.
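For readers who want to see what the test amounts to in practice, here is a minimal sketch of the “equality of opportunity” idea: among individuals who actually merit the positive outcome (for example, applicants who would repay a loan), the rate of favourable decisions should be roughly equal across protected groups. The data, group labels and tolerance below are invented for illustration; this is not the authors’ implementation.

```python
# A minimal sketch of the equality-of-opportunity check: among people who
# actually merit the positive outcome (here, borrowers who repaid), the
# approval rate should be roughly equal across groups. Data, group labels
# and the 0.05 tolerance are invented for illustration.

import random

random.seed(0)

def true_positive_rate(records, group):
    """P(approved | actually repaid, group) for one group."""
    qualified = [r for r in records if r["group"] == group and r["repaid"]]
    if not qualified:
        return float("nan")
    return sum(r["approved"] for r in qualified) / len(qualified)

# Synthetic loan decisions: protected attribute, true outcome, model decision.
records = [
    {
        "group": random.choice(["men", "women"]),
        "repaid": random.random() < 0.7,
        "approved": random.random() < 0.6,
    }
    for _ in range(10000)
]

tpr = {g: true_positive_rate(records, g) for g in ("men", "women")}
gap = abs(tpr["men"] - tpr["women"])

print(f"approval rate among repayers, men:   {tpr['men']:.3f}")
print(f"approval rate among repayers, women: {tpr['women']:.3f}")
# A large gap means qualified applicants in one group are approved less often.
print("flag for review" if gap > 0.05 else "rates roughly equal")
```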
Four steps to precision public health
By contrast, a campaign against yellow fever launched this year in sub-Saharan Africa defines risk at the level of entire nations, often hundreds of thousands of square kilometres. More granular assessments have been deemed too complex.
The use of data to guide interventions that benefit populations more efficiently is a strategy we call precision public health. It requires robust primary surveillance data, rapid application of sophisticated analytics to track the geographical distribution of disease, and the capacity to act on such information [1].
The availability and use of precise data is becoming the norm in wealthy countries. But large swathes of the developing world are not reaping its advantages. In Guinea, it took months to assemble enough data to clearly identify the start of the largest Ebola outbreak in history. This should take days. Sub-Saharan Africa has the highest rates of childhood mortality in the world; it is also where we know the least about causes of death…
The value of precise disease tracking was baked into epidemiology from the start. In 1854, John Snow famously located cholera cases in London. His mapping of the spread of infection through contaminated water dealt a blow to the idea that the disease was caused by bad air. These days, people and pathogens move across the globe swiftly and in great numbers. In 2009, the H1N1 ‘swine flu’ influenza virus took just 35 days to spread from Mexico and the United States to China, South Korea and 12 other countries…
The public-health community is sharing more data faster; expectations are higher than ever that data will be available from clinical trials and from disease surveillance. In the past two years, the US National Institutes of Health, the Wellcome Trust in London and the Gates Foundation have all instituted open data policies for their grant recipients, and leading journals have declared that sharing data during disease emergencies will not impede later publication.
Meanwhile, improved analysis, data visualization and machine learning have expanded our ability to use disparate data sources to decide what to do. A study published last year [4] used precise geospatial modelling to infer that insecticide-treated bed nets were the single most influential intervention in the rapid decline of malaria.
However, in many parts of the developing world, there are still hurdles to the collection, analysis and use of more precise public-health data. Work towards malaria elimination in South Africa, for example, has depended largely on paper reporting forms, which are collected and entered manually each week by dozens of subdistricts, and eventually analysed at the province level. This process would be much faster if field workers filed reports from mobile phones.

…Frontline workers should not find themselves frustrated by global programmes that fail to take into account data on local circumstances. Wherever they live — in a village, city or country, in the global south or north — people have the right to public-health decisions that are based on the best data and science possible, that minimize risk and cost, and maximize health in their communities…(More)”
Talent Gap Is a Main Roadblock as Agencies Eye Emerging Tech
Theo Douglas in GovTech: “U.S. public service agencies are closely eyeing emerging technologies, chiefly advanced analytics and predictive modeling, according to a new report from Accenture, but like their counterparts globally they must address talent and complexity issues before adoption rates will rise.
The report, Emerging Technologies in Public Service, compiled a nine-nation survey of IT officials across all levels of government in policing and justice, health and social services, revenue, border services, pension/Social Security and administration, and was released earlier this week.
It revealed a deep interest in emerging tech from the public sector, finding 70 percent of agencies are evaluating their potential — but a much lower adoption level, with just 25 percent going beyond piloting to implementation….
The revenue and tax industries have been early adopters of advanced analytics and predictive modeling, he said, while biometrics and video analytics are resonating with police agencies.
In Australia, the tax office found using voiceprint technology could save 75,000 work hours annually.
Closer to home, Utah Chief Technology Officer Dave Fletcher told Accenture that consolidating data centers into a virtualized infrastructure improved speed and flexibility, so some processes that once took weeks or months can now happen in minutes or hours.
Nationally, 70 percent of agencies have either piloted or implemented an advanced analytics or predictive modeling program. Biometrics and identity analytics were the next most popular technologies, with 29 percent piloting or implementing, followed by machine learning at 22 percent.
Those numbers contrast globally with Australia, where 68 percent of government agencies have charged into piloting and implementing biometric and identity analytics programs; and Germany and Singapore, where 27 percent and 57 percent of agencies respectively have piloted or adopted video analytic programs.
Overall, 78 percent of respondents said they were either underway or had implemented some machine-learning technologies.
The benefits of embracing emerging tech that were identified ranged from finding better ways of working through automation to innovating and developing new services and reducing costs.
Agencies told Accenture their No. 1 objective was increasing customer satisfaction. But 89 percent said they’d expect a return on implementing intelligent technology within two years. Four-fifths, or 80 percent, agreed intelligent tech would improve employees’ job satisfaction….(More)”.
The ethical impact of data science
Theme issue of Phil. Trans. R. Soc. A compiled and edited by Mariarosaria Taddeo and Luciano Floridi: “This theme issue has the founding ambition of landscaping data ethics as a new branch of ethics that studies and evaluates moral problems related to data (including generation, recording, curation, processing, dissemination, sharing and use), algorithms (including artificial intelligence, artificial agents, machine learning and robots) and corresponding practices (including responsible innovation, programming, hacking and professional codes), in order to formulate and support morally good solutions (e.g. right conducts or right values). Data ethics builds on the foundation provided by computer and information ethics but, at the same time, it refines the approach endorsed so far in this research field, by shifting the level of abstraction of ethical enquiries, from being information-centric to being data-centric. This shift brings into focus the different moral dimensions of all kinds of data, even data that never translate directly into information but can be used to support actions or generate behaviours, for example. It highlights the need for ethical analyses to concentrate on the content and nature of computational operations—the interactions among hardware, software and data—rather than on the variety of digital technologies that enable them. And it emphasizes the complexity of the ethical challenges posed by data science. Because of such complexity, data ethics should be developed from the start as a macroethics, that is, as an overall framework that avoids narrow, ad hoc approaches and addresses the ethical impact and implications of data science and its applications within a consistent, holistic and inclusive framework. Only as a macroethics will data ethics provide solutions that can maximize the value of data science for our societies, for all of us and for our environments….(More)”
Table of Contents:
- The dynamics of big data and human rights: the case of scientific research; Effy Vayena, John Tasioulas
- Facilitating the ethical use of health data for the benefit of society: electronic health records, consent and the duty of easy rescue; Sebastian Porsdam Mann, Julian Savulescu, Barbara J. Sahakian
- Faultless responsibility: on the nature and allocation of moral responsibility for distributed moral actions; Luciano Floridi
- Compelling truth: legal protection of the infosphere against big data spills; Burkhard Schafer
- Locating ethics in data science: responsibility and accountability in global and distributed knowledge production systems; Sabina Leonelli
- Privacy is an essentially contested concept: a multi-dimensional analytic for mapping privacy; Deirdre K. Mulligan, Colin Koopman, Nick Doty
- Beyond privacy and exposure: ethical issues within citizen-facing analytics; Peter Grindrod
- The ethics of smart cities and urban science; Rob Kitchin
- The ethics of big data as a public good: which public? Whose good?; Linnet Taylor
- Data philanthropy and the design of the infraethics for information societies; Mariarosaria Taddeo
- The opportunities and ethics of big data: practical priorities for a national Council of Data Ethics; Olivia Varley-Winter, Hetan Shah
- Data science ethics in government; Cat Drew
- The ethics of data and of data science: an economist’s perspective; Jonathan Cave
- What’s the good of a science platform?; John Gallacher
Predicting judicial decisions of the European Court of Human Rights: a Natural Language Processing perspective
Nikolaos Aletras et al at Peer J. Computer Science: “Recent advances in Natural Language Processing and Machine Learning provide us with the tools to build predictive models that can be used to unveil patterns driving judicial decisions. This can be useful, for both lawyers and judges, as an assisting tool to rapidly identify cases and extract patterns which lead to certain decisions. This paper presents the first systematic study on predicting the outcome of cases tried by the European Court of Human Rights based solely on textual content. We formulate a binary classification task where the input of our classifiers is the textual content extracted from a case and the target output is the actual judgment as to whether there has been a violation of an article of the convention of human rights. Textual information is represented using contiguous word sequences, i.e., N-grams, and topics. Our models can predict the court’s decisions with a strong accuracy (79% on average). Our empirical analysis indicates that the formal facts of a case are the most important predictive factor. This is consistent with the theory of legal realism suggesting that judicial decision-making is significantly affected by the stimulus of the facts. We also observe that the topical content of a case is another important feature in this classification task and explore this relationship further by conducting a qualitative analysis….(More)”
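As a rough illustration of the classification setup described in the abstract, the sketch below trains a linear support vector machine on word n-gram features to separate “violation” from “no violation” cases. The toy case texts, TF-IDF weighting and parameter choices are placeholders of my own, not the authors’ feature pipeline or data.

```python
# A rough sketch of the setup described above: binary classification of case
# text into "violation" (1) vs "no violation" (0) using word n-gram features
# and a linear SVM. The four toy texts are placeholders, not real ECHR case
# documents, and the feature settings only approximate those in the paper.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

cases = [
    "the applicant was detained without judicial review for several months",
    "the domestic courts examined the complaint and provided an adequate remedy",
    "police entered the applicant's home without a warrant or explanation",
    "the applicant received a fair and public hearing within a reasonable time",
]
labels = [1, 0, 1, 0]  # 1 = violation found, 0 = no violation

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 4)),  # contiguous word sequences (n-grams)
    LinearSVC(),
)

# With a real corpus one would estimate accuracy by cross-validation, as the
# authors do; with four toy cases this is only a smoke test of the pipeline.
scores = cross_val_score(model, cases, labels, cv=2)
print("cross-validated accuracy:", scores.mean())
```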
Crowdsourcing Gun Violence Research
Penn Engineering: “Gun violence is often described as an epidemic, but as visible and shocking as shooting incidents are, epidemiologists who study that particular source of mortality have a hard time tracking them. The Centers for Disease Control is prohibited by federal law from conducting gun violence research, so there is little in the way of centralized infrastructure to monitor where, how, when, why and to whom shootings occur.
Chris Callison-Burch, Aravind K. Joshi Term Assistant Professor in Computer and Information Science, and graduate student Ellie Pavlick are working to solve this problem.
They have developed the Gun Violence Database, which combines machine learning and crowdsourcing techniques to produce a national registry of shooting incidents. Callison-Burch and Pavlick’s algorithm scans thousands of articles from local newspapers and television stations, determines which are about gun violence, then asks everyday people to pull out vital statistics from those articles, compiling that information into a unified, open database.
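The pipeline can be pictured as two stages: a learned classifier flags candidate articles, and flagged articles become crowdsourcing tasks asking workers to extract the vital statistics. The sketch below is illustrative only; the keyword scorer stands in for the trained classifier, and the question fields are hypothetical rather than the actual Gun Violence Database schema.

```python
# An illustrative sketch of the two-stage pipeline: flag articles that appear
# to be about gun violence, then turn flagged articles into annotation tasks
# for crowd workers. The keyword scorer is a stand-in for the learned
# classifier; the field names are hypothetical, not the real database schema.

GUN_TERMS = {"shooting", "shot", "gunman", "gunfire", "firearm", "handgun"}

def looks_like_gun_violence(article_text, threshold=2):
    """Crude stand-in for the trained classifier: count gun-related terms."""
    words = article_text.lower().split()
    return sum(w.strip(".,") in GUN_TERMS for w in words) >= threshold

def make_annotation_task(article_id, article_text):
    """Package a flagged article as a crowdsourcing task (e.g. for Mechanical Turk)."""
    return {
        "article_id": article_id,
        "text": article_text,
        "questions": [  # vital statistics the crowd is asked to extract
            "How many people were shot?",
            "Where did the shooting occur (city, state)?",
            "When did the shooting occur?",
            "What were the ages and genders of victim(s) and shooter(s)?",
        ],
    }

articles = {
    "a1": "Police say a gunman opened fire outside a bar; two people were shot.",
    "a2": "The city council approved a new budget for road repairs on Tuesday.",
}

tasks = [
    make_annotation_task(aid, text)
    for aid, text in articles.items()
    if looks_like_gun_violence(text)
]
print(f"{len(tasks)} of {len(articles)} articles queued for crowd annotation")
```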
For natural language processing experts like Callison-Burch and Pavlick, the most exciting prospect of this effort is that it is training computer systems to do this kind of analysis automatically. They recently presented their work on that front at Bloomberg’s Data for Good Exchange conference.
The Gun Violence Database project started in 2014, when it became the centerpiece of Callison-Burch’s “Crowdsourcing and Human Computation” class. There, Pavlick developed a series of homework assignments that challenged undergraduates to develop a classifier that could tell whether a given news article was about a shooting incident.
“It allowed us to teach the things we want students to learn about data science and natural language processing, while giving them the motivation to do a project that could contribute to the greater good,” says Callison-Burch.
The articles students used to train their classifiers were sourced from “The Gun Report,” a daily blog from New York Times reporters that attempted to catalog shootings from around the country in the wake of the Sandy Hook massacre. Realizing that their algorithmic approach could be scaled up to automate what the Times’ reporters were attempting, the researchers began exploring how such a database could work. They consulted with Douglas Wiebe, an Associate Professor of Epidemiology in Biostatistics and Epidemiology in the Perelman School of Medicine, to learn more about what kind of information public health researchers needed to better study gun violence on a societal scale.
From there, the researchers enlisted people to annotate the articles their classifier found, connecting with them through Mechanical Turk, Amazon’s crowdsourcing platform, and their own website, http://gun-violence.org/…(More)”
Combining Satellite Imagery and Machine Learning to Predict Poverty
From the sustainability and artificial intelligence lab: “The elimination of poverty worldwide is the first of 17 UN Sustainable Development Goals for the year 2030. To track progress towards this goal, we require more frequent and more reliable data on the distribution of poverty than traditional data collection methods can provide.
In this project, we propose an approach that combines machine learning with high-resolution satellite imagery to provide new data on socioeconomic indicators of poverty and wealth. Check out the short video below for a quick overview and then read the paper for a more detailed explanation of how it all works….(More)”
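To give a sense of the final step of such an approach, the sketch below fits a regularized linear model from per-village image feature vectors (which, as described in the accompanying paper, would come from a convolutional network applied to satellite imagery) to a survey-based wealth measure. The feature vectors and wealth values here are random placeholders, so the numbers mean nothing; the point is only the shape of the pipeline.

```python
# A heavily simplified sketch of the last stage of such a pipeline: a ridge
# regression from per-village image feature vectors (in the paper, produced by
# a convolutional network applied to satellite imagery) to a survey-based
# wealth index. Everything below is synthetic placeholder data.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

n_clusters, n_features = 300, 64  # survey clusters x image features
image_features = rng.normal(size=(n_clusters, n_features))

# Stand-in wealth index: a linear signal in the features plus noise.
true_weights = rng.normal(size=n_features)
wealth_index = image_features @ true_weights + rng.normal(scale=2.0, size=n_clusters)

model = Ridge(alpha=1.0)
r2_scores = cross_val_score(model, image_features, wealth_index, cv=5, scoring="r2")
print(f"cross-validated R^2 on synthetic data: {r2_scores.mean():.2f}")
```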
Law in the Future
Paper by Benjamin Alarie, Anthony Niblett and Albert Yoon: “The set of tasks and activities in which humans are strictly superior to computers is becoming vanishingly small. Machines today are not only performing mechanical or manual tasks once performed by humans, they are also performing thinking tasks, where it was long believed that human judgment was indispensable. From self-driving cars to self-flying planes; and from robots performing surgery on a pig to artificially intelligent personal assistants, so much of what was once unimaginable is now reality. But this is just the beginning of the big data and artificial intelligence revolution. Technology continues to improve at an exponential rate. How will the big data and artificial intelligence revolutions affect law? We hypothesize that the growth of big data, artificial intelligence, and machine learning will have important effects that will fundamentally change the way law is made, learned, followed, and practiced. It will have an impact on all facets of the law, from the production of micro-directives to the way citizens learn of their legal obligations. These changes will present significant challenges to human lawmakers, judges, and lawyers. While we do not attempt to address all these challenges, we offer a short and positive preview of the future of law: a world of self-driving law, of legal singularity, and of the democratization of the law…(More)”
Encouraging and Sustaining Innovation in Government: Technology and Innovation in the Next Administration
New report by Beth Simone Noveck and Stefaan Verhulst: “…With rates of trust in government at an all-time low, technology and innovation will be essential to achieve the next administration’s goals and to deliver services more effectively and efficiently. The next administration must prioritize using technology to improve governing and must develop plans to do so in the transition… This paper provides analysis and a set of concrete recommendations, both for the period of transition before the inauguration, and for the start of the next presidency, to encourage and sustain innovation in government. Leveraging the insights from the experts who participated in a day-long discussion, we endeavor to explain how government can improve its use of digital technologies to create more effective policies, solve problems faster and deliver services more effectively at the federal, state and local levels….
The broad recommendations are:
- Scale Data Driven Governance: Platforms such as data.gov represent initial steps in the direction of enabling data-driven governance. Much more can be done, however, to open-up data and for the agencies to become better consumers of data, to improve decision-making and scale up evidence-based governance. This includes better use of predictive analytics, more public engagement; and greater use of cutting-edge methods like machine learning.
- Scale Collaborative Innovation: Collaborative innovation takes place when government and the public work together, thus widening the pool of expertise and knowledge brought to bear on public problems. The next administration can reach out more effectively, not just to the public at large, but to conduct targeted outreach to public officials and citizens who possess the most relevant skills or expertise for the problems at hand.
- Promote a Culture of Innovation: Institutionalizing a culture of technology-enabled innovation will require embedding and institutionalizing innovation and technology skills more widely across the federal enterprise. For example, contracting, grants and personnel officials need to have a deeper understanding of how technology can help them do their jobs more efficiently, and more people need to be trained in human-centered design, gamification, data science, data visualization, crowdsourcing and other new ways of working.
- Utilize Evidence-Based Innovation: In order to better direct government investments, leaders need a much better sense of what works and what doesn’t. The government spends billions on research in the private and university sectors, but very little on experimenting with, testing and evaluating its own programs. The next administration should continue developing an evidence-based approach to governance, including a greater use of methods like A/B testing (a method of comparing two versions of a webpage or app against each other to determine which one performs best; a minimal sketch follows this list); establishing a clearinghouse for success and failure stories and best practices; and encouraging overseers to be more open to innovation.
- Make Innovation a Priority in the Transition: The transition period represents a unique opportunity to seed the foundations for long-lasting change. By explicitly incorporating innovation into the structure, goals and activities of the transition teams, the next administration can get a fast start in implementing policy goals and improving government operations through innovation approaches….(More)”
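As a concrete illustration of the A/B testing method mentioned in the evidence-based innovation recommendation, the sketch below compares the success rates of two versions of a page or service with a standard two-proportion z-test. The visit and completion counts are made up, and the report does not prescribe this particular test.

```python
# An illustrative A/B test: compare the completion (success) rates of two
# versions of a page with a two-sided two-proportion z-test.
# The visit and success counts below are made up.

from math import sqrt, erfc

def ab_test(successes_a, n_a, successes_b, n_b):
    """Return (rate_a, rate_b, two-sided p-value) for a two-proportion z-test."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail probability of the normal
    return p_a, p_b, p_value

# Version A: 120 of 2,400 visitors completed the task; version B: 174 of 2,500.
rate_a, rate_b, p = ab_test(120, 2400, 174, 2500)
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  p-value: {p:.4f}")
# A small p-value suggests the difference is unlikely to be chance alone.
```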