The Ethical Algorithm: The Science of Socially Aware Algorithm Design


Book by Michael Kearns and Aaron Roth: “Over the course of a generation, algorithms have gone from mathematical abstractions to powerful mediators of daily life. Algorithms have made our lives more efficient, more entertaining, and, sometimes, better informed. At the same time, complex algorithms are increasingly violating the basic rights of individual citizens. Allegedly anonymized datasets routinely leak our most sensitive personal information; statistical models for everything from mortgages to college admissions reflect racial and gender bias. Meanwhile, users manipulate algorithms to “game” search engines, spam filters, online reviewing services, and navigation apps.

Understanding and improving the science behind the algorithms that run our lives is rapidly becoming one of the most pressing issues of this century. Traditional fixes, such as laws, regulations and watchdog groups, have proven woefully inadequate. Reporting from the cutting edge of scientific research, The Ethical Algorithm offers a new approach: a set of principled solutions based on the emerging and exciting science of socially aware algorithm design. Michael Kearns and Aaron Roth explain how we can better embed human principles into machine code – without halting the advance of data-driven scientific exploration. Weaving together innovative research with stories of citizens, scientists, and activists on the front lines, The Ethical Algorithm offers a compelling vision for a future, one in which we can better protect humans from the unintended impacts of algorithms while continuing to inspire wondrous advances in technology….(More)”.

Finland’s model in utilising forest data


Report by Matti Valonen et al: “The aim of this study is to depict the background, objectives and implementation of the Finnish Forest Centre’s Metsään.fi-website and to assess its development needs and future prospects. The Metsään.fi-service included in the Metsään.fi-website is a free e-service for forest owners and corporate actors (companies, associations and service providers) in the forest sector, whose aim is to support active decision-making among forest owners by offering forest resource data and maps on forest properties, by making contact with the authorities easier through online services, and by acting as a platform for offering forest services, among other things.

In addition to the Metsään.fi-service, the website includes open forest data services that offer users national forest resource data that is not linked to personal information.

Private forests are in a key position as raw material sources for traditional and new forest-based bioeconomy. In addition to wood material, the forests produce non-timber forest products (for example berries and mushrooms), opportunities for recreation and other ecosystem services.

Private forests cover roughly 60 percent of forest land but supply about 80 percent of the domestic wood used by the forest industry. In 2017 the value of forest industry production was 21 billion euros, a fifth of Finland’s entire industrial production value. Forest industry exports in 2017 were worth about 12 billion euros, a fifth of all goods exports. Therefore, the forest sector is important for Finland’s national economy…(More)”.

Big Data, Algorithms and Health Data


Paper by Julia M. Puaschunder: “The most recent decade featured a data revolution in the healthcare sector in the screening, monitoring and coordination of aid. Big data analytics have revolutionized the medical profession. The health sector relies on Artificial Intelligence (AI) and robotics as never before. The opportunities of unprecedented access to healthcare, rational precision and human resemblance, but also targeted aid in decentralized aid grids, are obvious innovations that will lead to the most sophisticated, neutral healthcare in the future. Yet big data driven medical care also bears risks of privacy infringements and ethical concerns of social stratification and discrimination. Today’s genetic human screening, constant big data information amalgamation as well as social credit scores pegged to access to healthcare also create the most pressing legal and ethical challenges of our time.

The call for developing a legal, policy and ethical framework for using AI, big data, robotics and algorithms in healthcare has therefore reached unprecedented momentum. Compatibility glitches in AI-human interaction appear problematic, as does a natural AI preponderance outperforming humans. Only if the benefits of AI are reaped within a master-slave-like legal frame can the risks associated with these novel, superior technologies be curbed. Liability control as well as big data privacy protection appear important for securing the rights of vulnerable patient populations. Big data mapping and social credit scoring must be met with clear anti-discrimination and anti-social-stratification ethics. Lastly, the value of genuine human care must be stressed and precious humanness conserved in the artificial age, alongside coupling the benefits of AI, robotics and big data with global common goals of sustainability and inclusive growth.

The report aims to help a broad spectrum of stakeholders understand the impact of AI, big data, algorithms and health data, drawing on information about key opportunities and risks as well as future market challenges and policy developments, in order to orchestrate the concerted pursuit of improving healthcare excellence. Statespeople and diplomats are invited to consider three trends in the wake of the AI (r)evolution:

Artificial Intelligence recently gained citizenship, with robots becoming citizens: attributing quasi-human rights to AI raises ethical questions of a stratified citizenship. Robots and algorithms may be citizens only for the sake of their protection and of upholding social norms towards human-like creatures, while remaining slave-like for economic and liability purposes and without gaining civil privileges such as voting, property rights and holding public office.

Big data and computational power imply unprecedented opportunities for crowd understanding, trend prediction and healthcare control. Risks include data breaches, privacy infringements, stigmatization and discrimination. Big data protection should be enacted through technological advancement, self-determined privacy attention fostered by e-education, and discrimination alleviation by releasing only targeted information and regulating individual data mining capacities.

The European Union should consider establishing, by law and economic incentives, a fifth trade freedom of data, in order to bundle AI and big data gains at large scale. Europe holds the unique potential of offering data supremacy in state-controlled universal healthcare big data wealth that is less fragmented than the US health landscape and more Western-focused than Asian healthcare. Europe could therefore lead the world on big data derived healthcare insights, but should also step up to imbuing these most cutting-edge innovations of our time with humane societal imperatives….(More)”.

We are finally getting better at predicting organized conflict


Tate Ryan-Mosley at MIT Technology Review: “People have been trying to predict conflict for hundreds, if not thousands, of years. But it’s hard, largely because scientists can’t agree on its nature or how it arises. The critical factor could be something as apparently innocuous as a booming population or a bad year for crops. Other times a spark ignites a powder keg, as with the assassination of Archduke Franz Ferdinand of Austria in the run-up to World War I.

Political scientists and mathematicians have come up with a slew of different methods for forecasting the next outbreak of violence—but no single model properly captures how conflict behaves. A study published in 2011 by the Peace Research Institute Oslo used a single model to run global conflict forecasts from 2010 to 2050. It estimated a less than 0.05% chance of violence in Syria. Humanitarian organizations, which could have been better prepared had the predictions been more accurate, were caught flat-footed by the outbreak of Syria’s civil war in March 2011. The war has since displaced some 13 million people.

Bundling individual models to maximize their strengths and weed out weaknesses has resulted in big improvements. The first public ensemble model, the Early Warning Project, launched in 2013 to forecast new instances of mass killing. Run by researchers at the US Holocaust Memorial Museum and Dartmouth College, it claims 80% accuracy in its predictions.
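
The intuition behind ensembling can be shown with a minimal, purely illustrative sketch: averaging the risk probabilities produced by several imperfect single models usually yields a steadier estimate than relying on any one of them. The model names and numbers below are hypothetical and do not represent the Early Warning Project’s actual methodology.

```python
# Minimal sketch of an ensemble conflict forecast: average the probabilities
# of violence produced by several hypothetical single models for one
# country-month. Illustrative only; not any project's actual code.
from statistics import mean

def ensemble_risk(model_probabilities: dict) -> float:
    """Combine per-model probabilities of violence into one ensemble score."""
    return mean(model_probabilities.values())

# Hypothetical outputs from three individual forecasting models.
single_models = {
    "structural_model": 0.04,     # slow-moving factors: population, economy
    "event_history_model": 0.12,  # recent counts of protests and violence
    "text_signal_model": 0.09,    # indicators derived from news reports
}

print(f"Ensemble risk estimate: {ensemble_risk(single_models):.1%}")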

Improvements in data gathering, translation, and machine learning have further advanced the field. A newer model called ViEWS, built by researchers at Uppsala University, provides a huge boost in granularity. Focusing on conflict in Africa, it offers monthly predictive readouts on multiple regions within a given state. Its threshold for violence is a single death.

Some researchers say there are private—and in some cases, classified—predictive models that are likely far better than anything public. Worries that making predictions public could undermine diplomacy or change the outcome of world events are not unfounded. But that is precisely the point. Public models are good enough to help direct aid to where it is needed and alert those most vulnerable to seek safety. Properly used, they could change things for the better, and save lives in the process….(More)”.

Artificial intelligence: From expert-only to everywhere


Deloitte: “…AI consists of multiple technologies. At its foundation are machine learning and its more complex offspring, deep-learning neural networks. These technologies animate AI applications such as computer vision, natural language processing, and the ability to harness huge troves of data to make accurate predictions and to unearth hidden insights (see sidebar, “The parlance of AI technologies”). The recent excitement around AI stems from advances in machine learning and deep-learning neural networks—and the myriad ways these technologies can help companies improve their operations, develop new offerings, and provide better customer service at a lower cost.

The trouble with AI, however, is that to date, many companies have lacked the expertise and resources to take full advantage of it. Machine learning and deep learning typically require teams of AI experts, access to large data sets, and specialized infrastructure and processing power. Companies that can bring these assets to bear then need to find the right use cases for applying AI, create customized solutions, and scale them throughout the company. All of this requires a level of investment and sophistication that takes time to develop, and is out of reach for many….

These tech giants are using AI to create billion-dollar services and to transform their operations. To develop their AI services, they’re following a familiar playbook: (1) find a solution to an internal challenge or opportunity; (2) perfect the solution at scale within the company; and (3) launch a service that quickly attracts mass adoption. Hence, we see Amazon, Google, Microsoft, and China’s BATs launching AI development platforms and stand-alone applications to the wider market based on their own experience using them.

Joining them are big enterprise software companies that are integrating AI capabilities into cloud-based enterprise software and bringing them to the mass market. Salesforce, for instance, integrated its AI-enabled business intelligence tool, Einstein, into its CRM software in September 2016; the company claims to deliver 1 billion predictions per day to users. SAP integrated AI into its cloud-based ERP system, S/4HANA, to support specific business processes such as sales, finance, procurement, and the supply chain. S/4HANA has around 8,000 enterprise users, and SAP is driving its adoption by announcing that the company will not support legacy SAP ERP systems past 2025.

A host of startups is also sprinting into this market with cloud-based development tools and applications. These startups include at least six AI “unicorns,” two of which are based in China. Some of these companies target a specific industry or use case. For example, Crowdstrike, a US-based AI unicorn, focuses on cybersecurity, while Benevolent.ai uses AI to improve drug discovery.

The upshot is that these innovators are making it easier for more companies to benefit from AI technology even if they lack top technical talent, access to huge data sets, and their own massive computing power. Through the cloud, they can access services that address these shortfalls—without having to make big upfront investments. In short, the cloud is democratizing access to AI by giving companies the ability to use it now….(More)”.
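
As a rough illustration of that lower barrier to entry, the sketch below consumes a hosted computer-vision service over HTTP instead of training a model in-house. The endpoint, credential, and response schema are hypothetical placeholders, not any specific vendor’s API.

```python
# Minimal sketch of using AI "as a service" from the cloud rather than building
# models in-house. The URL, token, and response fields are hypothetical
# placeholders, not a real provider's API.
import requests

API_URL = "https://api.example-cloud-ai.com/v1/vision/classify"  # hypothetical endpoint
API_TOKEN = "YOUR_API_TOKEN"  # hypothetical credential

def classify_image(image_path: str) -> dict:
    """Send an image to a hosted computer-vision endpoint and return its labels."""
    with open(image_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_TOKEN}"},
            files={"image": f},
            timeout=30,
        )
    response.raise_for_status()
    return response.json()  # e.g. {"labels": [...], "confidences": [...]}

# Usage, assuming the hypothetical service existed:
# print(classify_image("product_photo.jpg"))
```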

OMB rethinks ‘protected’ or ‘open’ data binary with upcoming Evidence Act guidance


Jory Heckman at Federal News Network: “The Foundations for Evidence-Based Policymaking Act has ordered agencies to share their datasets internally and with other government partners — unless, of course, doing so would break the law.

Nearly a year after President Donald Trump signed the bill into law, agencies still have only a murky idea of what data they can share, and with whom. But soon, they’ll have more nuanced options for ranking the sensitivity of their datasets before sharing them with others.

Chief Statistician Nancy Potok said the Office of Management and Budget will soon release proposed guidelines for agencies to provide “tiered” access to their data, based on the sensitivity of that information….
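
Because the proposed guidelines had not yet been released, the sketch below is only a hypothetical illustration of what “tiered” access might look like in practice: a dataset catalog that records a sensitivity level between fully open and fully protected and gates sharing accordingly. The tier names and example entries are assumptions.

```python
# Hypothetical sketch of "tiered" dataset access; the tier names and catalog
# entries are assumptions, not the actual OMB guidance.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class AccessTier(Enum):
    OPEN = 1        # publishable to the general public
    RESTRICTED = 2  # shareable with approved agencies or vetted researchers
    PROTECTED = 3   # accessible only under specific statutory authority

@dataclass
class Dataset:
    name: str
    tier: AccessTier
    statutory_restriction: Optional[str] = None  # law limiting wider sharing, if any

def can_access(dataset: Dataset, requester_clearance: AccessTier) -> bool:
    """A requester may access datasets at or below their clearance tier."""
    return requester_clearance.value >= dataset.tier.value

catalog = [
    Dataset("published_summary_statistics", AccessTier.OPEN),
    Dataset("linked_individual_records", AccessTier.PROTECTED, "example statute"),
]
print([d.name for d in catalog if can_access(d, AccessTier.RESTRICTED)])
```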

OMB, as part of its Evidence Act rollout, will also rethink how agencies ensure protected access to data for research. Potok said agency officials expect to pilot a single application governmentwide for people seeking access to sensitive data not available to the public.

The pilot resembles plans for a National Secure Data Service envisioned by the Commission on Evidence-Based Policymaking, an advisory group whose recommendations laid the groundwork for the Evidence Act.

“As a state-of-the-art resource for improving government’s capacity to use the data it already collects, the National Secure Data Service will be able to temporarily link existing data and provide secure access to those data for exclusively statistical purposes in connection with approved projects,” the commission wrote in its 2017 final report.

In an effort to strike a balance between access and privacy, Potok said OMB has also asked agencies to provide a list of the statutes that prohibit them from sharing data amongst themselves….(More)”.

Geolocation Data for Pattern of Life Analysis in Lower-Income Countries


Report by Eduardo Laguna-Muggenburg, Shreyan Sen and Eric Lewandowski: “Urbanization processes in the developing world are often associated with the creation of informal settlements. These areas frequently have few or no public services, exacerbating inequality even in the context of substantial economic growth.

In the past, the high costs of gathering data through traditional surveying methods made it challenging to study how these under-served areas evolve through time and in relation to the metropolitan area to which they belong. However, the advent of mobile phones and smartphones in particular presents an opportunity to generate new insights on these old questions.

In June 2019, Orbital Insight and the United Nations Development Programme (UNDP) Arab States Human Development Report team launched a collaborative pilot program assessing the feasibility of using geolocation data to understand patterns of life among the urban poor in Cairo, Egypt.

The objectives of this collaboration were to assess the feasibility of using geolocation data (and conditionally to pursue preliminary analysis) to create near-real-time population density maps, to understand where residents of informal settlements tend to work during the day, and to classify universities by the percentage of students living in informal settlements.
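
A minimal sketch of the aggregation step behind a near-real-time population density map is shown below: geolocation pings are snapped to a coarse spatial grid and counted. The grid resolution, data format and sample coordinates are assumptions, not the report’s actual pipeline.

```python
# Minimal sketch of turning geolocation pings into a coarse population-density
# grid. The cell size and data layout are assumptions; this is not the actual
# pipeline used in the pilot.
from collections import Counter

GRID_DEG = 0.01  # grid cell size in degrees (~1 km near the equator), chosen for illustration

def cell(lat: float, lon: float) -> tuple:
    """Snap a coordinate to its grid cell index."""
    return (int(lat // GRID_DEG), int(lon // GRID_DEG))

def density_grid(pings):
    """Count device pings per grid cell as a rough proxy for population density."""
    return Counter(cell(lat, lon) for lat, lon in pings)

# Hypothetical (lat, lon) pings around central Cairo.
sample_pings = [(30.0444, 31.2357), (30.0450, 31.2361), (30.0561, 31.2394)]
print(density_grid(sample_pings).most_common(3))
```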

The report is organized as follows. In Section 2 we describe the data and its limitations. In Section 3 we briefly explain the methodological background. Section 4 summarizes the insights derived from the data for the Egyptian context. Section 5 concludes….(More)”.

The Value of Data: Towards a Framework to Redistribute It


Paper by Maria Savona: “This note attempts a systematisation of different pieces of literature that underpin the recent policy and academic debate on the value of data. It mainly poses foundational questions around the definition, economic nature and measurement of data value, and discusses the opportunity to redistribute it. It then articulates a framework to compare ways of implementing redistribution, distinguishing between data as capital, data as labour or data as intellectual property. Each of these raises challenges, revolving around the notions of data property and data rights, which are also briefly discussed. The note concludes by indicating areas for policy considerations and a research agenda to shape the future structure of data governance at large….(More)”.

Algorithmic futures: The life and death of Google Flu Trends


Vincent Duclos in Medicine Anthropology Theory: “In the last few years, tracking systems that harvest web data to identify trends, calculate predictions, and warn about potential epidemic outbreaks have proliferated. These systems integrate crowdsourced data and digital traces, collecting information from a variety of online sources, and they promise to change the way governments, institutions, and individuals understand and respond to health concerns. This article examines some of the conceptual and practical challenges raised by the online algorithmic tracking of disease by focusing on the case of Google Flu Trends (GFT). Launched in 2008, GFT was Google’s flagship syndromic surveillance system, specializing in ‘real-time’ tracking of outbreaks of influenza. GFT mined massive amounts of data about online search behavior to extract patterns and anticipate the future of viral activity. But it did a poor job, and Google shut the system down in 2015. This paper focuses on GFT’s shortcomings, which were particularly severe during flu epidemics, when GFT struggled to make sense of the unexpected surges in the number of search queries. I suggest two reasons for GFT’s difficulties. First, it failed to keep track of the dynamics of contagion, at once biological and digital, as it affected what I call here the ‘googling crowds’. Search behavior during epidemics in part stems from a sort of viral anxiety not easily amenable to algorithmic anticipation, to the extent that the algorithm’s predictive capacity remains dependent on past data and patterns. Second, I suggest that GFT’s troubles were the result of how it collected data and performed what I call ‘epidemic reality’. GFT’s data became severed from the processes Google aimed to track, and the data took on a life of their own: a trackable life, in which there was little flu left. The story of GFT, I suggest, offers insight into contemporary tensions between the indomitable intensity of collective life and stubborn attempts at its algorithmic formalization….(More)”.
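
As a stylized illustration of the kind of model GFT relied on (not Google’s actual implementation), influenza activity can be regressed on the volumes of a handful of flu-related search queries and the fitted relationship used to ‘nowcast’ the current week. All query categories and numbers below are invented.

```python
# Stylized sketch of a GFT-like nowcast: fit a linear model mapping search-query
# volumes to influenza-like-illness (ILI) rates, then predict current activity.
# Query categories and figures are invented; this is not Google's actual model.
import numpy as np

# Rows = past weeks; columns = normalized volumes of three flu-related query groups.
query_volumes = np.array([
    [0.20, 0.10, 0.05],
    [0.35, 0.18, 0.09],
    [0.50, 0.30, 0.15],
    [0.80, 0.55, 0.30],
])
ili_rate = np.array([1.1, 1.8, 2.6, 4.2])  # reported ILI rate for those weeks

# Least-squares fit: ILI ~ query_volumes @ weights + intercept.
X = np.column_stack([query_volumes, np.ones(len(ili_rate))])
weights, *_ = np.linalg.lstsq(X, ili_rate, rcond=None)

# Nowcast this week's flu activity from current query volumes (plus intercept term).
this_week = np.array([0.65, 0.40, 0.22, 1.0])
print(f"Predicted ILI rate: {this_week @ weights:.2f}")
```

The article’s point is precisely that this dependence on past query patterns breaks down when epidemic anxiety changes search behavior.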

Restrictions on Privacy and Exploitation in the Digital Economy: A Competition Law Perspective


Paper by Nicholas Economides and Ioannis Lianos: “The recent controversy on the intersection of competition law with the protection of privacy, following the emergence of big data and social media, is a major challenge for competition authorities worldwide. Recent technological progress in data analytics may greatly facilitate the prediction of personality traits and attributes from even a few digital records of human behaviour.


There are different perspectives globally as to the level of personal data protection and the role competition law may play in this context, hence the discussion of integrating such concerns into competition law enforcement may be premature for some jurisdictions. However, a market failure approach may provide common intellectual foundations for the assessment of harms associated with the exploitation of personal data, even when the specific legal system does not formally recognize a fundamental right to privacy.


The paper presents a model of market failure based on a requirement provision in the acquisition of personal information from users of other products/services. We establish the economic harm from the market failure and the requirement using the traditional competition law toolbox and focusing more on situations in which the restriction on privacy may be analysed as a form of exploitation. Eliminating the requirement and the market failure by creating a functioning market for the sale of personal information is imperative. This emphasis on exploitation does not mean that restrictions on privacy may not result from exclusionary practices. However, we analyse this issue in a separate study.


Besides the traditional analysis of the requirement and market failure, we note that there are typically informational asymmetries between the data controller and the data subject. The latter may not be aware that his data was harvested in the first place, or that the data will be processed by the data controller for a different purpose or shared and sold to third parties. The exploitation of personal data may also result from economic coercion, on the basis of resource-dependence or lock-in of the user: in order to enjoy a specific service provided by the data controller or its ecosystem, in particular in the presence of dominance, the user has no choice but to consent to the harvesting and use of his data. A behavioural approach would also emphasise the possible internalities (demand-side market failures) arising from bounded rationality, or the fact that people do not internalise all consequences of their actions and face limits in their cognitive capacities.

The paper also addresses the way competition law could engage with exploitative conduct leading to privacy harm, both for ex ante and ex post enforcement.


With regard to ex ante enforcement, the paper explores how privacy concerns may be integrated in merger control as part of the definition of product quality, the harm in question being merely exploitative (the possibility the data aggregation provides to the merged entity to exploit (personal) data in ways that directly harm consumers), rather than exclusionary (harming consumers by enabling the merged entity to marginalise a rival with better privacy policies), which is examined in a separate paper.


With regard to ex post enforcement, the paper explores different theories of harm that may give rise to competition law concerns and suggests specific tests for their assessment. In particular, we analyse old and new exploitative theories of harm relating to excessive data extraction, personalised pricing, unfair commercial practices and trading conditions, exploitative requirement contracts, and behavioural manipulation.

We are in favour of collective action to restore the conditions of a well-functioning data market, and the paper makes several policy recommendations….(More)”.