Opening Data for Global Health


Chapter by Matt Laessig, Bryon Jacob and Carla AbouZahr in The Palgrave Handbook of Global Health Data Methods for Policy and Practice: “…provide best practices that organizations can adopt to disseminate data openly for others to use. They describe the development of the open data movement and its rapid adoption by governments, non-governmental organizations, and research groups. The authors provide examples from the health sector—an early adopter—but acknowledge concerns specific to health relating to informed consent, intellectual property, and ownership of personal data. Drawing on their considerable contributions to the open data movement, Laessig and Jacob share their Open Data Progression Model. They describe six stages for making data open: collecting the data, documenting it, opening it, engaging the community of users, making the data interoperable, and finally linking the data….(More)”

A Taxonomy of Definitions for the Health Data Ecosystem


Announcement: “Healthcare technologies are rapidly evolving, producing new data sources, data types, and data uses, which precipitate more rapid and complex data sharing. Novel technologies—such as artificial intelligence tools and new Internet of Things (IoT) devices and services—are providing benefits to patients, doctors, and researchers. Data-driven products and services are deepening patients’ and consumers’ engagement and helping to improve health outcomes. Understanding the evolving health data ecosystem presents new challenges for policymakers and industry. There is an increasing need to better understand and document the stakeholders, the emerging data types, and their uses.

The Future of Privacy Forum (FPF) and the Information Accountability Foundation (IAF) partnered to form the FPF-IAF Joint Health Initiative in 2018. Today, the Initiative is releasing A Taxonomy of Definitions for the Health Data Ecosystem; the publication is intended to enable a more nuanced, accurate, and common understanding of the current state of the health data ecosystem. The Taxonomy outlines the established and emerging language of the health data ecosystem. The Taxonomy includes definitions of:

  • The stakeholders currently involved in the health data ecosystem and examples of each;
  • The common and emerging data types that are being collected, used, and shared across the health data ecosystem;
  • The purposes for which data types are used in the health data ecosystem; and
  • The types of actions that are now being performed and which we anticipate will be performed on datasets as the ecosystem evolves and expands.

This report is an educational resource that will enable a deeper understanding of the current landscape of stakeholders and data types….(More)”.

Come to Finland if you want to glimpse the future of health data!


Jukka Vahti at Sitra: “The Finnish tradition of establishing, maintaining and developing data registers goes back to the 1600s, when parish records were first kept.

When this old custom is combined with the opportunities afforded by digitisation, the positive approach Finns have towards research and technology, and the recently updated legislation enabling the data economy, Finland and the Finnish people can lead the way as Europe gradually, or even suddenly, switches to a fair data economy.

The foundations for a fair data economy already exist

The fair data economy is a natural continuation of earlier projects promoting e-services undertaken in Finland.

For example, the Data Exchange Layer, a system currently unique to Finland and Estonia (the country where it originated), is already speeding up the transfer of data from one system to another.

In May 2019 Finland also saw the entry into force of the Act on the Secondary Use of Health and Social Data, according to which the information on social welfare and healthcare held in registers may be used for purposes of statistics, research, education, knowledge management, control and supervision conducted by authorities, and development and innovation activity.

The new law will make the work of researchers and service developers more effective, as permits will be acquired through a one-stop shop and it will be possible to use data from more than one source more readily than before….(More)”.

Can tracking people through phone-call data improve lives?


Amy Maxmen in Nature: “After an earthquake tore through Haiti in 2010, killing more than 100,000 people, aid agencies spread across the country to work out where the survivors had fled. But Linus Bengtsson, a graduate student studying global health at the Karolinska Institute in Stockholm, thought he could answer the question from afar. Many Haitians would be using their mobile phones, he reasoned, and those calls would pass through phone towers, which could allow researchers to approximate people’s locations. Bengtsson persuaded Digicel, the biggest phone company in Haiti, to share data from millions of call records from before and after the quake. Digicel replaced the names and phone numbers of callers with random numbers to protect their privacy.

Bengtsson’s idea worked. The analysis wasn’t completed or verified quickly enough to help people in Haiti at the time, but in 2012, he and his collaborators reported that the population of Haiti’s capital, Port-au-Prince, dipped by almost one-quarter soon after the quake, and slowly rose over the next 11 months [1]. That result aligned with an intensive, on-the-ground survey conducted by the United Nations.
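The core of the method can be sketched briefly. The snippet below is a hedged illustration only, not Bengtsson's actual pipeline: it assumes a hypothetical table of pseudonymised call records with caller_id, timestamp and tower_region columns, assigns each subscriber each day to the region where they placed the most calls, and tracks daily subscriber counts per region relative to a pre-quake baseline date.

```python
# Hedged sketch of population-movement estimation from pseudonymised
# call-detail records (CDRs). Column names and file layout are hypothetical.
import pandas as pd

def daily_population_index(cdr_csv: str, baseline_date: str) -> pd.DataFrame:
    """Daily count of active subscribers per region, normalised to a baseline date."""
    cdr = pd.read_csv(cdr_csv, parse_dates=["timestamp"])
    cdr["date"] = cdr["timestamp"].dt.date

    # Assign each subscriber, per day, to the region where they placed most calls.
    home = (
        cdr.groupby(["date", "caller_id", "tower_region"])
        .size()
        .rename("calls")
        .reset_index()
        .sort_values("calls", ascending=False)
        .drop_duplicates(["date", "caller_id"])
    )

    # Distinct subscribers seen in each region on each day.
    counts = (
        home.groupby(["date", "tower_region"])["caller_id"]
        .nunique()
        .unstack(fill_value=0)
    )

    # Express every day as a fraction of the pre-event baseline day.
    baseline = counts.loc[pd.to_datetime(baseline_date).date()]
    return counts.div(baseline)
```

A real analysis must also correct for uneven phone ownership, multi-SIM use and tower outages, which is one reason validation against ground surveys, as in the Haiti study, matters.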

Humanitarians and researchers were thrilled. Telecommunications companies scrutinize call-detail records to learn about customers’ locations and phone habits and improve their services. Researchers suddenly realized that this sort of information might help them to improve lives. Even basic population statistics are murky in low-income countries where expensive household surveys are infrequent, and where many people don’t have smartphones, credit cards and other technologies that leave behind a digital trail, making remote-tracking methods used in richer countries too patchy to be useful.

Since the earthquake, scientists working under the rubric of ‘data for good’ have analysed calls from tens of millions of phone owners in Pakistan, Bangladesh, Kenya and at least two dozen other low- and middle-income nations. Humanitarian groups say that they’ve used the results to deliver aid. And researchers have combined call records with other information to try to predict how infectious diseases travel, and to pinpoint locations of poverty, social isolation, violence and more (see ‘Phone calls for good’)….(More)”.

The Geopolitics of Information


Paper by Eric Rosenbach and Katherine Mansted: “Information is now the world’s most consequential and contested geopolitical resource. The world’s most profitable businesses have asserted for years that data is the “new oil.” Political campaigns—and foreign intelligence operatives—have shown over the past two American presidential elections that data-driven social media is the key to public opinion. Leading scientists and technologists understand that good datasets, not just algorithms, will give them a competitive edge.

Data-driven innovation is not only disrupting economies and societies; it is reshaping relations between nations. The pursuit of information power—involving states’ ability to use information to influence, decide, create and communicate—is causing states to rewrite their terms of engagement with markets and citizens, and to redefine national interests and strategic priorities. In short, information power is altering the nature and behavior of the fundamental building block of international relations, the state, with potentially seismic consequences.

Authoritarian governments recognize the strategic importance of information and over the past five years have operationalized powerful domestic and international information strategies. They are cauterizing their domestic information environments and shutting off their citizens from global information flows, while weaponizing information to attack and destabilize democracies. In particular, China and Russia believe that strategic competition in the 21st century is characterized by a zero-sum contest for control of data, as well as the technology and talent needed to convert data into useful information.

Democracies remain fundamentally unprepared for strategic competition in the Information Age. For the United States in particular, as the importance of information as a geopolitical resource has waxed, its information dominance has waned. Since the end of the Cold War, America’s supremacy in information technologies has seemed unassailable—not least because of its central role in creating the Internet and its overall economic primacy. Democracies have also considered any type of information strategy to be largely unneeded: government involvement in the domestic information environment feels Orwellian, while democracies believed that their “inherently benign” foreign policy didn’t need extensive influence operations.

However, to compete and thrive in the 21st century, democracies, and the United States in particular, must develop new national security and economic strategies that address the geopolitics of information. In the 20th century, market capitalist democracies geared infrastructure, energy, trade, and even social policy to protect and advance that era’s key source of power—manufacturing. In this century, democracies must better account for information geopolitics across all dimensions of domestic policy and national strategy….(More)”.

Social media data reveal where visitors to nature locations provide potential benefits or threats to biodiversity


University of Helsinki: “In a new article published in the journal Science of the Total Environment, a team of researchers assessed global patterns of visitation rates, attractiveness and pressure to more than 12,000 Important Bird and Biodiversity Areas (IBAs), which are sites of international significance for nature conservation, by using geolocated data mined from social media (Twitter and Flickr).

The study found that Important Bird and Biodiversity Areas located in Europe and Asia, and in temperate biomes, had the highest density of social media users. Results also showed that sites of importance for congregatory species, which were also more accessible, more densely populated and provided more tourism facilities, received higher visitation than did sites richer in bird species.

“Resources in biodiversity conservation are woefully inadequate and novel data sources from social media provide openly available user-generated information about human-nature interactions, at an unprecedented spatio-temporal scale”, says Dr Anna Hausmann from the University of Helsinki, a conservation scientist leading the study. “Our group has been exploring and validating data retrieved from social media to understand people’s preferences for experiencing nature in national parks at a local, national and continental scale”, she continues, “in this study, we expand our analyses at a global level”. …

“Social media content and metadata contain useful information for understanding human-nature interactions in space and time”, says Prof. Tuuli Toivonen, another co-author of the paper and the leader of the Digital Geography Lab at the University of Helsinki. “Social media data can also be used to cross-validate and enrich data collected by conservation organizations”, she continues. The study found that the 17 percent of all Important Bird and Biodiversity Areas (IBAs) that were assessed by experts to be under greater human disturbance also had a higher density of social media users….(More)”.
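As a rough sketch of the kind of analysis involved (not the study's own code), per-site visitor pressure can be approximated by spatially joining geotagged posts to protected-area polygons and counting distinct users per square kilometre; the file paths and column names below are hypothetical.

```python
# Hedged sketch: density of social media users per protected-area polygon.
import geopandas as gpd

def user_density_per_site(posts_path: str, sites_path: str) -> gpd.GeoDataFrame:
    posts = gpd.read_file(posts_path)   # geotagged posts: point geometry + 'user_id'
    sites = gpd.read_file(sites_path)   # IBA polygons: polygon geometry + 'site_name'

    # Reproject to an equal-area CRS (World Mollweide) so areas are meaningful.
    posts = posts.to_crs("ESRI:54009")
    sites = sites.to_crs("ESRI:54009")

    # Spatial join: attach the containing site to every post.
    joined = gpd.sjoin(posts, sites, predicate="within")

    # Distinct users per site, then users per square kilometre.
    users = joined.groupby("site_name")["user_id"].nunique()
    sites = sites.set_index("site_name")
    sites["users"] = users.reindex(sites.index, fill_value=0)
    sites["users_per_km2"] = sites["users"] / (sites.geometry.area / 1e6)
    return sites.reset_index()
```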

Commission publishes guidance on free flow of non-personal data


European Commission: “The guidance fulfils an obligation in the Regulation on the free flow of non-personal data (FFD Regulation), which requires the Commission to publish guidance on the interaction between this Regulation and the General Data Protection Regulation (GDPR), especially as regards datasets composed of both personal and non-personal data. It aims to help users – in particular small and medium-sized enterprises – understand the interaction between the two regulations.

In line with the existing GDPR documents, prepared by the European Data Protection Board, this guidance document aims to clarify which rules apply when processing personal and non-personal data. It gives a useful overview of the central concepts of the free flow of personal and non-personal data within the EU, while explaining the relation between the two Regulations in practical terms and with concrete examples….

Non-personal data are distinct from personal data as laid down in the GDPR. Non-personal data can be categorised in terms of origin, namely:

  • data which originally did not relate to an identified or identifiable natural person, such as data on weather conditions generated by sensors installed on wind turbines, or data on maintenance needs for industrial machines; or
  • data which was initially personal data, but later made anonymous.

While the guidance refers to more examples of non-personal data, it also explains the concepts of personal, anonymised and pseudonymised data to provide a better understanding, as well as describing the boundary between personal and non-personal data.

What are mixed datasets?

In most real-life situations, a dataset is very likely to be composed of both personal and non-personal data. This is often referred to as a “mixed dataset”. Mixed datasets represent the majority of datasets used in the data economy and are commonly gathered thanks to technological developments such as the Internet of Things (i.e. digitally connecting objects), artificial intelligence and technologies enabling big data analytics.

Examples of mixed datasets include a company’s tax records, mentioning the name and telephone number of the managing director of the company. This can also include a company’s knowledge of IT problems and solutions based on individual incident reports, or a research institution’s anonymised statistical data and the raw data initially collected, such as the replies of individual respondents to statistical survey questions….(More)”.

Open Data and the Private Sector


Chapter by Joel Gurin, Carla Bonini and Stefaan Verhulst in State of Open Data: “The open data movement launched a decade ago with a focus on transparency, good governance, and citizen participation. As other chapters in this collection have documented in detail, those critical uses of open data have remained paramount and are continuing to grow in importance at a time of fake news and increased secrecy. But the value of open data extends beyond transparency and accountability – open data is also an important resource for business and economic growth.

The past several years have seen an increased focus on the value of open data to the private sector. In 2012, the Open Data Institute (ODI) was founded in the United Kingdom (UK) and backed with GBP 10 million by the UK government to maximise the value of open data in business and government. A year later, McKinsey released a report suggesting open data could help unlock USD 3 to 5 trillion in economic value annually. At around the same time, Monsanto acquired the Climate Corporation, a digital agriculture company that leverages open data to inform farmers, for approximately USD 1.1 billion. In 2014, the GovLab launched the Open Data 500, the first national study of businesses using open government data (now in six countries), and, in 2015, Open Data for Development (OD4D) launched the Open Data Impact Map, which today contains more than 1,100 examples of private sector companies using open data. The potential business applications of open data continue to be a priority for many governments around the world as they plan and develop their data programmes.

The use of open data has become part of the broader business practice of using data and data science to inform business decisions, ranging from launching new products and services to optimising processes and outsmarting the competition. In this chapter, we take stock of the state of open data and the private sector by analysing how the private sector both leverages and contributes to the open data ecosystem….(More)”.

Africa must reap the benefits of its own data


Tshilidzi Marwala at Business Insider: “Twenty-two years ago when I was a doctoral student in artificial intelligence (AI) at the University of Cambridge, I had to create all the AI algorithms I needed to understand the complex phenomena related to this field.

For starters, AI is computer software that performs intelligent tasks that normally require human beings, while an algorithm is a set of rules that instructs a computer to execute specific tasks. In that era, the ability to create AI algorithms was more important than the ability to acquire and use data.

Google has created an open-source library called TensorFlow, which contains ready-made implementations of many AI algorithms. In this way Google encourages people to develop applications (apps) using its software, with the payoff being that Google collects data on the individuals using apps developed with TensorFlow.
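The author's point is easy to see in code: with a library such as TensorFlow, a working neural-network classifier takes only a few lines, so the scarce ingredient is the data passed to training. The sketch below uses the Keras API with a hypothetical input size and leaves the training data as placeholders.

```python
# Minimal sketch: the model definition is commodity code; the data is not.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                      # 20 input features (hypothetical)
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # binary prediction
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The competitive advantage lies in x_train and y_train, the data the model
# learns from (left as placeholders in this sketch):
# model.fit(x_train, y_train, epochs=10)
```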

Today, an AI algorithm is not a competitive advantage but data is. The World Economic Forum calls data the new “oxygen”, while Chinese AI specialist Kai-Fu Lee calls it the new “oil”.

Africa’s population is increasing faster than that of any other region in the world. The continent has a population of 1.3-billion people and a total nominal GDP of $2.3-trillion. This increase in the population is in effect an increase in data, and if data is the new oil, it is akin to an increase in oil reserves.

Even oil-rich countries such as Saudi Arabia do not experience an increase in their oil reserves. How do we as Africans take advantage of this huge amount of data?

There are two categories of data in Africa: heritage and personal. Heritage data resides in society, whereas personal data resides in individuals. Heritage data includes data gathered from our languages, emotions and accents. Personal data includes health, facial and fingerprint data.

Facebook, Amazon, Apple, Netflix and Google are data companies. They sell data to advertisers, banks and political parties, among others. For example, the controversial company Cambridge Analytica harvested Facebook data in an effort to influence the 2016 US presidential election, which potentially contributed to Donald Trump’s victory.

The company Google collects language data to build an application called Google Translate that translates from one language to another. This app claims to cover African languages such as Zulu, Yoruba and Swahili. Google Translate is less effective in handling African languages than it is in handling European and Asian languages.

Now, how do we capitalise on our language heritage to create economic value? We need to build our own language database and create our own versions of Google Translate.

An important area is the creation of an African emotion database. Different cultures exhibit emotions differently, and emotion data are very important in areas such as car and aeroplane safety. If we can build a system that can read pilots’ emotions, this would enable us to establish whether a pilot is in a good state of mind to operate an aircraft, which would increase safety.

To capitalise on the African emotion database, we should create a data bank that captures emotions of African people in various parts of the continent, and then use this database to create AI apps to read people’s emotions. Mercedes-Benz has already implemented the “Attention Assist”, which alerts drivers to fatigue.

Another important area is the creation of an African health database. AI algorithms are able to diagnose diseases better than human doctors. However, these algorithms depend on the availability of data. To capitalise on this, we need to collect such data and use it to build algorithms that will be able to augment medical care….(More)”.

Beyond Bias: Re-Imagining the Terms of ‘Ethical AI’ in Criminal Law


Paper by Chelsea Barabas: “Data-driven decision-making regimes, often branded as “artificial intelligence,” are rapidly proliferating across the US criminal justice system as a means of predicting and managing the risk of crime and addressing accusations of discriminatory practices. These data regimes have come under increased scrutiny, as critics point out the myriad ways that they can reproduce or even amplify pre-existing biases in the criminal justice system. This essay examines contemporary debates regarding the use of “artificial intelligence” as a vehicle for criminal justice reform by closely examining two general approaches to what has been widely branded as “algorithmic fairness” in criminal law: 1) the development of formal fairness criteria and accuracy measures that illustrate the trade-offs of different algorithmic interventions; and 2) the development of “best practices” and managerialist standards for maintaining a baseline of accuracy, transparency and validity in these systems.

The essay argues that attempts to render AI-branded tools more accurate by addressing narrow notions of “bias” miss the deeper methodological and epistemological issues regarding the fairness of these tools. The key question is whether predictive tools reflect and reinforce punitive practices that drive disparate outcomes, and how data regimes interact with penal ideology to naturalize these practices. The article concludes by calling for an abolitionist understanding of the role and function of the carceral state, in order to fundamentally reformulate the questions we ask, the way we characterize existing data, and how we identify and fill gaps in existing data regimes of the carceral state….(More)”
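To make the first of those two approaches concrete, a formal fairness audit typically compares the error rates of a thresholded risk score across demographic groups; the sketch below, using hypothetical inputs rather than anything from the paper, computes per-group false positive rates, false negative rates and positive prediction rates, the quantities whose mutual incompatibility drives the well-known fairness trade-offs.

```python
# Hedged sketch of a group-wise error-rate audit for a risk-score tool.
import numpy as np

def group_error_rates(y_true, scores, group, threshold=0.5):
    """Per-group false positive rate, false negative rate and positive
    prediction rate for a risk score cut at `threshold`."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(scores) >= threshold
    group = np.asarray(group)

    rates = {}
    for g in np.unique(group):
        m = group == g
        tp = np.sum(y_pred[m] & y_true[m])
        fp = np.sum(y_pred[m] & ~y_true[m])
        fn = np.sum(~y_pred[m] & y_true[m])
        tn = np.sum(~y_pred[m] & ~y_true[m])
        rates[g] = {
            "fpr": fp / (fp + tn) if (fp + tn) else float("nan"),
            "fnr": fn / (fn + tp) if (fn + tp) else float("nan"),
            "positive_rate": float(y_pred[m].mean()),
        }
    return rates
```

Disparities in these rates are precisely what formal fairness criteria quantify; the paper's argument is that such audits, however carefully done, do not by themselves address the deeper questions about what the underlying data represent.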