Responsible Data Science


Book by Peter Bruce and Grant Fleming: “The increasing popularity of data science has resulted in numerous well-publicized cases of bias, injustice, and discrimination. The widespread deployment of ‘black box’ algorithms that are difficult or impossible to understand and explain, even for their developers, is a primary source of these unanticipated harms, making modern techniques and methods for manipulating large data sets seem sinister, even dangerous. When put in the hands of authoritarian governments, these algorithms have enabled suppression of political dissent and persecution of minorities. To prevent these harms, data scientists everywhere must come to understand how the algorithms that they build and deploy may harm certain groups or be unfair.

Responsible Data Science delivers a comprehensive, practical treatment of how to implement data science solutions in an even-handed and ethical manner that minimizes the risk of undue harm to vulnerable members of society. Both data science practitioners and managers of analytics teams will learn how to:

  • Improve model transparency, even for black box models
  • Diagnose bias and unfairness within models using multiple metrics
  • Audit projects to ensure fairness and minimize the possibility of unintended harm…(More)”
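The blurb does not say which metrics the book uses. As a minimal sketch, assuming two widely used group-fairness metrics (the gap in selection rates between groups, often called demographic parity difference, and the gap in true positive rates, often called equal opportunity difference), the following Python computes both on invented toy data. The function names are hypothetical and this is not the book's own code.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Share of positive predictions (y_pred == 1) within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [predicted positives, total]
    for pred, g in zip(y_pred, groups):
        counts[g][0] += pred
        counts[g][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def true_positive_rates(y_true, y_pred, groups):
    """Recall within each group; only cases whose true label is 1 are counted,
    so a group with no actual positives is simply left out (no division by zero)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [true positives, actual positives]
    for truth, pred, g in zip(y_true, y_pred, groups):
        if truth == 1:
            counts[g][0] += pred
            counts[g][1] += 1
    return {g: tp / pos for g, (tp, pos) in counts.items()}


def max_gap(rates):
    """Largest difference between any two groups' rates."""
    return max(rates.values()) - min(rates.values())


# Toy labels, predictions and group memberships, invented purely for illustration.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dp_gap = max_gap(selection_rates(y_pred, groups))
eo_gap = max_gap(true_positive_rates(y_true, y_pred, groups))
print(f"demographic parity difference: {dp_gap:.2f}")
print(f"equal opportunity difference:  {eo_gap:.2f}")
```

Reporting several metrics side by side matters because they can disagree: a model can narrow one gap while leaving another wide.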

Mapping the United Nations Fundamental Principles of Official Statistics against new and big data sources


Paper by Dominik Rozkrut, Olga Świerkot-Strużewska, and Gemma Van Halderen: “Never has there been a more exciting time to be an official statistician. The data revolution is responding to the demands of the COVID-19 pandemic and a complex sustainable development agenda to improve how data is produced and used, to close data gaps to prevent discrimination, to build capacity and data literacy, to modernize data collection systems and to liberate data to promote transparency and accountability. But can all data be liberated in the production and communication of official statistics? This paper explores the UN Fundamental Principles of Official Statistics in the context of eight new and big data sources. The paper concludes that each data source can be used for the production of official statistics in adherence with the Fundamental Principles and argues these data sources should be used if National Statistical Systems are to adhere to the first Fundamental Principle of compiling and making available official statistics that honor citizens’ entitlement to public information….(More)”.

Principles and Practices for a Federal Statistical Agency


Book by the National Academies of Sciences, Engineering, and Medicine: “Government statistics are widely used to inform decisions by policymakers, program administrators, businesses and other organizations as well as households and the general public. Principles and Practices for a Federal Statistical Agency, Seventh Edition will assist statistical agencies and units, as well as other agencies engaged in statistical activities, to carry out their responsibilities to provide accurate, timely, relevant, and objective information for public and policy use. This report will also inform legislative and executive branch decision makers, data users, and others about the characteristics of statistical agencies that enable them to serve the public good….(More)”

Building on a year of open data: progress and promise


Jennifer Yokoyama at Microsoft: “…The biggest takeaway from our work this past year – and the one thing I hope any reader of this post will take away – is that data collaboration is a spectrum. From the presence (or absence) of data to how open that data is to the trust level of the collaboration participants, these factors may lead to different configurations and different goals, but they can all lead to more open data and innovative insights and discoveries.

Here are a few other lessons we have learned over the last year:

  1. Principles set the foundation for stakeholder collaboration: When we launched the Open Data Campaign, we adopted five principles that guide our contributions and commitments to trusted data collaborations: Open, Usable, Empowering, Secure and Private. These principles underpin our participation, but importantly, organizations can build on them to establish responsible ways to share and collaborate around their data. The London Data Commission, for example, established a set of data sharing principles for public- and private-sector organizations to ensure alignment and to guide the participating groups in how they share data.
  2. There is value in pilot projects: Traditionally, data collaborations with several stakeholders require time – often including a long runway for building the collaboration, plus the time needed to execute on the project and learn from it. However, our learnings show that short-term projects that experiment with and test data collaborations can provide valuable insights. The London Data Commission did exactly that with the launch of four short-term pilot projects. Due to the success of the pilots, the partners are exploring how they can be expanded upon.
  3. Open data doesn’t require new data: Identifying data to share does not always mean it must be newly shared data; sometimes the data was narrowly shared, but can be shared more broadly, made more accessible or analyzed for a different purpose. Microsoft’s environmental indicator data is an example of data that was already disclosed in certain venues, but was then made available to the Linux Foundation’s OS-Climate Initiative to be consumed through analytics, thereby extending its reach and impact…

To get started, we suggest that emerging data collaborations make use of the wealth of existing resources. When embarking on data collaborations, we leveraged many of the definitions, toolkits and guides from leading organizations in this space. As examples, resources such as the Open Data Institute’s Data Ethics Canvas are extremely useful as a framework to develop ethical guidance. Additionally, The GovLab’s Open Data Policy Lab and Executive Course on Data Stewardship, both supported by Microsoft, highlight important case studies, governance considerations and frameworks when sharing data. If you want to learn more about the exciting work our partners are doing, check out the latest posts from the Open Data Institute and GovLab…(More)”. See also Open Data Policy Lab.

Resetting Data Governance: Authorized Public Purpose Access and Society Criteria for Implementation of APPA Principles


Paper by WEF Japan: “In January 2020, our first publication presented Authorized Public Purpose Access (APPA), a new data governance model that aims to strike a balance among individual rights, the interests of data holders, and the public interest. It is proposed that the use of personal data for public-health purposes, including fighting pandemics, be subject to appropriate and balanced governance mechanisms such as those set out in the APPA approach. The same approach could be extended to the use of data for non-medical public-interest purposes, such as achieving the United Nations Sustainable Development Goals (SDGs). This publication proposes a systematic approach to implementing APPA and to pursuing public-interest goals through data use. The approach values practicality, broad social agreement on appropriate goals and methods, and the valid interests of all stakeholders….(More)”.

Tracking Economic Activity in Response to the COVID-19 Crisis Using Nighttime Lights


Paper by Mark Roberts: “Over the last decade, nighttime lights – artificial lighting at night that is associated with human activity and can be detected by satellite sensors – have become a proxy for monitoring economic activity. To examine how the COVID-19 crisis has affected economic activity in Morocco, we calculated monthly lights estimates for both the country overall and at a sub-national level. By examining the intensity of Morocco’s lights in comparison with the quarterly GDP data at the national level, we also confirm that nighttime lights can track movements in real economic activity for Morocco….(More)”.
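The abstract does not describe how the lights were compared with GDP. A minimal sketch, under the assumption that monthly light totals are summed into quarters and their quarter-on-quarter growth is correlated with real GDP growth, might look like this in Python. All figures are invented and the helper names are hypothetical; the method is illustrative, not necessarily the one used in the paper.

```python
import math


def to_quarters(monthly):
    """Sum consecutive groups of three monthly values into quarterly totals."""
    if len(monthly) % 3:
        raise ValueError("expected whole quarters of monthly data")
    return [sum(monthly[i:i + 3]) for i in range(0, len(monthly), 3)]


def growth(series):
    """Period-over-period growth rates, expressed as fractions."""
    return [(curr - prev) / prev for prev, curr in zip(series, series[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical figures for illustration only: 12 months of summed radiance
# and 4 quarters of real GDP, in arbitrary units. Not Morocco's actual data.
monthly_lights = [100, 102, 101, 104, 80, 85, 95, 98, 99, 101, 103, 104]
quarterly_gdp = [250, 215, 240, 248]

lights_q = to_quarters(monthly_lights)
r = pearson(growth(lights_q), growth(quarterly_gdp))
print(f"quarterly lights: {lights_q}")
print(f"correlation of growth rates: {r:.2f}")
```

Comparing growth rates rather than levels is one way to keep a shared upward trend from inflating the correlation.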

What Is Mobility Data? Where Is It Used?


Brief by Andrew J. Zahuranec, Stefaan Verhulst, Andrew Young, Aditi Ramesh, and Brennan Lake: “Mobility data is data about the geographic location of a device passively produced through normal activity. Throughout the pandemic, public health experts and public officials have used mobility data to understand patterns of COVID-19’s spread and the impact of disease control measures. However, privacy advocates and others have questioned the need for this data and raised concerns about the capacity of such data-driven tools to facilitate surveillance, improper data use, and other exploitative practices.

In April, The GovLab, Cuebiq, and the Open Data Institute released The Use of Mobility Data for Responding to the COVID-19 Pandemic, which relied on several case studies to look at the opportunities, risks, and challenges associated with mobility data. Today, we hope to supplement that report with a new resource: a brief on what mobility data is and the different types of data it can include. The piece is a one-pager so that decision-makers can read it easily. It provides real-world examples from the report to illustrate how different data types can be used in a responsible way….(More)”.

How we mapped billions of trees in West Africa using satellites, supercomputers and AI


Martin Brandt and Kjeld Rasmussen in The Conversation: “The possibility that vegetation cover in semi-arid and arid areas was retreating has long been an issue of international concern. In the 1930s it was first theorized that the Sahara was expanding and woody vegetation was on the retreat. In the 1970s, spurred by the “Sahel drought”, the focus was on the threat of “desertification”, caused by human overuse and/or climate change. In recent decades, the potential impact of climate change on the vegetation has been the main concern, along with the feedback of vegetation on the climate, associated with the role of the vegetation in the global carbon cycle.

Using high-resolution satellite data and machine-learning techniques at supercomputing facilities, we have now been able to map billions of individual trees and shrubs in West Africa. The goal is to better understand the real state of vegetation coverage and evolution in arid and semi-arid areas.

Finding a shrub in the desert – from space

Since the 1970s, satellite data have been used extensively to map and monitor vegetation in semi-arid areas worldwide. Images are available in “high” spatial resolution (from NASA’s Landsat MSS and TM satellites, the French SPOT satellites and ESA’s Sentinel satellites) and “medium or low” spatial resolution (NOAA AVHRR and MODIS).

To accurately analyse vegetation cover at continental or global scale, it is necessary to use the highest-resolution images available – with a resolution of 1 metre or less – and up until now the costs of acquiring and analysing the data have been prohibitive. Consequently, most studies have relied on moderate- to low-resolution data. This has not allowed for the identification of individual trees, and therefore these studies only yield aggregate estimates of vegetation cover and productivity, mixing herbaceous and woody vegetation.

In a new study covering a large part of the semi-arid Sahara-Sahel-Sudanian zone of West Africa, published in Nature in October 2020, an international group of researchers was able to overcome these limitations. By combining an immense amount of high-resolution satellite data, advanced computing capacities, machine-learning techniques and extensive field data gathered over decades, we were able to identify individual trees and shrubs with a crown area of more than 3 m² with great accuracy. The result is a database of 1.8 billion trees in the region studied, available to all interested….(More)”

Supercomputing, machine learning, satellite data and field assessments make it possible to map billions of individual trees in West Africa. Martin Brandt, Author provided
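Why sub-metre imagery is needed follows from simple arithmetic: a crown just over the 3 m² threshold quoted above spans only a few pixels at 1 m resolution and a tiny fraction of a single pixel at coarser resolutions. The Python sketch below computes that ratio. The pixel sizes are illustrative assumptions chosen for this example (30 m, for instance, is the commonly cited pixel size of Landsat TM's reflective bands), and the helper name is hypothetical.

```python
def pixels_per_crown(crown_area_m2, pixel_size_m):
    """How many square pixels of side pixel_size_m fit into a crown's area."""
    return crown_area_m2 / (pixel_size_m ** 2)


# Crown-area threshold quoted in the article, in square metres.
CROWN_M2 = 3.0

# Nominal pixel sizes in metres; illustrative values, not the study's inputs.
for label, size in [("0.5 m", 0.5), ("1 m", 1.0), ("10 m", 10.0), ("30 m", 30.0)]:
    print(f"{label:>5} pixels: {pixels_per_crown(CROWN_M2, size):.4f} pixels per crown")
```

At coarse resolutions a single pixel mixes the crown with hundreds of times its area of soil and grass, which is why such data yield only aggregate estimates of vegetation cover rather than individual trees.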

The Case for Local Data Sharing Ordinances


Paper by Beatriz Botero Arcila: “Cities in the US have started to enact data-sharing rules and programs to access some of the data that technology companies operating under their jurisdiction – like short-term rental or ride hailing companies – collect. This information allows cities to adapt to the challenges and benefits of the digital information economy. It allows them to understand these companies’ impact on congestion, the housing market, the local job market and even the use of public spaces. It also empowers them to act accordingly by, for example, setting vehicle caps or mandating a tailored minimum pay for gig-workers. These companies, however, sometimes argue that sharing this information infringes on their users’ privacy rights and on their own privacy rights, because this information is theirs; it’s part of their business records. The question is thus what those rights are, and whether it should and could be possible for local governments to access that information to advance equity and sustainability, without harming the legitimate privacy interests of both individuals and companies. This Article argues that within current Fourth Amendment doctrine and privacy law there is space for data-sharing programs. Privacy law, however, is being mobilized to alter the distribution of power and welfare between local governments, companies, and citizens within current digital information capitalism to extend those rights beyond their fair share and preempt permissible data-sharing requests. The Article warns that if the companies succeed in their challenges, privacy law will have helped shield corporate power from regulatory oversight, while still leaving individuals largely unprotected and further subordinating local governments to corporate interests….(More)”.

Data Access, Consumer Interests and Public Welfare


Book edited by the Bundesministerium der Justiz und für Verbraucherschutz and the Max-Planck-Institut für Innovation und Wettbewerb: “Data are considered to be key for the functioning of the data economy as well as for pursuing multiple public interest concerns. Against this backdrop, this book strives to devise new data access rules for future legislation. To do so, the contributions first explain the justification for such rules from an economic and more general policy perspective. Then, building on the constitutional foundations and existing access regimes, they explore the potential of various fields of the law (competition and contract law, data protection and consumer law, sector-specific regulation) as a basis for the future legal framework. The book also addresses the need to coordinate data access rules with intellectual property rights and to integrate these rules as one of multiple measures in larger data governance systems. Finally, the book discusses the enforcement of the Government’s interest in using privately held data as well as potential data access rights of the users of connected devices….(More)”.