Feedback Loops in Open Data Ecosystems


Paper by Daniel Rudmark and Magnus Andersson: “Public agencies are increasingly publishing open data to increase transparency and fuel data-driven innovation. For these organizations, maintaining sufficient data quality is key to continuous re-use but also heavily dependent on feedback loops being initiated between data publishers and users. This paper reports from a longitudinal engagement with Scandinavian transportation agencies, where such feedback loops have been successfully established. Based on these experiences, we propose four distinct types of data feedback loops in which both data publishers and re-users play critical roles…(More)”.

Why you should develop a Rules as Code-enabled future


Blog by Tim de Sousa: “In 2021, Rules as Code (RaC) is truly hitting its stride. More governments are exploring the concept of machine-consumable legislation, regulation and policy, research institutes have been established, papers and reports are being published, tools and platforms are being built, and multi-disciplinary teams are learning new ways to draft and implement rules by getting their hands dirty.

RaC is still an emerging practice. Much of the current discussion about RaC is centred on introductory questions such as why and how we should code rules (and we’ve tried to answer those questions here), but to understand the true potential of RaC, we have to take a longer view.

In this two-part series, I set out some possible optimistic futures that could be enabled by RaC. We have to ask ourselves what kind of world we want to build with coded rules, so we can better plan how to get there.

Trustworthy automated decisions

The first reaction that RaC practitioners are often faced with is the fear of the killer robot. What happens if the automated system makes a wrong decision? What if that decision hurts someone? This is not an unfounded fear – we have seen poorly implemented and poorly used automated systems raise debts that are not owed, and lead to the arrest of innocent people. All human-built systems have flaws, and RaC-enabled systems are not immune.

As a former administrative lawyer and someone who grapples with the ethical uses of technology on a daily basis, I find the use of RaC to help people understand what decisions are being made and how they’re being made – that is, to enable trustworthy automated decisions – particularly compelling.

Administrative law is the body of law that regulates how governments make decisions. In common law countries, this generally includes requirements that only relevant matters should be taken into account, irrelevant matters should not be, reasons should be given for decisions, and there should be workable avenues for merits reviews of decisions…(More)”
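
To make the idea concrete, here is a minimal sketch of what a coded rule might look like, assuming a hypothetical benefit rule whose criteria and thresholds are invented for illustration (they are not drawn from any actual legislation or from the blog itself). The decision function returns its reasons alongside the outcome, echoing the administrative-law requirement that reasons be given for decisions:

```python
# A minimal "Rules as Code" sketch. The rule itself is hypothetical:
# eligibility requires age 65+ and annual income at or below a cap.
from dataclasses import dataclass, field

@dataclass
class Decision:
    eligible: bool
    reasons: list = field(default_factory=list)  # audit trail for the decision

def assess_benefit(age: int, annual_income: float) -> Decision:
    """Apply the hypothetical rule and record a reason for each criterion."""
    reasons, eligible = [], True
    if age >= 65:
        reasons.append(f"age {age} meets the minimum age of 65")
    else:
        eligible = False
        reasons.append(f"age {age} is below the minimum age of 65")
    if annual_income <= 30_000:
        reasons.append(f"income {annual_income:,.0f} is within the 30,000 cap")
    else:
        eligible = False
        reasons.append(f"income {annual_income:,.0f} exceeds the 30,000 cap")
    return Decision(eligible, reasons)

print(assess_benefit(age=67, annual_income=25_000))
```

Because the rule is explicit code, every decision carries an auditable record of which criteria were met or failed, one ingredient of the trustworthy automated decisions the post describes.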

Statistics and Data Science for Good


Introduction to Special Issue of CHANCE by Caitlin Augustin, Matt Brems, and Davina P. Durgana: “One lesson that our team has taken from the past 18 months is that no individual, no team, and no organization can be successful on their own. We’ve been grateful and humbled to witness incredible collaboration—taking on forms of resource sharing, knowledge exchange, and reimagined outcomes. Some advances, like breakthrough medicine, have been widely publicized. Other advances have received less fanfare. All of these advances are in the public interest and demonstrate how collaborations can be done “for good.”

In reading this issue, we hope that you realize the power of diverse multidisciplinary collaboration; you recognize the positive social impact that statisticians, data scientists, and technologists can have; and you learn that this isn’t limited to companies with billions of dollars or teams of dozens of people. You, our reader, can get involved in similar positive social change.

This special edition of CHANCE focuses on using data and statistics for the public good and on highlighting collaborations and innovations that have been sparked by partnerships between pro bono institutions and social impact partners. We recognize that the “pro bono” or “for good” field is vast, and we welcome all actors working in the public interest into the big tent.

Through the focus of this edition, we hope to demonstrate how new or novel collaborations might spark meaningful and lasting positive change in communities, sectors, and industries. Anchored by work led through Statistics Without Borders and DataKind, this edition features reporting on projects that touch on many of the United Nations Sustainable Development Goals (SDGs).

Pro bono volunteerism is one way of democratizing access to high-skill, high-expense services that are often unattainable for social impact organizations. Statistics Without Borders (founded in 2008), DataKind (founded in 2012), and numerous other volunteer organizations began with this model in mind: If there was an organizing or galvanizing body that could coordinate the myriad requests for statistical, data science, machine learning, or data engineering help, there would be a ready supply of talented individuals who would want to volunteer to see those projects through. Or, put another way, “If you build it, they will come.”

Doing pro bono work requires more than positive intent. Plenty of well-meaning organizations and individuals charitably donate their time, their energy, and their expertise, only to have an unintended adverse impact. To do work for good, ethics must be an integral part of every project. In this issue, you’ll notice the writers’ attention to institutional review boards (IRBs), respecting client and data privacy, discussing ethical considerations of methods used, and so on.

While no single publication can fully capture the great work of pro bono organizations working in “data for good,” we hope readers will be inspired to contribute to open source projects, solve problems in a new way, or even volunteer themselves for a future cohort of projects. We’re thrilled that this special edition represents programs, partners, and volunteers from around the world. You will learn about work that is truly representative of the SDGs, such as international health organizations’ work in Uganda, political justice organizations in Kenya, and conservationists in Madagascar, to name a few.

Several articles describe projects that are contextualized within the SDGs. While many of the goals are interconnected, such as economic attainment and poverty reduction, we hope that calling out key themes here will whet your appetite for exploration.

  • Multiple articles focused on tackling aspects of SDG 3: Ensuring healthy lives and promoting well-being for people at all ages.
  • An article tackling SDG 8: Promote sustained, inclusive, and sustainable economic growth; full and productive employment; and decent work for all.
  • Several articles touching on SDG 9: Build resilient infrastructure; promote inclusive and sustainable industrialization; and foster innovation. One is a reflection on building and sustaining free and open-source software as a public good.
  • A handful of articles highlighting the need for capacity-building and systems-strengthening aligned to SDG 16: Promote peaceful and inclusive societies for sustainable development; provide access to justice for all; and build effective, accountable, and inclusive institutions at all levels.
  • An article about migration along the southern borders of the United States addressing multiple issues related to poverty (SDG 1), opportunity (SDG 10), and peace and justice (SDG 16)….(More)”

GovTech Maturity Index : The State of Public Sector Digital Transformation


Report by the World Bank: “Governments have been using technology to modernize the public sector for decades. The World Bank Group (WBG) has been a partner in this process, providing both financing and technical assistance to facilitate countries’ digital transformation journeys since the 1980s. The WBG launched the GovTech Initiative in 2019 to support the latest generation of these reforms. Over the past five years, developing countries have increasingly requested WBG support to design even more advanced digital transformation programs. These programs will help to increase government efficiency, improve both access to and the quality of service delivery, enable more government-to-citizen and government-to-business communication, enhance transparency and reduce corruption, improve governance and oversight, and modernize core government operations. The GovTech Initiative appropriately responds to this growing demand.

The GovTech Maturity Index (GTMI) measures the key aspects of four GovTech focus areas—supporting core government systems, enhancing service delivery, mainstreaming citizen engagement, and fostering GovTech enablers—and assists advisers and practitioners in the design of new digital transformation projects. Constructed for 198 economies using consistent data sources, the GTMI is the most comprehensive measure of digital transformation in the public sector. Several similar indices and indicators are available in the public domain to measure aspects of digital government—including the United Nations e-Government Development Index, the WBG’s Digital Adoption Index, and the Organisation for Economic Co-operation and Development (OECD) Digital Government Index.

These indices, however, do not fully capture the areas emphasized in the GovTech approach—whole-of-government coordination and citizen centricity—which are key when assessing the use of digital solutions for public sector modernization. The GTMI is not intended to be an assessment of readiness or performance; rather, it is intended to complement the existing tools and diagnostics by providing a baseline and a benchmark for GovTech maturity and by offering insights into those areas that have room for improvement. The GTMI is designed to be used by practitioners, policy makers, and task teams involved in the design of digital transformation strategies and individual projects, as well as by those who seek to understand their own practices and learn from those of others….(More)”.

Secondary use of health data in Europe


Report by Mark Boyd, Dr Milly Zimeta, Dr Jeni Tennison and Mahad Alassow: “Open and trusted health data systems can help Europe respond to the many urgent challenges facing its society and economy today. The global pandemic has already altered many of our societal and economic systems, and data has played a key role in enabling cross-border and cross-sector collaboration in public health responses.

Even before the pandemic, there was an urgent need to optimise healthcare systems and manage limited resources more effectively, to meet the needs of growing, and often ageing, populations. Now, there is a heightened need to develop early-diagnostic and health-surveillance systems, and more willingness to adopt digital healthcare solutions…

By reusing health data in different ways, we can increase the value of this data and help to enable these improvements. Clinical data, such as records of healthcare encounters and clinical trials data, can be combined with data collected from other sources, such as sickness and insurance claims records, and from devices and wearable technologies. This data can then be anonymised and aggregated to generate new insights and optimise population health, improve patients’ health and experiences, create more efficient healthcare systems, and foster innovation.
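
As a toy illustration of the “anonymised and aggregated” step, the sketch below counts records by coarse groups and suppresses any group too small to publish safely. The records, fields, and threshold are hypothetical, and real-world anonymisation involves much more than small-cell suppression:

```python
# Minimal sketch of aggregation with small-cell suppression (illustrative only).
from collections import Counter

# Direct identifiers (name, address, exact birth date) are assumed removed already.
records = [
    {"age_band": "60-69", "region": "North", "condition": "diabetes"},
    {"age_band": "60-69", "region": "North", "condition": "diabetes"},
    {"age_band": "30-39", "region": "South", "condition": "asthma"},
]

MIN_CELL_SIZE = 2  # toy value; disclosure-control rules often require 5 or more

counts = Counter((r["age_band"], r["region"], r["condition"]) for r in records)
published = {group: n for group, n in counts.items() if n >= MIN_CELL_SIZE}
suppressed = sum(1 for n in counts.values() if n < MIN_CELL_SIZE)
print(published)                       # {('60-69', 'North', 'diabetes'): 2}
print(f"{suppressed} small cell(s) suppressed")
```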

This secondary use of health data can enable a wide range of benefits across the entire healthcare system. These include opportunities to optimise services, reduce health inequalities by better allocating resources, and enhance personalised healthcare – for example, by comparing treatments for people with similar characteristics. It can also help encourage innovation by extending research data to assess whether new therapies would work for a broader population….(More)”.

Greece used AI to curb COVID: what other nations can learn


Editorial at Nature: “A few months into the COVID-19 pandemic, operations researcher Kimon Drakopoulos e-mailed both the Greek prime minister and the head of the country’s COVID-19 scientific task force to ask if they needed any extra advice.

Drakopoulos works in data science at the University of Southern California in Los Angeles, and is originally from Greece. To his surprise, he received a reply from Prime Minister Kyriakos Mitsotakis within hours. The European Union was asking member states, many of which had implemented widespread lockdowns in March, to allow non-essential travel to recommence from July 2020, and the Greek government needed help in deciding when and how to reopen borders.

Greece, like many other countries, lacked the capacity to test all travellers, particularly those not displaying symptoms. One option was to test a sample of visitors, but Greece opted to trial an approach rooted in artificial intelligence (AI).

Between August and November 2020 — with input from Drakopoulos and his colleagues — the authorities launched a system that uses a machine-learning algorithm to determine which travellers entering the country should be tested for COVID-19. The researchers found machine learning to be more effective at identifying asymptomatic people than was random testing or testing based on a traveller’s country of origin. According to the researchers’ analysis, during the peak tourist season, the system detected two to four times more infected travellers than did random testing.

The machine-learning system, which is among the first of its kind, is called Eva and is described in Nature this week (H. Bastani et al. Nature https://doi.org/10.1038/s41586-021-04014-z; 2021). It’s an example of how data analysis can contribute to effective COVID-19 policies. But it also presents challenges, from ensuring that individuals’ privacy is protected to the need to independently verify its accuracy. Moreover, Eva is a reminder of why proposals for a pandemic treaty (see Nature 594, 8; 2021) must consider rules and protocols on the proper use of AI and big data. These need to be drawn up in advance so that such analyses can be used quickly and safely in an emergency.

In many countries, travellers are chosen for COVID-19 testing at random or according to risk categories. For example, a person coming from a region with a high rate of infections might be prioritized for testing over someone travelling from a region with a lower rate.

By contrast, Eva collected not only travel history, but also demographic data such as age and sex from the passenger information forms required for entry to Greece. It then matched those characteristics with data from previously tested passengers and used the results to estimate an individual’s risk of infection. COVID-19 tests were targeted to travellers calculated to be at highest risk. The algorithm also issued tests to allow it to fill data gaps, ensuring that it remained up to date as the situation unfolded.
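
The sketch below is a deliberately simplified illustration of that balance between targeting the highest-risk profiles and issuing some tests to fill data gaps. It is not the actual Eva algorithm, and the profile structure, smoothing, and exploration fraction are assumptions made for the example:

```python
# Simplified explore/exploit test allocation in the spirit the editorial
# describes (not the real Eva system).
from collections import defaultdict

past_results = defaultdict(lambda: {"tests": 0, "positives": 0})

def record_result(profile, positive):
    """Update a profile's history once a lab result comes back."""
    past_results[profile]["tests"] += 1
    past_results[profile]["positives"] += int(positive)

def estimated_risk(profile):
    """Smoothed positivity estimate; profiles never tested get a mild prior."""
    s = past_results[profile]
    return (s["positives"] + 1) / (s["tests"] + 2)

def allocate_tests(arrivals, budget, explore_frac=0.2):
    """arrivals maps passenger id -> profile, e.g. (country, age_band).

    Most of the budget targets the highest estimated-risk passengers; the
    remainder goes to the least-tested profiles so estimates stay current."""
    explore_n = int(budget * explore_frac)
    ranked = sorted(arrivals, key=lambda p: estimated_risk(arrivals[p]), reverse=True)
    chosen, leftover = ranked[: budget - explore_n], ranked[budget - explore_n :]
    leftover.sort(key=lambda p: past_results[arrivals[p]]["tests"])
    return chosen + leftover[:explore_n]

record_result(("GR", "30-39"), positive=False)
record_result(("XX", "60-69"), positive=True)
arrivals = {1: ("GR", "30-39"), 2: ("XX", "60-69"), 3: ("YY", "20-29")}
print(allocate_tests(arrivals, budget=2))  # favours the higher-risk profiles
```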

During the pandemic, there has been no shortage of ideas on how to deploy big data and AI to improve public health or assess the pandemic’s economic impact. However, relatively few of these ideas have made it into practice. This is partly because companies and governments that hold relevant data — such as mobile-phone records or details of financial transactions — need agreed systems to be in place before they can share the data with researchers. It’s also not clear how consent can be obtained to use such personal data, or how to ensure that these data are stored safely and securely…(More)”.

Who takes part in Citizen Science projects & why?


CS Track: “Citizen Science in Europe, as elsewhere, continues to manifest itself in a variety of different ways. While it attracts interest across multiple sectors of society, its definition remains unclear. The first CS Track White Paper on Themes, objectives and participants of citizen science activities has just been published and, along with the initial results of the first large-scale survey into participation in citizen science, provides an important overview of who participates in citizen science projects and what motivates them. This short report focuses on one aspect that emerges in this white paper.

Citizen Science Participants – who are they? 

Who the participants are has a significant impact on the objectives and outcomes of citizen science projects. However, existing information on the demographics of participants in citizen science projects is very limited, and most studies have focused on a single project or programme. Furthermore, certain groups, like young people, are underrepresented in the available data.

What our research team has gathered from the literature and the initial results of the CS Track large-scale survey is the following:

  • Well-educated, affluent participants outnumber less affluent participants.
  • More men than women take part in many of the programmes that have been analysed.
  • Citizen scientists seem to be white, middle-aged, and scientifically literate or generally interested in science or scientific topics.
  • Scientists, academics, teachers, science students, and people who have a passion for the outdoors are among the groups of people most likely to take part in citizen science.
  • In agricultural, biological and environmental science-based programmes, participants are often scientists themselves, science teachers or students, conservation group members, backpackers, hikers or other outdoor enthusiasts – in other words, people who care about nature.
  • Community and youth citizen science projects are underrepresented in the available data….(More)”.

Where Is Everyone? The Importance of Population Density Data


Data Artefact Study by Aditi Ramesh, Stefaan Verhulst, Andrew Young and Andrew Zahuranec: “In this paper, we explore new and traditional approaches to measuring population density, and ways in which density information has frequently been used by humanitarian, private-sector and government actors to advance a range of private and public goals. We explain how recent innovations are leading to fresh ways of collecting data—and fresh forms of data—and how this may open up new avenues for using density information in a variety of contexts. Section III examines one particular example: Facebook’s High-Resolution Population Density Maps (also referred to as HRSL, or high-resolution settlement layer). This recent initiative, created in collaboration with a number of external organizations, shows not only the potential of mapping innovations but also the potential benefits of inter-sectoral partnerships and sharing. We examine three particular use cases of HRSL, and we follow with an assessment and some lessons learned. These lessons are applicable to HRSL in particular, but also more broadly. We conclude with some thoughts on avenues for future research….(More)”.

The search engine of 1896


The Generalist Academy: “In 1896 Paul Otlet set up a bibliographic query service by mail: a 19th-century search engine…. The end of the 19th century was awash with the written word: books, monographs, and publications of all kinds. It was fiendishly difficult to find what you wanted in that mess. Bibliographies – compilations of references on a specific subject – were the maps to this vast informational territory. But they were expensive and time-consuming to compile.

Paul Otlet had a passion for information. More precisely, he had a passion for organising information. He and Henri La Fontaine made bibliographies on many subjects – and then turned their efforts towards creating something better. A master bibliography. A bibliography to rule them all, nothing less than a complete record of everything that had ever been published on every topic. This was their plan: the grandly named Universal Bibliographic Repertory.

This ambitious endeavour listed sources for every topic that its creators could imagine. The references were meticulously recorded on index cards that were filed in a massive series of drawers like the ones pictured above. The whole thing was arranged according to their Universal Decimal Classification, and it was enormous. In 1895 there were four hundred thousand entries. At its peak in 1934, there were nearly sixteen million.

How could you access such a mega-bibliography? Well, Otlet and La Fontaine set up a mail service. People sent in queries and received a summary of publications relating to that topic. Curious about the native religions of Sumatra? Want to explore the 19th-century decipherment of Akkadian cuneiform? Send a request to the Universal Bibliographic Repertory, get a tidy list of the references you need. It was nothing less than a manual search engine, one hundred and twenty-five years ago.

[Image: Encyclopedia Universalis. Paul Otlet, public domain, via Wikimedia Commons]

Otlet had many more ambitions: a world encyclopaedia of knowledge, contraptions to easily access every publication in the world (he was an early microfiche pioneer), and a whole city to serve as the bright centre of global intellect. These ambitions were mostly unrealised, due to lack of funds and the intervention of war. But today Otlet is recognised as an important figure in the history of information science…(More)”.

Designing geospatial data portals


Guidance by The Geospatial Commission: “…for developers and designers to increase the discoverability and usefulness of geospatial data through user-focused data portals…. Data portals differ by the data they provide and the audiences they serve. ‘Data portals’ described within this guidance are web-based interfaces designed to help users find and access datasets. Optimally, they should be built around metadata records which describe datasets, provide pointers to where they can be located and explain any restrictions or limitations in their use.
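
As an illustration of the kind of metadata record a portal might be built around, here is a hedged sketch; the field names are invented for the example (loosely inspired by common catalogue vocabularies such as DCAT) and are not prescribed by the Geospatial Commission guidance:

```python
# Hypothetical dataset metadata record for a geospatial data portal.
dataset_record = {
    "title": "Example road network dataset",
    "description": "Road centrelines for public highways, updated quarterly.",
    "publisher": "Example Agency",                      # who to trust / contact
    "spatial_coverage": "Great Britain",                # where the data applies
    "access_url": "https://data.example.gov.uk/roads",  # pointer to the data itself
    "licence": "Open Government Licence v3.0",          # any restrictions on use
    "last_updated": "2021-07-01",
}
print(dataset_record["access_url"])
```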

Although more and more geospatial data is being made available online, there are users who are confused about where to go, who to trust and which datasets are most relevant to answering their questions.

In 2018 user researchers and designers across the Geo6 came together to explore the needs and frustrations experienced by users of data portals containing geospatial data.

Throughout 2019 and 2020 the Geo6 have worked on solutions to address pain points identified by the user research conducted for the Data Discoverability project. This guidance provides high-level general recommendations; however, exact requirements for any given portal will vary depending on the needs of your target audience and on the data volumes and subject matters covered. This resource is not a replacement for portal-specific user research and design work…(More)”.