Synthetic data: innovation for public good


Blog Post by Catrin Cheung: “What is synthetic data, and how can it be used for public good? … Synthetic data are artificially generated data that have the look and structure of real data, but do not contain any information on individuals. They also contain more general characteristics that are used to find patterns in the data.

They are modelled on real data, but designed in a way which safeguards the legal, ethical and confidentiality requirements of the original data. Given their resemblance to the original data, synthetic data are useful in a range of situations, for example when data is sensitive or missing. They are used widely as teaching materials, to test code or mathematical models, or as training data for machine learning models….

There’s currently a wealth of research emerging from the health sector, as the nature of data published is often sensitive. Public Health England have synthesised cancer data which can be freely accessed online. NHS Scotland are making advances in cutting-edge machine learning methods such as Variational Autoencoders and Generative Adversarial Networks (GANs).

There is growing interest in this area of research, and its influence extends beyond the statistical community. While the Data Science Campus have also used GANs to generate synthetic data in their latest research, the power of GANs is not limited to data generation. They can be trained to construct features almost identical to our own across imagery, music, speech and text. In fact, GANs have been used to create a painting of Edmond de Belamy, which sold for $432,500 in 2018!

Within the ONS, a pilot to create synthetic versions of securely held Labour Force Survey data has been carried out using a package in R called “synthpop”. This synthetic dataset can be shared with approved researchers to de-bug codes, prior to analysis of data held in the Secure Research Service….

Although much progress has been made in this field, one challenge that persists is guaranteeing the accuracy of synthetic data. We must ensure that the statistical properties of synthetic data match the properties of the original data.

Additional features, such as the presence of non-numerical data, add to this difficult task. For example, if something is listed as “animal” and can take the possible values “dog”, “cat” or “elephant”, it is difficult to convert this information into a format suitable for precise calculations. Furthermore, given that datasets have different characteristics, there is no straightforward solution that can be applied to all types of data…. particular focus was also placed on the use of synthetic data in the field of privacy, following from the challenges and opportunities identified by the National Statistician’s Quality Review of privacy and data confidentiality methods published in December 2018….(More)”.
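
One common way to turn a non-numerical field like the “animal” example above into numbers is one-hot encoding, where each possible value gets its own 0/1 indicator column. The sketch below is only an illustration of that general technique, not the method used by the ONS or by synthpop; the function name and the sample records are hypothetical.

```python
# Illustrative sketch: one-hot encoding of a categorical column.
# The field "animal" and its three values come from the example above;
# the function name and the sample records are hypothetical.

def one_hot_encode(values, categories):
    """Return one 0/1 indicator row per value, in the order of `categories`."""
    index = {category: position for position, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # raises KeyError for a value not in `categories`
        rows.append(row)
    return rows


categories = ["dog", "cat", "elephant"]
animals = ["cat", "elephant", "dog", "cat"]  # made-up records
encoded = one_hot_encode(animals, categories)

# The mean of each indicator column is the share of records in that category,
# a simple property that a synthetic dataset could be compared against.
shares = [sum(column) / len(encoded) for column in zip(*encoded)]
```

Once the field is encoded this way, category shares in a synthetic dataset can be compared with those in the original using ordinary arithmetic.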

Digital Data for Development


LinkedIn: “The World Bank Group and LinkedIn share a commitment to helping workers around the world access opportunities that make good use of their talents and skills. The two organizations have come together to identify new ways that data from LinkedIn can help inform policymakers who seek to boost employment and grow their economies.

This site offers data and automated visuals of industries where LinkedIn data is comprehensive enough to provide an emerging picture. The data complements a wealth of official sources and can offer a more real-time view in some areas, particularly for new, rapidly changing digital and technology industries.

The data shared in the first phase of this collaboration focuses on 100+ countries with at least 100,000 LinkedIn members each, distributed across 148 industries and 50,000 skills categories. In the near term, it will help World Bank Group teams and government partners pinpoint ways that developing countries could stimulate growth and expand opportunity, especially as disruptive technologies reshape the economic landscape. As LinkedIn’s membership and digital platforms continue to grow in developing countries, this collaboration will assess the possibility to expand the sectors and countries covered in the next annual update.

This site offers downloadable data, visualizations, and an expanding body of insights and joint research from the World Bank Group and LinkedIn. The data is being made accessible as a public good, though it will be most useful for policy analysts, economists, and researchers….(More)”.

Statistics Estonia to coordinate data governance


Article by Miriam van der Sangen at CBS: “In 2018, Statistics Estonia launched a new strategy for the period 2018-2022. This strategy addresses the organisation’s aim to produce statistics more quickly while minimising the response burden on both businesses and citizens. Another element in the strategy is addressing the high expectations in Estonian society regarding the use of data. ‘We aim to transform Statistics Estonia into a national data agency,’ says Director General Mägi. ‘This means our role as a producer of official statistics will be enlarged by data governance responsibilities in the public sector. Taking on such responsibilities requires a clear vision of the whole public data ecosystem and also agreement to establish data stewards in most public sector institutions.’…

the Estonian Parliament passed new legislation that effectively expanded the number of official tasks for Statistics Estonia. Mägi elaborates: ‘Most importantly, we shall be responsible for coordinating data governance. The detailed requirements and conditions of data governance will be specified further in the coming period.’ Under the new Act, Statistics Estonia will also have more possibilities to share data with other parties….

Statistics Estonia is fully committed to producing statistics which are based on big data. Mägi explains: ‘At the moment, we are actively working on two big data projects. One project involves the use of smart electricity meters. In this project, we are looking into ways to visualise business and household electricity consumption information. The second project involves web scraping of prices and enterprise characteristics. This project is still in an initial phase, but we can already see that the use of web scraping can improve the efficiency of our production process. We are aiming to extend the web scraping project by also identifying e-commerce and innovation activities of enterprises.’

Yet another ambitious goal for Statistics Estonia lies in the field of data science. ‘Similarly to Statistics Netherlands, we established experimental statistics and data mining activities years ago. Last year, we developed a so-called think-tank service, providing insights from data into all aspects of our lives. Think of birth, education, employment, et cetera. Our key clients are the various ministries, municipalities and the private sector. The main aim in the coming years is to speed up service time thanks to visualisations and data lake solutions.’ …(More)”.

New Data-Driven Map Shows Spread of Participation in Democracy


Loren Peabody at the Participatory Budgeting Project: “As we celebrate the first 30 years of participatory budgeting (PB) in the world and the first 10 years of the Participatory Budgeting Project (PBP), we reflect on how far and wide PB has spread–and how it continues to grow! We’re thrilled to introduce a new tool to help us look back as we plan for the next 30+ years of PB. And so we’re introducing a map of PB across the U.S. and Canada. Each dot on the map represents a place where democracy has been deepened by bringing people together to decide together how to invest public resources in their community….

This data sheds light on larger questions, such as what is the relationship between the size of PB budgets and the number of people who participate? Looking at PBP data on processes in counties, cities, and urban districts, we find a positive correlation between the size of the PB budget per person and the number of people who take part in a PB vote (r=.22, n=245). In other words, where officials make a stronger commitment to funding PB, more people take part in the process–all the more reason to continue growing PB!….(More)”.
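
The figure r=.22 quoted above is a Pearson correlation coefficient. As a rough sketch of how such a coefficient is computed, the example below uses made-up numbers; it does not reproduce the PBP dataset, and the variable names are hypothetical.

```python
import math

# Illustrative sketch of a Pearson correlation coefficient.
# All numbers below are invented for the example, not PBP data.

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


budget_per_person = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical dollars per resident
voters = [100, 300, 200, 500, 400]             # hypothetical PB vote turnout
r = pearson_r(budget_per_person, voters)
```

A coefficient runs from -1 to 1, and its square gives the share of variance that the two measures have in common; for r=.22 that is roughly 0.05.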

Open Justice: Public Entrepreneurs Learn to Use New Technology to Increase the Efficiency, Legitimacy, and Effectiveness of the Judiciary


The GovLab: “Open justice is a growing movement to leverage new technologies – including big data, digital platforms, blockchain and more – to improve legal systems by making the workings of courts easier to understand, scrutinize and improve. Through the use of new technology, open justice innovators are enabling greater efficiency, fairness, accountability and a reduction in corruption in the third branch of government. For example, the open data portal ‘Atviras Teismas’ Lithuania (translated ‘open court’ Lithuania) is a platform for monitoring courts and judges through performance metrics. This portal serves to make the courts of Lithuania transparent and benefits both courts and citizens by presenting comparative data on the Lithuanian Judiciary.

To promote more Open Justice projects, the GovLab, in partnership with the Electoral Tribunal of the Federal Judiciary (TEPJF) of Mexico, launched a historic, first-of-its-kind online course on Open Justice. Designed primarily for lawyers, judges, and public officials – but also intended to appeal to technologists and members of the public – the Spanish-language course consists of 10 modules.

Each of the ten modules comprises:

  1. A short video-based lecture
  2. An original Open Justice reader
  3. Associated additional readings
  4. A self-assessment quiz
  5. A demonstration of a platform or tool
  6. An interview with a global practitioner

Among those featured in the interviews are Felipe Moreno of Jusbrasil, Justin Erlich of OpenJustice California, Liam Hayes of Aurecon, UK, Steve Ghiassi of Legaler, Australia, and Sara Castillo of Poder Judicial, Chile….(More)”.

Facebook’s AI team maps the whole population of Africa


Devin Coldewey at TechCrunch: “A new map of nearly all of Africa shows exactly where the continent’s 1.3 billion people live, down to the meter, which could help everyone from local governments to aid organizations. The map joins others like it from Facebook created by running satellite imagery through a machine learning model.

It’s not exactly that there was some mystery about where people live, but the degree of precision matters. You may know that a million people live in a given region, and that about half are in the bigger city and another quarter in assorted towns. But that leaves hundreds of thousands only accounted for in the vaguest way.

Fortunately, you can always inspect satellite imagery and pick out the spots where small villages and isolated houses and communities are located. The only problem is that Africa is big. Really big. Manually labeling the satellite imagery even from a single mid-sized country like Gabon or Malawi would take a huge amount of time and effort. And for many applications of the data, such as coordinating the response to a natural disaster or distributing vaccinations, time lost is lives lost.

Better to get it all done at once then, right? That’s the idea behind Facebook’s Population Density Maps project, which had already mapped several countries over the last couple of years before the decision was made to take on the entire African continent….

“The maps from Facebook ensure we focus our volunteers’ time and resources on the places they’re most needed, improving the efficacy of our programs,” said Tyler Radford, executive director of the Humanitarian OpenStreetMap Team, one of the project’s partners.

The core idea is straightforward: Match census data (how many people live in a region) with structure data derived from satellite imagery to get a much better idea of where those people are located.

“With just the census data, the best you can do is assume that people live everywhere in the district – buildings, fields, and forests alike,” said Facebook engineer James Gill. “But once you know the building locations, you can skip the fields and forests and only allocate the population to the buildings. This gives you very detailed 30 meter by 30 meter population maps.”

That’s several times more accurate than any extant population map of this size. The analysis is done by a machine learning agent trained on OpenStreetMap data from all over the world, where people have labeled and outlined buildings and other features.

First the huge amount of Africa’s surface that obviously has no structure had to be removed from consideration, reducing the amount of space the team had to evaluate by a factor of a thousand or more. Then, using a region-specific algorithm (because things look a lot different in coastal Morocco than they do in central Chad), the model identifies patches that contain a building….(More)”.
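
The core idea described above, matching a district’s census count to detected buildings, can be sketched in a few lines. This is a simplified illustration and not Facebook’s actual pipeline: the grid cells, the building counts, and the choice to split population in proportion to the number of buildings per cell are all assumptions made for the example.

```python
# Illustrative sketch: spread a district's census count over grid cells
# in proportion to detected buildings. Weighting by building count is an
# assumption of this example; cell coordinates and counts are made up.

def allocate_population(census_count, buildings_per_cell):
    """Return an estimated population for each cell of one district."""
    if not buildings_per_cell:
        return {}
    total_buildings = sum(buildings_per_cell.values())
    if total_buildings == 0:
        # No buildings detected: fall back to the census-only assumption
        # that people are spread evenly across the whole district.
        even_share = census_count / len(buildings_per_cell)
        return {cell: even_share for cell in buildings_per_cell}
    return {
        cell: census_count * count / total_buildings
        for cell, count in buildings_per_cell.items()
    }


# A hypothetical district of four 30 m cells, two of them empty land.
buildings = {(0, 0): 0, (0, 1): 3, (1, 0): 1, (1, 1): 0}
estimates = allocate_population(1200, buildings)
```

In this made-up district the empty cells receive no population, while the census total of 1,200 is preserved across the cells that contain buildings.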

Rethink government with AI


Helen Margetts and Cosmina Dorobantu at Nature: “People produce more than 2.5 quintillion bytes of data each day. Businesses are harnessing these riches using artificial intelligence (AI) to add trillions of dollars in value to goods and services each year. Amazon dispatches items it anticipates customers will buy to regional hubs before they are purchased. Thanks to the vast extractive might of Google and Facebook, every bakery and bicycle shop is the beneficiary of personalized targeted advertising.

But governments have been slow to apply AI to hone their policies and services. The reams of data that governments collect about citizens could, in theory, be used to tailor education to the needs of each child or to fit health care to the genetics and lifestyle of each patient. They could help to predict and prevent traffic deaths, street crime or the necessity of taking children into care. Huge costs of floods, disease outbreaks and financial crises could be alleviated using state-of-the-art modelling. All of these services could become cheaper and more effective.

This dream seems rather distant. Governments have long struggled with much simpler technologies. Flagship policies that rely on information technology (IT) regularly flounder. The Affordable Care Act of former US president Barack Obama nearly crumbled in 2013 when HealthCare.gov, the website enabling Americans to enrol in health insurance plans, kept crashing. Universal Credit, the biggest reform to the UK welfare state since the 1940s, is widely regarded as a disaster because of its failure to pay claimants properly. It has also wasted £837 million (US$1.1 billion) on developing one component of its digital system that was swiftly decommissioned. Canada’s Phoenix pay system, introduced in 2016 to overhaul the federal government’s payroll process, has remunerated 62% of employees incorrectly in each fiscal year since its launch. And My Health Record, Australia’s digital health-records system, saw more than 2.5 million people opt out by the end of January this year over privacy, security and efficacy concerns — roughly 1 in 10 of those who were eligible.

Such failures matter. Technological innovation is essential for the state to maintain its position of authority in a data-intensive world. The digital realm is where citizens live and work, shop and play, meet and fight. Prices for goods are increasingly set by software. Work is mediated through online platforms such as Uber and Deliveroo. Voters receive targeted information — and disinformation — through social media.

Thus the core tasks of governments, such as enforcing regulation, setting employment rights and ensuring fair elections, require an understanding of data and algorithms. Here we highlight the main priorities, drawn from our experience of working with policymakers at The Alan Turing Institute in London….(More)”.

Innovation Meets Citizen Science


Caroline Nickerson at SciStarter: “Citizen science has been around as long as science, but innovative approaches are opening doors to more and deeper forms of public participation.

Below, our editors spotlight a few projects that feature new approaches, novel research, or low-cost instruments. …

Colony B: Unravel the secrets of microscopic life! Colony B is a mobile gaming app developed at McGill University that enables you to contribute to research on microbes. Collect microbes and grow your colony in a fast-paced puzzle game that advances important scientific research.

AirCasting: AirCasting is an open-source, end-to-end solution for collecting, displaying, and sharing health and environmental data using your smartphone. The platform consists of wearable sensors, including a palm-sized air quality monitor called the AirBeam, that detect and report changes in your environment. (Android only.)

LingoBoingo: Getting computers to understand language requires large amounts of linguistic data and “correct” answers to language tasks (what researchers call “gold standard annotations”). Simply by playing language games online, you can help archive languages and create the linguistic data used by researchers to improve language technologies. These games are in English, French, and a new “multi-lingual” category.

TreeSnap: Help our nation’s trees and protect human health in the process. Invasive diseases and pests threaten the health of America’s forests. With the TreeSnap app, you can record the location and health of particular tree species–those unharmed by diseases that have wiped out other species. Scientists then use the collected information to locate candidates for genetic sequencing and breeding programs. Tag trees you find in your community, on your property, or out in the wild to help scientists understand forest health….(More)”.

This tech tells cities when floods are coming–and what they will destroy


Ben Paynter at FastCompany: “Several years ago, one of the eventual founders of One Concern nearly died in a tragic flood. Today, the company specializes in using artificial intelligence to predict how natural disasters are unfolding in real time on a city-block-level basis, in order to help disaster responders save as many lives as possible….

To fix that, One Concern debuted Flood Concern in late 2018. It creates map-based visualizations of where water surges may hit hardest, up to five days ahead of an impending storm. For cities, that includes not just time-lapse breakdowns of how the water will rise, how fast it could move, and what direction it will be flowing, but also what structures will get swamped or washed away, and how differing mitigation efforts–from levee building to dam releases–will impact each scenario. It’s the winner of Fast Company’s 2019 World Changing Ideas Awards in the AI and Data category.

So far, Flood Concern has been retroactively tested against events like Hurricane Harvey to show that it could have predicted what areas would be most impacted well ahead of the storm. The company, which was founded in Silicon Valley in 2015, started with one of that region’s pressing threats: earthquakes. It’s since earned contracts with cities like San Francisco, Los Angeles, and Cupertino, as well as private insurance companies….

One Concern’s first offering, dubbed Seismic Concern, takes existing information from satellite images and building permits to figure out what kind of ground structures are built on, and what might happen if they started shaking. If a big one hits, the program can extrapolate from the epicenter to suggest the likeliest places for destruction, and then adjust as more data from things like 911 calls and social media gets factored in….(More)”.


Does increased ‘participation’ equal a new-found enthusiasm for democracy?


Blog by Stephen King and Paige Nicol: “With a few months under our belts, 2019 looks unlikely to be the year of a great global turnaround for democracy. The decade of democratic ‘recession’ that Larry Diamond declared in 2015 has dragged on and deepened, and may now be teetering on the edge of becoming a full-blown depression. 

The start of each calendar year is marked by the release of annual indices, rankings, and reports on how democracy is faring around the world. 2018 reports from Freedom House and the Economist Intelligence Unit (EIU) highlighted precipitous declines in civil liberties in long-standing democracies as well as authoritarian states. Some groups, including migrants, women, ethnic and other minorities, opposition politicians, and journalists have been particularly affected by these setbacks. According to the Committee to Protect Journalists, the number of journalists murdered nearly doubled last year, while the number imprisoned remained above 250 for the third consecutive year. 

Yet, the EIU also found a considerable increase in political participation worldwide. Levels of participation (including voting, protesting, and running for elected office, among other dimensions) increased substantially enough last year to offset falling scores in the other four categories of the index. Based on the methodology used, the rise in political participation was significant enough to prevent a decline in the global overall score for democracy for the first time in three years.

Though this development could give cause for optimism, we believe it could also raise new concerns.

In Zimbabwe, Sudan, and Venezuela we see people who, through desperation and frustration, have taken to the streets – a form of participation which has been met with brutal crackdowns. Time has yet to tell what the ultimate outcome of these protests will be, but it is clear that governments with autocratic tendencies have more – and cheaper – tools to monitor, direct, control, and suppress participation than ever before. 

Elsewhere, we see a danger of people becoming dislocated and disenchanted with democracy, as their representatives fail to take meaningful action on the issues that matter to them. In the UK Parliament, as Brexit discussions have become increasingly polarised and fractured along party political and ideological lines, Foreign Secretary Jeremy Hunt warned that there was a threat of social unrest if Parliament was seen to be frustrating the ‘will of the people.’ 

While we see enhanced participation as crucial to just and fair societies, it alone will not be the silver bullet that saves democracy. Whether this trend becomes a cause for hope or concern will depend on three factors: who is participating, what form does participation take, and how is participation received by those with power?…(More)”.