Why bad times call for good data


Tim Harford in the Financial Times: “Watching the Ever Given wedge itself across the Suez Canal, it would have taken a heart of stone not to laugh. But it was yet another unpleasant reminder that the unseen gears in our global economy can all too easily grind or stick.

From the shutdown of Texas’s plastic polymer manufacturing to a threat to vaccine production from a shortage of giant plastic bags, we keep finding out the hard way that modern life relies on weak links in surprising places.

So where else is infrastructure fragile and taken for granted? I worry about statistical infrastructure — the standards and systems we rely on to collect, store and analyse our data.

Statistical infrastructure sounds less important than a bridge or a power line, but it can mean the difference between life and death for millions. Consider Recovery (Randomised Evaluations of Covid-19 Therapy). Set up in a matter of days by two Oxford academics, Martin Landray and Peter Horby, over the past year Recovery has enlisted hospitals across the UK to run randomised trials of treatments such as the antimalarial drug hydroxychloroquine and the cheap steroid dexamethasone.

With minimal expense and paperwork, it turned the guesses of physicians into simple but rigorous clinical trials. The project quickly found that dexamethasone was highly effective as a treatment for severe Covid-19, thereby saving a million lives.

Recovery relied on data accumulated as hospitals treated patients and updated their records. It wasn’t always easy to reconcile the different sources — some patients were dead according to one database and alive on another. But such data problems are solvable and were solved. A modest amount of forethought about collecting the right data in the right way has produced enormous benefits….

But it isn’t just poor countries that have suffered. In the US, data about Covid-19 testing was collected haphazardly by states. This left the federal government flying blind, unable to see where and how quickly the virus was spreading. Eventually volunteers, led by the journalists Robinson Meyer and Alexis Madrigal of the Covid Tracking Project, put together a serviceable data dashboard. “We have come to see the government’s initial failure here as the fault on which the entire catastrophe pivots,” wrote Meyer and Madrigal in The Atlantic. They are right.

What is more striking is that the weakness was there in plain sight. Madrigal recently told me that the government’s plan for dealing with a pandemic assumed that good data would be available — but did not build the systems to create them. It is hard to imagine a starker example of taking good statistical infrastructure for granted….(More)”.

Tech tools help deepen citizen input in drafting laws abroad and in U.S. states


Gopal Ratnam at Roll Call: “Earlier this month, New Jersey’s Department of Education launched a citizen engagement process asking students, teachers and parents to vote on ideas for changes that officials should consider as the state reopens its schools after the pandemic closed classrooms for a year.

The project, managed by The Governance Lab at New York University’s Tandon School of Engineering, is part of a monthlong nationwide effort using an online survey tool called All Our Ideas to help state education officials prioritize policymaking based on ideas solicited from those who are directly affected by the policies.

Among the thousands of votes cast for various ideas nationwide, teachers and parents backed changes that would teach more problem-solving skills to kids. But students backed a different idea as the most important: making sure that kids have social and emotional skills, as well as “self-awareness and empathy.” 

A government body soliciting ideas from those who are directly affected, via online technology, is one small example of greater citizen participation in governance that advocates hope can grow at both state and federal levels….

Taiwan has taken crowdsourcing legislative ideas to new heights.

Using a variety of open-source engagement and consultation tools that are collectively known as the vTaiwan process, government ministries, elected representatives, experts, civil society groups, businesses and ordinary citizens come together to produce legislation. 

The need for an open consultation process stemmed from the 2014 Sunflower Student Movement, when groups of students and others occupied the Taiwanese parliament to protest the fast-tracking of a trade agreement with China with little public review.  

After the country’s parliament acceded to the demands, the “consensus opinion was that instead of people having to occupy the parliament every time there’s a controversial, emergent issue, it might actually work better if we have a consultation mechanism in the very beginning of the issue rather than at the end,” said Audrey Tang, Taiwan’s digital minister. …

At about the same time that Taiwan’s Sunflower movement was unfolding, in Brazil then-President Dilma Rousseff signed into law the country’s internet bill of rights in April 2014. 

The bill was drafted and refined through a consultative process that included not only legal and technical experts but average citizens as well, said Debora Albu, program coordinator at the Institute for Technology and Society of Rio, also known as ITS. 

The institute was involved in designing the platform for seeking public participation, Albu said. 

“From then onwards, we wanted to continue developing projects that incorporated this idea of collective intelligence built into the development of legislation or public policies,” Albu said….(More)”.

We’re Beating Systems Change to Death


Essay by Kevin Starr: “Systems change! Just saying the words aloud makes me feel like one of the cognoscenti, one of the elite who has transcended the ways of old-school philanthropy. Those two words capture our aspirations of lasting impact at scale: systems are big, and if you manage to change them, they’ll keep spinning out impact forever. Why would you want to do anything else?

There’s a problem, though. “Systems analysis” is an elegant and useful way to think about problems and get ideas for solutions, but “systems change” is accelerating toward buzzword purgatory. It’s so sexy that everyone wants to use it for everything. …

But when you rummage through the growing literature on systems change thinking, there are in fact a few recurring themes. One is the need to tackle the root causes of any problem you take on. Another is that a broad coalition must be assembled ASAP. Finally, the most salient theme is the notion that the systems involved are transformed as a result of the work (although in many of the examples I read about, it’s not articulated clearly just what system is being changed).

Taken individually or as a whole, these themes point to some of the ways in which systems change is a less-than-ideal paradigm for the work we need to get done:

1. It’s too hard to know to what degree systems change is or isn’t happening. It may be the case that “not everything that matters can be counted,” but most of the stuff that matters can, and it’s hard to get better at something if you’re unable to measure it. But these words of a so-called expert on systems change measurement are typical of what I’ve seen in the literature: “Measuring systems change is about detecting patterns in the connections between the parts. It is about qualitative changes in the structure of the system, about its adaptiveness and resilience, about synergies emerging from collective efforts—and more…”

Like I said, it’s too hard to know what is or isn’t happening.

2. “Root cause” thinking can—paradoxically—bog down progress. “Root cause” analysis is a common feature of most systems change discussions, and it’s a wonderful tool to generate ideas and avoid unintended consequences. However, broad efforts to tackle all of a problem’s root causes can turn anything into a complicated, hard-to-replicate project. It can also make things look so overwhelming as to result in a kind of paralysis. And however successful a systems change effort might be, that complication makes it hard to replicate, and you’re often stuck with a one-off project….(More)”.

Data Is Power: Washington Needs to Craft New Rules for the Digital Age


Matthew Slaughter and David McCormick at Foreign Affairs: “…Working with all willing and like-minded nations, it should seek a structure for data that maximizes its immense economic potential without sacrificing privacy and individual liberty. This framework should take the form of a treaty that has two main parts.

First would be a set of binding principles that would foster the cross-border flow of data in the most data-intensive sectors—such as energy, transportation, and health care. One set of principles concerns how to value data and determine where it was generated. Just as traditional trade regimes require goods and services to be priced and their origins defined, so, too, must this framework create a taxonomy to classify data flows by value and source. Another set of principles would set forth the privacy standards that governments and companies would have to follow to use data. (Anonymizing data, made easier by advances in encryption and quantum computing, will be critical to this step.) A final principle, which would be conditional on achieving the other two, would be to promote as much cross-border and open flow of data as possible. Consistent with the long-established value of free trade, the parties should, for example, agree to not levy taxes on data flows—and diligently enforce that rule. And they would be wise to ensure that any negative impacts of open data flows, such as job losses or reduced wages, are offset through strong programs to help affected workers adapt to the digital economy.

Such standards would benefit every sector they applied to. Envision, for example, dozens of nations with data-sharing arrangements for autonomous vehicles, oncology treatments, and clean-tech batteries. Relative to their experience in today’s Balkanized world, researchers would be able to discover more data-driven innovations—and in more countries, rather than just in those that already have a large presence in these industries.

The second part of the framework would be free-trade agreements regulating the capital goods, intermediate inputs, and final goods and services of the targeted sectors, all in an effort to maximize the gains that might arise from data-driven innovations. Thus would the traditional forces of comparative advantage and global competition help bring new self-driving vehicles, new lifesaving chemotherapy compounds, and new sources of renewable energy to participating countries around the world. 

There is already a powerful example of such agreements. In 1996, dozens of countries accounting for nearly 95 percent of world trade in information technology ratified the Information Technology Agreement, a multilateral trade deal under the WTO. The agreement ultimately eliminated all tariffs for hundreds of IT-related capital goods, intermediate inputs, and final products—from machine tools to motherboards to personal computers. The agreement proved to be an important impetus for the subsequent wave of the IT revolution, a competitive spur that led to productivity gains for firms and price declines for consumers….(More)”.

Citizen science is booming during the pandemic


Sigal Samuel at Vox: “…The pandemic has driven a huge increase in participation in citizen science, where people without specialized training collect data out in the world or perform simple analyses of data online to help out scientists.

Stuck at home with time on their hands, millions of amateurs around the world are gathering information on everything from birds to plants to Covid-19 at the request of institutional researchers. And while quarantine is mostly a nightmare for us, it’s been a great accelerant for science.

Early in the pandemic, a firehose of data started gushing forth on citizen science platforms like Zooniverse and SciStarter, where scientists ask the public to analyze their data online. It’s a form of crowdsourcing that has the added bonus of giving volunteers a real sense of community; each project has a discussion forum where participants can pose questions to each other (and often to the scientists behind the projects) and forge friendly connections.

“There’s a wonderful project called Rainfall Rescue that’s transcribing historical weather records. It’s a climate change project to understand how weather has changed over the past few centuries,” Laura Trouille, vice president of citizen science at the Adler Planetarium in Chicago and co-lead of Zooniverse, told me. “They uploaded a dataset of 10,000 weather logs that needed transcribing — and that was completed in one day!”

Some Zooniverse projects, like Snapshot Safari, ask participants to classify animals in images from wildlife cameras. That project saw classifications go from 25,000 to 200,000 per day in the initial days of lockdown. And across all its projects, Zooniverse reported that 200,000 participants contributed more than 5 million classifications of images in one week alone — the equivalent of 48 years of research. Although participation has slowed a bit since the spring, it’s still four times what it was pre-pandemic.

Many people are particularly eager to help tackle Covid-19, and scientists have harnessed their energy. Carnegie Mellon University’s Roni Rosenfeld set up a platform where volunteers can help artificial intelligence predict the spread of the coronavirus, even if they know nothing about AI. Researchers at the University of Washington invited people to contribute to Covid-19 drug discovery using a computer game called Foldit; they experimented with designing proteins that could attach to the virus that causes Covid-19 and prevent it from entering cells….(More)”.

How spooks are turning to superforecasting in the Cosmic Bazaar


The Economist: “Every morning for the past year, a group of British civil servants, diplomats, police officers and spies have woken up, logged onto a slick website and offered their best guess as to whether China will invade Taiwan by a particular date. Or whether Arctic sea ice will retrench by a certain amount. Or how far covid-19 infection rates will fall. These imponderables are part of Cosmic Bazaar, a forecasting tournament created by the British government to improve its intelligence analysis.

Since the website was launched in April 2020, more than 10,000 forecasts have been made by 1,300 forecasters, from 41 government departments and several allied countries. The site has around 200 regular forecasters, who must use only publicly available information to tackle the 30-40 questions that are live at any time. Cosmic Bazaar represents the gamification of intelligence. Users are ranked by a single, brutally simple measure: the accuracy of their predictions.
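The article does not say how that accuracy is computed; a common scoring rule for yes/no forecasts in tournaments of this kind is the Brier score, sketched below with hypothetical probabilities and outcomes (nothing here reflects Cosmic Bazaar’s actual scoring rule).

```python
# A minimal sketch of Brier scoring for yes/no forecasts. The Brier score is a
# standard choice for forecasting tournaments; the article does not specify
# Cosmic Bazaar's metric, and all numbers below are invented.

def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    0.0 is a perfect score; an always-50% forecaster scores 0.25.
    """
    assert len(forecasts) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical forecaster: probabilities assigned to three resolved questions.
forecasts = [0.8, 0.3, 0.9]   # e.g. "Will infection rates fall below X by date Y?"
outcomes  = [1, 0, 1]         # 1 = the event happened, 0 = it did not

print(brier_score(forecasts, outcomes))  # ~0.047 -> lower is better
```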

Forecasting tournaments like Cosmic Bazaar draw on a handful of basic ideas. One of them, as seen in this case, is the “wisdom of crowds”, a concept first illustrated by Francis Galton, a statistician, in 1907. Galton observed that in a contest to estimate the weight of an ox at a county fair, the median guess of nearly 800 people was accurate within 1% of the true figure.
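A toy simulation, with invented numbers rather than Galton’s actual data, shows why aggregating many noisy, independent guesses with the median tends to land near the truth even when individual guesses are far off:

```python
# Toy illustration of the "wisdom of crowds": the median of many independent,
# noisy guesses is much closer to the truth than a typical individual guess.
# Numbers are made up, not Galton's 1907 data.
import random
import statistics

random.seed(0)
TRUE_WEIGHT = 1200                      # hypothetical ox weight in pounds

# ~800 fair-goers guess independently, each with substantial individual error.
guesses = [random.gauss(TRUE_WEIGHT, 0.15 * TRUE_WEIGHT) for _ in range(800)]

crowd_estimate = statistics.median(guesses)
typical_individual_error = statistics.median(abs(g - TRUE_WEIGHT) for g in guesses)

print(f"crowd estimate: {crowd_estimate:.0f} lb "
      f"({abs(crowd_estimate - TRUE_WEIGHT) / TRUE_WEIGHT:.1%} off the true weight)")
print(f"a typical individual guess is {typical_individual_error:.0f} lb off")
```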

Crowdsourcing, as this idea is now called, has been augmented by more recent research into whether and how people make good judgments. Experiments by Philip Tetlock of the University of Pennsylvania, and others, show that experts’ predictions are often no better than chance. Yet some people, dubbed “superforecasters”, often do make accurate predictions, largely because of the way they form judgments—such as having a commitment to revising predictions in light of new data, and being aware of typical human biases. Dr Tetlock’s ideas received publicity last year when Dominic Cummings, then an adviser to Boris Johnson, Britain’s prime minister, endorsed his book and hired a controversial superforecaster to work at Mr Johnson’s office in Downing Street….(More)”.

‘Master,’ ‘Slave’ and the Fight Over Offensive Terms in Computing


Kate Conger at the New York Times: “Anyone who joined a video call during the pandemic probably has a global volunteer organization called the Internet Engineering Task Force to thank for making the technology work.

The group, which helped create the technical foundations of the internet, designed the language that allows most video to run smoothly online. It made it possible for someone with a Gmail account to communicate with a friend who uses Yahoo, and for shoppers to safely enter their credit card information on e-commerce sites.

Now the organization is tackling an even thornier issue: getting rid of computer engineering terms that evoke racist history, like “master” and “slave” and “whitelist” and “blacklist.”

But what started as an earnest proposal has stalled as members of the task force have debated the history of slavery and the prevalence of racism in tech. Some companies and tech organizations have forged ahead anyway, raising the possibility that important technical terms will have different meanings to different people — a troubling proposition for an engineering world that needs broad agreement so technologies work together.

While the fight over terminology reflects the intractability of racial issues in society, it is also indicative of a peculiar organizational culture that relies on informal consensus to get things done.

The Internet Engineering Task Force eschews voting, and it often measures consensus by asking opposing factions of engineers to hum during meetings. The hums are then assessed by volume and ferocity. Vigorous humming, even from only a few people, could indicate strong disagreement, a sign that consensus has not yet been reached…(More)”.

Vancouver launches health data dashboard to drive collective action


Sarah Wray at Cities Today: “Vancouver has published a new open data dashboard to track progress against 23 health and wellbeing indicators.

These include datasets on the number of children living below the poverty line, the number of households spending more than 30 percent of their income on housing, and the proportion of adults who have a sense of community belonging. As well as the most recent data for each indicator, the dashboard includes target figures and the current status of the city’s progress towards that goal…
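As a rough illustration of how one such indicator is derived and compared against its target (hypothetical column names, figures and target, not Vancouver’s actual data or schema), the housing-cost-burden measure might be computed like this:

```python
# Hypothetical sketch of one dashboard indicator: the share of households
# spending more than 30% of income on housing, compared against a target.
# Column names, figures and the target are illustrative only.
import pandas as pd

households = pd.DataFrame({
    "household_id": [1, 2, 3, 4, 5],
    "annual_income": [42_000, 95_000, 61_000, 28_000, 73_000],
    "annual_housing_cost": [18_000, 21_000, 15_000, 13_000, 20_000],
})

households["cost_burdened"] = (
    households["annual_housing_cost"] / households["annual_income"] > 0.30
)

indicator = households["cost_burdened"].mean()   # share of households
TARGET = 0.25                                    # hypothetical target value
status = "on track" if indicator <= TARGET else "needs attention"

print(f"{indicator:.0%} of households are cost-burdened ({status}, target {TARGET:.0%})")
```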

The launch represents the first phase of the project and there are plans to expand the dashboard to include additional indicators, as well as neighbourhood-level and disaggregated data for different populations. The city is also working with Indigenous communities to identify more decolonised ways of collecting and analysing the data.

A report published last year by British Columbia’s Office of the Human Rights Commissioner called for provincial governments to collect and use disaggregated demographic and race-based data to address systemic racism and inequities. It emphasised that the process must include the community.

“One important piece that we’re still working on is data governance,” Zak said. “As we publish more disaggregated data that shows which communities in Vancouver are most impacted by health inequities, we need to do it in a way that is not just the local government telling stories about a community, but instead is telling a story with the community that leads to policy change.”…

Technical and financial support for the dashboard was provided by the Partnership for Healthy Cities, a global network of cities for preventing noncommunicable diseases and injuries. The partnership is supported by Bloomberg Philanthropies in partnership with the World Health Organization and the public health organisation Vital Strategies….(More)”.

Data Brokers Are a Threat to Democracy


Justin Sherman at Wired: “Enter the data brokerage industry, the multibillion dollar economy of selling consumers’ and citizens’ intimate details. Much of the privacy discourse has rightly pointed fingers at Facebook, Twitter, YouTube, and TikTok, which collect users’ information directly. But a far broader ecosystem of buying up, licensing, selling, and sharing data exists around those platforms. Data brokerage firms are middlemen of surveillance capitalism—purchasing, aggregating, and repackaging data from a variety of other companies, all with the aim of selling or further distributing it.

Data brokerage is a threat to democracy. Without robust national privacy safeguards, entire databases of citizen information are ready for purchase, whether to predatory loan companies, law enforcement agencies, or even malicious foreign actors. Federal privacy bills that don’t give sufficient attention to data brokerage will therefore fail to tackle an enormous portion of the data surveillance economy, and will leave civil rights, national security, and public-private boundaries vulnerable in the process.

Large data brokers—like Acxiom, CoreLogic, and Epsilon—tout the detail of their data on millions or even billions of people. CoreLogic, for instance, advertises its real estate and property information on 99.9 percent of the US population. Acxiom promotes 11,000-plus “data attributes,” from auto loan information to travel preferences, on 2.5 billion people (all to help brands connect with people “ethically,” it adds). This level of data collection and aggregation enables remarkably specific profiling.

Need to run ads targeting poor families in rural areas? Check out one data broker’s “Rural and Barely Making It” data set. Or how about racially profiling financial vulnerability? Buy another company’s “Ethnic Second-City Strugglers” data set. These are just some of the disturbing titles captured in a 2013 Senate report on the industry’s data products, which have only expanded since. Many other brokers advertise their ability to identify subgroups upon subgroups of individuals through criteria like race, gender, marital status, and income level, all sensitive characteristics that citizens likely didn’t know would end up in a database—let alone up for sale….(More)”.

How we mapped billions of trees in West Africa using satellites, supercomputers and AI


Martin Brandt and Kjeld Rasmussen in The Conversation: “The possibility that vegetation cover in semi-arid and arid areas was retreating has long been an issue of international concern. In the 1930s it was first theorized that the Sahara was expanding and woody vegetation was on the retreat. In the 1970s, spurred by the “Sahel drought”, focus was on the threat of “desertification”, caused by human overuse and/or climate change. In recent decades, the potential impact of climate change on the vegetation has been the main concern, along with the feedback of vegetation on the climate, associated with the role of the vegetation in the global carbon cycle.

Using high-resolution satellite data and machine-learning techniques at supercomputing facilities, we have now been able to map billions of individual trees and shrubs in West Africa. The goal is to better understand the real state of vegetation coverage and evolution in arid and semi-arid areas.

Finding a shrub in the desert – from space

Since the 1970s, satellite data have been used extensively to map and monitor vegetation in semi-arid areas worldwide. Images are available in “high” spatial resolution (with NASA’s satellites Landsat MSS and TM, and ESA’s satellites Spot and Sentinel) and “medium or low” spatial resolution (NOAA AVHRR and MODIS).

To accurately analyse vegetation cover at continental or global scale, it is necessary to use the highest-resolution images available – with a resolution of 1 metre or less – and up until now the costs of acquiring and analysing the data have been prohibitive. Consequently, most studies have relied on moderate- to low-resolution data. This has not allowed for the identification of individual trees, and therefore these studies only yield aggregate estimates of vegetation cover and productivity, mixing herbaceous and woody vegetation.

In a new study covering a large part of the semi-arid Sahara-Sahel-Sudanian zone of West Africa, published in Nature in October 2020, an international group of researchers was able to overcome these limitations. By combining an immense amount of high-resolution satellite data, advanced computing capacities, machine-learning techniques and extensive field data gathered over decades, we were able to identify individual trees and shrubs with a crown area of more than 3 m² with great accuracy. The result is a database of 1.8 billion trees in the region studied, available to all interested….(More)”
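The counting step downstream of the machine-learning model can be sketched simply: once a segmentation model has produced a binary crown mask, individual trees are connected pixel regions whose area clears the 3 m² threshold. The sketch below assumes a 0.5 m pixel size and a precomputed mask; it illustrates the idea rather than the authors’ published pipeline.

```python
# Minimal sketch of the counting step: given a binary tree-crown mask from a
# segmentation model, label connected components and keep crowns larger than
# 3 m². Pixel size and the mask are assumptions for illustration; this is not
# the published pipeline.
import numpy as np
from scipy import ndimage

PIXEL_SIZE_M = 0.5                      # assumed ground resolution (metres/pixel)
PIXEL_AREA_M2 = PIXEL_SIZE_M ** 2       # 0.25 m² per pixel
MIN_CROWN_M2 = 3.0                      # crown-area threshold from the study

def count_crowns(mask: np.ndarray) -> int:
    """Count connected crown regions with area >= MIN_CROWN_M2."""
    labels, n = ndimage.label(mask)                 # 1..n component labels
    if n == 0:
        return 0
    sizes = np.bincount(labels.ravel())[1:]         # pixels per component, skip background
    areas_m2 = sizes * PIXEL_AREA_M2
    return int((areas_m2 >= MIN_CROWN_M2).sum())

# Tiny synthetic mask: one large crown (16 px = 4 m²) and one speck (2 px = 0.5 m²).
mask = np.zeros((12, 12), dtype=bool)
mask[2:6, 2:6] = True
mask[9, 9:11] = True
print(count_crowns(mask))               # -> 1 (only the large crown passes)
```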

Supercomputing, machine learning, satellite data and field assessments made it possible to map billions of individual trees in West Africa. Martin Brandt, Author provided