Control Creep: When the Data Always Travels, So Do the Harms


Essay by Sun-ha Hong: “In 2014, a Canadian firm made history. Calgary-based McLeod Law brought the first known case in which Fitbit data would be used to support a legal claim. The device’s loyalty was clear: the young woman’s personal injury claim would be supported by her own Fitbit data, which would help prove that her activity levels had dipped post-injury. Yet the case had opened up a wider horizon for data use, both for and against the owners of such devices. Leading artificial intelligence (AI) researcher Kate Crawford noted at the time that the machines we use for “self-tracking” may be opening up a “new age of quantified self incrimination.”

Subsequent cases have demonstrated some of those possibilities. In 2015, a Connecticut man reported that his wife had been murdered by a masked intruder. Based partly on the victim’s Fitbit data, along with data from other devices such as the family house alarm, detectives charged the man — not a masked intruder — with the crime. In 2016, a Pennsylvania woman claimed she was sexually assaulted, but police argued that the woman’s own Fitbit data suggested otherwise, and charged her with false reporting. In the courts and elsewhere, data initially gathered for self-tracking is increasingly being used to contradict or overrule the self — despite academic research and even a class action lawsuit alleging high rates of error in Fitbit data.

The data always travels, creating new possibilities for judging and predicting human lives. We might call it control creep: data-driven technologies tend to be pitched for a particular context and purpose, but quickly expand into new forms of control. Although we often think about data use in terms of trade-offs or bargains, such frameworks can be deeply misleading. What does it mean to “trade” personal data for the convenience of, say, an Amazon Echo, when the other side of that trade is constantly arranging new ways to sell and use that data in ways we cannot anticipate? As technology scholars Jake Goldenfein, Ben Green and Salomé Viljoen argue, the familiar trade-off of “privacy vs. X” rarely results in full respect for both values but instead tends to normalize a further stripping of privacy….(More)”.

Socially Responsible Data Labeling


Blog by Hamed Alemohammad at Radiant Earth Foundation: “Labeling satellite imagery is the process of applying tags to scenes to provide context or confirm information. These labeled training datasets form the basis for machine learning (ML) algorithms. The labeling undertaking (in many cases) requires humans to meticulously and manually assign captions to the data, allowing the model to learn patterns and generalize them to other observations.

For a wide range of Earth observation applications, training data labels can be generated by annotating satellite imagery. Images can be classified at the scene level, assigning the entire image to a class (e.g., water body), or at the object level, labeling specific features within the satellite image. However, annotation tasks can only identify features observable in the imagery. For example, with Sentinel-2 imagery at 10-meter spatial resolution, one cannot detect more detailed features of interest, such as crop types, but one would be able to distinguish large croplands from other land cover classes.

Human error in labeling is inevitable and results in uncertainties and errors in the final label. As a result, it’s best practice to examine images multiple times and then assign a majority or consensus label. In general, significant human resources and financial investment are needed to annotate imagery at large scales.
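
To make that consensus step concrete, here is a minimal sketch of majority-vote label aggregation (the labels, threshold, and review fallback are illustrative assumptions, not Radiant Earth’s actual pipeline):

```python
from collections import Counter

def consensus_label(annotations, min_agreement=0.5):
    """Return the majority label for one image, or None when no label
    wins more than `min_agreement` of the votes."""
    votes = Counter(annotations)
    label, count = votes.most_common(1)[0]
    if count / len(annotations) > min_agreement:
        return label
    return None  # no consensus: flag the image for expert review

# Three annotators labeled the same scene (hypothetical data)
print(consensus_label(["cropland", "cropland", "water body"]))  # -> cropland
print(consensus_label(["cropland", "water body"]))              # -> None (tie)
```

Images that fail to reach a consensus can then be routed to an expert for validation, one common way of containing the labeling errors described above.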

In 2018, we identified the need for a geographically diverse land cover classification training dataset that required human annotation and validation of labels. We proposed to Schmidt Futures a project to generate such a dataset to advance land cover classification globally. In this blog post, we discuss what we’ve learned developing LandCoverNet, including the keys to generating good quality labels in a socially responsible manner….(More)”.

A Resurgence of Democracy in 2040?


Blog by Steven Aftergood: “The world will be “increasingly out of balance and contested at every level” over the next twenty years due to the pressures of demographic, environmental, economic and technological change, a new forecast from the National Intelligence Council called Global Trends 2040 said last week.

But among the mostly grim possible futures that can be plausibly anticipated — international chaos, political paralysis, resource depletion, mounting poverty — one optimistic scenario stands out: “In 2040, the world is in the midst of a resurgence of open democracies led by the United States and its allies.”

How could such a global renaissance of democracy possibly come about?

The report posits that between now and 2040 technological innovation in open societies will lead to economic growth, which will enable solutions to domestic problems, build public confidence, reduce vulnerabilities and establish an attractive model for emulation by others. Transparency is both a precondition and a consequence of this process.

“Open, democratic systems proved better able to foster scientific research and technological innovation, catalyzing an economic boom. Strong economic growth, in turn, enabled democracies to meet many domestic needs, address global challenges, and counter rivals,” the report assessed in this potential scenario.

“With greater resources and improving services, these democracies launched initiatives to crack down on corruption, increase transparency, and improve accountability worldwide, boosting public trust. These efforts helped to reverse years of social fragmentation and to restore a sense of civic nationalism.”

“The combination of rapid innovation, a stronger economy, and greater societal cohesion enabled steady progress on climate and other challenges. Democratic societies became more resilient to disinformation because of greater public awareness and education initiatives and new technologies that quickly identify and debunk erroneous information. This environment restored a culture of vigorous but civil debate over values, goals, and policies.”

“Strong differences in public preferences and beliefs remained but these were worked out democratically.”

In this hopeful future, openness provided practical advantages that left closed authoritarian societies lagging behind.

“In contrast to the culture of collaboration prevailing in open societies, Russia and China failed to cultivate the high-tech talent, investment, and environment necessary to sustain continuous innovation.”

“By the mid-2030s, the United States and its allies in Europe and Asia were the established global leaders in several technologies, including AI, robotics, the Internet of Things, biotech, energy storage, and additive manufacturing.”

The success of open societies in problem solving, along with their economic and social improvements, inspired other countries to adopt the democratic model.

“Technological success fostered a widely perceived view among emerging and developing countries that democracies were more adaptable and resilient and better able to cope with growing global challenges.”…(More)”.

Combining Racial Groups in Data Analysis Can Mask Important Differences in Communities


Blog by Jonathan Schwabish and Alice Feng: “Surveys, datasets, and published research often lump together racial and ethnic groups, which can erase the experiences of certain communities. Combining groups with different experiences can mask how specific groups and communities are faring and, in turn, affect how government funds are distributed, how services are provided, and how groups are perceived.

Large surveys that collect data on race and ethnicity are used to disburse government funds and services in a number of ways. The US Department of Housing and Urban Development, for instance, distributes millions of dollars annually to Native American tribes through the Indian Housing Block Grant. And statistics on race and ethnicity are used as evidence in employment discrimination lawsuits and to help determine whether banks are discriminating against people and communities of color.

Despite the potentially large effects these data can have, researchers don’t always disaggregate their analysis into more detailed racial groups. Many point to small sample sizes as a limitation for including more race and ethnicity categories in their analysis, but efforts to gather more specific data and disaggregate available survey results are critical to creating better policy for everyone.

To illustrate how aggregating racial groups can mask important variation, we looked at the 2019 poverty rate across 139 detailed race categories in the Census Bureau’s annual American Community Survey (ACS). The ACS provides information that helps determine how more than $675 billion in government funds is distributed each year.

The official poverty rate in the United States stood at 10.5 percent in 2019, with significant variation across racial and ethnic groups. The primary question in the ACS concerning race includes 15 separate checkboxes, with space to print additional names or races for some options (a separate question refers to Hispanic or Latino origin).

[Image: Screenshot of the American Community Survey’s race question]

Although the survey offers ample latitude for interviewees to respond with their race, researchers have a tendency to aggregate racial categories. People who identify as Asian or Pacific Islander (API), for example, are often combined in economic analyses.

This aggregation can mask variation within racial or ethnic categories. As an example, one analysis that used the ACS showed 11 percent of children in the API group are in poverty, relative to 18 percent of the overall population. But that estimate could understate the poverty rate among children who identify as Pacific Islanders and could overstate the poverty rate among children who identify as Asian, which itself is a broad grouping that encompasses many different communities with various experiences. Similar aggregating can be found across economic literature, including on education, immigration (PDF), and wealth….(More)”.
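
A toy calculation (with made-up numbers, not actual ACS estimates) shows how a pooled rate can sit close to the larger subgroup’s rate and hide the smaller one’s:

```python
# Hypothetical child-poverty rates and population counts, NOT actual ACS figures
subgroups = {
    "Asian":            {"rate": 0.10, "children": 900_000},
    "Pacific Islander": {"rate": 0.25, "children": 100_000},
}

total_children = sum(g["children"] for g in subgroups.values())
pooled_rate = sum(g["rate"] * g["children"] for g in subgroups.values()) / total_children

print(f"Pooled 'API' rate: {pooled_rate:.1%}")  # 11.5% -- close to the larger group
for name, g in subgroups.items():
    print(f"{name}: {g['rate']:.1%}")           # 10.0% vs. 25.0%
```

Because the pooled figure is a population-weighted average, the larger subgroup dominates it; disaggregating is the only way to surface the smaller group’s experience.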

10 + 1 Guidelines for EU Citizen’s Assemblies


Blog post: “Over the past years, deliberative citizens’ assemblies selected by lot have increased their popularity and impact around the world. If introduced at European Union level and aimed at developing recommendations on EU policy issues, such first-ever transnational citizens’ assemblies would be groundbreaking in advancing EU democratic reform. The Citizens Take Over Europe coalition recognizes the political urgency and democratic potential of such innovations of EU governance. We therefore call for the introduction of European citizens’ assemblies as a regular and permanent body for popular policy deliberation. In order for EU-level citizens’ assemblies to work as an effective tool in further democratising EU decision-making, we have thoroughly examined preexisting exercises of deliberative democracy. The following 10 + 1 guidelines are based on best practices and lessons learned from national and local citizens’ assemblies across Europe. They have been designed in collaboration with leading experts. At present, these guidelines shall instruct the Conference on the Future of Europe on how to create the first experimental space for transnational citizens’ assemblies. But they are designed for future EU citizens’ assemblies as well.

1. Participatory prerequisites 

Strong participatory instruments are a prerequisite for a democratic citizens’ assembly. Composed as a microcosm of the EU population with people selected by lot, the assembly’s workings must be participatory and allow all members to have a say, with proper professional moderation during the deliberative rounds. The assembly must fit the EU participatory pillar and connect to the existing tools of EU participatory democracy, for instance by deliberating on successful European citizens’ initiatives.

The scope and structure of the citizens’ assembly should be designed in a participatory manner by the members of the assembly, starting with the first assembly meeting that will draft and adopt its rules of procedure and set its agenda.

Additional participatory instruments, such as the possibility to submit online proposals to the assembly on relevant topics, should be included in order to facilitate the engagement of all citizens. Information about opportunities to get involved and participate in the citizens’ assembly proceedings must be attractive and accessible to ordinary citizens….(More)”.
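
As a rough illustration of guideline 1’s “microcosm of the EU population … selected by lot”, the sketch below draws a stratified random sample from a registrant pool; the pool, strata, and seat quotas are hypothetical, and real civic lotteries stratify across many more dimensions (age, gender, education, geography):

```python
import random

rng = random.Random(0)  # seeded for a reproducible demo

# Hypothetical pool of registrants, stratified by country only for simplicity
pool = [(f"person_{i}", rng.choice(["DE", "FR", "PL", "EE"])) for i in range(1000)]
quotas = {"DE": 19, "FR": 15, "PL": 9, "EE": 1}  # seats roughly proportional to population

def draw_assembly(pool, quotas):
    """Fill each stratum's quota by random draw from that stratum."""
    chosen = []
    for stratum, seats in quotas.items():
        candidates = [pid for pid, s in pool if s == stratum]
        chosen += rng.sample(candidates, seats)
    return chosen

members = draw_assembly(pool, quotas)
print(len(members))  # 44 members, a small "microcosm" of the pool
```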

How We Built a Facebook Feed Viewer


Citizen Browser at The Markup: “Our interactive dashboard, Split Screen, gives readers a peek into the content Facebook delivered to people of different demographic backgrounds and voting preferences who participated in our Citizen Browser project.

Using Citizen Browser, our custom Facebook inspector, we perform daily captures of Facebook data from paid panelists. These captures collect the content that was displayed on their Facebook feeds at the moment the app performed its automated capture. From Dec. 1, 2020, to March 2, 2021, 2,601 paid participants contributed their data to the project.

To measure what Facebook’s recommendation algorithm displays to different groupings of people, we compare data captured from each grouping over a two-week period. We look at three different pairings:

  • Women vs. Men
  • Biden Voters vs. Trump Voters
  • Millennials vs. Boomers 

We labeled our panelists based on their self-disclosed political leanings, gender, and age. We describe each pairing in more detail in the Pairings section of this article. 

For each pair, we examine four types of content served by Facebook: news sources, posts with news links, hashtags, and group recommendations. We compare the percentage of each grouping that was served each piece of content to that of the other grouping in the pair.  
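
In the spirit of that comparison (a simplified sketch, not The Markup’s actual code), one can compute, for each content item, the share of panelists in each grouping whose captures contained it:

```python
# Hypothetical captures: {panelist_id: set of content items seen}, plus group labels
captures = {
    "p1": {"outlet_a", "group_rec_x"}, "p2": {"outlet_a"},
    "p3": {"outlet_b"}, "p4": {"outlet_a", "outlet_b"},
}
groups = {"p1": "biden", "p2": "biden", "p3": "trump", "p4": "trump"}

def share_served(item, group):
    """Fraction of panelists in `group` whose feed captures contained `item`."""
    members = [p for p, g in groups.items() if g == group]
    return sum(item in captures[p] for p in members) / len(members)

for item in ["outlet_a", "outlet_b", "group_rec_x"]:
    a, b = share_served(item, "biden"), share_served(item, "trump")
    print(f"{item}: Biden voters {a:.0%} vs. Trump voters {b:.0%}")
```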

For more information on the data we collect, the panel’s demographic makeup, and the extensive redaction process we undertake to preserve privacy, see our methodology, How We Built a Facebook Inspector.

Our observations should not be taken as proof that Facebook chooses to target specific content at specific demographic groups. There are many factors that influence any given person’s feed that we do not account for, including users’ friends and social networks….(More)”.

Financing the Digital Public Goods Ecosystem


Blog by the Digital Public Goods Alliance (DPGA): “… we believe that digital public goods (DPGs) are essential to unlocking the full potential of digital technologies to enhance human welfare at scale. Their relevance to one or multiple sustainable development goals (SDGs), combined with their adoptability and adaptability, allows DPGs to strengthen international digital cooperation. Stakeholders can join forces to support solutions that address many of today’s greatest global challenges in critical areas such as health, education and climate change. DPGs are of particular importance for resource constrained countries looking to accelerate development through improving access to digital services.

Still, precisely due to their nature as “public goods” – which ensures that no one can prevent others from benefiting from them – DPGs can be difficult to fund through market mechanisms, and some of them should not have to prioritise generating profit….

Sustainably funded infrastructural DPGs can become a reliable core for broader ecosystems through community building:

  • For the Modular Open Source Identity Platform (MOSIP), core code management and evolution is fully funded by grants from a group of philanthropic and bilateral donors. This enables the team responsible for managing and evolving the generic platform to focus exclusively on maximising utility for those the platform is designed to serve – in this case, countries in need of foundational digital identity systems.
  • Similarly backed by grant funding for core code development and maintenance, the team behind District Health Information Software 2 (DHIS2) has prioritised community building within and between the 70+ countries that have adopted the software, enabling countries to share improvements and related innovations. This is best exemplified by Sri Lanka, the first country in the world to use DHIS2 for COVID-19 surveillance, which shared this groundbreaking innovation with the global DHIS2 community. Today, this system is operational in 38 countries and is under development in 14 more.
  • The data exchange layer X-Road, which is publicly funded by NIIS members (currently Estonia and Finland), demonstrates how infrastructural DPGs can use community building to advance both the core technology and the quality of downstream deployments. The X-Road Community connects a diverse group of individuals and allows anyone to contribute to the open-source technology. This community-based support and knowledge-sharing helps local vendors around the world build the expertise needed to provide quality services to stakeholders adopting the technology….(More)”.

The speed of science


Essay by Saloni Dattani & Nathaniel Bechhofer: “The 21st century has seen some phenomenal advances in our ability to make scientific discoveries. Scientists have developed new technology to build vaccines swiftly, new algorithms to predict the structure of proteins accurately, new equipment to sequence DNA rapidly, and new engineering solutions to harvest energy efficiently. But in many fields of science, reliable knowledge and progress advance staggeringly slowly. What slows them down? And what can we learn from individual fields of science to pick up the pace across the board – without compromising on quality?

By and large, scientific research is published in journals in the form of papers – static documents that do not update with new data or new methods. Instead of sharing the data and the code that produces their results, most scientists simply publish a textual description of their research in online publications. These publications are usually hidden behind paywalls, making it harder for outsiders to verify their authenticity.

On the occasions when a reader spots a discrepancy in the data or an error in the methods, they must read the intricate details of a study’s method scrupulously and cross-check the statistics manually. When scientists don’t openly share the data behind their results, the task becomes even harder. The process of error correction – from scientists publishing a paper, to readers spotting errors, to having the paper corrected or retracted – can take years, assuming those errors are spotted at all.

When scientists reference previous research, they cite entire papers, not specific results or values from them. And although there is evidence that scientists hold back from citing papers once they have been retracted, the problem is compounded over time – consider, for example, a researcher who cites a study that itself derives its data or assumptions from prior research that has been disputed, corrected or retracted. The longer it takes to sift through the science, to identify which results are accurate, the longer it takes to gather an understanding of scientific knowledge.
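
To see how the problem compounds, consider a toy citation graph (hypothetical papers) in which one foundational study has been retracted; anything that builds on it, directly or indirectly, inherits the doubt:

```python
# Hypothetical citation graph: paper -> papers it builds on
builds_on = {
    "paper_D": ["paper_C"],
    "paper_C": ["paper_B"],
    "paper_B": ["paper_A"],
    "paper_A": [],
}
retracted = {"paper_A"}

def transitively_tainted(paper, builds_on, retracted, seen=None):
    """True if `paper` rests, directly or indirectly, on retracted work."""
    seen = seen or set()
    for dep in builds_on.get(paper, []):
        if dep in seen:
            continue
        seen.add(dep)
        if dep in retracted or transitively_tainted(dep, builds_on, retracted, seen):
            return True
    return False

print(transitively_tainted("paper_D", builds_on, retracted))  # True, via C and B
```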

What makes the problem even more challenging is that flaws in a study are not necessarily mathematical errors. In many situations, researchers make fairly arbitrary decisions as to how they collect their data, which methods they apply to analyse them, and which results they report – altogether leaving readers blind to the impact of these decisions on the results.

This murkiness can result in what is known as p-hacking: when researchers selectively apply arbitrary methods in order to achieve a particular result. For example, in a study that compares the well-being of overweight people to that of underweight people, researchers may find that certain cut-offs of weight (or certain subgroups in their sample) provide the result they’re looking for, while others don’t. And they may decide to only publish the particular methods that provided that result…(More)”.
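
A small simulation (illustrative assumptions only, using numpy and scipy) shows why this matters: even when weight has no true relationship to well-being, trying many arbitrary cut-offs and keeping the best result inflates the false-positive rate well beyond the nominal 5 percent:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
cutoffs = [60, 65, 70, 75, 80, 85, 90]  # arbitrary weight cut-offs, in kg
false_positives = 0
n_sims = 2000

for _ in range(n_sims):
    weight = rng.normal(75, 12, size=200)     # no true link to well-being
    wellbeing = rng.normal(50, 10, size=200)
    # The "researcher" tries every cut-off and keeps the smallest p-value
    pvals = []
    for c in cutoffs:
        heavy, light = wellbeing[weight >= c], wellbeing[weight < c]
        if len(heavy) > 1 and len(light) > 1:
            pvals.append(stats.ttest_ind(heavy, light).pvalue)
    if min(pvals) < 0.05:
        false_positives += 1

print(f"False-positive rate with cherry-picked cut-offs: {false_positives / n_sims:.1%}")
# Far above the 5% a single pre-registered test would give
```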

How can we measure productivity in the public sector?


Ravi Somani at the World Bank: “In most economies, the public sector is a major purchaser of goods, services and labor. According to the Worldwide Bureaucracy Indicators, globally the public sector accounts for around 25% of GDP and 38% of formal employment. Generating efficiency gains in the public sector can, therefore, have important implications for a country’s overall economic performance.  

Public-sector productivity measures the rate at which inputs are converted into desirable outputs in the public sector. Measures can be developed at the level of the employee, organization, or overall public sector, and can be tracked over time. Such information allows policymakers to identify good and bad performers, understand what might be correlated with good performance, and measure the returns to different types of public expenditures. This knowledge can be used to improve the allocation of public resources in the future and maximize the impact of the public purse.

But how can we measure it?

Measuring productivity in the public sector can be tricky because:

  • There are often no market transactions for public services, or they are distorted by subsidies and other market imperfections.
  • Many public services are complex, requiring (often immeasurable) inputs from multiple individuals and organizations.
  • There is often a substantial time lag between investments in inputs and the realization of outputs and outcomes.

This recent World Bank publication provides a summary of the different approaches to measuring productivity in the public sector, presented in the table below. For simplicity, the approaches are separated into ‘macro’ approaches, which provide aggregate information at the level of an organization, sector, or service as a whole, and ‘micro’ approaches, which can be applied to the individual employee, task, project, and process.
 

[Table: Macro and micro approaches to measuring public-sector productivity]

There is no silver bullet for accurately measuring public-sector productivity – each approach has its own limitations.  For example, the cost-weighted-output approach requires activity-level data, necessitates different approaches for different sectors, and results in metrics with difficult-to-interpret absolute levels.  Project-completion rates require access to project-level data and may not fully account for differences in the quality and complexity of projects. The publication includes a list of the pros, cons, and implementation requirements for each approach….(More)”.
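
For intuition, here is a minimal sketch of a cost-weighted-output calculation with hypothetical activities and numbers (the publication’s actual formulas may differ): each activity’s output growth is weighted by its share of total cost, and the result is compared with input growth:

```python
# Hypothetical hospital activities: unit cost and activity counts in two years
activities = {
    "outpatient_visit": {"unit_cost": 50,    "year1": 10_000, "year2": 10_800},
    "surgery":          {"unit_cost": 2_000, "year1": 1_000,  "year2": 1_020},
}

total_cost = sum(a["unit_cost"] * a["year1"] for a in activities.values())

# Cost-weighted output growth: each activity's growth weighted by its cost share
output_growth = sum(
    (a["unit_cost"] * a["year1"] / total_cost) * (a["year2"] / a["year1"] - 1)
    for a in activities.values()
)

input_growth = 0.03  # assume total inputs (staff, capital) grew by 3%
productivity_growth = (1 + output_growth) / (1 + input_growth) - 1
print(f"Output growth: {output_growth:.2%}")          # 3.20%
print(f"Productivity growth: {productivity_growth:.2%}")  # ~0.19%
```

Note how the absolute level of such an index is hard to interpret on its own, which is one of the limitations listed above; the growth rate over time is the more meaningful quantity.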

Using Data and Citizen Science for Gardening Success


Article by Elizabeth Waddington: “…Data can help you personally by providing information you can use. And it also allows you to play a wider role in boosting understanding of our planet and tackling the global crises we face in a collaborative way. Consider the following examples.

Grow Observatory

This is one great example of data gathering and citizen science. Grow Observatory is a European citizens’ observatory through which people work together to take action on climate change, build better soil, grow healthier food and corroborate data from the new generation of Copernicus satellites.

Twenty-four Grow communities in 13 European countries created a network of over 6,500 ground-based soil sensors and collected a wealth of soil-related data. The resulting insights have helped people learn about and test regenerative food-growing techniques.

On their website, you can explore sensor locations or make use of dynamic soil moisture maps. With the Grow Observatory app, you can get crop and planting advice tailored to your location, and get detailed, science-based information about regenerative growing practices. Their water planner also allows small-scale growers to learn how much water their plants will need in their location over the coming months, provided they live in one of the areas for which data are currently available…

Cooperative Citizen Science: iNaturalist, Bioblitzes, Bird Counts, and More

Wherever you live, there are many different ways to get involved and contribute data. From submitting observations on wildlife in your garden through apps like iNaturalist to taking part in local Bioblitzes, bird counts, and more – there are plenty of ways we can collect data that will help us – and others – down the road.

Collecting data through our observations, and, crucially, sharing that data with others can help us create the future we all want to see. We, as individuals, can often feel powerless. But citizen science projects help us to see the collective power we can wield when we work together. Modern technology means we can be hyper-connected, and affect wider systems, even when we are alone in our own gardens….(More)”