A data ‘black hole’: Europol ordered to delete vast store of personal data


Article by Apostolis Fotiadis, Ludek Stavinoha, Giacomo Zandonini, and Daniel Howden: “…The EU’s police agency, Europol, will be forced to delete much of a vast store of personal data that the bloc’s data protection watchdog has found it amassed unlawfully. The unprecedented finding from the European Data Protection Supervisor (EDPS) targets what privacy experts are calling a “big data ark” containing billions of points of information. Sensitive data in the ark has been drawn from crime reports, hacked from encrypted phone services and sampled from asylum seekers never involved in any crime.

According to internal documents seen by the Guardian, Europol’s cache contains at least 4 petabytes – equivalent to 3m CD-Roms or a fifth of the entire contents of the US Library of Congress. Data protection advocates say the volume of information held on Europol’s systems amounts to mass surveillance and is a step on its road to becoming a European counterpart to the US National Security Agency (NSA), the organisation whose clandestine online spying was revealed by whistleblower Edward Snowden….(More)”.

Are we witnessing the dawn of post-theory science?


Essay by Laura Spinney: “Does the advent of machine learning mean the classic methodology of hypothesise, predict and test has had its day?…

Isaac Newton apocryphally discovered his second law – the one about gravity – after an apple fell on his head. Much experimentation and data analysis later, he realised there was a fundamental relationship between force, mass and acceleration. He formulated a theory to describe that relationship – one that could be expressed as an equation, F=ma – and used it to predict the behaviour of objects other than apples. His predictions turned out to be right (if not always precise enough for those who came later).
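For concreteness, the kind of prediction that equation licenses can be written out directly; the numbers below are illustrative, not drawn from the essay:

```latex
% Illustrative only: predicting the force on a 2 kg object accelerating at 9.8 m/s^2.
F = ma, \qquad m = 2\,\mathrm{kg}, \quad a = 9.8\,\mathrm{m/s^2}
\;\Longrightarrow\; F = 2 \times 9.8 = 19.6\,\mathrm{N}
```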

Contrast how science is increasingly done today. Facebook’s machine learning tools predict your preferences better than any psychologist. AlphaFold, a program built by DeepMind, has produced the most accurate predictions yet of protein structures based on the amino acids they contain. Both are completely silent on why they work: why you prefer this or that information; why this sequence generates that structure.

You can’t lift a curtain and peer into the mechanism. They offer up no explanation, no set of rules for converting this into that – no theory, in a word. They just work and do so well. We witness the social effects of Facebook’s predictions daily. AlphaFold has yet to make its impact felt, but many are convinced it will change medicine.

Somewhere between Newton and Mark Zuckerberg, theory took a back seat. In 2008, Chris Anderson, the then editor-in-chief of Wired magazine, predicted its demise. So much data had accumulated, he argued, and computers were already so much better than us at finding relationships within it, that our theories were being exposed for what they were – oversimplifications of reality. Soon, the old scientific method – hypothesise, predict, test – would be relegated to the dustbin of history. We’d stop looking for the causes of things and be satisfied with correlations.

With the benefit of hindsight, we can say that what Anderson saw is true (he wasn’t alone). The complexity that this wealth of data has revealed to us cannot be captured by theory as traditionally understood. “We have leapfrogged over our ability to even write the theories that are going to be useful for description,” says computational neuroscientist Peter Dayan, director of the Max Planck Institute for Biological Cybernetics in Tübingen, Germany. “We don’t even know what they would look like.”

But Anderson’s prediction of the end of theory looks to have been premature – or maybe his thesis was itself an oversimplification. There are several reasons why theory refuses to die, despite the successes of such theory-free prediction engines as Facebook and AlphaFold. All are illuminating, because they force us to ask: what’s the best way to acquire knowledge and where does science go from here?…(More)”

Data in Collective Impact: Focusing on What Matters


Article by Justin Piff: “One of the five conditions of collective impact, “shared measurement systems,” calls upon initiatives to identify and share key metrics of success that align partners toward a common vision. While the premise that data should guide shared decision-making is not unique to collective impact, its articulation 10 years ago as a necessary condition for collective impact catalyzed a focus on data use across the social sector. In the original article on collective impact in Stanford Social Innovation Review, the authors describe the benefits of using consistent metrics to identify patterns, make comparisons, promote learning, and hold actors accountable for success. While this vision for data collection remains relevant today, the field has developed a more nuanced understanding of how to make it a reality….

Here are four lessons from our work to help collective impact initiatives and their funders use data more effectively for social change.

1. Prioritize the Learning, Not the Data System

Those of us who are “data people” have espoused the benefits of shared data systems and common metrics too many times to recount. But a shared measurement system is only a means to an end, not an end in itself. Too often, new collective impact initiatives focus on creating the mythical, all-knowing data system—spending weeks, months, and even years researching or developing the perfect software that captures, aggregates, and computes data from multiple sectors. They let the perfect become the enemy of the good, as the pursuit of perfect data and technical precision inhibits meaningful action. And communities pay the price.

Using data to solve complex social problems requires more than a technical solution. Many communities in the US have more data than they know what to do with, yet they rarely spend time thinking about the data they actually need. Before building a data system, partners must focus on how they hope to use data in their work and identify the sources and types of data that can help them achieve their goals. Once those data are identified and collected, partners, residents, students, and others can work together to develop a shared understanding of what the data mean and move forward. In Connecticut, the Hartford Data Collaborative helps community agencies and leaders do just this. For example, it has matched programmatic data against Hartford Public Schools data and National Student Clearinghouse data to get a clear picture of postsecondary enrollment patterns across the community. The data also capture services provided to residents across multiple agencies and can be disaggregated by gender, race, and ethnicity to identify and address service gaps….(More)”.
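As a rough illustration of the kind of matching and disaggregation described above, a minimal sketch follows; the file names, columns, and join key are hypothetical assumptions, not the Hartford Data Collaborative’s actual schema or matching method.

```python
# Minimal sketch of cross-agency matching and disaggregation (hypothetical schema,
# not the Hartford Data Collaborative's actual data model or matching method).
import pandas as pd

# Hypothetical extracts: program participation, school records, and
# Clearinghouse-style postsecondary enrollment outcomes.
programs = pd.read_csv("program_participation.csv")   # columns: person_id, agency, service
students = pd.read_csv("school_records.csv")          # columns: person_id, gender, race, ethnicity
enrollment = pd.read_csv("postsecondary.csv")         # columns: person_id, enrolled (0/1)

# Link the datasets on a shared identifier.
linked = (
    programs.merge(students, on="person_id", how="inner")
            .merge(enrollment, on="person_id", how="left")
)
linked["enrolled"] = linked["enrolled"].fillna(0)

# Disaggregate postsecondary enrollment by gender, race, and ethnicity
# to surface potential service gaps.
gaps = (
    linked.groupby(["gender", "race", "ethnicity"])["enrolled"]
          .agg(enrollment_rate="mean", n="count")
          .reset_index()
)
print(gaps.sort_values("enrollment_rate"))
```

The point of the sketch is the order of operations the lesson insists on: settle the question (who enrolls, where the gaps are) before investing in anything more elaborate.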

‘In Situ’ Data Rights


Essay by Marshall W Van Alstyne, Georgios Petropoulos, Geoffrey Parker, and Bertin Martens: “…Data portability sounds good in theory—number portability improved telephony—but this theory has its flaws.

  • Context: The value of data depends on context. Removing data from that context removes value. A portability exercise by experts at the ProgrammableWeb succeeded in downloading basic Facebook data but failed on a re-upload. Individual posts shed the prompts that preceded them and the replies that followed them. After all, that data concerns others.
  • Stagnation: Without a flow of updates, a captured stock depreciates. Data must be refreshed to stay current, and potential users must see those data updates to stay informed.
  • Impotence: Facts removed from their place of residence become less actionable. We cannot use them to make a purchase when removed from their markets or reach a friend when they are removed from their social networks. Data must be reconnected to be reanimated.
  • Market Failure: Innovation is slowed. Consider how markets for business analytics and B2B services develop. Lacking complete context, third parties can only offer incomplete benchmarking and analysis. Platforms that do offer market overview services can charge monopoly prices because they have context that partners and competitors do not.
  • Moral Hazard: Proposed laws seek to give merchants data portability rights but these entail a problem that competition authorities have not anticipated. Regulators seek to help merchants “multihome,” to affiliate with more than one platform. Merchants can take their earned ratings from one platform to another and foster competition. But, when a merchant gains control over its ratings data, magically, low reviews can disappear! Consumers fraudulently edited their personal records under early U.K. open banking rules. With data editing capability, either side can increase fraud, surely not the goal of data portability.

Evidence suggests that following GDPR, E.U. ad effectiveness fell, E.U. Web revenues fell, investment in E.U. startups fell, the stock and flow of apps available in the E.U. fell, while Google and Facebook, who already had user data, gained rather than lost market share as small firms faced new hurdles the incumbents managed to avoid. To date, the results are far from regulators’ intentions.

We propose a new in situ data right for individuals and firms, and a new theory of benefits. Rather than take data from the platform, or ex situ as portability implies, let us grant users the right to use their data in the location where it resides. Bring the algorithms to the data instead of bringing the data to the algorithms. Users determine when and under what conditions third parties access their in situ data in exchange for new kinds of benefits. Users can revoke access at any time and third parties must respect that. This patches and repairs the portability problems…(More).”
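A minimal sketch of what that access pattern could look like in code; the class and method names are invented for illustration and are not a real platform API.

```python
# Minimal sketch of the 'in situ' access pattern described above: the algorithm
# is brought to the data, which never leaves the platform, and the user can
# revoke a third party's access at any time. Names are illustrative only.
from typing import Any, Callable

class InSituDataStore:
    def __init__(self, user_data: dict):
        self._data = user_data          # stays resident on the platform
        self._grants: set[str] = set()  # third parties the user has authorized

    def grant_access(self, third_party: str) -> None:
        self._grants.add(third_party)

    def revoke_access(self, third_party: str) -> None:
        self._grants.discard(third_party)

    def run(self, third_party: str, algorithm: Callable[[dict], Any]) -> Any:
        """Execute a third party's algorithm against the data in place.

        Only the computed result leaves the store; the raw records do not.
        """
        if third_party not in self._grants:
            raise PermissionError(f"{third_party} has no in situ access grant")
        return algorithm(self._data)

# Usage: a merchant lets an analytics firm compute its average rating in place.
store = InSituDataStore({"ratings": [5, 4, 1, 5]})
store.grant_access("benchmarking-service")
avg = store.run("benchmarking-service", lambda d: sum(d["ratings"]) / len(d["ratings"]))
store.revoke_access("benchmarking-service")   # further calls now raise PermissionError
```

The design choice that matters is that only the algorithm’s result ever leaves the store, never the raw records, which is what keeps the data “in situ” while still letting third parties build services on it.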

If AI Is Predicting Your Future, Are You Still Free?


Essay by Carissa Véliz: “…Today, prediction is mostly done through machine learning algorithms that use statistics to fill in the blanks of the unknown. Text algorithms use enormous language databases to predict the most plausible ending to a string of words. Game algorithms use data from past games to predict the best possible next move. And algorithms that are applied to human behavior use historical data to infer our future: what we are going to buy, whether we are planning to change jobs, whether we are going to get sick, whether we are going to commit a crime or crash our car. Under such a model, insurance is no longer about pooling risk from large sets of people. Rather, predictions have become individualized, and you are increasingly paying your own way, according to your personal risk scores—which raises a new set of ethical concerns.
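A toy contrast, with invented numbers, between pooled insurance and the individualized, risk-scored pricing the essay describes:

```python
# Toy contrast between pooled and individually risk-scored premiums
# (numbers are invented for illustration).
expected_loss = 1000.0                                 # average annual loss across the pool
risk_scores = {"ana": 0.2, "ben": 1.0, "carla": 2.3}   # relative predicted risk per person

pooled_premium = expected_loss                         # everyone pays the same
individual_premiums = {
    person: expected_loss * score for person, score in risk_scores.items()
}

print(pooled_premium)        # 1000.0 for every member of the pool
print(individual_premiums)   # {'ana': 200.0, 'ben': 1000.0, 'carla': 2300.0}
```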

An important characteristic of predictions is that they do not describe reality. Forecasting is about the future, not the present, and the future is something that has yet to become real. A prediction is a guess, and all sorts of subjective assessments and biases regarding risk and values are built into it. There can be forecasts that are more or less accurate, to be sure, but the relationship between probability and actuality is much more tenuous and ethically problematic than some assume.

Institutions today, however, often try to pass off predictions as if they were a model of objective reality. And even when AI’s forecasts are merely probabilistic, they are often interpreted as deterministic in practice—partly because human beings are bad at understanding probability and partly because the incentives around avoiding risk end up reinforcing the prediction. (For example, if someone is predicted to be 75 percent likely to be a bad employee, companies will not want to take the risk of hiring them when they have candidates with a lower risk score)…(More)”.

The 2021 Good Tech Awards


Kevin Roose at the New York Times: “…Especially at a time when many of tech’s leaders seem more interested in building new, virtual worlds than improving the world we live in, it’s worth praising the technologists who are stepping up to solve some of our biggest problems.

So here, without further ado, are this year’s Good Tech Awards…

One of the year’s most exciting A.I. breakthroughs came in July when DeepMind — a Google-owned artificial intelligence company — published data and open-source code from its groundbreaking AlphaFold project.

The project, which used A.I. to predict the structures of proteins, solved a problem that had vexed scientists for decades, and was hailed by experts as one of the greatest scientific discoveries of all time. And by publishing its data freely, AlphaFold set off a frenzy among researchers, some of whom are already using it to develop new drugs and better understand the proteins involved in viruses like SARS-CoV-2.

Google’s overall A.I. efforts have been fraught with controversy and missteps, but AlphaFold seems like an unequivocally good use of the company’s vast expertise and resources…

Prisons aren’t known as hotbeds of innovation. But two tech projects this year tried to make our criminal justice system more humane.

Recidiviz is a nonprofit tech start-up that builds open-source data tools for criminal justice reform. It was started by Clementine Jacoby, a former Google employee who saw an opportunity to corral data about the prison system and make it available to prison officials, lawmakers, activists and researchers to inform their decisions. Its tools are in use in seven states, including North Dakota, where the data tools helped prison officials assess the risk of Covid-19 outbreaks and identify incarcerated people who were eligible for early release….(More)”.

CoFoE: deliberative democracy more accountable than elections and polls


Article by Eleonora Vasques: “Deliberative democracy processes are more democratic than general elections or surveys, according to Conference on the Future of Europe (CoFoE) participants and experts of the second panel on democracy gathered in Florence last weekend.

CoFoE is a deliberative democracy experiment in which 800 citizens, divided into four thematic panels, draft recommendations that are then discussed and voted on with lawmakers.

The panel on European democracy, values, rights, the rule of law, and security, recently approved 39 recommendations on anti-discrimination, democracy, the rule of law, EU institutional reforms, the building of a European identity, and the strengthening of citizen participation.

“Usually, the way we try to understand what people think is through elections or opinion polls. However, I think both methods are biased. They rather ‘freeze’ a debate, imposing the discussion, without asking people what they want. Thus, it is good that people here speak about their own will. And they do not necessarily use the same categories utilised by electoral campaigns and opinion polls,” Olivier Roy, professor at the European University Institute and one of the panel experts, told journalists…

Similarly, citizens selected for this panel believe that this democratic exercise is more valuable than mainstream political participation.

“I feel I am living a unique democratic experiment, which goes beyond the majority rule. Democracy is often understood only as a majority rule exercise, with elections. But here, we are demonstrating that democracy is about debating, sharing general ideas from the bottom up that can have an impact,” Max, a participant from Slovakia, told EURACTIV…(More)”.

GDP’s Days Are Numbered


Essay by Diane Coyle: “How should we measure economic success? Criticisms of conventional indicators, particularly gross domestic product, have abounded for years, if not decades. Environmentalists have long pointed out that GDP omits the depletion of natural assets, as well as negative externalities such as global warming. And its failure to capture unpaid but undoubtedly valuable work in the home is another glaring omission. But better alternatives may soon be at hand.

In 2009, a commission led by Joseph Stiglitz, Amartya Sen, and Jean-Paul Fitoussi spurred efforts to find alternative ways to gauge economic progress by recommending a “dashboard” of indicators. Since then, economists and statisticians, working alongside natural scientists, have put considerable effort into developing rigorous wealth-based prosperity metrics, particularly concerning natural assets. The core idea is to create a comprehensive national balance sheet to demonstrate that economic progress today is illusory when it comes at the expense of future living standards.
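A stylized sketch of that balance-sheet logic follows; the capital categories and figures are assumptions for illustration, not any statistical office’s methodology:

```python
# Stylized 'comprehensive wealth' check (illustrative categories and numbers only,
# not an official statistical methodology).
def comprehensive_wealth(produced, human, natural, social):
    return produced + human + natural + social

year_1 = comprehensive_wealth(produced=50, human=120, natural=80, social=30)
year_2 = comprehensive_wealth(produced=55, human=122, natural=68, social=30)

gdp_growth = 0.03  # measured output grew...
if year_2 < year_1:
    # ...but the asset base shrank: today's progress comes at the expense
    # of future living standards.
    print(f"GDP grew {gdp_growth:.0%}, yet comprehensive wealth fell "
          f"from {year_1} to {year_2}.")
```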

In an important milestone in March of this year, the United Nations approved a statistical standard relating to the services that nature provides to the economy. That followed the UK Treasury’s publication of a review by the University of Cambridge’s Partha Dasgupta setting out how to integrate nature in general, and biodiversity in particular, into economic analysis. With the consequences of climate change starting to become all too apparent, any meaningful concept of economic success in the future will surely include sustainability.

The next steps in this statistical endeavor will be to incorporate measures of social capital, reflecting the ability of communities or countries to act collectively, and to extend measurement of the household sector. The COVID-19 pandemic has highlighted how crucial this unpaid work is to a country’s economic health. For example, the US Bureau of Labor Statistics intends to develop a more comprehensive concept of living standards that includes the value of such activity….(More)”.

Will data improve governance? It depends on the questions we ask


Essay by Uma Kalkar, Andrew Young and Stefaan Verhulst in the Quarterly Inquiry of the Development Intelligence Lab: “…What are the key takeaways from our process and how does it relate to the Summit for Democracy? First, good questions serve as the bedrock for effective and data-driven decision-making across the governance ecosystem. Second, sourcing multidisciplinary and global experts allows us to paint a fuller picture of the hot-button issues and encourages a more nuanced understanding of priorities. Lastly, including the public as active participants in the process of designing questions can help to increase the legitimacy and social impact of data efforts, as well as tap into the collective intelligence that exists across society….

A key focus for world leaders, civil society members, academics, and private sector representatives at the Summit for Democracy should not only be on how to promote open governance by democratising data and data science. It must also consider how we can democratise and improve the way we formulate and prioritise questions facing society. To paraphrase Albert Einstein’s famous quote: “If I had an hour to solve a problem and my life depended on the solution, I would spend the first 55 minutes determining the proper question to ask… for once I know the proper question, I could solve the problem in less than five minutes”….(More)”.

For Queer Communities, Being Counted Has Downsides


Article by Kevin Guyan: “Next March, for the first time, Scotland’s census will ask all residents 16 and over to share information about their sexual orientation and whether they identify as trans. These new questions, whose addition follows similar developments in other parts of the United Kingdom and Malta, invite people to “come out” on their census return. Proposals to add more questions about gender, sex, and sexuality to national censuses are at various stages of discussion in countries outside of Europe, including New Zealand, Canada, Australia, and the United States.

The idea of being counted in a census feels good. Perhaps it’s my passion for data, but I feel recognized when I tick the response option “gay” in a survey that previously pretended I did not exist or was not important enough to count. If you identify with descriptors less commonly listed in drop-down boxes, seeing yourself reflected in a survey can change how you relate to wider communities that go beyond individual experiences. It therefore makes sense that many bottom-up queer rights groups and top-down government agencies frame the counting of queer communities in a positive light and position expanded data collection as a step toward greater inclusion.

There is great historical significance in increased visibility for many queer communities. But an over-focus on the benefits of being counted distracts from the potential harms for queer communities that come with participation in data collection activities….

The limits of inclusion became apparent to me as I observed the design process for Scotland’s 2022 census. While researching my book Queer Data, I sat through committee meetings at the Scottish Parliament, digested lengthy reports, submitted evidence, and participated in stakeholder engagement sessions. As many months of disagreement over how to count and who to count progressed, it grew more and more obvious that the design of a census is never exclusively about the collection of accurate data.

I grew ambivalent about what “being counted” actually meant for queer communities and concerned that the expansion of the census to include some queer people further erased those who did not match the government’s narrow understanding of gender, sex, and sexuality. Most notably, Scotland’s 2022 census does not count nonbinary people, who are required to identify their sex as either male or female. In another example, trans-exclusionary campaign groups requested that the census remove the “other” write-in box and limit response options for sexual orientation to “gay or lesbian,” “bisexual,” and “straight/heterosexual.” Reproducing the idea that sexual orientation is based on a fixed, binary notion of sex and restricting the question to just three options would effectively delete those who identify as queer, pansexual, asexual, and other sexualities from the count. Although the final version of the sexual orientation question includes an “other” write-in box for sexuality, collecting data about the lives of some queer people can push those who fall outside these expectations further into the shadows…(More)”.