
Stefaan Verhulst

Joshua Benton at Nieman Lab: “The New York Times wants more of its journalists to have those basic data skills, and now it’s releasing the curriculum they’ve built in-house out into the world, where it can be of use to reporters, newsrooms, and lots of other people too.

Here’s Lindsey Rogers Cook, an editor for digital storytelling and training at the Times, and the sort of person who is willing to have “spreadsheets make my heart sing” appear under her byline:

Even with some of the best data and graphics journalists in the business, we identified a challenge: data knowledge wasn’t spread widely among desks in our newsroom and wasn’t filtering into news desks’ daily reporting.

Yet fluency with numbers and data has become more important than ever. While journalists once were fond of joking that they got into the field because of an aversion to math, numbers now comprise the foundation for beats as wide-ranging as education, the stock market, the Census, and criminal justice. More data is released than ever before — there are nearly 250,000 datasets on data.gov alone — and increasingly, government, politicians, and companies try to twist those numbers to back their own agendas…

We wanted to help our reporters better understand the numbers they get from sources and government, and give them the tools to analyze those numbers. We wanted to increase collaboration between traditional and non-traditional journalists…And with more competition than ever, we wanted to empower our reporters to find stories lurking in the hundreds of thousands of databases maintained by governments, academics, and think tanks. We wanted to give our reporters the tools and support necessary to incorporate data into their everyday beat reporting, not just in big and ambitious projects.

….You can access the Times’ training materials here. Some of what you’ll find:

  • An outline of the data skills the course aims to teach. It’s all run on Google Docs and Google Sheets; class starts with the uber-basics (mean! median! sum!), crosses the bridge of pivot tables, and then heads into data cleaning and more advanced formulas.
  • The full day-by-day outline of the Times’ three-week course, which of course you’re free to use or reshape to your newsroom’s needs.
  • It’s not just about cells, columns, and rows — the course also includes more journalism-based information around ethical questions, how to use data effectively inside a story’s narrative, and how best to work with colleagues in the graphic department.
  • Cheat sheets! If you don’t have time to dig too deeply, they’ll give a quick hit of information: one, two, three, four, five.
  • Data sets that you use to work through the beginner, intermediate, and advanced stages of the training, including such journalism classics as census data, campaign finance data, and BLS data. But don’t be a dummy and try to write real news stories off these spreadsheets; the Times cautions in bold: “NOTE: We have altered many of these datasets for instructional purposes, so please download the data from the original source if you want to use it in your reporting.”
  • “How Not To Be Wrong,” which seems like a useful thing….(More)”
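For readers who want to see what the course's starting skills (sum, mean, median, then pivot tables) look like outside a spreadsheet, here is a minimal Python sketch. It is not part of the Times' materials, which run on Google Docs and Google Sheets; the rows, the column meanings, and the `pivot_sum` helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical rows: (state, year, amount in dollars). Invented for illustration.
rows = [
    ("Ohio", 2017, 1200),
    ("Ohio", 2018, 1500),
    ("Texas", 2017, 900),
    ("Texas", 2018, 2100),
    ("Texas", 2018, 300),
]

# The "uber-basics": sum, mean, and median of one column.
amounts = [amount for _, _, amount in rows]
total = sum(amounts)
average = mean(amounts)
middle = median(amounts)


def pivot_sum(rows):
    """Sum amounts by state (rows) and year (columns), like a simple pivot table."""
    table = defaultdict(lambda: defaultdict(int))
    for state, year, amount in rows:
        table[state][year] += amount
    # Convert nested defaultdicts to plain dicts for easier printing.
    return {state: dict(years) for state, years in table.items()}


pivot = pivot_sum(rows)

print(total, average, middle)
for state, years in pivot.items():
    print(state, years)
```

The pivot step is the same idea as a spreadsheet pivot table with one field on the rows, one on the columns, and a sum as the value.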
The New York Times has a course to teach its reporters data skills, and now they’ve open-sourced it

Article by Karen Kornbluh and Ellen P. Goodman: “The first volume of Special Counsel Robert Mueller’s report notes that “sweeping” and “systemic” social media disinformation was a key element of Russian interference in the 2016 election. No sooner were Mueller’s findings public than Twitter suspended a host of bots who had been promoting a “Russiagate hoax.”

Since at least 2016, conspiracy theories like Pizzagate and QAnon have flourished online and bled into mainstream debate. Earlier this year, a British member of Parliament called social media companies “accessories to radicalization” for their role in hosting and amplifying radical hate groups after the New Zealand mosque shooter cited and attempted to fuel more of these groups. In Myanmar, anti-Rohingya forces used Facebook to spread rumors that spurred ethnic cleansing, according to a UN special rapporteur. These platforms are vulnerable to those who aim to prey on intolerance, peer pressure, and social disaffection. Our democracies are being compromised. They work only if the information ecosystem has integrity—if it privileges truth and channels difference into nonviolent discourse. But the ecosystem is increasingly polluted.

Around the world, a growing sense of urgency about the need to address online radicalization is leading countries to embrace ever more draconian solutions: After the Easter bombings in Sri Lanka, the government shut down access to Facebook, WhatsApp, and other social media platforms. And a number of countries are considering adopting laws requiring social media companies to remove unlawful hate speech or face hefty penalties. According to Freedom House, “In the past year, at least 17 countries approved or proposed laws that would restrict online media in the name of fighting ‘fake news’ and online manipulation.”

The flaw with these censorious remedies is this: They focus on the content that the user sees—hate speech, violent videos, conspiracy theories—and not on the structural characteristics of social media design that create vulnerabilities. Content moderation requirements that cannot scale are not only doomed to be ineffective exercises in whack-a-mole, but they also create free expression concerns, by turning either governments or platforms into arbiters of acceptable speech. In some countries, such as Saudi Arabia, content moderation has become justification for shutting down dissident speech.

When countries pressure platforms to root out vaguely defined harmful content and disregard the design vulnerabilities that promote that content’s amplification, they are treating a symptom and ignoring the disease. The question isn’t “How do we moderate?” Instead, it is “How do we promote design change that optimizes for citizen control, transparency, and privacy online?”—exactly the values that the early Internet promised to embody….(More)”.

Bringing Truth to the Internet

Paper by Noam Kolt: “Consumers routinely supply personal data to technology companies in exchange for services. Yet, the relationship between the utility (U) consumers gain and the data (D) they supply — “return on data” (ROD) — remains largely unexplored. Expressed as a ratio, ROD = U / D. While lawmakers strongly advocate protecting consumer privacy, they tend to overlook ROD. Are the benefits of the services enjoyed by consumers, such as social networking and predictive search, commensurate with the value of the data extracted from them? How can consumers compare competing data-for-services deals?

Currently, the legal frameworks regulating these transactions, including privacy law, aim primarily to protect personal data. They treat data protection as a standalone issue, distinct from the benefits which consumers receive. This article suggests that privacy concerns should not be viewed in isolation, but as part of ROD. Just as companies can quantify return on investment (ROI) to optimize investment decisions, consumers should be able to assess ROD in order to better spend and invest personal data. Making data-for-services transactions more transparent will enable consumers to evaluate the merits of these deals, negotiate their terms and make more informed decisions. Pivoting from the privacy paradigm to ROD will both incentivize data-driven service providers to offer consumers higher ROD, as well as create opportunities for new market entrants….(More)”.
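To make the ratio ROD = U / D concrete, here is a small worked sketch in Python. The excerpt does not say how utility or data should be measured, so the scores below, the service names, and the `return_on_data` function are hypothetical; the only part taken from the paper is the ratio itself.

```python
def return_on_data(utility: float, data: float) -> float:
    """Return on data as defined in the excerpt: ROD = U / D."""
    if data <= 0:
        raise ValueError("data supplied must be positive")
    return utility / data


# Hypothetical deals: (utility gained, data supplied), in arbitrary comparable units.
deals = {
    "service_a": (8.0, 4.0),
    "service_b": (6.0, 2.0),
}

rod = {name: return_on_data(u, d) for name, (u, d) in deals.items()}
best = max(rod, key=rod.get)

print(rod)
print("higher return on data:", best)
```

Under these made-up numbers, the second service delivers less utility in absolute terms but asks for much less data, so it has the higher ROD. This is the kind of comparison the paper argues consumers currently cannot make.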

Return on Data

US Federal Data Strategy: “For the purposes of the Federal Data Strategy, a “Use Case” is a data practice or method that leverages data to support an articulable federal agency mission or public interest outcome. The Federal Data Strategy sought use cases from the public that solve problems or demonstrate solutions that can help inform the four strategy areas: Enterprise Data Governance; Use, Access, and Augmentation; Decision-making and Accountability; and Commercialization, Innovation, and Public Use. The Federal Data Strategy team was in part informed by these submissions, which are posted below…..(More)”.

Federal Data Strategy: Use Cases

Kevin Litman-Navarro at the New York Times: “….I analyzed the length and readability of privacy policies from nearly 150 popular websites and apps. Facebook’s privacy policy, for example, takes around 18 minutes to read in its entirety – slightly above average for the policies I tested….

Despite efforts like the General Data Protection Regulation to make policies more accessible, there seems to be an intractable tradeoff between a policy’s readability and length. Even policies that are shorter and easier to read can be impenetrable, given the amount of background knowledge required to understand how things like cookies and IP addresses play a role in data collection….
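As a rough illustration of how a reading-time figure like "around 18 minutes" can be estimated, here is a minimal Python sketch. It is not the method used in the Times analysis: the 250 words-per-minute reading speed and the `reading_minutes` helper are assumptions made for this example.

```python
import math


def reading_minutes(text: str, words_per_minute: int = 250) -> int:
    """Estimate reading time in whole minutes from a whitespace word count.

    The default speed of 250 words per minute is an assumption, not a figure
    from the article.
    """
    words = len(text.split())
    return math.ceil(words / words_per_minute)


# A hypothetical 4,500-word policy, built from a repeated placeholder word.
sample_policy = "word " * 4500
print(reading_minutes(sample_policy))
```

Word count only captures length. It says nothing about the background knowledge a policy assumes, which the article identifies as a separate barrier.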

So what might a useful privacy policy look like?

Consumers don’t need a technical understanding of data collection processes in order to protect their personal information. Instead of explaining the excruciatingly complicated inner workings of the data marketplace, privacy policies should help people decide how they want to present themselves online. We tend to go on the internet privately – on our phones or at home – which gives the impression that our activities are also private. But, often, we’re more visible than ever.

A good privacy policy would help users understand how exposed they are: Something as simple as a list of companies that might purchase and use your personal information could go a long way towards setting a new bar for privacy-conscious behavior. For example, if you know that your weather app is constantly tracking your whereabouts and selling your location data as marketing research, you might want to turn off your location services entirely, or find a new app.

Until we reshape privacy policies to meet our needs — or we find a suitable replacement — it’s probably best to act with one rule in mind. To be clear and concise: Someone’s always watching….(More)”.

We Read 150 Privacy Policies. They Were an Incomprehensible Disaster.

Opening editorial by Stefaan G. Verhulst, Zeynep Engin and Jon Crowcroft: “…Policy–data interactions or governance initiatives that use data have been the exception rather than the norm, isolated prototypes and trials rather than an indication of real, systemic change. There are various reasons for the generally slow uptake of data in policymaking, and several factors will have to change if the situation is to improve. ….

  • Despite the number of successful prototypes and small-scale initiatives, policy makers’ understanding of data’s potential and its value proposition generally remains limited (Lutes, 2015). There is also limited appreciation of the advances data science has made the last few years. This is a major limiting factor; we cannot expect policy makers to use data if they do not recognize what data and data science can do.
  • The recent (and justifiable) backlash against how certain private companies handle consumer data has had something of a reverse halo effect: There is a growing lack of trust in the way data is collected, analyzed, and used, and this often leads to a certain reluctance (or simply risk-aversion) on the part of officials and others (Engin, 2018).
  • Despite several high-profile open data projects around the world, much (probably the majority) of data that could be helpful in governance remains either privately held or otherwise hidden in silos (Verhulst and Young, 2017b). There remains a shortage not only of data but, more specifically, of high-quality and relevant data.
  • With few exceptions, the technical capacities of officials remain limited, and this has obviously negative ramifications for the potential use of data in governance (Giest, 2017).
  • It’s not just a question of limited technical capacities. There is often a vast conceptual and values gap between the policy and technical communities (Thompson et al., 2015; Uzochukwu et al., 2016); sometimes it seems as if they speak different languages. Compounding this difference in world views is the fact that the two communities rarely interact.
  • Yet, data about the use and evidence of the impact of data remain sparse. The impetus to use more data in policy making is stymied by limited scholarship and a weak evidential basis to show that data can be helpful and how. Without such evidence, data advocates are limited in their ability to make the case for more data initiatives in governance.
  • Data are not only changing the way policy is developed, but they have also reopened the debate around theory- versus data-driven methods in generating scientific knowledge (Lee, 1973; Kitchin, 2014; Chivers, 2018; Dreyfuss, 2017) and thus directly questioning the evidence base to utilization and implementation of data within policy making. A number of associated challenges are being discussed, such as: (i) traceability and reproducibility of research outcomes (due to “black box processing”); (ii) the use of correlation instead of causation as the basis of analysis, biases and uncertainties present in large historical datasets that cause replication and, in some cases, amplification of human cognitive biases and imperfections; and (iii) the incorporation of existing human knowledge and domain expertise into the scientific knowledge generation processes—among many other topics (Castelvecchi, 2016; Miller and Goodchild, 2015; Obermeyer and Emanuel, 2016; Provost and Fawcett, 2013).
  • Finally, we believe that there should be a sound under-pinning a new theory of what we call Policy–Data Interactions. To date, in reaction to the proliferation of data in the commercial world, theories of data management, privacy, and fairness have emerged. From the Human–Computer Interaction world, a manifesto of principles of Human–Data Interaction (Mortier et al., 2014) has found traction, which intends reducing the asymmetry of power present in current design considerations of systems of data about people. However, we need a consistent, symmetric approach to consideration of systems of policy and data, how they interact with one another.

All these challenges are real, and they are sticky. We are under no illusions that they will be overcome easily or quickly….

During the past four conferences, we have hosted an incredibly diverse range of dialogues and examinations by key global thought leaders, opinion leaders, practitioners, and the scientific community (Data for Policy, 2015, 2016, 2017, 2019). What became increasingly obvious was the need for a dedicated venue to deepen and sustain the conversations and deliberations beyond the limitations of an annual conference. This leads us to today and the launch of Data & Policy, which aims to confront and mitigate the barriers to greater use of data in policy making and governance.

Data & Policy is a venue for peer-reviewed research and discussion about the potential for and impact of data science on policy. Our aim is to provide a nuanced and multistranded assessment of the potential and challenges involved in using data for policy and to bridge the “two cultures” of science and humanism—as CP Snow famously described in his lecture on “Two Cultures and the Scientific Revolution” (Snow, 1959). By doing so, we also seek to bridge the two other dichotomies that limit an examination of datafication and its interaction with policy from various angles: the divide between practice and scholarship; and between private and public…

So these are our principles: scholarly, pragmatic, open-minded, interdisciplinary, focused on actionable intelligence, and, most of all, innovative in how we will share insight and pushing at the boundaries of what we already know and what already exists. We are excited to launch Data & Policy with the support of Cambridge University Press and University College London, and we’re looking for partners to help us build it as a resource for the community. If you’re reading this manifesto it means you have at least a passing interest in the subject; we hope you will be part of the conversation….(More)”.

Data & Policy: A new venue to study and explore policy–data interaction

Fleur Johns at Modern Law Review: “All states have pursued what James C. Scott characterised as modernist projects of legibility and simplification: maps, censuses, national economic plans and related legislative programs. Many, including Scott, have pointed out blindspots embedded in these tools. As such criticism persists, however, the synoptic style of law and development has changed. Governments, NGOs and international agencies now aspire to draw upon immense repositories of digital data. Modes of analysis too have changed. No longer is legibility a precondition for action. Law‐ and policy‐making are being informed by business development methods that prefer prototypes over plans. States and international institutions continue to plan, but also seek insight from the release of minimally viable policy mock‐ups. Familiar critiques of law and development work, and arguments for its reform, have limited purchase on these practices, Scott’s included. Effective critical intervention in this field today requires careful attention to be paid to these emergent patterns of practice…(More)”.

From Planning to Prototypes: New Ways of Seeing Like a State

Press Release: “Last week’s 3rd annual AI for Good Global Summit once again showcased the growing number of Artificial Intelligence (AI) projects with promise to advance the United Nations Sustainable Development Goals (SDGs).

Now, using the Summit’s momentum, AI innovators and humanitarian leaders are prepared to take the ‘AI for Good’ movement to the next level.

They are working together to launch an ‘AI Commons’ that aims to scale AI for Good projects and maximize their impact across the world.

The AI Commons will enable AI adopters to connect with AI specialists and data owners to align incentives for innovation and develop AI solutions to precisely defined problems.

“The concept of AI Commons has developed over three editions of the Summit and is now motivating implementation,” said ITU Secretary-General Houlin Zhao in closing remarks to the summit. “AI and data need to be a shared resource if we are serious about scaling AI for good. The community supporting the Summit is creating infrastructure to scale-up their collaboration − to convert the principles underlying the Summit into global impact.”…

The AI Commons will provide an open framework for collaboration, a decentralized system to democratize problem solving with AI.

It aims to be a “knowledge space”, says Banifatemi, answering a key question: “How can problem solving with AI become common knowledge?”

“The goal is to be an open initiative, like a Linux effort, like an open-source network, where everyone can participate and we jointly share and we create an abundance of knowledge, knowledge of how we can solve problems with AI,” said Banifatemi.

AI development and application will build on the state of the art, enabling AI solutions to scale with the help of shared datasets, testing and simulation environments, AI models and associated software, and storage and computing resources….(More)”.

Introducing ‘AI Commons’: A framework for collaboration to achieve global impact

Agnes Batory & Sara Svensson at Policy and Politics: “Involving people in policy-making is generally a good thing. Policy-makers themselves often pay at least lip-service to the importance of giving citizens a say. In the academic literature, participatory governance has been, with some exaggeration, almost universally hailed as a panacea to all ills in Western democracies. In particular, it is advocated as a way to remedy the alienation of voters from politicians who seem to be oblivious to the concerns of the common man and woman, with an ensuing decline in public trust in government. Representation by political parties is ridden with problems, so the argument goes, and in any case it is overly focused on the act of voting in elections – a one-off event once every few years which limits citizens’ ability to control the policy agenda. On the other hand, various forms of public participation are expected to educate citizens, help develop a civic culture, and boost the legitimacy of decision-making. Consequently, practices to ensure that citizens can provide direct input into policy-making are to be welcomed on both pragmatic and normative grounds.  

I do not disagree with these generally positive expectations. However, the main objective of my recent article in Policy and Politics, co-authored with Sara Svensson, is to inject a dose of healthy scepticism into the debate or, more precisely, to show that there are circumstances in which public consultations will achieve anything but greater legitimacy and better policy-outcomes. We do this partly by discussing the more questionable assumptions in the participatory governance literature, and partly by examining a recent, glaring example of the misuse, and abuse, of popular input….(More)”.

How not to conduct a consultation – and why asking the public is not always such a great idea

The Royal Society: “How can technologies help organisations and individuals protect data in practice and, at the same time, unlock opportunities for data access and use?

The Royal Society’s Privacy Enhancing Technologies project has been investigating this question and has launched a report (PDF) setting out the current use, development and limits of privacy enhancing technologies (PETs) in data analysis. 

The data we generate every day holds a lot of value and potentially also contains sensitive information that individuals or organisations might not wish to share with everyone. The protection of personal or sensitive data featured prominently in the social and ethical tensions identified in our British Academy and Royal Society report Data management and use: Governance in the 21st century. For example, how can organisations best use data for public good whilst protecting sensitive information about individuals? Under other circumstances, how can they share data with groups with competing interests whilst protecting commercially or otherwise sensitive information?

Realising the full potential of large-scale data analysis may be constrained by important legal, reputational, political, business and competition concerns.  Certain risks can potentially be mitigated and managed with a set of emerging technologies and approaches often collectively referred to as ‘Privacy Enhancing Technologies’ (PETs). 

This disruptive set of technologies, combined with changes in wider policy and business frameworks, could enable the sharing and use of data in a privacy-preserving manner. They also have the potential to reshape the data economy and to change the trust relationships between citizens, governments and companies.

This report provides a high-level overview of five current and promising PETs of a diverse nature, with their respective readiness levels and illustrative case studies from a range of sectors, with a view to inform in particular applied data science research and the digital strategies of government departments and businesses. This report also includes recommendations on how the UK could fully realise the potential of PETs and to allow their use on a greater scale.

The project was informed by a series of conversations and evidence gathering events, involving a range of stakeholders across academia, government and the private sector (also see the project terms of reference and Working Group)….(More)”.

Privacy Enhancing Technologies
