Book by Matthew Wood: “Hyper-active governance is a new way of thinking about governing that puts debates over expertise at the heart. Contemporary governing requires delegation to experts, but also increases demands for political accountability. In this context, politicians and experts work together under political stress to adopt different governing relationships that appear more ‘hands-off’ or ‘hands-on’. These approaches often serve to displace profound social and economic crises. Only a genuinely collaborative approach to governing, with an inclusive approach to expertise, can create democratically legitimate and effective governance in our accelerating world. Using detailed case studies and global datasets in various policy areas including medicines, flooding, water resources, central banking and electoral administration, the book develops a new typology of modes of governing. Drawing from innovative social theory, it breathes new life into debates about expert forms of governance and how to achieve real paradigm shifts in how we govern our increasingly hyper-active world…(More)”.
Virtuous and vicious circles in the data life-cycle
Paper by Elizabeth Yakel, Ixchel M. Faniel, and Zachary J. Maiorana: “In June 2014, ‘Data sharing reveals complexity in the westward spread of domestic animals across Neolithic Turkey’ was published in PLoS One (Arbuckle et al. 2014). In this article, twenty-three authors, all zooarchaeologists, representing seventeen different archaeological sites in Turkey, investigated the domestication of animals across Neolithic southwest Asia, a pivotal era of change in the region’s economy. The PLoS One article originated in a unique data sharing, curation, and reuse project in which a majority of the authors agreed to share their data and perform analyses across the aggregated datasets. The extent of data sharing and the breadth of data reuse and collaboration were unprecedented in archaeology. In the present article, we conduct a case study of the collaboration leading to the development of the PLoS One article. In particular, we focus on the data sharing, data curation, and data reuse practices exercised during the project in order to investigate how different phases in the data life-cycle affected each other.
Studies of data practices have generally engaged issues from the singular perspective of data producers, sharers, curators, or reusers. Furthermore, past studies have tended to focus on one aspect of the life-cycle (production, sharing, curation, reuse, etc.). A notable exception is Carlson and Anderson’s (2007) comparative case study of four research projects, which discusses the life-cycle of data from production through sharing with an eye towards reuse. However, that study primarily addresses the process of data sharing. While we see from their research that data producers’ and curators’ decisions and actions regarding data are tightly coupled and have future consequences, those consequences are not fully explicated since the authors do not discuss reuse in depth.
Taking a perspective that captures the trajectory of data, our case study discusses actions and their consequences throughout the data life-cycle. Our research theme explores how different stakeholders and their work practices positively and/or negatively affected other phases of the life-cycle. More specifically, we focus on data production practices and data selection decisions made during data sharing as these have frequent and diverse consequences for other life-cycle phases in our case study. We address the following research questions:
- How do different aspects of data production positively and negatively impact other phases in the life-cycle?
- How do data selection decisions during sharing positively and negatively impact other phases in the life-cycle?
- How can the work of data curators intervene to reinforce positive actions or mitigate negative actions?…(More)”
The New York Times has a course to teach its reporters data skills, and now they’ve open-sourced it
Joshua Benton at Nieman Lab: “The New York Times wants more of its journalists to have those basic data skills, and now it’s releasing the curriculum they’ve built in-house out into the world, where it can be of use to reporters, newsrooms, and lots of other people too.
Here’s Lindsey Rogers Cook, an editor for digital storytelling and training at the Times, and the sort of person who is willing to have “spreadsheets make my heart sing” appear under her byline:
Even with some of the best data and graphics journalists in the business, we identified a challenge: data knowledge wasn’t spread widely among desks in our newsroom and wasn’t filtering into news desks’ daily reporting.
Yet fluency with numbers and data has become more important than ever. While journalists once were fond of joking that they got into the field because of an aversion to math, numbers now comprise the foundation for beats as wide-ranging as education, the stock market, the Census, and criminal justice. More data is released than ever before — there are nearly 250,000 datasets on data.gov alone — and increasingly, government, politicians, and companies try to twist those numbers to back their own agendas…
We wanted to help our reporters better understand the numbers they get from sources and government, and give them the tools to analyze those numbers. We wanted to increase collaboration between traditional and non-traditional journalists…And with more competition than ever, we wanted to empower our reporters to find stories lurking in the hundreds of thousands of databases maintained by governments, academics, and think tanks. We wanted to give our reporters the tools and support necessary to incorporate data into their everyday beat reporting, not just in big and ambitious projects.
….You can access the Times’ training materials here. Some of what you’ll find:
- An outline of the data skills the course aims to teach. It’s all run on Google Docs and Google Sheets; class starts with the uber-basics (mean! median! sum!), crosses the bridge of pivot tables, and then heads into data cleaning and more advanced formulas (see the illustrative sketch after this list).
- The full day-by-day outline of the Times’ three-week course, which of course you’re free to use or reshape to your newsroom’s needs.
- It’s not just about cells, columns, and rows — the course also includes more journalism-based information around ethical questions, how to use data effectively inside a story’s narrative, and how best to work with colleagues in the graphics department.
- Cheat sheets! If you don’t have time to dig too deeply, they’ll give a quick hit of information: one, two, three, four, five.
- Data sets that you can use to work through the beginner, intermediate, and advanced stages of the training, including such journalism classics as census data, campaign finance data, and BLS data. But don’t be a dummy and try to write real news stories off these spreadsheets; the Times cautions in bold: “NOTE: We have altered many of these datasets for instructional purposes, so please download the data from the original source if you want to use it in your reporting.”
- “How Not To Be Wrong,” which seems like a useful thing….(More)”
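The course itself runs on Google Sheets, but purely as an illustration (not part of the Times materials), the same beginner operations named above, cleaning messy values, the basic aggregates, and a pivot table, might look like this in Python with pandas, using an invented campaign-finance-style table:

```python
# Illustrative only -- not from the Times course, which uses Google Sheets.
# A made-up table standing in for the kind of data the course works with.
import pandas as pd

donations = pd.DataFrame({
    "state":  ["NY", "NY", "CA", "CA", "TX", None],
    "party":  ["D", "R", "D", "D", "R", "R"],
    "amount": ["250", "1,000", "75", "500", "300", "125"],  # messy strings, as raw exports often are
})

# Data cleaning: strip thousands separators, convert to numbers, drop rows missing a state.
donations["amount"] = donations["amount"].str.replace(",", "", regex=False).astype(float)
donations = donations.dropna(subset=["state"])

# The uber-basics: sum, mean, median.
print(donations["amount"].sum(), donations["amount"].mean(), donations["amount"].median())

# The pivot-table step: total donations by state and party.
print(donations.pivot_table(index="state", columns="party",
                            values="amount", aggfunc="sum", fill_value=0))
```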
Bringing Truth to the Internet
Article by Karen Kornbluh and Ellen P. Goodman: “The first volume of Special Counsel Robert Mueller’s report notes that “sweeping” and “systemic” social media disinformation was a key element of Russian interference in the 2016 election. No sooner were Mueller’s findings public than Twitter suspended a host of bots that had been promoting a “Russiagate hoax.”
Since at least 2016, conspiracy theories like Pizzagate and QAnon have flourished online and bled into mainstream debate. Earlier this year, a British member of Parliament called social media companies “accessories to radicalization” for their role in hosting and amplifying radical hate groups after the New Zealand mosque shooter cited these groups and attempted to fuel more of them. In Myanmar, anti-Rohingya forces used Facebook to spread rumors that spurred ethnic cleansing, according to a UN special rapporteur. These platforms are vulnerable to those who aim to prey on intolerance, peer pressure, and social disaffection. Our democracies are being compromised. They work only if the information ecosystem has integrity—if it privileges truth and channels difference into nonviolent discourse. But the ecosystem is increasingly polluted.
Around the world, a growing sense of urgency about the need to address online radicalization is leading countries to embrace ever more draconian solutions: After the Easter bombings in Sri Lanka, the government shut down access to Facebook, WhatsApp, and other social media platforms. And a number of countries are considering adopting laws requiring social media companies to remove unlawful hate speech or face hefty penalties. According to Freedom House, “In the past year, at least 17 countries approved or proposed laws that would restrict online media in the name of fighting ‘fake news’ and online manipulation.”
The flaw with these censorious remedies is this: They focus on the content that the user sees—hate speech, violent videos, conspiracy theories—and not on the structural characteristics of social media design that create vulnerabilities. Content moderation requirements that cannot scale are not only doomed to be ineffective exercises in whack-a-mole, but they also create free expression concerns, by turning either governments or platforms into arbiters of acceptable speech. In some countries, such as Saudi Arabia, content moderation has become justification for shutting down dissident speech.
When countries pressure platforms to root out vaguely defined harmful content and disregard the design vulnerabilities that promote that content’s amplification, they are treating a symptom and ignoring the disease. The question isn’t “How do we moderate?” Instead, it is “How do we promote design change that optimizes for citizen control, transparency, and privacy online?”—exactly the values that the early Internet promised to embody….(More)”.
Return on Data
Paper by Noam Kolt: “Consumers routinely supply personal data to technology companies in exchange for services. Yet, the relationship between the utility (U) consumers gain and the data (D) they supply — “return on data” (ROD) — remains largely unexplored. Expressed as a ratio, ROD = U / D. While lawmakers strongly advocate protecting consumer privacy, they tend to overlook ROD. Are the benefits of the services enjoyed by consumers, such as social networking and predictive search, commensurate with the value of the data extracted from them? How can consumers compare competing data-for-services deals?
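To make the ratio concrete, here is a minimal, entirely hypothetical sketch: the dollar figures and the idea of monetizing U and D directly are assumptions for illustration, not the paper’s method.

```python
# Hypothetical illustration of ROD = U / D. The figures are invented, and
# monetizing utility and data this crudely is an assumption, not the paper's method.
def return_on_data(utility: float, data_value: float) -> float:
    """Return on data: ratio of utility gained to the value of personal data supplied."""
    return utility / data_value

# Two imaginary data-for-services deals a consumer might compare.
deal_a = return_on_data(utility=120.0, data_value=40.0)  # e.g. a social network
deal_b = return_on_data(utility=90.0, data_value=60.0)   # e.g. a predictive search service
print(f"Deal A ROD: {deal_a:.2f}, Deal B ROD: {deal_b:.2f}")  # 3.00 vs 1.50
```

On these invented numbers, deal A delivers more utility per unit of data supplied, which is the kind of comparison the paper argues consumers should be able to make.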
Currently, the legal frameworks regulating these transactions, including privacy law, aim primarily to protect personal data. They treat data protection as a standalone issue, distinct from the benefits that consumers receive. This article suggests that privacy concerns should not be viewed in isolation, but as part of ROD. Just as companies can quantify return on investment (ROI) to optimize investment decisions, consumers should be able to assess ROD in order to better spend and invest personal data. Making data-for-services transactions more transparent will enable consumers to evaluate the merits of these deals, negotiate their terms and make more informed decisions. Pivoting from the privacy paradigm to ROD will both incentivize data-driven service providers to offer consumers higher ROD and create opportunities for new market entrants….(More)”.
Federal Data Strategy: Use Cases
US Federal Data Strategy: “For the purposes of the Federal Data Strategy, a “Use Case” is a data practice or method that leverages data to support an articulable federal agency mission or public interest outcome. The Federal Data Strategy sought use cases from the public that solve problems or demonstrate solutions that can help inform the four strategy areas: Enterprise Data Governance; Use, Access, and Augmentation; Decision-making and Accountability; and Commercialization, Innovation, and Public Use. The Federal Data Strategy team was in part informed by these submissions, which are posted below…..(More)”.
We Read 150 Privacy Policies. They Were an Incomprehensible Disaster.
Kevin Litman-Navarro at the New York Times: “….I analyzed the length and readability of privacy policies from nearly 150 popular websites and apps. Facebook’s privacy policy, for example, takes around 18 minutes to read in its entirety – slightly above average for the policies I tested….
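Purely as an illustration of how a reading-time figure like that can be estimated (the 250-words-per-minute pace and the sample text are assumptions, not the article’s method), a minimal sketch:

```python
# Rough sketch, not the article's methodology: estimate how long a privacy
# policy takes to read, assuming an average pace of 250 words per minute.
def reading_minutes(policy_text: str, words_per_minute: int = 250) -> float:
    word_count = len(policy_text.split())
    return word_count / words_per_minute

# Stand-in text; a real test would load the full policy from the site.
sample_policy = "We collect information you provide directly to us " * 600  # ~4,800 words
print(f"~{reading_minutes(sample_policy):.0f} minutes to read")  # ~19 minutes
```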
Despite efforts like the General Data Protection Regulation to make policies more accessible, there seems to be an intractable tradeoff between a policy’s readability and length. Even policies that are shorter and easier to read can be impenetrable, given the amount of background knowledge required to understand how things like cookies and IP addresses play a role in data collection….
So what might a useful privacy policy look like?
Consumers don’t need a technical understanding of data collection processes in order to protect their personal information. Instead of explaining the excruciatingly complicated inner workings of the data marketplace, privacy policies should help people decide how they want to present themselves online. We tend to go on the internet privately – on our phones or at home – which gives the impression that our activities are also private. But, often, we’re more visible than ever.
A good privacy policy would help users understand how exposed they are: Something as simple as a list of companies that might purchase and use your personal information could go a long way towards setting a new bar for privacy-conscious behavior. For example, if you know that your weather app is constantly tracking your whereabouts and selling your location data as marketing research, you might want to turn off your location services entirely, or find a new app.
Until we reshape privacy policies to meet our needs — or we find a suitable replacement — it’s probably best to act with one rule in mind. To be clear and concise: Someone’s always watching….(More)”.
Data & Policy: A new venue to study and explore policy–data interaction

Opening editorial by Stefaan G. Verhulst, Zeynep Engin and Jon Crowcroft: “…Policy–data interactions or governance initiatives that use data have been the exception rather than the norm, isolated prototypes and trials rather than an indication of real, systemic change. There are various reasons for the generally slow uptake of data in policymaking, and several factors will have to change if the situation is to improve. ….
- Despite the number of successful prototypes and small-scale initiatives, policy makers’ understanding of data’s potential and its value proposition generally remains limited (Lutes, 2015). There is also limited appreciation of the advances data science has made in the last few years. This is a major limiting factor; we cannot expect policy makers to use data if they do not recognize what data and data science can do.
- The recent (and justifiable) backlash against how certain private companies handle consumer data has had something of a reverse halo effect: There is a growing lack of trust in the way data is collected, analyzed, and used, and this often leads to a certain reluctance (or simply risk-aversion) on the part of officials and others (Engin, 2018).
- Despite several high-profile open data projects around the world, much (probably the majority) of data that could be helpful in governance remains either privately held or otherwise hidden in silos (Verhulst and Young, 2017b). There remains a shortage not only of data but, more specifically, of high-quality and relevant data.
- With few exceptions, the technical capacities of officials remain limited, and this has obviously negative ramifications for the potential use of data in governance (Giest, 2017).
- It’s not just a question of limited technical capacities. There is often a vast conceptual and values gap between the policy and technical communities (Thompson et al., 2015; Uzochukwu et al., 2016); sometimes it seems as if they speak different languages. Compounding this difference in world views is the fact that the two communities rarely interact.
- Yet data about the use of data, and evidence of its impact, remain sparse. The impetus to use more data in policy making is stymied by limited scholarship and a weak evidential basis showing whether and how data can be helpful. Without such evidence, data advocates are limited in their ability to make the case for more data initiatives in governance.
- Data are not only changing the way policy is developed, but they have also reopened the debate around theory- versus data-driven methods in generating scientific knowledge (Lee, 1973; Kitchin, 2014; Chivers, 2018; Dreyfuss, 2017), thus directly questioning the evidence base for the utilization and implementation of data within policy making. A number of associated challenges are being discussed, such as: (i) traceability and reproducibility of research outcomes (due to “black box processing”); (ii) the use of correlation instead of causation as the basis of analysis, and the biases and uncertainties present in large historical datasets that replicate and, in some cases, amplify human cognitive biases and imperfections; and (iii) the incorporation of existing human knowledge and domain expertise into the scientific knowledge generation processes—among many other topics (Castelvecchi, 2016; Miller and Goodchild, 2015; Obermeyer and Emanuel, 2016; Provost and Fawcett, 2013).
- Finally, we believe that there should be a sound theoretical underpinning for what we call Policy–Data Interactions. To date, in reaction to the proliferation of data in the commercial world, theories of data management, privacy, and fairness have emerged. From the Human–Computer Interaction world, a manifesto of principles of Human–Data Interaction (Mortier et al., 2014) has found traction; it aims to reduce the asymmetry of power present in current design considerations of systems of data about people. However, we need a consistent, symmetric approach to the consideration of systems of policy and data and how they interact with one another.
All these challenges are real, and they are sticky. We are under no illusions that they will be overcome easily or quickly….
During the past four conferences, we have hosted an incredibly diverse range of dialogues and examinations by key global thought leaders, opinion leaders, practitioners, and the scientific community (Data for Policy, 2015, 2016, 2017, 2019). What became increasingly obvious was the need for a dedicated venue to deepen and sustain the conversations and deliberations beyond the limitations of an annual conference. This leads us to today and the launch of Data & Policy, which aims to confront and mitigate the barriers to greater use of data in policy making and governance.
Data & Policy is a venue for peer-reviewed research and discussion about the potential for and impact of data science on policy. Our aim is to provide a nuanced and multistranded assessment of the potential and challenges involved in using data for policy and to bridge the “two cultures” of science and humanism—as CP Snow famously described in his lecture on “The Two Cultures and the Scientific Revolution” (Snow, 1959). By doing so, we also seek to bridge the two other dichotomies that limit an examination of datafication and its interaction with policy from various angles: the divide between practice and scholarship; and between private and public…
So these are our principles: scholarly, pragmatic, open-minded, interdisciplinary, focused on actionable intelligence, and, most of all, innovative in how we will share insight and push at the boundaries of what we already know and what already exists. We are excited to launch Data & Policy with the support of Cambridge University Press and University College London, and we’re looking for partners to help us build it as a resource for the community. If you’re reading this manifesto, it means you have at least a passing interest in the subject; we hope you will be part of the conversation….(More)”.
From Planning to Prototypes: New Ways of Seeing Like a State
Fleur Johns at Modern Law Review: “All states have pursued what James C. Scott characterised as modernist projects of legibility and simplification: maps, censuses, national economic plans and related legislative programs. Many, including Scott, have pointed out blindspots embedded in these tools. As such criticism persists, however, the synoptic style of law and development has changed. Governments, NGOs and international agencies now aspire to draw upon immense repositories of digital data. Modes of analysis too have changed. No longer is legibility a precondition for action. Law‐ and policy‐making are being informed by business development methods that prefer prototypes over plans. States and international institutions continue to plan, but also seek insight from the release of minimally viable policy mock‐ups. Familiar critiques of law and development work, and arguments for its reform, have limited purchase on these practices, Scott’s included. Effective critical intervention in this field today requires careful attention to be paid to these emergent patterns of practice…(More)”.
Introducing ‘AI Commons’: A framework for collaboration to achieve global impact
Press Release: “Last week’s 3rd annual AI for Good Global Summit once again showcased the growing number of Artificial Intelligence (AI) projects with promise to advance the United Nations Sustainable Development Goals (SDGs).
Now, using the Summit’s momentum, AI innovators and humanitarian leaders are prepared to take the ‘AI for Good’ movement to the next level.
They are working together to launch an ‘AI Commons’ that aims to scale AI for Good projects and maximize their impact across the world.
The AI Commons will enable AI adopters to connect with AI specialists and data owners to align incentives for innovation and develop AI solutions to precisely defined problems.
“The concept of AI Commons has developed over three editions of the Summit and is now motivating implementation,” said ITU Secretary-General Houlin Zhao in closing remarks to the summit. “AI and data need to be a shared resource if we are serious about scaling AI for good. The community supporting the Summit is creating infrastructure to scale-up their collaboration – to convert the principles underlying the Summit into global impact.”…
The AI Commons will provide an open framework for collaboration, a decentralized system to democratize problem solving with AI.
It aims to be a “knowledge space”, says Banifatemi, answering a key question: “How can problem solving with AI become common knowledge?”
“The goal is to be an open initiative, like a Linux effort, like an open-source network, where everyone can participate and we jointly share and we create an abundance of knowledge, knowledge of how we can solve problems with AI,” said Banifatemi.
AI development and application will build on the state of the art, enabling AI solutions to scale with the help of shared datasets, testing and simulation environments, AI models and associated software, and storage and computing resources….(More)”.