How Data Happened: A History from the Age of Reason to the Age of Algorithms


Book by Chris Wiggins and Matthew L. Jones: “From facial recognition—capable of checking people into flights or identifying undocumented residents—to automated decision systems that inform who gets loans and who receives bail, each of us moves through a world determined by data-empowered algorithms. But these technologies didn’t just appear: they are part of a history that goes back centuries, from the census enshrined in the US Constitution to the birth of eugenics in Victorian Britain to the development of Google search.

Expanding on the popular course they created at Columbia University, Chris Wiggins and Matthew L. Jones illuminate the ways in which data has long been used as a tool and a weapon in arguing for what is true, as well as a means of rearranging or defending power. They explore how data was created and curated, as well as how new mathematical and computational techniques developed to contend with that data serve to shape people, ideas, society, military operations, and economies. Although technology and mathematics are at its heart, the story of data ultimately concerns an unstable game among states, corporations, and people. How were new technical and scientific capabilities developed; who supported, advanced, or funded these capabilities or transitions; and how did they change who could do what, from what, and to whom?

Wiggins and Jones focus on these questions as they trace data’s historical arc, and look to the future. By understanding the trajectory of data—where it has been and where it might yet go—Wiggins and Jones argue that we can understand how to bend it to ends that we collectively choose, with intentionality and purpose…(More)”.

Exploring data journalism practices in Africa: data politics, media ecosystems and newsroom infrastructures


Paper by Sarah Chiumbu and Allen Munoriyarwa: “Extant research on data journalism in Africa has focused on newsroom factors and the predilections of individual journalists as determinants of the uptake of data journalism on the continent. This article diverts from this literature by examining the slow uptake of data journalism in sub-Saharan Africa through the prisms of non-newsroom factors. Drawing on in-depth interviews with prominent investigative journalists sampled from several African countries, we argue that to understand the slow uptake of data journalism on the continent, there is a need to critique the role of data politics, which encompasses state, market and existing media ecosystems across the continent. Therefore, it is necessary to move beyond newsroom-centric factors that have dominated the contemporary understanding of data journalism practices. A broader, non-newsroom conceptualisation beyond individual journalistic predilections and newsroom resources provides productive clarity on data journalism’s slow uptake on the continent. These arguments are made through the conceptual prisms of materiality, performativity and reflexivity…(More)”.

Wanted: Data Stewards — Drafting the Job Specs for A Re-imagined Data Stewardship Role


Blog by Stefaan Verhulst: “With the rapid datafication of our world and the ever-growing need to access data for re-use in the public interest, it’s no surprise that the need for data stewards becomes more pressing every day. Organizations across sectors and geographies, from the United Nations Statistics Division to the Government of New Zealand, are all moving towards defining the roles and responsibilities of a data steward within their own unique contexts and use cases.

At The GovLab, we have long advocated for the professionalization of data stewardship through our research into the role of data stewards in fostering data collaboration, as well as our executive education courses at the Data Stewards Academy. The recent launch of The Data Tank, a non-profit dedicated to addressing the challenges and opportunities of datafication, which I co-founded, is another step in the right direction, creating a platform to explore data stewardship in practice and providing additional educational resources.

While these resources are no doubt valuable, we are still often faced with the question: What are the required competencies of a data steward? If I want to hire or train a data steward, what should the job specifications be?

With that in mind, we are initiating a process of crafting a job description for data stewards, outlining the responsibilities, skills, and behaviors of a data steward below. Such a job description may not only help organizations create formal data steward roles internally and recruit externally, but it will also help aspiring data stewards seek out the relevant training and opportunities for them to strengthen their skillset.

The job description below captures our initial thoughts on the role of a data steward, and we would welcome your insights on the roles and skills required to be an effective data steward. It is based on previous presentations shared publicly…(More)”.

Data Collaborative Case Study: NYC Recovery Data Partnership


Report by the Open Data Policy Lab (The GovLab): “In July 2020, following severe economic and social losses due to the COVID-19 pandemic, the administration of New York City Mayor Bill de Blasio announced the NYC Recovery Data Partnership. This data collaborative asked private and civic organizations with assets relevant to New York City to provide their data to the city. Senior city leaders from the First Deputy Mayor’s Office, the Mayor’s Office of Operations, the Mayor’s Office of Information Privacy, and the Mayor’s Office of Data Analytics formed an internal coalition which served as trusted intermediaries, assessing requests from city agencies to use the data provided and allocating access accordingly. The data informed internal research conducted by various city agencies, including New York City Emergency Management’s Recovery Team and the NYC Department of City Planning. The experience reveals the ability of crises to spur innovation, the value of responsiveness from both data users and data suppliers, the importance of technical capacity, and the value of a network of peers. In terms of challenges, the experience also exposes the limitations of data, the difficulties of compiling complex datasets, and the role of resource constraints…(More)”.

Mapping Diversity


About: “Mapping Diversity is a platform for discovering key facts about diversity and representation in street names across Europe, and for sparking a debate about who is missing from our urban spaces.

We looked at the names of 145,933 streets across 30 major European cities, located in 17 different countries. More than 90% of the streets named after individuals are dedicated to white men. Where did all the other inhabitants of Europe end up? The lack of diversity in toponymy speaks volumes about our past and contributes to shaping Europe’s present and future…(More)”.

Principles for effective beneficial ownership disclosure


Open Ownership: “The Open Ownership Principles (OO Principles) are a framework for considering the elements that influence whether reforms to improve the transparency of the beneficial ownership of corporate vehicles will lead to effective beneficial ownership disclosure: disclosure that generates high-quality, reliable data and maximises its usability for users.

The OO Principles are intended to support governments implementing effective beneficial ownership transparency reforms and guide international institutions, civil society, and private sector actors in understanding and supporting reforms. They are a tool to identify and separate issues affecting implementation, and they provide a framework for assessing and improving existing disclosure regimes. If implemented together, the OO Principles enable disclosure systems to generate actionable and usable data across the widest range of policy applications of beneficial ownership data.

The nine principles are interdependent, but can be broadly grouped by the three main ways they improve data. The Definition, Coverage, and Detail principles enable data disclosure and collection. The Central register, Access, and Structured data principles facilitate data storage and auditability. Finally, the Verification, Up-to-date and historical records, and Sanctions and enforcement principles improve data quality and reliability… Download January 2023 version (translated versions are forthcoming)”

Ten (not so) simple rules for clinical trial data-sharing


Paper by Claude Pellen et al: “Clinical trial data-sharing is seen as an imperative for research integrity and is becoming increasingly encouraged or even required by funders, journals, and other stakeholders. However, early experiences with data-sharing have been disappointing because data-sharing is not always conducted properly. Health data is indeed sensitive and not always easy to share in a responsible way. We propose 10 rules for researchers wishing to share their data. These rules cover the majority of elements to be considered in order to start the commendable process of clinical trial data-sharing:

  • Rule 1: Abide by local legal and regulatory data protection requirements
  • Rule 2: Anticipate the possibility of clinical trial data-sharing before obtaining funding
  • Rule 3: Declare your intent to share data in the registration step
  • Rule 4: Involve research participants
  • Rule 5: Determine the method of data access
  • Rule 6: Remember there are several other elements to share
  • Rule 7: Do not proceed alone
  • Rule 8: Deploy optimal data management to ensure that the data shared is useful
  • Rule 9: Minimize risks
  • Rule 10: Strive for excellence…(More)”

Decidim: why digital tools for democracy need to be developed democratically


Blog by Adrian Smith and Pedro Prieto Martín: “On Wednesday 18 January 2023, a pan-European citizen jury voted Barcelona the first European Capital of Democracy. Barcelona has a rich history of official and citizen initiatives in political and economic democracy. One received a special mention from the jurors. That initiative is Decidim.

Decidim is a digital platform for citizen participation. Through it, citizens can propose, comment, debate, and vote on urban developments, decide how to spend city budgets, and design and contribute to local strategies and plans.

Since its launch in 2016, more than 400 organisations around the world have used the platform. What makes Decidim stand out, according to our research, is developer commitment to democratising technology development itself and embedding it within struggles for democracy offline and online. Decidim holds important lessons at a time when the monopolisation of social media by corporate power presents democrats with so many challenges…(More)”.

The Sensitive Politics Of Information For Digital States


Essay by Federica Carugati, Cyanne E. Loyle and Jessica Steinberg: “In 2020, Vice revealed that the U.S. military had signed a contract with Babel Street, a Virginia-based company that created a product called Locate X, which collects location data from users across a variety of digital applications. Some of these apps are seemingly innocuous: one for following storms, a Muslim dating app and a level for DIY home repair. Less innocuously, these reports indicate that the U.S. government is outsourcing some of its counterterrorism and counterinsurgency information-gathering activities to a private company.

While states have always collected information about citizens and their activities, advances in digital technologies — including new kinds of data and infrastructure — have fundamentally altered their ability to access, gather and analyze information. Bargaining with and relying on non-state actors like private companies creates tradeoffs between a state’s effectiveness and legitimacy. Those tradeoffs might be unacceptable to citizens, undermining our very understanding of what states do and how we should interact with them …(More)”

Machine Learning as a Tool for Hypothesis Generation


Paper by Jens Ludwig & Sendhil Mullainathan: “While hypothesis testing is a highly formalized activity, hypothesis generation remains largely informal. We propose a systematic procedure to generate novel hypotheses about human behavior, which uses the capacity of machine learning algorithms to notice patterns people might not. We illustrate the procedure with a concrete application: judge decisions about who to jail. We begin with a striking fact: The defendant’s face alone matters greatly for the judge’s jailing decision. In fact, an algorithm given only the pixels in the defendant’s mugshot accounts for up to half of the predictable variation. We develop a procedure that allows human subjects to interact with this black-box algorithm to produce hypotheses about what in the face influences judge decisions. The procedure generates hypotheses that are both interpretable and novel: They are not explained by demographics (e.g. race) or existing psychology research; nor are they already known (even if tacitly) to people or even experts. Though these results are specific, our procedure is general. It provides a way to produce novel, interpretable hypotheses from any high-dimensional dataset (e.g. cell phones, satellites, online behavior, news headlines, corporate filings, and high-frequency time series). A central tenet of our paper is that hypothesis generation is in and of itself a valuable activity, and we hope this encourages future work in this largely “pre-scientific” stage of science…(More)”.