Text as Data: A New Framework for Machine Learning and the Social Sciences


Book by Justin Grimmer, Margaret E. Roberts, and Brandon M. Stewart: “From social media posts and text messages to digital government documents and archives, researchers are bombarded with a deluge of text reflecting the social world. This textual data gives unprecedented insights into fundamental questions in the social sciences, humanities, and industry. Meanwhile new machine learning tools are rapidly transforming the way science and business are conducted. Text as Data shows how to combine new sources of data, machine learning tools, and social science research design to develop and evaluate new insights.

Text as Data is organized around the core tasks in research projects using text—representation, discovery, measurement, prediction, and causal inference. The authors offer a sequential, iterative, and inductive approach to research design. Each research task is presented complete with real-world applications, example methods, and a distinct style of task-focused research.

Bridging many divides—computer science and social science, the qualitative and the quantitative, and industry and academia—Text as Data is an ideal resource for anyone wanting to analyze large collections of text in an era when data is abundant and computation is cheap, but the enduring challenges of social science remain…(More)”.

Artificial Intelligence for Children


WEF Toolkit: “Children and youth are surrounded by AI in many of the products they use in their daily lives, from social media to education technology, video games, smart toys and speakers. AI determines the videos children watch online, their curriculum as they learn, and the way they play and interact with others.

This toolkit, produced by a diverse team of youth, technologists, academics and business leaders, is designed to help companies develop trustworthy artificial intelligence (AI) for children and youth and to help parents, guardians, children and youth responsibly buy and safely use AI products.

AI can be used to educate and empower children and youth and have a positive impact on society. But children and youth can be especially vulnerable to the potential risks posed by AI, including bias, cybersecurity and lack of accessibility. AI must be designed inclusively to respect the rights of the child user. Child-centric design can protect children and youth from the potential risks posed by the technology.

AI technology must be created so that it is both innovative and responsible. Responsible AI is safe, ethical, transparent, fair, accessible and inclusive. Designing responsible and trusted AI is good for consumers, businesses and society. Parents, guardians and adults all have the responsibility to carefully select ethically designed AI products and help children use them safely.

What is at stake? AI will determine the future of play, childhood, education and societies. Children and youth represent the future, so everything must be done to support them to use AI responsibly and address the challenges of the future.

This toolkit aims to help responsibly design, consume and use AI. It is designed to help companies, designers, parents, guardians, children and youth make sure that AI respects the rights of children and has a positive impact in their lives…(More)”.

Going Digital Toolkit


OECD Toolkit: “The ongoing digital transformation of the economy and society holds many promises to spur innovation, generate efficiencies and improve services, and in doing so boost growth. Digital technologies empower people by increasing access to information and enabling new forms of social engagement.

Yet such benefits come with other challenges as digital technologies change the nature and structure of organisations, markets and communities, and raise concerns about equity and inclusion. It is essential that people, firms and governments come together to put digital technologies and data to work for economic and social well-being.

How should platform work and app-based ride services be regulated? How can we measure digital well-being? What are the competition effects of consumer data? The OECD Going Digital Toolkit includes indicators, policy guidance and related publications to answer these questions and help countries realise the promises of digital transformation for all.

It also contains Going Digital Toolkit notes that identify innovative approaches to addressing the most pressing policy and measurement challenges of the digital age…(More)”.

Data-Informed Societies Achieving Sustainability: Tasks for the Global Scientific, Engineering, and Medical Communities


Proceedings by the National Academies of Sciences, Engineering, and Medicine: “The 2030 Agenda for Sustainable Development, adopted in 2015 by all United Nations Member States, offers a “shared blueprint for peace and prosperity for people and the planet, now and into the future.” The Agenda outlines 17 Sustainable Development Goals (SDGs), which address a range of global challenges, including poverty, inequality, climate change, and environmental degradation, among others. Advances in technology and the proliferation of data are providing new opportunities for monitoring and tracking the progress of the SDGs. Yet, with these advances come significant challenges, such as a lack of infrastructure, knowledge, and capacity to support big data…(More)”.

Decolonize Data


Essay by Nithya Ramanathan, Jim Fruchterman, Amy Fowler & Gabriele Carotti-Sha: “The social sector aims to empower communities with tools and knowledge to effect change for themselves, because community-driven change is more likely to drive sustained impact than attempts to force change from the outside. This commitment should include data, which is increasingly essential for generating social impact. Today the effective implementation and continuous improvement of social programs all but requires the collection and analysis of data.

But all too often, social sector practitioners, including researchers, extract data from individuals, communities, and countries for their own purposes, and do not even make it available to them, let alone enable them to draw their own conclusions from it. With data flows the power to make informed decisions.

It is therefore counterproductive, and painfully ironic, that we have ignored our fundamental principles when it comes to data. We see donors and leading practitioners making a sincere move to decolonize aid. However, if we are truly committed to decolonizing the practices in aid, then we must also examine the ownership and flow of data.

Decolonizing data would not only help ensure that the benefits of data accrue directly to the rightful data owners but also open up more intentional data sharing driven by the rightful data owners—the communities we claim to empower…(More)”.

Beyond Benchmarking: Why Countries should Ignore International Rankings


Essay by Robyn Klingler-Vidra and Yu-Ching Kuo: “In Ranking the World, Alexander Cooley and Jack Snyder explore the rise of benchmarking and rankings of countries. They indicate more than 95 such rankings by the time their book was published in 2016. Today, with the success of country rankings such as the Economist Intelligence Unit’s Democracy Index and the World Economic Forum’s Global Competitiveness Rankings, that number has grown two-fold, as more than 200 ranking systems compare countries for their democratic quality, investor friendliness, economic competitiveness, and more.

But international benchmarking methods are problematic; they reflect politics, suffer from incomplete coverage, sample size and bias challenges, and institutional bias. Why, then, do countries increasingly rely on them to inform their policymaking? We employ Taiwan, and entrepreneurship rankings, as a lens to explore the accuracy of benchmarking methodologies, and then offer a new way forward: one that is informed by local evidence rather than global rankings, and as such, is better positioned to solve the ecosystem’s pressing challenges…(More)”.

Lessons from the COVID data wizards


Article by Lynne Peeples: “In March 2020, Beth Blauer started hearing anecdotally that COVID-19 was disproportionately affecting Black people in the United States. But the numbers to confirm that disparity were “very limited”, says Blauer, a data and public-policy specialist at Johns Hopkins University in Baltimore, Maryland. So, her team, which had developed one of the most popular tools for tracking the spread of COVID-19 around the world, added a new graphic to their website: a colour-coded map tracking which US states were — and were not — sharing infection and death data broken down by race and ethnicity.

They posted the map to their data dashboard — the Coronavirus Resource Center — in mid-April 2020 and promoted it through social media and blogs. At the time, just 26 states included racial information with their death data. “Then we started to see the map rapidly filling in,” says Blauer. By the middle of May 2020, 40 states were reporting that information. For Blauer, the change showed that people were paying attention. “And it confirmed that we have the ability to influence what’s happening here,” she says.

COVID-19 dashboards mushroomed around the world in 2020 as data scientists and journalists shifted their work to tracking and presenting information on the pandemic — from infection and death rates, to vaccination data and other variables. “You didn’t have any data set before that was so essential to how you plan your life,” says Lisa Charlotte Muth, a data designer and blogger at Datawrapper, a Berlin-based company that helps newsrooms and journalists to enrich their reporting with embeddable charts. “The weather, maybe, was the closest thing you could compare it to.” The growth in the service’s popularity was impressive. In January 2020 — before the pandemic — Datawrapper had 260 million chart views on its clients’ websites. By April that year, that monthly figure had shot up to more than 4.7 billion.

Policymakers, too, have leaned on COVID-19 data dashboards and charts to guide important decisions. And they had hundreds of local and global examples to reference, including academic enterprises such as the Coronavirus Resource Center, as well as government websites and news-media projects…(More)”.

Social-media reform is flying blind


Paper by Chris Bail: “As Russia continues its ruthless war in Ukraine, pundits are speculating what social-media platforms might have done years ago to undermine propaganda well before the attack. Amid accusations that social media fuels political violence — and even genocide — it is easy to forget that Facebook evolved from a site for university students to rate each other’s physical attractiveness. Instagram was founded to facilitate alcohol-based gatherings. TikTok and YouTube were built to share funny videos.

The world’s social-media platforms are now among the most important forums for discussing urgent social problems, such as Russia’s invasion of Ukraine, COVID-19 and climate change. Techno-idealists continue to promise that these platforms will bring the world together — despite mounting evidence that they are pulling us apart.

Efforts to regulate social media have largely stalled, perhaps because no one knows what something better would look like. If we could hit ‘reset’ and redesign our platforms from scratch, could we make them strengthen civil society?

Researchers have a hard time studying such questions. Most corporations want to ensure studies serve their business model and avoid controversy. They don’t share much data. And getting answers requires not just making observations, but doing experiments.

In 2017, I co-founded the Polarization Lab at Duke University in Durham, North Carolina. We have created a social-media platform for scientific research. On it, we can turn features on and off, and introduce new ones, to identify those that improve social cohesion. We have recruited thousands of people to interact with each other on these platforms, alongside bots that can simulate social-media users.

We hope our effort will help to evaluate some of the most basic premises of social media. For example, tech leaders have long measured success by the number of connections people have. Anthropologist Robin Dunbar has suggested that humans struggle to maintain meaningful relationships with more than 150 people. Experiments could encourage some social-media users to create deeper connections with a small group of users while allowing others to connect with anyone. Researchers could investigate the optimal number of connections in different situations, to work out how to optimize breadth of relationships without sacrificing depth.

A related question is whether social-media platforms should be customized for different societies or groups. Although today’s platforms seem to have largely negative effects on US and Western-Europe politics, the opposite might be true in emerging democracies (P. Lorenz-Spreen et al. Preprint at https://doi.org/hmq2; 2021). One study suggested that Facebook could reduce ethnic tensions in Bosnia–Herzegovina (N. Asimovic et al. Proc. Natl Acad. Sci. USA 118, e2022819118; 2021), and social media has helped Ukraine to rally support around the world for its resistance…(More)”.

Valuing Financial Data


Paper by Maryam Farboodi, Dhruv Singal, Laura Veldkamp & Venky Venkateswaran: “How should an investor value financial data? The answer is complicated because it depends on the characteristics of all investors. We develop a sufficient statistics approach that uses equilibrium asset return moments to summarize all relevant information about others’ characteristics. It can value data that is public or private, about one or many assets, relevant for dividends or for sentiment. While different data types have different valuations, heterogeneous investors value the same data very differently, which suggests a low price elasticity for data demand. Heterogeneous investors’ data valuations are also affected very differentially by market illiquidity…(More)”.

The ethical imperative to identify and address data and intelligence asymmetries


Article by Stefaan Verhulst in AI & Society: “The insight that knowledge, resulting from having access to (privileged) information or data, is power is more relevant today than ever before. The data age has redefined the very notion of knowledge and information (as well as power), leading to a greater reliance on dispersed and decentralized datasets as well as to new forms of innovation and learning, such as artificial intelligence (AI) and machine learning (ML). As Thomas Piketty (among others) has shown, we live in an increasingly stratified world, and our society’s socio-economic asymmetries are often grafted onto data and information asymmetries. As we have documented elsewhere, data access is fundamentally linked to economic opportunity, improved governance, better science and citizen empowerment. The need to address data and information asymmetries—and their resulting inequalities of political and economic power—is therefore emerging as among the most urgent ethical challenges of our era, yet often not recognized as such.

Even as awareness grows of this imperative, society and policymakers lag in their understanding of the underlying issue. Just what are data asymmetries? How do they emerge, and what form do they take? And how do data asymmetries accelerate information and other asymmetries? What forces and power structures perpetuate or deepen these asymmetries, and vice versa? I argue that it is a mistake to treat this problem as homogenous. In what follows, I suggest the beginning of a taxonomy of asymmetries. Although closely related, each one emerges from a different set of contingencies, and each is likely to require different policy remedies. The focus of this short essay is to start outlining these different types of asymmetries. Further research could deepen and expand the proposed taxonomy as well as help define solutions that are contextually appropriate and fit for purpose…(More)”.