Algorithmic thinking in the public interest: navigating technical, legal, and ethical hurdles to web scraping in the social sciences
Paper by Alex Luscombe, Kevin Dick & Kevin Walby: “Web scraping, defined as the automated extraction of information online, is an increasingly important means of producing data in the social sciences. We contribute to emerging social science literature on computational methods by elaborating on web scraping as a means of automated access to information. We begin by situating the practice of web scraping in context, providing an overview of how it works and how it compares to other methods in the social sciences. Next, we assess the benefits and challenges of scraping as a technique of information production. In terms of benefits, we highlight how scraping can help researchers answer new questions, supersede limits in official data, overcome access hurdles, and reinvigorate the values of sharing, openness, and trust in the social sciences. In terms of challenges, we discuss three: technical, legal, and ethical. By adopting “algorithmic thinking in the public interest” as a way of navigating these hurdles, researchers can improve the state of access to information on the Internet while also contributing to scholarly discussions about the legality and ethics of web scraping. Example software accompanying this article are available within the supplementary materials.”