Paper by Steffen Knoblauch et al: “Urban mobility analysis using Twitter as a proxy has gained significant attention in various application fields; however, long-term validation studies are scarce. This paper addresses this gap by assessing the reliability of Twitter data for modeling inner-urban mobility dynamics over a 27-month period in the metropolitan area of Rio de Janeiro, Brazil. The evaluation involves the validation of Twitter-derived mobility estimates at both temporal and spatial scales, employing over 1.6 × 10¹¹ mobile phone records of around three million users during the non-stationary mobility period from April 2020 to June 2022, which coincided with the COVID-19 pandemic. The results highlight the need for caution when using Twitter for short-term modeling of urban mobility flows. Short-term inference can be influenced by Twitter policy changes and the availability of publicly accessible tweets. On the other hand, this long-term study demonstrates that employing multiple mobility metrics simultaneously, analyzing dynamic and static mobility changes concurrently, and employing robust preprocessing techniques such as rolling window downsampling can enhance the inference capabilities of Twitter data. These novel insights gained from a long-term perspective are vital, as Twitter – rebranded to X in 2023 – is extensively used by researchers worldwide to infer human movement patterns. Since conclusions drawn from studies using Twitter could be used to inform public policy, emergency response, and urban planning, evaluating the reliability of this data is of utmost importance…(More)”.
Veridical Data Science
Book by Bin Yu and Rebecca L. Barter: “Most textbooks present data science as a linear analytic process involving a set of statistical and computational techniques without accounting for the challenges intrinsic to real-world applications. Veridical Data Science, by contrast, embraces the reality that most projects begin with an ambiguous domain question and messy data; it acknowledges that datasets are mere approximations of reality while analyses are mental constructs.
Bin Yu and Rebecca Barter employ the innovative Predictability, Computability, and Stability (PCS) framework to assess the trustworthiness and relevance of data-driven results relative to three sources of uncertainty that arise throughout the data science life cycle: the human decisions and judgment calls made during data collection, cleaning, and modeling. By providing real-world data case studies, intuitive explanations of common statistical and machine learning techniques, and supplementary R and Python code, Veridical Data Science offers a clear and actionable guide for conducting responsible data science. Requiring little background knowledge, this lucid, self-contained textbook provides a solid foundation and principled framework for future study of advanced methods in machine learning, statistics, and data science…(More)”.
Contractual Freedom and Fairness in EU Data Sharing Agreements
Paper by Thomas Margoni and Alain M. Strowel: “This chapter analyzes the evolving landscape of EU data-sharing agreements, particularly focusing on the balance between contractual freedom and fairness in the context of non-personal data. The discussion highlights the complexities introduced by recent EU legislation, such as the Data Act, Data Governance Act, and Open Data Directive, which collectively aim to regulate data markets and enhance data sharing. The chapter emphasizes how these laws impose obligations that limit contractual freedom to ensure fairness, particularly in business-to-business (B2B) and Internet of Things (IoT) data transactions. It also explores the tension between private ordering and public governance, suggesting that the EU’s approach marks a shift from property-based models to governance-based models in data regulation. This chapter underscores the significant impact these regulations will have on data contracts and the broader EU data economy…(More)”.
Cross-border data flows in Africa: Continental ambitions and political realities
Paper by Melody Musoni, Poorva Karkare and Chloe Teevan: “Africa must prioritise data usage and cross-border data sharing to realise the goals of the African Continental Free Trade Area and to drive innovation and AI development. Accessible and shareable data is essential for the growth and success of the digital economy, enabling innovations and economic opportunities, especially in a rapidly evolving landscape.
African countries, through the African Union (AU), have a common vision of sharing data across borders to boost economic growth. However, the adopted continental digital policies are often inconsistently applied at the national level, where some member states implement restrictive measures like data localisation that limit the free flow of data.
The paper looks at national policies that often prioritise domestic interests and how those conflict with continental goals. This is due to differences in political ideologies, socio-economic conditions, security concerns and economic priorities. This misalignment between national agendas and the broader AU strategy is shaped by each country’s unique context, as seen in the examples of Senegal, Nigeria and Mozambique, which face distinct challenges in implementing the continental vision.
The paper concludes with actionable recommendations for the AU, member states and the partnership with the European Union. It suggests that the AU enhances support for data-sharing initiatives and urges member states to focus on policy alignment, address data deficiencies, build data infrastructure and find new ways to use data. It also highlights how the EU can strengthen its support for Africa’s data-sharing goals…(More)”.
Lifecycles, pipelines, and value chains: toward a focus on events in responsible artificial intelligence for health
Paper by Joseph Donia et al: “Process-oriented approaches to the responsible development, implementation, and oversight of artificial intelligence (AI) systems have proliferated in recent years. Variously referred to as lifecycles, pipelines, or value chains, these approaches demonstrate a common focus on systematically mapping key activities and normative considerations throughout the development and use of AI systems. At the same time, these approaches risk focusing on proximal activities of development and use at the expense of a focus on the events and value conflicts that shape how key decisions are made in practice. In this article we report on the results of an ‘embedded’ ethics research study focused on SPOTT – a ‘Smart Physiotherapy Tracking Technology’ employing AI and undergoing development and commercialization at an academic health sciences centre. Through interviews and focus groups with the development and commercialization team, patients, and policy and ethics experts, we suggest that a more expansive design and development lifecycle shaped by key events offers a more robust approach to normative analysis of digital health technologies, especially where those technologies’ actual uses are underspecified or in flux. We introduce five of these key events, outlining their implications for responsible design and governance of AI for health, and present a set of critical questions intended for others doing applied ethics and policy work. We briefly conclude with a reflection on the value of this approach for engaging with health AI ecosystems more broadly…(More)”.
A shared destiny for public sector data
Blog post by Shona Nicol: “As a data professional, it can sometimes feel hard to get others interested in data. Perhaps like many in this profession, I can often express the importance and value of data for good in an overly technical way. However, when our biggest challenges in Scotland include eradicating child poverty, growing the economy and tackling the climate emergency, I would argue that we should all take an interest in data because it’s going to be foundational in helping us solve these problems.
Data is already intrinsic to shaping our society and how services are delivered. And public sector data is a vital component in making sure that services for the people of Scotland are being delivered efficiently and effectively. Despite an ever-growing awareness of the transformative power of data to improve the design and delivery of services, feedback from public sector staff shows that they can face difficulties when trying to influence colleagues and senior leaders around the need to invest in data.
A vision gap
In the Scottish Government’s data maturity programme and more widely, we regularly hear about the challenges data professionals encounter when trying to enact change. This community tell us that a long-term vision for public sector data for Scotland could help them by providing the context for what they are trying to achieve locally.
Earlier this year we started to scope how we might do this. We recognised that organisations are already working to deliver local and national strategies and policies that relate to data, so any vision had to be able to sit alongside those, be meaningful in different settings, agnostic of technology and relevant to any public sector organisation. We wanted to offer opportunities for alignment, not enforce an instruction manual…(More)”.
Understanding local government responsible AI strategy: An international municipal policy document analysis
Paper by Anne David et al: “The burgeoning capabilities of artificial intelligence (AI) have prompted numerous local governments worldwide to consider its integration into their operations. Nevertheless, instances of notable AI failures have heightened ethical concerns, emphasising the imperative for local governments to approach the adoption of AI technologies in a responsible manner. While local government AI guidelines endeavour to incorporate characteristics of responsible innovation and technology (RIT), it remains essential to assess the extent to which these characteristics have been integrated into policy guidelines to facilitate more effective AI governance in the future. This study closely examines local government policy documents (n = 26) through the lens of RIT, employing directed content analysis with thematic data analysis software. The results reveal that: (a) Not all RIT characteristics have been given equal consideration in these policy documents; (b) Participatory and deliberate considerations were the most frequently mentioned responsible AI characteristics in policy documents; (c) Adaptable, explainable, sustainable, and accountable considerations were the least present responsible AI characteristics in policy documents; (d) Many of the considerations overlapped with each other as local governments were at the early stages of identifying them. Furthermore, the paper summarised strategies aimed at assisting local authorities in identifying their strengths and weaknesses in responsible AI characteristics, thereby facilitating their transformation into governing entities with responsible AI practices. The study informs local government policymakers, practitioners, and researchers on the critical aspects of responsible AI policymaking…(More)” See also: AI Localism
AI helped Uncle Sam catch $1 billion of fraud in one year. And it’s just getting started
Article by Matt Egan: “The federal government’s bet on using artificial intelligence to fight financial crime appears to be paying off.
Machine learning AI helped the US Treasury Department to sift through massive amounts of data and recover $1 billion worth of check fraud in fiscal 2024 alone, according to new estimates shared first with CNN. That’s nearly triple what the Treasury recovered in the prior fiscal year.
“It’s really been transformative,” Renata Miskell, a top Treasury official, told CNN in a phone interview.
“Leveraging data has upped our game in fraud detection and prevention,” Miskell said.
The Treasury Department credited AI with helping officials prevent and recover more than $4 billion worth of fraud overall in fiscal 2024, a six-fold spike from the year before.
US officials quietly started using AI to detect financial crime in late 2022, taking a page out of what many banks and credit card companies already do to stop bad guys.
The goal is to protect taxpayer money against fraud, which spiked during the Covid-19 pandemic as the federal government scrambled to disburse emergency aid to consumers and businesses.
To be sure, Treasury is not using generative AI, the kind that has captivated users of OpenAI’s ChatGPT and Google’s Gemini by generating images, crafting song lyrics and answering complex questions (even though it still sometimes struggles with simple queries)…(More)”.
Statistical Significance—and Why It Matters for Parenting
Blog by Emily Oster: “…When we say an effect is “statistically significant at the 5% level,” what this means is that there is less than a 5% chance that we’d see an effect of this size if the true effect were zero. (The “5% level” is a common cutoff, but things can be significant at the 1% or 10% level also.)
The natural follow-up question is: Why would any effect we see occur by chance? The answer lies in the fact that data is “noisy”: it comes with error. To see this a bit more, we can think about what would happen if we studied a setting where we know our true effect is zero.
My fake study
Imagine the following (fake) study. Participants are randomly assigned to eat a package of either blue or green M&Ms, and then they flip a (fair) coin and you see if it is heads. Your analysis will compare the number of heads that people flip after eating blue versus green M&Ms and report whether this is “statistically significant at the 5% level.”…(More)”.
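The fake study above can be run many times on a computer to see how often pure noise clears the 5% bar. The following is a minimal sketch, not the author's own analysis: the group size of 100, the 2,000 repetitions, the use of a two-proportion z-test with a normal approximation, and the function names are all assumptions made for illustration.

```python
import math
import random


def two_sided_p_value(heads_a, heads_b, n):
    """Two-proportion z-test p-value for two groups of equal size n
    (normal approximation; an assumed choice of test)."""
    pooled = (heads_a + heads_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0  # every flip identical in both groups: no difference to detect
    z = (heads_a / n - heads_b / n) / se
    # Two-sided tail probability of a standard normal variable
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_studies=2000, n_per_group=100, alpha=0.05, seed=0):
    """Repeat the fake M&M study and return the share of runs that look significant."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_studies):
        # The true effect is zero: both groups flip the same fair coin.
        blue = sum(rng.random() < 0.5 for _ in range(n_per_group))
        green = sum(rng.random() < 0.5 for _ in range(n_per_group))
        if two_sided_p_value(blue, green, n_per_group) < alpha:
            significant += 1
    return significant / n_studies


rate = false_positive_rate()
print(f"Share of fake studies significant at the 5% level: {rate:.3f}")
```

Because M&M color cannot affect a coin flip, every "significant" result in this simulation is a false positive. The printed share should land in the neighborhood of 0.05, which is what the 5% cutoff means in practice.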
External Researcher Access to Closed Foundation Models
Report by Esme Harrington and Dr. Mathias Vermeulen: “…addresses a pressing issue: independent researchers need better conditions for accessing and studying the AI models that big companies have developed. Foundation models — the core technology behind many AI applications — are controlled mainly by a few major players who decide who can study or use them.
What’s the problem with access?
- Limited access: Companies like OpenAI, Google and others are the gatekeepers. They often restrict access to researchers whose work aligns with their priorities, which means independent, public-interest research can be left out in the cold.
- High-end costs: Even when access is granted, it often comes with a hefty price tag that smaller or less-funded teams can’t afford.
- Lack of transparency: These companies don’t always share how their models are updated or moderated, making it nearly impossible for researchers to replicate studies or fully understand the technology.
- Legal risks: When researchers try to scrutinize these models, they sometimes face legal threats if their work uncovers flaws or vulnerabilities in the AI systems.
The research suggests that companies need to offer more affordable and transparent access to improve AI research. Additionally, governments should provide legal protections for researchers, especially when they are acting in the public interest by investigating potential risks…(More)”.