An open source Python library for anonymizing sensitive data

Paper by Judith Sáinz-Pardo Díaz & Álvaro López García: “Open science is a fundamental pillar to promote scientific progress and collaboration, based on the principles of open data, open source and open access. However, the requirements for publishing and sharing open data are in many cases difficult to meet in compliance with strict data protection regulations. Consequently, researchers need to rely on proven methods that allow them to anonymize their data without sharing it with third parties. To this end, this paper presents the implementation of a Python library for the anonymization of sensitive tabular data. This framework provides users with a wide range of anonymization methods that can be applied to a given dataset, taking as input the set of identifiers, quasi-identifiers, generalization hierarchies and allowed level of suppression, along with the sensitive attribute and the level of anonymity required. The library has been implemented following best practices for continuous integration and development, as well as the use of workflows to test code coverage based on unit and functional tests…(More)”.
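To illustrate the kind of workflow such a library supports, the snippet below is a minimal pandas sketch of the underlying idea (checking k-anonymity over quasi-identifiers after a simple generalization step). It is not the authors' actual API; column names and the toy data are invented for the example.

```python
import pandas as pd

def k_anonymity_level(df: pd.DataFrame, quasi_identifiers: list[str]) -> int:
    """Return the k-anonymity level: the size of the smallest group of
    records sharing the same combination of quasi-identifier values."""
    return int(df.groupby(quasi_identifiers).size().min())

def generalize_age(df: pd.DataFrame, width: int = 10) -> pd.DataFrame:
    """One generalization-hierarchy step: replace exact ages with
    intervals of the given width (e.g. 37 -> '30-39')."""
    out = df.copy()
    lower = (out["age"] // width) * width
    out["age"] = lower.astype(str) + "-" + (lower + width - 1).astype(str)
    return out

# Toy dataset with one identifier, two quasi-identifiers and a sensitive attribute.
data = pd.DataFrame({
    "name": ["A", "B", "C", "D"],                   # identifier: dropped before release
    "age": [34, 37, 52, 55],                        # quasi-identifier
    "zip": ["39005", "39005", "39010", "39010"],    # quasi-identifier
    "disease": ["flu", "asthma", "flu", "cancer"],  # sensitive attribute
})

released = generalize_age(data.drop(columns=["name"]))
print(k_anonymity_level(released, ["age", "zip"]))  # -> 2
```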
Garden city: A synthetic dataset and sandbox environment for analysis of pre-processing algorithms for GPS human mobility data
Paper by Thomas H. Li and Francisco Barreras: “Human mobility datasets have seen increasing adoption in the past decade, enabling diverse applications that leverage the high precision of measured trajectories relative to other human mobility datasets. However, there are concerns about whether the high sparsity in some commercial datasets can introduce errors due to lack of robustness in processing algorithms, which could compromise the validity of downstream results. The scarcity of “ground-truth” data makes it particularly challenging to evaluate and calibrate these algorithms. To overcome these limitations and allow for an intermediate form of validation of common processing algorithms, we propose a synthetic trajectory simulator and sandbox environment meant to replicate the features of commercial datasets that could cause errors in such algorithms, and which can be used to compare algorithm outputs with “ground-truth” synthetic trajectories and mobility diaries. Our code is open-source and is publicly available alongside tutorial notebooks and sample datasets generated with it….(More)”
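As a rough illustration of what such a sandbox enables (a toy sketch, not the paper's simulator), one can generate a ground-truth mobility diary, thin the dense trajectory to mimic commercial sparsity, and compare a naive stop-detection pass against the diary:

```python
import random

random.seed(0)

# Ground-truth diary: (location, start_minute, end_minute) for one simulated day.
diary = [("home", 0, 480), ("office", 510, 1020), ("home", 1050, 1440)]

# Dense "true" trajectory: one ping per minute at the diary location.
true_pings = [(t, loc) for loc, start, end in diary for t in range(start, end)]

# Mimic commercial-data sparsity: keep each ping with low probability.
sparse_pings = [p for p in true_pings if random.random() < 0.02]

def naive_stops(pings, max_gap=60):
    """Group consecutive pings at the same location into visits,
    splitting whenever the time gap exceeds max_gap minutes."""
    visits = []
    for t, loc in pings:
        if visits and visits[-1][0] == loc and t - visits[-1][2] <= max_gap:
            visits[-1][2] = t                # extend the current visit
        else:
            visits.append([loc, t, t])       # start a new visit
    return [tuple(v) for v in visits]

detected = naive_stops(sparse_pings)
print(f"{len(diary)} true visits vs {len(detected)} detected "
      f"from {len(sparse_pings)} of {len(true_pings)} pings")
```

Comparing `detected` against `diary` makes the sparsity-induced fragmentation and missed stops directly measurable, which is the kind of validation the sandbox is meant to support.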
AI, huge hacks leave consumers facing a perfect storm of privacy perils
Article by Joseph Menn: “Hackers are using artificial intelligence to mine unprecedented troves of personal information dumped online in the past year, along with unregulated commercial databases, to trick American consumers and even sophisticated professionals into giving up control of bank and corporate accounts.
Armed with sensitive health information, calling records and hundreds of millions of Social Security numbers, criminals and operatives of countries hostile to the United States are crafting emails, voice calls and texts that purport to come from government officials, co-workers or relatives needing help, or familiar financial organizations trying to protect accounts instead of draining them.
“There is so much data out there that can be used for phishing and password resets that it has reduced overall security for everyone, and artificial intelligence has made it much easier to weaponize,” said Ashkan Soltani, executive director of the California Privacy Protection Agency, the only such state-level agency.
The losses reported to the FBI’s Internet Crime Complaint Center nearly tripled from 2020 to 2023, to $12.5 billion, and a number of sensitive breaches this year have only increased internet insecurity. The recently discovered Chinese government hacks of U.S. telecommunications companies AT&T, Verizon and others, for instance, were deemed so serious that government officials are being told not to discuss sensitive matters on the phone, some of those officials said in interviews. A Russian ransomware gang’s breach of Change Healthcare in February captured data on millions of Americans’ medical conditions and treatments, and in August, a small data broker, National Public Data, acknowledged that it had lost control of hundreds of millions of Social Security numbers and addresses now being sold by hackers.
Meanwhile, the capabilities of artificial intelligence are expanding at breakneck speed. “The risks of a growing surveillance industry are only heightened by AI and other forms of predictive decision-making, which are fueled by the vast datasets that data brokers compile,” U.S. Consumer Financial Protection Bureau Director Rohit Chopra said in September…(More)”.
Why ‘open’ AI systems are actually closed, and why this matters
Paper by David Gray Widder, Meredith Whittaker & Sarah Myers West: “This paper examines ‘open’ artificial intelligence (AI). Claims about ‘open’ AI often lack precision, frequently eliding scrutiny of substantial industry concentration in large-scale AI development and deployment, and often incorrectly applying understandings of ‘open’ imported from free and open-source software to AI systems. At present, powerful actors are seeking to shape policy using claims that ‘open’ AI is either beneficial to innovation and democracy, on the one hand, or detrimental to safety, on the other. When policy is being shaped, definitions matter. To add clarity to this debate, we examine the basis for claims of openness in AI, and offer a material analysis of what AI is and what ‘openness’ in AI can and cannot provide: examining models, data, labour, frameworks, and computational power. We highlight three main affordances of ‘open’ AI, namely transparency, reusability, and extensibility, and we observe that maximally ‘open’ AI allows some forms of oversight and experimentation on top of existing models. However, we find that openness alone does not perturb the concentration of power in AI. Just as many traditional open-source software projects were co-opted in various ways by large technology companies, we show how rhetoric around ‘open’ AI is frequently wielded in ways that exacerbate rather than reduce concentration of power in the AI sector…(More)”.
Scientists Scramble to Save Climate Data from Trump—Again
Article by Chelsea Harvey: “Eight years ago, as the Trump administration was getting ready to take office for the first time, mathematician John Baez was making his own preparations.
Together with a small group of friends and colleagues, he was arranging to download large quantities of public climate data from federal websites in order to safely store them away. Then-President-elect Donald Trump had repeatedly denied the basic science of climate change and had begun nominating climate skeptics for cabinet posts. Baez, a professor at the University of California, Riverside, was worried the information — everything from satellite data on global temperatures to ocean measurements of sea-level rise — might soon be destroyed.
His effort, known as the Azimuth Climate Data Backup Project, archived at least 30 terabytes of federal climate data by the end of 2017.
In the end, it was an overprecaution.
The first Trump administration altered or deleted numerous federal web pages containing public-facing climate information, according to monitoring efforts by the nonprofit Environmental Data and Governance Initiative (EDGI), which tracks changes on federal websites. But federal databases, containing vast stores of globally valuable climate information, remained largely intact through the end of Trump’s first term.
Yet as Trump prepares to take office again, scientists are growing more worried.
Federal datasets may be in bigger trouble this time than they were under the first Trump administration, they say. And they’re preparing to begin their archiving efforts anew.
“This time around we expect them to be much more strategic,” said Gretchen Gehrke, EDGI’s website monitoring program lead. “My guess is that they’ve learned their lessons.”
The Trump transition team didn’t respond to a request for comment.
Like Baez’s Azimuth project, EDGI was born in 2016 in response to Trump’s first election. They weren’t the only ones…(More)”.
Can AI review the scientific literature — and figure out what it all means?
Article by Helen Pearson: “When Sam Rodriques was a neurobiology graduate student, he was struck by a fundamental limitation of science. Even if researchers had already produced all the information needed to understand a human cell or a brain, “I’m not sure we would know it”, he says, “because no human has the ability to understand or read all the literature and get a comprehensive view.”
Five years later, Rodriques says he is closer to solving that problem using artificial intelligence (AI). In September, he and his team at the US start-up FutureHouse announced that an AI-based system they had built could, within minutes, produce syntheses of scientific knowledge that were more accurate than Wikipedia pages1. The team promptly generated Wikipedia-style entries on around 17,000 human genes, most of which previously lacked a detailed page.
Rodriques is not the only one turning to AI to help synthesize science. For decades, scholars have been trying to accelerate the onerous task of compiling bodies of research into reviews. “They’re too long, they’re incredibly intensive and they’re often out of date by the time they’re written,” says Iain Marshall, who studies research synthesis at King’s College London. The explosion of interest in large language models (LLMs), the generative-AI programs that underlie tools such as ChatGPT, is prompting fresh excitement about automating the task…(More)”.
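The automation these tools attempt can be pictured as a map-reduce loop over retrieved papers: summarize small batches, then merge the partial summaries into one review. The sketch below is generic and hypothetical; `llm` is a stand-in for any prompt-to-text model wrapper, and nothing here reflects FutureHouse's actual system.

```python
from typing import Callable

def synthesize_literature(
    abstracts: list[str],
    question: str,
    llm: Callable[[str], str],   # hypothetical: any function mapping a prompt to text
    chunk_size: int = 5,
) -> str:
    """Map-reduce synthesis: summarize abstracts in small batches,
    then merge the partial summaries into one short review."""
    partial_summaries = []
    for i in range(0, len(abstracts), chunk_size):
        batch = "\n\n".join(abstracts[i:i + chunk_size])
        partial_summaries.append(
            llm(f"Summarize the findings below as they relate to: {question}\n\n{batch}")
        )
    merged = "\n\n".join(partial_summaries)
    return llm(f"Write a short review answering '{question}' from these notes:\n\n{merged}")
```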
AI adoption in the public sector
Two studies from the Joint Research Centre: “…delve into the factors that influence the adoption of Artificial Intelligence (AI) in public sector organisations.
A first report analyses a survey conducted among 574 public managers across seven EU countries, identifying the main current drivers of AI adoption and providing three key recommendations to practitioners.
Strong expertise and various organisational factors emerge as key contributors to AI adoption, and a second study sheds light on the essential competences and governance practices required for the effective adoption and usage of AI in the public sector across Europe…
The study finds that AI adoption is no longer a promise for public administration, but a reality, particularly in service delivery and internal operations, and to a lesser extent in policy decision-making. It also highlights the importance of organisational factors such as leadership support, innovative culture, clear AI strategy, and in-house expertise in fostering AI adoption. Anticipated citizen needs are also identified as a key external factor driving AI adoption.
Based on these findings, the report offers three policy recommendations. First, it suggests paying attention to AI and digitalisation in leadership programmes, organisational development and strategy building. Second, it recommends broadening in-house expertise on AI, which should include not only technical expertise, but also expertise on ethics, governance, and law. Third, the report advises monitoring (for instance through focus groups and surveys) and exchanging on citizen needs and levels of readiness for digital improvements in government service delivery…(More)”.
AI Investment Potential Index: Mapping Global Opportunities for Sustainable Development
Paper by AFD: “…examines the potential of artificial intelligence (AI) investment to drive sustainable development across diverse national contexts. By evaluating critical factors, including AI readiness, social inclusion, human capital, and macroeconomic conditions, we construct a nuanced and comprehensive analysis of the global AI landscape. Employing advanced statistical techniques and machine learning algorithms, we identify nations with significant untapped potential for AI investment.
We introduce the AI Investment Potential Index (AIIPI), a novel instrument designed to guide financial institutions, development banks, and governments in making informed, strategic AI investment decisions. The AIIPI synthesizes metrics of AI readiness with socio-economic indicators to identify and highlight opportunities for fostering inclusive and sustainable growth. The methodological novelty lies in the weight selection process, which combines statistical modeling with an entropy-based weighting approach. Furthermore, we provide detailed policy implications to support stakeholders in making targeted investments aimed at reducing disparities and advancing equitable technological development…(More)”.
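The entropy-weighting component can be sketched as follows. This is a generic illustration of the standard entropy weight method with toy numbers, not the AIIPI's actual indicators, weights, or the statistical-modeling part of its procedure:

```python
import numpy as np

def entropy_weights(X: np.ndarray) -> np.ndarray:
    """Entropy weight method: indicators whose values are more dispersed
    across countries (lower entropy) receive larger weights."""
    n, _ = X.shape
    # Min-max normalize each indicator to [0, 1]; small epsilon avoids log(0).
    X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)
    P = (X_norm + 1e-12) / (X_norm + 1e-12).sum(axis=0)
    entropy = -(P * np.log(P)).sum(axis=0) / np.log(n)
    divergence = 1.0 - entropy
    return divergence / divergence.sum()

# Toy example: 4 countries scored on AI readiness, human capital, inclusion.
X = np.array([
    [0.8, 0.6, 0.7],
    [0.4, 0.5, 0.9],
    [0.2, 0.9, 0.3],
    [0.6, 0.4, 0.5],
])
w = entropy_weights(X)
scores = ((X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))) @ w
print(np.round(w, 3), np.round(scores, 3))  # weights and composite index values
```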
Access to data for research: lessons for the National Data Library from the front lines of AI innovation
Report by the Minderoo Centre for Technology and Democracy and the Bennett Institute for Public Policy: “…a series of case studies on access to data for research. These case studies illustrate the barriers that researchers are grappling with, and suggest how a new wave of policy development could help address these.
Each shows innovative uses of data for research in areas that are critically important to science and society, including:
- Diagnosing the data challenges in cancer research – Dr Mireia Crispin
- A sea change in social media research – Dr Amy Orben
- Conserving with code: How data is helping to save our planet – Dr Sadiq Jaffer and Dr Alec Christie
- Breaking the ice: Addressing data barriers in Polar research – Dr Scott Hosking
- Making a difference with data: Insights from COVID-19 – Professor Stefan Scholtes
- Untangling the web of supply chain data – Professor Vasco Carvalho
The projects highlight crucial design considerations for the UK’s National Data Library and the need for a digital infrastructure that connects data, researchers, and resources that enable data use. By centring the experiences of researchers on the front line of AI innovation, this report hopes to bring some of those barriers into focus and inform continued conversations in this area…(More)”.
NegotiateAI
About: “The NegotiateAI app is designed to streamline access to critical information on the UN Plastic Treaty Negotiations to develop a legally binding instrument on plastic pollution, including in the marine environment. It offers a comprehensive, centralized database of documents submitted by member countries, along with an extensive collection of supporting resources, including reports, research papers, and policy briefs. You can find more information about the NegotiateAI project on our website…The Interactive Treaty Assistant simplifies the search and analysis of documents by INC members, enabling negotiators and other interested parties to quickly pinpoint crucial information. With an intuitive interface, the Interactive Treaty Assistant supports treaty-specific queries and provides direct links to relevant documents for deeper research…(More)”.
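As a purely hypothetical illustration of the kind of lookup such an assistant performs (not NegotiateAI's implementation; the records and URLs below are invented), a treaty-specific query can be reduced to scoring submissions against query terms and returning links to the matching documents:

```python
# Hypothetical document index: (country, title, url, text) tuples.
SUBMISSIONS = [
    ("Norway", "Views on product design", "https://example.org/nor-design",
     "product design reuse recycling criteria"),
    ("Kenya", "Proposal on financing", "https://example.org/ken-financing",
     "financial mechanism capacity building marine environment"),
]

def query(terms: list[str], top_n: int = 5):
    """Rank submissions by how many query terms appear in their text."""
    scored = []
    for country, title, url, text in SUBMISSIONS:
        score = sum(term.lower() in text.lower() for term in terms)
        if score:
            scored.append((score, country, title, url))
    return sorted(scored, reverse=True)[:top_n]

print(query(["marine", "financial"]))  # -> highest-scoring submissions with links
```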