Paper by Panos Fitsilis et al.: “The significance of open data in higher education stems from the changing tendencies towards open science, and open research in higher education encourages new ways of making scientific inquiry more transparent, collaborative and accessible. This study focuses on the critical role of open data stewards in this transition, essential for managing and disseminating research data effectively in universities, while it also highlights the increasing demand for structured training and professional policies for data stewards in academic settings. Building upon this context, the paper investigates the essential skills and competences required for effective data stewardship in higher education institutions. A critical literature review, coupled with practical engagement in open data stewardship at universities, provided insights into the roles and responsibilities of data stewards. In response to these identified needs, the paper proposes a structured training framework and comprehensive curriculum for data stewardship, a direct response to the gaps identified in the literature. It addresses five key competence categories for open data stewards, aligning them with current trends and essential skills and knowledge in the field. By advocating for a structured approach to data stewardship education, this work sets the foundation for improved data management in universities and serves as a critical step towards professionalizing the role of data stewards in higher education. The emphasis on the role of open data stewards is expected to advance data accessibility and sharing practices, fostering increased transparency, collaboration, and innovation in academic research. This approach contributes to the evolution of universities into open ecosystems, where there is a free flow of data for global education and research advancement…(More)”.
Annoyed Redditors tanking Google Search results illustrates perils of AI scrapers
Article by Scharon Harding: “A trend on Reddit that sees Londoners giving false restaurant recommendations in order to keep their favorites clear of tourists and social media influencers highlights the inherent flaws of Google Search’s reliance on Reddit and Google’s AI Overview.
In May, Google launched AI Overviews in the US, an experimental feature that populates the top of Google Search results with a summarized answer based on an AI model built into Google’s web rankings. When Google first debuted AI Overview, it quickly became apparent that the feature needed work with accuracy and its ability to properly summarize information from online sources. AI Overviews are “built to only show information that is backed up by top web results,” Liz Reid, VP and head of Google Search, wrote in a May blog post. But as my colleague Benj Edwards pointed out at the time, that setup could contribute to inaccurate, misleading, or even dangerous results: “The design is based on the false assumption that Google’s page-ranking algorithm favors accurate results and not SEO-gamed garbage.”
As Edwards alluded to, many have complained about Google Search results’ quality declining in recent years, as SEO spam and, more recently, AI slop float to the top of searches. As a result, people often turn to the Reddit hack to make Google results more helpful. By adding “site:reddit.com” to search results, users can hone their search to more easily find answers from real people. Google seems to understand the value of Reddit and signed an AI training deal with the company that’s reportedly worth $60 million per year…(More)”.
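The “site:reddit.com” trick described above can be sketched in a few lines of Python. This is only an illustration: the helper names `site_restricted_query` and `search_url` are hypothetical, and the base URL and `q` parameter are assumed here rather than taken from the article.

```python
from urllib.parse import urlencode


def site_restricted_query(terms: str, site: str = "reddit.com") -> str:
    """Append a site: operator so results are limited to one domain."""
    return f"{terms} site:{site}"


def search_url(
    terms: str,
    site: str = "reddit.com",
    base: str = "https://www.google.com/search",  # assumed search endpoint
) -> str:
    """Build a search URL whose q parameter carries the restricted query."""
    return f"{base}?{urlencode({'q': site_restricted_query(terms, site)})}"


if __name__ == "__main__":
    print(search_url("best ramen london"))
```

The operator is plain text inside the query, so the same string can be typed directly into the search box. The URL form only matters when the search is scripted.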
Exploring the Intersections of Open Data and Generative AI: Recent Additions to the Observatory
Blog by Roshni Singh, Hannah Chafetz, Andrew Zahuranec, Stefaan Verhulst: “The Open Data Policy Lab’s Observatory of Examples of How Open Data and Generative AI Intersect provides real-world use cases of where open data from official sources intersects with generative artificial intelligence (AI), building from the learnings from our report, “A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI.”
The Observatory includes over 80 examples from several domains and geographies–ranging from supporting administrative work within the legal department of the Government of France to assisting researchers across the African continent in navigating cross-border data sharing laws. The examples include generative AI chatbots to improve access to services, conversational tools to help analyze data, datasets to improve the quality of the AI output, and more. A key feature of the Observatory is its categorization across our Spectrum of Scenarios framework, shown below. Through this effort, we aim to bring together the work already being done and identify ways to use generative AI for the public good.
This Observatory is an attempt to grapple with the work currently being done to apply generative AI in conjunction with official open data. It does not make a value judgment on their efficacy or practices. Many of these examples have ethical implications, which merit further attention and study.
From September through October, we added to the Observatory:
- Bayaan Platform: A conversational tool by the Statistics Centre Abu Dhabi that provides decision makers with data analytics and visualization support.
- Berufsinfomat: A generative AI tool for career coaching in Austria.
- ChatTCU: A chatbot for Brazil’s Federal Court of Accounts.
- City of Helsinki’s AI Register: An initiative aimed at leveraging open city data to enhance civic services and facilitate better engagement with residents.
- Climate Q&A: A generative AI chatbot that provides information about climate change based on scientific reports.
- DataLaw.Bot: A generative AI tool that disseminates data sharing regulations to researchers across several African countries…(More)”.
South Korea leverages open government data for AI development
Article by Si Ying Thian: “In South Korea, open government data is powering artificial intelligence (AI) innovations in the private sector.
Take the case of TTCare, which may be the world’s first mobile application to analyse eye and skin disease symptoms in pets.
The AI model was trained on about one million pieces of data – half of the data coming from the government-led AI Hub and the rest collected by the firm itself, according to the Korean newspaper Donga.
AI Hub is an integrated platform set up by the government to support the country’s AI infrastructure.
TTCare’s CEO Heo underlined the importance of government-led AI training data in improving the model’s ability to diagnose symptoms. The firm’s training data is currently accessible through AI Hub, and any Korean citizen can download or use it.
Pushing the boundaries of open data
Over the years, South Korea has consistently come out on top in the world’s rankings for the Open, Useful, and Re-usable data (OURdata) Index.
The government has been pushing the boundaries of what it can do with open data – beyond just making data usable by providing APIs. Application Programming Interfaces, or APIs, make it easier for users to tap on open government data to power their apps and services.
There is now rising interest from public sector agencies to tap on such data to train AI models, said South Korea’s National Information Society Agency (NIA)’s Principal Manager, Dongyub Baek, although this is still at an early stage.
Baek sits in NIA’s open data department, which handles policies, infrastructure such as the National Open Data Portal, as well as impact assessments of the government initiatives…(More)”
Unlocking data for climate action requires trusted marketplaces
Report by Digital Impact Alliance: “In 2024, the northern hemisphere recorded the hottest summer overall, the hottest day, and the hottest ever month of August. That same month – August 2024 – this warming fueled droughts in Italy and intensified typhoons that devastated parts of the Philippines, Taiwan, and China. The following month, new research calculated that warming is costing the global economy billions of dollars: an increase in extreme heat and severe drought costs about 0.2% of a country’s GDP.
These are only the latest stories and statistics that illustrate the growing costs of climate change – data points that have emerged in the short time since we published our second Spotlight on unlocking climate data with open transaction networks.
This third paper in the series continues the work of the Joint Learning Network on Unlocking Data for Climate Action (Climate Data JLN). This multi-disciplinary network identified multiple promising models to explore in the context of unlocking data for climate action. This Spotlight paper examines the third of these models: data spaces. Through examination of data spaces in action, the paper analyzes the key elements that render them more or less applicable to specific climate-related data sets. Data spaces are relatively new and mostly conceptual, with only a handful of implementations in process and concentrated in a few geographic areas. While this model requires extensive up-front work to agree upon governance and technical standards, the result is an approach that overcomes trust and financing issues by maintaining data sovereignty and creating a marketplace for data exchange…(More)”.
Trust in artificial intelligence makes Trump/Vance a transhumanist ticket
Article by Filip Bialy: “AI plays a central role in the 2024 US presidential election, as a tool for disinformation and as a key policy issue. But its significance extends beyond these, connecting to an emerging ideology known as TESCREAL, which envisages AI as a catalyst for unprecedented progress, including space colonisation. After this election, TESCREALism may well have more than one representative in the White House, writes Filip Bialy.
In June 2024, the essay Situational Awareness by former OpenAI employee Leopold Aschenbrenner sparked intense debate in the AI community. The author predicted that by 2027, AI would surpass human intelligence. Such claims are common among AI researchers. They often assert that only a small elite – mainly those working at companies like OpenAI – possesses inside knowledge of the technology. Many in this group hold a quasi-religious belief in the imminent arrival of artificial general intelligence (AGI) or artificial superintelligence (ASI)…
These hopes and fears, however, are not only religious-like but also ideological. A decade ago, Silicon Valley leaders were still associated with the so-called Californian ideology, a blend of hippie counterculture and entrepreneurial yuppie values. Today, figures like Elon Musk, Mark Zuckerberg, and Sam Altman are under the influence of a new ideological cocktail: TESCREAL. Coined in 2023 by Timnit Gebru and Émile P. Torres, TESCREAL stands for Transhumanism, Extropianism, Singularitarianism, Cosmism, Rationalism, Effective Altruism, and Longtermism.
While these may sound like obscure terms, they represent ideas developed over decades, with roots in eugenics. Early 20th-century eugenicists such as Francis Galton promoted selective breeding to enhance future generations. Later, with advances in genetic engineering, the focus shifted from eugenics’ racist origins to its potential to eliminate genetic defects. TESCREAL represents a third wave of eugenics. It aims to digitise human consciousness and then propagate digital humans into the universe…(More)”
Commission launches public consultation on the rules for researchers to access online platform data under the Digital Services Act
Press Release: “Today, the Commission launched a public consultation on the draft delegated act on access to online platform data for vetted researchers under the Digital Services Act (DSA).
With the Digital Services Act, researchers will for the first time have access to data to study systemic risks and to assess online platforms’ risk mitigation measures in the EU. It will allow the research community to play a vital role in scrutinising and safeguarding the online environment.
The draft delegated act clarifies the procedures on how researchers can access Very Large Online Platforms’ and Search Engines’ data. It also sets out rules on data formats and data documentation requirements. Lastly, it establishes the DSA data access portal, a one-stop-shop for researchers, data providers, and DSCs to exchange information on data access requests. The consultation follows a first call for evidence.
The consultation will run until 26 November 2024. After gathering public feedback, the Commission plans to adopt the rules in the first quarter of 2025…(More)”.
Open-Access AI: Lessons From Open-Source Software
Article by Parth Nobel, Alan Z. Rozenshtein, Chinmayi Sharma: “Before analyzing how the lessons of open-source software might (or might not) apply to open-access AI, we need to define our terms and explain why we use the term “open-access AI” to describe models like Llama rather than the more commonly used “open-source AI.” We join many others in arguing that “open-source AI” is a misnomer for such models. It’s misleading to fully import the definitional elements and assumptions that apply to open-source software when talking about AI. Rhetoric matters, and the distinction isn’t just semantic; it’s about acknowledging the meaningful differences in access, control, and development.
The software industry definition of “open source” grew out of the free software movement, which makes the point that “users have the freedom to run, copy, distribute, study, change and improve” software. As the movement emphasizes, one should “think of ‘free’ as in ‘free speech,’ not as in ‘free beer.’” What’s “free” about open-source software is that users can do what they want with it, not that they initially get it for free (though much open-source software is indeed distributed free of charge). This concept is codified by the Open Source Initiative as the Open Source Definition (OSD), many aspects of which directly apply to Llama 3.2. Llama 3.2’s license makes it freely redistributable by license holders (Clause 1 of the OSD) and allows the distribution of the original models, their parts, and derived works (Clauses 3, 7, and 8)…(More)”.
Proactive Mapping to Manage Disaster
Article by Andrew Mambondiyani: “…In March 2019, Cyclone Idai ravaged Zimbabwe, killing hundreds of people and leaving a trail of destruction. The Global INFORM Risk Index data shows that Zimbabwe is highly vulnerable to extreme climate-related events like floods, cyclones, and droughts, which in turn destroy infrastructure, displace people, and result in loss of lives and livelihoods.
Severe weather events like Idai have exposed the shortcomings of Zimbabwe’s traditional disaster-management system, which was devised to respond to environmental disasters by providing relief and rehabilitation of infrastructure and communities. After Idai, a team of climate-change researchers from three Zimbabwean universities and the local NGO DanChurchAid (DCA) concluded that the nation must adopt a more proactive approach by establishing an early-warning system to better prepare for and thereby prevent significant damage and death from such disasters.
In response to these findings, the Open Mapping Hub—Eastern and Southern Africa (ESA Hub)—launched a program in 2022 to develop an anticipatory-response approach in Zimbabwe. The ESA Hub is a regional NGO based in Kenya created by the Humanitarian OpenStreetMap Team (HOT), an international nonprofit that uses open-mapping technology to reduce environmental disaster risk. One of HOT’s four global hubs and its first in Africa, the ESA Hub was created in 2021 to facilitate the aggregation, utilization, and dissemination of high-quality open-mapping data across 23 countries in Eastern and Southern Africa. Open-source expert Monica Nthiga leads the hub’s team of 13 experts in mapping, open data, and digital content. The team collaborates with community-based organizations, humanitarian organizations, governments, and UN agencies to meet their specific mapping needs to best anticipate future climate-related disasters.
“The ESA Hub’s [anticipatory-response] project demonstrates how preemptive mapping can enhance disaster preparedness and resilience planning,” says Wilson Munyaradzi, disaster-services manager at the ESA Hub.
Open-mapping tools and workflows enable the hub to collect geospatial data to be stored, edited, and reviewed for quality assurance prior to being shared with its partners. “Geospatial data has the potential to identify key features of the landscape that can help plan and prepare before disasters occur so that mitigation methods are put in place to protect lives and livelihoods,” Munyaradzi says…(More)”.
Navigating Generative AI in Government
Report by the IBM Center for The Business of Government: “Generative AI refers to algorithms that can create realistic content such as images, text, music, and videos by learning from existing data patterns. Generative AI does more than just create content; it also serves as a user-friendly interface for other AI tools, making complex results easy to understand and use. Generative AI transforms analysis and prediction results into personalized formats, improving explainability by converting complicated data into understandable content. As Generative AI evolves, it plays an active role in collaborative processes, functioning as a vital collaborator by offering strengths that complement human abilities.
Generative AI has the potential to revolutionize government agencies by enhancing efficiency, improving decision making, and delivering better services to citizens, while maintaining agility and scalability. However, in order to implement generative AI solutions effectively, government agencies must address key questions (such as what problems AI can solve, data governance frameworks, and scaling strategies) to ensure a thoughtful and effective AI strategy. By exploring generic use cases, agencies can better understand the transformative potential of generative AI and align it with their unique needs and ethical considerations.
This report, which distills perspectives from two expert roundtables of leaders in Australia, presents 11 strategic pathways for integrating generative AI in government. The strategies include ensuring coherent and ethical AI implementation, developing adaptive AI governance models, investing in a robust data infrastructure, and providing comprehensive training for employees. Encouraging innovation and prioritizing public engagement and transparency are also essential to harnessing the full potential of AI…(More)”