Effective Data Stewardship in Higher Education: Skills, Competences, and the Emerging Role of Open Data Stewards


Paper by Panos Fitsilis et al: “The significance of open data in higher education stems from the changing tendencies towards open science, and open research in higher education encourages new ways of making scientific inquiry more transparent, collaborative and accessible. This study focuses on the critical role of open data stewards in this transition, essential for managing and disseminating research data effectively in universities, while it also highlights the increasing demand for structured training and professional policies for data stewards in academic settings. Building upon this context, the paper investigates the essential skills and competences required for effective data stewardship in higher education institutions by elaborating on a critical literature review, coupled with practical engagement in open data stewardship at universities, provided insights into the roles and responsibilities of data stewards. In response to these identified needs, the paper proposes a structured training framework and comprehensive curriculum for data stewardship, a direct response to the gaps identified in the literature. It addresses five key competence categories for open data stewards, aligning them with current trends and essential skills and knowledge in the field. By advocating for a structured approach to data stewardship education, this work sets the foundation for improved data management in universities and serves as a critical step towards professionalizing the role of data stewards in higher education. The emphasis on the role of open data stewards is expected to advance data accessibility and sharing practices, fostering increased transparency, collaboration, and innovation in academic research. This approach contributes to the evolution of universities into open ecosystems, where there is free flow of data for global education and research advancement…(More)”.

Exploring the Intersections of Open Data and Generative AI: Recent Additions to the Observatory


Blog by Roshni Singh, Hannah Chafetz, Andrew Zahuranec, Stefaan Verhulst: “The Open Data Policy Lab’s Observatory of Examples of How Open Data and Generative AI Intersect provides real-world use cases of where open data from official sources intersects with generative artificial intelligence (AI), building from the learnings from our report, “A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI.” 

The Observatory includes over 80 examples from several domains and geographies–ranging from supporting administrative work within the legal department of the Government of France to assisting researchers across the African continent in navigating cross-border data sharing laws. The examples include generative AI chatbots to improve access to services, conversational tools to help analyze data, datasets to improve the quality of the AI output, and more. A key feature of the Observatory is its categorization across our Spectrum of Scenarios framework, shown below. Through this effort, we aim to bring together the work already being done and identify ways to use generative AI for the public good.

Screenshot 2024 10 25 at 10.50.23 am

This Observatory is an attempt to grapple with the work currently being done to apply generative AI in conjunction with official open data. It does not make a value judgment on their efficacy or practices. Many of these examples have ethical implications, which merit further attention and study. 

From September through October, we added to the Observatory:

  • Bayaan Platform: A conversational tool by the Statistics Centre Abu Dhabi that provides decision makers with data analytics and visualization support.
  • Berufsinfomat: A generative AI tool for career coaching in Austria.
  • ChatTCU: A chatbot for Brazil’s Federal Court of Accounts.
  • City of Helsinki’s AI Register: An initiative aimed at leveraging open city data to enhance civic services and facilitate better engagement with residents.
  • Climate Q&A: A generative AI chatbot that provides information about climate change based on scientific reports.
  • DataLaw.Bot: A generative AI tool that disseminates data sharing regulations with researchers across several African countries…(More)”.

South Korea leverages open government data for AI development


Article by Si Ying Thian: “In South Korea, open government data is powering artificial intelligence (AI) innovations in the private sector.

Take the case of TTCare which may be the world’s first mobile application to analyse eye and skin disease symptoms in pets.

AI Hub allows users to search by industry, data format and year (top row), with the data sets made available based on the particular search term “pet” (bottom half of the page). Image: AI Hub, provided by courtesy of Baek

The AI model was trained on about one million pieces of data – half of the data coming from the government-led AI Hub and the rest collected by the firm itself, according to the Korean newspaper Donga.

AI Hub is an integrated platform set up by the government to support the country’s AI infrastructure.

TTCare’s CEO Heo underlined the importance of government-led AI training data in improving the model’s ability to diagnose symptoms. The firm’s training data is currently accessible through AI Hub, and any Korean citizen can download or use it.

Pushing the boundaries of open data

Over the years, South Korea has consistently come up top in the world’s rankings for Open, Useful, and Re-usable data (OURdata) Index.

The government has been pushing the boundaries of what it can do with open data – beyond just making data usable by providing APIs. Application Programming Interfaces, or APIs, make it easier for users to tap on open government data to power their apps and services.

There is now rising interest from public sector agencies to tap on such data to train AI models, said South Korea’s National Information Society Agency (NIA)’s Principal Manager, Dongyub Baek, although this is still at an early stage.

Baek sits in NIA’s open data department, which handles policies, infrastructure such as the National Open Data Portal, as well as impact assessments of the government initiatives…(More)”

Open government data and self-efficacy: The empirical evidence of micro foundation via survey experiments


Paper by Kuang-Ting Tai, Pallavi Awasthi, and Ivan P. Lee: “Research on the potential impacts of government openness and open government data is not new. However, empirical evidence regarding the micro-level impact, which can validate macro-level theories, has been particularly limited. Grounded in social cognitive theory, this study contributes to the literature by empirically examining how the dissemination of government information in an open data format can influence individuals’ perceptions of self-efficacy, a key predictor of public participation. Based on two rounds of online survey experiments conducted in the U.S., the findings reveal that exposure to open government data is associated with decreased perceived self-efficacy, resulting in lower confidence in participating in public affairs. This result, while contrary to optimistic assumptions, aligns with some other empirical studies and highlights the need to reconsider the format for disseminating government information. The policy implications suggest further calibration of open data applications to target professional and skilled individuals. This study underscores the importance of experiment replication and theory development as key components of future research agendas…(More)”.

A shared destiny for public sector data


Blog post by Shona Nicol: “As a data professional, it can sometime feel hard to get others interested in data. Perhaps like many in this profession, I can often express the importance and value of data for good in an overly technical way. However when our biggest challenges in Scotland include eradicating child poverty, growing the economy and tackling the climate emergency, I would argue that we should all take an interest in data because it’s going to be foundational in helping us solve these problems.

Data is already intrinsic to shaping our society and how services are delivered. And public sector data is a vital component in making sure that services for the people of Scotland are being delivered efficiently and effectively. Despite an ever growing awareness of the transformative power of data to improve the design and delivery of services, feedback from public sector staff shows that they can face difficulties when trying to influence colleagues and senior leaders around the need to invest in data.

A vision gap

In the Scottish Government’s data maturity programme and more widely, we regularly hear about the challenges data professionals encounter when trying to enact change. This community tell us that a long-term vision for public sector data for Scotland could help them by providing the context for what they are trying to achieve locally.

Earlier this year we started to scope how we might do this. We recognised that organisations are already working to deliver local and national strategies and policies that relate to data, so any vision had to be able to sit alongside those, be meaningful in different settings, agnostic of technology and relevant to any public sector organisation. We wanted to offer opportunities for alignment, not enforce an instruction manual…(More)”.

Unlocking AI for All: The Case for Public Data Banks


Article by Kevin Frazier: “The data relied on by OpenAI, Google, Meta, and other artificial intelligence (AI) developers is not readily available to other AI labs. Google and Meta relied, in part, on data gathered from their own products to train and fine-tune their models. OpenAI used tactics to acquire data that now would not work or may be more likely to be found in violation of the law (whether such tactics violated the law when originally used by OpenAI is being worked out in the courts). Upstart labs as well as research outfits find themselves with a dearth of data. Full realization of the positive benefits of AI, such as being deployed in costly but publicly useful ways (think tutoring kids or identifying common illnesses), as well as complete identification of the negative possibilities of AI (think perpetuating cultural biases) requires that labs other than the big players have access to quality, sufficient data.

The proper response is not to return to an exploitative status quo. Google, for example, may have relied on data from YouTube videos without meaningful consent from users. OpenAI may have hoovered up copyrighted data with little regard for the legal and social ramifications of that approach. In response to these questionable approaches, data has (rightfully) become harder to acquire. Cloudflare has equipped websites with the tools necessary to limit data scraping—the process of extracting data from another computer program. Regulators have developed new legal limits on data scraping or enforced old ones. Data owners have become more defensive over their content and, in some cases, more litigious. All of these largely positive developments from the perspective of data creators (which is to say, anyone and everyone who uses the internet) diminish the odds of newcomers entering the AI space. The creation of a public AI training data bank is necessary to ensure the availability of enough data for upstart labs and public research entities. Such banks would prevent those new entrants from having to go down the costly and legally questionable path of trying to hoover up as much data as possible…(More)”.

Artificial Intelligence as a Catalyzer for Open Government Data Ecosystems: A Typological Theory Approach


Paper by Anthony Simonofski et al: “Artificial Intelligence (AI) within digital government has witnessed growing interest as it can improve governance processes and stimulate citizen engagement. Despite the rise of Generative AI, discussions on AI fusion with Open Government Data (OGD) remain limited to specific implementations and scattered across disciplines. Drawing from the synthesis of the literature through a systematic review, this study examines and structures how AI can enrich OGD initiatives. Employing a typological approach, ideal profiles of AI application within the OGD lifecycle are formalized, capturing varied roles across the portal and ecosystems perspectives. The resulting conceptual framework identifies eight ideal types of AI applications for OGD: AI as Portal Curator, Explorer, Linker, and Monitor, and AI as Ecosystem Data Retriever, Connecter, Value Developer and Engager. This theoretical foundation shows the under-investigation of some types and will inform policymakers, practitioners, and researchers in leveraging AI to cultivate OGD ecosystems…(More)”.

Community consent: neither a ceiling nor a floor


Article by Jasmine McNealy: “The 23andMe breach and the Golden State Killer case are two of the more “flashy” cases, but questions of consent, especially the consent of all of those affected by biodata collection and analysis in more mundane or routine health and medical research projects, are just as important. The communities of people affected have expectations about their privacy and the possible impacts of inferences that could be made about them in data processing systems. Researchers must, then, acquire community consent when attempting to work with networked biodata. 

Several benefits of community consent exist, especially for marginalized and vulnerable populations. These benefits include:

  • Ensuring that information about the research project spreads throughout the community,
  • Removing potential barriers that might be created by resistance from community members,
  • Alleviating the possible concerns of individuals about the perspectives of community leaders, and 
  • Allowing the recruitment of participants using methods most salient to the community.

But community consent does not replace individual consent and limits exist for both community and individual consent. Therefore, within the context of a biorepository, understanding whether community consent might be a ceiling or a floor requires examining governance and autonomy…(More)”.

The Role of Open Data in Driving Sectoral Innovation and Global Economic Development


Paper by Olalekan Jamiu Okunleye: “This study assessed the transformative impact of implementing open data principles on fostering innovation across various sectors and enhancing global economic development. Using a comprehensive analysis of secondary data from government portals, industry reports, and global innovation indexes between 2015 to 2019, the research employed panel data regression, correlation analysis, and descriptive statistics to evaluate key relationships. The findings indicate that the availability of open data significantly increases innovation outputs, with robust statistical evidence showing positive correlations between open data sets and sector-specific innovation metrics such as patents filed, R&D expenditure, and the number of startups created. Greater interoperability of open data across international borders contributes to economic growth, particularly through international joint ventures. However, the lack of standardized data formats hampers cross-sector collaboration. Regions with well-established open data policies demonstrate faster technological advancements and economic development compared to regions without such policies. The study highlighted the critical importance of promoting open data initiatives, standardizing data formats, strengthening data governance frameworks, and investing in digital infrastructure and capacity building to optimize open data utilization and drive sustainable development…(More)”.

The societal impact of Open Science: a scoping review


Report by Nicki Lisa Cole, Eva Kormann, Thomas Klebel, Simon Apartis and Tony Ross-Hellauer: “Open Science (OS) aims, in part, to drive greater societal impact of academic research. Government, funder and institutional policies state that it should further democratize research and increase learning and awareness, evidence-based policy-making, the relevance of research to society’s problems, and public trust in research. Yet, measuring the societal impact of OS has proven challenging and synthesized evidence of it is lacking. This study fills this gap by systematically scoping the existing evidence of societal impact driven by OS and its various aspects, including Citizen Science (CS), Open Access (OA), Open/FAIR Data (OFD), Open Code/Software and others. Using the PRISMA Extension for Scoping Reviews and searches conducted in Web of Science, Scopus and relevant grey literature, we identified 196 studies that contain evidence of societal impact. The majority concern CS, with some focused on OA, and only a few addressing other aspects. Key areas of impact found are education and awareness, climate and environment, and social engagement. We found no literature documenting evidence of the societal impact of OFD and limited evidence of societal impact in terms of policy, health, and trust in academic research. Our findings demonstrate a critical need for additional evidence and suggest practical and policy implications…(More)”.