Data Stewardship Decoded: Mapping Its Diverse Manifestations and Emerging Relevance at a time of AI


Paper by Stefaan Verhulst: “Data stewardship has become a critical component of modern data governance, especially with the growing use of artificial intelligence (AI). Despite its increasing importance, the concept of data stewardship remains ambiguous and varies in its application. This paper explores four distinct manifestations of data stewardship to clarify its emerging position in the data governance landscape. These manifestations include a) data stewardship as a set of competencies and skills, b) a function or role within organizations, c) an intermediary organization facilitating collaborations, and d) a set of guiding principles. 

The paper subsequently outlines the core competencies required for effective data stewardship, explains the distinction between data stewards and Chief Data Officers (CDOs), and details the intermediary role of stewards in bridging gaps between data holders and external stakeholders. It also explores key principles aligned with the FAIR framework (Findable, Accessible, Interoperable, Reusable) and introduces the emerging principle of AI readiness to ensure data meets the ethical and technical requirements of AI systems. 

The paper emphasizes the importance of data stewardship in enhancing data collaboration, fostering public value, and managing data reuse responsibly, particularly in the era of AI. It concludes by identifying challenges and opportunities for advancing data stewardship, including the need for standardized definitions, capacity building efforts, and the creation of a professional association for data stewardship…(More)”

Enhancing Access to and Sharing of Data in the Age of Artificial Intelligence



OECD Report: “Artificial intelligence (AI) is transforming economies and societies, but its full potential is hindered by poor access to quality data and models. Based on comprehensive country examples, the OECD report “Enhancing Access to and Sharing of Data in the Age of AI” highlights how governments can enhance access to and sharing of data and certain AI models, while ensuring privacy and other rights and interests such as intellectual property rights. It highlights the OECD Recommendation on Enhancing Access to and Sharing of Data, which provides principles to balance openness while ensuring effective legal, technical and organisational safeguards. This policy brief highlights the key findings of the report and their relevance for stakeholders seeking to promote trustworthy AI through better policies for data and AI models that drive trust, investment, innovation, and well-being….(More)”

Cities, health, and the big data revolution


Blog by Harvard Public Health: “Cities influence our health in unexpected ways. From sidewalks to crosswalks, the built environment affects how much we move, impacting our risk for diseases like obesity and diabetes. A recent New York City study underscores that focusing solely on infrastructure, without understanding how people use it, can lead to ineffective interventions. Researchers analyzed over two million Google Street View images, combining them with health and demographic data to reveal these dynamics. Harvard Public Health spoke with Rumi Chunara, director of New York University’s Center for Health Data Science and lead author of the study.

Why study this topic?

We’re seeing an explosion of new data sources, like street-view imagery, being used to make decisions. But there’s often a disconnect—people using these tools don’t always have the public health knowledge to interpret the data correctly. We wanted to highlight the importance of combining data science and domain expertise to ensure interventions are accurate and impactful.

What did you find?

We discovered that the relationship between built environment features and health outcomes isn’t straightforward. It’s not just about having sidewalks; it’s about how often people are using them. Improving physical activity levels in a community could have a far greater impact on health outcomes than simply adding more infrastructure.

It also revealed the importance of understanding the local context. For instance, Google Street View data sometimes misclassifies sidewalks, particularly near highways or bridges, leading to inaccurate conclusions. Relying solely on this data, without accounting for these nuances, could result in less effective interventions…(More)”.

Tech tycoons have got the economics of AI wrong


The Economist: “…The Jevons paradox—the idea that efficiency leads to more use of a resource, not less—has in recent days provided comfort to Silicon Valley titans worried about the impact of DeepSeek, the maker of a cheap and efficient Chinese chatbot, which threatens the more powerful but energy-guzzling American varieties. Satya Nadella, the boss of Microsoft, posted on X, a social-media platform, that “Jevons paradox strikes again! As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can’t get enough of,” along with a link to the Wikipedia page for the economic principle. Under this logic, DeepSeek’s progress will mean more demand for data centres, Nvidia chips and even the nuclear reactors that the hyperscalers were, prior to the unveiling of DeepSeek, paying to restart. Nothing to worry about: if the price falls, Microsoft can make it up on volume.

The logic, however self-serving, has a ring of truth to it. Jevons’s paradox is real and observable in a range of other markets. Consider the example of lighting. William Nordhaus, a Nobel-prizewinning economist, has calculated that a Babylonian oil lamp, powered by sesame oil, produced about 0.06 lumens of light per watt of energy. That compares with up to 110 lumens for a modern light-emitting diode. The world has not responded to this dramatic improvement in energy efficiency by enjoying the same amount of light as a Babylonian at lower cost. Instead, it has banished darkness completely, whether through more bedroom lamps than could have been imagined in ancient Mesopotamia or the Las Vegas sphere, which provides passersby with the chance to see a 112-metre-tall incandescent emoji. Urban light is now so cheap and so abundant that many consider it to be a pollutant.

Likewise, more efficient chatbots could mean that AI finds new uses (some no doubt similarly obnoxious). The ability of DeepSeek’s model to perform about as well as more compute-hungry American AI shows that data centres are more productive than previously thought, rather than less. Expect, the logic goes, more investment in data centres and so on than you did before.

Although this idea should provide tech tycoons with some solace, they still ought to worry. The Jevons paradox is a form of a broader phenomenon known as “rebound effects”. These are typically not large enough to fully offset savings from improved efficiency….Basing the bull case for AI on the Jevons paradox is, therefore, a bet not on the efficiency of the technology but on the level of demand. If adoption is being held back by price then efficiency gains will indeed lead to greater use. If technological progress raises expectations rather than reduces costs, as is typical in health care, then chatbots will make up an ever larger proportion of spending. At the moment, that looks unlikely. America’s Census Bureau finds that only 5% of American firms currently use AI and 7% have plans to adopt it in the future. Many others find the tech difficult to use or irrelevant to their line of business…(More)”.

Unlocking AI’s potential for the public sector


Article by Ruth Kelly: “…Government needs to work on its digital foundations. The extent of legacy IT systems across government is huge. Many were designed and built for a previous business era, and still rely on paper-based processes. Historic neglect and a lack of asset maintenance have added to the difficulty. Because many systems are not compatible, sharing data across systems requires manual extraction which is risky and costly. All this adds to problems with data quality. Government suffers from data which is incomplete, inconsistent, inaccessible, difficult to process and not easily shareable. A lack of common data models, both between and within government departments, makes it difficult and costly to combine different sources of data, and significant manual effort is required to make data usable. Some departments have told us that they spend 60% to 80% of their time on cleaning data when carrying out analysis.

Why is this an issue for AI? Large volumes of good-quality data are important for training, testing and deploying AI models. Poor data leads to poor outcomes, especially where it involves personal data. Access to good-quality data was identified as a barrier to implementing AI by 62% of the 87 government bodies responding to our survey. Simple productivity improvements that provide integration with routine administration (for example summarising documents) are already possible, but integration with big, established legacy IT is a whole other long-term endeavour. Layering new technology on top of existing systems, and reusing poor-quality and aging data, carries the risk of magnifying problems and further embedding reliance on legacy systems…(More)”

Randomize NIH grant giving


Article by Vinay Prasad: “A pause in NIH study sections has been met with fear and anxiety from researchers. At many universities, including mine, professors live on soft money. No grants? If you are an assistant professor, you can be asked to pack your desk. If you are a full professor, the university slowly cuts your pay until you see yourself out. Everyone talks about you afterwards, calling you a failed researcher. They laugh, a little too long, and then blink back tears as they wonder if they are next. Of course, your salary doubles in the new job and you are happier, but you are still bitter and gossiped about.

In order to apply for NIH grants, you have to write a lot of bullshit. You write specific aims and methods, collect bios from faculty and more. There is a section where you talk about how great your department and team are: this is the pinnacle of the proverbial expression, ‘to polish a turd.’ You invite people to work on your grant if they have a lot of papers or grants or both, and they agree to be on your grant even though they don’t want to talk to you ever again.

You submit your grant and they hire someone to handle your section. They find three people to review it. Ideally, they pick people who have no idea what you are doing or why it is important, and are not as successful as you, so they can hate read your proposal. If, despite that, they give you a good score, you might be discussed at study section.

The study section assembles scientists to discuss your grant. As kids who were picked last in kindergarten basketball, they focus on the minutiae. They love to nitpick small things. If someone on study section doesn’t like you, they can tank you. In contrast, if someone loves you, they can’t really single-handedly fund you.

You might wonder if study section leaders are the best scientists. Rest assured. They aren’t. They are typically mid-career, mediocre scientists. (This is not just a joke; data support this claim, see www.drvinayprasad.com). They rarely have written extremely influential papers.

Finally, your proposal gets a percentile score. Here is the chance of funding by percentile. You might get a chance to revise your grant if you just fall short….Given that the current system is onerous and likely flawed, you would imagine that NIH leadership has repeatedly tested whether the current method is superior to, say, a modified lottery, aka having an initial screen and then randomly giving out the money.

Of course not. Self-important people giving out someone else’s money rarely study their own processes. If study sections are no better than a lottery, that would mean a lot of NIH study section officers would no longer need to work hard from home half the day, freeing up money for one more grant.

Let’s say we take $200 million and randomize it. Half of it is allocated to being given out in the traditional method, and the other half is allocated to a modified lottery. If an application is from a US university and passes a minimum screen, it is enrolled in the lottery.

Then we follow these two arms into the future. We measure publications, citations, h index, the average impact factor of journals in which the papers are published, and more. We even take a subset of the projects and blind reviewers to score the output. Can they tell which came from study section?…(More)”.

Artificial Intelligence for Participation


Policy Brief by the Brazil Centre of the University of Münster: “…provides an overview of current and potential applications of artificial intelligence (AI) technologies in the context of political participation and democratic governance processes in cities. Aimed primarily at public managers, the document also highlights critical issues to consider in the implementation of these technologies, and proposes an agenda for debate on the new state capabilities they require…(More)”.

Will big data lift the veil of ignorance?


Blog by Lisa Herzog: “Imagine that you have a toothache, and a visit to the dentist reveals that a major operation is needed. You phone your health insurance. You listen to the voice of the chatbot, press the buttons to go through the menu. And then you hear: “We have evaluated your profile based on the data you have agreed to share with us. Your dental health behavior scores 6 out of 10. The suggested treatment plan therefore requires a co-payment of [insert some large sum of money here].”

This may sound like science fiction. But many other insurances, e.g. car insurances, already build on automated data being shared with them. If they were allowed, health insurers would certainly like to access our data as well – not only those from smart toothbrushes, but also credit card data, behavioral data (e.g. from step counting apps), or genetic data. If they were allowed to use them, they could move towards segmented insurance plans for specific target groups. As two commentators, to whose research I come back below, recently wrote about health insurance: “Today, public plans and nondiscrimination clauses, not lack of information, are what stands between integration and segmentation.”

If, like me, you’re interested in the relation between knowledge and institutional design, insurance is a fascinating topic. The basic idea of insurance is centuries old – here is a brief summary (skip a few paragraphs if you know this stuff). Because we cannot know what might happen to us in the future, but we can know that on an aggregate level, things will happen to people, it can make sense to enter an insurance contract, creating a pool that a group jointly contributes to. Those for whom the risks in question materialize get support from the pool. Those for whom they do not materialize may go through life without receiving any money, but they still know that they could get support if something happened to them. As such, insurance combines solidarity within a group with individual precaution…(More)”.

Is This How Reddit Ends?


Article by Matteo Wong: “The internet is growing more hostile to humans. Google results are stuffed with search-optimized spam, unhelpful advertisements, and AI slop. Amazon has become littered with undifferentiated junk. The state of social media, meanwhile—fractured, disorienting, and prone to boosting all manner of misinformation—can be succinctly described as a cesspool.

It’s with some irony, then, that Reddit has become a reservoir of humanity. The platform has itself been called a cesspool, rife with hateful rhetoric and falsehoods. But it is also known for quirky discussions and impassioned debates on any topic among its users. Does charging your brother rent, telling your mom she’s an unwanted guest, or giving your wife a performance review make you an asshole? (Redditors voted no, yes, and “everyone sucks,” respectively.) The site is where fans hash out the best rap album ever and plumbers weigh in on how to unclog a drain. As Google has begun to offer more and more vacuous SEO sites and ads in response to queries, many people have started adding reddit to their searches to find thoughtful, human-written answers: find mosquito in bedroom reddit, fix musty sponge reddit.

But now even Reddit is becoming more artificial. The platform has quietly started beta-testing Reddit Answers, what it calls an “AI-powered conversational interface.” In function and design, the feature—which is so far available only for some users in the U.S.—is basically an AI chatbot. On a new search screen accessible from the homepage, Reddit Answers takes anyone’s queries, trawls the site for relevant discussions and debates, and composes them into a response. In other words, a site that sells itself as a home for “authentic human connection” is now giving humans the option to interact with an algorithm instead…(More)”.

Flipping data on its head: Differing conceptualisations of data and the implications for actioning Indigenous data sovereignty principles


Paper by Stephanie Cunningham-Reimann et al: “Indigenous data sovereignty is of global concern. The power of data through its multitude of uses can cause harm to Indigenous Peoples, communities, organisations and Nations in Canada and globally. Indigenous research principles play a vital role in guiding researchers, scholars and policy makers in their careers and roles. We define data, data sovereignty principles, ways of practicing Indigenous research principles, and recommendations for applying and actioning Indigenous data sovereignty through culturally safe self-reflection, interpersonal and reciprocal relationships built upon respect, reciprocity, relevance, responsibility and accountability. Research should be co-developed, co-led, and co-disseminated in partnership with Indigenous Peoples, communities, organisations and/or nations to build capacity, support self-determination, and reduce harms produced through the analysis and dissemination of research findings. OCAP® (Ownership, Control, Access & Possession), OCAS (Ownership, Control, Access & Stewardship), Inuit Qaujimajatuqangit principles in conjunction with the 4Rs (respect, relevance, reciprocity & responsibility) and cultural competency including self-examination of the 3Ps (power, privilege, and positionality) of researchers, scholars and policy makers can be challenging, but will amplify the voices and understandings of Indigenous research by implementing Indigenous data sovereignty in Canada…(More)”