Randomize NIH grant giving


Article by Vinay Prasad: “A pause in NIH study sections has been met with fear and anxiety from researchers. At many universities, including mine, professors live on soft money. No grants? If you are an assistant professor, you can be asked to pack your desk. If you are a full professor, the university slowly cuts your pay until you see yourself out. Everyone talks about you afterwards, calling you a failed researcher. They laugh, a little too long, and then blink back tears as they wonder if they are next. Of course, your salary doubles in the new job and you are happier, but you are still bitter and gossiped about.

In order to apply for NIH grants, you have to write a lot of bullshit. You write specific aims and methods, collect bios from faculty and more. There is a section where you talk about how great your department and team is— this is the pinnacle of the proverbial expression, ‘to polish a turd.’ You invite people to work on your grant if they have a lot of papers or grants or both, and they agree to be on your grant even though they don’t want to talk to you ever again.

You submit your grant and they hire someone to handle your section. They find three people to review it. Ideally, they pick people who have no idea what you are doing or why it is important, and are not as successful as you, so they can hate read your proposal. If, despite that, they give you a good score, you might be discussed at study section.

The study section assembles scientists to discuss your grant. As kids who were picked last in kindergarten basketball, they focus on the minutiae. They love to nitpick small things. If someone on study section doesn’t like you, they can tank you. In contrast, if someone loves you, they can’t really single handedly fund you.

You might wonder if study section leaders are the best scientists. Rest assured. They aren’t. They are typically mid-career, mediocre scientists. (This is not just a joke; data support this claim: see www.drvinayprasad.com.) They have rarely written extremely influential papers.

Finally, your proposal gets a percentile score. Here is the chance of funding by percentile. You might get a chance to revise your grant if you just fall short… Given that the current system is onerous and likely flawed, you would imagine that NIH leadership has repeatedly tested whether the current method is superior to, say, a modified lottery, aka having an initial screen and then randomly giving out the money.

Of course not. Self-important people giving out someone else’s money rarely study their own processes. If study sections are no better than a lottery, that would mean a lot of NIH study section officers would no longer need to work hard from home half the day, freeing up money for one more grant.

Let’s say we take $200 million and randomize it. Half is given out via the traditional method, and the other half via a modified lottery. If an application is from a US university and passes a minimum screen, it is enrolled in the lottery.

Then we follow these two arms into the future. We measure publications, citations, h-index, the average impact factor of the journals in which the papers are published, and more. We even take a subset of the projects and have blinded reviewers score the output. Can they tell which came from study section?…(More)”.
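The proposed trial design can be sketched as a small simulation. All numbers here are assumptions for illustration (the per-arm budget split follows the article; the grant size, applicant pool, and screening rate are invented), not the article’s actual protocol:

```python
import random

random.seed(0)

BUDGET_PER_ARM = 100_000_000  # half of the $200M pot
GRANT_SIZE = 2_500_000        # assumed average award size (hypothetical)
N_GRANTS = BUDGET_PER_ARM // GRANT_SIZE

# Hypothetical applicant pool: each application gets a study-section
# percentile score (lower is better in NIH scoring) and a flag for
# whether it passes a minimum screen.
applications = [
    {"id": i,
     "percentile": random.uniform(0, 100),
     "passes_screen": random.random() < 0.8}
    for i in range(2000)
]

# Arm 1: traditional method -- fund the best (lowest) percentile scores.
traditional = sorted(applications, key=lambda a: a["percentile"])[:N_GRANTS]

# Arm 2: modified lottery -- uniform random draw among screened applications.
eligible = [a for a in applications if a["passes_screen"]]
lottery = random.sample(eligible, N_GRANTS)

print(len(traditional), len(lottery))  # 40 grants funded per arm
```

Follow-up would then compare downstream outcomes (publications, citations, blinded quality scores) between the two funded groups.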

Will big data lift the veil of ignorance?


Blog by Lisa Herzog: “Imagine that you have a toothache, and a visit to the dentist reveals that a major operation is needed. You phone your health insurance. You listen to the voice of the chatbot, press the buttons to go through the menu. And then you hear: “We have evaluated your profile based on the data you have agreed to share with us. Your dental health behavior scores 6 out of 10. The suggested treatment plan therefore requires a co-payment of [insert some large sum of money here].”

This may sound like science fiction. But many other types of insurance, e.g. car insurance, already build on automated data being shared with them. Health insurers would certainly like to access our data as well – not only data from smart toothbrushes, but also credit card data, behavioral data (e.g. from step counting apps), or genetic data. If they were allowed to use them, they could move towards segmented insurance plans for specific target groups. As two commentators, on whose research I come back below, recently wrote about health insurance: “Today, public plans and nondiscrimination clauses, not lack of information, are what stands between integration and segmentation.”

If, like me, you’re interested in the relation between knowledge and institutional design, insurance is a fascinating topic. The basic idea of insurance is centuries old – here is a brief summary (skip a few paragraphs if you know this stuff). Because we cannot know what might happen to us in the future, but we can know that on an aggregate level, things will happen to people, it can make sense to enter an insurance contract, creating a pool that a group jointly contributes to. Those for whom the risks in question materialize get support from the pool. Those for whom they do not materialize may go through life without receiving any money, but they still know that they could get support if something happened to them. As such, insurance combines solidarity within a group with individual precaution…(More)”.
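The pooling logic described above can be made concrete with a toy calculation (the loss probability and amounts are made-up numbers): each individual faces a small chance of a large loss, but the per-member share of the pooled loss is far more predictable than any individual outcome.

```python
import random
import statistics

random.seed(1)

N_MEMBERS = 10_000
P_LOSS = 0.01         # assumed 1% chance of the insured event per year
LOSS_AMOUNT = 50_000  # assumed cost when the event happens

# Simulate one year: which members suffer the loss?
losses = [LOSS_AMOUNT if random.random() < P_LOSS else 0
          for _ in range(N_MEMBERS)]

# Uninsured: each person bears their own outcome (usually 0, rarely 50k),
# so individual outcomes are highly volatile.
individual_sd = statistics.pstdev(losses)

# Insured: the pool covers all losses; each member pays an equal share,
# which is close to the expected loss P_LOSS * LOSS_AMOUNT = 500.
fair_share = sum(losses) / N_MEMBERS

print(f"fair per-member share: ~{fair_share:.0f}")
print(f"stdev of individual outcomes: ~{individual_sd:.0f}")
```

The pooled share hovers near the expected loss, while the uninsured individual faces an all-or-nothing outcome; that variance reduction is the economic core of the solidarity-plus-precaution combination the post describes.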

Data Stewardship as Environmental Stewardship


Article by Stefaan Verhulst and Sara Marcucci: “Why responsible data stewardship could help address today’s pressing environmental challenges resulting from artificial intelligence and other data-related technologies…

Even as the world grows increasingly reliant on data and artificial intelligence, concern over the environmental impact of data-related activities is increasing. Solutions remain elusive. The rise of generative AI, which rests on a foundation of massive data sets and computational power, risks exacerbating the problem.

Below, we propose that responsible data stewardship offers a potential pathway to reducing the environmental footprint of data activities. By promoting practices such as data reuse, minimizing digital waste, and optimizing storage efficiency, data stewardship can help mitigate environmental harm. Additionally, data stewardship supports broader environmental objectives by facilitating better decision-making through transparent, accessible, and shared data. We suggest that advancing data stewardship as a cornerstone of environmental responsibility could provide a compelling approach to addressing the dual challenges of advancing digital technologies while safeguarding the environment…(More)”

Data Governance Meets the EU AI Act


Article by Axel Schwanke: “…The EU AI Act emphasizes sustainable AI through robust data governance, promoting principles like data minimization, purpose limitation, and data quality to ensure responsible data collection and processing. It mandates measures such as data protection impact assessments and retention policies. Article 10 underscores the importance of effective data management in fostering ethical and sustainable AI development…This article states that high-risk AI systems must be developed using high-quality data sets for training, validation, and testing. These data sets should be managed properly, considering factors like data collection processes, data preparation, potential biases, and data gaps. The data sets should be relevant, representative, and as error-free and complete as possible. They should also reflect the specific context in which the AI system will be used. In some cases, providers may process special categories of personal data to detect and correct biases, but they must follow strict conditions to protect individuals’ rights and freedoms…

However, achieving compliance presents several significant challenges:

  • Ensuring Dataset Quality and Relevance: Organizations must establish robust data and AI platforms to prepare and manage datasets that are error-free, representative, and contextually relevant for their intended use cases. This requires rigorous data preparation and validation processes.
  • Bias and Contextual Sensitivity: Continuous monitoring for biases in data is critical. Organizations must implement corrective actions to address gaps while ensuring compliance with privacy regulations, especially when processing personal data to detect and reduce bias.
  • End-to-End Traceability: A comprehensive data governance framework is essential to track and document data flow from its origin to its final use in AI models. This ensures transparency, accountability, and compliance with regulatory requirements.
  • Evolving Data Requirements: Dynamic applications and changing schemas, particularly in industries like real estate, necessitate ongoing updates to data preparation processes to maintain relevance and accuracy.
  • Secure Data Processing: Compliance demands strict adherence to secure processing practices for personal data, ensuring privacy and security while enabling bias detection and mitigation.

Example: Real Estate Data
Immowelt’s real estate price map, awarded as the top performer in a 2022 test of real estate price maps, exemplifies the challenges of achieving high-quality datasets. The prepared data powers numerous services and applications, including data analysis, price predictions, personalization, recommendations, and market research…(More)”

Why Digital Public Goods, including AI, Should Depend on Open Data


Article by Cable Green: “Acknowledging that some data should not be shared (for moral, ethical and/or privacy reasons) and some cannot be shared (for legal or other reasons), Creative Commons (CC) thinks there is value in incentivizing the creation, sharing, and use of open data to advance knowledge production. As open communities continue to imagine, design, and build digital public goods and public infrastructure services for education, science, and culture, these goods and services – whenever possible and appropriate – should produce, share, and/or build upon open data.

Open Data and Digital Public Goods (DPGs)

CC is a member of the Digital Public Goods Alliance (DPGA) and CC’s legal tools have been recognized as digital public goods (DPGs). DPGs are “open-source software, open standards, open data, open AI systems, and open content collections that adhere to privacy and other applicable best practices, do no harm, and are of high relevance for attainment of the United Nations 2030 Sustainable Development Goals (SDGs).” If we want to solve the world’s greatest challenges, governments and other funders will need to invest in, develop, openly license, share, and use DPGs.

Open data is important to DPGs because data is a key driver of economic vitality with demonstrated potential to serve the public good. In the public sector, data informs policy making and public services delivery by helping to channel scarce resources to those most in need; providing the means to hold governments accountable and foster social innovation. In short, data has the potential to improve people’s lives. When data is closed or otherwise unavailable, the public does not accrue these benefits.

CC was recently part of a DPGA sub-committee working to preserve the integrity of open data as part of the DPG Standard. This important update to the DPG Standard was introduced to ensure only open datasets and content collections with open licenses are eligible for recognition as DPGs. This new requirement means open data sets and content collections must meet the following criteria to be recognised as a digital public good.

  1. Comprehensive Open Licensing: The entire data set/content collection must be under an acceptable open licence. Mixed-licensed collections will no longer be accepted.
  2. Accessible and Discoverable: All data sets and content collection DPGs must be openly licensed and easily accessible from a distinct, single location, such as a unique URL.
  3. Permitted Access Restrictions: Certain access restrictions – such as logins, registrations, API keys, and throttling – are permitted as long as they do not discriminate against users or restrict usage based on geography or any other factors…(More)”.

The Case for Local and Regional Public Engagement in Governing Artificial Intelligence


Article by Stefaan Verhulst and Claudia Chwalisz: “As the Paris AI Action Summit approaches, the world’s attention will once again turn to the urgent questions surrounding how we govern artificial intelligence responsibly. Discussions will inevitably include calls for global coordination and participation, exemplified by several proposals for a Global Citizens’ Assembly on AI. While such initiatives aim to foster inclusivity, the reality is that meaningful deliberation and actionable outcomes often emerge most effectively at the local and regional levels.

Building on earlier reflections in “AI Globalism and AI Localism,” we argue that to govern AI for public benefit, we must prioritize building public engagement capacity closer to the communities where AI systems are deployed. Localized engagement not only ensures relevance to specific cultural, social, and economic contexts but also equips communities with the agency to shape both policy and product development in ways that reflect their needs and values.

While a Global Citizens’ Assembly sounds like a great idea on the surface, there is no public authority with teeth or enforcement mechanisms at that level of governance. The Paris Summit represents an opportunity to rethink existing AI governance frameworks, reorienting them toward an approach that is grounded in lived, local realities and mutually respectful processes of co-creation. Toward that end, we elaborate below on proposals for: local and regional AI assemblies; AI citizens’ assemblies for EU policy; capacity-building programs, and localized data governance models…(More)”.

Good government data requires good statistics officials – but how motivated and competent are they?


World Bank Blog: “Government data is only as reliable as the statistics officials who produce it. Yet, surprisingly little is known about these officials themselves. For decades, they have diligently collected data on others – such as households and firms – to generate official statistics, from poverty rates to inflation figures. Yet, data about statistics officials themselves is missing. How competent are they at analyzing statistical data? How motivated are they to excel in their roles? Do they uphold integrity when producing official statistics, even in the face of opposing career incentives or political pressures? And what can National Statistical Offices (NSOs) do to cultivate a workforce that is competent, motivated, and ethical?

We surveyed 13,300 statistics officials in 14 countries in Latin America and the Caribbean to find out. Five results stand out. For further insights, consult our Inter-American Development Bank (IDB) report, Making National Statistical Offices Work Better.

1. The competence and management of statistics officials shape the quality of statistical data

Our survey included a short exam assessing basic statistical competencies, such as descriptive statistics and probability. Statistical competence correlates with data quality: NSOs with higher exam scores among employees tend to achieve better results in the World Bank’s Statistical Performance Indicators (r = 0.36).

NSOs with better management practices also have better statistical performance. For instance, NSOs with more robust recruitment and selection processes have better statistical performance (r = 0.62)…(More)”.

Which Health Facilities Have Been Impacted by L.A.-Area Fires? AI May Paint a Clearer Picture


Article by Andrew Schroeder: “One of the most important factors for humanitarian responders in these types of large-scale disaster situations is to understand the effects on the formal health system, upon which most people — and vulnerable communities in particular — rely in their neighborhoods. Evaluation of the impact of disasters on individual structures, including critical infrastructure such as health facilities, is traditionally a relatively slow and manually arduous process, involving extensive ground truth visitation by teams of assessment professionals.

Speeding up this process without losing accuracy, while potentially improving the safety and efficiency of assessment teams, is among the more important analytical efforts Direct Relief can undertake for response and recovery efforts. Manual assessments can now be effectively paired with AI-based analysis of satellite imagery to do just that…

With the advent of geospatial AI models trained on disaster damage impacts, ground assessment is not the only tool available to response agencies and others seeking to understand how much damage has occurred and the degree to which that damage may affect essential services for communities. The work of the Oregon State University team of experts in remote sensing-based post-disaster damage detection, led by Jamon Van Den Hoek and Corey Scher, was featured in the Financial Times on January 9.

Their modeling, based on Sentinel-1 satellite imagery, identified 21,757 structures overall, of which 11,124 were determined to have some level of damage. The Oregon State model does not distinguish between different levels of damage, and therefore cannot respond to certain types of questions that the manual inspections can respond to, but nevertheless the coverage area and the speed of detection have been much greater…(More)”.

Kickstarting Collaborative, AI-Ready Datasets in the Life Sciences with Government-funded Projects


Article by Erika DeBenedictis, Ben Andrew & Pete Kelly: “In the age of Artificial Intelligence (AI), large high-quality datasets are needed to move the field of life science forward. However, the research community lacks strategies to incentivize collaboration on high-quality data acquisition and sharing. The government should fund collaborative roadmapping, certification, collection, and sharing of large, high-quality datasets in life science. In such a system, nonprofit research organizations engage scientific communities to identify key types of data that would be valuable for building predictive models, and define quality control (QC) and open science standards for collection of that data. Projects are designed to develop automated methods for data collection, certify data providers, and facilitate data collection in consultation with researchers throughout various scientific communities. Hosting of the resulting open data is subsidized as well as protected by security measures. This system would provide crucial incentives for the life science community to identify and amass large, high-quality open datasets that will immensely benefit researchers…(More)”.

Announcing SPARROW: A Breakthrough AI Tool to Measure and Protect Earth’s Biodiversity in the Most Remote Places


Blog by Juan Lavista Ferres: “The biodiversity of our planet is rapidly declining. We’ve likely reached a tipping point where it is crucial to use every tool at our disposal to help preserve what remains. That’s why I am pleased to announce SPARROW—Solar-Powered Acoustic and Remote Recording Observation Watch, developed by Microsoft’s AI for Good Lab. SPARROW is an AI-powered edge computing solution designed to operate autonomously in the most remote corners of the planet. Solar-powered and equipped with advanced sensors, it collects biodiversity data—from camera traps, acoustic monitors, and other environmental detectors—that are processed using our most advanced PyTorch-based wildlife AI models on low-energy edge GPUs. The resulting critical information is then transmitted via low-Earth orbit satellites directly to the cloud, allowing researchers to access fresh, actionable insights in real time, no matter where they are. 

Think of SPARROW as a network of Earth-bound satellites, quietly observing and reporting on the health of our ecosystems without disrupting them. By leveraging solar energy, these devices can run for a long time, minimizing their footprint and any potential harm to the environment…(More)”.