The Open-Source Movement Comes to Medical Datasets

Blog by Edmund L. Andrews: “In a move to democratize research on artificial intelligence and medicine, Stanford’s Center for Artificial Intelligence in Medicine and Imaging (AIMI) is dramatically expanding what is already the world’s largest free repository of AI-ready annotated medical imaging datasets.

Artificial intelligence has become an increasingly pervasive tool for interpreting medical images, from detecting tumors in mammograms and brain scans to analyzing ultrasound videos of a person’s pumping heart.

Many AI-powered devices now rival the accuracy of human doctors. Beyond simply spotting a likely tumor or bone fracture, some systems predict the course of a patient’s illness and make recommendations.

But AI tools have to be trained on expensive datasets of images that have been meticulously annotated by human experts. Because those datasets can cost millions of dollars to acquire or create, much of the research is being funded by big corporations that don’t necessarily share their data with the public.

“What drives this technology, whether you’re a surgeon or an obstetrician, is data,” says Matthew Lungren, co-director of AIMI and an assistant professor of radiology at Stanford. “We want to double down on the idea that medical data is a public good, and that it should be open to the talents of researchers anywhere in the world.”

Launched two years ago, AIMI has already acquired annotated datasets for more than 1 million images, many of them from the Stanford University Medical Center. Researchers can download those datasets at no cost and use them to train AI models that recommend certain kinds of action.

Now, AIMI has teamed up with Microsoft’s AI for Health program to launch a new platform that will be more automated, accessible, and visible. It will be capable of hosting and organizing scores of additional images from institutions around the world. Part of the idea is to create an open and global repository. The platform will also provide a hub for sharing research, making it easier to refine different models and identify differences between population groups. The platform can even offer cloud-based computing power so researchers don’t have to worry about building local, resource-intensive clinical machine-learning infrastructure….”