Spanning Today’s Chasms: Seven Steps to Building Trusted Data Intermediaries

James Shulman at the Mellon Foundation: “In 2001, when hundreds of individual colleges and universities were scrambling to scan their slide libraries, The Andrew W. Mellon Foundation created a new organization, Artstor, to assemble a massive library of digital images from disparate sources to support teaching and research in the arts and humanities.

Rather than encouraging—or paying for—each school to scan its own slide of the Mona Lisa, the Mellon Foundation created an intermediary organization that would balance the interests of those who created, photographed and cared for art works, such as artists and museums, and those who wanted to use such images for the admirable calling of teaching and studying history and culture.  This organization would reach across the gap that separated these two communities and would respect and balance the interests of both sides, while helping each accomplish their missions.  At the same time that Napster was using technology to facilitate the un-balanced transfer of digital content from creators to users, the Mellon Foundation set up a new institution aimed at respecting the interests of one side of the market and supporting the socially desirable work of the other.

As the internet has enabled the sharing of data across the world, new intermediaries have emerged as entire platforms. A networked world needs such bridges—think Etsy or Ebay sitting between sellers and buyers, or Facebook sitting between advertisers and users. While intermediaries that match sellers and buyers of things provide a marketplace to bridge from one side or the other, aggregators of data work in admittedly more shadowy territories.

In the many realms that market forces won’t support, however, a great deal of public good can be done by aggregating and managing access to datasets that might otherwise continue to live in isolation. Whether due to institutional sociology that favors local solutions, the technical challenges associated with merging heterogeneous databases built with different data models, intellectual property limitations, or privacy concerns, datasets are built and maintained by independent groups that—if networked—could be used to further each other’s work.

Think of those studying coral reefs, or those studying labor practices in developing markets, or child welfare offices seeking to call upon court records in different states, or medical researchers working in different sub-disciplines but on essentially the same disease.  What intermediary invests in joining these datasets?  Many people assume that computers can simply “talk” to each other and share data intuitively, but without targeted investment in connecting them, they can’t.  Unlike modern databases that are now often designed with the cloud in mind, decades of locally created databases churn away in isolation, at great opportunity cost to us all.

Art history research is an unusually vivid example. Most people can understand that if you want to study Caravaggio, you don’t want to hunt and peck across hundreds of museums, books, photo archives, libraries, churches, and private collections.  You want all that content in one place—exactly what Mellon sought to achieve by creating Artstor.

What did we learn in creating Artstor that might be distilled as lessons for others taking on an aggregation project to serve the public good?….(More)”.