Enriching Unstructured Cultural Heritage Data Using NLP

Article by Madison Leeson: “Cultural heritage researchers often have to sift through a mountain of data related to the cultural items they study, including reports, museum records, news, and databases. These sources contain a significant amount of unstructured and semi-structured data, including ownership histories (‘provenance’), object descriptions, and timelines, which makes them a strong candidate for automated processing. Recognising the scale and importance of the issue, researchers at the Italian Institute of Technology’s Centre for Cultural Heritage Technology have fine-tuned three natural language processing (NLP) models to distill key information from these unstructured texts. This work was carried out within the EU-funded RITHMS project, which has built a digital platform for law enforcement to trace illicit cultural goods using social network analysis (SNA). The research team aimed to fill a critical gap: how do we transform complex textual records into clean, structured, analysable data?

The paper introduces a streamlined pipeline for creating custom, domain-specific datasets from textual heritage records, then trains and fine-tunes NLP models (based on spaCy) to perform named entity recognition (NER) on challenging inputs such as provenance entries, museum registries, and records of stolen and missing art and artefacts. It evaluates zero-shot models such as GLiNER and employs Meta’s Llama3 (8B) to bootstrap high-quality annotations, minimising the need for manual labelling. The result? Fine-tuned transformer models (especially on provenance data) significantly outperformed out-of-the-box models, highlighting the power of small, curated training sets in a specialised domain…”
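The excerpt names the pipeline's ingredients but not its code. As a minimal, hypothetical sketch using only the Python standard library, the snippet below covers two of the steps it mentions: aligning entities proposed by an LLM such as Llama3 to character offsets, in the `(text, {"entities": [(start, end, label)]})` layout that spaCy accepts for NER training examples, and scoring a model against those labels with exact-match entity-level precision, recall, and F1. The JSON shape of the LLM output, the label names, the sample record, and the functions `align_annotations`, `prf1`, and `at` are assumptions for illustration; the paper's prompts, label set, and evaluation protocol may differ.

```python
import json


def align_annotations(text, llm_json):
    """Map LLM-proposed {"text", "label"} items to non-overlapping character spans.

    Returns (text, {"entities": [(start, end, label), ...]}), the offset-based
    layout spaCy accepts for NER training examples.
    """
    taken = [False] * len(text)  # characters already covered by an accepted span
    spans = []
    for item in json.loads(llm_json):
        surface, label = item["text"], item["label"]
        if not surface:
            continue
        start = text.find(surface)
        # If this occurrence overlaps an accepted span (e.g. a repeated name), try the next one.
        while start != -1 and any(taken[start:start + len(surface)]):
            start = text.find(surface, start + 1)
        if start == -1:
            continue  # not found verbatim (paraphrased or hallucinated): drop it
        end = start + len(surface)
        taken[start:end] = [True] * (end - start)
        spans.append((start, end, label))
    return text, {"entities": sorted(spans)}


def prf1(gold, pred):
    """Exact-match entity-level precision, recall and F1 over (start, end, label) triples."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical record and LLM output; "Uffizi" does not occur in the text.
text = "Bronze statuette stolen from Villa Serena, Florence, in 1993."
llm_output = json.dumps([
    {"text": "Villa Serena", "label": "LOC"},
    {"text": "Florence", "label": "GPE"},
    {"text": "1993", "label": "DATE"},
    {"text": "Uffizi", "label": "ORG"},
])
_, annotations = align_annotations(text, llm_output)
gold = annotations["entities"]  # three spans; the "Uffizi" item is dropped


def at(surface, label):
    """Build a (start, end, label) triple for the first occurrence of surface."""
    i = text.index(surface)
    return (i, i + len(surface), label)


# A hypothetical model that clips "Villa Serena" to "Serena" but gets the rest right.
pred = [at("Serena", "LOC"), at("Florence", "GPE"), at("1993", "DATE")]
print([round(x, 3) for x in prf1(gold, pred)])  # [0.667, 0.667, 0.667]
```

Dropping entities that cannot be found verbatim is a deliberately conservative choice: it gives up some training examples but keeps paraphrased or hallucinated LLM output from turning into wrong labels.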
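The excerpt's end goal is turning tagged text into structured, analysable data that can feed social network analysis. A minimal sketch of that last step, assuming NER has already produced labelled spans: group the spans into a per-record dictionary, then link successive owners to form directed edges for a network. The `Entity` class, the label names, the sample provenance entry, and the helpers `to_record`, `ownership_edges`, and `tag` are hypothetical and not part of RITHMS; the edge logic also assumes owners are mentioned in chronological order, which real provenance entries do not always follow.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # hypothetical label set: PERSON, ORG, GPE, DATE
    start: int  # character offset of the first character
    end: int    # character offset just past the last character


def to_record(entities):
    """Group entity strings by label, in order of first mention, without duplicates."""
    record = defaultdict(list)
    for ent in sorted(entities, key=lambda e: e.start):
        if ent.text not in record[ent.label]:
            record[ent.label].append(ent.text)
    return dict(record)


def ownership_edges(entities, owner_labels=("PERSON", "ORG")):
    """Link each owner to the next one mentioned, skipping repeated consecutive mentions.

    Assumes mention order equals chronological order, an assumption that
    real provenance entries do not always satisfy.
    """
    owners = [e.text for e in sorted(entities, key=lambda e: e.start)
              if e.label in owner_labels]
    return [(a, b) for a, b in zip(owners, owners[1:]) if a != b]


provenance = ("Anna Rossi, Rome, until 1985; sold to Galleria Nova, Paris; "
              "acquired by City Museum in 1992.")


def tag(surface, label):
    """Stand-in for NER output: locate surface in the text and wrap it as an Entity."""
    i = provenance.index(surface)
    return Entity(surface, label, i, i + len(surface))


entities = [tag("Anna Rossi", "PERSON"), tag("Rome", "GPE"), tag("1985", "DATE"),
            tag("Galleria Nova", "ORG"), tag("Paris", "GPE"),
            tag("City Museum", "ORG"), tag("1992", "DATE")]
print(to_record(entities))
# {'PERSON': ['Anna Rossi'], 'GPE': ['Rome', 'Paris'], 'DATE': ['1985', '1992'], 'ORG': ['Galleria Nova', 'City Museum']}
print(ownership_edges(entities))
# [('Anna Rossi', 'Galleria Nova'), ('Galleria Nova', 'City Museum')]
```

The resulting edge list is plain Python data, so it can be loaded into a graph library such as NetworkX for the network-analysis stage.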
