BookReconciler: An Open-Source Tool for Metadata Enrichment and Work-Level Clustering
In many settings, people work with only minimal bibliographic metadata, often just a book’s title and author (e.g., “The Book of Salt” by “Monique Truong”). How can such sparse records be enriched and clustered, especially at scale? This presentation introduces BookReconciler, an open-source tool for metadata enrichment and Work-level clustering of bibliographic data. Built as an extension for OpenRefine, BookReconciler lets users match minimal metadata, such as title and author, to authoritative identifiers from services including the Library of Congress, VIAF, OCLC, HathiTrust, Google Books, and Wikidata, and cluster related manifestations of the same Work. This integration makes it easier to combine related datasets, digital library collections, and humanities corpora, and to expand and analyze the data at scale. The tool is designed around a human-in-the-loop workflow in which users evaluate matches and define the contours of a Work through an interactive interface. We evaluate BookReconciler on U.S. prize-winning books and contemporary world fiction. The tool achieves near-perfect accuracy for U.S. works but lower performance for global texts, reflecting structural weaknesses in bibliographic infrastructures for non-English and global materials. By bridging library linked-data practices with humanities research workflows, BookReconciler offers a practical, extensible solution for improving bibliographic metadata and Work-level integration.
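To make the idea of Work-level clustering from title and author concrete, here is a minimal Python sketch. It is not BookReconciler's implementation: the normalization rules, the record ids, and the helper names (`title_key`, `author_key`, `cluster_by_work`, `build_query`) are all assumptions made for illustration. The query dictionary follows the general shape of a Reconciliation Service API query batch as used by OpenRefine, with a hypothetical `author` property id.

```python
import json
import re
import unicodedata
from collections import defaultdict


def _fold(text):
    """Lowercase, strip accents, and collapse punctuation to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def title_key(title):
    """Fold the title and drop a leading English article (an assumption)."""
    return re.sub(r"^(?:the|an|a) ", "", _fold(title))


def author_key(author):
    """Sort name tokens so 'Truong, Monique' and 'Monique Truong' agree."""
    return " ".join(sorted(_fold(author).split()))


def cluster_by_work(records):
    """Group records whose normalized (title, author) pair is identical."""
    clusters = defaultdict(list)
    for rec in records:
        key = (title_key(rec["title"]), author_key(rec["author"]))
        clusters[key].append(rec)
    return dict(clusters)


def build_query(title, author, author_pid="author"):
    """Build a one-item query batch in the Reconciliation Service API shape.

    The property id is hypothetical; a real service defines its own.
    """
    return {"q0": {"query": title, "properties": [{"pid": author_pid, "v": author}]}}


# Placeholder records; the ids are invented for this sketch.
records = [
    {"id": "rec-1", "title": "The Book of Salt", "author": "Monique Truong"},
    {"id": "rec-2", "title": "Book of Salt", "author": "Truong, Monique"},
    {"id": "rec-3", "title": "Bitter in the Mouth", "author": "Monique Truong"},
]

clusters = cluster_by_work(records)
for key, members in clusters.items():
    print(key, [m["id"] for m in members])
print(json.dumps(build_query("The Book of Salt", "Monique Truong")))
```

Exact-key grouping like this is deliberately crude. The folding step keeps only ASCII letters and digits, so titles in non-Latin scripts would collapse to an empty key, which is one small illustration of why matching is harder for non-English material and why a human review step is useful.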