Session: Next step: K1 (S100)
Since the foundation of the KB, National Library of the Netherlands, in 1798, our cataloguers have described publications manually. With the growing number of born-digital publications and the rise of artificial intelligence (AI) applications, we may now be able to train computers to assist our cataloguers in assigning keywords to publications.
In this paper we describe the results of our first experiments with the automated generation of metadata, as described in the whitepaper 'Exploring possibilities automated metadata generation' (Kleppe et al. 2019). Over the past year, KB colleagues from different departments formed a research group and teamed up with academic researchers with varied expertise. Together we explored several tools that suggest keywords from a controlled vocabulary that has been in use at the KB since 1885.
We used a training set of 18k and a test set of 5k dissertations from six Dutch universities to compare the performance of several algorithms. Because we aim to assist cataloguers in finding the right keywords rather than to fully automate the process, we focused on recall scores: if the algorithm outputs a list of twenty candidate keywords, are the right ones among them?
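The recall question above can be made concrete with a small sketch. This is an illustration of recall at k under stated assumptions, not the evaluation code used in the experiments: the function names, the document identifiers and the toy keywords are all hypothetical, and suggestions are assumed to arrive as a ranked list per document.

```python
from typing import Dict, List, Set


def recall_at_k(suggested: List[str], gold: Set[str], k: int = 20) -> float:
    """Share of the cataloguer-assigned keywords found among the top-k suggestions."""
    if not gold:
        # Assumption: a document without gold keywords contributes a score of 0.
        return 0.0
    top_k = set(suggested[:k])
    return len(top_k & gold) / len(gold)


def mean_recall_at_k(
    suggestions: Dict[str, List[str]], gold: Dict[str, Set[str]], k: int = 20
) -> float:
    """Average recall at k over all documents that have gold keywords listed."""
    scores = [recall_at_k(suggestions.get(doc, []), gold[doc], k) for doc in gold]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: ranked suggestions and manually assigned keywords.
toy_suggestions = {
    "thesis-1": ["ecology", "climate", "law"],
    "thesis-2": ["history", "art"],
}
toy_gold = {
    "thesis-1": {"ecology", "law"},
    "thesis-2": {"history", "music"},
}

toy_score = mean_recall_at_k(toy_suggestions, toy_gold, k=20)
```

With k set to 20, a document scores 1.0 when every manually assigned keyword appears somewhere in the suggestion list, regardless of its rank, which matches the goal of giving cataloguers a shortlist to choose from.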
We present the promising initial results, which led us to experiment further with the tool 'Annif' on different types of data. We briefly describe our experiments on digitized early-modern legislative texts and elaborate on our current focus on e-books. Although we hope to improve the performance of these algorithms, we believe they should merely facilitate the work of cataloguers rather than fully replace them. The human perspective, expertise and skill will remain necessary for guaranteeing the quality that we as the KB represent.
Martijn Kleppe (1), Sara Veldhoen (2), Thomas Haighton (2), Brigitte den Oudsten (2)
(1) KB, National Library of the Netherlands, Research, The Hague, Netherlands; (2) KB, National Library of the Netherlands, The Hague, Netherlands
Speakers: Martijn Kleppe, Sara Veldhoen, Thomas Haighton, Brigitte den Oudsten