Signmap: Providing an Inventory of the Scholarly Objects Managed by a Repository

Herbert van de Sompel, Patrick Hochstenbach, Michael L. Nelson, Martin Klein, Enno Meijers

Research output: Other contribution, Scientific

Abstract

Historically, repositories have provided an inventory of the scholarly objects they manage by making descriptive metadata available via an OAI-PMH machine interface. However, differing approaches to providing metadata can make it hard for harvesting applications to determine unambiguously where an object's content files reside, what its persistent identifier is, and so on. Recent innovations in metadata formats, exemplified by RIOXX version 3, have significantly improved on the status quo.

Over time, repositories have also started to publish an inventory using the Sitemaps Protocol, which has been the dominant approach to help web crawlers find a server's resources since 2009. In a typical repository implementation, the Sitemap has one entry for each scholarly object managed by the repository, and that entry provides the object's landing page URL. Given a landing page URL and the HTML available there, a crawler can attempt to discover the URLs of other resources associated with the scholarly object, e.g. metadata resources, content resources, and the persistent identifier. For the longest time, this has been a laborious, heuristic-bound task. Support for FAIR Signposting removes the uncertainty by providing distinct typed links on the landing page that make discovering the constituent resources of a scholarly object unambiguous. This significantly simplifies the task for any bot that interacts with landing pages, including crawlers intent on collecting all resources associated with each repository object. But if a crawler is only interested in, for example, PDF content resources or BibTeX metadata resources, it must still visit the landing page URL of each object and check the appropriate typed links to determine whether any of the linked resources fall within its scope.
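
The per-landing-page check described above can be illustrated with a minimal Python sketch. It assumes the typed links arrive in an HTTP Link header and uses the FAIR Signposting relation types cite-as (persistent identifier), item (content resource), and describedby (metadata resource). The helper names, the example.org URLs, the DOI, and the application/x-bibtex media type are illustrative assumptions, not taken from the specification.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Parse an HTTP Link header value into a list of dicts (hypothetical helper)."""
    links = []
    for match in LINK_RE.finditer(value):
        target, raw_params = match.group(1), match.group(2)
        params = {}
        for p in PARAM_RE.finditer(raw_params):
            quoted, bare = p.group(2), p.group(3)
            params[p.group(1).lower()] = quoted if quoted is not None else bare
        links.append({"target": target, **params})
    return links


def select(links, rel, media_type=None):
    """Return link targets with the given relation and, optionally, media type."""
    return [
        link["target"]
        for link in links
        if rel in link.get("rel", "").split()
        and (media_type is None or link.get("type") == media_type)
    ]


# Illustrative header for a single landing page (all values are made up).
EXAMPLE_HEADER = (
    '<https://doi.org/10.1234/abc>; rel="cite-as", '
    '<https://repo.example.org/files/1.pdf>; rel="item"; type="application/pdf", '
    '<https://repo.example.org/meta/1.bib>; rel="describedby"; type="application/x-bibtex"'
)

links = parse_link_header(EXAMPLE_HEADER)
pdf_urls = select(links, "item", "application/pdf")
pid_urls = select(links, "cite-as")
```

A crawler scoped to PDF content would have to repeat this for every landing page in the Sitemap, which is the cost the next paragraph addresses.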

Signmaps, specified in this document, leverage the convenience of the long-established Sitemaps Protocol and extend it with the ability to associate Signposting links with each landing page URL listed in a Sitemap. As such, Signmaps allow crawlers to discover URLs of resources that meet their scope without having to visit each landing page URL.
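
As a point of reference, the baseline that Signmaps build on can be sketched in Python: reading the landing page URLs from a plain Sitemap. The sketch uses only the standard Sitemaps namespace and the urlset, url, and loc elements; the function name and the example.org URLs are hypothetical. The Signmap-specific way of attaching Signposting links to each entry is deliberately not shown, since this abstract does not define it.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps Protocol (sitemaps.org, schema version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def landing_page_urls(sitemap_xml):
    """Return the <loc> value of every <url> entry in a Sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Illustrative Sitemap with two landing pages (URLs are made up).
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://repo.example.org/object/1</loc></url>
  <url><loc>https://repo.example.org/object/2</loc></url>
</urlset>"""

urls = landing_page_urls(EXAMPLE_SITEMAP)
```

With a plain Sitemap, this list is all a crawler learns up front; with a Signmap, the typed links for each entry would be available at the same point, before any landing page is requested.
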
Original language: English
Publication status: Published - 2024
