Abstract
The Internet Archive pioneered web archiving and remains the largest publicly accessible web archive, hosting archived copies of web pages (Mementos) going back as far as early 1996. Its holdings have grown steadily since, and as of September 2019 it hosted more than 881 billion URIs. However, the landscape of web archiving has changed significantly over the last two decades. Today we can freely access Mementos from more than 20 web archives around the world, operated by for-profit and nonprofit organisations, national libraries and academic institutions, as well as individuals. The resulting diversity improves the odds of the survival of archived records but also requires technical standards to ensure interoperability between archival systems. To date, the Memento Protocol and the WARC file format are the main enablers of interoperability between web archives. We describe a variety of tools and services that leverage the broad adoption of the Memento Protocol and discuss a selection of research efforts that would likely not have been possible without these interoperability standards. In addition, we outline examples of technical specifications that build on the ability of machines to access resource versions on the Web in an automatic, standardised and interoperable manner.
Original language | English |
---|---|
Title of host publication | The Past Web: Exploring Web Archives |
Editors | Daniel Gomes, Elena Demidova, Jane Winters, Thomas Risse |
Place of Publication | Cham |
Publisher | Springer International Publishing AG |
Pages | 101-126 |
Number of pages | 26 |
ISBN (Electronic) | 978-3-030-63291-5 |
ISBN (Print) | 978-3-030-63290-8 |
Publication status | Published - 2021 |