TY - CHAP
T1 - The Memento Tracer Framework
T2 - Balancing Quality and Scalability for Web Archiving
AU - Klein, Martin
AU - Shankar, Harihar
AU - Balakireva, Lyudmila
AU - Van de Sompel, Herbert
N1 - Accepted for publication at TPDL 2019
PY - 2019
Y1 - 2019/09/10
N2 - Web archiving frameworks are commonly assessed by the quality of their archival records and by their ability to operate at scale. The ubiquity of dynamic web content poses a significant challenge for crawler-based solutions such as the Internet Archive that are optimized for scale. Human-driven services such as the Webrecorder tool provide high-quality archival captures but are not optimized to operate at scale. We introduce the Memento Tracer framework, which aims to balance archival quality and scalability. We outline its concept and architecture and evaluate its archival quality and operation at scale. Our findings indicate that its quality is on par with or better than that of established archiving frameworks and that operation at scale comes with a manageable overhead.
AB - Web archiving frameworks are commonly assessed by the quality of their archival records and by their ability to operate at scale. The ubiquity of dynamic web content poses a significant challenge for crawler-based solutions such as the Internet Archive that are optimized for scale. Human-driven services such as the Webrecorder tool provide high-quality archival captures but are not optimized to operate at scale. We introduce the Memento Tracer framework, which aims to balance archival quality and scalability. We outline its concept and architecture and evaluate its archival quality and operation at scale. Our findings indicate that its quality is on par with or better than that of established archiving frameworks and that operation at scale comes with a manageable overhead.
KW - cs.DL
U2 - 10.1007/978-3-030-30760-8_15
DO - 10.1007/978-3-030-30760-8_15
M3 - Chapter
VL - 1909.04404
T3 - arXiv
SP - 163
EP - 176
BT - Digital Libraries for Open Knowledge
CY - Oslo
ER -