A multiple-label guided clustering algorithm for historical document dating and localization

J.W.J. Burgers, Sheng He, Petros Samara, L.R.B. Schomaker

Research output: Contribution to journal/periodical › Article › Scientific › peer-review

Abstract

Knowing the date and place of origin of the documents they study is essential for historians. It would be a major advance for historical scholarship if the geographical and temporal provenance of a handwritten document could be estimated automatically from its handwriting style. We propose a multiple-label guided clustering algorithm to discover the correlations between concrete low-level visual elements in historical documents and abstract labels such as date and location. First, a novel descriptor, the Histogram of Orientations of Handwritten Strokes (HOHS or H2OS), built on a scale-invariant polar-feature space, is proposed to extract and describe the visual elements. Second, the Multi-Label Self-Organizing Map (MLSOM) is proposed to discover the correlations between the low-level visual elements and their labels in a single framework. The MLSOM can be used to predict the labels directly. It can also serve as a pre-structured clustering method for building a codebook that contains more discriminative information on date and geography. Experimental results on the Medieval Paleographic Scale (MPS) data set demonstrate that our method achieves state-of-the-art results.
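
To make the idea of label-guided clustering concrete, the following is a minimal sketch of a self-organizing map in which each unit stores a feature prototype together with a label prototype. It is not the MLSOM described in the article: it is a generic supervised-SOM variant, and the function names, the `alpha` weighting between feature and label distance, the grid size, and the linear decay schedules are all assumptions chosen for illustration. Labels are assumed to be encoded as numeric vectors (for example, a scaled year followed by a one-hot location).

```python
import numpy as np


def train_label_guided_som(X, Y, grid=(4, 4), epochs=20, lr0=0.5,
                           sigma0=2.0, alpha=0.5, seed=0):
    """Train a small SOM whose units hold a feature and a label prototype.

    X: (n, d) feature vectors. Y: (n, k) numeric label vectors.
    alpha (hypothetical parameter) weights label distance against
    feature distance when choosing the best-matching unit in training.
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of every unit, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)],
                      dtype=float)
    # Initialise both prototypes from the same randomly drawn samples.
    idx = rng.integers(0, len(X), size=rows * cols)
    W = X[idx].astype(float).copy()  # feature prototypes
    V = Y[idx].astype(float).copy()  # label prototypes

    n_steps = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                   # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3      # shrinking neighbourhood
            # Best-matching unit under a mixed feature/label distance.
            dist = ((1.0 - alpha) * ((W - X[i]) ** 2).sum(axis=1)
                    + alpha * ((V - Y[i]) ** 2).sum(axis=1))
            bmu = int(np.argmin(dist))
            # Gaussian neighbourhood around the winner on the grid.
            h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1)
                       / (2.0 * sigma ** 2))
            W += lr * h[:, None] * (X[i] - W)
            V += lr * h[:, None] * (Y[i] - V)
            step += 1
    return W, V


def predict_labels(W, V, X):
    """Predict labels using features only: return the label prototype
    of the unit whose feature prototype is nearest to each sample."""
    dist = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return V[np.argmin(dist, axis=1)]
```

In this sketch the labels influence which unit wins during training, so units tend to specialise by label as well as by appearance, while prediction relies on the features alone. The feature prototypes `W` could also be read as a codebook, which loosely mirrors the two uses mentioned in the abstract.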
Original language: English
Pages (from-to): 5252
Number of pages: 5256
Journal: IEEE Transactions on Image Processing
Volume: 25
Issue number: 11
Publication status: Published - 2016
