TY - JOUR
T1 - A multiple-label guided clustering algorithm for historical document dating and localization
AU - Burgers, J.W.J.
AU - He, Sheng
AU - Samara, Petros
AU - Schomaker, L.R.B.
PY - 2016
Y1 - 2016
N2 - It is essential for historians to know the date and place of origin of the documents they study. It would be a major advance for historical scholarship if the geographical and temporal provenance of a handwritten document could be estimated automatically from its handwriting style. We propose a multiple-label guided clustering algorithm to discover the correlations between the concrete low-level visual elements in historical documents and abstract labels, such as date and location. First, a novel descriptor, the Histogram of Orientations of Handwritten Strokes (HOHS or H2OS), built on a scale-invariant polar feature space, is proposed to extract and describe the visual elements. In addition, the Multi-Label Self-Organizing Map (MLSOM) is proposed to discover the correlations between the low-level visual elements and their labels in a single framework. The proposed MLSOM can be used to predict the labels directly. Moreover, the MLSOM can also serve as a pre-structured clustering method for building a codebook that contains more discriminative information on date and geography. Experimental results on the Medieval Paleographic Scale (MPS) data set demonstrate that our method achieves state-of-the-art results.
AB - It is essential for historians to know the date and place of origin of the documents they study. It would be a major advance for historical scholarship if the geographical and temporal provenance of a handwritten document could be estimated automatically from its handwriting style. We propose a multiple-label guided clustering algorithm to discover the correlations between the concrete low-level visual elements in historical documents and abstract labels, such as date and location. First, a novel descriptor, the Histogram of Orientations of Handwritten Strokes (HOHS or H2OS), built on a scale-invariant polar feature space, is proposed to extract and describe the visual elements. In addition, the Multi-Label Self-Organizing Map (MLSOM) is proposed to discover the correlations between the low-level visual elements and their labels in a single framework. The proposed MLSOM can be used to predict the labels directly. Moreover, the MLSOM can also serve as a pre-structured clustering method for building a codebook that contains more discriminative information on date and geography. Experimental results on the Medieval Paleographic Scale (MPS) data set demonstrate that our method achieves state-of-the-art results.
U2 - 10.1109/TIP.2016.2602078
DO - 10.1109/TIP.2016.2602078
M3 - Article
SN - 1057-7149
VL - 25
SP - 5252
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
IS - 11
ER -