Publication Details

Language English
Authors Rina Buoy, Sovisal Chenda, Nguonly Taing, Marry Kong, Masakazu Iwamura, and Koichi Kise
Title Addressing the Attention Drift Problem for Khmer Long Textline Recognition
Journal International Journal on Document Analysis and Recognition (IJDAR)
Publisher Springer
Publisher address Berlin, Germany
Peer reviewed
Date September 2025
Abstract An autoregressive (AR) decoder generates one character or token at a time, based on previous outputs (i.e., context). This approach inherently includes an internal language model, which can lead to higher recognition accuracy compared to a non-autoregressive (NAR) decoder. However, this internal language model can be a double-edged sword. For long textline inputs, such as Khmer textline images lacking word boundaries, the autoregressive (AR) decoding process can accumulate errors at each step. Once a critical error threshold is reached, the decoder may struggle to align the corresponding visual features with the decoded outputs. This misalignment, known as the attention drift problem, arises because many AR decoders rely on attention mechanisms for alignment. In this paper, we propose a robust Khmer textline recognition method based on a partially autoregressive (PAR) decoder. Our method effectively addresses the attention drift problem when dealing with long textline inputs and thus maintains high recognition accuracy. The experimental results demonstrate that the proposed method significantly improves recognition accuracy for Khmer long textlines by mitigating the attention drift issue. Compared to existing state-of-the-art (SoTA) methods, our approach establishes new SoTA performance. Further experiments on the Latin handwritten datasets, IAM and RIMES, validate the robustness of the proposed method.
DOI 10.1007/s10032-025-00554-6