Publication Details
Language of the paper | English |
---|---|
Authors | Rina Buoy, Sovisal Chenda, Nguonly Taing, Marry Kong, Masakazu Iwamura, and Koichi Kise |
Title | Addressing the Attention Drift Problem for Khmer Long Textline Recognition |
Journal | International Journal on Document Analysis and Recognition (IJDAR) |
Publisher | Springer |
Publisher address | Berlin, Germany |
Peer-reviewed | Yes |
Date | September 2025 |
Abstract | An autoregressive (AR) decoder generates one character or token at a time, based on previous outputs (i.e., context). This approach inherently includes an internal language model, which can lead to higher recognition accuracy compared to a non-autoregressive (NAR) decoder. However, this internal language model can be a double-edged sword. For long textline inputs, such as Khmer textline images lacking word boundaries, the autoregressive (AR) decoding process can accumulate errors at each step. Once a critical error threshold is reached, the decoder may struggle to align the corresponding visual features with the decoded outputs. This misalignment, known as the attention drift problem, arises because many AR decoders rely on attention mechanisms for alignment. In this paper, we propose a robust Khmer textline recognition method based on a partially autoregressive (PAR) decoder. Our method effectively addresses the attention drift problem when dealing with long textline inputs and thus maintains high recognition accuracy. The experimental results demonstrate that the proposed method significantly improves recognition accuracy for Khmer long textlines by mitigating the attention drift issue. Compared to existing state-of-the-art (SoTA) methods, our approach establishes new SoTA performance. Further experiments on the Latin handwritten datasets, IAM and RIMES, validate the robustness of the proposed method. |
DOI | 10.1007/s10032-025-00554-6 |
- BibTeX entry
@Article{Buoy2025,
  author = {Rina Buoy and Sovisal Chenda and Nguonly Taing and Marry Kong and Masakazu Iwamura and Koichi Kise},
  title = {Addressing the Attention Drift Problem for Khmer Long Textline Recognition},
  journal = {International Journal on Document Analysis and Recognition (IJDAR)},
  year = 2025,
  month = sep,
  DOI = {10.1007/s10032-025-00554-6},
  publisher = {Springer},
  address = {Berlin, Germany}
}