
Detail of Publication

Text Language English
Authors Rina Buoy, Masakazu Iwamura, Sovila Srun & Koichi Kise
Title Towards Reduced-Complexity Scene Text Recognition (RCSTR) Through A Novel Salient Feature Selection
Journal International Journal on Document Analysis and Recognition (IJDAR)
Number of Pages 14 pages
Publisher Springer
Address Berlin, Germany
Location Berlin, Germany
Reviewed or not Reviewed
Month & Year May 2024
Abstract The integration of an attention mechanism has played a crucial role in many recent scene text recognition (STR) methods. It enables the capture of spatial feature dependencies (known as self-attention) and the identification of relevant features while predicting a character (known as cross-attention). However, computation and memory requirements in the self-attention and cross-attention layers increase quadratically and linearly with the feature map size, respectively, leading to a computational bottleneck in low-resource environments. But is it necessary to attend to the entire feature map? Text in a natural scene is continuous and oriented in a specific direction, and it does not occupy the entire image. Therefore, utilizing only a small salient subset of features in text regions is sufficient for accurately predicting characters. Based on this salient feature selection, we propose a reduced-complexity scene text recognition framework that significantly reduces model complexity and memory requirements in the self-attention and cross-attention layers. We validate the proposed framework by employing a convolutional STR architecture with both connectionist temporal classification and transformer decoders. Through model complexity and performance analyses on public benchmark datasets, we demonstrate that the proposed method can substantially reduce model complexity while still maintaining reasonably robust recognition accuracy.
DOI 10.1007/s10032-024-00474-x
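
Illustrative note (not part of the publication record): the scaling argument in the abstract can be sketched with a rough count of attention score entries. Everything below is an assumption made for illustration only: the feature map size, the character count, the number of retained features, and the top-k selection rule are hypothetical and are not taken from the paper, whose actual selection method is not described in this record.

```python
def attention_costs(num_features: int, num_chars: int) -> dict:
    """Count attention score entries as a rough proxy for compute and memory."""
    return {
        # Self-attention compares every feature with every other: N x N.
        "self_attention": num_features * num_features,
        # Cross-attention compares each predicted character with every feature: T x N.
        "cross_attention": num_chars * num_features,
    }


def select_salient(scores: list, k: int) -> list:
    """Return indices of the k highest-scoring features, in original order.

    Top-k by a saliency score is a stand-in assumption, not the paper's method.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Hypothetical sizes: an 8 x 32 feature map (256 features) and 25 characters.
full = attention_costs(num_features=256, num_chars=25)
# Keeping a hypothetical 64 salient features (one quarter of the map).
reduced = attention_costs(num_features=64, num_chars=25)

self_ratio = full["self_attention"] / reduced["self_attention"]
cross_ratio = full["cross_attention"] / reduced["cross_attention"]
print(f"self-attention entries: {full['self_attention']} -> {reduced['self_attention']}")
print(f"cross-attention entries: {full['cross_attention']} -> {reduced['cross_attention']}")
```

Under these assumed sizes, keeping one quarter of the features shrinks the self-attention score matrix by a factor of 16 and the cross-attention score matrix by a factor of 4, which matches the quadratic and linear dependence stated in the abstract.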