Detail of Publication
Text Language | English |
---|---|
Authors | Rina Buoy, Masakazu Iwamura, Sovila Srun, Koichi Kise |
Title | Towards Reduced-Complexity Scene Text Recognition (RCSTR) Through A Novel Salient Feature Selection |
Journal | International Journal on Document Analysis and Recognition (IJDAR) |
Number of Pages | 14 pages |
Publisher | Springer |
Address | Berlin, Germany |
Reviewed or not | Reviewed |
Month & Year | May 2024 |
Abstract | The integration of an attention mechanism has played a crucial role in many recent scene text recognition (STR) methods. It enables the capture of spatial feature dependencies (known as self-attention) and the identification of relevant features while predicting a character (known as cross-attention). However, computations and memory requirements in the self-attention and cross-attention layers increase quadratically and linearly with the feature map size, respectively, leading to a computational bottleneck in low-resource environments. But, is it necessary to attend to the entire feature maps? On the other hand, text in a natural scene is continuous and oriented in a specific direction, and it does not occupy the entire image. Therefore, utilizing only a small salient subset of features in text regions is sufficient for accurately predicting characters. Based on this salient feature selection, we propose a reduced-complexity scene text recognition framework that significantly reduces model complexities and memory requirements in the self-attention and cross-attention layers. We validate the proposed framework by employing a convolutional STR architecture with both connectionist temporal classification and transformer decoders. Through the model complexity and performance analyses on public benchmark datasets, we demonstrate that the proposed method can substantially reduce model complexities while still maintaining reasonably robust recognition accuracy. |
DOI | 10.1007/s10032-024-00474-x |
- Entry for BibTeX
@Article{Buoy2024,
  author    = {Rina Buoy and Masakazu Iwamura and Sovila Srun and Koichi Kise},
  title     = {Towards Reduced-Complexity Scene Text Recognition (RCSTR) Through A Novel Salient Feature Selection},
  journal   = {International Journal on Document Analysis and Recognition (IJDAR)},
  year      = 2024,
  month     = may,
  numpages  = {14},
  DOI       = {10.1007/s10032-024-00474-x},
  publisher = {Springer},
  address   = {Berlin, Germany}
}