
Detail of Publication

Text Language English
Authors Rina Buoy, Masakazu Iwamura, Sovila Srun & Koichi Kise
Title PARSTR: Partially Autoregressive Scene Text Recognition
Journal International Journal on Document Analysis and Recognition (IJDAR)
Number of Pages 14 pages
Publisher Springer
Address Berlin, Germany
Location Berlin, Germany
Reviewed or not Reviewed
Month & Year May 2024
Abstract An autoregressive (AR) decoder for scene text recognition (STR) requires numerous generation steps to decode a text image character by character, but can yield high recognition accuracy. A non-autoregressive (NAR) decoder, on the other hand, generates all characters in a single step but suffers a loss of recognition accuracy because, unlike the AR decoder, it assumes that the predicted characters are conditionally independent. This paper presents a Partially Autoregressive Scene Text Recognition (PARSTR) method that unifies AR and NAR decoding within the same model. To reduce the number of decoding steps while maintaining recognition accuracy, we devise two decoding strategies, b-first and b-ahead, which reduce the decoding steps to approximately b and by a factor of b, respectively. The experimental results demonstrate that PARSTR models using the devised decoding strategies offer a balanced compromise between efficiency and recognition accuracy compared with fully AR and fully NAR decoding. Specifically, results on public benchmark STR datasets show that the decoding steps can be reduced to at most five steps under b-first decoding and by a factor of five under b-ahead decoding, with a reduction in total word recognition accuracy of no more than 0.5%.
DOI 10.1007/s10032-024-00470-1
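
The step-count claim in the abstract above can be made concrete with a small counting sketch. The snippet below is an illustrative assumption, not the authors' implementation: it assumes b-first decodes the first b characters one at a time and the remaining characters in one parallel pass, while b-ahead emits b characters per step; the function names (ar_steps, b_first_steps, etc.) are hypothetical.

import math

def ar_steps(length: int) -> int:
    # Fully autoregressive: one character per generation step.
    return length

def nar_steps(length: int) -> int:
    # Fully non-autoregressive: all characters in a single step.
    return 1 if length > 0 else 0

def b_first_steps(length: int, b: int) -> int:
    # Assumed b-first scheme: first b characters decoded one at a time,
    # remaining characters in one parallel step, so roughly b steps.
    ar_part = min(length, b)
    parallel_part = 1 if length > b else 0
    return ar_part + parallel_part

def b_ahead_steps(length: int, b: int) -> int:
    # Assumed b-ahead scheme: b characters per step, so the step count
    # shrinks by a factor of b.
    return math.ceil(length / b) if length > 0 else 0

if __name__ == "__main__":
    b = 5
    for length in (3, 7, 12, 25):
        print(f"len={length:2d}  AR={ar_steps(length):2d}  "
              f"NAR={nar_steps(length)}  "
              f"b-first={b_first_steps(length, b):2d}  "
              f"b-ahead={b_ahead_steps(length, b):2d}")

Under these assumptions, a 25-character word takes 25 AR steps, 1 NAR step, 6 b-first steps, and 5 b-ahead steps with b = 5, which matches the "at most five steps" and "factor of five" reductions stated in the abstract.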