hwalsuklee

awesome-deep-text-detection-recognition

A curated list of resources for text detection/recognition (optical character recognition) with deep learning methods.
Under Apache License 2.0
By hwalsuklee

A curated list of awesome deep learning based papers on text detection and recognition.



Text Detection


Conf. | Date | Title | IC13 | IC15 | Resources |
:---: | :---: |:--- | :---: | :---: | :---: |
'14-ECCV | 14/10/07 | Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees |
'15-CVPR | 15/06/01 | Symmetry-based text line detection in natural scenes | 0.8043 | | PRJ CODE |
'16-TIP | 15/10/12 | Text-Attentional Convolutional Neural Networks for Scene Text Detection | 0.8165 |
'15-ICCV | 15/12/13 | Text Flow: A Unified Text Detection System in Natural Scene Images | 0.8025 |
'16-arXiv | 16/03/31 | Accurate Text Localization in Natural Image with Cascaded Convolutional TextNetwork | 0.86 | |
'16-CVPR | 16/04/14 | Multi-Oriented Text Detection with Fully Convolutional Networks | 0.83 | 0.54 | *TORCH(M)
'16-CVPR | 16/04/22 | Synthetic Data for Text Localisation in Natural Images | 0.847 (L)0.8359 | | CODE DB
'16-arXiv | 16/06/29 | Scene Text Detection Via Holistic, Multi-Channel Prediction |0.8433 | 0.6477 |
'16-ECCV | 16/09/12 | Detecting Text in Natural Image with Connectionist Text Proposal Network | 0.8215 | 0.6085 | *CAFFE(M) CAFFE TF(M) TF DEMO BLOG(CH)
'17-AAAI | 16/11/21 |TextBoxes: A fast text detector with a single deep neural network | 0.85 (L)0.8767 | | *CAFFE(M) TF BLOG(KR)
'18-TM | 17/03/03 | Arbitrary-Oriented Scene Text Detection via Rotation Proposals | 0.9125 | 0.8020 | *CAFFE
'17-CVPR | 17/03/04 | Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection | | 0.7064
'17-CVPR | 17/03/19 | Detecting Oriented Text in Natural Images by Linking Segments | 0.853 | 0.75 (L)0.7636| *TF(M) TF(M) SLIDE VIDEO |
'17-arXiv | 17/03/24 | Deep Direct Regression for Multi-Oriented Scene Text Detection | 0.86 | 0.81 |
'17-arXiv | 17/04/03 | Cascaded Segmentation-Detection Networks for Word-Level Text Spotting | 0.86 | 0.71 |
'17-CVPR | 17/04/11 | EAST: An Efficient and Accurate Scene Text Detector | | 0.8072 (L)0.8038 | TF(M) TF PYTORCH(M) PYTORCH DEMO KERAS(M) VIDEO
'17-ICIP | 17/05/15 | WordFence: Text Detection in Natural Images with Border Awareness | 0.86 |
'17-arXiv | 17/06/30 | R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection | 0.8773 | 0.8254 | TF(M) CAFFE(M)
'17-CVPR | 17/07/21 | Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting In The Wild | 0.85 | 0.63 |
'17-arXiv | 17/08/17 | Deep Scene Text Detection with Connected Component Proposals | 0.919 |
'17-ICCV | 17/08/22 | WordSup: Exploiting Word Annotations for Character based Text Detection | 0.9064 | 0.7816 |
'17-ICCV | 17/09/01 | Single Shot Text Detector with Regional Attention | 0.8704 | 0.7691 | *CAFFE(M) PYTORCH VIDEO
'17-arXiv | 17/09/11 | Fused Text Segmentation Networks for Multi-oriented Scene Text Detection | | 0.8414 |
'17-ICCV | 17/10/13 | WeText: Scene Text Detection under Weak Supervision | 0.869 (L)0.8313 |
'17-ICCV | 17/10/22 | Self-organized Text Detection with Minimal Post-processing via Border Learning | 0.84 | | *KERAS(M)
'17-ICDAR | 17/11/11 | Deep Residual Text Detection Network for Scene Text | 0.9117 (L)0.8925 |
'18-AAAI | 17/11/12 | Feature Enhancement Network: A Refined Scene Text Detector | 0.9161 |
'17-arXiv | 17/11/30 | ArbiText: Arbitrary-Oriented Text Detection in Unconstrained Scene | | 0.759 |
'18-AAAI | 18/01/04 | PixelLink: Detecting Scene Text via Instance Segmentation | 0.881 | 0.8519 | *TF(M) TF
'18-CVPR | 18/01/05 | FOTS: Fast Oriented Text Spotting with a Unified Network | 0.925 | 0.8984 | PYTORCH PYTORCH VIDEO |
'18-TIP | 18/01/09 | TextBoxes++: A Single-Shot Oriented Scene Text Detector | 0.88 | 0.829 (L)0.8475 | *CAFFE(M)
'18-CVPR | 18/02/27 | Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation | 0.88 | 0.843 |*PYTORCH(M)
'18-CVPR | 18/03/09 | An end-to-end TextSpotter with Explicit Alignment and Attention | 0.9 | 0.87 | *CAFFE(M)
'18-CVPR | 18/03/14 | Rotation-Sensitive Regression for Oriented Scene Text Detection | 0.89 | 0.838 | *CAFFE(M)
'18-arXiv | 18/04/08 | Detecting Multi-Oriented Text with Corner-based Region Proposals | 0.876 | 0.845 | *CAFFE(M)
'18-arXiv | 18/04/24 | An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches | 0.92 | 0.86 |
'18-IJCAI | 18/05/03 | IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection | | 0.9047 |
'18-arXiv | 18/06/07 | Shape Robust Text Detection with Progressive Scale Expansion Network | | 0.8721 | PRJ
'18-ECCV | 18/07/04 | TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes | | 0.826 | PYTORCH
'18-ECCV | 18/07/06 | Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes | 0.917 | 0.86 |
'18-ECCV | 18/07/10 | Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping | 0.892 |
'19-AAAI | 18/11/21 | Scene Text Detection with Supervised Pyramid Context Network | 0.921 | 0.872 |
'19-TIP | 18/12/04 | TextField: Learning A Deep Direction Field for Irregular Scene Text Detection | | 0.824 | *CAFFE(M)
'19-CVPR | 19/03/21 | Towards Robust Curve Text Detection with Conditional Spatial Expansion | | | |
'19-CVPR | 19/03/28 | Shape Robust Text Detection with Progressive Scale Expansion Network | | 0.857 | TF(M)
'19-CVPR | 19/04/03 | Character Region Awareness for Text Detection | 0.952 | 0.869 |*PYTORCH(M) VIDEO PYTORCH TF(M) KERAS BLOG_CH BLOG_KR BLOG_KR BLOG_KR|
'19-CVPR | 19/04/13 | Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes | | 0.877 | |
'19-CVPR | 19/06/16 | Learning Shape-Aware Embedding for Scene Text Detection | | 0.877 | |
'19-CVPR | 19/06/16 | Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation | 0.917 | 0.876 | |
'19-ICCV | 19/08/16 | Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network | | 0.829 | |
'19-ICCV | 19/09/02 | Geometry Normalization Networks for Accurate Scene Text Detection | | 0.8852 | |
'20-AAAI | 19/11/20 | Real-time Scene Text Detection with Differentiable Binarization | | 0.847 | |


Text Recognition


Conf. | Date | Title | SVT | IIIT5k | IC03 | IC13 | Resources |
:---: | :---: |:--- | :---: | :---: | :---: | :---: | :---: |
'15-ICLR | 14/12/18 | Deep structured output learning for unconstrained text recognition | 0.717 | |0.896 | 0.818 | TF SLIDE VIDEO
'16-IJCV | 15/05/07 | Reading text in the wild with convolutional neural networks | 0.807 | | 0.933 | 0.908 | KERAS
'16-AAAI | 15/06/14 | Reading Scene Text in Deep Convolutional Sequences
'17-TPAMI | 15/07/21 | An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition | 0.808 | 0.782 | 0.894 | 0.867 | TORCH(M) TF TF TF TF PYTORCH PYTORCH(M) BLOG(KR)
'16-CVPR | 16/03/09 | Recursive Recurrent Nets with Attention Modeling for OCR in the Wild | 0.807 | 0.784 | 0.887 | 0.9 |
'16-CVPR | 16/03/12 | Robust scene text recognition with automatic rectification | 0.819 | 0.819 | 0.901 | 0.886 | PYTORCH PYTORCH
'16-CVPR | 16/06/27 | CNN-N-Gram for Handwriting Word Recognition | 0.8362 | | | | VIDEO
'16-BMVC | 16/09/19 | STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition | 0.836 | 0.833 | 0.899 | 0.891 |
'17-arXiv | 17/07/27 | STN-OCR: A single Neural Network for Text Detection and Text Recognition | 0.798 | 0.86 | | 0.903 | *MXNET(M) PRJ BLOG
'17-IJCAI | 17/08/19 | Learning to Read Irregular Text with Attention Mechanisms |
'17-arXiv | 17/09/06 | Scene Text Recognition with Sliding Convolutional Character Models | 0.765 | 0.816 | 0.845 | 0.852 |
'17-ICCV | 17/09/07 | Focusing Attention: Towards Accurate Text Recognition in Natural Images | 0.859 | 0.874 | 0.942 | 0.933 |
'18-CVPR | 17/11/12 | AON: Towards Arbitrarily-Oriented Text Recognition | 0.828 | 0.87 | 0.915 | | TF
'17-NIPS | 17/12/04 | Gated Recurrent Convolution Neural Network for OCR | 0.815 | 0.808 | 0.978 | | *TORCH(M)
'18-AAAI | 18/01/04 | Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition | 0.844 | 0.836 | 0.915 | 0.908 |
'18-AAAI | 18/01/04 | SqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network | | 0.87 | 0.931 | 0.929 |
'18-CVPR | 18/05/09 | Edit Probability for Scene Text Recognition | 0.875 | 0.883 | 0.946 | 0.944 |
'18-TPAMI | 18/06/25 | ASTER: An Attentional Scene Text Recognizer with Flexible Rectification | 0.936 | 0.934 | 0.945 | 0.918 | *TF(M) PYTORCH
'18-ECCV | 18/09/08 | Synthetically Supervised Feature Learning for Scene Text Recognition | 0.871 | 0.894 | 0.947 | 0.94 |
'19-AAAI | 18/09/18 | Scene Text Recognition from Two-Dimensional Perspective | 0.821 | 0.92 | | 0.914 |
'19-AAAI | 18/11/02 | Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition | 0.845 | 0.915 | | 0.91 | *TORCH(M)
'19-CVPR | 18/12/14 | ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification | 0.902 | 0.933 | | 0.913 | PRJ
'19-PR | 19/01/10 | MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition | 0.883 | 0.912 | 0.950 | 0.924 | *PYTORCH(M)
'19-ICCV | 19/04/03 | What is wrong with scene text recognition model comparisons? dataset and model analysis | 0.875 | | 0.949 | 0.936 | *PYTORCH(M) BLOG_KR
'19-CVPR | 19/04/18 | Aggregation Cross-Entropy for Sequence Recognition | 0.826 | 0.823 | 0.921 | 0.897 | *PYTORCH |
'19-CVPR | 19/06/16 | Sequence-to-Sequence Domain Adaptation Network for Robust Text Image Recognition | 0.845 | 0.838 | 0.921 | 0.918 | |
'19-ICCV | 19/08/06 | Symmetry-constrained Rectification Network for Scene Text Recognition | 0.889 | 0.944 | 0.95 | 0.939 |
'20-AAAI | 19/12/28 | TextScanner: Reading Characters in Order for Robust Scene Text Recognition | 0.895 | 0.926 | | 0.925 |
'20-AAAI | 19/12/21 | Decoupled Attention Network for Text Recognition | 0.892 | 0.943 | 0.95 | 0.939 | *PYTORCH(M)
'20-AAAI | 20/02/04 | GTC: Guided Training of CTC | 0.929 | 0.955 | 0.952 | 0.943 |


End-to-End Text Recognition


Conf. | Date | Title | IC03 | IC13 | IC15 | Resources |
:---: | :---: |:--- | :---: | :---: | :---: | :---: |
'12-ICPR | 12/11/11 | End-to-end text recognition with convolutional neural networks | 0.67 | | | *CODE
'14-ECCV | 14/09/06 | Deep Features for Text Spotting | 0.75 | | | PRJ MATLAB
'16-IJCV | 15/05/07 | Reading Text in the Wild with Convolutional Neural Networks | 0.70 | 0.77 | | KERAS
'15-TPAMI | 15/10/30 | Real-time Lexicon-free Scene Text Localization and Recognition | | 0.542 | 0.156 |
'16-arXiv | 16/04/10 | TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild | | 0.6843 | 0.4718 (L)0.533 | *CAFFE(M)
'17-AAAI | 16/11/21 | TextBoxes: A fast text detector with a single deep neural network | | 0.84 | | TF *CAFFE(M) BLOG_KR
'17-ICCV | 17/07/13 | Towards End-to-end Text Spotting with Convolution Recurrent Neural Network | | 0.8459 | | VIDEO
'17-ICCV | 17/10/22 | Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework | | 0.77 | 0.47 | VIDEO *CAFFE(M)
'18-CVPR | 18/01/05 | FOTS: Fast Oriented Text Spotting with a Unified Network | | 0.8477 | 0.6533 | VIDEO TF(M)
'18-TIP | 18/01/09 | TextBoxes++: A Single-Shot Oriented Scene Text Detector | | 0.8465 | 0.519 | *CAFFE(M)
'18-CVPR | 18/03/09 | An end-to-end TextSpotter with Explicit Alignment and Attention | | 0.86 | 0.63 | *CAFFE(M)
'18-TPAMI | 18/06/25 | ASTER: An Attentional Scene Text Recognizer with Flexible Rectification | | | 0.64 | *TF(M)
'18-ECCV | 18/07/06 | Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes | | 0.865 | 0.624 |
'19-ICCV | 19/08/24 | Towards Unconstrained End-to-End Text Spotting | | | 0.6994 | BLOG_KR
'19-ICCV | 19/10/17 | Convolutional Character Networks | | | 0.7108 | *PYTORCH(M)
'19-ICCV | 19/10/27 | TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting | | | 0.6537 |
'20-AAAI | 19/11/21 | All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting | | 0.841 | 0.641 |
'20-AAAI | 20/02/12 | Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting | | 0.858 | 0.651 |


Others


Conf. | Date | Title | Description | Resources |
:---: | :---: |:--- | :---: | :---: |
'14-NIPS | 14/06/09 | Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition | Dataset | PRJ
'17-ECCV | 17/02/13 | End-to-End Interpretation of the French Street Name Signs Dataset | Dataset (FSNS) | *TF(M)
'17-arXiv | 17/04/11 | Attention-based Extraction of Structured Information from Street View Imagery | FSNS | *TF(M) TF TF LUA BLOG_KR
'17-CVPR | 17/07/21 | Unambiguous Text Localization and Retrieval for Cluttered Scenes | Text Retrieval
'17-AAAI | 17/10/22 | Detection and Recognition of Text Embedded in Online Images via Neural Context Models | Dataset | PRJ
'18-CVPR | 17/11/17 | Separating Style and Content for Generalized Style Transfer | Font Style
'17-arXiv | 17/12/06 | Detecting Curve Text in the Wild: New Dataset and New Solution | Dataset (CTW 1500) | PRJ
'18-AAAI | 17/12/14 | SEE: Towards Semi-Supervised End-to-End Scene Text Recognition | FSNS | PRJ *CHAINER(M)
'17-CVPR | 18/06/07 | Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks | Document Layout | PRJ
'18-CVPR | 18/06/19 | DocUNet: Document Image Unwarping via A Stacked U-Net | Document Dewarping | PRJ
'18-CVPR | 18/06/19 | Document Enhancement using Visibility Detection | Document Enhancement | PRJ
'18-IJCAI | 18/06/22 | Multi-Task Handwritten Document Layout Analysis | Document Layout
'18-ECCV | 18/07/09 | Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes | Dataset | PRJ
'19-AAAI | 18/12/03 | EnsNet: Ensconce Text in the Wild | Text Removal | DB
'19-CVPR | 18/12/14 | Spatial Fusion GAN for Image Synthesis | Dataset | DB
'19-AAAI | 19/01/27 | Hierarchical Encoder with Auxiliary Supervision for Table-to-text Generation: Learning Better Representation for Tables | TableToText |
'19-AAAI | 19/01/27 | A Radical-aware Attention-based Model for Chinese Text Classification | Chinese Character Classification |
'19-CVPR | 19/02/25 | Handwriting Recognition in Low-resource Scripts using Adversarial Learning | Handwriting Recognition | TF
'19-CVPR | 19/03/27 | Tightness-aware Evaluation Protocol for Scene Text Detection | Evaluation | CODE
'19-ICCV | 19/05/31 | Scene Text Visual Question Answering | Dataset | ICDAR_DB
'19-CVPR | 19/06/16 | DynTypo: Example-based Dynamic Text Effects Transfer | Text Effects | PRJ VIDEO
'19-CVPR | 19/06/16 | Typography with Decor: Intelligent Text Style Transfer | Text Effects | *PYTORCH(M)
'19-CVPR | 19/06/16 | An Alternative Deep Feature Approach to Line Level Keyword Spotting | Keyword Spotting
'19-ICCV | 19/07/23 | GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition | Domain Adaptation |
'19-ICCV | 19/09/17 | Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning | Dataset | ICDAR_DB
'19-ICCV | 19/10/02 | Large-scale Tag-based Font Retrieval with Generative Feature Learning | Font Retrieval |
'19-ICCV | 19/10/27 | TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts | Place Recognition | DB
'19-ICCV | 19/10/27 | DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks | Document Dewarping | *PYTORCH(M)


Other lists

Tutorial Materials

Acknowledgment