臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)


Detailed Record

Author: Direselign Addis Tadesse
Title (Chinese): 衣索比亞語的文字辨識利用深度學習法
Title (English): Ethiopic Text Recognition Using Deep Learning Approaches
Advisor: 劉傳銘 (LIU, CHUAN-MING)
Committee members: 王正豪 (WANG, JENQ-HAUR), 陳震宇 (CHEN, JEN-YEU), 俞征武 (YU, CHANG-WU), 賴傳淇 (LAI, CHUAN-CHI)
Defense date: 2021-01-27
Degree: Doctoral
Institution: 國立臺北科技大學 (National Taipei University of Technology)
Department: International Students Program, College of Electrical Engineering and Computer Science (iEECS)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2021
Graduation academic year: 109 (2020–2021)
Language: English
Number of pages: 102
Keywords: offline handwriting recognition; scene text detection; scene text recognition; scene text reading; convolutional neural network; multilayer perceptron; time-restricted self-attention encoder-decoder; offline character recognition; Transformer; gated convolutional neural network; octave convolution
Usage statistics:
  • Cited by: 0
  • Views: 141
  • Rating:
  • Downloads: 2
  • Bookmarked: 0

Abstract

Text, a collection of words or letters representing a language, is one of the most significant human innovations. It plays a vital role in human life, including communicating ideas and delivering information. Text exists in different forms, such as handwritten, machine-printed, and electronically editable forms. Handwritten and machine-printed text must be transcribed into machine-editable text before it can be used in further applications such as text mining, pattern recognition, and computer vision.
For several decades, researchers have been studying text recognition systems, also known as optical character recognition (OCR) systems. Several proprietary systems now efficiently convert simple machine-printed scanned images into editable text. However, these systems fail to recognize text in camera-captured natural images and scanned handwritten images, because text in natural and handwritten images varies far more in appearance than text in machine-printed scanned images. Under-resourced scripts such as the Ethiopic script pose an additional challenge. To address this gap, we propose robust deep learning techniques to recognize text in scanned handwritten images and camera-captured natural images.
This dissertation has two parts. The first focuses on offline handwritten Ethiopic text recognition; the second focuses on scene text detection and recognition. For offline handwritten text recognition, we propose two methods, one recognizing at the character level and one at the word/text-line level. To recognize isolated Ethiopic characters, Convolutional Neural Network (CNN) and Multilayer Perceptron (MLP) methods are employed, and the effects of five optimizers, the number of layers, and the network structure are analyzed. The experimental results show that the CM4 architecture with the AdaGrad optimizer achieves better recognition performance than the alternatives.
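The abstract does not spell out the CM4 configuration; purely as an illustration of the character-level setup, the following PyTorch sketch pairs a small CNN classifier with the AdaGrad optimizer the abstract reports performing best. The 32x32 input size, layer widths, and class count are assumptions, not the dissertation's values.

```python
# Illustrative sketch: a small CNN character classifier trained with AdaGrad.
# Input resolution, channel widths, and NUM_CLASSES are assumed values.
import torch
import torch.nn as nn

NUM_CLASSES = 265  # hypothetical: Ethiopic syllographs plus numerals

class CharCNN(nn.Module):
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):  # x: (B, 1, 32, 32) grayscale character images
        return self.classifier(self.features(x))

model = CharCNN()
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)  # AdaGrad, as in the abstract
criterion = nn.CrossEntropyLoss()
```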
The second method uses a Gated CNN and a Transformer network to recognize Ethiopic text at the word and text-line level in scanned offline handwritten images. Compared to a conventional stack of CNN layers, a stack of Gated CNN and CNN layers extracts features more effectively. In addition, the Transformer network overcomes the limitations of recurrence-based networks by avoiding recursion. To train and test the proposed model, we prepare a word and text-line database and introduce a semi-automatic labeling algorithm for word-level database preparation. The experimental results of the proposed model are promising.
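As a sketch of the gating idea referenced above (not the dissertation's exact block design), a gated convolution multiplies ordinary convolution features by a learned sigmoid gate, letting the network suppress uninformative responses:

```python
# Sketch of a gated convolutional block: y = conv(x) * sigmoid(gate(x)).
# The stacking order and sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    """The gate branch learns, per position and channel, which features pass."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)

    def forward(self, x):
        return self.feature(x) * torch.sigmoid(self.gate(x))

# A plain conv stacked with a gated conv, in the spirit of the abstract.
extractor = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    GatedConv2d(32, 64),
)
x = torch.randn(4, 1, 64, 256)  # a batch of text-line images (assumed size)
print(extractor(x).shape)       # torch.Size([4, 64, 64, 256])
```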
For scene text, a Convolutional Neural Network (CNN) with bidirectional Long Short-Term Memory (LSTM) and Connectionist Temporal Classification (CTC), together known as a Convolutional Recurrent Neural Network (CRNN), is used to recognize Ethiopic scene text in cropped natural images. The architecture has three layers: a feature extraction layer using a stack of CNNs, a prediction layer using LSTM, and a loss calculation and transcription layer using CTC. The experimental results are promising. For bilingual scene text detection and end-to-end scene text reading, an octave-based feature extractor and a time-restricted self-attention encoder-decoder method are used. In this architecture, we use a Feature Pyramid Network (FPN) with an octave-based ResNet-50 to extract features. The outputs of the feature extraction layer feed a Region Proposal Network (RPN) that detects text/non-text regions. Finally, a time-restricted self-attention encoder-decoder recognizes the text in the detected regions. The experiments show no difference in detection performance between the two languages; however, English words are recognized more accurately than Amharic words.
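A minimal sketch of the three-layer CRNN pipeline described above, assuming 32-pixel-high grayscale line images and a hypothetical alphabet size; the dissertation's actual layer counts and widths are not reproduced here:

```python
# Sketch of a CRNN: CNN features -> bidirectional LSTM over the width axis
# -> per-timestep log-probabilities scored with CTC loss. Sizes are assumed.
import torch
import torch.nn as nn

NUM_CLASSES = 300  # hypothetical alphabet size + 1 for the CTC blank symbol

class CRNN(nn.Module):
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        # Feature extraction layer: shrink height, keep width as time steps.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),       # 32xW -> 16xW/2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),# 16xW/2 -> 8xW/2
        )
        # Prediction layer: bidirectional LSTM over the sequence of columns.
        self.rnn = nn.LSTM(128 * 8, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, num_classes)

    def forward(self, x):                       # x: (B, 1, 32, W)
        f = self.cnn(x)                         # (B, 128, 8, W/2)
        f = f.permute(0, 3, 1, 2).flatten(2)    # (B, T=W/2, 128*8)
        out, _ = self.rnn(f)                    # (B, T, 512)
        return self.fc(out).log_softmax(2)      # (B, T, C), ready for CTC

# Transcription layer: CTC aligns predictions with unsegmented label strings.
ctc_loss = nn.CTCLoss(blank=0)
```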
To train and evaluate the proposed models, we prepare appropriate databases (an isolated offline handwritten Ethiopic character and numeral database, an Ethiopic offline handwritten word and text-line database, and real and synthetic bilingual scene text databases). In addition, well-known datasets including ICDAR2013, ICDAR2015, and Total-Text are employed for scene text detection and recognition. The experimental results show better recognition performance than previously proposed methods.

Table of Contents

ABSTRACT I
Acknowledgments III
DEDICATION IV
Table of Contents V
List of Figures VIII
List of Tables IX
Chapter 1: Introduction 1
1.1. Background 1
1.2. Objectives of the Dissertation 6
1.3. Motivations and contributions 7
1.4. Organization of the Dissertation 8
Chapter 2: Machine Learning Techniques and Text Recognition 10
2.1. Machine Learning Techniques 10
2.1.1. Traditional Machine Learning 11
2.1.2. Deep learning (DL) 11
2.1.2.1. Multi-Layer Perceptron (MLP) 12
2.1.3. Convolutional Neural Network (CNN) 12
2.1.4. Recurrent Neural Network (RNN) 14
2.1.5. Long Short-Term Memory (LSTM) 14
2.1.5.1. Forget gate 15
2.1.5.2. Input Gate 16
2.1.5.3. Output Gate 17
2.1.6. Gated Recurrent Unit (GRU) 17
2.1.7. Activation Function 18
2.1.8. Optimization Function 20
2.2. Text Recognition 20
2.2.1. Offline Handwritten Text Recognition 21
2.2.2. Scene Text Recognition 23
2.2.2.1. Text Detection 23
2.2.2.2. Text Recognition 26
2.2.2.3. End-to-End method 29
2.3. Ethiopic Script 30
2.4. Performance Metrics 33
2.5. Summary 34
PART I 35
OFFLINE HANDWRITTEN TEXT RECOGNITION 35
Chapter 3: Offline Handwriting Ethiopic Character Recognition Using Deep Neural Network 36
3.1. Background 36
3.1.1. Data Collection and Preprocessing 39
3.2. Handwritten Ethiopic Character Recognition (HECR) Model 42
3.2.1.1. Convolutional Neural Network (CNN) 43
3.2.1.1.1. Convolutional layers 43
3.2.1.1.2. Pooling Layers 44
3.2.1.1.3. Fully-connected Layers 44
3.2.1.2. Multilayer Perceptron (MLP) 45
3.3. Experimental Results and Discussion 47
3.3.1. Experiment Setup 47
3.3.2. Results and Discussions 48
3.4. Summary 52
Chapter 4: Offline Handwritten Ethiopic Text Recognition using Gated Convolution and Transformer 53
4.1. Background 53
4.2. Recognition Method 55
4.2.1. Feature Extraction Layer 56
4.2.2. Encoder Layer 57
4.2.3. Decoder Layer 57
4.3. Experimental Results and Discussion 59
4.3.1. Data collection and preprocessing 59
4.3.1.1.1. Preprocessing 60
4.3.1.1.2. Segmentation 60
4.3.2. Results and Discussions 62
4.4. Summary 63
PART II 65
SCENE TEXT RECOGNITION 65
Chapter 5: Scene Text Recognition Using Deep Learning Approaches 66
5.1. Background 66
5.2. Ethiopic Scene Text Recognition (ESTR) Method 68
5.2.1. Feature Extraction Layer 69
5.2.2. Prediction Layer 69
5.2.3. Transcription Layer 70
5.3. Experiments 70
5.3.1. Dataset Preparation 70
5.3.1.1. Synthetic Ethiopic Scene Text Dataset 70
5.3.1.2. Real Ethiopic Scene Text Database 71
5.3.2. ESTR System Implementation Details 72
5.3.3. Experiment Results and Discussion 73
5.4. Summary 74
Chapter 6: Scene Text Reading Using Octave Convolution as a Feature Extractor 76
6.1. Background 76
6.2. Methodology 79
6.2.1. Overview of the Architecture 79
6.2.2. Feature Extraction Layer 79
6.2.3. Text Region Detection Layer 81
6.2.4. Segmentation and Recognition Layer 82
6.3. Bilingual Dataset Preparation 83
6.3.1. Synthetic Scene Text Dataset 83
6.3.2. Real Scene Text Dataset 84
6.4. Experiments and Discussions 85
6.5. Summary 88
Chapter 7: Conclusion and Recommendation 89
7.1. Conclusions 89
7.2. Recommendations 89
Reference 91
Appendix 101
Acronyms 101
