
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 楊啟昕
Author (English): Chi-Hsin Yang
Title: 使用ResNet之高精度文字偵測與辨識
Title (English): High Accuracy Text Detection and Recognition using ResNet as Feature Extraction
Advisor: 謝禎冏
Advisor (English): Chen-Chiung Hsieh
Committee member: 謝禎冏
Committee member (English): Chen-Chiung Hsieh
Oral defense date: 2019-07-18
Degree: Master
Institution: Tatung University
Department: Department of Computer Science and Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic thesis
Year of publication: 2019
Graduation academic year: 107
Language: English
Number of pages: 60
Keywords (Chinese): 卷積神經網路 (convolutional neural network), 殘差網路 (residual network), 光學影像辨識 (optical character recognition)
Keywords (English): CNN, ResNet, OCR
Usage statistics:
  • Cited by: 3
  • Views: 709
  • Downloads: 158
  • Bookmarked: 0
With advances in technology, optical character recognition (OCR) has evolved into three stages: text detection, text recognition, and post-processing. In practice, traditional OCR systems cannot handle scene text. Because the backgrounds of natural scene images are complex, text detection in them is much more difficult than text recognition in scanned document images. Moreover, because of the large number of character classes, Chinese character detection and recognition have long been hard problems in OCR.
Existing neural-network-based character segmentation methods include CTPN, EAST, and PixelLink, but they do not handle small, dense characters in large images, or connected characters, well. To solve these problems, we adopt the popular segmentation networks CTPN or EAST as the main structure and propose using ResNet as the feature extraction network because of its excellent sensitivity to fine image features. Experimental results show that the feature extraction network significantly affects the precision of text localization. In experiments with the ICDAR datasets, the effect of a deeper ResNet on EAST is significant: text segmentation accuracy on ICDAR2015 reaches 83.4%, 7% higher than the original PVANET-EAST. The text detection network also generalizes well, detecting untrained scanned documents with 83.9% accuracy and Chinese calligraphy with 86.3% accuracy.
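The three-stage OCR pipeline described above can be sketched as a simple function composition. The stage functions below are hypothetical stubs for illustration only, not the thesis implementation:

```python
def detect_text(image):
    # Stage 1: locate text regions. A real detector such as EAST would
    # predict boxes from CNN feature maps; this stub returns one fixed box.
    return [(0, 0, 100, 32)]

def recognize_text(image, boxes):
    # Stage 2: transcribe each detected region. A real recognizer might be
    # a CRNN decoded with CTC; this stub returns a raw string per box.
    return [" sample " for _ in boxes]

def post_process(texts):
    # Stage 3: clean up raw transcriptions, e.g. trimming whitespace or
    # dictionary-based correction of recognition errors.
    return [t.strip() for t in texts]

def ocr(image):
    boxes = detect_text(image)
    return post_process(recognize_text(image, boxes))
```

The point of the decomposition is that each stage can be improved independently; the thesis focuses on the detection stage.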
Over the past decades, a complete optical character recognition (OCR) pipeline has been established. The pipeline is mainly split into three parts: text detection, text recognition, and post-processing. In practice, traditional OCR systems cannot handle scene text. Text detection in natural scene images is much more difficult than text recognition in scanned document images because of their complex backgrounds. Moreover, Chinese character detection and recognition are difficult topics in OCR because of the large number of character classes.
Popular existing methods for character segmentation are CTPN, EAST, and PixelLink. However, they do not cope well with small, dense characters in large images, or with connected characters. To address these problems, we adopted the popular character segmentation networks CTPN or EAST as the main structure and proposed using ResNet as the feature extraction network, owing to its excellent sensitivity to fine features among the existing methods. The experimental results show that the feature extraction network significantly affects the precision of text localization. In experiments with the ICDAR datasets, the effect of a deeper and wider ResNet on EAST is notable: text segmentation on ICDAR2015 reaches 83.4% accuracy, which is 7% higher than the original PVANET-EAST. The text detection network also generalizes well, detecting untrained scanned documents with 83.9% accuracy and Chinese calligraphy with 86.3% accuracy.
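The abstract attributes the accuracy gain to ResNet's sensitivity to fine image features, which comes from its identity shortcut connections: each block adds its input back to its output, so very deep stacks remain trainable. A minimal sketch of that idea, using dense layers in place of the 3×3 convolutions a real ResNet block would use (all names and shapes here are illustrative, not taken from the thesis):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # Two linear transforms stand in for the two 3x3 convolutions
    # of a basic ResNet block.
    out = relu(x @ w1)
    out = out @ w2
    # Identity shortcut: the input is added back before the final
    # activation, so the block only has to learn a residual correction
    # and gradients can flow through the shortcut unchanged.
    return relu(out + x)
```

With zero weights the block reduces to `relu(x)`, i.e. it passes its input through; this is the degenerate case that makes deep residual stacks no worse than shallow ones.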
Acknowledgments i
Abstract ii
ABSTRACT iii
Chapter 1 INTRODUCTION 1
1.1 Background 1
1.2 Motivation 3
1.3 Organization of the Thesis 4
Chapter 2 RELATED WORK 5
2.1 Optical Character Recognition 5
2.2 Text Detection 5
2.3 Convolutional Neural Network based Text Detection 8
2.3.1 Residual Network 9
2.3.2 You Only Look Once 10
2.3.3 Connectionist Text Proposal Network 13
2.3.4 Efficient and Accurate Scene Text Detector 15
2.3.5 PixelLink 16
Chapter 3 RESEARCH METHOD 18
3.1 Network Modification 18
3.1.1 CTPN Modification 18
3.1.2 EAST Modification 22
3.2 Text Recognition 25
Chapter 4 EXPERIMENTAL RESULTS 28
4.1 Experimental Environment 28
4.2 Datasets 29
4.3 Training 32
4.4 Evaluation 32
4.5 Chinese calligraphy 33
4.6 ICDAR datasets 41
4.7 Scanned Documents 47
4.8 Text Recognition Result 53
Chapter 5 CONCLUSIONS AND FUTURE WORK 55
Reference 56