Graduate student: Yu-An Li
Thesis title: Handwritten and Printed Chinese Character Recognition By Using Computer Font Type Chinese Characters into Convolutional Neural Network
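The goal of this study is to improve the recognition of handwritten and printed Chinese characters. Drawing on fonts of different styles that are available online or built into computers, we take the 5,000 and 10,000 most commonly used characters and apply image processing techniques to deform and preprocess these fonts in several ways, producing the required training data. Using convolutional neural networks (CNNs), a machine learning technique, we train a single model that recognizes both handwritten and printed Chinese characters. We tune and optimize the model parameters, validate repeatedly, and evaluate the model experimentally on other representative test sets. The core aim of this study is to use image processing to generate effective training data, so that the recognition model becomes more accurate and recognizes characters correctly across different representative test sets.

(1) How to train a model that recognizes both handwritten and printed characters using only existing computer fonts.
(2) Optimizing the recognition of printed characters in classical documents, addressing problems such as blurred characters in images of classical documents and rare characters.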
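The preprocessing step mentioned above (grayscale conversion followed by binarization) can be illustrated with a minimal sketch. This is not the thesis's implementation: the ITU-R BT.601 luma weights, the fixed threshold of 128, and the function names `to_gray` and `binarize` are assumptions chosen for illustration, and the image is represented as plain nested lists to keep the example dependency-free.

```python
def to_gray(pixel):
    """Convert an (R, G, B) pixel to a gray level.

    Assumption: ITU-R BT.601 luma weights; the thesis may use a different formula.
    """
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def binarize(image, threshold=128):
    """Map an RGB image (rows of (R, G, B) tuples) to 0/1 values.

    1 marks ink (darker than the threshold), 0 marks background.
    The default threshold of 128 is an illustrative assumption.
    """
    return [[1 if to_gray(p) < threshold else 0 for p in row] for row in image]


# A tiny 2x3 "scan": a dark vertical stroke on a light, slightly uneven background.
WHITE, BLACK, LIGHT_GRAY = (255, 255, 255), (0, 0, 0), (200, 200, 200)
image = [
    [WHITE, BLACK, WHITE],
    [LIGHT_GRAY, BLACK, WHITE],
]
binary = binarize(image)
```

In this sketch the light gray background pixel falls above the threshold and is treated as background, so only the dark stroke survives as ink. A fixed global threshold is the simplest choice; blurred or unevenly lit document images may need a different threshold.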
The main purpose of this thesis is to improve handwritten Chinese character recognition (HCCR) and the recognition of traditional, non-modern printed Chinese characters (PCCR). Using existing Chinese fonts of different styles from computer systems and online sources, we take the 5,000 and 10,000 most commonly used characters and apply several deformations and preprocessing steps with image processing techniques to produce training data.
Using convolutional neural networks, a machine learning technique, we train a single model that recognizes both handwritten and printed Chinese characters.
The main goal of this thesis is to find effective training features, optimize the parameters, and fine-tune the model for better performance.
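As an illustration of how deformations can turn one font glyph into several training variants, the following minimal sketch applies binary erosion and dilation, two of the deformations named in the thesis outline. It is a sketch under stated assumptions, not the thesis's code: the 3x3 square structuring element, the convention that ink is 1 and background is 0, and the names `erode`, `dilate`, and `_window` are all hypothetical choices made here.

```python
def _window(img, r, c):
    """Yield the values in the 3x3 window centred on (r, c).

    Pixels outside the image are treated as background (0).
    """
    h, w = len(img), len(img[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0


def dilate(img):
    """Thicken strokes: a pixel becomes ink if any pixel in its 3x3 window is ink."""
    return [[1 if any(_window(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]


def erode(img):
    """Thin strokes: a pixel stays ink only if its whole 3x3 window is ink."""
    return [[1 if all(_window(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]


# A 5x5 binary "glyph": a solid 3x3 block of ink (1) on background (0).
glyph = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

# One rendered glyph yields several training variants.
variants = {"normal": glyph, "erosion": erode(glyph), "dilation": dilate(glyph)}
```

Erosion shrinks the block to its single centre pixel, while dilation grows it to fill the whole 5x5 grid. On real glyph images the same operations imitate thinner or heavier strokes, which is one plausible way for font-rendered characters to approximate the variation seen in worn print or handwriting.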
The main results of this thesis are:
(1) A method for training a model that recognizes both handwritten and printed characters simultaneously, using only existing computer fonts.
(2) For printed Chinese characters, we focus on early traditional printed fonts and improve the recognition of rare characters and of characters that are damaged or blurred in the original text.
(3) We conduct experiments on the Beijing Civil News, the Biansha Tibetan Buddhist Dharma, and the 2013 CASIA handwritten Chinese character public test sets. The proposed model and method reach an accuracy of 69.9% on News, 89.29% on Buddhist Dharma, and 58.27% on the handwriting test set. Compared with existing common OCR software, our model improves accuracy by about 2 to 3%.
Keywords: HCCR, PCCR, Image Processing, Machine Learning, Convolutional Neural Networks
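Oral Examination Committee Certification
Acknowledgements
Chinese Abstract
English Abstract
Table of Contents
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Contributions
1.4 Thesis Organization
Chapter 2 Literature Review and Background
2.1 Printed Chinese Character Recognition
2.1.1 Hierarchical Algorithms and Block Pixel Probability Distribution
2.1.2 Neural Networks and Machine Learning Algorithms
2.1.3 Multi-Font Recognition Using Convolutional Neural Networks
2.2 Handwritten Chinese Character Recognition
2.2.1 Sample Set Expansion and Multi-Column Neural Networks
2.2.2 Character Rectification and Alignment
2.2.3 Similar Character Definition and Clustering
2.3 Current State and Limitations
2.4 Basic Image Processing
2.4.1 Image Binarization (Threshold)
2.4.2 Image Grayscale Conversion (Gray)
2.5 Introduction to Convolutional Neural Networks
Chapter 3 Problem Definition and Research Method
3.1 Problem Definition and System Architecture
3.2 Unicode Chinese Character Generation
3.3 Common Chinese Character Set Selection
3.4 Computer-Generated Font Selection
3.5 Image Grayscale Conversion and Binarization
3.5.1 RGB Color Space
3.5.2 RGB-to-Grayscale Conversion and Threshold Setting
3.6 Image Variation Generation and Resolution Unification
3.6.1 Normal Font Generation (Normal)
3.6.2 Image Blurring (Blur)
3.6.3 Image Erosion (Erosion)
3.6.4 Image Dilation (Dilation)
3.6.5 Image Affine Transformation (Affine)
3.6.6 Affine Transformation + Erosion (Affine + Erosion)
3.6.7 Affine Transformation + Dilation (Affine + Dilation)
3.6.8 Image Thinning (Thinning)
3.7 Image Rotation
3.8 Convolutional Neural Network Model Architecture and Details
Chapter 4 Experimental Results and Discussion
4.1 Experimental Data Collection and Preparation
4.1.1 Introduction to the Buddhist Scripture, Jing Bao (晶報), and Handwritten Chinese Character Test Sets
4.2 Evaluation Method
4.3 Experimental Results and Comparison
4.3.1 Recognition Results with the Full Training Set
4.3.2 Effect of a Single Printed Font on Recognition Rate
4.3.3 Effect of Adding the Thinned Training Set on Recognition Rate
4.3.4 Effect of Rotation Angle on Recognition Rate
4.3.5 Effect of Thinning the Test Set on Recognition Rate
4.3.6 Comparison of Different CNN Model Architectures
4.3.7 Comparison with Existing OCR Software
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References
Appendices
Appendix 1: Full Titles of the Novels Used as Character-Frequency Reference Texts (349 novels in total)
Appendix 2: Character Sets of the TOP5000 and TOP10000 Models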