National Digital Library of Theses and Dissertations in Taiwan

Author: CHENG, TAI-MING (鄭泰銘)
Title: Character Recognition of English Alphabets and Numerals (英文字母與數字之辨識)
Advisor: Hsi-Jian Lee (李錫堅)
Degree: Master's
Institution: National Chiao Tung University
Department: Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 1999
Academic year of graduation: 87 (ROC calendar)
Language: English
Number of pages: 44
Keywords: Character Recognition; Document Analysis; English Alphabets; Statistical Pattern Recognition; Dual-kernel Architecture; De-skewing; Touching Character Segmentation; Detection of Italic Text Lines
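In some application areas, such as business card recognition, a line of text must be recognized without any information about the whole document. This thesis presents a method for correctly recognizing any isolated line of text. The method has three main parts: pre-processing, a character recognition kernel, and post-processing.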
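First, the binarized image of a single text line is de-skewed so that the residual skew angle is within 0.3 degrees, and the line is then tested for italic text. After all connected components have been extracted and suitably merged and de-noised, they are shifted vertically according to the skew angle detected earlier and then smoothed.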
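The vertical-shift step can be illustrated with a minimal sketch. It assumes each connected component is summarized by a bounding box and that the skew angle has already been estimated. The `Box` type, the function name, and the choice of shifting by the offset at each box's horizontal centre are illustrative assumptions, not details taken from the thesis.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of one connected component (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int


def deskew_by_vertical_shift(boxes, skew_deg):
    """Shift each box vertically to cancel a line skew of `skew_deg` degrees.

    Assumes image coordinates (y grows downward) and a positive angle meaning
    the baseline drops as x increases. Each box moves by the baseline offset
    at its horizontal centre, measured from the leftmost component.
    """
    if not boxes:
        return []
    slope = math.tan(math.radians(skew_deg))
    x0 = min(b.left for b in boxes)
    shifted = []
    for b in boxes:
        centre_x = (b.left + b.right) / 2.0
        dy = round((centre_x - x0) * slope)
        shifted.append(Box(b.left, b.top - dy, b.right, b.bottom - dy))
    return shifted
```

Shifting whole components rather than rotating the image keeps each glyph's pixels intact, which is one plausible reason to prefer it for small skew angles.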
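The extracted components are recognized by a kernel program with a "dual-kernel" architecture: depending on whether the line is italic or roman, one of the two kernels performs the recognition. Components with poor recognition results are tentatively segmented, because some of them may contain not one character but several touching characters. Segmentation uses a branch-and-bound depth-first search of a search tree.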
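A depth-first branch-and-bound search over candidate cut positions might look like the sketch below. The abstract does not say how cuts are proposed or how pieces are scored, so the `cuts` list and the `cost` callable are hypothetical stand-ins; the pruning step is only valid under the stated assumption that costs are non-negative.

```python
def segment_touching(cuts, cost):
    """Find the lowest-cost split of a component by depth-first branch and bound.

    cuts: sorted candidate cut positions, including both ends of the component
          (at least two entries).
    cost: hypothetical callable cost(start, end) -> non-negative float, e.g. the
          recognition distance of the piece between two cuts (lower is better).
    Returns (best_total_cost, list_of_(start, end)_pieces).
    """
    best = {"cost": float("inf"), "pieces": []}
    last = len(cuts) - 1

    def dfs(i, partial_cost, pieces):
        # Bound: costs are non-negative, so this branch cannot beat the best.
        if partial_cost >= best["cost"]:
            return
        if i == last:
            best["cost"] = partial_cost
            best["pieces"] = list(pieces)
            return
        # Branch: try every later cut as the end of the next piece.
        for j in range(i + 1, last + 1):
            pieces.append((cuts[i], cuts[j]))
            dfs(j, partial_cost + cost(cuts[i], cuts[j]), pieces)
            pieces.pop()

    dfs(0, 0.0, [])
    return best["cost"], best["pieces"]
```

For example, with cuts `[0, 10, 20, 30]` and a toy cost that favours pieces 10 pixels wide, the search returns three pieces and skips any partial split whose accumulated cost already matches or exceeds the best complete split found so far.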
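Finally, the vertical position and height of each component are used to check the recognition results. Once impossible characters have been excluded, the correct character can be promoted to first place. We also propose a method for determining space characters. For characters whose uppercase and lowercase shapes are the same, the vertical position and character height likewise determine whether the character is uppercase or lowercase.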
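The case decision for same-shape letters can be sketched as follows. This is a simplified illustration that uses only the component height; the set of ambiguous letters and the per-line reference heights `x_height` and `cap_height` are assumptions, not values from the thesis.

```python
SAME_SHAPE = set("cosvwxz")  # letters assumed to look alike in both cases


def resolve_case(letter, top, bottom, x_height, cap_height):
    """Pick upper or lower case for a same-shape letter from its box height.

    top, bottom: vertical extent of the component (y grows downward).
    x_height, cap_height: hypothetical per-line reference heights in pixels,
    e.g. estimated from unambiguous characters on the same line.
    """
    if letter.lower() not in SAME_SHAPE:
        return letter  # shape alone already decides the case
    height = bottom - top
    # Choose whichever reference height is closer to the measured height.
    if abs(height - cap_height) < abs(height - x_height):
        return letter.upper()
    return letter.lower()
```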
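We cut 646 single text lines from 107 English business cards as test samples. The accuracy of de-skewing was 99.23%, the accuracy of italic detection was 100%, and 93.18% of touching characters were segmented correctly. The recognition rates of the roman and italic kernels reached 99.07% and 98.53%, respectively.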
In this thesis, we design a procedure for recognizing single text lines. In certain applications, single text lines must be recognized without any whole-document information. The procedure consists of three parts: pre-processing, a character recognition kernel, and post-processing.
In the first phase, the skew angle and italicness of the binarized image of a single text line are detected. After all connected components have been extracted and suitably merged or deleted, the vertical positions of the components are shifted. The images are then smoothed.
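As an illustration of connected-component extraction, here is a minimal sketch that labels 8-connected ink regions with a breadth-first flood fill and returns their bounding boxes. The input representation (rows of 0/1 values) and the use of 8-connectivity are assumptions; the abstract does not specify either.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (left, top, right, bottom) of 8-connected blobs.

    image: list of equal-length rows, where 1 marks an ink pixel and 0 background.
    Boxes are inclusive and sorted left to right.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            left, top, right, bottom = c, r, c, r
            while queue:
                y, x = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return sorted(boxes)
```

A later merging pass (for instance, joining the dot of an "i" to its stem) would operate on these boxes; that step is not shown here.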
The components are then recognized and, if necessary, segmented, using a dual-kernel architecture that selects a kernel according to whether the text line is italic or roman. Touching characters are segmented using branch-and-bound tree traversal.
Finally, vertical position information is used to post-process the recognition results. Impossible candidates are rejected, so that the correct class is eventually promoted to the first candidate. An approach to determining space characters using the profile is introduced. Characters that have the same shape in upper and lower case are classified according to their heights.
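One way to read "determining space characters using the profile" is to take the vertical projection profile of the line and treat wide runs of empty columns as spaces. The sketch below follows that reading; the fixed `min_gap` threshold and the function name are hypothetical, and the thesis may use a different criterion.

```python
def find_spaces(image, min_gap):
    """Return (start, end) column ranges of blank gaps at least `min_gap` wide.

    image: list of equal-length rows with 1 for ink and 0 for background.
    min_gap: hypothetical width threshold in pixels separating inter-word
             spaces from the narrower gaps between characters.
    Leading and trailing margins are ignored. Ranges are half-open.
    """
    if not image:
        return []
    # Vertical projection profile: ink pixels per column.
    profile = [sum(col) for col in zip(*image)]
    inked = [x for x, count in enumerate(profile) if count > 0]
    if not inked:
        return []
    spaces = []
    gap_start = None
    for x in range(inked[0], inked[-1] + 1):
        if profile[x] == 0:
            if gap_start is None:
                gap_start = x
        elif gap_start is not None:
            if x - gap_start >= min_gap:
                spaces.append((gap_start, x))
            gap_start = None
    return spaces
```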
In our experiments, we tested 646 text lines cut from English business cards. The accuracy of skew-angle detection was 99.23%, and the accuracy of italicness detection was 100%. Of the touching characters, 93.18% were correctly segmented. The character recognition rates for correctly segmented or non-touching roman and italic characters were 99.07% and 98.53%, respectively.
CHAPTER 1. INTRODUCTION
1.1 Motivation
1.2 Problem Definition
1.3 Survey of Related Research
1.4 System Description and Assumptions
1.5 Thesis Organization
CHAPTER 2. RECOGNITION OF TEXT IN A SINGLE LINE
2.1 Introduction
2.2 Skewing Angle Detection
2.3 Detection of Italicness
2.4 Smoothing
2.5 Connected Components Extraction and De-skewing
2.6 Touching Character Segmentation and Post-Processing
2.7 Space Character Determination
2.8 Upper/Lower Case Determination
CHAPTER 3. MULTI-FONT CHARACTER RECOGNITION
3.1 Introduction
3.2 Dual-Kernel Architecture
3.3 Training of the Roman Kernel
3.4 Training of the Italic Kernel
3.5 Post Processing
CHAPTER 4. EXPERIMENTAL RESULTS AND ANALYSIS
4.1 Introduction
4.2 Test Images
4.3 Accuracy of Skewing-Angle Detection
4.4 Accuracy of Italicness Detection
4.5 Results of Touching Character Segmentation
4.6 Character Recognition Rate
4.7 Accuracy of Space Character Determination
CHAPTER 5. CONCLUSIONS AND FUTURE WORK