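Author: 陳宥光 (Yow-Kuang Chen)
Thesis title: Extraction, Classification, and Utilization of Content Features in Written Document Images
Thesis title (Chinese): 對手填文件影像之內容特徵作抽取,分類與應用
Advisor: 蔡文祥 (Wen-Hsiang Tsai)
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Computer and Information Science (資訊科學系)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 1999
Graduation academic year: 87 (ROC calendar)
Language: English
Number of pages: 112
Keywords: Document Image; Extraction; Classification; Utilization; Area Thresholding; Grouping; Features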
This study proposes a system for the extraction, classification, and utilization of content features in images of hand-filled (written) documents. The system works in three stages: (1) extracting content features from the document image, (2) classifying the components produced by the first stage, and (3) utilizing the classification results, which covers reconstruction and storage.

The first stage has two main steps, content enhancement and component extraction. For content enhancement, several masks are proposed to restore strokes broken by the removal of the form's frame lines. Two noise reduction methods are also proposed: one for noise left by frame removal, the other for noise introduced by scanning. For component extraction, region growing is combined with several proposed merging and splitting methods so that the content components are extracted from the image in full.

Five classification methods are then proposed to classify components as preprinted, handwritten, or graphic, and to recognize printed colons. Finally, the form is reconstructed and the classified results are stored, which also achieves digitization, layout enhancement, and compression of the written document. Experimental results show that the proposed approaches are feasible and practical.
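The component extraction and area-based classification steps named in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the record does not give its algorithms, so the code below uses a generic 8-connected region growing pass over a binary image, and the rule in `split_by_area` (boxes larger than a threshold are treated as graphic candidates) is a hypothetical stand-in for area thresholding. The function names, the threshold value, and the toy image are all assumptions made for illustration.

```python
from collections import deque


def grow_regions(image):
    """Find 8-connected foreground regions (pixels == 1) by region growing.

    Returns one bounding box per region as (top, left, bottom, right),
    with inclusive coordinates, in raster-scan order of the seed pixels.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Grow a new region outward from this unvisited seed pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def box_area(box):
    """Area of an inclusive bounding box, in pixels."""
    top, left, bottom, right = box
    return (bottom - top + 1) * (right - left + 1)


def split_by_area(boxes, area_threshold):
    """Hypothetical rule: boxes above the threshold are graphic candidates."""
    large = [b for b in boxes if box_area(b) > area_threshold]
    small = [b for b in boxes if box_area(b) <= area_threshold]
    return large, small


# Toy binary image: a 2x2 blob, a single isolated pixel, and a 2x4 blob.
image = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
]
boxes = grow_regions(image)
large, small = split_by_area(boxes, area_threshold=4)
```

On a real scanned form the threshold would have to be chosen relative to the character size, and a single area test would not separate preprinted from handwritten text, which is presumably why the thesis combines it with four further classification methods.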

Contents
Abstract (in Chinese)
Abstract (in English)
Acknowledgments
Contents
List of Figures
Chapter 1 Introduction
1.1 Motivation
1.2 Survey of Related Studies
1.3 Overview of Proposed Approach
1.3.1 Definition of Terminologies
1.3.2 Assumption
1.3.3 Overview of Proposed Approach
1.4 Thesis Organization
Chapter 2 Extraction of Form Content
2.1 Introduction
2.2 Form Frame Noise Reduction and Content Enhancement
2.2.1 Broken Stroke Restoration
2.2.2 Frame Noise Reduction
2.2.3 Non-Frame Noise Reduction
2.2.4 Experimental Result
2.3 Form Component Extraction
2.3.1 Connecting Characters by Brushing
2.3.2 Region Growing
2.3.3 Neighbor-Regions Merging
2.3.4 Noise Reduction by Region Spatial Relation
2.3.5 Region Splitting
2.3.6 Experimental Result
2.4 Discussion
Chapter 3 Classification of Form Component
3.1 Introduction
3.2 Classification of Single Regions
3.2.1 Area Thresholding
3.2.2 Classification of Weighted Features
3.2.3 Classification by Horizontal Neighbor Information
3.2.4 Printed Colon Recognition
3.2.5 Classification by Distance Difference
3.2.6 Experimental Result
3.3 Classification of Multiple Regions
3.3.1 Grouping
3.3.2 Experimental Results
3.4 Discussion
Chapter 4 Utilization of Classified Data
4.1 Introduction
4.2 Manual Correction
4.3 Reconstruction of Form Information
4.3.1 Processing Form Contents
4.3.2 Blank Form Reconstruction
4.3.3 Filled-in Form Reconstruction
4.4 Saving and Retrieval of Form Information
Chapter 5 Experimental Results and Discussions
5.1 Experimental Results
5.2 Discussions
Chapter 6 Conclusions and Suggestions
6.1 Conclusions
6.2 Suggestions for Future Works
References

