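Author: Shu-Ya Yang (楊淑雅)
Thesis title: Form Field and Filled-in Data Extraction from Printed Documents (表單文件手寫資料欄位擷取之研究)
Advisor: Greg C. Lee (李忠謀)
Degree: Master's
Institution: National Taiwan Normal University (國立臺灣師範大學)
Department: Department of Information Education (資訊教育學系)
Discipline: Education
Field: Subject-specific education
Thesis type: Academic thesis
Year of publication: 2007
Graduation academic year: 95 (ROC calendar)
Language: Chinese
Pages: 96
Keywords (translated from Chinese): form document recognition; handwritten form field extraction; handwritten data extraction; broken-character repair; run-based algorithm
Keywords (English): Form document analysis and recognition; Form field extraction; Filled-in data extraction; Broken stroke reconstruction; Run-based Algorithm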
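This study addresses automated form document processing and proposes solutions to three problems: classifying the handwritten (fill-in) fields of a form, extracting those fields, and extracting the handwritten data. In the field extraction stage, the size and aspect ratio of the objects in a form, their global structural characteristics, and their directional structural features serve as classification features. To make these structural features easier to obtain, the blank form image is encoded and converted into a simplified structure map. To distinguish label fields from fill-in fields that also contain descriptive text, the horizontal and vertical pixel projections of each field region are analyzed together with the distribution, size, and spacing of the descriptive text.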
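The projection analysis mentioned above can be illustrated with a minimal sketch. It assumes a binarized field region stored as a list of rows (1 = ink, 0 = background). The helper names `projections`, `ink_segments`, and `gaps` are hypothetical and are not taken from the thesis, and the thresholds a real classifier would apply to these features are not specified here.

```python
def projections(region):
    """Return (row_sums, col_sums): ink-pixel counts per row and per column."""
    row_sums = [sum(row) for row in region]
    col_sums = [sum(col) for col in zip(*region)]
    return row_sums, col_sums


def ink_segments(profile):
    """Return (start, end) pairs of consecutive non-zero entries; end is exclusive."""
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


def gaps(segments):
    """Return the widths of the blank gaps between neighbouring segments."""
    return [b[0] - a[1] for a, b in zip(segments, segments[1:])]


# Toy field region: two small text blobs on the left, blank space on the right.
region = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
row_profile, col_profile = projections(region)
text_columns = ink_segments(col_profile)   # horizontal extent of each blob
text_rows = ink_segments(row_profile)      # vertical extent of the text line
spacing = gaps(text_columns)               # spacing between blobs
```

In this toy region the column profile separates the two blobs and shows that the right half of the field is empty, which is the kind of evidence (text extent, size, and spacing) that could suggest a fill-in field containing a short label rather than a field fully occupied by descriptive text.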
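In the handwritten data extraction stage, the filled-in form image is matched against known blank form samples, and the handwritten data are extracted from the filled-in form using the field information of the blank form of the same class. Removing the frame lines breaks the handwritten strokes that cross them. To repair these broken strokes, a method is proposed that identifies the segments where strokes cross the frame lines and reconstructs the strokes within those segments.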
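As a rough illustration of repairing strokes broken by frame-line removal, the sketch below fills the rows left empty by a removed horizontal line by linearly interpolating between stroke runs just above and just below the gap. It is an assumption-laden simplification and not the thesis's actual procedure: the pairing rule, the `max_shift` tolerance, and the function names `runs` and `repair_gap` are all hypothetical.

```python
def runs(row):
    """Return (start, end) pairs, end inclusive, for each run of ink pixels."""
    out = []
    start = None
    for x, value in enumerate(row):
        if value and start is None:
            start = x
        elif not value and start is not None:
            out.append((start, x - 1))
            start = None
    if start is not None:
        out.append((start, len(row) - 1))
    return out


def repair_gap(img, top, bottom, max_shift=2):
    """Fill rows top..bottom (inclusive), left blank by a removed horizontal
    frame line, by interpolating between runs in the rows bordering the gap.
    Returns a repaired copy; the input image is not modified."""
    fixed = [row[:] for row in img]
    if top == 0 or bottom == len(img) - 1:
        return fixed  # no row on one side of the gap, nothing to interpolate
    above = runs(img[top - 1])
    below = runs(img[bottom + 1])
    steps = bottom - top + 2  # row distance from the row above to the row below
    for a0, a1 in above:
        for b0, b1 in below:
            # Assumed pairing rule: the two runs overlap horizontally,
            # allowing up to max_shift pixels of slack for slanted strokes.
            if b0 <= a1 + max_shift and a0 <= b1 + max_shift:
                for k, y in enumerate(range(top, bottom + 1), start=1):
                    t = k / steps
                    x0 = round(a0 + t * (b0 - a0))
                    x1 = round(a1 + t * (b1 - a1))
                    for x in range(x0, x1 + 1):
                        fixed[y][x] = 1
    return fixed


# A vertical stroke at column 2, cut by a removed line occupying rows 2 and 3.
broken = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
repaired = repair_gap(broken, top=2, bottom=3)

# A slanted stroke cut by a one-row gap.
slanted = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
repaired_slanted = repair_gap(slanted, top=1, bottom=1)
```

A practical system would also need to decide which runs genuinely belong to the same stroke, for example when several strokes cross the line close together; this sketch simply connects every pair of runs that passes the overlap test.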
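The test images fall into two categories: forms with a simple layout and composite forms with a complex layout. The experimental results show that the proposed method performs well on both types of form image.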
Form document analysis is one of the most essential tasks in document analysis and recognition. Form field extraction and filled-in data extraction are two important parts of form document analysis. For form field extraction, the first major task is to classify the preprinted text, lines, check boxes, text boxes, and tables of a form. This thesis proposes a method based on direction-invariant global structural features and direction-dependent structural features to classify the form fields and then extract the fill-in spaces of a form document. Since tables can contain both name fields and data fields, the second task uses a method based on horizontal and vertical color histogram distribution features to segment the fields and extract the data fields. For filled-in data extraction, we propose a method based on a run-based algorithm and interpolation to detect the character strokes overlapped by the printed form frame and to reconstruct the broken strokes after the frame lines are removed. Experiments on different types of form documents showed a 99% recognition rate for form field extraction and a 91% success rate for filled-in data extraction.
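To give a sense of what a run-based representation of a binary image looks like, the following sketch encodes each row as runs of foreground pixels and picks out long horizontal runs as candidate frame-line segments. This is a generic run-length encoding written for illustration, not the specific run-based algorithm used in the thesis; the names `encode_rows` and `long_runs` and the `min_len` threshold are assumptions.

```python
def encode_rows(img):
    """Encode each row as a list of (start, length) runs of foreground pixels."""
    encoded = []
    for row in img:
        row_runs = []
        start = None
        for x, value in enumerate(row):
            if value and start is None:
                start = x
            elif not value and start is not None:
                row_runs.append((start, x - start))
                start = None
        if start is not None:
            row_runs.append((start, len(row) - start))
        encoded.append(row_runs)
    return encoded


def long_runs(encoded, min_len):
    """Return (row, start, length) for every run at least min_len pixels long."""
    return [
        (y, start, length)
        for y, row_runs in enumerate(encoded)
        for start, length in row_runs
        if length >= min_len
    ]


# Toy image: a full-width line in row 0, short stroke fragments in row 1.
image = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
encoded = encode_rows(image)
line_candidates = long_runs(encoded, min_len=5)
```

Working on runs instead of individual pixels keeps the data compact, and run length alone already separates long frame-line segments from the short runs produced by text and handwriting.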
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Scope and Limitations
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Form Document Processing
2.1.1 Structure Segmentation
2.1.2 Structure Representation
2.1.3 Block Object Identification
2.1.4 Similarity Matching
2.2 Handwritten Data Processing
2.2.1 Frame Line Removal and Broken-Character Repair
2.2.2 Optical Character Recognition
Chapter 3 System Overview
Chapter 4 Methods and Procedures
4.1 Classification and Structural Analysis of Form Objects
4.1.1 Form Object Classification
4.1.2 Object Structure Analysis
4.2 Dashed-Line Reassembly
4.3 Underline Removal
4.4 Distinguishing Fill-in Fields from Label Fields
4.4.1 Extraction of Table Fields
4.4.2 Extraction of Table and Box Fill-in Fields
4.5 Frame Line Removal and Broken-Character Repair
Chapter 5 Experimental Results and Discussion
5.1 Sources of Experimental Data
5.2 Experimental Validation
5.2.1 Experiment 1: Questionnaire-Style Forms with Simple Structure
5.2.2 Experiment 2: Composite Forms with Complex Structure
5.3 Summary
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References
Appendix: Sample Form Documents Used in the Experiments