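Graduate student: 吳美真
Graduate student (English): Mei-chen Wu
Thesis title: 以字音為基礎之中文字詞錯誤偵測與修正
Thesis title (English): Error Detection and Correction Based on Chinese Phonemic Alphabet in Chinese Text
Advisor: 黃純敏
Advisor (English): Chuen-min Huang
Degree: Master's
Institution: 國立雲林科技大學 (National Yunlin University of Science and Technology)
Department: Department of Information Management, Master's Program
Discipline: Computer Science
Subdiscipline: General Computer Science
Year of publication: 2008
Academic year of graduation: 94 (ROC calendar)
Language: English
Number of pages: 85
Keywords (Chinese): 字詞錯誤偵測 (word error detection); 字音 (character pronunciation); 語言模型 (language model); 字詞錯誤修正 (word error correction)
Keywords (English): Error correction of Chinese text; Chinese phonemic alphabet; language model; Error detection of Chinese text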
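Abstract (Chinese, translated): Owing to the characteristics of the Chinese language, the spread of the Internet, and the influence of input methods, wrongly written characters caused by similar pronunciation appear more and more widely in online documents, everyday writing, newspapers, and magazines. A mechanism for automatically detecting and correcting such errors would raise the quality of published text and reduce the burden of manual proofreading. Although word error detection and correction has been studied for many years in other languages, there is still considerable room for development in Chinese, and most existing approaches still suffer from low detection and correction rates. This study therefore proposes a pronunciation-based word error detection and correction model composed of unknown word detection and a language model, with the aim of detecting and correcting erroneous characters more accurately. Experimental results show that the model detects most wrongly written characters and supplies the correct candidate words, and that it improves on the detection and correction rates of existing models.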
Misspellings and misused characters caused by similar pronunciation appear more and more frequently in Chinese texts such as online documents, everyday writing, newspapers, and magazines, owing to the nature of Chinese characters, the growth of the Internet, and the use of computer input methods. An automatic error detection and correction mechanism for Chinese text would improve the quality of such texts and relieve the burden of repeated manual proofreading. Although comparable techniques for Western languages are widely used as standard tools in document preparation, research on automatic error detection and correction for Chinese text still faces many unresolved challenges and suffers from low detection and correction rates. In response, this study proposes a model for error detection and correction that is based on the Chinese phonemic alphabet and consists of unknown word detection and a language model. The experimental results show that the model finds most incorrectly written words and automatically suggests the optimal candidate word to users, and that it improves detection and correction rates.
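The general idea in the abstract (propose same-sound replacements, then let a language model pick the most plausible one) can be illustrated with a minimal sketch. This is not the thesis's implementation. The confusion set, the toy corpus, the character-level bigram model, the add-one unigram fallback, and all names (`CONFUSION`, `WittenBellBigram`, `correct`) are assumptions made for illustration, and the unknown word detection step is not modeled at all. Witten-Bell smoothing and perplexity are used only because the thesis's table of contents lists them under its language model section.

```python
import math
from collections import Counter, defaultdict

# Hypothetical confusion set: characters sharing one pronunciation (both "zai4").
CONFUSION = {"在": ["再"], "再": ["在"]}


class WittenBellBigram:
    """Character bigram model with interpolated Witten-Bell smoothing."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.followers = defaultdict(Counter)  # history -> counts of next tokens
        for s in sentences:
            tokens = ["<s>"] + list(s) + ["</s>"]
            self.unigrams.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.followers[prev][cur] += 1
        self.total = sum(self.unigrams.values())
        self.vocab = len(self.unigrams)

    def p_unigram(self, w):
        # Add-one unigram estimate with one extra slot for unseen tokens
        # (a simplification chosen for this sketch).
        return (self.unigrams[w] + 1) / (self.total + self.vocab + 1)

    def p_bigram(self, prev, w):
        seen = self.followers.get(prev)
        if not seen:
            return self.p_unigram(w)  # unseen history: back off entirely
        n = sum(seen.values())  # tokens observed after prev
        t = len(seen)           # distinct types observed after prev
        # Witten-Bell: reserve t / (n + t) of the mass for the lower-order model.
        return (seen[w] + t * self.p_unigram(w)) / (n + t)

    def log_prob(self, sentence):
        tokens = ["<s>"] + list(sentence) + ["</s>"]
        return sum(math.log(self.p_bigram(a, b))
                   for a, b in zip(tokens, tokens[1:]))

    def perplexity(self, sentence):
        # One prediction per character plus the end-of-sentence marker.
        return math.exp(-self.log_prob(sentence) / (len(sentence) + 1))


def correct(sentence, model):
    """Try every single-character homophone swap; keep the best-scoring sentence."""
    best, best_score = sentence, model.log_prob(sentence)
    for i, ch in enumerate(sentence):
        for cand in CONFUSION.get(ch, []):
            trial = sentence[:i] + cand + sentence[i + 1:]
            score = model.log_prob(trial)
            if score > best_score:
                best, best_score = trial, score
    return best


# Toy training corpus (hypothetical).
corpus = ["我在學校", "他在家裡", "我再說一次", "請再來一次"]
model = WittenBellBigram(corpus)
print(correct("我再學校", model))  # prints 我在學校
```

The sketch only considers one substitution per sentence and scores whole sentences, which keeps it short; a realistic system would restrict candidates to suspicious regions first and would need a far larger corpus and confusion set.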
Abstract (Chinese)
Abstract
Acknowledgements
List of Figures
List of Tables
1. Introduction
1.1. Research Background and Motivation
1.2. Research Objective
1.3. Research Scope and Restriction
1.4. Research Contribution
2. Literature Review
2.1. Automatic Detection and Correction in Text
2.2. Error Types in Text
2.3. Properties of Chinese Language
2.4. Unknown Word Detection
2.5. Language Model
2.5.1 Witten-Bell Smoothing
2.5.2 Perplexity
3. System Architecture
3.1. Error Detection
3.1.1 Word Segmentations
3.1.1.1 Unknown Word Extraction
3.1.1.2 Extract Unknown Word Detection Result
3.1.2 Dubious Word Area Formation
3.1.2.1 Sentence Separation, Word Separation, Tags Filter and Dubious Word Location
3.2. Error Correction
3.2.1 Lexical Analysis
3.2.1.1 Extract Dubious Sentence and Word
3.2.1.2 Extract Candidate Word
3.2.1.3 Word Matching
3.2.2 Optimal Word Extraction
3.3. Language Model
3.4. Confusing Word Set
3.5. Lexicon
4. Experiment
4.1. Training Data Set
4.2. Confusing Word Set
4.3. Lexicon
4.4. Evaluation
5. Experiment Results
6. Conclusion and Future Work
7. References
Appendix A