National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

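Graduate student: 葉展坤 (Chan-Kun Yeh)
Thesis title: 第二語言學習之機器學習華語文法檢查之研究 (A Study of Machine Learning-Based Mandarin Grammar Checking for Second Language Learning)
Advisor: 葉瑞峰
Degree: Master's
Institution: National Chiayi University (國立嘉義大學)
Department: Department of Computer Science and Information Engineering (graduate program)
Field: Engineering
Discipline: Electrical and Computer Engineering
Thesis type: Academic thesis
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Chinese keywords (translated): Mandarin grammar checking; deep learning; recurrent neural networks; long short-term memory units; syntactic parsing
English keywords: Mandarin grammar detection; Deep Learning; Recurrent Neural Networks; Long Short-term Memory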
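Mandarin is a difficult language for foreign learners, chiefly because its grammar has many special usages and because there are no fixed rules for combining words and constituents. The errors that foreign learners commonly make while learning Mandarin therefore fall into four major types: missing words (Missing), redundant words (Redundant), word selection errors (Selection), and word order errors (Disorder). The main contributions of this thesis are as follows. First, a clustering method is applied so that sentences are represented by word vectors and part-of-speech vectors. Next, a recurrent neural network combined with long short-term memory units (RNN-LSTM) is trained as a probabilistic language model. Sentence structure is then analyzed with parse trees, and the usages in which foreign learners tend to err under the influence of their native language serve as the basic rules for training. An error model is built for each error type, and the sentence probability determines which model a sentence is closest to. In the experiments, unlabeled sentences are first word-segmented and split into sentences. The results show that word clustering reduces model perplexity: by the percentage improvement formula, perplexity improves by 20% to 30% on average after clustering. Evaluation in the testing phase then shows that the hybrid model combining words and parts of speech performs best. Compared with the word-only model, the improvement in recall in particular reaches 54%, and when the error pattern model is added, the improvement in recall rises by a further 6%. These figures show that the contribution of this thesis lies in the number of errors it finds; compared with other methods, recall is about 42% higher.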
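The perplexity and percentage improvement figures in the abstract can be illustrated with a minimal sketch. The abstract does not state the formulas, so the standard definition of perplexity and a relative-reduction form of percentage improvement are assumed here. The function names and sample numbers are hypothetical and are not taken from the thesis.

```python
import math


def perplexity(token_probs):
    """Standard perplexity: exp of the mean negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_neg_log)


def percentage_improvement(baseline, improved):
    """Assumed form: relative reduction from the baseline, in percent.

    This suits lower-is-better metrics such as perplexity.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - improved) / baseline * 100.0


# Hypothetical numbers for illustration only (not results from the thesis).
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])
gain = percentage_improvement(200.0, 150.0)
```

A model that assigns every token a probability of 0.25 behaves like a uniform choice among four words, so its perplexity is 4. For a higher-is-better metric such as recall, the sign of the numerator would be reversed.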
Mandarin is not a simple language for foreigners. Even native speakers must spend considerable time learning it as children. Two issues cause the learning difficulty. First, its characters evolved from pictographs, so a single character can express a meaning on its own yet take on a different meaning once it is combined into a word. Second, Mandarin grammar has flexible rules and special usages. The common grammatical errors can therefore be classified as missing, redundant, selection, and disorder errors.
In this thesis, we propose a recurrent neural network architecture with long short-term memory (RNN-LSTM) that detects the error types in foreign learners' writing. The features are based on word vectors and part-of-speech vectors. In our experiments, the proposed method achieves higher recall than the other methods compared.
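As a rough illustration of the recall metric reported above, the sketch below computes recall separately for each of the four error types. It assumes one gold label and one predicted label per sentence; that evaluation setup, the `recall_per_type` helper, the "Correct" label, and the sample labels are all hypothetical rather than the thesis's own procedure.

```python
from collections import Counter

ERROR_TYPES = ("Missing", "Redundant", "Selection", "Disorder")


def recall_per_type(gold, predicted):
    """Recall per label: true positives / (true positives + false negatives)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    support = Counter(gold)  # how often each label truly occurs
    hits = Counter(g for g, p in zip(gold, predicted) if g == p)
    # Labels that never occur in the gold data are skipped (recall undefined).
    return {
        label: hits[label] / support[label]
        for label in ERROR_TYPES
        if support[label] > 0
    }


# Hypothetical labels for six sentences (not data from the thesis).
gold = ["Missing", "Missing", "Redundant", "Selection", "Disorder", "Disorder"]
pred = ["Missing", "Correct", "Redundant", "Disorder", "Disorder", "Disorder"]
scores = recall_per_type(gold, pred)
```

Here one of the two Missing errors is found, so recall for that type is 0.5, while the misclassified Selection error gives a recall of 0.0 for its type.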
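Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Problem Description
1.3 Organization of Chapters
Chapter 2 Related Work
2.1 Natural Language Processing
2.2 Deep Learning
2.2.1 Artificial Neural Networks
2.2.2 Background Knowledge
2.2.3 Related Applications
Chapter 3 System Architecture
3.1 System Architecture
3.2 Training Phase
3.3 Testing Phase
Chapter 4 Methods
4.1 Word Clustering
4.2 Machine Learning
4.2.1 Recurrent Neural Networks
4.2.2 Long Short-Term Memory Units
4.2.3 Parse Trees
4.3 Error Patterns
Chapter 5 Experimental Results and Analysis
5.1 Experimental Data and Compared Methods
5.1.1 Experimental Data
5.1.2 Compared Methods
5.2 Evaluation Criteria
5.2.1 Perplexity
5.2.2 Confusion Matrix
5.3 Analysis of Experimental Results
Chapter 6 Conclusions and Future Directions
6.1 Conclusions
6.2 Future Research Directions
References
Appendix 1 Academia Sinica Balanced Corpus Part-of-Speech Tag Set
Appendix 2 Table of Common Measure Words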