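Graduate student: 蘇嘉穎
Graduate student (English name): Ka-Weng Sou
Thesis title: 利用語法結構與語意相似度建立改寫句子抄襲偵測方法
Thesis title (English): Developing a Plagiarism Detecting Method of Paraphrasing Sentences by Syntactical Structure and Semantic Similarity
Advisor: 王惠嘉
Advisor (English name): Hei-Chia Wang
Degree: Master's
Institution: National Cheng Kung University
Department: Institute of Information Management
Discipline: Computer Science
Sub-discipline: General Computer Science
Thesis type: Academic thesis
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: Chinese
Number of pages: 67
Keywords (translated from Chinese): paraphrasing; plagiarism detection; syntactic structure analysis; semantic similarity
Keywords (English): paraphrase; plagiarism detection; syntactical structure; semantic similarity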
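As higher education in Taiwan globalizes, the development and contribution of academic research have become an important indicator for assessing national economic development. To raise the global competitiveness of Taiwan's universities, the Ministry of Education has in recent years placed greater emphasis on the degree of internationalization of universities and colleges, and English has become the main medium through which they publish research results.
However, for scholars who use English as a foreign language, writing up research entirely in English is not easy, and their training in English writing is often insufficient for them to do so on their own. Meanwhile, advances in the Internet, combined with powerful search engines, have made information ever easier to obtain, which directly or indirectly leads scholars to copy the ideas of others, intentionally or unintentionally.
Many scholars misunderstand what constitutes plagiarism: they believe that slightly modifying the content means it is no longer plagiarism, and do not realize that they have already committed it. Yet the plagiarism detection software currently on the market mostly compares a document against thesis databases or Internet resources. Such software detects only simple types of plagiarism and compares on the basis of single words alone, so it can neither detect paraphrased sentences nor teach scholars anti-plagiarism techniques such as how to paraphrase a sentence appropriately. There is therefore a need to actively help or teach scholars to check whether the documents they write contain plagiarism.
In view of this, this study proposes a detection method that guides sentence paraphrasing so that writers do not commit plagiarism when they paraphrase. The method uses syntactic structure analysis to extract all phrases in a sentence and takes the phrase as the unit of comparison, improving on the accuracy of comparison based on single words alone. It also considers synonym substitution and changes in word order, computing both the semantic similarity and the order similarity of sentences. By comparing sentences in the source document with those in the user's revised document, it identifies paraphrased documents that may contain plagiarism and analyzes in detail the paraphrasing techniques used, with the aim of using the analysis results to advise scholars on how to avoid plagiarism.
The experimental results show that the proposed method is feasible. The experimental data confirm that phrase-based comparison improves on the accuracy of the traditional single-word approach, and that using semantic similarity also has a positive effect on the results. In addition, combining PATH and WUP to compute semantic similarity performs better than using PATH or WUP alone.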
Along with the globalization of higher education in Taiwan, the development and contribution of academic research have become one of the indicators of national economic development. To enhance the competitiveness of universities in Taiwan, the Ministry of Education has paid more attention to their degree of internationalization in recent years. As a result, English has become the main medium in which universities publish academic research.
However, it is not easy for Taiwanese researchers, who are learners of English as a foreign language, to write in English. Their training in English writing is often insufficient for them to write up research in English by themselves. With the advance of the Internet, it has become easier to obtain information using powerful search engines on the web, which leads researchers to copy the ideas of others, intentionally or unintentionally.
In fact, many researchers misunderstand plagiarism. They think that slightly modifying a document means it is no longer plagiarism, so they do not recognize that they have committed it. Nevertheless, the main function of currently available plagiarism detection software is to check possibly plagiarized papers against essay databases or Internet-based search engines. These existing systems can detect only simple kinds of plagiarism, based on single terms alone. As a result, paraphrasing cannot be detected, and the systems cannot guide researchers to paraphrase properly. Hence, there is a need for an environment that helps and teaches researchers to detect whether their documents are plagiarized.
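To illustrate why comparison on single terms misses paraphrasing, here is a minimal sketch of an exact word-overlap (Jaccard) check. It is not the software or method described in this record; the function name `jaccard` and the example sentences are hypothetical and chosen only for illustration.

```python
def jaccard(a: str, b: str) -> float:
    """Exact word-overlap (Jaccard) similarity between two sentences."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty sentences are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


# Hypothetical example sentences (not taken from the thesis data).
source = "the method detects copied text quickly"
verbatim = "the method detects copied text quickly"
paraphrase = "the approach finds duplicated passages rapidly"

print(jaccard(source, verbatim))    # verbatim copy: full overlap
print(jaccard(source, paraphrase))  # synonym-substituted copy: only "the" overlaps
```

A verbatim copy scores 1.0, while the synonym-substituted version shares only one word with the source and so scores close to zero, even though it says the same thing.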
In this study, we propose a new plagiarism detection method that guides the paraphrasing of original sentences so as to avoid plagiarism. The proposed method uses syntactic structure to extract all phrases from the sentences and compares them phrase by phrase, improving on the accuracy of single-term comparison. In addition, it considers synonym substitution and the reordering of terms, computing both the semantic similarity and the order similarity of the sentences. By comparing the original document with the user-modified document, it finds paraphrased documents suspected of plagiarism. Finally, we aim to advise users on how to avoid plagiarism.
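The abstract does not give the formula for order similarity. As a sketch only, the code below uses one well-known formulation from Li et al. (2006), S_r = 1 - ||r1 - r2|| / ||r1 + r2||, where r1 and r2 record the position of each word of the joint word set in the two sentences. Whether the thesis uses this exact definition is an assumption; the sketch also simplifies by matching words exactly rather than by semantic similarity, and the function names are hypothetical.

```python
import math


def order_vector(tokens, joint_words):
    """1-based position of each joint word in tokens, or 0 if it is absent."""
    first_position = {}
    for position, token in enumerate(tokens, start=1):
        first_position.setdefault(token, position)
    return [first_position.get(word, 0) for word in joint_words]


def order_similarity(sentence1: str, sentence2: str) -> float:
    """Word-order similarity: 1 - ||r1 - r2|| / ||r1 + r2|| (assumed formulation)."""
    tokens1 = sentence1.lower().split()
    tokens2 = sentence2.lower().split()
    # Joint word set, keeping first-seen order and dropping duplicates.
    joint_words = list(dict.fromkeys(tokens1 + tokens2))
    r1 = order_vector(tokens1, joint_words)
    r2 = order_vector(tokens2, joint_words)
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 if total == 0 else 1.0 - diff / total


print(order_similarity("dog bites man", "dog bites man"))  # same order
print(order_similarity("dog bites man", "man bites dog"))  # same words, reordered
```

Two sentences with identical word order score 1.0, and reordering the same words lowers the score, which is the signal that a pure bag-of-words comparison cannot provide.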
The evaluation shows that the proposed method improves on the accuracy of the traditional single-term approach. Semantic similarity also has a positive effect on the results. Moreover, using PATH and WUP together to calculate semantic similarity performs better than using PATH or WUP alone.
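PATH and WUP are taxonomy-based word similarity measures: PATH is the inverse of the shortest path length between two concepts, and WUP (Wu and Palmer) is based on the depth of their deepest common ancestor. The sketch below computes both on a tiny hand-made is-a tree and combines them with a weighted average. The taxonomy is toy data, not WordNet, and the abstract does not say how the two measures are combined, so `combined_sim` and its `weight` parameter are purely hypothetical.

```python
# Toy is-a taxonomy (child -> parent). Hypothetical data, not WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def ancestors(node):
    """Chain from node up to the root, starting with node itself."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(node))


def lowest_common_ancestor(a, b):
    """Deepest node that is an ancestor of both a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("nodes do not share a root")


def path_sim(a, b):
    """PATH: 1 / (1 + number of edges on the shortest path)."""
    common = lowest_common_ancestor(a, b)
    edges = (depth(a) - depth(common)) + (depth(b) - depth(common))
    return 1.0 / (1 + edges)


def wup_sim(a, b):
    """WUP: 2 * depth(common ancestor) / (depth(a) + depth(b))."""
    common = lowest_common_ancestor(a, b)
    return 2.0 * depth(common) / (depth(a) + depth(b))


def combined_sim(a, b, weight=0.5):
    """Assumed combination: weighted average of PATH and WUP."""
    return weight * path_sim(a, b) + (1 - weight) * wup_sim(a, b)


for pair in [("dog", "cat"), ("dog", "car")]:
    print(pair, path_sim(*pair), wup_sim(*pair), combined_sim(*pair))
```

On this toy tree, "dog" and "cat" share the parent "animal" and score higher on both measures than "dog" and "car", which only meet at the root. With a real lexical database the same two formulas would be evaluated over WordNet's hypernym hierarchy instead of `PARENT`.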
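Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Research Scope and Limitations
1.4 Research Process
1.5 Thesis Outline
Chapter 2 Literature Review
2.1 Patchwriting
2.2 Natural Language Processing
2.2.1 Part-of-Speech Tagging
2.2.2 Syntactic Parsing
2.2.3 Stemming
2.3 Semantic Similarity
2.3.1 English Semantic Similarity Tool: WordNet
2.3.2 Word-Level Semantic Similarity Computation
2.4 Plagiarism Detection Methods
2.5 Existing Plagiarism Detection Systems
2.6 Summary
Chapter 3 Research Method
3.1 Research Framework
3.2 Syntactic Processing Stage
3.2.1 Sentence Segmentation
3.2.2 Verbatim Plagiarism Detection
3.2.3 Syntactic Parsing
3.2.3.1 Part-of-Speech Tagging
3.2.3.2 Phrase Extraction
3.2.4 Stemming
3.2.5 Worked Example
3.3 Phrase Identification Stage
3.3.1 Phrase Identification Steps
3.3.2 Worked Example
3.4 Semantic Comparison Stage
3.4.1 Semantic Similarity of Words and Phrases
3.4.1.1 Word-to-Word Semantic Similarity
3.4.1.2 Word-to-Phrase Semantic Similarity
3.4.1.3 Phrase-to-Phrase Semantic Similarity
3.4.2 Sentence Similarity Computation
3.4.3 Worked Example
3.5 Paraphrasing Technique Analysis Stage
3.5.1 Paraphrasing Techniques
3.5.2 Worked Example
3.6 Summary
Chapter 4 System Implementation and Validation
4.1 System Implementation
4.1.1 System Implementation Environment
4.1.2 System Processing Flow
4.2 Experimental Method
4.2.1 Data Sources
4.2.2 Definition of Plagiarism
4.2.3 Evaluation Metrics
4.2.4 Experimental Design
4.3 Experimental Results and Analysis
4.3.1 Experiment 1: Choosing the Threshold τ
4.3.2 Experiment 2: Effectiveness of Using Phrases as the Unit of Comparison
4.3.3 Experiment 3: Effect of Semantic Information on the Results
4.3.4 Experiment 4: Effect of Different Semantic Similarity Measures on the Results
4.3.5 Experiment 5: Accuracy of the Paraphrasing Technique Analysis
Chapter 5 Conclusions and Future Research Directions
5.1 Research Results
5.2 Future Research Directions
References
Appendix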
Electronic full text (public release date on the Internet: 2023-01-01)