National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

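Author: Jia-He Lin (林家禾)
Thesis title: A Study on Effective Approaches of Source Code Analysis to Detect Plagiarism (偵測抄襲之原始碼分析方法研究)
Advisor: Yi-Hung Wu (吳宜鴻)
Degree: Master's
Institution: Chung Yuan Christian University (中原大學)
Department: Graduate Institute of Information and Computer Engineering (資訊工程研究所)
Discipline: Engineering
Field: Electrical and computer engineering
Thesis type: Academic thesis
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: Chinese
Number of pages: 60
Keywords: Plagiarism detection (抄襲偵測); Program similarity (程式碼相似度); Hybrid clone detection (混合式抄襲偵測)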
Detecting code plagiarism has long been an important problem, from source code theft between companies developing large software systems to copied programming assignments among students. Detection methods fall roughly into two categories: textual analysis and structural analysis. Most textual analysis methods use a single algorithm to extract selected substrings from the source code, estimate the similarity between each pair of programs, and judge plagiarism from that similarity. Structural analysis methods mainly record a program's syntactic structure as a tree and assess program similarity by mining the similar parts of two trees. Every algorithm has its own strengths and weaknesses, so judging plagiarism with a single method is not comprehensive enough. This thesis therefore proposes an approach that combines both categories of analysis in order to detect code plagiarism from different aspects. To verify its feasibility, the experiments use source code from real student assignments and evaluate accuracy against a manually confirmed list of actual plagiarism cases. Compared with other methods, the proposed approach performs better on every measure.
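This record does not describe the thesis's actual algorithms, so the following is only a minimal sketch of the general idea stated in the abstract: score each pair of programs with one textual measure and one structural measure, combine the two, and check flagged pairs against a confirmed list. The k-gram Jaccard measure, the control-flow token sequence standing in for a syntax tree, the `weight` parameter, and every function name here are assumptions made for illustration, not the author's method.

```python
import re
from difflib import SequenceMatcher

# Hypothetical set of tokens treated as "structure"; a real tool would use a parser.
STRUCTURE_TOKENS = {"if", "else", "for", "while", "do", "switch", "return", "{", "}"}


def tokens(source: str) -> list[str]:
    """Split source text into identifiers, numbers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def textual_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity over the sets of k-token windows of each program."""
    def kgrams(source: str) -> set:
        toks = tokens(source)
        return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}

    grams_a, grams_b = kgrams(a), kgrams(b)
    union = grams_a | grams_b
    if not union:  # both programs too short to compare
        return 0.0
    return len(grams_a & grams_b) / len(union)


def structural_similarity(a: str, b: str) -> float:
    """Compare only the order of control keywords and braces, ignoring names."""
    seq_a = [t for t in tokens(a) if t in STRUCTURE_TOKENS]
    seq_b = [t for t in tokens(b) if t in STRUCTURE_TOKENS]
    if not seq_a and not seq_b:
        return 0.0
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def hybrid_similarity(a: str, b: str, weight: float = 0.5) -> float:
    """Weighted mix of both scores; `weight` is an illustrative assumption."""
    return weight * textual_similarity(a, b) + (1 - weight) * structural_similarity(a, b)


def precision_recall_f1(flagged: set, confirmed: set) -> tuple[float, float, float]:
    """Score a set of flagged pairs against a manually confirmed set of pairs."""
    true_positives = len(flagged & confirmed)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(confirmed) if confirmed else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    original = "int sum(int n) { int s = 0; for (int i = 0; i < n; i++) { s += i; } return s; }"
    renamed = "int total(int k) { int t = 0; for (int j = 0; j < k; j++) { t += j; } return t; }"
    print("textual:   ", textual_similarity(original, renamed))
    print("structural:", structural_similarity(original, renamed))
    print("hybrid:    ", hybrid_similarity(original, renamed))
```

In the demo pair, renaming every identifier lowers the textual score while the structural score is unaffected, which is the kind of complementary behaviour that motivates combining the two views. How the thesis actually weights or merges its measures is not stated in this record.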
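Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
Section 1 Plagiarism Detection Methods
Section 2 Similarity Computation Methods
Chapter 3 Main Method
Section 1 Generating Data Representations
Section 2 Similarity Computation
Section 3 Determining the Plagiarism List
Chapter 4 Experiments
Section 1 Experimental Setup
Section 2 Effect of System Parameters on the Method
Section 3 Effect of Data Conditions on the Method
Section 4 Group Voting
Section 5 Performance Comparison with Other Methods
Chapter 5 Conclusion and Future Work
References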
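List of Figures
Figure 1. CodeSim output
Figure 2. WinMerge output
Figure 3. Jplag output
Figure 4. Example of the nested structure of a code block
Figure 5. The code block of Figure 4 represented as a sequence of symbol sets
Figure 6. The code block of Figure 4 represented as a table
Figure 7. System flow
Figure 8. CCFinder token transformation rules for C
Figure 9. Original source code
Figure 10. Result after token transformation
Figure 11. CCFinder token matching
Figure 12. Example code
Figure 13. Winnowing fingerprint conversion steps
Figure 14. Jplag matching example
Figure 15. Example code
Figure 16. Illustration of a PDG
Figure 17. Comparison of the detection capabilities of each class of methods
Figure 18. Composition of source code
Figure 19. Preprocessing rules
Figure 20. Symbol transformation rules
Figure 21. Architecture for generating data representations
Figure 22. Example of converting a block to the textual-feature symbol representation
Figure 23. Flow for converting structural features to the symbol representation
Figure 24. Example of splitting code by structural features
Figure 25. Example of converting a block to the structural-feature symbol representation
Figure 26. Summary of block similarity branches
Figure 27. Block linking example
Figure 28. Floating threshold example
Figure 29. Summary of the representation architecture
Figure 30. Summary of similarity computation methods
Figure 31. Flow of a single experiment
Figure 32. Relationships among measurement states
Figure 33. 11-point example
Figure 34. Manual verification flow
Figure 35. Precision for different values of parameter α
Figure 36. Recall for different values of parameter α
Figure 37. F1-measure for different values of parameter α
Figure 38. F1-measure comparison of fixed and floating thresholds
Figure 39. Precision of the nine methods, based on the value of m
Figure 40. Recall of the nine methods, based on the value of m
Figure 41. F1-measure of the nine methods, based on the value of m
Figure 42. Number of detected results for the nine methods, based on the value of m
Figure 43. Voting results for different grouping schemes, based on the value of m
Figure 44. Distribution of assignment lengths
Figure 45. Voting results for different grouping schemes, based on the value of m (Assignment 2)
Figure 46. Precision compared with related work
Figure 47. Recall compared with related work
Figure 48. F1-measure compared with related work
Figure 49. 11-point measure: related work compared with the voting mechanism
Figure 50. 11-point measure: comparison chart of related work and the voting mechanism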
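List of Tables
Table 1. Option code reference
Table 2. Method code reference
Table 3. Best thresholds: fixed threshold versus floating threshold
Table 4. Group voting code reference
Table 5. Comparison of F1-measure values of the voting mechanism and the methods
Table 6. Comparison of F1-measure values of the voting mechanism and the methods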
Electronic full text: available only within the campus systems and IP range of the author's university.