National Digital Library of Theses and Dissertations in Taiwan

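Author: Fu-Chu Huang (黃福助)
Thesis title: A Source Code Plagiarism Detection System Using Multiple Similarity Algorithms (利用多個相似度演算法實作程式碼抄襲系統)
Advisor: 郭忠義
Oral defense committee: 劉建宏, 鄭永斌, 李允中, 薛念林, 鄭有進
Oral defense date: 2013-07-08
Degree: Doctoral
Institution: National Taipei University of Technology
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar, 2012-2013)
Language: English
Number of pages: 65
Keywords: Plagiarism, Plagiarism Detection, Similarity (抄襲, 抄襲偵測, 相似度)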
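Plagiarism of assignments has long been a serious problem in education. The aim of this study is to provide a system that can objectively detect plagiarism in students' programming assignments; the system handles procedural languages, object-oriented languages, and plain text. Most existing research on source code plagiarism relies on a single method to find plagiarism, yet every single method has weaknesses that can affect the accuracy and objectivity of detection. This study therefore computes similarity with three different methods so that plagiarized assignments can be identified objectively: a text analysis method, a structure analysis method, and an attribute analysis method. In the text analysis method, a text-processing pipeline is built and the winnowing algorithm is applied to compute similarity. In the structure analysis method, an enhanced F(p) algorithm converts the class structure into text, which is then compared using the winnowing algorithm. In the attribute analysis method, a variable analysis method is proposed to compare the similarity of the variables of two programs, and a statistical method is used to compare class similarity. To demonstrate the effectiveness of the proposed methods, test files were created with ten commonly used plagiarism techniques and run through the developed system, JPlag, and WCopyfind; the proposed system found the plagiarism more effectively. In addition, the system was evaluated with three information retrieval measures, Precision, Recall, and F-Measure, and the results show that the proposed method detects plagiarism more effectively than JPlag.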

Plagiarism is a serious issue in education. This study proposes a system for objectively detecting plagiarism in students' programming assignments. Most previous studies used a single method to identify plagiarized programs. However, every single method has drawbacks that might undermine accuracy and objectivity. This research proposes three methods, namely text-based, structure-based, and attribute-based methods, to compute similarity so that plagiarism can be detected fairly. In the text-based method, a processing flow is built and the winnowing algorithm is employed. In the structure-based method, a proposed algorithm translates the class structure into text, and the winnowing algorithm is employed to compare the translated text. In the attribute-based method, a variable analysis method is proposed to measure variable similarity, and a statistical method is used to measure class similarity. To demonstrate the effectiveness of the proposed approach, ten benchmark files built from commonly used plagiarism tricks were fed to the proposed system, JPlag, and WCopyfind. The results show that the proposed system is more effective at finding the plagiarism. In addition, three information retrieval measures, Precision, Recall, and F-Measure, were used to evaluate the system; by these measures the proposed system is more effective than JPlag in plagiarism detection.
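
As an illustration of the winnowing step named in the abstract, the following is a minimal Python sketch of winnowing-style fingerprinting as described by Schleimer, Wilkerson and Aiken. It is not the thesis's implementation: the normalization (lowercasing and removing whitespace), the k-gram length `k`, the window size `w`, the truncated MD5 hash, and the use of Jaccard overlap as the similarity score are all assumptions made for this example, and the function names are hypothetical.

```python
import hashlib


def normalize(text):
    """Assumed noise removal: lowercase and drop all whitespace."""
    return "".join(text.lower().split())


def kgram_hashes(text, k):
    """Hash every k-character substring (truncated MD5, chosen for determinism)."""
    return [
        int(hashlib.md5(text[i:i + k].encode("utf-8")).hexdigest()[:8], 16)
        for i in range(len(text) - k + 1)
    ]


def winnow(hashes, w):
    """Select (hash, position) fingerprints from a list of hashes.

    In each window of w consecutive hashes, keep the minimum hash;
    ties are broken by taking the rightmost occurrence.
    """
    fingerprints = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        min_hash = min(window)
        # Offset of the rightmost occurrence of the minimum in this window.
        offset = w - 1 - window[::-1].index(min_hash)
        fingerprints.add((min_hash, start + offset))
    return fingerprints


def fingerprint(text, k=5, w=4):
    """Return the set of selected hash values for a document."""
    hashes = kgram_hashes(normalize(text), k)
    if len(hashes) < w:
        # Too short for a full window: keep every hash (assumption).
        return set(hashes)
    return {h for h, _ in winnow(hashes, w)}


def similarity(a, b, k=5, w=4):
    """Jaccard overlap of the two fingerprint sets (an assumed score)."""
    fa, fb = fingerprint(a, k, w), fingerprint(b, k, w)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

For example, `similarity("int a = b + c;", "INT a=b+c ;")` compares two inputs that become identical after the assumed normalization, so their fingerprint sets coincide. Real detectors typically tokenize source code before hashing, which this sketch does not attempt.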

TABLE OF CONTENTS

ABSTRACT (IN CHINESE)
ABSTRACT
ACKNOWLEDGMENTS
LIST OF FIGURES
CHAPTER I INTRODUCTION
1.1 Motivation
1.2 Purpose of Research
1.3 Overview of the paper
CHAPTER II RELATED WORK
2.1 Text-based analysis
2.2 Structure-based Analysis
2.3 Attribute-based Analysis
2.4 Precision, Recall and F-measure
2.5 Comparisons of Java, C and C++
CHAPTER III PLAGIARISM DETECTION METHODS
3.1 Noise-removing and Code-formatting
3.2 Text-based Method
3.3 Structure-based Method
3.4 Attribute-based Method
3.4.1 Variable analysis method
3.4.2 Class analysis method
3.4.3 Total attribute-based similarity
3.5 Computation Method of Total Similarity
CHAPTER IV SYSTEM DESIGN AND IMPLEMENTATION
4.1 System User Interface, Process and Architecture
4.2 Supported Types of Program
4.2.1 Single-Class Single-Method Comparison
4.2.2 Single-Class Multiple-Method Comparison
4.2.3 Multiple-Class Multiple-Method Comparison
4.2.4 Multiple-Files Comparison
4.3 Evaluation of Various Plagiarisms
4.3.1 Changing comments
4.3.2 Adding or deleting spaces
4.3.3 Inserting variables
4.3.4 Changing function types
4.3.5 Reordering statements
4.3.6 Inserting redundant functions
4.3.7 Modifying control structures
4.3.8 Reordering blocks
4.3.9 Modifying identifiers
4.3.10 Changing data types
4.4 Performance Evaluation
4.4.1 Test C Program
4.4.2 Test C++ Program
4.5 Random Insertions
4.6 Comparing System Features and Single method
4.6.1 Feature Comparison with Other Systems
CHAPTER V CONCLUSION AND FUTURE WORK
REFERENCES

[1] Z. Durić and D. Gašević, “A Source Code Similarity System for Plagiarism Detection,” The Computer Journal, 2012.
[2] L. Prechelt, G. Malpohl and M. Philippsen, “Finding plagiarisms among a set of programs with JPlag,” Journal of Universal Computer Science, 8(11): 1016-1038, 2002.
[3] A. Ahtiainen, S. Surakka and M. Rahikainen, “Plaggie: Gnu-licensed source code plagiarism detection engine for java exercises,” In Baltic Sea ’06: Proceedings of the 6th Baltic Sea conference on Computing education research, pp. 141–142, 2006.
[4] E. Flores, A. Barron-Cedeno, P. Rosso and L. Moreno, “Towards the Detection of Cross-Language Source Code Reuse,” Natural language processing and information systems lecture notes in computer science, 6716: 250-253, 2011.
[5] T.W.S. Chow and M.K.M. Rahman, “Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection,” IEEE Transactions on Neural Networks, 20: 1385-1402, 2009.
[6] J. Hage, “Programmeer plagiaatdetectie met marble,” Technical Report UU-CS-2006-062, Department of Information and Computing Sciences, Utrecht University, 2006.
[7] G. Canfora, A. Cimitile, U. De Carlini and A. De Lucia, “An Extensible System for Source Code Analysis,” IEEE Transactions on Software Engineering, 24(9): 721-740, 1998.
[8] X. Chen, B. Francia and M. Li, “Shared Information and Program Plagiarism Detection,” IEEE Transactions on Information Theory, vol. 50, pp. 1545-1551, 2004.
[9] H. Jiang and Z. Jiang, “The Study of Plagiarism Detection for Program Code,” Advances in Computer Science and Education Applications, Communications in Computer and Information Science, 202: 128-133, 2011.
[10] Z.A. Al-Khanjari, J. A. Fiaidhi, R. A. Al-Hinai and N. S. Kutt, “PlagDetect: a Java programming plagiarism detection tool,” ACM Inroads, vol. 1, pp.66-71, 2010.
[11] D. Grune and M. Huntjens, “Sim,” Available from: http://www.cs.vu.nl/~dick/sim.html.
[12] A. Aiken et al., “Moss,” Available from: http://theory.stanford.edu/~aiken/moss/.
[13] S. Schleimer, D. S. Wilkerson and A. Aiken, “Winnowing: Local Algorithms for Document Fingerprinting,” Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, June 2003.
[14] J. L. Donaldson, A. Lancaster and P. H. Sposato, “A Plagiarism Detection System,” Proceedings of the twelfth SIGCSE technical symposium on Computer science education, 1981.
[15] M. H. Halstead, “Elements of Software Science (Operating and programming systems series),” Elsevier Science Inc, New York, USA, 1977.
[16] Longman Dictionary of Contemporary English. Longman, Harlow, Essex, new edition, 1987.
[17] Dictionary.com, Available from: http://dictionary.reference.com/.
[18] J. A. McCart and J. Jarman, “A Technological Tool to Detect Plagiarized Projects in Microsoft Access,” IEEE Transactions on Education, 51(2): 166-174, 2008.
[19] J. Y. Kuo and F. C. Huang, “Code analyzer for an online course management system,” Journal of Systems and Software, 83(12): 2478-2486, 2010.
[20] D. Gitchell and N. Tran, “A utility for detecting similarity in computer programs,” In Proceedings of the 30th ACM Special Interest Group on Computer Science Education Technology Symposium, New Orleans, LA, pp. 266-270, 1998.
[21] M. J. Wise, “String Similarity via Greedy String Tiling and Running Karp-Rabin Matching,” Online Preprint, Dec. 1993. Available from: http://vernix.org/marcel/share/RKR_GST.ps.
[22] J. Hage, P. Rademaker and N. van Vugt, “Plagiarism detection for Java: a tool comparison,” In: Computer Science Education Research Conference, pp. 33–46, 2011.
[23] C. Daly and J. Horgan, “Patterns of plagiarism,” In W. Dann, T. L. Naps, P. T. Tymann and D. Baldwin, editors, Proc. of the 36th SIGCSE Technical Symposium on Computer Science Education (SIGCSE 2005), pp. 383-387, ACM, 2005.
[24] M. Li and P. Vitányi, “An introduction to Kolmogorov complexity and its applications,” 2nd Ed., Springer, New York, 1997.
[25] J. Ziv and A. Lempel, “A universal algorithm for sequential data compression,” IEEE Trans. Inform. Theory, vol. IT-23, pp. 337-343, 1977.
[26] A. Cimitile and U. De Carlini, “Reverse Engineering. Algorithms for Program Graph Production,” Software—Practice and Experience, vol. 21, no. 5, pp. 519–537, 1991.
[27] M. J. Wise, “YAP3: Improved detection of similarities in computer program and other texts,” in Proceedings of the 27th SIGCSE Technical Symposium, Philadelphia, PA, pp. 130-134, 1996.
[28] H. Ding and M. Samadzadeh, “Extraction of Java program fingerprints for software authorship identification,” The Journal of Systems and Software, 72(1):49-57, 2004.
[29] K. Ottenstein, “An algorithmic approach to the detection and prevention of plagiarism,” SIGCSE Bull., vol. 8, no. 4, pp. 30-41, 1977.
[30] G. Whale, “Identification of program similarity in large populations,” The Computer Journal, vol. 33, no. 2, pp. 140-146, 1990.
[31] W. Wong and S. Gokhale, “Static and dynamic distance metrics for feature-based code analysis,” The Journal of Systems and Software, vol. 74, pp. 283-295, 2005.
[32] P. Brusilovsky, “Adaptive hypermedia,” User Modeling and User-Adapted Interaction, Ten Year Anniversary Issue (A. Kobsa, ed.), 11(1/2): 87-110, 2001.
[33] J. Y. Kuo and L. Chu, “Intelligent Code Analyzer for Online Course Management System,” Proceedings of the 3rd ACIS International Conference on Software Engineering Research, Management & Applications. Michigan, U.S.A, 2005.
[34] G. Cosma, “An approach to source-code plagiarism detection and investigation using latent semantic analysis,” Ph.D. Thesis, University of Warwick, Department of Computer Science, 2008.
[35] WinMerge, Available from: http://winmerge.org/
[36] C. Gladisch, “How C differs from Java for symbolic program execution,” In H. Tews, editor, Proceedings, C/C++ Verification Workshop, Oxford, United Kingdom, July 2007.
[37] P. Bothner, “Java/C++ integration - writing native Java methods in natural C++,” November 1997.
[38] L. Bloomfield, WCopyfind 4.1.1, Available from: http://plagiarism.phys.virginia.edu/
