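Author: 魏忠志
Author (romanized): Chung-Chih Wei
Thesis title: A Study of Article Matching Methods for SCI/SSCI Articles (SCI/SSCI文章比對方法之研究)
Advisor: 陳彥良
Advisor (romanized): Yen-Liang Chen
Degree: Master's
Institution: National Central University
Department: Graduate Institute of Information Management
Discipline: Computer science
Field: General computer science
Thesis type: Academic thesis
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: Chinese
Number of pages: 77
Keywords (translated from Chinese): back-propagation neural network; article matching; data mining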
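Document retrieval techniques have been in use for many years and are now widely applied in online document retrieval systems. Most retrieval tools run a full-text match against the query string entered by the user, or preprocess the query string in some way before matching it against articles. Many researchers are working on article classification, related-article matching, document similarity measures, and term-weighting models, and many refined methods have been put into practice in retrieval systems, gradually improving retrieval effectiveness and efficiency.
In this thesis we take the retrieval tool of the well-known SCI/SSCI journal literature databases as our target and develop an additional article similarity matching method based on the characteristics of such articles. The study focuses on four important attributes available in that tool, each with its own characteristics: title, abstract, keywords, and cited references. We use the vector space model and the TFIDF formula to compute the similarity of article vectors. Because the weights of the four attributes affect the overall similarity of any pair of articles, we use a back-propagation artificial neural network (ANN) to model the relationship between the attribute weight allocation and the total similarity of each article pair. To verify the effectiveness of the ANN module, and to assess how the proposed matching method differs from traditional matching methods, we built a database of real journal articles and implemented the full article matching process. Finally, we designed an experiment in which participants evaluated the matching results.
The experimental results show that, compared with traditional methods, the proposed article matching method substantially improves similarity matching. We also verified that the ANN module yields better results.
For SCI/SSCI retrieval tools, we hope that the article similarity matching method developed in this thesis can be added alongside the standard field-based search functions, improving the flexibility and practicality of the tool and helping researchers and general users find related articles in the database more effectively.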
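The per-attribute similarity computation described in the abstract can be illustrated with a minimal sketch. It assumes a plain TF-IDF weighting (term count times log(N/df)) and cosine similarity for each of the four attributes, then combines the four scores with a weighted average. The sample records, the function names, the treatment of each keyword and each cited reference as a single token, and the fixed weights are all hypothetical; the thesis models the weighting with a back-propagation neural network, which this sketch does not attempt to reproduce.

```python
import math
from collections import Counter

FIELDS = ("title", "abstract", "keywords", "references")

# Hypothetical fixed weights; the thesis models these with a
# back-propagation neural network, which is not reproduced here.
WEIGHTS = {"title": 0.3, "abstract": 0.3, "keywords": 0.2, "references": 0.2}

# Made-up records for illustration only.
ARTICLES = [
    {"title": "document matching with neural networks",
     "abstract": "we match documents using term vectors and a neural network",
     "keywords": ["document matching", "neural network"],
     "references": ["ref_a", "ref_b"]},
    {"title": "neural networks for document retrieval",
     "abstract": "term vectors and neural networks improve document retrieval",
     "keywords": ["retrieval", "neural network"],
     "references": ["ref_a", "ref_c"]},
    {"title": "semi-presidential systems in europe",
     "abstract": "this study compares constitutional arrangements across countries",
     "keywords": ["constitution", "politics"],
     "references": ["ref_d"]},
]


def tokens_of(article, field):
    """Text fields are split on whitespace; list fields are used as-is."""
    value = article[field]
    if isinstance(value, str):
        return value.lower().split()
    return list(value)


def tfidf_vectors(token_lists):
    """Return one sparse {term: weight} vector per document."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return [
        {term: count * math.log(n_docs / doc_freq[term])
         for term, count in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def field_similarities(articles, i, j):
    """Cosine similarity of articles i and j, computed separately per field."""
    sims = {}
    for field in FIELDS:
        vectors = tfidf_vectors([tokens_of(a, field) for a in articles])
        sims[field] = cosine(vectors[i], vectors[j])
    return sims


def overall_similarity(sims, weights):
    """Weighted average of the per-field similarities."""
    total = sum(weights[f] for f in FIELDS)
    return sum(weights[f] * sims[f] for f in FIELDS) / total


if __name__ == "__main__":
    for j in (1, 2):
        sims = field_similarities(ARTICLES, 0, j)
        print(j, sims, overall_similarity(sims, WEIGHTS))
```

Keeping the four attributes as separate vectors, rather than concatenating them into one, is what lets the attribute weights change the ranking; with a single combined vector there would be nothing for a weighting model to adjust.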
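Chapter 1 Introduction
Section 1 Research Background
Section 2 Research Motivation
Section 3 Research Objectives
Section 4 Research Process
Section 5 Thesis Organization
Chapter 2 Literature Review and Related Techniques
Section 1 Article Preprocessing
Section 2 Article Matching
Section 3 Related Techniques and Methods
Chapter 3 Article Matching Method
Section 1 Overview of the Article Matching Process
Section 2 Article Preprocessing
Section 3 Constructing Article Vectors and Cited-Reference Lists
Section 4 Processing New Articles
Section 5 Article Similarity Matching
Chapter 4 Empirical Evaluation
Section 1 Development Tools and Environment
Section 2 Building the Article Database
Section 3 Experimental Design and Procedure
Section 4 Evaluation Criteria
Section 5 Experimental Results and Analysis
Chapter 5 Conclusions and Future Work
Section 1 Conclusions and Contributions
Section 2 Research Limitations
Section 3 Future Work
References
Appendix A: Stemming: Porter's Algorithm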