National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

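Graduate student: 江珅薇 (Shen-Wei Chiang)
Thesis title: 相關學術論文集合關鍵詞擷取-學術領域自動命名
Thesis title (English): The Extraction of Collective Key Terms from a Scientific Literature Corpus
Advisor: 陳宗天 (Tsung-Teng Chen)
Degree: Master's
Institution: National Taipei University (國立臺北大學)
Department: Graduate Institute of Information Management
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar)
Language: Chinese
Number of pages: 72
Keywords (translated from Chinese): citation analysis; information retrieval; semantic similarity; knowledge domain mapping (knowledge domain visualization)
Keywords (English): von Neumann kernel; information retrieval; semantic similarity; citation database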
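Identifying academic fields through citation analysis has long been discussed. By examining which documents in a field cite others and are cited by them, citation analysis finds the influential documents and, from these, helps interpret the subfields and research directions that may exist within the field. More recently, scholars have consolidated the knowledge domain mapping process and, by studying the relevant techniques and algorithms, explored the nature of the knowledge structure underlying academic fields. Whether one uses plain citation analysis or follows the knowledge domain mapping process, however, the results still have to be interpreted manually.

This study designs a framework that extracts key terms from collections of related academic papers in an academic field and names those collections automatically. With the help of information retrieval techniques, it explores the nature of each paper collection, assigns representative key terms according to the characteristics of each collection, and composes them into key phrases, helping researchers quickly grasp how an academic field is developing. Earlier information retrieval techniques often neglected the semantic similarity between terms, so the extracted information was not accurate enough. The proposed framework therefore collects related literature on the basis of citation analysis, uses factor analysis to find latent factors (that is, subfields that may exist), and then analyzes the text of the papers, taking the semantic similarity between terms into account to find representative terms. Finally, it defines three term combination methods, TOPO, MI, and MMI, each of which forms a key phrase from two terms.

Of the key phrase composition methods in the proposed automatic naming framework, TOPO and MMI agree closely with the results of manual interpretation and convey the research topic of each paper collection. The study also serves as an extended application of academic field identification.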
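The abstract names MI (mutual information) as one of the three ways of pairing terms, but this record does not give the formula the thesis uses. As a rough illustration only, the sketch below scores term pairs with generic pointwise mutual information estimated from document co-occurrence and picks the highest-scoring pair as a two-term phrase. The function names, the toy documents, and the choice of pointwise mutual information are all assumptions, not the thesis's method.

```python
import math
from itertools import combinations


def pointwise_mutual_information(documents, term_a, term_b):
    """PMI of two terms, with probabilities estimated as document frequencies.

    `documents` is a list of sets of terms (a hypothetical input format).
    """
    total = len(documents)
    count_a = sum(1 for doc in documents if term_a in doc)
    count_b = sum(1 for doc in documents if term_b in doc)
    count_ab = sum(1 for doc in documents if term_a in doc and term_b in doc)
    if count_ab == 0:
        # The terms never co-occur, so the pair gets the lowest possible score.
        return float("-inf")
    # log2( P(a, b) / (P(a) * P(b)) ) with P(x) = count_x / total
    return math.log2(count_ab * total / (count_a * count_b))


def best_phrase(documents, candidate_terms):
    """Return the candidate pair with the highest PMI as a sorted 2-tuple."""
    pairs = combinations(sorted(candidate_terms), 2)
    return max(pairs, key=lambda pair: pointwise_mutual_information(documents, *pair))


# Made-up toy collection: each document is reduced to its set of terms.
toy_documents = [
    {"citation", "analysis"},
    {"citation", "analysis", "kernel"},
    {"kernel", "retrieval"},
    {"retrieval"},
]
print(best_phrase(toy_documents, {"citation", "analysis", "kernel", "retrieval"}))
```

In this toy collection "citation" and "analysis" always appear together, so they form the selected pair. A real pipeline would first restrict the candidates to the representative terms found for each paper collection.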
Finding representative key terms from a collection of related literature is a useful technique with many potential applications. Because a document can be regarded as a bag of words and a word as a bag of documents, the relatedness and importance of words to documents can be derived from the von Neumann kernel function. This study developed novel keyword extraction procedures that combine techniques from information retrieval, semantic similarity analysis, and mutual information calculation. The procedures were applied to two datasets collected from the CiteSeer citation database to gauge their effectiveness. They are capable of extracting representative key terms from literature corpora derived by bibliometric methods. The experimental results show that the key words derived by these techniques agree with the coding results of human readers.
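To make the von Neumann kernel idea concrete, here is a minimal sketch under the usual textbook definition: given a base similarity matrix K (for terms, the product of the term-document matrix with its transpose), the kernel is K(I - λK)^{-1}, which equals the series K + λK² + λ²K³ + ... when λ is smaller than the reciprocal of K's spectral radius. The toy matrix, the function name, and the choice of λ are assumptions for illustration; the record does not state how the thesis set these.

```python
import numpy as np


def von_neumann_kernel(base_kernel, decay):
    """Return K (I - decay*K)^{-1} for a symmetric base kernel K."""
    base = np.asarray(base_kernel, dtype=float)
    spectral_radius = np.max(np.abs(np.linalg.eigvalsh(base)))
    if spectral_radius > 0 and decay * spectral_radius >= 1.0:
        raise ValueError("decay must be smaller than 1 / spectral radius")
    identity = np.eye(base.shape[0])
    # K and (I - decay*K)^{-1} commute, so solving (I - decay*K) Z = K
    # gives the kernel without forming an explicit inverse.
    return np.linalg.solve(identity - decay * base, base)


# Made-up term-document matrix: rows are terms, columns are documents.
term_doc = np.array([[1, 1, 0],
                     [0, 1, 1],
                     [0, 0, 1]], dtype=float)

# Base term-term similarity: how often two terms share a document.
term_gram = term_doc @ term_doc.T

# Assumed decay: half of the largest admissible value.
decay = 0.5 / np.max(np.linalg.eigvalsh(term_gram))
term_similarity = von_neumann_kernel(term_gram, decay)

# Terms 0 and 2 never share a document, yet both co-occur with term 1.
print(term_gram[0, 2], term_similarity[0, 2])
```

The first and third terms have zero direct co-occurrence, but the kernel gives them a positive similarity because both co-occur with the second term. This indirect relatedness is what a plain bag-of-words comparison misses.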
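Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Scope
1.4 Research Process
Chapter 2 Literature Review
2.1 Academic Field Identification
2.1.1 Citation Analysis
2.1.2 Knowledge Domain Visualization
2.2 Information Retrieval
2.2.1 Introduction to Information Retrieval
2.2.2 Information Retrieval Steps
2.3 Semantic Similarity
2.3.1 von Neumann Kernel
2.3.2 Exponential Kernel
2.4 Association Strength between Terms
2.4.1 Associative Networks
2.4.2 Mutual Information
Chapter 3 Research Method
3.1 Research Framework
3.2 Preparation Stage
3.2.1 Data Collection
3.2.2 Citation Analysis
3.2.3 Academic Paper Collections
3.3 Visualization Stage
3.3.1 Matrix Transformation
3.3.2 Dimensionality Reduction
3.3.3 Visualization Tools
3.4 Naming Stage
3.4.1 Article Text Analysis
3.4.2 Semantic Similarity Analysis
3.4.3 Key Phrase Composition
3.4.5 Discussion and Evaluation of Naming Results
Chapter 4 Results and Analysis
4.1 Academic Field UC: Analysis Results
4.2 Academic Field IV: Analysis Results
4.3 Summary
Chapter 5 Conclusions and Recommendations
5.1 Conclusions
5.2 Research Limitations
5.3 Suggestions for Future Research
References
Chinese-Language References
English-Language References
Web Resources
Appendix 1 Literature Information for the UC Dataset Analysis
Appendix 2 Literature Information for the IV Dataset Analysis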