
臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

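Author: Chun-Chih Chang (張駿志)
Thesis title: Enhance Performance of Unsupervised Text Categorization by Using External Information (利用外部資訊提升非監督式文件分類績效)
Advisor: Chung-Chian Hsu (許中川)
Oral defense committee: Chung-Chian Hsu (許中川), Hui-Chin Yeh (葉蕙菁), Yu-Feng Lan (藍友烽)
Oral defense date: 2014-06-11
Degree: Master's
Institution: National Yunlin University of Science and Technology (國立雲林科技大學)
Department: Department of Information Management (資訊管理系)
Discipline: Computer Science
Subdiscipline: General Computer Science
Thesis type: Academic thesis
Publication year: 2014
Graduation academic year: 102 (ROC calendar)
Language: Chinese
Number of pages: 33
Keywords: Text classification, Similarity measure, Unsupervised learning algorithms (文件分類, 相似度測量, 非監督式學習演算法)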
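Abstract (translated from Chinese): As the Internet has developed, ever more text is being produced, but organizing massive amounts of data into the correct categories is difficult. Many supervised learning algorithms have been applied to text classification for this purpose, but they have drawbacks: supervised learning needs a large number of labeled documents as training data, and collecting labeled documents takes considerable time and labor. This thesis proposes an unsupervised learning algorithm for text classification that avoids the need to collect a large amount of labeled data in advance. It uses Wikipedia, WordNet, and the Google search engine as external knowledge bases to compute the semantic relationship or relatedness between keywords accurately and to assign documents to the correct categories. Preliminary experimental results show that the proposed method achieves better performance.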
Abstract (English): With the rapid growth of online text, organizing text data effectively has become a major issue. Text classification is the task of assigning documents to predefined categories, and many supervised classification methods have been proposed for it. Supervised learning methods have disadvantages, however. The biggest bottleneck is that they require a large amount of training data to classify well. While unlabeled documents are abundant and easy to collect, labeled documents are difficult to collect because labeling is usually done manually, which is time-consuming. To overcome these disadvantages and achieve better classification accuracy without labeled data, we propose combining three external sources, Wikipedia, WordNet, and Google distance, for unsupervised text classification. The experimental results show that the combination of Wikipedia with WordNet achieves better performance than the individual methods.
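The abstract names Google distance as one of the external sources, but this record does not reproduce the thesis's own formulas. As a rough illustration only, the sketch below computes the Normalized Google Distance as defined by Cilibrasi and Vitanyi [4] from search hit counts, then picks the category name closest to a keyword. The function names, the hit counts, and the index size `n` are hypothetical and are not taken from the thesis.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD from hit counts (Cilibrasi & Vitanyi, 2007).

    fx, fy: number of pages containing each term
    fxy:    number of pages containing both terms
    n:      assumed total number of indexed pages
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # The terms never co-occur: treat them as unrelated.
        return float("inf")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


def nearest_category(term, categories, counts, n):
    """Return the category name with the smallest NGD to `term`.

    `counts` maps a term, or a (term, category) pair, to a hit count.
    """
    return min(
        categories,
        key=lambda c: normalized_google_distance(
            counts[term], counts[c], counts.get((term, c), 0), n
        ),
    )


# Hypothetical hit counts, for illustration only.
counts = {
    "goal": 1000,
    "sports": 5000,
    "finance": 4000,
    ("goal", "sports"): 800,
    ("goal", "finance"): 50,
}
best = nearest_category("goal", ["sports", "finance"], counts, n=10**6)
```

A smaller distance means the two terms co-occur more often than their individual frequencies would suggest, so in this toy data "goal" lands nearer to "sports" than to "finance". A real system would obtain the counts from a search engine and would have to handle rate limits and noisy hit estimates.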
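Table of contents:
List of Figures
List of Tables
1. Introduction
1.1 Research Motivation
1.2 Research Objectives
2. Literature Review
2.1 Unsupervised Learning
2.2 Similarity Measurement
3. Method
3.1 System Architecture
3.2 Preprocessing
3.2.1 Named Entity Recognition
3.3 Filtering Document Keywords with Wikipedia
3.4 Measuring the Relatedness of Two Terms with Wikipedia
3.5 Comprehensive Measurement Feature Selection
3.6 Similarity Measurement
3.7 Computing the Relatedness of Two Terms with a Search Engine
4. Experimental Results
4.1 Datasets and Experimental Design
4.2 Performance Metrics and Experimental Results
5. Conclusion
References
Appendix 1: List of Formulas for Each Combination of Methods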

[1] Barak, L., Dagan, I., & Shnarch, E. (2009). Text categorization from category name via lexical reference. Paper presented at the Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers.
[2] Bollegala, D., Matsuo, Y., & Ishizuka, M. (2007). Measuring semantic similarity between words using web search engines. IEEE Transactions on Knowledge and Data Engineering, 7, 757-766.
[3] Chen, P.-I., & Lin, S.-J. (2010). Automatic keyword prediction using Google similarity distance. Expert Systems with Applications, 37(3), 1928-1938.
[4] Cilibrasi, R. L., & Vitanyi, P. M. (2007). The Google similarity distance. IEEE Transactions on Knowledge and Data Engineering, 19(3), 370-383.
[5] Joachims, T. (2001). A statistical learning model of text classification for support vector machines. Paper presented at the Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval.
[6] Karthiga, M., Kalaivaani, P., & Sankarananth, S. (2013). A semantic similarity approach based on web resources. Paper presented at the Information Communication and Embedded Systems (ICICES), 2013 International Conference on.
[7] Ko, Y., Park, J., & Seo, J. (2004). Improving text categorization using the importance of sentences. Information Processing & Management, 40(1), 65-79.
[8] Ko, Y., & Seo, J. (2000). Automatic text categorization by unsupervised learning. Paper presented at the Proceedings of the 18th conference on Computational linguistics-Volume 1.
[9] Ko, Y., & Seo, J. (2009). Text classification from unlabeled documents with bootstrapping and feature projection techniques. Information Processing & Management, 45(1), 70-83.
[10] Lan, M., Tan, C. L., Su, J., & Lu, Y. (2009). Supervised and traditional term weighting methods for automatic text categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(4), 721-735.
[11] Leacock, C., & Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. WordNet: An electronic lexical database, 49(2), 265-283.
[12] Lewis, D. D., Schapire, R. E., Callan, J. P., & Papka, R. (1996). Training algorithms for linear text classifiers. Paper presented at the Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval.
[13] McCallum, A., & Nigam, K. (1998). A comparison of event models for naive Bayes text classification. Paper presented at the AAAI-98.
[14] Milne, D., & Witten, I. H. (2012). An open-source toolkit for mining Wikipedia. Artificial Intelligence, 194, 222-239.
[15] Resnik, P. (1995). Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, 448-453.
[16] Romero, M., Moreo, A., Castro, J. L., & Zurita, J. M. (2012). Using Wikipedia concepts and frequency in language to extract key terms from support documents. Expert Systems with Applications.
[17] Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1), 1-47.
[18] Tsai, C.-H. (2013). Unsupervised Text Categorization Method Using Wikipedia Content and Linking Information. Master's thesis, Department of Information Management, National Yunlin University of Science & Technology.
[19] Wu, Z., & Palmer, M. (1994). Verbs semantics and lexical selection. Paper presented at the Proceedings of the 32nd annual meeting on Association for Computational Linguistics.
[20] Yang, J., Liu, Y., Zhu, X., Liu, Z., & Zhang, X. (2012). A new feature selection based on comprehensive measurement both in inter-category and intra-category for text categorization. Information Processing & Management, 48(4), 741-754.
[21] Yang, Y., Slattery, S., & Ghani, R. (2002). A study of approaches to hypertext categorization. Journal of Intelligent Information Systems, 18(2-3), 219-241.
