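Graduate student: 邱聖斌
Graduate student (English name): Sheng-Bin Chiu
Thesis title (Chinese): 中文文件表示法在文件分類中之比較
Thesis title (English): Comparing Representations for Chinese Text Categorization
Advisor: 蔡志忠
Advisor (English name): Jyh-Jong Tsay
Degree: Master's
Institution: National Chung Cheng University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2001
Academic year of graduation: 89 (ROC calendar)
Language: English
Number of pages: 38
Keywords: Chinese document representation; automatic text categorization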
In this thesis, we study how different representations of Chinese documents affect automatic text categorization. We compare word-based and n-gram-based representations combined with several weighting factors: term frequency (TF), inverse document frequency (IDF), and inverse class frequency (ICF). Experiments on the CNA news collection show that bigrams achieve performance close to that of statistical word-based representations, and that weighting methods combining TF, IDF, and ICF achieve the best performance.
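
The abstract names character bigrams and a combined TF, IDF, and ICF weight but does not give the formulas. The following is a minimal Python sketch of one plausible reading, not the thesis's actual implementation. It assumes overlapping character bigrams as terms, raw counts for TF, log(N / df) for IDF, and log(C / cf) for ICF, where N is the number of documents, C the number of classes, df the number of documents containing the term, and cf the number of classes containing it. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Return overlapping character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def tf_idf_icf(docs, labels):
    """Weight each bigram of each document by TF * IDF * ICF.

    Assumed formulas (not taken from the thesis):
      TF  = raw count of the term in the document
      IDF = log(N / df), df = number of documents containing the term
      ICF = log(C / cf), cf = number of classes containing the term
    Returns one {term: weight} dict per document.
    """
    term_counts = [Counter(char_bigrams(d)) for d in docs]
    n_docs = len(docs)
    n_classes = len(set(labels))

    doc_freq = Counter()              # term -> number of documents
    term_classes = defaultdict(set)   # term -> set of class labels
    for counts, label in zip(term_counts, labels):
        for term in counts:
            doc_freq[term] += 1
            term_classes[term].add(label)

    weighted = []
    for counts in term_counts:
        weights = {}
        for term, tf in counts.items():
            idf = math.log(n_docs / doc_freq[term])
            icf = math.log(n_classes / len(term_classes[term]))
            weights[term] = tf * idf * icf
        weighted.append(weights)
    return weighted


# Hypothetical toy corpus: two finance headlines and one sports headline.
docs = ["股市上漲", "股市下跌", "球隊獲勝"]
labels = ["finance", "finance", "sports"]
weights = tf_idf_icf(docs, labels)
```

With these unsmoothed logarithms, a bigram that occurs in every document or in every class receives weight zero, so only terms that discriminate between documents and between classes contribute. A smoothed variant would keep such terms with a small weight; which form the thesis uses is not stated in the abstract.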
1 Introduction
2 Vectorization
2.1 Term Extraction
2.1.1 N-gram-based model
2.1.2 Word-based model
2.2 Term Selection
2.3 Term Clustering
2.4 Term Weighting
3 Classifiers
3.1 Rocchio Linear Classifier
3.2 K-Nearest Neighbor Classifier
3.3 Naive Bayes Classifier
3.4 Probabilistic Model for Linear Classifier
4 Experiment
4.1 Sample Document Collection
4.2 Performance Measure
4.3 Experimental Results
5 Conclusion