National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

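Graduate student: 吳慧貞
Graduate student (English): Huei-Chen Wu
Thesis title: 使用資料探勘探討多層式圖書自動分類系統之研究
Thesis title (English): A Study on Multi-layered Automatic Book Classification System Using Data Mining
Advisor: 郭俊桔
Advisor (English): June-Jei Kuo
Oral defense committee: 黃明居, 陳舜德
Oral defense committee (English): Ming-Jiu Hwang, Shun-Der Ryan Chen
Oral defense date: 2015-07-24
Degree: Master's
Institution: National Chung Hsing University
Department: Graduate Institute of Library and Information Science
Discipline: Communication
Sub-discipline: Library, Information and Archival Studies
Thesis type: Academic thesis
Year of publication: 2015
Graduation academic year: 103 (ROC calendar)
Language: Chinese
Pages: 130
Keywords (Chinese): 多層式圖書自動分類系統; 投票策略; 分類器; 資料探勘
Keywords (English): multi-layered automatic book classification system; voting strategy; classifier; data mining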
Usage counts:
  • Cited by: 3
  • Views: 770
  • Downloads: 45
  • Saved to bibliography lists: 4
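Classifying and cataloging library materials is the core of library management at every level and its most important foundational work. In routine cataloging, librarians decide the class of each item from its meaning and subject content. Most librarians in Taiwan, however, have a library and information science background yet must catalog every book that arrives, so insufficient subject knowledge often makes classification difficult. In addition, every discipline has advanced considerably in recent years and the number of published books has grown sharply, placing an increasingly heavy burden on catalogers. This delays the shelving of new acquisitions and, because of differences in subjective judgment, readily leads to cataloging quality problems such as low inter-consistency and intra-consistency.
This study examines the approach of traditional single-layer book classification systems and, combining the strengths of several classifiers, proposes a multi-layered automatic book classification system that uses a voting strategy. To examine the performance of the multi-layered system, two corpora (theses and dissertations, and online bookstore bibliographic records), together with their corresponding book classification numbers, are used as training and test data. For the theses and dissertations, the study examines how various combinations of content parts affect feature extraction and identifies the best combination for automatic book classification. It also examines combinations of classifiers to find the best combination for the multi-layered classifier. Finally, the experimental results confirm that the multi-layered book classification system reaches an accuracy of 99% and classifies better than the traditional single-layer system.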


Cataloging books is the core and foundation of library management at every level. Most librarians are trained only in library and information science, yet they are responsible for cataloging books from every field of knowledge. Lacking subject background, they find cataloging increasingly difficult. Moreover, with rapid progress in every field, the volume of publications is growing quickly and the cataloging load grows with it. As a result, cataloging quality, such as high inter-consistency and intra-consistency of library classification, cannot be maintained.
This thesis therefore examines traditional one-layered book classification systems and draws on the strengths of various classifiers to propose a two-layered book classification system that uses a voting strategy. Dissertations from National Chung Hsing University and bibliographic records from an online bookstore serve as the training and test corpora, with the classification code of each dissertation used as the gold standard. Each dissertation contains several content parts, such as title, authors, and cited papers. To understand how combinations of content parts affect classification, various combinations are studied and the best one is recommended. Likewise, to obtain the best classification performance, combinations of classifiers for the multi-layered book classification system are studied and the best one is recommended. Finally, the experimental results show that the proposed multi-layered book classification system outperforms traditional one-layered book classification systems.
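The abstract does not say how votes are combined or how the layers are connected, so the following is only a minimal sketch of one plausible reading: several classifiers vote on a main class, then a second group of classifiers votes on a subclass within that class. The function names, the keyword-based toy classifiers, the class labels, and the rule that ties go to the earliest vote are all assumptions made for illustration and are not taken from the thesis.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Assumption: a classifier is any function that maps a text to a class label.
Classifier = Callable[[str], str]


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the label that was voted first."""
    if not labels:
        raise ValueError("need at least one vote")
    counts = Counter(labels)
    best = max(counts.values())
    # Scan in voting order so that tie-breaking is deterministic.
    return next(label for label in labels if counts[label] == best)


def classify_two_layer(
    text: str,
    first_layer: List[Classifier],
    second_layer: Dict[str, List[Classifier]],
) -> Tuple[str, Optional[str]]:
    """Vote on a main class, then vote on a subclass inside that main class."""
    main = majority_vote([clf(text) for clf in first_layer])
    sub_classifiers = second_layer.get(main)
    if not sub_classifiers:
        # No second-layer classifiers were registered for this main class.
        return main, None
    sub = majority_vote([clf(text) for clf in sub_classifiers])
    return main, sub


def keyword_classifier(keyword: str, hit: str, miss: str) -> Classifier:
    """Hypothetical toy classifier: returns `hit` if the keyword occurs, else `miss`."""
    return lambda text: hit if keyword in text.lower() else miss


# Hypothetical labels and keywords, used only to exercise the voting logic.
first_layer = [
    keyword_classifier("mining", "tech", "other"),
    keyword_classifier("classifier", "tech", "other"),
    keyword_classifier("poetry", "arts", "other"),
]
second_layer = {
    "tech": [
        keyword_classifier("mining", "data-mining", "networks"),
        keyword_classifier("voting", "data-mining", "networks"),
        keyword_classifier("router", "networks", "data-mining"),
    ],
}

example = classify_two_layer(
    "Data mining with a voting classifier", first_layer, second_layer
)
print(example)
```

In practice each toy function would be replaced by a trained model, and a weighted or probability-based vote could be substituted for the simple majority rule used here.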


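Abstract (Chinese)
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Section 1 Research Background and Motivation
Section 2 Research Objectives and Questions
Section 3 Research Scope and Limitations
Section 4 Definition of Terms
Chapter 2 Literature Review
Section 1 Document Representation Methods
Section 2 Related Research on Classifier Construction Methods
Section 3 Methods for Evaluating Classifier Performance
Section 4 Factors Affecting Classification Performance
Section 5 Related Research on Automatic Book Classification
Chapter 3 Research Design and Implementation
Section 1 Research Framework
Section 2 Research Subjects
Section 3 Research Tools
Section 4 Classification Module Workflow
Chapter 4 Corpus Analysis and Experiments
Section 1 Pilot Experiment: Theses and Dissertations Dataset
Section 2 Pilot Experiment: Small Online Bookstore Bibliographic Dataset
Section 3 Formal Experiment: Online Bookstore Bibliographic Data
Section 4 Performance Evaluation and Discussion
Chapter 5 Conclusions and Future Research Directions
Section 1 Conclusions
Section 2 Future Research Directions
References
Appendix 1 Chinese Stop Words
Appendix 2 English Stop Words
Appendix 3 Original Form of the Bibliographic Data
Appendix 4 Bibliographic Data after Processing by the Chinese Word Segmentation System
Appendix 5 Feature Values after Software Conversion: Content File Example
Appendix 6 Feature Values after Software Conversion: Binary Format Example
Appendix 7 Steps for Automatic Document Classification Using WEKA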



