

Thesis title (English): A Hybrid Classification Method for Conference Information
Advisor (English): Hei-Chia Wang
Keywords (English): text mining; feature selection; SVM; hybrid classification
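To help researchers quickly find suitable conference information, this study uses text mining techniques to filter and classify conference information, so that complete information can be collected and users can easily find the conferences that match their interests. Traditional classification algorithms in the literature, such as Support Vector Machine, Decision Tree, and Naïve Bayes Classifier, were not designed specifically for academic conference information, and classifying with traditional text-based methods alone may lead to misclassification. The goal of this study is therefore to design a classification algorithm for academic conference information.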

Many researchers want to follow the latest research topics and exchange information with their peers. They search the Internet for academic conference information and choose conferences to attend. Some websites provide partial conference information, but most cannot help users find what they actually want, and filtering the search results by hand is laborious. Helping researchers find suitable conferences within a large body of conference information is therefore an important problem.
To help researchers find suitable conference information efficiently, this study classifies conferences using text mining. Traditional classification algorithms in the literature, such as Decision Tree, Naïve Bayes Classifier, and Support Vector Machine, were not designed for conference information documents, so applying them to these academic documents may produce misclassifications. The goal of this study is therefore to design a classification algorithm for conference information.
Conference information contains many technical terms and phrases that consist of two words, and the analysis of term importance should take this into account. Moreover, existing classification algorithms each have strengths and weaknesses, so a hybrid classification approach is adopted to integrate the traditional algorithms. We expect the new method, designed for conference information, to help researchers find suitable conferences efficiently and accurately.
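The two ideas in the abstract, counting two-word phrases as terms and combining several classifiers, can be illustrated with a minimal Python sketch. It is not the thesis's method: the abstract does not say how phrases are detected or how the classifiers are integrated. The sketch assumes plain adjacent-word bigrams, document frequency as the importance measure, and simple majority voting as the combination rule. The function names and the sample calls for papers are hypothetical.

```python
from collections import Counter


def terms_with_bigrams(text):
    """Return lowercase single words plus adjacent two-word phrases."""
    words = [w.strip(".,;:()").lower() for w in text.split()]
    words = [w for w in words if w]
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        # set() ensures a term is counted at most once per document
        df.update(set(terms_with_bigrams(doc)))
    return df


def majority_vote(predictions):
    """Combine labels from several classifiers by simple majority.

    Ties go to the label that appears first in `predictions`
    (an assumption made for this sketch).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical conference announcements, for illustration only.
calls = [
    "Call for papers: text mining and data mining workshop",
    "International conference on text mining",
    "Symposium on supply chain management",
]
df = document_frequency(calls)
print(df["text mining"])  # number of calls containing the phrase
print(majority_vote(["data mining", "management", "data mining"]))
```

Treating "text mining" as one term keeps it distinct from "data mining", which single-word counts of "mining" alone would merge. In practice each label passed to `majority_vote` would come from a separate trained classifier, such as the Decision Tree, Naïve Bayes, and SVM models the abstract names.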

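Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Research Scope and Limitations
1.4 Research Process
1.5 Thesis Outline
Chapter 2 Literature Review
2.1 Information Retrieval
2.2 Natural Language Processing
2.2.1 Part-of-Speech Tagging
2.2.2 Stemming
2.3 Feature Selection
2.3.1 Document Frequency
2.4 Document Classification Methods
2.4.1 Naïve Bayes Classifier
2.4.2 Support Vector Machine (SVM)
2.4.3 Decision Tree
2.5 Conference Information Websites
2.5.1 All Conference
2.5.2 Conference Alert
2.5.3 DBWorld
2.6 Hybrid Classification
2.7 Summary
Chapter 3 Research Method
3.1 Research Framework
3.2 Data Collection and Preprocessing
3.3 Feature Selection for Training Data
3.4 Hybrid Classification Module
3.4.1 Test Data Filtering
3.4.2 Hybrid Classification
3.5 Summary
Chapter 4 System Implementation and Validation
4.1 System Environment
4.2 Experimental Design
4.2.1 Experimental Data Sources
4.2.2 Preprocessing Stage
4.2.3 Feature Selection
4.2.4 Classifier
4.2.5 Evaluation Metrics
4.3 Analysis of Experimental Results
4.3.1 Experiment 1: Comparison of Feature Selection Methods
4.3.2 Experiment 2: Comparison of the Hybrid Classification Method with Traditional Classification Methods
Chapter 5 Conclusions and Future Research Directions
5.1 Conclusions
5.2 Future Research Directions
References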


Electronic full text (public Internet release date: 2023-01-01)