National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 吳信宏
Title (Chinese): 機器學習與深度學習於短文本摘要與分類之研究–以信用卡帳單與企業名錄為例
Title (English): Machine Learning and Deep Learning for Short Text Extraction and Categorization – A Case Study on Credit Card Bills and Business Directories
Advisors: 鄒慶士、許晉雄
Advisors (English): Tsou, Ching-Shih; Hsu, Chin-Hsiung
Committee members: 吳牧恩、洪明欽
Committee members (English): Wu, Mu-En; Hung, Ming-Chin
Oral defense date: 2019-01-04
Degree: Master's
Institution: 東吳大學 (Soochow University)
Department: 巨量資料管理學院碩士學位學程 (Master's Program, School of Big Data Management)
Discipline: Computer Science
Field: Computer Applications
Thesis type: Academic thesis
Publication year: 2020
Graduation academic year: 108 (2019–2020)
Language: English
Pages: 136
Keywords (Chinese): 機器學習、淺層學習、深度學習、深度神經網絡、自動摘要、自動分類、短文探勘、自然語言處理、信用卡帳單、企業名錄、短文摘、支援向量機、天真貝氏、隨機森林、梯度提升樹、C5.0決策樹、全連結層神經網路、多層感知器、卷積神經網路、遞歸神經網路、自動編碼器、詞嵌入、字元嵌入
Keywords (English): machine learning; shallow learning; deep learning; short text mining; natural language processing; summarization; categorization; credit card statement; business directories; support vector machine (SVM); k nearest neighbor (kNN); Random Forest; extreme gradient boosting; Naïve Bayes (NB); C5.0; fully-connected neural network; multilayer perceptron (MLP); convolutional neural network (CNN); recurrent neural network (RNN); autoencoder; word embedding; character embedding; word2vec
Usage statistics:
  • Cited by: 1
  • Views: 975
  • Downloads: 158
Abstract:

Data is the new oil: like crude oil, it must be refined and extracted before it becomes a valuable commodity. Among the customer data generated by a bank's many lines of business, credit card data is the largest in volume and the richest in customer information. Traditionally, assigning a spending category to these records or consolidating them under specific merchants has relied on rule sets handcrafted and accumulated by people, which are neither exhaustive nor easy to maintain. Building automatic summarization and classification with data science methods can greatly improve processing efficiency and cover a much broader range of data.

The automatic short-text summarization in this study targets credit card billing descriptions. Features are extracted with self-defined part-of-speech (POS) tags and standard POS tags to form a feature vector space, and machine learning algorithms, including Random Forest, gradient boosting, Naïve Bayes, artificial neural networks, and the C5.0 decision tree, are then applied to extract merchant names automatically. In the experiments, the tree-based Random Forest and C5.0 models performed best, with F1 measures above 95% and areas under the ROC curve above 99%.
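To make that setup concrete, the sketch below frames merchant-name extraction as per-token classification over POS-tag features and trains a Random Forest on them. It is only a minimal illustration, not the thesis's code: the feature set, the toy billing tokens, and the POS tags shown are invented stand-ins for the self-defined and standard tag features the abstract describes.

# Hypothetical sketch: per-token POS features + Random Forest for merchant-name extraction.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import f1_score

def token_features(tokens, pos_tags, i):
    """Simple per-token features built from POS tags and token shape."""
    return {
        "pos": pos_tags[i],
        "prev_pos": pos_tags[i - 1] if i > 0 else "BOS",
        "next_pos": pos_tags[i + 1] if i < len(tokens) - 1 else "EOS",
        "length": len(tokens[i]),
        "is_digit": tokens[i].isdigit(),
        "position": i,
    }

# One segmented billing description as (token, POS, label) triples;
# label is 1 for tokens that belong to the merchant name, 0 otherwise.
sample = [("全家", "Nb", 1), ("便利商店", "Na", 1), ("台北", "Nc", 0), ("分期", "VC", 0)]
tokens = [t for t, _, _ in sample]
pos = [p for _, p, _ in sample]
X_dicts = [token_features(tokens, pos, i) for i in range(len(sample))]
y = [label for _, _, label in sample]

vec = DictVectorizer()
X = vec.fit_transform(X_dicts)
clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)
print(f1_score(y, clf.predict(X)))  # training-set F1, just to show the pipeline runs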

The automatic short-text classification in this study targets company names from the SuperhiPage (中華黃頁) business directory. Feature vector spaces are built with supervised word-level encoding (term frequency weighted by chi-square) and unsupervised character-level encoding (word2vec), respectively. Classifiers are then built with machine learning algorithms (gradient boosting and the support vector machine (SVM)) and deep learning algorithms (multilayer perceptron (MLP), convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory (LSTM), gated recurrent unit (GRU), and bidirectional RNN); a hybrid model additionally extracts features with a deep learning autoencoder and then trains a C5.0 classifier on them. In the experiments, the SVM model with supervised word-level encoding performed best, with an F1 measure of 82.5%, but because its feature space is built on a set of important keywords, the portion of the dataset it can be applied to shrinks substantially. Among the models built with character-level word2vec and deep learning, the RNN-family LSTM, GRU, and bidirectional LSTM performed best, with F1 measures above 81%.
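A minimal sketch of the champion word-level setup, assuming a scikit-learn-style pipeline: term-frequency features filtered by chi-square scores feeding a linear SVM. The tiny directory entries, labels, and keyword budget below are invented for illustration and are not the thesis's data or parameters.

# Hypothetical sketch: chi-square-selected term features + linear SVM for directory classification.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Pre-segmented company names (whitespace-delimited tokens) and their industry labels.
names = ["大安 牙醫 診所", "永和 豆漿 早餐", "信義 不動產 仲介", "板橋 牙科 診所"]
labels = ["medical", "food", "real_estate", "medical"]

model = make_pipeline(
    CountVectorizer(token_pattern=r"\S+"),  # term-frequency features over tokens
    SelectKBest(chi2, k=5),                 # keep the k terms with the highest chi-square score
    LinearSVC(),                            # linear support vector machine
)
model.fit(names, labels)
print(model.predict(["中和 牙醫 診所"]))

In this design the classifier only ever sees the selected keywords, which mirrors the limitation noted above: names containing none of the chosen terms contribute no usable features.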

Abstract (English):

Data is the new oil. Like crude oil, however, it must be found, extracted, and refined before it can be monetized. Among the various kinds of customer data in retail banking, credit card transaction data is the largest in volume and the most informative about customers. Traditionally, tagging payment descriptions with an expenditure category and extracting merchant names relied on a computer science approach consisting of rules handcrafted by experts. However, this method is high-maintenance and non-exhaustive. With a data science approach, automatic extractors and classifiers learned from data substantially boost efficiency and cover a broader range of raw credit card transaction data.

In this thesis, the automatic extraction of merchant names from credit card payment descriptions is explored with machine learning algorithms, including Random Forest, gradient boosting, Naïve Bayes, artificial neural networks, and the C5.0 tree. To train the models, a feature space was built on features derived from ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) Chinese POS tagging and self-defined POS tagging. The experimental results show that both the Random Forest and C5.0 ensemble tree models outperform the others in terms of the F1 measure and the area under the ROC curve: the F1 measure is above 95% and the area under the ROC curve is above 99%.

As for automatic categorization, the business directory is crawled from SuperhiPage. Both supervised word-level encoding (term frequency weighted by chi-square score) and unsupervised character-level encoding (word2vec) are adopted to construct feature spaces for machine learning and deep learning, respectively. To build the classifiers, two machine learning algorithms (support vector machine and gradient boosting) were used, along with seven deep learning configurations: MLP, RNN, CNN, LSTM, CNN with GRU, bidirectional LSTM, and an autoencoder combined with a C5.0 tree. The results show that classifiers with supervised encoding perform better than those with unsupervised encoding; the SVM model, with an F1 measure of 82.5%, is the champion. However, because the feature space is constrained to the top 1000 selected keywords, far fewer samples are applicable to this machine learning model than to the character-level deep learning models. Among the deep learning models, the RNN-family classifiers (LSTM, GRU, and bidirectional LSTM) excel, with F1 measures above 81%.
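For the character-level route described above, the following is a hedged sketch only: word2vec vectors learned over characters, fed as padded sequences to an LSTM classifier. The toy names, labels, dimensions, and hyperparameters are placeholders, not the thesis's data or settings; it assumes gensim >= 4 and TensorFlow/Keras are installed.

# Hypothetical sketch: character-level word2vec features + LSTM classifier.
import numpy as np
from gensim.models import Word2Vec
from tensorflow.keras import layers, models

names = ["大安牙醫診所", "永和豆漿早餐", "信義不動產仲介", "板橋牙科診所"]
labels = np.array([0, 1, 2, 0])  # e.g. 0 = medical, 1 = food, 2 = real estate

# Character-level word2vec: each "sentence" is the character list of one name.
w2v = Word2Vec([list(n) for n in names], vector_size=16, window=2, min_count=1, epochs=50)

max_len, dim = 8, 16
def encode(name):
    # Unseen characters would need their own handling (e.g. a zero vector); omitted here.
    vecs = [w2v.wv[ch] for ch in name[:max_len]]
    vecs += [np.zeros(dim)] * (max_len - len(vecs))  # pad to a fixed length
    return np.stack(vecs)

X = np.stack([encode(n) for n in names]).astype("float32")

model = models.Sequential([
    layers.Input(shape=(max_len, dim)),
    layers.LSTM(32),                       # a GRU or Bidirectional(LSTM(...)) layer fits the same slot
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=5, verbose=0)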
Table of Contents:
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation and objective 3
1.3 Thesis organization 6
Chapter 2 Related research 7
2.1 Text summarization 7
2.1.1 Workflow of text summarization 7
2.1.2 Types of text summarization 9
2.2 Text categorization 12
2.2.1 Types of categorization problems 13
2.2.2 Supervised text mining workflow 16
2.3 Preprocessing for text mining 17
2.3.1 Tokenization 18
2.3.2 Chinese word boundary detection 18
2.3.3 Chinese word segmentation 19
2.3.4 Ambiguity and unknown words 22
2.4 Feature engineering 23
2.4.1 Feature construction 24
2.4.2 Dimensionality reduction 42
2.5 Learning algorithms 54
2.5.1 Algorithms 55
2.5.2 Objective functions 66
2.5.3 Algorithm selection vs. automatic feature engineering 77
Chapter 3 Extraction of credit card bill 81
3.1 Proposed methodology 81
3.1.1 Word segmentation and POS tagging 83
3.1.2 Feature extraction and hypothesis space construction 86
3.1.3 Learning algorithms 87
3.2 Dataset 89
3.3 Experiment results and discussion 89
Chapter 4 Categorization of business directory 91
4.1 Proposed methodology 92
4.1.1 Pre-processing 94
4.1.2 Feature engineering and learning 96
4.1.3 Learning algorithms 97
4.2 Dataset 103
4.3 Experiment results and discussion 106
4.3.1 Results 106
4.3.2 Discussion 110
Chapter 5 Conclusion and future work 113