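Graduate student: 施品光
Graduate student (English): Shih, Pin-Kuang
Thesis title: 倒傳遞類神經網路與癌症基因統計方法應用於白血病基因醫學文件分類之研究
Thesis title (English): A Study on Applying a Back-propagation Neural Network Approach with Cancer Gene Census Data Collections to Categorizing Leukemia Related Medical Documents
Advisor: 李俊宏
Advisor (English): Lee, Chung-Hong
Degree: Master's
Institution: National Kaohsiung University of Applied Sciences (國立高雄應用科技大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2009
Graduation academic year: 97 (ROC calendar)
Language: Chinese
Pages: 104
Keywords (translated from Chinese): back-propagation algorithm; Cancer Gene Census information; leukemia
Keywords (English): Back-Propagation Algorithm; Cancer Gene Census; Entrez Gene; Leukemia; PubMed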
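In recent years, bioinformatics has grown at a remarkable pace in medicine and related fields, and medical research on cancer and genes has drawn particular attention. Because this medical literature holds a large amount of valuable knowledge about cancer and genes, uncovering the cancer knowledge hidden in the cancer gene literature has become an important topic in medical literature mining.

In text mining today, document classification techniques are widely applied to gene analysis and to the classification of medical documents. The experiments in this study build on a neural network classifier trained with the Back-Propagation Algorithm, combine Cancer Gene Census information with gene2pubmed index data, and extract specific cancer and gene names from the PubMed literature database for use in classifying cancer gene medical literature.

This study takes leukemia as the main cancer under investigation. Based on Cancer Gene Census information and gene2pubmed index data, leukemia is divided into eight classes (AML, ALL, CML, CLL, AML&ALL, AML&CML, ALL&CML, ALL&CLL). The collected leukemia gene literature is then classified into these eight classes, and the classification performance is evaluated and compared against a decision tree algorithm and a Bayesian algorithm. The results confirm that the classification framework based on the back-propagation neural network achieves good classification performance on the precision and F1 measures.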
In recent years, bioinformatics has grown rapidly in medical science and related fields, especially in research on cancer and genes. The cancer gene literature holds a great deal of valuable knowledge about cancer and genes. As a result, discovering this knowledge from the cancer gene literature has become a significant research issue in medical literature mining.
Document categorization techniques are now widely applied to analyzing relationships among genes and to classifying medical literature. In this research, a back-propagation neural network (BP-NN) classifier, combined with the gene2pubmed index file and Cancer Gene Census information, was used to categorize cancer gene documents. In addition, gene symbols and specific cancer names were extracted from the PubMed database for the BP-NN based categorization task.
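To make the BP-NN idea concrete, here is a minimal sketch of a one-hidden-layer network trained with batch back-propagation in NumPy. It is not the thesis's implementation: the layer sizes, learning rate, softmax output, and the toy term-presence vectors (`X_TOY`, `Y_TOY`) are all assumptions chosen for illustration, and the record does not state the network architecture that was actually used.

```python
import numpy as np


def _sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def train_bpnn(X, y, n_hidden=8, n_classes=2, lr=1.0, epochs=3000, seed=0):
    """Train a one-hidden-layer network with plain batch back-propagation.

    All hyperparameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    T = np.eye(n_classes)[y]  # one-hot targets

    for _ in range(epochs):
        # Forward pass: sigmoid hidden layer, softmax output layer.
        H = _sigmoid(X @ W1 + b1)
        Z = H @ W2 + b2
        Z = Z - Z.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(Z)
        P = P / P.sum(axis=1, keepdims=True)

        # Backward pass: propagate the cross-entropy error layer by layer.
        dZ = (P - T) / len(X)
        dW2 = H.T @ dZ
        db2 = dZ.sum(axis=0)
        dA = (dZ @ W2.T) * H * (1.0 - H)  # sigmoid derivative
        dW1 = X.T @ dA
        db1 = dA.sum(axis=0)

        # Gradient descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


def predict(params, X):
    """Return the index of the highest-scoring class for each row of X."""
    W1, b1, W2, b2 = params
    return np.argmax(_sigmoid(X @ W1 + b1) @ W2 + b2, axis=1)


# Hypothetical toy data: each row marks which of four gene names occur in a
# document; the class is determined by the first three columns.
X_TOY = np.array([[1, 0, 0, 1], [1, 0, 0, 0],
                  [0, 1, 0, 0], [0, 1, 0, 1],
                  [0, 0, 1, 0], [0, 0, 1, 1]], dtype=float)
Y_TOY = np.array([0, 0, 1, 1, 2, 2])

if __name__ == "__main__":
    params = train_bpnn(X_TOY, Y_TOY, n_classes=3)
    print(predict(params, X_TOY))
```

In a real setting the input vectors would come from the feature selection step (for example, weighted gene and cancer name terms), and the trained network would be scored on held-out documents, not on its own training set.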
In this work, leukemia was studied as the main cancer in order to validate the proposed method. Based on the gene2pubmed index file and Cancer Gene Census information, leukemia is divided into eight subclasses (AML, ALL, CML, CLL, AML&ALL, AML&CML, ALL&CML, ALL&CLL), and leukemia gene documents are then categorized into these eight subclasses. We compared the performance of the BP-NN based method with that of the Decision Tree and Naïve Bayes algorithms on this categorization task. The experimental results showed that the BP-NN based technique performs better on the precision and F1 measures for categorizing leukemia gene literature.
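The sketch below illustrates one way class labels such as AML&ALL could be derived and how per-class precision and F1 are computed. It rests on several assumptions that the record does not confirm: that a document's class is the union of the leukemia subtypes of the census genes linked to it, that the gene2pubmed file is tab-separated with `tax_id`, `GeneID`, and `PubMed_ID` columns, and that the gene IDs, PubMed IDs, and `CENSUS` mapping shown here are made up for illustration.

```python
from collections import defaultdict

SUBTYPE_ORDER = ["AML", "ALL", "CML", "CLL"]

# Hypothetical census excerpt: gene ID -> leukemia subtypes (invented values).
CENSUS = {"1001": {"AML"}, "1002": {"ALL"}, "1003": {"AML", "ALL"}, "1004": {"CML"}}

# Hypothetical gene2pubmed rows: tax_id, GeneID, PubMed_ID (tab-separated).
GENE2PUBMED = "\n".join([
    "#tax_id\tGeneID\tPubMed_ID",
    "9606\t1001\t111",
    "9606\t1002\t222",
    "9606\t1003\t333",
    "9606\t1001\t444",
    "9606\t1002\t444",
    "9606\t9999\t555",  # gene not in the census, so the document is skipped
    "9606\t1004\t666",
])


def label_documents(gene2pubmed_text, census):
    """Map each PubMed ID to a class built from its census genes' subtypes."""
    subtypes_by_pmid = defaultdict(set)
    for line in gene2pubmed_text.splitlines():
        if not line or line.startswith("#"):
            continue
        _tax_id, gene_id, pmid = line.split("\t")
        if gene_id in census:
            subtypes_by_pmid[pmid] |= census[gene_id]
    # Assumption: a combined class is the union of subtypes, in a fixed order.
    return {
        pmid: "&".join(s for s in SUBTYPE_ORDER if s in subtypes)
        for pmid, subtypes in subtypes_by_pmid.items()
    }


def per_class_scores(y_true, y_pred):
    """Return {class: (precision, recall, F1)} for every class seen."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1)
    return scores


if __name__ == "__main__":
    print(label_documents(GENE2PUBMED, CENSUS))
    print(per_class_scores(["AML", "AML", "ALL", "CML"],
                           ["AML", "ALL", "ALL", "CML"]))
```

Note that a plain union could produce combinations outside the eight classes listed above (for example, three subtypes at once); how the thesis handled such cases is not stated here.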
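Abstract (Chinese) i
Abstract (English) iii
Acknowledgements v
Table of Contents vii
List of Figures ix
List of Tables xi
Chapter 1. Introduction 1
1.1 Research Background 1
1.2 Research Motivation and Objectives 2
1.3 Overview of Leukemia 3
1.4 Overview of the Experiments 4
1.5 Thesis Organization 4
Chapter 2. Literature Review 7
2.1 Applications of Text Mining 7
2.2 Document Classification and Data Preprocessing 8
2.2.1 Document Feature Extraction 8
2.2.2 Feature Selection Strategies 9
2.2.3 Vector Space Model 10
2.3 Document Classification Techniques 12
2.3.1 Naïve Bayes Classifier 13
2.3.2 k-Nearest Neighbor Algorithm 14
2.3.3 Support Vector Machines 16
2.3.4 Decision Trees 17
2.3.5 Artificial Neural Networks 18
2.4 Medical Literature Databases 20
2.4.1 PubMed 20
2.4.2 Entrez Gene 21
2.4.3 Sanger Cancer Genome Project 22
Chapter 3. Neural Networks and Multi-class Classification 25
3.1 Neural Networks 25
3.1.1 Perceptron 25
3.1.2 Activation Function 26
3.1.3 Multilayer Perceptron 27
3.1.4 Back-Propagation Algorithm 28
3.2 Naïve Bayes and Decision Tree Classifiers 33
3.2.1 Naïve Bayes Classifier 33
3.2.2 Decision Tree Classifier 35
3.3 Multi-class Classification 38
3.3.1 One-against-all Classification 38
3.3.2 One-against-one Classification 41
Chapter 4. Research Method 45
4.1 Collection and Preprocessing of the Leukemia Gene Document Set 46
4.1.1 Collection of the Leukemia Gene Document Set 47
4.1.2 Preprocessing of the Document Set 47
4.2 Feature Filtering for Leukemia Gene Documents 47
4.2.1 Feature Selection Workflow for Cancer Gene Documents 48
4.2.2 Analysis Results for Gene and Cancer Name Data 49
4.2.3 Leukemia Gene Hierarchy 53
4.2.4 Cancer Gene Name List and Feature Selection 53
4.3 Document Classification 56
4.3.1 Classifier Training and Testing 56
4.3.2 Evaluation of Classification Performance 59
4.3.3 Multi-class Document Classification Test 61
Chapter 5. Classification Experiment Results 63
5.1 Training and Test Document Sets 63
5.2 Training and Test Results for Each Classifier 65
5.2.1 Feature Dimension Filtering 65
5.2.2 Performance Evaluation of Each Classifier 65
5.2.3 Performance Comparison with Other Classification Algorithms 69
5.3 Multi-class Document Classification Test Results 72
5.3.1 Performance Evaluation of Each Classifier 72
5.3.2 Performance Comparison with Other Classification Algorithms 73
5.3.3 Classification Performance Evaluation on Gene Document Data 74
5.4 Experimental Results and Discussion 76
Chapter 6. Conclusion 77
6.1 Chapter Review 77
6.2 Discussion of Research Results 78
6.3 Conclusions and Contributions of This Study 81
6.4 Research Limitations 83
6.5 Future Work 84
References in English 87
References in Chinese 93
Related Websites 93
Appendix. Cancer Gene Census (Leukemia) 95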