Thesis title (English): Generalized Dirichlet Priors for Naïve Bayesian Classifiers with Multinomial Models in Classifying Gene Sequence Data
Advisor (English): Tzu-Tsung Wong
Keywords (English): Dirichlet distribution; gene sequence data classification; generalized Dirichlet distribution; naïve Bayesian classifier
With the passing of time, biologists are no longer limited to making observations on Petri dishes in the laboratory. They can now readily obtain samples from the natural world by using technology developed for metagenomics. Although this technology helps in studying the relationships among species and the places where they live, samples obtained in this way cannot be analyzed by traditional methods. This research proposes a new operational mechanism for naïve Bayesian classifiers to classify gene sequence data for biologists. Since the number of class values, or species, is generally over one hundred, and the number of features extracted from gene sequence data can exceed ten thousand, each feature carries relatively little information for classification. In this case, priors can play an important role in the operation of the naïve Bayesian classifier. Dirichlet and generalized Dirichlet distributions have been shown to be appropriate priors for improving the performance of the naïve Bayesian classifier, so this research adopts them to enhance prediction accuracy on gene sequence data. The experimental results on two gene sequence data sets demonstrate that priors are helpful in classifying gene sequence instances, and that a significant improvement can be achieved on a gene sequence data set for which the original prediction accuracy is poor.
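To make the role of the prior concrete, the following is a minimal sketch of a multinomial naïve Bayesian classifier whose feature probabilities are posterior means under a symmetric Dirichlet prior. It is not the thesis's procedure: the use of overlapping k-mers as features, the function names, and the single fixed parameter `alpha` are all assumptions made for illustration, and the generalized Dirichlet case is not covered.

```python
import math
from collections import Counter


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a sequence (assumed feature extraction)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def train(samples, k=3):
    """Collect per-class k-mer counts from (sequence, label) pairs."""
    class_counts = Counter()
    feature_counts = {}
    vocab = set()
    for seq, label in samples:
        class_counts[label] += 1
        counts = kmer_counts(seq, k)
        feature_counts.setdefault(label, Counter()).update(counts)
        vocab.update(counts)
    return feature_counts, class_counts, vocab


def predict(seq, model, alpha=1.0, k=3):
    """Return the class with the highest log posterior score.

    alpha is the parameter of a symmetric Dirichlet prior (an assumption);
    the posterior mean of each feature probability is
    (count + alpha) / (total + alpha * |vocab|).
    """
    feature_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        counts = feature_counts[label]
        denom = sum(counts.values()) + alpha * len(vocab)
        score = math.log(class_counts[label] / total)
        for kmer, n in kmer_counts(seq, k).items():
            if kmer not in vocab:
                continue  # k-mers never seen in training are ignored
            score += n * math.log((counts[kmer] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With many classes and a very large feature set, most per-class counts are zero or tiny, so the value of `alpha` has a strong influence on the estimated probabilities; this is the sense in which the choice of prior matters here.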
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Symbols
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Procedure
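Chapter 2 Literature Review
2.1 Naïve Bayesian Classifier
2.1.1 Basic Operating Principles
2.1.2 Applications of the Naïve Bayesian Classifier
Naïve Bayesian Classifiers for Document Classification
Naïve Bayesian Classifiers for Gene Sequence Classification
2.2 Smoothing Constants
2.3 Dirichlet and Generalized Dirichlet Distributions
2.3.1 Formulas for the Dirichlet Distribution
2.3.2 Formulas for the Generalized Dirichlet Distribution
2.3.3 Relationship between the Dirichlet and Generalized Dirichlet Distributions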
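Chapter 3 Methodology
3.1 Gene Sequence Classification Procedure and Description
3.2 Preprocessing of Gene Sequence Data
3.3 Multinomial Model
3.4 Adjustment and Correction of Prior Distribution Parameters
3.5 Methods for Finding the Best Prior Distribution Parameters
3.5.1 Searching for Dirichlet Distribution Parameters
3.5.2 Searching for Generalized Dirichlet Distribution Parameters
3.6 Validation Method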
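Chapter 4 Empirical Study
4.1 Description of the Data Sets
4.2 Empirical Results for the Dirichlet Distribution
4.3 Empirical Results for the Generalized Dirichlet Distribution
4.4 Summary
Chapter 5 Conclusions and Suggestions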
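References
Appendix 1 Accuracy Changes under the Dirichlet Distribution: Bacteria Data Set
Appendix 2 Accuracy Changes under the Dirichlet Distribution: Fungi Data Set
Appendix 3 Accuracy Changes under the Generalized Dirichlet Distribution: Bacteria Data Set
Appendix 4 Accuracy Changes under the Generalized Dirichlet Distribution: Fungi Data Set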

