
National Digital Library of Theses and Dissertations in Taiwan

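Author: Pei-Chen Peng (彭姵晨)
Thesis title: Gene Regulatory Network Inference and Complex Phenotype Prediction from Genetical Genomics Data (以遺傳基因體資料推導基因調控網路及預測複雜表現型)
Advisor: Kun-Mao Chao (趙坤茂)
Committee members: Hung-Lung Wang (王弘倫), Yi-Ching Chen (陳怡靜)
Oral defense date: 2013-06-11
Degree: Master's
Institution: National Taiwan University (國立臺灣大學)
Department: Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: English
Number of pages: 47
Keywords: genetical genomics; gene regulatory network reconstruction; complex phenotype prediction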
Gene expression profiling and genotype data have grown exponentially in recent years. To make broad use of these data, integrated analysis of genetical genomics data has become a trend, and identifying the genes relevant to a specific response is an essential task. We propose a simple workflow for genetical genomics data analysis that first integrates genotype and gene expression data and then applies Random Forest feature selection. The workflow is applied to two problems: gene regulatory network inference and complex phenotype prediction. For network inference, fifteen gene networks of one thousand genes each are reconstructed, and inference performance is measured by the area under the Receiver Operating Characteristic curve and the area under the Precision-Recall curve. For phenotype prediction, the disease susceptibility of soybean plants is predicted, and predictions are evaluated with Spearman's rank correlation coefficient. The results show that our method outperforms other methods on both simulated and real genetical genomics data. Integrating genotype and gene expression data is a pivotal step in genetical genomics data analysis, and Random Forest is well suited to identifying relevant genes for further applications.
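The three evaluation measures named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the thesis's code: the function names, the `(regulator, target)` edge representation, and the toy data are all hypothetical, and the area under the Precision-Recall curve is approximated here by average precision, which is one common convention among several.

```python
from typing import List, Sequence, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target); hypothetical representation


def auroc_and_aupr(ranked_edges: Sequence[Edge],
                   true_edges: Set[Edge]) -> Tuple[float, float]:
    """Score a ranked edge list (most confident first) against a gold standard.

    Assumes every candidate edge appears exactly once in ranked_edges.
    AUPR is approximated by average precision.
    """
    n_pos = sum(1 for e in ranked_edges if e in true_edges)
    n_neg = len(ranked_edges) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true and one false edge")
    tp = 0
    correctly_ordered_pairs = 0  # (true, false) pairs with the true edge ranked higher
    precision_sum = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            tp += 1
            precision_sum += tp / rank  # precision at each recovered true edge
        else:
            correctly_ordered_pairs += tp
    return correctly_ordered_pairs / (n_pos * n_neg), precision_sum / n_pos


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy data (made up for illustration only).
ranked = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
gold = {("A", "B"), ("B", "C")}
auroc, aupr = auroc_and_aupr(ranked, gold)
rho = spearman([0.1, 0.4, 0.35, 0.8], [1.0, 2.0, 3.0, 4.0])
print(auroc, aupr, rho)
```

In this toy case the first and third ranked edges are true, so three of the four (true, false) edge pairs are ordered correctly, and the predicted and observed phenotype values disagree only in the order of the two middle samples.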

Oral Defense Committee Certification
Acknowledgements
Abstract (Chinese)
Abstract

1 Introduction
1.1 Feature Selection in Bioinformatics
1.2 Gene Regulatory Network
1.3 Complex Phenotype Prediction
1.4 Workflow of Genetical Genomics Data Analysis and Its Applications
2 Genetical Genomics Data Analysis
2.1 Genetical Genomics
2.2 Integration of Genotype and Gene Expression Data
2.3 Feature Selection by Random Forest
3 Gene Regulatory Network Inference
3.1 Genetical Genomics Data Collection
3.2 Method
3.2.1 Network Inference as a Feature Selection Problem
3.2.2 Genetical Genomics Data Analysis
3.2.3 Ranked Edge List
3.3 Results
3.3.1 Evaluation of Performance
3.3.2 Assessment of Each Network Inference
4 Complex Phenotype Prediction
4.1 Genetical Genomics Data Collection
4.2 Method
4.2.1 Genetical Genomics Data Analysis
4.2.2 Support Vector Regression
4.3 Results
4.3.1 Evaluation of Performance
4.3.2 Assessment of Predictions
5 Concluding Remarks
5.1 Genetical Genomics Data Analysis Performs Well in Simulated and Real Data
5.2 Integration of Genotype and Expression Data is Pivotal
5.3 Random Forest is Suitable for Genetical Genomics Data Feature Selection

Bibliography

