National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

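Author: 陳思萍
Author (English): Szu-Ping Chen
Thesis title: 利用基因體選拔確認最佳的基因型
Thesis title (English): Identification of the best genotypes from a breeding population via genomic selection
Advisors: 董致韡, 廖振鐸
Advisors (English): Chih-Wei Tung, Chen-Tuo Liao
Oral defense committee: 蔡欣甫, 高振宏
Oral defense committee (English): Shin-Fu Tsai, Chen-Hung Kao
Oral defense date: 2023-06-16
Degree: Master's
Institution: National Taiwan University (國立臺灣大學)
Department: Department of Agronomy
Discipline: Agricultural Sciences
Field: General Agriculture
Thesis type: Academic thesis
Year of publication: 2023
Academic year of graduation: 111 (Republic of China calendar)
Number of pages: 47
Keywords (Chinese): 基因組選拔; 正規化累計折損增益; 廣義決定係數
Keywords (English): genomic selection; NDCG; generalized coefficient of determination; CD
DOI: 10.6342/NTU202302577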
In plant breeding, genomic selection identifies superior lines from their genotype data, without laborious phenotyping. Its success depends on the training set: the genotypes chosen from the population to build it must yield a model that predicts well. In this study, genotypic values are predicted with the genomic BLUP (GBLUP) model, and performance is evaluated with the Normalized Discounted Cumulative Gain (NDCG). Because breeders care mostly about the performance of the best varieties, NDCG, a criterion based on ranking quality, is used as the evaluation standard. We propose using the generalized coefficient of determination (CD) to find the best genotypes for building the training set, and illustrate the method on four datasets: two rice datasets (tropical rice and 44K rice), a wheat dataset, and a soybean dataset. All simulations and analyses were carried out in R. The simulation results show that training sets chosen by the CD method rank superior lines more accurately than those chosen by the other methods when trait heritability is low or the training set is small.
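The abstract names NDCG as its ranking-based evaluation criterion but does not give the formula. As a rough illustration only, the following Python sketch computes one common form of NDCG at a cutoff k. It assumes a linear gain (the true genotypic value itself), a 1/log2(rank + 1) discount, and non-negative true values. The function names `dcg_at_k` and `ndcg_at_k` and the toy numbers are hypothetical, and the thesis, which used R, may define the gain, discount, or cutoff differently.

```python
import math


def dcg_at_k(true_values, order, k):
    """Discounted cumulative gain of the first k lines in `order`.

    Assumption: the gain is the true value itself, and the line at
    1-based rank r is discounted by 1 / log2(r + 1).
    """
    return sum(
        true_values[idx] / math.log2(rank + 2)  # rank is 0-based here
        for rank, idx in enumerate(order[:k])
    )


def ndcg_at_k(true_values, predicted_values, k):
    """NDCG@k: DCG of the predicted ranking divided by the ideal DCG."""
    n = len(true_values)
    # Rank lines from best to worst by predicted and by true value.
    predicted_order = sorted(range(n), key=lambda i: predicted_values[i], reverse=True)
    ideal_order = sorted(range(n), key=lambda i: true_values[i], reverse=True)
    ideal = dcg_at_k(true_values, ideal_order, k)
    if ideal <= 0:
        raise ValueError("ideal DCG must be positive (non-negative true values assumed)")
    return dcg_at_k(true_values, predicted_order, k) / ideal


# Toy example with three hypothetical lines.
true_values = [3.0, 1.0, 2.0]        # true genotypic values
predicted_values = [0.9, 0.8, 0.1]   # model predictions; line 1 is over-ranked
score = ndcg_at_k(true_values, predicted_values, k=2)
print(round(score, 3))
```

A perfect ranking of the top k lines gives a score of 1. The toy prediction scores below 1 because it places line 1 ahead of the truly better line 2, which is the kind of error among the best lines that a ranking-based criterion is meant to penalize.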
Oral Defense Committee Approval Form
Acknowledgements
Chinese Abstract
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
Chapter 2 Materials and Methods
2.1 Genetic dataset materials
2.1.1 Tropical rice dataset
2.1.2 Wheat dataset
2.1.3 44K rice dataset
2.1.4 Soybean dataset
2.2 Methods
2.2.1 Criteria for Training Set Optimization
2.2.2 Ability of a Training Set to Identify the Best Genotypes
2.2.3 The procedure to evaluate the ability of the training sets obtained by using the CD
2.2.4 A Comparison between optimization criteria
2.2.5 Genotypes selected from each cluster by D-efficiency
2.2.6 Evaluating the robustness of various methods
Chapter 3 Results
3.1 Generalized coefficient of determination (CD)
3.2 Evaluate the efficiency of the training sets
3.3 Comparison between optimization criteria based on WGR model and GBLUP model
Chapter 4 Discussion
4.1 Sampling rule to determine individuals to select in each subpopulation in optimal training set
4.2 Evaluation of the robustness of different methods
APPENDIX
REFERENCES
Electronic full text (Internet release date: 2024-08-01)