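Graduate student: Shu-Hui Wen (溫淑惠)
Thesis title: Multiple Hypothesis Testing in Large-scale Association Studies (大型資料相關性研究之多重檢定問題)
Advisor: Chu-Hsing Hsiao (蕭朱杏)
Degree: Doctoral
Institution: National Taiwan University
Department: Graduate Institute of Epidemiology
Field: Medicine and Health
Discipline: Public Health
Thesis type: Academic thesis
Year of publication: 2004
Graduation academic year: 92 (ROC calendar, i.e. 2003-2004)
Language: Chinese
Number of pages: 141
Keywords (Chinese): multiple testing; two-stage method; Bonferroni method; TPR; FPR; FDR
Keywords (English): two-stage method; Bonferroni method; TPR; FPR; FDR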
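Abstract (Chinese version)
In large-scale association studies that seek to relate a disease to thousands of genetic markers, disease-susceptibility genes are often identified by testing the association between the disease and each of many markers, such as SNPs. This raises the multiple testing problem: the overall type I error can grow as the number of tests increases. The traditional Bonferroni method does not inflate the type I error, but it is overly conservative and therefore reduces power. An initial genome scan, however, puts power first; in other words, reducing the type II error better suits its purpose. This thesis proposes a two-stage approach. In the first stage, when selecting associated SNPs, we do not want to miss SNPs with only moderate association; that is, power is raised so that most disease-associated SNPs are captured. In the second stage, a stricter significance level is used to control the overall false positive rate. The thesis presents the practical procedure of the two-stage method and examines its statistical properties, including derivations of the overall false positive rate and the detection power (TPR). It also recommends the sample size required for the two-stage design and how to choose the sample sizes for the first and second stages. A simulation study then evaluates the performance of the two-stage method and compares its overall false positive rate and TPR with those of the traditional Bonferroni method. The simulation results show that when the allele-frequency difference between the case and control groups is smaller, the two-stage method achieves a better overall TPR than the Bonferroni method, while still adequately controlling the overall false positive rate.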
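The inflation of the type I error described above can be made concrete. The following minimal Python sketch assumes m independent tests whose null hypotheses are all true (an idealization: SNPs in linkage disequilibrium give correlated tests); the function names are illustrative only. It computes the probability of at least one false positive, 1 - (1 - α)^m, both at the unadjusted level α and at the Bonferroni per-test level α/m.

```python
# Sketch: family-wise error rate (FWER) when m tests are run at once.
# Assumes every null hypothesis is true and the tests are independent;
# real SNP tests are often correlated through linkage disequilibrium.


def fwer_independent(per_test_alpha: float, m: int) -> float:
    """P(at least one false positive) for m independent tests at level per_test_alpha."""
    return 1.0 - (1.0 - per_test_alpha) ** m


def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / m


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 10, 100, 10_000):
        level = bonferroni_level(alpha, m)
        print(
            f"m={m:>6}  uncorrected FWER={fwer_independent(alpha, m):.4f}  "
            f"Bonferroni level={level:.1e}  Bonferroni FWER={fwer_independent(level, m):.4f}"
        )
```

With α = 0.05, ten unadjusted tests already give an FWER of about 0.40, while the Bonferroni level keeps it just below 0.05. The price is a per-test threshold of 5 × 10^-6 for 10,000 markers, which is the loss of power the abstract attributes to the Bonferroni method.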
Abstract (English version)
Multiple hypothesis testing is a common problem in genome-wide association studies. As the number of markers increases, the overall false positive rate inflates. The traditional Bonferroni correction is so stringent that the overall power is usually low, which may not serve the primary goal of finding markers with even mild effects. In this thesis, we propose a two-stage selection method to address this problem. The main idea is to maintain substantial power in the first stage and to control the resulting false positives in the second stage. The implementation of the proposed procedure is provided. Its statistical properties, including the rate at which non-associated SNPs are screened out, the overall false positive rate, and the overall true positive rate, are derived. In addition, we recommend how to determine the sample size for each stage. We also illustrate the proposed method with a simulation study and compare it with the Bonferroni method. The two-stage procedure performs better than the Bonferroni method even when the difference in marker allele frequency between the case and control groups is moderate.
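The abstract's main quantities, the overall false positive rate and the overall TPR of a two-stage design, can be computed in closed form under simplifying assumptions. The sketch below is a hypothetical illustration, not the thesis's derivation: it assumes a two-sided z-test on case and control allele frequencies (normal approximation, unpooled variance), independent samples in the two stages, and a stage-2 level α2 fixed in advance. A null SNP then survives both stages with probability α1·α2, an associated SNP with the product of the two stage powers, while a one-stage Bonferroni scan tests every SNP once on all alleles at level α/m. All function names and parameter values are illustrative.

```python
# Sketch: analytic operating characteristics of a simple two-stage screen
# versus a one-stage Bonferroni scan. Hypothetical assumptions (not
# necessarily the thesis's exact procedure):
#   * each SNP is tested with a two-sided z-test comparing case and control
#     allele frequencies (normal approximation, unpooled variance);
#   * stages 1 and 2 use independent samples, and the stage-2 level alpha2
#     is fixed in advance rather than adapted to the number of SNPs selected.
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_power(p_case: float, p_control: float, n_alleles: int, alpha: float) -> float:
    """Approximate power of a two-sided z-test; n_alleles is alleles per group (0 < p < 1)."""
    se = ((p_case * (1 - p_case) + p_control * (1 - p_control)) / n_alleles) ** 0.5
    delta = abs(p_case - p_control) / se  # approximate mean of |z| under the alternative
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(delta - z_crit) + STD_NORMAL.cdf(-delta - z_crit)


def two_stage(p_case, p_control, n1, n2, alpha1, alpha2):
    """Return (false positive rate per null SNP, true positive rate per associated SNP)."""
    fpr = alpha1 * alpha2  # a null SNP must pass two independent tests
    tpr = (two_sided_power(p_case, p_control, n1, alpha1)
           * two_sided_power(p_case, p_control, n2, alpha2))
    return fpr, tpr


def bonferroni_one_stage(p_case, p_control, n_total, alpha, m):
    """Every SNP tested once on all n_total alleles per group at level alpha / m."""
    level = alpha / m
    return level, two_sided_power(p_case, p_control, n_total, level)


if __name__ == "__main__":
    m, alpha, alpha1 = 10_000, 0.05, 0.01
    alpha2 = alpha / (m * alpha1)  # equalizes the per-null FPR with Bonferroni
    for p_case in (0.33, 0.36, 0.40):
        fpr_2s, tpr_2s = two_stage(p_case, 0.30, 1000, 1000, alpha1, alpha2)
        fpr_b, tpr_b = bonferroni_one_stage(p_case, 0.30, 2000, alpha, m)
        print(f"p_case={p_case:.2f}  two-stage: FPR={fpr_2s:.1e} TPR={tpr_2s:.3f}  "
              f"Bonferroni: FPR={fpr_b:.1e} TPR={tpr_b:.3f}")
```

Which design comes out ahead depends on α1, on how alleles are split between the stages, and on whether stage 2 also reuses the stage-1 data, so the printed numbers illustrate the calculation only and do not reproduce the thesis's simulation results.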
Table of Contents

Chapter 1. Background of the Research Problem
  Section 1. Association Studies
  Section 2. Research Motivation
  Section 3. Single-Nucleotide Polymorphism (SNP) Markers as an Example

Chapter 2. Review of Current Methods for Multiple Testing
  Section 1. Notation and Definitions
  Section 2. Existing Methods for the Multiple Testing Problem
    (1) Methods controlling the FWER: Bonferroni-based methods
    (2) Methods controlling the FDR

Chapter 3. The Two-Stage Selection Method
  Section 1. The Difficulty of Reducing Type I and Type II Errors Simultaneously
  Section 2. Procedure of the Two-Stage Method
  Section 3. Two Main Characteristics
  Section 4. Statistical Properties
  Section 5. Evaluation of Different False-Positive Measures
  Section 6. Recommended Evaluation Criteria
  Section 7. Sample Size Estimation

Chapter 4. Simulation Study
  Section 1. Simulation Procedure
  Section 2. Performance of the Two-Stage SNP Selection Method on Simulated Data
    (1) Evaluation of different measures
    (2) Comparison with the Bonferroni method
    (3) Summary of simulation results

Chapter 5. Discussion and Recommendations
  Section 1. Summary and Discussion
  Section 2. Future Research

References
Appendix 1. Introduction to Single-Nucleotide Polymorphisms (SNPs)
Appendix 2. Asymptotic Expectation and Variance of a Ratio via Taylor Expansion
Appendix 3. Derivation of the Power Formula and Sample Size
Appendix 4. Performance of the Test Statistics
Appendix 5. Relationship between FPR and FDR
Appendix 6. Theoretical Values of Different False-Positive Measures for the Two-Stage and Bonferroni Methods
Appendix 7. Bonferroni and Two-Stage Methods: Simulation Results When an Independent Sample (N1+N2) Is Redrawn in the Second Stage
Appendix 8. Simulation Results for the Two-Stage Method Using the BH Procedure in the Second Stage
Simulation Programs
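The table of contents lists a variant in which the second stage applies the BH procedure (Benjamini & Hochberg, 1995) instead of a Bonferroni-type threshold. As a reminder of how that step-up rule controls the FDR, here is a minimal Python sketch; it is a textbook implementation for independent test statistics, not the thesis's simulation code, and the function name is illustrative.

```python
# Sketch of the Benjamini-Hochberg (1995) step-up procedure, which controls
# the false discovery rate (FDR) at level q for independent tests.
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[int]:
    """Return the (sorted) indices of hypotheses rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the LARGEST rank k with p_(k) <= k * q / m.
        if p_values[idx] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, q=0.05))  # → [0, 1]
```

On these ten p-values only the first two are rejected: 0.039 and 0.041 lie below 0.05 but exceed their rank-specific thresholds 3q/m = 0.015 and 4q/m = 0.02, and no larger rank passes either. In a two-stage screen, `p_values` would hypothetically hold the stage-2 p-values of the SNPs carried forward from stage 1.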
References
Barnes, M.R. & Gray, I.C. (2003), Bioinformatics for Geneticists, Hoboken, New Jersey: Wiley.
Benjamini, Y. & Hochberg, Y. (1995), “Controlling the false discovery rate: a practical and powerful approach to multiple testing”, Journal of the Royal Statistical Society, Series B (Methodological), 57, 289-300.
Botstein, D., White, R.L., Skolnick, M. & Davis, R.W. (1980), “Construction of a genetic linkage map in man using restriction fragment length polymorphisms”, American Journal of Human Genetics, 32, 314-331.
Botstein, D. & Risch, N. (2003), “Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease”, Nature Genetics, 33, 228-237.
Brookes, A.J. (1999), “The essence of SNPs”, Gene, 234, 177-186.
Cardon, L.R. & Bell, J.I. (2001), “Association study designs for complex diseases”, Nature Reviews Genetics, 2, 91-99.
Collins, A., Lonjou, C. & Morton, N.E. (1999), “Genetic epidemiology of single-nucleotide polymorphisms”, Proceedings of the National Academy of Sciences of the United States of America, 96, 15173-15177.
Collins, F.S., Guyer, M.S. & Chakravarti, A. (1997), “Variations on a theme: cataloging human DNA sequence variation”, Science, 278, 1580-1581.
Dudoit, S., Shaffer, J.P. & Boldrick, J.C. (2003), “Multiple hypothesis testing in microarray experiments”, Statistical Science, 18, 71-103.
Ewens, W.J. & Spielman, R.S. (2001), “Locating genes by linkage and association”, Theoretical Population Biology, 60, 135-139.
Falk, C.T. & Rubinstein, P. (1987), “Haplotype relative risk: an easy reliable way to construct a proper control sample for risk calculation”, Annals of Human Genetics, 51, 227-233.
Fleiss, J.L. (1981), Statistical Methods for Rates and Proportions, 2nd edition, New York: Wiley.
Ge, Y., Dudoit, S. & Speed, T.P. (2003), “Resampling-based multiple testing for microarray data analysis”, Test, 12, 1-77.
Genovese, C. & Wasserman, L. (2002), “Operating characteristics and extensions of the false discovery rate procedure”, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 64, 499-517.
Gibson, G. & Muse, S.V. (2002), A Primer of Genome Science, Sunderland, Massachusetts: Sinauer.
Haines, J.L. & Pericak-Vance, M.A. (1998), Approaches to Gene Mapping in Complex Human Diseases, New York: Wiley-Liss.
Hochberg, Y. (1988), “A sharper Bonferroni procedure for multiple tests of significance”, Biometrika, 75, 800-802.
Hochberg, Y. & Tamhane, A.C. (1987), Multiple Comparison Procedures, New York: Wiley.
Holm, S. (1979), “A simple sequentially rejective multiple test procedure”, Scandinavian Journal of Statistics, 6, 65-70.
Hommel, G. (1988), “A stagewise rejective multiple test procedure based on a modified Bonferroni method”, Biometrika, 75, 383-386.
Johnson, G.C.L. & Todd, J.A. (2002), “Strategies in complex disease mapping”, Trends in Genetics, 18, S25-S29.
Kelsey, J.L., Whittemore, A.S., Evans, A.S. & Thompson, W.D. (1996), Methods in Observational Epidemiology, New York: Oxford University Press.
Kerem, B. et al. (1989), “Identification of the cystic fibrosis gene: genetic analysis”, Science, 245, 1073-1080.
Kruglyak, L. (1999), “Prospects for whole-genome linkage disequilibrium mapping of common disease genes”, Nature Genetics, 22, 139-144.
Lander, E.S. & Kruglyak, L. (1995), “Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results”, Nature Genetics, 11, 241-247.
Nielsen, R. (2000), “Estimation of population parameters and recombination rates from single nucleotide polymorphisms”, Genetics, 154, 931-942.
Nyholt, D.R. (2004), “A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other”, American Journal of Human Genetics, 74, 765-769.
Ott, J. (1999), Analysis of Human Genetic Linkage, 3rd edition, Baltimore: Johns Hopkins University Press.
Ozaki, K. et al. (2002), “Functional SNPs in the lymphotoxin-α gene that are associated with susceptibility to myocardial infarction”, Nature Genetics, published online, http://www.nature.com/naturegenetics
Page, G.P., George, V., Go, R.C., Page, P.Z. & Allison, D.B. (2003), “‘Are we there yet?’: deciding when one has demonstrated specific genetic causation in complex and quantitative traits”, American Journal of Human Genetics, 73, 711-719.
Reich, D.E., Gabriel, S.B. & Altshuler, D. (2003), “Quality and completeness of SNP databases”, Nature Genetics, 33, 457-458.
Risch, N.J. (2000), “Searching for genetic determinants in the new millennium”, Nature, 405, 847-856.
Risch, N.J. & Merikangas, K. (1996), “The future of genetic studies of complex human diseases”, Science, 273, 1516-1517.
Rom, D.M. (1990), “A sequentially rejective test procedure based on a modified Bonferroni inequality”, Biometrika, 77, 663-665.
Sachidanandam, R. et al. (2001), “A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms”, Nature, 409, 928-933.
Shaffer, J.P. (1986), “Modified sequentially rejective multiple test procedures”, Journal of the American Statistical Association, 81, 826-831.
Shaffer, J.P. (1995), “Multiple hypothesis testing”, Annual Review of Psychology, 46, 561-584.
Sham, P. (1998), Statistics in Human Genetics, New York: John Wiley & Sons.
Shastry, B.S. (2002), “SNP alleles in human disease and evolution”, Journal of Human Genetics, 47, 561-566.
Simes, R.J. (1986), “An improved Bonferroni procedure for multiple tests of significance”, Biometrika, 73, 751-754.
Sorić, B. (1989), “Statistical ‘discoveries’ and effect size estimation”, Journal of the American Statistical Association, 84, 608-610.
Spielman, R.S., McGinnis, R.E. & Ewens, W.J. (1993), “Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM)”, American Journal of Human Genetics, 52, 506-516.
Spjøtvoll, E. (1972), “On the optimality of some multiple comparison procedures”, Annals of Mathematical Statistics, 43, 398-411.
Storey, J.D. (2002), “A direct approach to false discovery rates”, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 64, 479-498.
Storey, J.D. (2003), “The positive false discovery rate: a Bayesian interpretation and the q-value”, The Annals of Statistics, 31, 2013-2035.
Storey, J.D. & Tibshirani, R. (2003), “Statistical significance for genomewide studies”, Proceedings of the National Academy of Sciences of the United States of America, 100, 9440-9445.
Westfall, P.H. & Young, S.S. (1993), Resampling-Based Multiple Testing: Examples and Methods for p-value Adjustment, New York: Wiley.
Witherly, J.L., Perry, G.P. & Leja, D.L. (2001), An A to Z of DNA Science: What Scientists Mean When They Talk about Genes and Genomes, Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press.