National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

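Graduate student: 劉育廷
Graduate student (romanized): Yu-Ting Liu
Thesis title (Chinese): 樣本正規化於跨世代微陣列資料比較之影響
Thesis title (English): Effect of sample-wise normalization on cross-generation comparison of microarray data
Advisor: 歐昱言
Degree: Master's
Institution: Yuan Ze University (元智大學)
Department: Graduate School of Biotechnology and Bioinformatics (生物科技暨生物資訊研究所)
Discipline: Life Sciences
Field: Biotechnology
Thesis type: Academic thesis
Year of publication: 2006
Graduation academic year: 94
Language: Chinese
Number of pages: 45
Keywords (translated from Chinese): sample normalization; microarray
Keywords (English): normalization; microarray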
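This thesis focuses on oligonucleotide microarrays produced by Affymetrix and examines whether data generated by its successive chip generations can, after different normalization methods are applied, reach an acceptable level of accuracy in classification tasks. The Affymetrix oligonucleotide microarray data analyzed here comprise samples from multiple subtypes of four cancers, measured on chips spanning three generations and collected from eight laboratories. For these cross-laboratory, cross-generation data, the thesis examines how several commonly used normalization methods affect classification accuracy. The method first applies Significance Analysis of Microarrays (SAM) to the training data to select discriminative genes, then uses a k-nearest neighbor (KNN) classifier to predict the class of patients whose microarray data come from a different laboratory or a different chip generation. This makes it possible to assess how sample-wise normalization affects the integration of cross-generation and cross-laboratory microarray data. The class prediction accuracies obtained in these experiments can serve as a reference for biomedical researchers choosing a normalization method.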
Past experiments with the popular Affymetrix (Affy) microarrays have accumulated a large number of public data sets. To use them in broader studies, comparability across chip generations and experimental environments is an important research topic. This thesis investigates cross-generation and cross-laboratory prediction in particular: whether a model built on data from one generation (or laboratory) can correctly classify data from another. The work uses eight public data sets covering four cancers, which come from different laboratories and span several generations of Affy human microarrays. Each cancer has several subtypes, and we investigate whether a model trained on one data set correctly differentiates the subtypes in another. The thesis compares several sample normalization approaches intended to make data from different sources more comparable. The results show that rank-based normalization leads to higher prediction accuracy than the other methods.
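The idea behind rank-based sample normalization followed by KNN prediction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's implementation: the function names `rank_normalize` and `knn_predict`, the choice of `k=3`, the Euclidean distance, and the toy expression values are all invented here, and the SAM gene-selection step is omitted entirely.

```python
from collections import Counter
from math import dist


def rank_normalize(sample):
    """Replace each expression value in one sample with its rank (1 = lowest).

    Tied values receive the average of the ranks they span.
    """
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    ranks = [0.0] * len(sample)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and sample[order[end + 1]] == sample[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


def knn_predict(train_samples, train_labels, query, k=3):
    """Majority vote among the k training samples closest to the query (Euclidean)."""
    nearest = sorted(
        range(len(train_samples)), key=lambda i: dist(train_samples[i], query)
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration: four genes, two subtypes "A" and "B".
# The training set and the query sample are reported on very different scales,
# standing in for two data sources that are not directly comparable.
old_chip = [
    [10, 50, 200, 300],
    [12, 60, 180, 320],
    [900, 400, 40, 15],
    [880, 390, 55, 11],
]
old_labels = ["A", "A", "B", "B"]
new_sample = [9.8, 8.6, 5.3, 3.9]  # same gene ordering as subtype B, different scale

# Prediction on raw values versus prediction after per-sample rank normalization.
raw_guess = knn_predict(old_chip, old_labels, new_sample, k=3)
ranked_train = [rank_normalize(sample) for sample in old_chip]
ranked_guess = knn_predict(ranked_train, old_labels, rank_normalize(new_sample), k=3)

print("raw:", raw_guess, "ranked:", ranked_guess)
```

On this toy data the raw prediction is `A` and the rank-normalized prediction is `B`. Ranks are unchanged by any monotone rescaling of a sample, so only the within-sample ordering of genes enters the distance. The example says nothing about how large the effect is on real Affymetrix data.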
Title page
Chinese abstract
English abstract
Acknowledgments
Table of contents
List of figures
List of tables
1. Introduction
1.1 Research motivation
1.2 Research objectives
2. Literature review
2.1 Sample normalization
2.1.1 Quantile normalization (taking the mean)
2.1.2 MRS
2.1.3 Rank
2.1.4 GEO
2.1.5 Cyclic loess
2.1.6 Fast Linear Loess (fastlo)
2.1.7 Quantile discretization (QD)
2.2 Gene selection
2.3 Cross-validation
2.4 Classification
2.4.1 Neural networks
2.4.2 KNN
2.4.3 Decision tree
2.4.4 Support vector machines (SVM)
3. Methods
3.1 Data types
3.2 Data sources
3.3 Analysis workflow
3.4 Analysis software
3.5 Analysis methods
3.5.1 Sample normalization methods
3.5.1.1 Mean normalization
3.5.1.2 Rank normalization
3.5.1.3 Quantile normalization
3.5.1.4 MRS
3.5.1.5 QD
3.5.2 Gene-wise normalization
3.5.3 Gene selection
3.5.4 Cross-validation and classifiers
4. Results and discussion
4.1 Effect of sample normalization on prediction accuracy for cross-generation data
4.2 Effect of sample normalization on out-of-set prediction accuracy for same-generation data
4.3 Effect of sample normalization on within-set prediction accuracy for each data set
5. Conclusion
6. References
Appendix