National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

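Graduate student: 林雯婷
Graduate student (English): Wen-Ting Lin
Thesis title: 單色微晶片資料之系統化預處理與多重顯型之因子萃取
Thesis title (English): Systematic data preprocess procedures and factor extraction of multiple phenotypes for one-color microarray
Advisor: 陳正剛
Advisor (English): Argon Chen
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Industrial Engineering
Discipline: Engineering
Field: Industrial Engineering
Thesis type: Academic thesis
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar)
Language: English
Pages: 47
Chinese keywords: 微晶片 (microarray); 預處理 (preprocessing); 因子萃取 (factor extraction)
English keywords: preprocess; microarray data analysis; normalization; multiple phenotypes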
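To understand the information contained in the genome, microarrays are now widely used as a tool for observing gene expression. Many methods and mechanisms have been proposed for extracting useful information from microarray data; however, the preprocessing of the raw data determines the reliability and accuracy of the information extracted afterwards. The first objective of this study is to propose a systematic preprocessing procedure for raw microarray data. It consists of three steps: intensity reading correction, signal normalization, and bad-spot screening. In the intensity reading correction step, the coefficient of variation is used to evaluate the consistency of the mean and median intensities of the raw data and to decide which to adopt; the correlation between foreground and background intensity is then examined, and the influence of background intensity is corrected. The signal normalization step transforms the corrected data by log transformation, subtraction of the median, and division by the variance, in order to eliminate brightness and contrast differences between chips. After signal normalization, a t-test is used to screen out bad spots among replicated spots.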
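Microarray research is no longer limited to observing the relationship between genes and a single phenotype; a more recent development is to observe multiple phenotypes simultaneously. The second objective of this study is to use factor analysis (FA) to find the independent factors shared among multiple phenotypes, and then to analyze these processed factors as individual phenotypes in order to find differentially expressed genes. Both objectives aim at properly processing experimental data from one-color microarrays so that the subsequent extraction of biological information can be correct and efficient. Finally, we apply the above methods to a set of experimental data on the relationship between gene expression and 19 phenotypes, to validate and illustrate the proposed preprocessing methods.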
Microarrays are widely used to monitor gene expression and thereby yield information about genomes. Although many methods and mechanisms have been proposed for extracting information from microarray data, the preprocessing of the raw expression data determines the accuracy and reliability of the extracted information. The first objective of this research is to implement a systematic procedure for preprocessing the raw intensity readings. The proposed procedure has three steps: rectification of intensity readings, signal normalization, and bad-spot screening. Rectification uses the coefficient of variation (CV) to assess the consistency of the mean and median intensities in the raw readings and to decide which one to employ, and then tests the correlation between foreground and background intensity in order to correct for background effects. Signal normalization transforms the rectified data by logarithm transformation, median subtraction, and deviation division to remove chip-to-chip variation in brightness and contrast. After signal normalization, a t-test is used to screen out bad expression values among replicated spots.
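The intensity selection and normalization steps named above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names are hypothetical, "deviation" is assumed to mean the sample standard deviation, the logarithm is assumed to be base 2, and the rule "prefer the summary with the lower CV" is an assumed reading of the CV-based selection.

```python
import math
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def pick_intensity_summary(mean_replicates, median_replicates):
    """Choose the spot summary that is more consistent across replicates.

    Assumption: the summary with the lower CV is preferred.
    """
    cv_mean = coefficient_of_variation(mean_replicates)
    cv_median = coefficient_of_variation(median_replicates)
    return "mean" if cv_mean < cv_median else "median"


def normalize_chip(intensities):
    """Normalize one chip: log transform, median subtraction, scale division."""
    # Logarithm transformation (readings must be strictly positive).
    logged = [math.log2(x) for x in intensities]
    # Brightness (location) normalization: subtract the chip median.
    chip_median = statistics.median(logged)
    centered = [v - chip_median for v in logged]
    # Contrast (scale) normalization: divide by the sample standard deviation.
    # Assumption: this stands in for the abstract's "deviation division".
    # A chip with identical readings has zero spread and cannot be scaled.
    spread = statistics.stdev(centered)
    return [v / spread for v in centered]


# Two hypothetical chips with the same pattern but different brightness.
chip_a = [100, 200, 400, 800, 1600]
chip_b = [3 * x for x in chip_a]
normalized_a = normalize_chip(chip_a)
normalized_b = normalize_chip(chip_b)
```

Under these assumptions a uniformly brighter chip normalizes to the same values as the original, which is the sense in which the procedure removes chip-to-chip brightness variation.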
More recently, microarray experiments have been conducted not only to relate genes to a single phenotype but also to investigate relations between gene expression levels and multiple phenotypes. The second objective of this research is to apply factor analysis (FA) to extract the underlying co-regulating, independent factors of the multiple phenotypes. Each extracted factor can then be treated as an individual phenotype when testing for differentially expressed genes. Both objectives aim to prepare experimental readings for accurate and effective biological information mining. Finally, a real microarray experiment investigating gene expression in 24 human blood samples with 19 phenotypes is used to demonstrate and test the proposed preprocessing procedures.
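As a rough illustration of factor extraction from multiple phenotypes, the sketch below uses the principal-component method of factor analysis restricted to a single factor, computed by power iteration on the phenotype correlation matrix. The abstract does not state the estimation method, the number of factors, or any rotation used in the thesis, so every choice here is an assumption, and the function names and data are hypothetical.

```python
import math
import statistics


def standardize(columns):
    """Convert each phenotype column to z-scores (sample standard deviation)."""
    z = []
    for col in columns:
        mu, sd = statistics.mean(col), statistics.stdev(col)
        z.append([(v - mu) / sd for v in col])
    return z


def correlation_matrix(z):
    """Pearson correlation matrix computed from standardized columns."""
    n, p = len(z[0]), len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1)
             for j in range(p)] for i in range(p)]


def first_factor_loadings(corr, iterations=200):
    """Loadings of the dominant factor: sqrt(eigenvalue) * eigenvector."""
    p = len(corr)
    # Non-uniform start vector, so that it is unlikely to be orthogonal
    # to the dominant eigenvector (an assumption of power iteration).
    v = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigenvalue = sum(v[i] * corr[i][j] * v[j]
                     for i in range(p) for j in range(p))
    return [math.sqrt(eigenvalue) * x for x in v]


def factor_scores(z, loadings):
    """Least-squares scores for one factor: (L'L)^-1 L'z for each sample."""
    scale = sum(l * l for l in loadings)
    return [sum(loadings[i] * z[i][k] for i in range(len(z))) / scale
            for k in range(len(z[0]))]


# Three hypothetical phenotypes measured on six samples.
phenotypes = [
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 12],
    [6, 5, 4, 3, 2, 1],
]
z = standardize(phenotypes)
loadings = first_factor_loadings(correlation_matrix(z))
scores = factor_scores(z, loadings)
```

The resulting `scores` list has one value per sample and could then play the role of a single derived phenotype in a per-gene test, which is the use the abstract describes for the extracted factors.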
Abstract
Contents
Contents of Figures
Contents of Tables
Chapter 1: Introduction
1.1 Backgrounds and Motivation
Chapter 2: Preprocess of Gene Expression Raw Data and Multiple Phenotypes Analysis
2.1 Rectification of Intensity Reading
2.1.1 Selection of intensity reading
2.1.2 Background intensity correction
2.2 Signal Normalization
2.2.1 Logarithm transformation
2.2.2 Brightness (location) normalization
2.2.3 Contrast (scale) normalization
2.3 Bad signal screening
2.4 Preprocess of Multiple Phenotypes
Chapter 3: Case Study
Chapter 4: Conclusions and Future Researches
References
Appendix A: Spot CV of median intensity and median intensity for selection of intensity in blood dataset.
Appendix B: More Details of Factor Analysis.