National Digital Library of Theses and Dissertations in Taiwan

Author: 陳子堯
Author (English): Tzu-Yao Chen
Title: 利用LINCS L1000 大數據來研究未被註解之基因的功能
Title (English): Investigating the functions of unannotated genes using LINCS L1000 big data
Advisor: 王禹超
Advisor (English): Yu-Chao Wang
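Degree: Master's
Institution: National Yang-Ming University
Department: Institute of Biomedical Informatics
Discipline: Life Sciences
Field: Biochemistry
Thesis type: Academic thesis
Publication year: 2017
Academic year of graduation: 105 (ROC academic year)
Language: English
Pages: 34
Keywords (Chinese, translated): Library of Integrated Network-based Cellular Signatures (LINCS); co-expression network; gene functions
Keywords (English): LINCS; gene-gene co-expression network; gene functions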
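Chinese Abstract
The Library of Integrated Network-based Cellular Signatures (LINCS) is a program led by the U.S. National Institutes of Health. Its aim is to observe, from different perspectives such as gene expression or other biological responses, how cells change under various external perturbations. Among the data types in this resource, the L1000 gene expression data, with more than 1.3 million samples, is the largest. Each sample records the gene expression of a normal or cancer cell under a particular perturbation, and the perturbations fall into many categories, for example silencing the expression of a gene or adding a chemical compound. This study analyzes this large body of data and predicts functions for genes that have not yet been functionally annotated.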
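The gene-gene co-expression network was built from the LINCS gene expression data: computing the Pearson correlation coefficient between each pair of genes shows whether the two genes tend to be co-expressed, which yields the co-expression network. The protein-protein interaction network was built from the protein interaction data in the BioGRID database. Overlaying the two networks gives the combined network, in which every connected gene pair both tends to be co-expressed under perturbation and encodes proteins that interact.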
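The network construction described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the toy expression matrix, the gene names, the PPI edge list, and the 0.9 cut-off on the absolute Pearson correlation are all assumptions made for illustration, since this record does not state the threshold or the preprocessing that was actually used.

```python
import numpy as np

def coexpression_edges(expr, genes, threshold):
    """Return gene pairs whose absolute Pearson correlation meets the threshold.

    expr has one row per gene and one column per sample.
    """
    pcc = np.corrcoef(expr)  # np.corrcoef correlates the rows of its input
    edges = set()
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if abs(pcc[i, j]) >= threshold:
                edges.add(frozenset((genes[i], genes[j])))
    return edges

def combined_network(coexp_edges, ppi_edges):
    """Keep only the pairs present in both networks (the overlay step)."""
    return coexp_edges & ppi_edges

# Toy data (hypothetical): four genes measured in five perturbation samples.
genes = ["A", "B", "C", "D"]
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],  # A
    [2.1, 3.9, 6.2, 8.0, 9.8],  # B rises with A
    [5.0, 4.0, 3.0, 2.0, 1.0],  # C falls as A rises
    [1.0, 3.0, 2.0, 3.0, 1.0],  # D is uncorrelated with the others
])

# Hypothetical PPI edges, standing in for interactions taken from BioGRID.
ppi = {frozenset(("A", "B")), frozenset(("C", "D"))}

coexp = coexpression_edges(expr, genes, threshold=0.9)  # assumed cut-off
combined = combined_network(coexp, ppi)
print(sorted(tuple(sorted(e)) for e in combined))  # [('A', 'B')]
```

Edges are stored as frozensets so that an undirected pair compares equal regardless of order, which reduces the overlay to a set intersection. For a matrix with thousands of genes, a vectorized threshold on the correlation matrix would likely replace the double loop.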
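Because genes with the same or similar functions usually tend to be co-expressed, and their protein products usually interact, we propose a simple comparing method that predicts the function of a target gene from its neighboring genes in the combined network. We also applied the simple comparing method to three different networks to measure how well each one predicts gene function. The results show that the combined network performs better at predicting gene function. The correctness of the functions predicted by the simple comparing method, however, still awaits experimental verification.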
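One way to picture neighbor-based function inference is the Python sketch below. The scoring rule is an assumption: this record does not describe how the simple comparing method ranks candidate functions, so the sketch simply reports every term carried by at least a chosen fraction of the target's annotated neighbors. The gene names, term labels, and the 0.5 fraction are hypothetical.

```python
from collections import Counter

def infer_functions(target, edges, annotations, min_fraction=0.5):
    """Suggest terms for target from the annotations of its network neighbors.

    edges: set of frozenset gene pairs; annotations: gene -> set of terms.
    The min_fraction rule is an illustrative assumption, not the thesis's rule.
    """
    neighbors = {g for e in edges if target in e for g in e if g != target}
    annotated = [g for g in neighbors if annotations.get(g)]
    if not annotated:
        return []  # nothing to infer from
    counts = Counter(term for g in annotated for term in annotations[g])
    return sorted(t for t, c in counts.items()
                  if c / len(annotated) >= min_fraction)

# Hypothetical network: X is the unannotated target gene.
edges = {
    frozenset(("X", "A")),
    frozenset(("X", "B")),
    frozenset(("X", "C")),
    frozenset(("A", "B")),
}
annotations = {
    "A": {"translation", "ribosome biogenesis"},
    "B": {"translation"},
    "C": {"DNA repair"},
}

print(infer_functions("X", edges, annotations))  # ['translation']
```

Here "translation" is carried by two of the three annotated neighbors and passes the 0.5 fraction, while the other two terms each appear on only one neighbor and are dropped.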
Abstract
The Library of Integrated Network-based Cellular Signatures (LINCS) is an NIH program that aims to understand biology by cataloging the changes in gene expression and other cellular processes that occur when cells are exposed to a variety of perturbing agents. The L1000 gene expression data, which include about 1.3 million samples, are the most comprehensive data in LINCS. Each sample records the gene expression state of normal or cancer cells treated with a perturbagen, such as a chemical compound or an shRNA. The aim of this study is to use these data to investigate the functions of unannotated genes.
The gene-gene co-expression network under perturbation was constructed from the LINCS L1000 gene expression data, and the protein-protein interaction (PPI) network was constructed from the BioGRID database. These two networks were then integrated into the combined network. A gene pair in the combined network indicates that the two genes are co-expressed under perturbation and that their corresponding proteins interact. Since genes in the same local cluster tend to have similar functions, we proposed a simple comparing method to infer the functions of unannotated genes from their neighbors in the combined network. The proposed method was also applied to the three types of networks to evaluate network performance, and the results showed that the combined network is better suited to functional inference of unannotated genes. Further experiments are needed to verify the functions inferred by the proposed network-based approach.
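Comparing networks on already annotated genes could be done along the lines of the Python sketch below. The metric is an assumption and may differ from the thesis's definition of network performance: a gene counts as well predicted when the share of its known terms that also appear among its neighbors' terms reaches a cut-off, and performance is the share of such genes. The toy network, term labels, and cut-off values are hypothetical.

```python
def neighbor_terms(gene, edges, annotations):
    """Union of the terms annotated to the direct neighbors of gene."""
    terms = set()
    for edge in edges:
        if gene in edge:
            for other in edge:
                if other != gene:
                    terms |= annotations.get(other, set())
    return terms

def network_performance(edges, annotations, cutoff):
    """Share of annotated genes whose known terms are recovered from neighbors.

    A gene is well predicted when recovered / known >= cutoff (assumed rule).
    Genes without neighbors recover nothing and count as not predicted.
    """
    good = 0
    evaluated = 0
    for gene, known in annotations.items():
        if not known:
            continue
        recovered = known & neighbor_terms(gene, edges, annotations)
        evaluated += 1
        if len(recovered) / len(known) >= cutoff:
            good += 1
    return good / evaluated if evaluated else 0.0

# Hypothetical annotated genes and one toy network.
annotations = {
    "A": {"t1", "t2"},
    "B": {"t1"},
    "C": {"t3"},
    "D": {"t1", "t2", "t4"},
}
edges = {frozenset(("A", "B")), frozenset(("A", "D"))}

print(network_performance(edges, annotations, cutoff=0.5))  # 0.75
print(network_performance(edges, annotations, cutoff=0.8))  # 0.5
```

Running the same function on the co-expression, PPI, and combined edge sets would give one score per network at each cut-off, which is the kind of side-by-side comparison the abstract describes.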
Contents
Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Library of Integrated Network-based Cellular Signatures (LINCS)
1.2 L1000 Gene Expression Data
1.3 Co-expression Networks
1.4 Gene Ontology
1.5 Motivation and Objective
Chapter 2 Materials and Methods
2.1 Method Overview
2.2 Data Collection and Preprocessing
2.3 Calculation of Pearson Correlation Coefficient
2.4 Co-expression Network Construction
2.5 Protein-protein Interaction (PPI) Network Construction
2.6 Combination of PPI Network and Co-expression Network
2.7 A Simple Comparing Method for Function Inference
2.8 Network Performance
Chapter 3 Results
3.1 Network Construction
3.2 Functional Overlapping between Target Gene and Neighbors
3.3 Network Performance
3.4 Comparison of SCM with Gene Enrichment Analysis
3.5 Gene Function Prediction for Unannotated Genes
Chapter 4 Discussion and Conclusions
References

List of Figures
Figure 1 | The processes of L1000 signature generation and analysis
Figure 2 | Workflow of network construction and combination
Figure 3 | Pearson correlation coefficient distribution for giant PCC matrix
Figure 4 | Pearson correlation coefficient distribution after absolute value transformation for giant PCC matrix
Figure 5 | The diagram of transformation from probe name to Entrez ID
Figure 6 | The relationship between degree and network performance for the three networks (cut-off of prediction percentage is 50%)
Figure 7 | The relationship between degree and network performance for the three networks (cut-off of prediction percentage is 80%)
Figure 8 | The relationship between degree and network performance of the co-expression network with degree scale 500 to 2400 (prediction accuracy 50%)
Figure 9 | The relationship between degree and network performance of the co-expression network with degree scale 500 to 2400 (prediction accuracy 80%)

List of Tables
Table 1 | All perturbation types of L1000 gene expression datasets
Table 2 | Sample number by treatment for 15 cell lines
Table 3 | Twenty-nine annotated GO terms of RPL7A
Table 4 | The average degree in ‘good’/‘bad’ group of combined network for different degree thresholds
Table 5 | The average degree in ‘good’/‘bad’ group of co-expression network for different degree thresholds
Table 6 | The average degree in ‘good’/‘bad’ group of PPI network for different degree thresholds
Table 7 | The prediction percentage of SCM and gene enrichment analysis
Table 8 | Some unannotated genes and their inferred functions
References
1. Lamb, J., et al., The Connectivity Map: Using Gene-Expression Signatures to Connect Small Molecules, Genes, and Disease. Science, 2006. 313(5795): p. 1929-1935.
2. Bolstad, B.M., et al., A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 2003. 19(2): p. 185-193.
3. Barrett, T., et al., NCBI GEO: mining tens of millions of expression profiles—database and tools update. Nucleic Acids Research, 2007. 35(suppl 1): p. D760-D765.
4. Wang, Z., N.R. Clark, and A. Ma’ayan, Drug-induced adverse events prediction with the LINCS L1000 data. Bioinformatics, 2016. 32(15): p. 2338-2345.
5. Liu, C., et al., Compound signature detection on LINCS L1000 big data. Molecular BioSystems, 2015. 11(3): p. 714-722.
6. Ji, Z., et al., Integrating genomics and proteomics data to predict drug effects using binary linear programming. PLoS ONE, 2014. 9(7): p. e102798.
7. Young, W.C., K.Y. Yeung, and A.E. Raftery, A Posterior Probability Approach for Gene Regulatory Network Inference in Genetic Perturbation Data. arXiv preprint arXiv:1603.04835, 2016.
8. van Dam, S., et al., Gene co-expression analysis for functional classification and gene–disease predictions. Briefings in Bioinformatics, 2017: p. bbw139.
9. Pearson, K., LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 1901. 2(11): p. 559-572.
10. Zhang, B. and S. Horvath, A general framework for weighted gene co-expression network analysis. Statistical applications in genetics and molecular biology, 2005. 4(1): p. 1128.
11. Liao, Q., et al., Large-scale prediction of long non-coding RNA functions in a coding–non-coding gene co-expression network. Nucleic acids research, 2011. 39(9): p. 3864-3878.
12. Presson, A.P., et al., Integrated weighted gene co-expression network analysis with an application to chronic fatigue syndrome. BMC systems biology, 2008. 2(1): p. 95.
13. Consortium, G.O., The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research, 2004. 32(suppl 1): p. D258-D261.
14. Chua, H.N., W.-K. Sung, and L. Wong, Exploiting indirect neighbours and topological weight to predict protein function from protein–protein interactions. Bioinformatics, 2006. 22(13): p. 1623-1630.
15. Yang, J., et al., The I-TASSER Suite: protein structure and function prediction. Nature methods, 2015. 12(1): p. 7-8.
16. Deng, M., et al., Prediction of protein function using protein–protein interaction data. Journal of Computational Biology, 2003. 10(6): p. 947-960.
17. Kuramochi, M. and G. Karypis, Gene classification using expression profiles: A feasibility study. in Proceedings of the IEEE 2nd International Symposium on Bioinformatics and Bioengineering Conference. 2001. IEEE.
18. Luscombe, N.M., et al., Genomic analysis of regulatory network dynamics reveals large topological changes. Nature, 2004. 431(7006): p. 308-312.
19. Joshi, T., et al., Genome-scale gene function prediction using multiple sources of high-throughput data in yeast Saccharomyces cerevisiae. Omics: a journal of integrative biology, 2004. 8(4): p. 322-333.
20. Stark, C., et al., BioGRID: a general repository for interaction datasets. Nucleic Acids Research, 2006. 34(suppl 1): p. D535-D539.
21. Kendall, M.G., The advanced theory of statistics. 2nd ed. 1946.