



Author: Tzu-Yao Chen
Thesis title: Investigating the functions of unannotated genes using LINCS L1000 big data
Advisor: Yu-Choa Wang
Keywords: LINCS; gene-gene co-expression network; gene functions
The Library of Integrated Network-based Cellular Signatures (LINCS) is an NIH program that aims to understand biology by cataloging the changes in gene expression and other cellular processes that occur when cells are exposed to a variety of perturbing agents. The L1000 gene expression data, which comprise about 1.3 million samples, are the most comprehensive data set in LINCS. Each sample records the gene expression state of normal or cancer cells treated with a perturbagen, such as a chemical compound or an shRNA. The aim of this study is to use these data to investigate the functions of unannotated genes.
A gene-gene co-expression network under perturbation was constructed from the LINCS L1000 gene expression data, and a protein-protein interaction (PPI) network was constructed from the BioGRID database. The two networks were then integrated into a combined network. A gene pair in the combined network indicates that the two genes are co-expressed under perturbation and that their corresponding proteins interact. Because genes in a local cluster tend to have similar functions, we propose a simple comparing method (SCM) that infers the functions of an unannotated gene from its neighbors in the combined network. The method was also applied to all three networks (co-expression, PPI, and combined) to evaluate network performance, and the results show that the combined network is more appropriate for inferring the functions of unannotated genes. Further experiments are needed to verify the functions inferred by the proposed network-based approach.
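As a rough illustration of the workflow described above, the following Python sketch builds a co-expression network by thresholding absolute Pearson correlation coefficients, intersects it with a PPI edge list to obtain a combined network, and infers GO terms for an unannotated gene from its annotated neighbors. The toy expression values, gene names, PPI edges, the correlation cutoff of 0.9, and the neighbor-fraction voting rule are all assumptions made for illustration. The abstract does not specify the details of the thesis's simple comparing method or its thresholds, so this should be read as one plausible interpretation, not the author's implementation.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def coexpression_edges(expr, genes, pcc_cutoff):
    """Return gene pairs whose |Pearson correlation| meets the cutoff.

    expr is a (genes x samples) array; the cutoff value is an assumption.
    """
    pcc = np.corrcoef(expr)  # rows are genes, columns are samples
    return {
        frozenset((genes[i], genes[j]))
        for i, j in combinations(range(len(genes)), 2)
        if abs(pcc[i, j]) >= pcc_cutoff
    }


def neighbors(edges, gene):
    """Return all genes that share an edge with the given gene."""
    return {g for edge in edges if gene in edge for g in edge if g != gene}


def infer_functions(gene, edges, annotations, fraction_cutoff=0.5):
    """Assign a GO term if at least fraction_cutoff of the annotated
    neighbors carry it (a hypothetical voting rule, not the thesis's SCM)."""
    annotated = [n for n in neighbors(edges, gene) if n in annotations]
    if not annotated:
        return set()
    counts = Counter(term for n in annotated for term in annotations[n])
    return {t for t, c in counts.items() if c / len(annotated) >= fraction_cutoff}


# Toy data (hypothetical): four genes measured in five perturbation samples.
genes = ["A", "B", "C", "X"]
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],   # A
    [2.0, 4.0, 6.0, 8.0, 10.0],  # B, perfectly correlated with A
    [5.0, 3.0, 4.0, 1.0, 2.0],   # C, only moderately anti-correlated
    [1.1, 2.0, 2.9, 4.2, 5.0],   # X, the unannotated gene
])

# Hypothetical PPI edges, standing in for interactions taken from BioGRID.
ppi = {frozenset(p) for p in [("A", "X"), ("B", "X"), ("C", "X")]}

# Combined network: pairs that are both co-expressed and interacting.
coexp = coexpression_edges(expr, genes, pcc_cutoff=0.9)
combined = coexp & ppi

# Illustrative GO annotations for the annotated genes.
annotations = {
    "A": {"GO:0006412", "GO:0003735"},
    "B": {"GO:0006412"},
    "C": {"GO:0007049"},
}

print(sorted(infer_functions("X", combined, annotations, 0.5)))
print(sorted(infer_functions("X", combined, annotations, 0.8)))
```

In this toy example the combined network keeps only the A-X and B-X edges, so gene X inherits terms from A and B. Raising the fraction cutoff from 0.5 to 0.8 drops the term carried by only one of the two neighbors, which mirrors the trade-off between coverage and stringency that a cutoff of this kind introduces.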
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Library of Integrated Network-based Cellular Signatures (LINCS)
1.2 L1000 Gene Expression Data
1.3 Co-expression Networks
1.4 Gene Ontology
1.5 Motivation and Objective
Chapter 2 Materials and Methods
2.1 Method Overview
2.2 Data Collection and Preprocessing
2.3 Calculation of Pearson Correlation Coefficient
2.4 Co-expression Network Construction
2.5 Protein-protein Interaction (PPI) Network Construction
2.6 Combination of PPI Network and Co-expression Network
2.7 A Simple Comparing Method for Function Inference
2.8 Network Performance
Chapter 3 Results
3.1 Network Construction
3.2 Functional Overlapping between Target Gene and Neighbors
3.3 Network Performance
3.4 Comparison SCM with gene enrichment analysis
3.5 Gene Function Prediction for Unannotated Genes
Chapter 4 Discussion and Conclusions

List of Figures
Figure 1 | The processes of L1000 signature generation and analysis
Figure 2 | Workflow of network construction and combination
Figure 3 | Pearson correlation coefficient distribution for giant PCC matrix
Figure 4 | Pearson correlation coefficient distribution after absolute value transformation for giant PCC matrix
Figure 5 | The diagram of transformation from probe name to Entrez ID
Figure 6 | The relationship between degree and network performance for the three networks (cut-off of prediction percentage is 50%)
Figure 7 | The relationship between degree and network performance for the three networks (cut-off of prediction percentage is 80%)
Figure 8 | The relationship between degree and network performance of the co-expression network with degree scale 500 to 2400 (prediction accuracy 50%)
Figure 9 | The relationship between degree and network performance of the co-expression network with degree scale 500 to 2400 (prediction accuracy 80%)

List of Tables
Table 1 | All perturbation types of L1000 gene expression datasets
Table 2 | Sample number by treatment for 15 cell lines
Table 3 | Twenty-nine annotated GO terms of RPL7A
Table 4 | The average degree in ‘good’/‘bad’ group of combined network for different degree thresholds
Table 5 | The average degree in ‘good’/‘bad’ group of co-expression network for different degree thresholds
Table 6 | The average degree in ‘good’/‘bad’ group of PPI network for different degree thresholds
Table 7 | The prediction percentage of SCM and gene enrichment analysis
Table 8 | Some unannotated genes and their inferred functions