National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

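Graduate student: Wei-Yao Chou (周偉堯)
Thesis title: A Study on Group Feature Extraction Based on Multiple Indexing Sequence Alignment (多重索引序列排比應用於群組特徵擷取之研究)
Advisor: Tun-Wen Pai (白敦文)
Degree: Master's
Institution: National Taiwan Ocean University (國立臺灣海洋大學)
Department: Department of Computer Science and Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: English
Pages: 50
Keywords (Chinese): 共同近似短片段; 多重索引序列排比; 組合式特徵; 獨特群組特徵
Keywords (English): approximate consensus motif; Multiple Indexing Sequence Alignment; combinatorial feature; exclusive group feature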
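Previous studies and biological experiments have shown that biological sequences with the same function usually share common approximate short segments (approximate consensus motifs). Accordingly, this thesis proposes a multiple indexing sequence alignment method, based on common similar short segments, to extract common combinatorial group features. First, we search for common similar patterns using either variable-site tolerance or substitution tolerance, which tolerate differences in different ways. Variable-site tolerance allows a pattern to differ at a limited number of residue sites; substitution tolerance allows residues in a pattern to be substituted according to their chemical substitution properties. The search and analysis of common segments rests on the assumption that the more similar two sequences are, the more common segments they share; this assumption provides the basis for hierarchical clustering of multiple sequences and is the first step in group feature extraction. Multiple indexing sequence alignment represents each common segment by a unique numeric value, replaces single characters with local segments, and takes a bottom-up approach to align index sequences segment against segment. By combining cluster analysis with multiple indexing sequence alignment, the combinatorial features of each subgroup can be extracted and analyzed; if the segments shared among subgroups are then removed, the exclusive group features of each subgroup can be filtered out. The proposed methodology has been applied to real biological data and compared with other well-known systems; the results show that the proposed mechanism extracts group features correctly and effectively.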
Previous studies and biological experiments have revealed that protein sequences with the same functionality usually possess common or highly conserved motifs. Here, a novel method, Multiple Indexing Sequence Alignment (MISA), is developed to extract combinatorial features from a set of family sequences. To search for consensus motifs within a given tolerance, two kinds of tolerance are considered: variable-site tolerance and substitution tolerance. Variable-site tolerance allows a limited number of mismatched residues in a pattern; substitution tolerance allows residues in a pattern to be replaced by chemically similar residues. Based on the consensus motifs found, sequences with high similarity are grouped together to achieve hierarchical clustering. Furthermore, the residue-level sequences are rewritten as sequences of labeled consensus motifs so that MISA can be performed in a labeled-motif versus labeled-motif manner. Combining the results of hierarchical clustering and MISA, the system extracts combinatorial features from each subgroup and identifies the motifs that the target subgroup shares with the others. These common motifs can then be removed to obtain the exclusive group features of the target set. We applied the method to several real biological datasets to show that the proposed mechanisms are practical for extracting combinatorial features. Comparisons with other well-known algorithms also show that the proposed methodology performs better.
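The two tolerance modes, the motif labeling, and the removal of shared motifs described above can be illustrated with a small Python sketch. It is not the thesis's implementation: the function names, the residue groups in `SUBSTITUTION_GROUPS`, and the brute-force k-mer enumeration are all assumptions made for illustration, and the thesis's actual search algorithm and substitution tables are not given in this record.

```python
# Illustrative residue groups by chemical property.
# ASSUMPTION: these groups are hypothetical, not the thesis's tables.
SUBSTITUTION_GROUPS = [set("ILVM"), set("FYW"), set("KRH"),
                       set("DE"), set("STNQ"), set("AG")]


def substitutable(a, b):
    """True if residues a and b are identical or share a group."""
    return a == b or any(a in g and b in g for g in SUBSTITUTION_GROUPS)


def matches_variable_site(pattern, window, max_mismatches):
    """Variable-site tolerance: at most max_mismatches differing sites."""
    if len(pattern) != len(window):
        return False
    return sum(p != w for p, w in zip(pattern, window)) <= max_mismatches


def matches_substitution(pattern, window):
    """Substitution tolerance: every site is identical or substitutable."""
    if len(pattern) != len(window):
        return False
    return all(substitutable(p, w) for p, w in zip(pattern, window))


def find_occurrences(pattern, sequence, matcher):
    """Start positions where matcher(pattern, window) accepts the window."""
    k = len(pattern)
    return [i for i in range(len(sequence) - k + 1)
            if matcher(pattern, sequence[i:i + k])]


def consensus_motifs(sequences, k, max_mismatches):
    """k-mers that occur, within variable-site tolerance, in every sequence.

    ASSUMPTION: exhaustive enumeration, chosen for clarity only.
    """
    def matcher(p, w):
        return matches_variable_site(p, w, max_mismatches)

    candidates = {s[i:i + k] for s in sequences
                  for i in range(len(s) - k + 1)}
    return {c for c in candidates
            if all(find_occurrences(c, s, matcher) for s in sequences)}


def to_index_sequence(sequence, motif_ids):
    """Rewrite a sequence as the ordered list of motif labels it contains.

    motif_ids maps each motif to a unique integer; exact matches only.
    """
    labels = []
    for i in range(len(sequence)):
        for motif, label in sorted(motif_ids.items()):
            if sequence.startswith(motif, i):
                labels.append(label)
    return labels


def exclusive_features(target_motifs, other_motifs):
    """Motifs of the target subgroup that no other subgroup shares."""
    return set(target_motifs) - set(other_motifs)
```

For example, `find_occurrences("ACGT", "TTACGATT", lambda p, w: matches_variable_site(p, w, 1))` reports position 2, where the window `ACGA` differs from the pattern at one site. Aligning the integer lists produced by `to_index_sequence`, rather than the raw residues, is the sense in which the alignment is performed motif against motif.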
ABSTRACT (CHINESE)
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF EQUATIONS
1. INTRODUCTION
2. SYSTEM ARCHITECTURE
3. METHODOLOGY DESCRIPTION
3.1 MOTIF FINDING
3.1.1 Variable-site Tolerance
3.1.2 Substitutable Tolerance
3.2 HIERARCHICAL CLUSTERING
3.3 MULTIPLE INDEXING SEQUENCE ALIGNMENT
3.4 EXCLUSIVE GROUP FEATURE EXTRACTION
3.5 BACKGROUND MODEL ANALYSIS
4. EXPERIMENTAL RESULTS
4.1 COMPARISON WITH EXISTING ALGORITHMS
4.2 TRANSCRIPTION ELEMENTS PREDICTION
4.3 EXCLUSIVE GROUP FEATURE EXTRACTION
5. CONCLUSIONS
REFERENCES
APPENDIX A