跳到主要內容

臺灣博碩士論文加值系統

(18.97.14.86) 您好!臺灣時間:2025/02/15 07:54
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:吳佩芷
研究生(外文):Pei-Chih Wu
論文名稱:近似特徵比對技術應用於唯一樣式之偵測
論文名稱(外文):Approximate Feature Matching Techniques for Unique Pattern Detection
指導教授:白敦文
指導教授(外文):Tun-Wen Pai
學位類別:碩士
校院名稱:國立臺灣海洋大學
系所名稱:資訊工程學系
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2005
畢業學年度:93
語文別:英文
論文頁數:94
中文關鍵詞:專一蛋白質片段近似特徵比對強化合併技術位元分類方法
外文關鍵詞:Unique Peptide MotifApproximate Feature MatchingReinforced Merging AlgorithmBitwise Clustering Method
相關次數:
  • 被引用被引用:0
  • 點閱點閱:274
  • 評分評分:
  • 下載下載:22
  • 收藏至我的研究室書目清單書目收藏:2
蛋白質家族序列係由具備相似序列或相同生物功能的多個蛋白質組成,通常同一家族的蛋白質他們也具備相似的三度空間立體結構,但是有的研究報告顯示某些高度相似性的蛋白質酵素,除了具有共同的催化功用外,還可以經由其特異的序列與其他細胞蛋白質之交互作用,進而衍化出新的功能,因此在一個蛋白質家族中需對每一條序列進行分析,各別擷取出具有代表性之專一蛋白質片段,再依據其所能提供重要的訊息及實驗證明以了解其功能的特異性。本論文提出一套基於強化合併技術之近似擷取演算法以預測專一蛋白質片段的完整系統,透過系統的定位預測來確認專一蛋白質片段的位置並進行後續抗體的研究開發。本系統的設計分為三個子系統,分別為分類、蒐尋及合併子系統,每一個子系統都含有本論文所研提的新演算法技術。分類模組執行傳統分類技術的系統設計,將20種胺基酸依使用者設定之BLOSUM/PAM相似分數矩陣進行分組,並作為後續可容忍字串比對的依據。除了傳統階層結構式的分類技術外,本論文也提出位元編碼及基本位元演算法以進行胺基酸分群運算,並會證明所提出的位元比對演算法的確具有近似特徵分類及正確比對的功能。在蒐尋模組運算中,本論文根據前分類模組之技術分別設計所需之字串蒐尋比對演算法,依完全/近似字串比對演算對可容忍字串的需求進行開發設計,論文中所提出之位元比對技術可以解決在分組過程因交叉組別所造成的誤差,我們也概念性舉例說明如何透過本論文所提之演算法來克服該問題。在最後的合併模組設計中,本論文提出新的字串合併概念,區域性的基本片段將依據鄰近關係進行重疊檢測及合併,由下而上逐漸組成更具代表性的專一蛋白質片段,在論文中我們也將推導鬆散型及嚴密型合併之相對關係,這些合併成果都將反應專一蛋白質片段唯一性之強度。
A protein family is composed of several members with highly homologous sequences and/or similar biological functions. In general, members of a protein family possess similar three-dimensional structures. However, previous experimental results revealed that enzymes with high sequence homology may acquire differential function other than the common catalytic ability, probably due to additional interactions among the variable regions and other cellular proteins. It is thus important to identify and localize the unique peptide motifs in each member of a protein family for functional analysis. In this thesis, we have suggested reinforced merging algorithms to identify the unique peptide motifs present in the highly conservative protein families. This algorithm could efficiently identify the unique peptide motifs from a set of family sequences. The commendable advantages of the proposed algorithms are able to perform approximate matching functions with tolerant characteristics, which will provide more suitable prediction results for bio-related experiments. The proposed systems contain three main phases: clustering, searching, and merging phases. In clustering phase, the module classifies 20 amino acids into different groups based on specified BLOSUM/PAM series of matrices. Traditional and novel clustering methodologies are analyzed and compared in this thesis. Searching phase performs exact/approximate string matching procedures. We have shown examples that our proposed algorithms can provide better results with respect to grouped tolerant characteristics. In the last phase, merging algorithms initiate a novel idea to extract unique peptide motif by bottom-up merging processes. This developed system will be implemented and compared with existing algorithms, and we believe that the developed tools are efficient and effective for biologists to analyze protein sequences prior to their practical laboratory experiments such as peptide antibody design.
Table of Contents
Abstract (in Chinese)………………………………………………………… i
Abstract…………………………………………………………………………iii
Acknowledgements…………………………………………………………………v
Table of Contents………………………………………………………………vi
List of Tables…………………………………………………………………vii
List of Figures………………………………………………………………viii
1 Introduction 1
2 The String Matching Problems 3
2.1 Exact String Matching…………………………………………… 3
2.2 Approximate String Matching…………………………………… 9
3 System Architecture 13
4 Modules Description and Algorithms 15
4.1 Approximate Feature Matching Algorithms………………………16
4.1.1 Amino Acids Clustering based on Hierarchical Method………16
4.1.2 Amino Acids Clustering based on Bitwise Operation Method.21
4.1.2.1 Algorithm Introduction…………………………………………… 22
4.1.2.2 Proof……………………………………………………………………28
4.2 Searching Unique Peptide Motifs Module……………………… 29
4.3 Merge Modules……………………………………………………… 32
4.4 The Primary Pattern Length Analysis Module………………… 40
4.5 Formula for Computing the Probability of Primary Tolerant Pattern......41
5 Simulation Results 44
5.1 Computational Complexity Analysis………………………………44
5.2 Simulation Results……………………..………………………… 45
5.2.1 Examples of Primary Pattern Length Analysis…………………46
5.2.2 Examples of Probability of Primary Tolerant Pattern and Strict-Merging Operation……51
5.3 Experimental Resultss………………………………………………54
6 Conclusions 59
References……………………………………………………………………… 61
Appendices……………………………………………………………………… 66
[1] Michailidis, P.D. and Margaritis, K.G.. (2002), On-Line Approximate String Searching Algorithms: Survey and Experimental Results, Intern. J Computer Math., 79(8), pp. 867-888.
[2] Michailidis, P.D. and Margaritis, K.G.. (2001), String Matching Problem on a Cluster of Personal Computers: Experimental Results, In: Proc. of the 15th International Conference on Systems for Automation of Engineering and Research, 2001, pp. 71-75.
[3] Michailidis, P.D. and Margaritis, K.G.. (1999), A Survey of On-line Approximate String Matching Algorithms, Technical Report, Dept. of Applied Informatics, University of Macedoniz.
[4] Cheng, L.L., Cheung, D.W., Yiu, S.M. (2003), Approximate String Matching in DNA Sequences, In: Eighth International Conference on Database Systems for Advanced Applications (DASFAA), Kyoto, Japan, March 2003, pp. 303-310.
[5] Ukkonen, E., Approximate String-Matching over Suffix Trees (1993), In: Fourth Annual Sysmposium on Combinatorial Matching, Padova, Italy, volume 684 of LNCS, Springer, June 1993, pp. 228-242.
[6] Navarro, G. (2001), A Guided Tour to Approximate String Matching, ACM Computing Surveys, 33(1):31-88.
[7] Boyer, R.S. and Moore, J.S. (1977), A Fast String Searching Algorithm, Communications of the ACM, 20(10):762-772.
[8] Horspool, R.N. (1980), Practical Fast Searching in Strings, Software-Practice and Experience, 10(6):501-506.
[9] Charras, C. and Lecpoq ,T. (1997), Boyer-Moore algorithm, Retrieved December 2004, from Web site: http://www-igm.univ-mlv.fr/~lecroq/string/node14.html.
[10] Charras, C. and Lecpoq ,T. (1997), Horspool algorithm, Retrieved December 2004, from Web site: http://www-igm.univ-mlv.fr/~lecroq/string/node18.html.
[11] Halmos, P.R. (1973), Distance between Strings, Retrieved December 2004, from Web site: http://www.cut-the-knot.org/do_you_know/Strings.shtml.
[12] Black, P.E. (2000), Hamming distance, Retrieved December 2004, from Web site: http://www.nist.gov/dads/HTML/hammingdist.html.
[13] Baeza-Yates, R.A. and Gonnet, G.H. (1992), A New Approach to Text Searching, Communications of the ACM, 35(10): 74-92.
[14] Baeza-Yates, R.A. and Gonnet, G.H. (1994), Fast String Matching with Mismatches, Information and Computation, 180(2): 187-199.
[15] Baeza-Yates, R.A. and Navarro, G. (1996), A Fast Heuristic for Approximate String Matching, In: Proc. of the 3rd South American Workshop on String Processing, Carleton University Press, pp. 47-63.
[16] Baeza-Yates, R.A. and Navarro, G. (1996), A Faster Algorithm for Approximate String Matching, In: Proc. of the 7th Annual Symposium on Combinatorial Pattern Matching, No. 1075, Springer-Verlag, Berlin, pp. 1-23.
[17] Baeza-Yates, R.A. and Navarro, G. (1999), A Faster Algorithm for Approximate String Matching, Algorithmica, 23(2): 127-158.
[18] Baeza-Yates, R.A. and Perleberg, C.H. (1996), Fast and Practical Approximate String Matching, Information Processing Letters, 59(1): 21-27.
[19] Chang, W.I. and Lampe, J. (1992), Theoretical and Empirical Comparisons of Approximate String Matching Algorithms, In: Proc. of the 3rd Annual Symposium on Combinatorial Pattern Matching, No. 664, Springer-Verlag, Berlin, pp. 175-184.
[20] Chang, W. and Lawler, E. (1994), Sublinear Approximate String Matching and Biological Applications, Algorithmica, 12(4-5): 327-344.
[21] Dermouche, A. (1995), A Fast Algorithm for String Matching with Mismatches, Information Processing Letters, 55(2): 105-110.
[22] El-Mabrouk, N. and Chrochemore, M. (1996), Boyer-Moore Strategy to Efficient Approximate String Matching, In: Proc. of 711 Annual Symposium on Combinatorial Pattern Matching, No. 1075, Springer-Verlag, Berlin, pp.24-38.
[23] Galil, Z. and Giancarlo, R. (1986), Improved String Matching with k Mismatches, Sigact News, 17(4): 52-54.
[24] Grossi, R. and Luccio, F. (1989), Simple and Efficient String Matching with k Mismatches, Information Processing Letters, 33(3): 113-120.
[25] Galil, Z. and Park, K. (1990), An Improved Algorithm for Approximate String Matching, SIAM Journal of Computing, 19(6): 989-999.
[26] Jokinen, P., Tarhio, J. and Ukkonen, E. (1996), A Comparison of Approximate String Matching Algorithms, Software-Practice and Experience, 26(12): 1439-1458.
[27] Kurtz, S. (1996), Approximate String Searching under Weighted Edit Distance. In: Proc. of the 3rd South American Workshop on String Processing, Carleton University Press, pp. 156-170.
[28] Landau, G.M. and Vishkin, U. (1986), Efficient String Matching with k Mismatches, Theoretical Computer Science, 43(2–3): 239-249.
[29] Landau, G.M. and Vishkin, U. (1988), Fast String Matching with k Differences, Journal of Computer and System Sciences, 37(1): 63-78.
[30] Landau, G.M. and Vishkin, U. (1989), Fast Parallel and Serial Approximate String Matching, Journal of Algorithms, 10(2): 157-169.
[31] Myers, G. (1998), A Fast Bit-Vector Algorithm for Approximate Pattern Matching Based On Dynamic Programming, In: Proc. of the 9th Annual Symposium on Combinatorial Pattern Matching, No. 1448 , Springer-Verlag, Berlin, pp. 1-13.
[32] Myers, G. (1999), A Fast Bit-Vector Algorithm for Approximate Pattern Matching Based On Dynamic Programming, Journal of the Association for Computing Machinery, 46(3): 395-415.
[33] Navarro, G. (1997), Multiple Approximate String Matching by Counting, In: Proc. of the 4th South American Workshop on String Processing, Carleton University Press, pp. 125-139.
[34] Navarro, G. (1997), A Partial Deterministic Automaton for Approximate String Matching, In: Proc. of the 4th South American Workshop on String Processing, Carleton University Press, pp. 112-124.
[35] Navarro, G. and Raffinot, M. (1998), A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching, In: Proc. of the 9th Annual Symposium on Combinatorial Pattern Matching, No. 1448, Springer-Verlag, Berlin, pp. 14-33.
[36] Pevzner, P. and Waterman, M. (1995), Multiple Filtration and Approximate Pattern Matching, Algorithmica, 13(1–2): 135–154.
[37] Sellers, P.H. (1980), The Theory and Computation of Evolutionary Distance: Pattern Recognition, Journal of Algorithms, 1(4): 359–373.
[38] Sutinen, E. and Tarhio, J. (1995), On Using Q-Gram Locations in Approximate String Matching, In: Proc. 3rd Annual European Symposium, No. 979, Springer-Verlag, Berlin, pp. 327-340.
[39] Takaoka, T. (1994), Approximate Pattern Matching with Samples, In: Proc. of the 5th International Symposium on Algorithm and Computation, No. 834, Springer-Verlag, Berlin, pp. 234-242.
[40] Tarhio, J. and Ukkonen, E. (1993), Approximate Boyer-Moore String Matching, SIAM Journal on Computing, 22(2): 243-260.
[41] Ukkonen, E. (1985), Finding Approximate Patterns in Strings, Journal of Algorithms, 4(1–3): 32-137.
[42] Ukkonen, E. (1992), Approximate String Matching with Q-Grams and Maximal Matches, Theoretical Computer Science, 92(1): 191-211.
[43] Ukkonen, B. and Wood, D. (1993), Approximate String Matching with Suffix Automata, Algorithmica, 10(5): 353-364.
[44] Wu, S. and Manber, U. (1992), Fast Text Searching Allowing Errors, Communications of the ACM, 35(10): 83-91.
[45] Wu, S., Manber, U. and Myers, G. (1996), A Subquadratic Algorithm for Approximate Limited Expression Matching, Algorithmica, 15(1): 50-67.
[46] Wright, A. (1994), Approximate String Matching using Within-Word Parallelism, Software-Practice and Experience, 24(4): 337-362.
[47] Henikoff, S. and Henikoff, J.G. (1992), Amino Acid Substitution Matrices from Protein Blocks, In: Proc. of the National Academy of Sciences, Vol. 89, pp. 10915-10919.
[48] Dayhoff, M. O., Schwartz, R. M. and Orcutt, B. C. (1978). Atlas of Protein Sequence and Structure, Vol. 5, Suppl. 3, National Biomedical Research Foundation, Washington, DC.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top