Author: Pei-Chih Wu
Thesis title: Approximate Feature Matching Techniques for Unique Pattern Detection
Advisor: Tun-Wen Pai
Keywords: Unique Peptide Motif; Approximate Feature Matching; Reinforced Merging Algorithm; Bitwise Clustering Method
A protein family is composed of several members with highly homologous sequences and/or similar biological functions. In general, members of a protein family possess similar three-dimensional structures. However, previous experimental results revealed that enzymes with high sequence homology may acquire distinct functions beyond their common catalytic ability, probably due to additional interactions between the variable regions and other cellular proteins. It is thus important to identify and localize the unique peptide motifs in each member of a protein family for functional analysis. In this thesis, we propose reinforced merging algorithms to identify the unique peptide motifs present in highly conserved protein families. These algorithms can efficiently identify the unique peptide motifs from a set of family sequences. Their main advantage is that they perform approximate matching with tolerance for substitutions among similar amino acids, which provides prediction results better suited to biological experiments. The proposed system contains three main phases: clustering, searching, and merging. In the clustering phase, the module classifies the 20 amino acids into groups based on a specified matrix from the BLOSUM/PAM series. Traditional and novel clustering methodologies are analyzed and compared in this thesis. The searching phase performs exact or approximate string matching. We show examples in which the proposed algorithms provide better results by tolerating substitutions within a group. In the last phase, the merging algorithms introduce a novel idea: extracting unique peptide motifs through bottom-up merging. The developed system will be implemented and compared with existing algorithms, and we believe the resulting tools are efficient and effective for biologists analyzing protein sequences before practical laboratory experiments such as peptide antibody design.
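The three phases described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's reinforced merging algorithm: the amino-acid groups in `GROUPS` are a hypothetical hand-picked partition (not computed from a BLOSUM/PAM matrix), the function names are invented for illustration, and the merging step simply joins overlapping or adjacent windows.

```python
# Hypothetical amino-acid groups for illustration only; the thesis derives
# its groups from BLOSUM/PAM matrices, which this sketch does not do.
GROUPS = ["ILVM", "FWY", "KRH", "DE", "STNQ", "AG", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def encode(seq):
    """Clustering phase: map each residue to the index of its group."""
    return tuple(GROUP_OF[aa] for aa in seq)


def unique_windows(target, others, k):
    """Searching phase: start positions of length-k windows of `target`
    whose group-encoded form occurs in none of the `others`."""
    seen = set()
    for seq in others:
        enc = encode(seq)
        for i in range(len(enc) - k + 1):
            seen.add(enc[i:i + k])
    enc_target = encode(target)
    return [i for i in range(len(enc_target) - k + 1)
            if enc_target[i:i + k] not in seen]


def merge_windows(target, starts, k):
    """Merging phase: join overlapping or adjacent unique windows,
    bottom-up, into longer candidate motifs."""
    spans = []
    for s in sorted(starts):
        if spans and s <= spans[-1][1]:
            spans[-1][1] = max(spans[-1][1], s + k)
        else:
            spans.append([s, s + k])
    return [(a, target[a:b]) for a, b in spans]


if __name__ == "__main__":
    target = "AKLDPFGW"
    family = ["GRIDEFAW"]
    starts = unique_windows(target, family, k=3)
    print(merge_windows(target, starts, k=3))
```

With these groups, differences such as K/R or L/I between the two sequences are tolerated, so only the region around the proline is reported as a candidate unique motif. Exact matching on the raw residues would flag more windows as unique.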
Table of Contents
Abstract (in Chinese)
Table of Contents
List of Tables
List of Figures
1 Introduction
2 The String Matching Problems
2.1 Exact String Matching
2.2 Approximate String Matching
3 System Architecture
4 Modules Description and Algorithms
4.1 Approximate Feature Matching Algorithms
4.1.1 Amino Acids Clustering based on Hierarchical Method
4.1.2 Amino Acids Clustering based on Bitwise Operation Method
Algorithm Introduction
Proof
4.2 Searching Unique Peptide Motifs Module
4.3 Merge Modules
4.4 The Primary Pattern Length Analysis Module
4.5 Formula for Computing the Probability of Primary Tolerant Pattern
5 Simulation Results
5.1 Computational Complexity Analysis
5.2 Simulation Results
5.2.1 Examples of Primary Pattern Length Analysis
5.2.2 Examples of Probability of Primary Tolerant Pattern and Strict-Merging Operation
5.3 Experimental Results
6 Conclusions
References
Appendices
