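Author: Tsung-Hsing Chen (陳宗興)
Title: Protein Sequence Clustering Based on Inter-Score (以相互間關係值進行蛋白質序列分類)
Advisor: 陳淑媛
Degree: Master's
Institution: Yuan Ze University
Department: Department of Computer Science and Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: Chinese
Number of pages: 35
Keywords: protein sequence clustering; inter-score; protein sequence; protein database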
Clustering of protein sequences is widely used to characterize protein function. However, existing methods generally fall short in both accuracy and efficiency, and protein databases continue to grow rapidly, so more accurate and more efficient clustering methods are needed. This thesis aims to improve the accuracy and efficiency of protein sequence clustering by exploiting the characteristics of proteins. First, a relation value between each pair of protein sequences, the inter-score, is computed to support the subsequent dynamic clustering. The sequences are then clustered iteratively, with each iteration guided by the relation values of that iteration, until no more protein sequences can be merged. The method was applied to the human proteins in the Swiss-Prot database and validated against the InterPro database, with precision reaching 87.4% and recall reaching 83.2%.
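The iterative procedure and its evaluation can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not define the inter-score, so `cluster_score` (mean pairwise similarity between two clusters), the `threshold` parameter, the precomputed similarity table `sim`, and the pair-counting definition of precision and recall are all assumptions made for illustration only.

```python
from itertools import combinations


def cluster_score(a, b, sim):
    """Hypothetical stand-in for the inter-score: the mean pairwise
    similarity between members of clusters a and b. Missing pairs count as 0."""
    total = sum(sim.get(frozenset((x, y)), 0.0) for x in a for y in b)
    return total / (len(a) * len(b))


def iterative_clustering(items, sim, threshold):
    """Merge the best-scoring pair of clusters in each round and stop
    when no pair reaches the (assumed) threshold."""
    clusters = [frozenset([item]) for item in items]
    while True:
        best, best_pair = threshold, None
        for a, b in combinations(clusters, 2):
            score = cluster_score(a, b, sim)
            if score >= best:
                best, best_pair = score, (a, b)
        if best_pair is None:  # stopping condition: nothing left to merge
            return clusters
        a, b = best_pair
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]


def pairwise_precision_recall(clusters, reference):
    """Precision and recall over co-clustered pairs, measured against
    reference groups (an assumed evaluation scheme)."""
    def pairs(groups):
        return {frozenset(p) for g in groups for p in combinations(sorted(g), 2)}

    predicted, expected = pairs(clusters), pairs(reference)
    hits = len(predicted & expected)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(expected) if expected else 0.0
    return precision, recall
```

In practice the similarity table would come from pairwise sequence comparison (the thesis's outline mentions BLASTP), and the reference groups from a curated resource such as InterPro. Recomputing scores between clusters after every merge keeps the sketch short but is quadratic per round, so a real implementation would likely cache or update scores incrementally.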
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1. Motivation
1.2. Survey of Related Work
1.3. Proposed Method
1.4. Thesis Organization
Chapter 2 Protein Sequence Classification
2.1. Protein Sequences
2.2. Introduction to Protein Sequence Alignment (BLASTP)
2.3. Similarity
2.4. Scoring Matrices
Chapter 3 Classification by Inter-Score
3.1. Multi-Domain Protein Sequences
3.2. Inter-Score
3.3. Merging
3.4. Classification Algorithm
3.5. Algorithm Flow
3.6. Enhancing the Linkage Effect
3.7. Stopping Condition
Chapter 4 Experimental Results
4.1. Performance Evaluation
4.2. Experimental Results and Discussion
Chapter 5 Conclusions and Future Work
5.1. Comparison of Results
5.2. Contributions to Bioinformatics
5.3. Future Work
References
[1] Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res., 25:3389–3402, 1997.

[2] Pearson, W.R. and Lipman, D.J., Improved tools for biological sequence comparison, Proc. Natl Acad. Sci. USA, 85:2444–2448, 1988.

[3] Krause, A. and Vingron, M., A set-theoretic approach to database searching and clustering, Bioinformatics, 14:430–438, 1998.

[4] Enright, A.J. and Ouzounis, C.A., GeneRAGE: a robust algorithm for sequence clustering and domain detection, Bioinformatics, 16:451–457, 2000.

[5] Matsuda, H., Ishihara, T., and Hashimoto, A., Classifying molecular sequences using a linkage graph with their pairwise similarities, Theor. Comput. Sci., 210:305–325, 1999.

[6] Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D.R., Durbin, R., Falquet, L., Fleischmann, W., Gouzy, J., Hermjakob, H., Hulo, N., Jonassen, I., Kahn, D., Kanapin, A., Karavidopoulou, Y., Lopez, R., Marx, B., Mulder, N.J., Oinn, T.M., Pagni, M., Servant, F., Sigrist, C.J.A., and Zdobnov, E.M., The InterPro database, an integrated documentation resource for protein families, domains and functional sites, Nucleic Acids Res., 29:37–40, 2001.

[7] Bairoch, A. and Apweiler, R., The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000, Nucleic Acids Res., 28:45–48, 2000.

[8] Kriventseva, E.V., Fleischmann, W., Zdobnov, E.M., and Apweiler, R., CluSTr: a database of clusters of SWISS-PROT+TrEMBL proteins, Nucleic Acids Res., 29:33–36, 2001.

[9] Bateman, A., Birney, E., Durbin, R., Eddy, S.R., Howe, K.L., and Sonnhammer, E.L., The Pfam protein families database, Nucleic Acids Res., 28:263–266, 2000.

[10] Attwood, T.K., Croning, M.D.R., Flower, D.R., Lewis, A.P., Mabey, J.E., Scordis, P., Selley, J., and Wright, W., PRINTS-S: the database formerly known as PRINTS, Nucleic Acids Res., 28:225–227, 2000.

[11] Hofmann, K., Bucher, P., Falquet, L., and Bairoch, A., The PROSITE database, its status in 1999, Nucleic Acids Res., 27:215–219, 1999.

[12] Corpet, F., Servant, F., Gouzy, J., and Kahn, D., ProDom and ProDom-CG: Tools for protein domain analysis and whole genome comparisons, Nucleic Acids Res., 28:267–269, 2000.