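Graduate student: Ku-Liang Wang (王谷良)
Thesis title: Using Secondary Structure to Assist Protein Function Prediction and Analysis
Thesis title (English, as filed): Use secondary structure to assist protein domain prediction and analysis
Advisor: Chung-Shyan Liu (留忠賢)
Degree: Master's
Institution: Chung Yuan Christian University
Department: Graduate Institute of Information and Computer Engineering
Field of study: Engineering
Subfield: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2001
Graduation academic year: 89 (Republic of China calendar, i.e. the 2000/2001 academic year)
Language: Chinese
Pages: 68
Keywords (Chinese): protein function prediction; functional domain; protein function database
Keywords (English): domain; protein domain prediction; protein domain database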
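Since the completion of the Human Genome Project, many human genes have been predicted, and the functions they perform in the cell now need to be determined.
Many software tools currently help biologists predict which functional domains a protein sequence contains. However, there is not yet an accurate way to predict a protein's tertiary structure, so most of these tools predict protein function directly from similarities conserved in the primary structure. This helps biologists identify which functions a sequence contains, but the approach has both strengths and weaknesses.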
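Tools that work this way, such as Pfam (multiple sequence alignments and HMM-profiles of protein domains) from Washington University, SMART (Simple Modular Architecture Research Tool) from EMBL (European Molecular Biology Laboratory), and CDD (Conserved Domain Database) from NCBI (National Center for Biotechnology Information), all predict protein function from statistics on conserved primary structure. Their predictions are relatively accurate, but their criteria are strict, so some domains may go undetected and the chance to discover other domains is lost. Over the course of evolution, related or homologous proteins tend to keep very similar structures in order to preserve their function, even when their sequences change, and in such cases the methods above miss some domains. This thesis therefore builds a new prediction pipeline. Domains that Pfam predicts with high scores are passed to the secondary structure prediction program Jnet, and the results are used to build our own per-species database combining primary and secondary structure information for protein domains. The aim is to use secondary structure to make up for part of what primary-structure methods miss, giving biologists more options and letting them predict some domains that the systems above fail to find.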
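The abstract contrasts strict and relaxed search criteria. The effect of loosening a cutoff can be sketched as below; this is a minimal illustration, not the thesis's code. The `DomainHit` record, the `filter_hits` helper, the hit list and both cutoffs are hypothetical, and real hmmpfam output would first have to be parsed into such records. Pfam's own curated gathering thresholds are bit scores; an E-value cutoff is used here only to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    """One domain match from an HMM search (hypothetical, simplified record)."""
    name: str
    start: int
    end: int
    evalue: float


def filter_hits(hits, max_evalue):
    """Keep hits whose E-value is at or below the cutoff, ordered by start position."""
    return sorted((h for h in hits if h.evalue <= max_evalue), key=lambda h: h.start)


# Invented hits for a single protein, for illustration only.
hits = [
    DomainHit("SH2", 10, 95, 1e-20),
    DomainHit("SH3", 120, 175, 3e-4),
    DomainHit("PH", 200, 310, 0.5),
]

strict = filter_hits(hits, 1e-5)   # hypothetical strict cutoff
relaxed = filter_hits(hits, 1.0)   # hypothetical relaxed cutoff
print([h.name for h in strict])    # ['SH2']
print([h.name for h in relaxed])   # ['SH2', 'SH3', 'PH']
```

The relaxed list contains more candidates, including weak ones; in the pipeline described here, such candidates would then be checked against their predicted secondary structure rather than accepted on sequence score alone.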
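Our approach relaxes the search thresholds when running the Pfam primary-structure search, runs Jnet on the resulting domains to predict their secondary structure, and then compares the predicted domains against a protein domain database that we built and classified in advance. In this step, the domains are first aligned together in a multiple sequence alignment, and a phylogenetic tree is drawn to show how they are related. Users can then either display the order in which secondary structure elements appear in each domain, to judge whether the domains are similar, or expand the domains of all protein sequences to see which domains each sequence contains and how they are arranged, and infer protein function from that.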
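The comparison of domains by the order of their secondary structure elements can be sketched as follows, assuming each Jnet prediction is available as a per-residue string of `H` (helix), `E` (strand) and `-` (coil). The helper names, the three-residue minimum element length and the use of Levenshtein distance to compare element orders are illustrative assumptions, not the method described in the thesis; the example predictions are invented.

```python
from itertools import groupby


def sse_order(ss: str, min_len: int = 3) -> str:
    """Collapse a per-residue H/E/- string into its ordered helix/strand elements.

    Coil ('-') runs are dropped, as are helix/strand runs shorter than min_len.
    """
    order = []
    for state, run in groupby(ss):
        if state in "HE" and len(list(run)) >= min_len:
            order.append(state)
    return "".join(order)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two element-order strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


# Hypothetical predictions for two domains with different sequences.
dom_a = "--EEEEE---HHHHHHHH--EEEE-HH--"
dom_b = "-EEEEEE----HHHHHHHHHH---EEEEE--"
print(sse_order(dom_a), sse_order(dom_b))                 # EHE EHE
print(edit_distance(sse_order(dom_a), sse_order(dom_b)))  # 0
```

Two domains whose sequences differ can still yield the same element order, which is the kind of structural resemblance this pipeline tries to surface.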
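With this approach, we hope to address the shortcomings of protein function prediction tools such as SMART, Pfam and NCBI CDD, and to give biologists a more effective tool for predicting protein function.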
Since the completion of the Human Genome Project, many new human genes have been predicted, and it is essential to learn their biological functions in cells.
Several bioinformatics tools have been developed to help biologists predict functional and structural domains from primary protein sequences. However, it is still not possible to predict the precise tertiary structures of novel proteins in silico. Most bioinformatics tools therefore concentrate on identifying conserved primary amino acid sequences and, from them, the possible functional motifs they contain. These include Pfam (multiple sequence alignments and HMM-profiles of protein domains), SMART (Simple Modular Architecture Research Tool) from EMBL (European Molecular Biology Laboratory), and CDD (Conserved Domain Database) from NCBI (National Center for Biotechnology Information). They work well for well-studied protein families that are conserved across evolution, but they are less reliable at predicting new protein domains. In addition, a protein motif may conserve only the key amino acid residues that hold its structural scaffold together, without retaining the rest of its primary sequence.
Current prediction tools have difficulty detecting such domains and motifs. With their current threshold settings, they are likely to miss many functional motifs in newly predicted human proteins. As a result, many proteins cannot be fully annotated when their sequence similarity to known domain sequences falls below about 30%.
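The 30% figure refers to how much of a sequence matches a known domain. A minimal sketch of one common way to compute percent identity from a pairwise alignment follows. Counting only gap-free columns is an assumption (other definitions divide by the full alignment length or by the shorter sequence), and the two aligned fragments are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two aligned sequences of equal length ('-' marks a gap).

    Only columns where neither sequence has a gap are counted.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned fragments, chosen only to illustrate the arithmetic.
query = "MKTAYIAKQR"
known = "MEGSFLPRQV"
pid = percent_identity(query, known)
print(pid)         # 20.0
print(pid < 30.0)  # True: below the ~30% level mentioned in the abstract
```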
In this thesis, a new prediction process was developed on top of Pfam prediction. We first build Pfam-predicted motif databases at lower stringency and then use the secondary structure prediction program Jnet to predict the secondary structure of these motifs. Several proteomes were included to produce more complete datasets: Sarcomissia, Caenorhabditis elegans, Drosophila melanogaster and mouse. By combining this information with visual display tools for secondary structure and phylogenetic analysis, we provide a user-friendly and effective interface for predicting protein motifs with low conservation in their primary sequences. The program complements the Pfam and CDD prediction tools and should be useful for annotating newly discovered human proteins.
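The phylogenetic analysis mentioned above starts from a multiple sequence alignment of the predicted domains (the thesis covers Clustal W in its background chapter). The tree-building step can be sketched as below, assuming the domains are already aligned. The sketch swaps in UPGMA on uncorrected p-distances because it is short; Clustal W itself builds its guide tree by neighbor-joining, so the resulting topologies may differ. The function names, domain names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues over gap-free columns of two aligned sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(aligned: dict) -> str:
    """Cluster aligned sequences with UPGMA; return the topology as a Newick string."""
    size = {name: 1 for name in aligned}  # cluster label (Newick text) -> leaf count
    dist = {frozenset(p): p_distance(aligned[p[0]], aligned[p[1]])
            for p in combinations(aligned, 2)}
    while len(size) > 1:
        # Merge the closest pair of clusters.
        c1, c2 = sorted(min(dist, key=dist.get))
        merged = f"({c1},{c2})"
        n1, n2 = size.pop(c1), size.pop(c2)
        # Size-weighted average distance from the merged cluster to each remaining one.
        for other in size:
            d1 = dist.pop(frozenset((c1, other)))
            d2 = dist.pop(frozenset((c2, other)))
            dist[frozenset((merged, other))] = (n1 * d1 + n2 * d2) / (n1 + n2)
        del dist[frozenset((c1, c2))]
        size[merged] = n1 + n2
    return next(iter(size)) + ";"


# Hypothetical pre-aligned domain sequences.
domains = {
    "A": "MKTAYIAKQR",
    "B": "MKTAYIAKQL",
    "C": "MKSVYLAEQR",
    "D": "MESVYLAEHR",
}
print(upgma(domains))  # ((A,B),(C,D));
```

UPGMA assumes a roughly constant rate of change across lineages, which is one reason alignment tools prefer neighbor-joining; the sketch only shows how a distance matrix becomes a tree.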
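Chinese Abstract
English Abstract
Acknowledgments
Chapter 1 Introduction
Motivation and Objectives
Chapter 2 Background
2.1 Introduction to Proteins
2.1.1 Basic Unit: Amino Acids
2.1.2 Primary Structure of Proteins
2.1.3 Secondary Structure of Proteins
2.1.4 Tertiary Structure of Proteins
2.2 Bioinformatics Software
2.2.1 HMM (Hidden Markov Model)
2.2.2 hmmpfam and Pfam
2.2.3 Jnet
2.2.4 Clustal W
2.3 Protein Function Prediction Workflow
Chapter 3 System Architecture Analysis
3.1 Research Approach
3.1.1 HTTP (Hypertext Transfer Protocol)
3.1.2 CGI (Common Gateway Interface)
3.1.3 JDBC (Java DataBase Connectivity)
3.2 System Environment
3.3 System Database Structure
3.4 Automated Database Construction Process
3.5 System Architecture Analysis
Chapter 4 System Implementation
4.1 Sequence Input Screen
4.2 Function Prediction Screen
4.3 Domain Query Results Screen
4.4 Phylogenetic Tree Results Screen
4.5 Other Function Screens
Chapter 5 Effects of Mutations on Secondary Structure
Chapter 6 Conclusions and Future Work
Appendix
System Program Description
References
About the Author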