National Digital Library of Theses and Dissertations in Taiwan

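Author: 吳冠逸
Author (romanized): Wu, Guan-I
Thesis title: 應用隱藏式馬可夫模型在EST序列訊號上之錯誤區域識別
Thesis title (English): Application of Hidden Markov Models for the Recognition of Erroneous Regions in the Electropherograms of EST Sequences
Advisor: 江彥逸
Advisor (romanized): Chiang, Yen-I
Degree: Master's
Institution: Chang Gung University
Department: Graduate Institute of Information Management
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Publication year: 2005
Graduation academic year: 93 (ROC calendar, i.e. 2004-2005)
Language: Chinese
Keywords (translated from Chinese): hidden Markov models; machine learning; pattern recognition; expressed sequence tags; electropherograms; DNA sequencing technology; bioinformatics
Keywords (English): Hidden Markov Models (HMMs); Machine Learning; Pattern Recognition; Expressed Sequence Tags (ESTs); Electropherograms; DNA Sequencing Technique; Bioinformatics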
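Abstract (translated from Chinese): In the post-genome-project era, biomedical researchers have focused on discovering relationships between genetic markers and clinical phenotypes, hoping that such work will lead to effective treatments against disease; this line of research has therefore become an important and applicable direction. Expressed sequence tags (ESTs) are already widely used in many kinds of sequence analysis (such as gene discovery, polymorphism analysis, and gene prediction) and are a good source of sequence data. From the standpoint of sequencing technology, however, ESTs still contain many erroneous sequence segments caused by the sequencing method itself.
In this thesis, we build hidden Markov models (HMMs) that identify erroneous patterns in sequence electropherograms. The model parameters are obtained by k-fold cross-validation combined with the Viterbi Path Counting (VPC) algorithm. Based on the trained parameters, we develop an automated system that recognizes erroneous regions in the sequence signal and records the recognition results in the annotation of the EST sequence file. We hope that this additional recorded information will give biomedical researchers fuller reference material for the analysis and study of ESTs in genomics research.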
Abstract (English): In the post-Human Genome Project era, much research focuses on discovering associations between genetic markers and clinical phenotypes, with the goal of finding effective treatments against disease. Expressed Sequence Tags (ESTs) are widely used in many kinds of sequence analysis (e.g., gene discovery, polymorphism analysis, and gene prediction). Although ESTs have become a valuable sequence resource, they may contain errors introduced by the sequencing technology.
In this thesis, we apply a machine learning technique, Hidden Markov Models (HMMs), to identify uneven peak patterns in the electropherograms of ESTs produced by automated sequencing machines. The HMM parameters are trained using k-fold cross-validation with the Viterbi Path Counting algorithm. The resulting automated system recognizes erroneous regions and records this additional information in the annotation of the ESTs. We expect this additional annotation to assist biologists in genomics research.
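The training scheme named in the abstract can be illustrated with a minimal sketch. It assumes a discrete HMM over integer-coded observation symbols and a simplified form of Viterbi Path Counting: decode each training sequence with the current model, then add one count to every start state, transition, and emission on the decoded path. The function names (`viterbi`, `vpc_train`, `k_fold_splits`), the count-based initial model, and the number of passes are illustrative assumptions; this record does not specify the thesis's model topology, its encoding of electropherogram peaks, or its exact counting rules.

```python
import math

def _log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")

def normalize(counts):
    """Turn a row of positive counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def viterbi(obs, pi, A, B):
    """Most likely state path for obs under a discrete HMM (log space)."""
    n = len(pi)
    delta = [_log(pi[s]) + _log(B[s][obs[0]]) for s in range(n)]
    backptrs = []
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for s in range(n):
            # Best predecessor state for arriving in state s.
            best = max(range(n), key=lambda r: prev[r] + _log(A[r][s]))
            ptr.append(best)
            delta.append(prev[best] + _log(A[best][s]) + _log(B[s][o]))
        backptrs.append(ptr)
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def vpc_train(sequences, pi_counts, A_counts, B_counts, passes=3):
    """Simplified Viterbi Path Counting (assumed variant, see lead-in).

    The *_counts arguments are positive pseudo-counts encoding an initial
    model; they are copied, not modified.
    """
    pi_c = list(pi_counts)
    A_c = [list(row) for row in A_counts]
    B_c = [list(row) for row in B_counts]
    for _ in range(passes):
        for obs in sequences:
            # Current model = normalized counters.
            pi = normalize(pi_c)
            A = [normalize(row) for row in A_c]
            B = [normalize(row) for row in B_c]
            path = viterbi(obs, pi, A, B)
            # Count what the decoded path used.
            pi_c[path[0]] += 1
            for t, s in enumerate(path):
                B_c[s][obs[t]] += 1
                if t > 0:
                    A_c[path[t - 1]][s] += 1
    return (normalize(pi_c),
            [normalize(row) for row in A_c],
            [normalize(row) for row in B_c])

def k_fold_splits(items, k):
    """Yield (train, test) pairs for k-fold cross-validation."""
    for i in range(k):
        test = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, test
```

A typical use would loop over `k_fold_splits(sequences, k)`, call `vpc_train` on each training fold, and score the held-out fold with `viterbi`. Starting from a structured initial model rather than uniform counts matters here: with identical rows every Viterbi path ties and the counts collapse onto a single state.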
Table of Contents
Advisor's Recommendation Letter
Oral Examination Committee Certification
Acknowledgements
Chinese Abstract
English Abstract
Chapter 1 Introduction
1-1 Research Motivation
1-2 Research Objectives
1-3 Research Process
1-4 Thesis Organization
Chapter 2 Literature Review
2-1 Introduction to DNA Sequences and Sequencing Methods
2-2 Gene Expression and Expressed Sequence Tags
2-3 Erroneous Sequence Signal Patterns
2-4 Introduction to Pattern Recognition
2-5 Introduction to Hidden Markov Models
2-6 Three Characteristics of Hidden Markov Models
2-7 Algorithms Corresponding to the Three Requirements of Hidden Markov Models
2-7-1 Requirement 1
2-7-2 Requirement 2
2-7-3 Requirement 3
Chapter 3 Research Method
3-1 Construction of the Hidden Markov Models
3-2 System Architecture
3-2-1 Preprocessing
3-2-2 Data Conversion
3-2-3 Parameter Estimation
3-2-4 Pattern Recognition
Chapter 4 Validation and System Implementation
4-1 Validation
4-2 System Implementation
4-2-1 Use Case Diagram of the System
4-2-2 Class Diagram of the System
4-2-3 Sequence Diagram of the System
Chapter 5 Conclusions and Future Work
5-1 Conclusions
5-2 Future Work
Appendix 1
References

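List of Figures
Figure 2-1-1 Secondary structure of the DNA molecule
Figure 2-1-2 Main steps of PCR
Figure 2-1-3 The ABI sequencer detects the corresponding symbols from different fluorescent reactions
Figure 2-1-4 Electropherogram after computer interpretation
Figure 2-2-1 Overview of the EST sequencing process
Figure 2-3-1 A strong-signal base A follows a base G signal
Figure 2-3-2 Overview of isoforms
Figure 2-4-1 Pattern recognition system architecture
Figure 2-5-1 Urns and colored balls with three states
Figure 2-5-2 HMMs toy model
Figure 2-7-1 HMMs training method: VPC
Figure 3-1-1 Schematic HMM for a weak-signal base G following a base A signal
Figure 3-1-2 Schematic HMM for a strong-signal base A following a base G signal
Figure 3-1-3 Modified schematic HMM for a weak-signal base G following a base A signal
Figure 3-1-4 Modified schematic HMM for a base C signal following a weak base G signal
Figure 3-1-5 Schematic HMM for consecutive base A signals of equal intensity following a base C
Figure 3-1-6 Overview of the HMMs proposed in this study
Figure 3-2-1 Pattern recognition system architecture proposed in this study
Figure 3-2-2 Bidirectional scan for trimming unstable signal
Figure 3-2-3 Example of unstable-signal trimming
Figure 3-2-4 Initial HMM for the modified weak-signal base G following a base A signal
Figure 3-2-5 Initial HMM for consecutive base A signals of equal intensity following a base C
Figure 4-1-1 Average number of each target pattern
Figure 4-1-2 Average number of patterns described by each submodel, as found by the HMMs
Figure 4-1-3 Bases belonging to the pattern described by submodel 6 misidentified as bases of the pattern described by submodel 7
Figure 4-1-4 ROC curves comparing the recognition performance of the submodels
Figure 4-2-1 Use case diagram of the system developed in this study
Figure 4-2-2 Class diagram of the system
Figure 4-2-3 Sequence diagram for the GraphDrawing use case of the system
Figure 4-2-4 System interface of this study
Figure 4-2-5 Sequence text file output by the system (marking the pattern corresponding to each base and the signal trimming positions)
Figure 4-2-6 Sequence text file output by the system (listing the number of patterns recognized by each submodel and each submodel's start positions within the sequence)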

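List of Tables
Table 3-1-1 List of patterns to be recognized by the system in this study
Table 4-1-1 Average trimming positions of unstable signal
Table 4-1-2 Average false negative and false positive error rates of each submodel on the test set. The second parenthesis for submodel 3 indicates its counting unit. The overall performance in the table does not include submodel 3.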