跳到主要內容

臺灣博碩士論文加值系統

(3.87.250.158) 您好!臺灣時間:2022/01/25 19:43
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:王丁立
研究生(外文):Ting-Li Wang
論文名稱:結合演化式策略和完整式搜尋在DNA序列中尋找有意義的基因片段
論文名稱(外文):Finding Motif in DNA Sequences Using Evolutionary Strategy with Exhausted Search
指導教授:孫春在孫春在引用關係
指導教授(外文):Chuen-Tsai Sun
學位類別:碩士
校院名稱:國立交通大學
系所名稱:資訊科學系
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2002
畢業學年度:90
語文別:中文
論文頁數:62
中文關鍵詞:演化式策略完整式搜尋DNA 序列基因片段
外文關鍵詞:Evolutioinary StrategyExhausted SearchDNA SequencesMotif
相關次數:
  • 被引用被引用:2
  • 點閱點閱:383
  • 評分評分:
  • 下載下載:70
  • 收藏至我的研究室書目清單書目收藏:0
由於Motif Finding這一類的問題已經出現多時,吸引了許多的人投入這一方面的研究,也提出了許多不同的方法。但是由於這些方法在設計上多半有著一些限制,例如只能用在找尋較短的字串,或是不允許資料中含有雜訊等,使得在應用上面也減少了不少的實用性。雖然在之後也有不少兼具實用性和效能的方法被提出來,但是部份的方法在面對由Pevnzer 和 Sze所提出的The Challenge Problem時卻不法順利而正確的解出答案。而本研究的目的即是針對The Challenge Problem進行研究,使用人工智慧方面的方法來解這一個問題。
雖說使用演化式計算的方法可以對這種需要大量計算的問題,以這種方法的特性來快速的算出答案來,但也因為要求要快速的解出答案,伴隨的就是不甚令人滿意的正確性。而和演化式計算相反的則是完整式搜尋。這種方法可以算出幾進百分之百的正確答案,可是卻要花上上百小時,以及大量的記憶體資源。因而我們嘗試將這兩者結合一起,希望可以融合二者的優點,並改善它們本身的缺點。
而最後研究的結果可說是相當的符合期待,可以在正確性上以及效能上獲得一個不錯的平衡點。而本研究的貢獻就是在結合了兩種不同演算法,並得到一個不錯的結果。此外,由於方法觀念並不複雜,並可以和不同的方法進行搭配,進而改進雙方面的成果。

Motif finding is a fundamental problem in Bioinformatics, and have been investigated for the last decade. Several algorithms for finding motif have been developed, but most of them were restricted in solving special types of motif finding problems. Therefore, those algorithms might fail while the DNA sequences length increasing, motif length increasing or noise increasing.
Pevnzer and Sze defined a problem named the Challenge Problem. They used this problem to test the existing motif finding algorithms and found that most algorithms can not effectively solve the Challenge problem. This thesis proposes a novel approach, which synthesize evolutionary strategy and exhausted search, to solve the Challenge Problem.
In this thesis, we tested the algorithm with the Challenge Problem. The experimental results indicate that the proposed approach have the advantages both on efficiency and solving this problem correctly. Furthermore, the proposed approach can also be easily extended to other motif finding problems.

第一章、緒論 1
1.1 前言 1
1.2 研究動機 1
1.3 問題描述 3
1.4 研究目標 4
1.5 論文結構 5
第2章 、相關文獻探討 6
2.1 MOTIF 介紹 6
2.1.2 Motif Finding問題 7
2.2 MOTIF FINDING PROBLEM的分類 9
2.2.1 分類問題 9
2.2.2 找尋特殊的樣式 10
2.2.3 樣式比對和樣式找尋 10
2.2.4 連續的重複(Tandem Repeat) 11
2.2.5 不同的序列形式 11
2.3 現存MOTIF FINDING 演算法 12
2.3.1 完整式搜尋 13
2.3.2 從較短的樣式創造出較長的樣式 14
2.3.3 反覆式啟發演算法 15
2.3.3.1 Gibbs sampling 16
2.3.3 機器自動學習 17
2.3.4 使用額外的相關資訊 19
2.4演化策略演算法 21
第3章 、系統架構之設計 23
3.1 使用雜湊表(HASH TABLE)的完整式搜尋 23
3.2 FINDING MOTIF 使用演化式策略演算法 26
3.2.1 製造所需的子代 26
3.2.2 製造染色體(Chromosome) 29
3.2.3 突變方法 31
3.2.3 評分公式(Scoring Function) 33
3.3 演化式策略結合完整式搜尋 34
第4章 、系統運作流程及實驗 37
4.1 系統運作流程 37
4.1.1 測試資料產生及設計 37
4.1.2 演化式策略 38
4.1.3 完整式搜尋 42
4.1.4 完整式搜尋使用雜湊表 45
4.2 測試資料模擬 50
4.2.1 測試結果比較 50
4.3 測試結果分析 55
第5章 、結論與未來展望 59
5.1 結論 59
5.2 未來展望 60
5.2.1 基因演算法 60
5.2.2 功能擴展 61
參考文獻 62

[1] Bailey, T.L and Elkan, C. Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning, 21(1-2):51-80,Oct 1995.
[2] Bailey, T. L. and Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Proceedings in the 2ND International Conference on Intelligent Systems for Molecular Biology
(ISMB), page: 28-36, 1994.
[3] Benson, G. Tandem Repeats finder : A program to analyze DNA sequence.
Nucleic Acids Research, 27(2):573-580, 1999.
[4] Dumitrescu, D., Lazzerini, B. Jain, L. C., Dumitrescu, A. Evolutionary Computation, CRC Press, 2000.
[5] Branden, C. & Tooze, J., Introduction to Protein Structure. GARLAND
publishing, 1991.
[6] Blanchette, M., Schwikowski, B. and Tompa, M. An exact algorithm to identify motifs in orthologous sequences from multiple species. In Proceedings of Eighth International Conference on Intelligent Systems for Molecular Biology, P37-45, 2000, AAAI Press.
[7] Blanchette, M. Algorithm for phylogenetic footprinting. In RECOMB01 2001: Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, Montreal, Canada, Apr. 2001.
[8] Brazma, A., Jonassen, I., Vilo, J., and Ukkonen, E. Predicting gene regulatory elements in silico on a genomic scale. Genome Reseach, 15:1202-1215, 1998
[9] Bystroff, C., Thorsson, V. & Baker, D. HMMSTR: a Hidden Markov Model for local sequence-structure correlations in Proteins. J. Mol. Biol. 301, 173-190, 2000.
[10] Califano, A. SPLASH: Structural pattern localization analysis bu sequential
histograms. Bioinformatics, 16(4):341-347, 2000.
[11] Claverie, J. M. & Bougueleret, L., Heuristic informational analysis of sequences, Nucl. Acids Res. 14, 179-196, 1986.
[12] Cowan, R. Expected frequencies of DNA patterns using Whittle's formula.
Journal of Applied Probability, 28(4):886-892, 1991.
[13] Coward, E. and Drablos,F. Detecting periodic patterns in biological sequences. Bioinfomatics, 14(6):498-507, 1998.
[14] Durbin, R., Eddy, S. R., Krogh, A., and Mitchison, G. Biological Sequence Analysis. Cambridge University Press.1998.
[15] Grundy, W. N., Bailey, T. L., Elkan, C. P., and Baker, M. E. Meta-MEME: motif-based Hidden Markov Models of Protein families. Computer Applications in the Biosciences, 13(4):397-406, 1997.
[16] Hertz, G. Z. and Stormo, G. D. Deriving ribosomal binding site(RBS) statistically significant alignments of multiple sequences. Bioinformatics, 15(7/8):563-577, 1999
[17] Hu, Y., Sandmeyer, S. and Kibler, D. Detecting Motifs from Sequences. In proceedings of the 16th International Conference on Machine Learning, page: 181-190, 1999.
[18] Hu, Y., Sandmeyer, S., McLaughlin, C. and Kibler, D. Combinatorial Motif Analysis and Hypothesis Generation on Genomic Scale. Bioinfomatics, 2001.
[19] Hughey, R. and Krogh, A. Hidden Markov Models for sequence analysis: extension and analysis of the basic method. Computer Applications in the Bio sequences, 12(2):95-107, 1996.
[20] Ison, J. C., Blades, M. J., Bleasby, A. J., Daniel, S. C., Parish, J. H.,and Findlay, J. B. Key residues approach to the definition of Protein families and analysis is of sparse family structures. Proteins, 40(2):330-331, 2000.
[21] Krogh, A., Brown, M., Mian, I. S., Sjolander, K., and Haussler, D. Hidden Markov Models in computational biology. Applications to Proteins modeling. Journal if Molecular Biology, 235(5):1501-1501, 1994.
[22] Lawrebce, C. E., Altschul, S. F., Boguski, M. S., Liu, J. S., Neuwald, A.F., and Wooton, J. C. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262(5131):208-214, 1993.
[23] Lawrence, C. E., and Reilly. A. A. An expectation maximization(EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences, Proteins, 7(1):41-51, 1990.
[24] Pevzner, P. and Sze, S.-H. Combinational approaches to finding subtle signals in DNA sequences. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, page: 269-278,
San Diego, CA, Aug, 2000.AAAI Press.
[25] Rocke, E, and Tompa, M. An algorithm for finding novel gapped motifs in
DNA sequences. IN Proceedings of the Second Annual International Intelligent Conference on Computational Molecular Biology, page: 228-233, 1998.
[26] Rigoutsos, I., Floratos, A., Parida, L., Gao, Y., and Platt, L. The emergence of pattern discovery techniques in computational biology. Metabolic Engineering, 2(3):159-167, 2000.
[27] Rigoutsos, I. and Floratos, A. Motif discovery without alignment or enumeration. In Proceedings of the second annual international conference in Computational molecular biology(RECOMB)m page: 221-227, New York, 1998.
[28] Sagot, M.-F.. Spelling approximate repeated or common motifs using a suffix tree. In C.L.Lucchesi and A. V. Moura, editors, Latin 1998:Theoretical Informatics, volume 1380 of Lecture Notes in Computer Science,P111-127, Springer, 1998.
[29] Sinha, S. and Tompa, M. A statistical method for finding transcription factor binding sites. In Proceeding of the Eighth International Conference on Intelligent Systems for Molecular Biology, P344-354, 2000, AAAI Press.
[30] Tompa, M. An exact method for finding short motifs in sequences, with application to the ribsomoe binding site problem. In proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology (ISMB), page: 262-271, 1999.
[31] van Helden, J., del Olmo, M., and Pere-Ortin, J. E. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucletide frequencies. Journal of Molecular Biology, 281(5):827-832, 1998.
[32] Wojciech Makalowski, Searching for Regulatory Signals in DNA Sequences -the Nuclear tRNA Genes Example. Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 1991.
[33] NCBI, http://www.ncbi.nlm.nih.gov
[34] TRANSFAC, http://transfac.gbf.de/TRANSFAC/index.html

QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
1. 10. 林恩從、高斐蘭(1998),”台灣地區房地產景氣與經濟、金融變數之共整研究”,東吳經濟商學學報,第20期,p.p.21~46。
2. 13. 吳森田(1997),”房地產價格之決定及其政策涵義之探討”,國立中興大學法商學報,第30期,p.p.173~198。
3. 12. 花敬群、張金鶚(1993),”房地產投機行為之研究”,經社法制論叢,第11期,p.p.327~359。
4. 25. 黃立,GATT/WTO爭端解決機制總論,月旦法學雜誌,第79期,第39-43頁(2001)。
5. 3. 王靖(1999),”房地產業重大訊息與不動產證券市場效率關係之研究”,文大商管學報,第4卷,第1期。
6. 37. 謝銘洋,著作人格權與一般人格權之關係,月旦法學雜誌,第79期,第31-38頁(2001)。
7. 34. 蔡明誠,加入WTO對我國智慧權保護法制的影響,月旦法學雜誌,第79期,第18-23頁(2001)。
8. 29. 楊光華,服務貿易總協定與我國入會承諾,月旦法學雜誌,第79期,第12-17頁(2001)。
9. 42. 劉崇堅(1991),”理想競爭理論及其對解除管制政策之影響”,經社法制論叢,第7期,p.p.119~144。
10. 33. 張炳良(1992),”國家介入與合法性危機-香港的經驗”,中國行政評論,第1卷,第4期,p.p.1~18。
11. 31. 張金鶚、彭建文(1997),”房地產景氣循環與住宅政策”,住宅學報,第6期,p.p.67~69。
12. 27. 陸民仁(2000),”市場機能與政府干預平議”,華信金融季刊,第12期,p.p.11~24。
13. 21. 孫克難(1998),”政府角色與科技專案經費分配的理論基礎”,經濟情勢暨評論,第4卷,第1期,p.p.106~126。
14. 11. 林娟宇(2001),”台灣當前房地產市場景氣變動及其因應對策之研究”,人與地,第208期。
15. 19. 陳明邦,從尊重專業建構優質之專利獨立審查環境,智慧財產權月刊,第32期,第9-13頁(2001)。