跳到主要內容

臺灣博碩士論文加值系統

(18.97.14.88) 您好!臺灣時間:2024/12/04 14:36
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:曹介羽
研究生(外文):Chieh-Yu Tsao
論文名稱:運用短時傅立葉轉換及點矩陣分析於基因重複子區域之預測
論文名稱(外文):Repeat Regions Detection of DNA Sequences by Short Time Fourier Transform
指導教授:鄭振發
指導教授(外文):Cheng-Fa Cheng
學位類別:碩士
校院名稱:國立臺灣海洋大學
系所名稱:通訊與導航工程系
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2007
畢業學年度:95
語文別:英文
論文頁數:58
中文關鍵詞:生物功能重複序列特徵頻率短時傅立葉轉換哥茲柔濾波器自我比對點矩陣粒子群演算法
外文關鍵詞:Biological functionRepeated sequenceCharacteristic frequencyShort-Time Fourier transformDot-matrixAuto-correlation algorithmsParticle swarm optimization algorithm
相關次數:
  • 被引用被引用:0
  • 點閱點閱:324
  • 評分評分:
  • 下載下載:24
  • 收藏至我的研究室書目清單書目收藏:0
摘要

在生物科技的革新以及人類基因體計畫完成的激勵下,DNA序列的分析越來越令人感到興趣。尤其發現DNA重複序列在生物學裡中遺傳疾病、人類基因定位、演化和其他許多重要和令人注意的應用都扮演著重要的角色。面對這樣的問題,發展出一套簡單且有效的方法以分析複雜DNA重複序列之生物功能是極為迫切的。對於DNA序列的分析而言,找尋DNA重複序列的位置是尤其重要的。本論文旨在利用傅立葉轉換、短時傅立葉轉換、點矩陣及自我比對等發展高效率的演算法以找尋在DNA中的重複序列。首先,將介紹重複序列的架構並利用傅立葉轉換以求出該序列的特徵頻率及其對應重複序列之長度。接著,我們將主要特徵頻率加入短時傅立葉轉換,據此可從頻譜圖上的尖峰(或顏色)清楚鑑別出重複序列位置。然後,我們使用一種基於點矩陣和自我比對的混合型演算法以預測重複序列的位置。此法不但具有相當的準確性亦能有效地縮短運算時間。最後,我們將使用修正之粒子群演算法以預測DNA裡的重複序列並發現其對插入式的重複序列有著不錯的成效。本論文將使用National Center for Biotechnology Information (NCBI)資料庫數據作為電腦模擬之依據以據此驗證所提方法之效能及實用性。
Abstract

Stimulated by the biotechnology revolution and the achievement of the Human Genome Project, the analysis of DNA sequence is more and more interesting and attracting. And finding repetitive sequences in DNA is of particular interest in biology due to their role in genetic diseases, human gene mapping, evolution, and many other important and interesting applications. In the face of such a question, it is urgent and far challenging to develop approaches to analyze the complex biological functions of DNA repeated sequence. For DNA sequences analysis, it is of great importance to find the position of DNA repeated sequences. The main purpose of this thesis is to develop efficient algorithms based on digital signal processing techniques such as the DFT, STFT, and Dot-matrix to find the DNA repeated sequences. At the first, structure and fundamental characteristic of the repeat sequence will be introduced. The main characteristic frequency which is the reciprocal of the repeated pattern length will be derived by the FFT. The main characteristic frequency will be incorporated into the short-time Fourier transform such that the repeated sequence locations can be clearly identified as distinct peaks (or colors) in the spectrum. Then a mixed method based on dot-matrix and auto-correlation algorithms will be employed to detect the repeat of the DNA sequences. Furthermore, we adopt the modified particle swarm optimization (MPSO) algorithm to look for the repeated regions of the DNA sequence. The method has a pretty good result to inter-spread repeat sequence. Finally, we take some datasets from the National Center for Biotechnology Information (NCBI) for illustrating the effectiveness of the proposed algorithms.
Contents


Abstract Ⅲ

List of Tables Ⅶ

List of Figures Ⅷ

1. Introduction 1
1-1: Molecular Biology 1
1-2: Motivations and Problem Formulations 4
1-3: Overview of Previous Research 7
1-4: Contributions of the Thesis 8
1-5: Organization of the Thesis 9

2. Structure and Characteristic of DNA Repeat Sequences 11
2-1: Introduction 11
2-2: Structure of Repeat Sequence 12
2-3: The Characteristic Frequency 12
2-4: Results and Discussions 16
2-5: Summary 19

3. Repeat Regions Detection of DNA Sequences Using Short Time Fourier Transform 20
3-1: Introduction 20
3-2: Short Time Fourier Transform 22
3-3: Goertzel Filter Bank Approach 30
3-4: Results and Discussions 31
3-5: Summary 36

4. Repeat Regions Detection of DNA Sequences Using Dot-matrix and Auto-Correlation Algorithms 37
4-1: Introduction 37
4-2: Dot-Matrix 39
4-3: Auto-Correlation Algorithm 42
4-4: Repeat sequence Prediction Algorithm 45
4-5: Results and Discussions 47
4-6: Summary 54

5. Repeat Regions Detection of DNA Sequences Using Modified Particle Swarm Optimization Algorithm 55
5-1: Introduction 55
5-2: Particle Swarm Optimization Algorithm 56
5-3: Repeat Detection Algorithm 59
5-4: Results and Discussions 63
5-5: Summary 66

6. Conclusions 67
Bibliography
Bibliography

[1] B. Lewin, editor. Genes VIII. Oxford University Press, New York, NY, 2004.
[2] D. Tautz, M. Trick and G. A. Dover, “Cryptic simplicity in DNA is a major source of genetic variation,” Nature, vol. 322, pp. 652–656, 1986.
[3] M. L. Pardue, K. Lowenhaupt, A. Rich and, A. Nordheim, “(dC-dA)n (dG-dT)n sequences have evolutionarily conserved chromosomal locations in Drosophila with implications for roles in chromosome structure and functionm,” J. EMBO, vol. 6, pp. 1781–1789, 1987.
[4] P. Bucher and G. Yagil, “Occurrence of oligopurine. oligopyrimidine tracts in eukaryotic and prokaryotic genes, ”DNA Seq., vol. 1, pp. 157–172, 1991.
[5] C. Burge, A. M. Campbell, and S. Karlin, “Over- and underrepresentation of short oligonucleotides in DNA sequences,” Proc. Natl Acad. Sci., USA, vol. 89, pp.1358–1362, 1992.
[6] Q. Lu, L. L. Wallrath, H. Granok, and S. C. Elgin, “(CT)n (GA)n repeats and heat shock elements have distinct roles in chromatin structure and transcriptional activation of the Drosophila hsp26 gene,” Mol.” Cell Biol., vol. 13, pp. 2802–2814, 1993.
[7] T. K. Kundu and M. R. Rao, “CpG islands in chromatin organization and gene expression,” J. Biochem., vol. 125, pp. 217–222, 1999
[8] E. S. Lander, “Initial sequencing and analysis of the human genome,” Nature, vol. 409, pp. 860–921, 2001.
[9] A. Links, “Analysis of the genome sequence of the flowering plant Arabidopsis thaliana,” The Arabidopsis Genome Initiative of Nature, vol. 408, pp. 796–815. 2000.
[10] G. Benson, “Tandem repeats finder: A program to analyze DNA sequences,” Nucleic Acids Res., vol. 27, pp. 573-580, 1999.
[11] S. Kurts and C. Schleiermacher, “REPuter: Fast computation of maximal repeats in Complete genomes,” Bioinformatics, vol. 15, pp. 426-427, 1999.
[12] T. Yoshida, N. Obata, and K. Oosawa, “Color-coding reveals tandem repeats in the Esherichia coli genome,” J. Mol. Biol., vol. 298, pp. 343-349, 2000.
[13] H. Martinez, “An efficient method for finding repeats in molecular sequences,” Nucleic Acids Res., vol. 11, pp.4629–4634. 1983.
[14] J. Devereux, P. Haeberli, and O. Smithies, “A comprehensive set of sequence analysis programs for the VAX,” Nucleic Acids Res., vol. 12, pp. 387–395. 1984.
[15] P. Agarwal and D. States, “The repeat pattern toolkit (rpt): Analyzing the structure and evolution of the c. elegans genome,” Proc. ISMB ‘94, pp. 1–9. 1994.
[16] É. Rivals, M. Dauchet, J. Delahaye, and O. Delgrange, “Fast discerning repeats in DNA sequences with a compression algorithm,” Proceedings of the Workshop on Genome and Informatics, (GIW97), Genome Informatics, Tokyo, vol. 8, pp. 215-266, 1997.
[17] M. Leung, B. Blaisdell, C. Burge, and S. Karlin, “An efficient algorithm for identifying matches with errors in multiple long molecular sequences,” J. Mol. Biol., vol. 221, pp. 1367–1378. 1991.
[18] P. Agarwal and D. J. States, “The Repeat Pattern Toolkit (RPT): analyzing the structure and evolution of the C. elegans genome,” Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, ISMB 94, pp. 1-9, 1994.
[19] S. K. Kannan and E. W. Myers, “An algorithm for locating nonoverlapping regions of maximal alignment score,” SIAM J. Comput., vol. 25, pp. 648–662, 1996.
[20] G. Benson, “Tandem repeats finder: A program to analyze DNA sequences,” Nucleic Acids Res., vol. 27, pp. 573–580, 1999.
[21] RepeatMasker. http://ftp.genome.washington.edu/RM/RepeatMasker.html
[22] J. A. Bedell , I. Korf and, W. Gish, “MaskerAid: A performance enhancement to RepeatMasker,” Bioinformatics., vol. 16, pp. 1040–1041, 2000.
[23] W. Gish and D. J. States. “Identification of protein coding regions by database similarity search,” Nat Genet., vol. 3, pp. 266–272, 1993.
[24] Washington University School of Medicine: Index of /blast/blast. http://blast.wustl.edu/blast
[25] A. L. Delcher, S. Kasif, R. D. Fleischmann, J. Peterson, O. White and, S. L. Salzberg, “Alignment of whole genomes,” Nucleic Acids Res., vol. 27, pp. 2369–2376. 1999.
[26] S. Kurtz and C. Schleiermacher, “REPuter - fast computation of maximal repeats in complete genomes,” Bioinformatics, vol. 15, pp. 426–427, 1999.
[27] S. Kurtz, E. Ohlebusch, C. Schleiermacher, J. Stoye, and R. Giegerich, “Computation and visualization of degenerate repeats in complete genomes,” Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, pp. 228–238. 2000.
[28] J. Parkhill, “Complete DNA sequence of a serogroup a strain of Neisseria meningitidis Z2491,” Nature, vol. 404, pp. 502–506. 2000.
[29] G. Benson, “Tandem repeats finder: a program to analyze DNA sequences,” Nucleic Acids Res., vol. 27, pp. 573–580, 1999.
[30] D. G. Arques and C. J. Michel, ”Periodicities in introns,” Nucleic Acids Res., vol. 15, pp. 7581–7592, 1987.
[31] D. G. Arques and C. J. Michel, “Periodicities in coding and noncoding regions of the genes,” J. Theor. Biol., vol. 143, pp. 307–318, 1990.
[32] S. Tiwari, S. Ramachandran, A. Bhattacharya, S. Bhattacharya, and R. Ramaswamy, “Prediction of probable genes by Fourier analysis of genomic sequences,” Comput. Appl. Biosci., vol. 13, pp. 263–270, 1997.
[33] M. Yan, Z. S. Lin, and C. T. Zhang, “A new Fourier transform approach for protein coding measure based on the format of the Z curve,” Bioinformatics, vol. 14, pp. 685–690, 1998.
[34] B. Issac, H. Singh, H. Kaur, and G. P. S. Raghava, “Locating probable genes using Fourier transform approach,” Bionformatics, vol. 18, pp. 196–197, 2002.
[35] S. H. Nawab and T. F. Quatieri, “Short Time Fourier Transform,” Chapter in Advanced Topics in Signal Processing, J. S. Lim and A. V. Oppenheim, eds., Englewood Cliffs, NJ: Prentice Hall, October 1987.
[36] M. R. Portnoff, “Representation of digital signals and systems based on the short-time Fourier transform,” IEEE Trans. Acoustics, Speech, Signal Process., vol. 28, pp. 55-69, Feb. 1980.
[37] D. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. Acoustics, Speech, Signal Process., vol. 32, pp. 236-243, Apr. 1984.
[38] F. Hlawatsch and G. F. Boudreaux-Bartels, “Linear and quadratic time-frequency signal representations,” IEEE Signal Process. Mag., vol. 9, no. 2, pp. 21-67, Apr. 1992.
[39] S. H. Nawab and T. F. Quatieri, “Short Time Fourier Transform”, Chapter in Advanced Topics in Signal Processing, J. S. Lim and A. V. Oppenheim, eds., Englewood Cliffs, NJ: Prentice Hall, October 1987.
[40] V. Heine, M. Koen, and D. Uir, Theogl of Pseudopotentials, Moscow: MIR, 1973.
[41] A. Antoniou, Digital Filters: Analysis, Design, and Applications, McGraw-Hill, New York, 1993.
[42] S. Bagchi and S. K. Mitra, The NonUniform Discrete Fourier Transform and its Applications in Signal Processing, Norwell Massachusetts: Kluwer Academic Publishers, 1999.
[43] C. Mueller, M. Dalkilic and, A. Lumsdaine, “High-performance direct pairwise comparison of large genomic sequences,” Proceedings of the 19th IEEE Internationa Parallel and Distributed Processing Symposium (IPDPS’05), pp. 1530-2075, 2005.
[44] A.J. Gibbs and G. A. McIntyre, “The diagram, a method for comparing sequences,” Eur. Jour. Biochem, vol. 16(1), pp. 1–11, 1970.
[45] W. M. Fitch, “Locating gaps in amino acid sequences to optimize the homology between two proteins,” Biochem. Genet. vol. 3, pp. 99–108, 1969.
[46] R. Staden, “An interactive graphics program for comparing and aligning nucleic acid and amino acid sequences,” Nucleic Acids Res., vol. 10, pp. 2951–2961, 1982.
[47] A .D. McLachlan, “Test for comparing related amino acid sequences. Cytochrome c and cytochrome c551,” J. Mol. Biol. vol. 61, pp. 409–424, 1971.
[48] A. D. McLachlan, “Analysis of gene duplication repeats in the myosin rod,” J. Mol. Biol., vol. 169, pp. 15–30, 1983.
[49] A. D. McLachlan and D. R. Boswell, “Confidence limits for homology in protein or gene sequences. The c-myc oncogene and Adenovirus E1A protein,” J. Mol. Biol. vol. 185, pp. 39–49, 1985.
[50] J .G. Reich and W. Meiske, “A simple statistical significance test of window scores in large dot matrices obtained from protein or nucleic acid sequences,” Comput. Appl. Biosci. vol. 3, pp. 25–30, 1987.
[51] C. Mueller, M. Dalkilic, and A. Lumsdaine, “High-performance direct pairwise comparison of large genomic sequences,” IEEE Transactions on Signal Processin, vol. 19, pp. 1530-2075, 2005.
[52] P. Argos, “A sensitive procedure to compare amino acid sequences,” J. Mol. Biol. vol. 193, pp. 385–396, 1987.
[53] J. Parkhill, et al. “Complete DNA sequence of a serogroup a strain of Neisseria meningitidis Z2491.” Nature, vol. 404, pp. 502–506, 2000.
[54] G. Benson, “Tandem repeats finder: a program to analyze DNA sequences,” Nucleic Acids Res., vol. 27, pp. 573–580, 1999.
[55] R. C. Eberhart and J. Kennedy, “A new optimizer using particle swarm theory,” Proc. of Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, pp.39-43, 1995.
[56] 胡曉輝,「粒子群優化算法介紹」, http://web.ics.purdue.edu/~hux/tutorials.shtml,民國91 年4 月。
[57] J. Kennedy and R. C. Eberhart, Swarm Intelligence, Morgan Kaufmann Press. 2001.
[58] R.C. Eberhart and Y. Shi, “Comparison between genetic algorithms and particle swarm optimization,” 1998 Annual Conference on Evolutionary Programming, San Diego, 1998.
[59] D. Srinivasan, W. H. Loo, and R. L. Cheu, “Traffic incident detection using article swarm optimization,” Swarm Intelligence Symposium. Proceedings of the 2003 IEEE, pp.144-151, 2003.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top