(3.236.100.86) 您好!臺灣時間:2021/05/06 13:49
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果

詳目顯示:::

我願授權國圖
: 
twitterline
研究生:絲一倫
研究生(外文):Yi-Lun Ssu
論文名稱:在DNA序列中有效率的找尋串聯重覆序列
論文名稱(外文):Efficient Approach for Discovering Tandem Arrays in DNA Sequences
指導教授:羅有隆羅有隆引用關係
指導教授(外文):Yu-lung Lo
學位類別:碩士
校院名稱:朝陽科技大學
系所名稱:資訊管理系碩士班
學門:電算機學門
學類:電算機一般學類
論文種類:學術論文
論文出版年:2008
畢業學年度:96
語文別:中文
論文頁數:30
中文關鍵詞:DNA序列重覆片段近似串聯重覆序列串聯重覆序列銜接重覆
外文關鍵詞:repeating patternDNA sequenceapproximate tandem arraytandem repeattandem array
相關次數:
  • 被引用被引用:0
  • 點閱點閱:264
  • 評分評分:系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔
  • 下載下載:1
  • 收藏至我的研究室書目清單書目收藏:0
在DNA的序列中,串聯重覆序列(tandem array)指的是,一段核甘酸序列(nucleotide sequence)連續重覆兩次或以上且是相連的連續出現。而透過分析串聯重覆序列可幫助我們診斷單一基因疾病、鑑定親子關係等等於醫學上的應用。但是生物經過幾千萬年的演化,其間發生了突變、轉譯、反轉錄等事件,使得核甘酸序列的排序產生了些微的變化,因此生物根源關係的決定、演化樹的建構,則需透過近似串聯重覆序列(approximate tandem array)的資訊來完成。近似串聯重覆序列指的則是一段近似核甘酸序列,連續重覆兩次或以上且是相連的連續出現。雖然串聯重覆序列在生物學上的研究或與學上的應用已經很多年,但如何有效率的找尋出DNA序列中所有的串聯重覆序列或近似串聯重覆序列,已被發表的相關研究並不多。由於DNA序列可以由核甘酸鹼基的A、T、G、C四個符號來表示,如此,透過字串比對,可以發展找出DNA序列中串聯重覆序列的發生,這也是現有技術所採行的辦法。本研究報告將分別對串聯重覆序列與近似串聯重覆序列的找尋提出新的演算法,我們利用DNA中重覆片段(Repeating Pattern)的發生位置,可快速的串接出串聯重覆序列,並透過先找尋出允許容錯重覆片段(Fault Tolerant Repeating Pattern)再串接,以幫助找尋DNA序列中的近似串聯重覆序列。
A tandem array is two or more contiguous copies of a pattern of nucleotides in a DNA sequence. It means tandem array can be divided into two or more identical substring. Tandem arrays can be applied in disease diagnosis, human identity testing, and so forth. An approximate tandem arrays is one in which the substrings are similar, but not identical. Approximate tandem arrays can be applied in describe evolution events like mutations, translocations and reversal. Although, the applications of (approximate) tandem arrays have been widely used by biology and medical sciences many years, the existing schemes for efficiently discovering all the (approximate) tandem arrays in a DNA sequence are rare. A DNA sequence can be represented by four standard nucleotide bases – A、T、G and C, such that, the string matching techniques can be used to discover all the (approximate) tandem arrays in a DNA sequence. In this research report, we will propose a new algorithm by concatenating repeating patterns to find all the tandem arrays in a DNA sequence, and a fault tolerant repeating pattern approach will also be used to discover approximate tandem arrays.
摘要............................................I
Abstract........................................II
1.前言..........................................1
2.現有文獻探討..................................3
2.1 Optimized Basic Algorithm...................4
2.2 Findsquares Algorithm.......................6
2.3 Discovering of Approximate Tandem Repeats...7
3.Repeating Pattern Alignment Approach for Discovering Tandem
Arrays........................................9
3.1演算法說明...................................9
3.2範例說明.....................................10
3.3 時間複雜度分析..............................11
3.4效能評估.....................................12
3.4.1 DNA序列的長度.............................13
3.4.2 Tandem Arrays的個數.......................14
3.4.3重覆片段出現的個數.........................15
4.Fault Tolerance Repeating Pattern Approach for Discovering Approximate Tandem Arrays.......................17
4.1演算法說明...................................18
4.2範例說明.....................................19
4.3 效能評估....................................21
4.3.1 DNA序列長度的影響.........................22
4.3.2 Approximate Tandem Arrays個數的影響.......23
4.3.3重覆片段個數的影響.........................24
參考文獻........................................27

表目錄
表1. 重覆片段處理過程...........................11
表2. 三種演算法功能上之差異.....................17
表3. DNA序列間之重覆片段........................20
表4. 允許容錯重覆片段...........................21

圖目錄
圖1. “ ACAACAACA”之implicit suffix tree.......5
圖2. 長度為2之子字串............................7
圖3. “ATATATAT”切割為Q與R.....................7
圖4. The matrix D...............................9
圖5. DNA序列長度的影響..........................14
圖6. tandem array個數的影響.....................15
圖7. 重覆片段個數的影響.........................16
圖8. DNA序列長度的影響..........................23
圖9. approximate tandem array 個數的影響........24
圖10. 重覆片段個數的影響........................25
[1]D. Anastassiou(2003), “Genomic Signal Processing,’’ IEEE Signal Processing Magazine, Vol.81, pp.349-355.
[2]T. K. Attwood and D. P. Smith(2000), “Introduction to Bioinformatics,’’ Briefings In BIOINFORMATICS. School of Biologic Sciences, University of Manchester. Vol. 1. No 1 FEBRUARY. pp.103-109.
[3]S. S. Bandyopadhyay , S. Paul , and A. Konar(2005), “Improved Alignments for DNA Alignment and Revision of Scoring Matrix ,’’ proceddings of ICISIP IEEE.
[4]G. Benson(1998), “An Algorithm for Finding Tandem Repeats of Unspecified Pattern Size,” In Proc. of the 2nd annual international conference on Computational molecular biology, New York, Mar, pp. 20-29.
[5]B. Bergeron and M. D(2005), “Bioinformatics Computing,’’ Harvard Medical School and Massachusetts Institute of Technology.
[6]M. Buchner and S. Janjarasjitt(2003), “Detection and Visualization of Tandem Repeats in DNA Sequences,’’ IEEE TRANS. ON SIGNAL PROCESSING, VOL. 51, NO.9, SEP.
[7]J. G. Chen(2005), “Finding All Tandem Arrays in DNA Sequences,’’ master thesis, Dept. of Computer Science and Information Engineering, National Chi-Nan University, Taiwan.
[8]X. Chen , S. Kwong , and M. Li(2001), “A Compression Algorithm for DNA Sequences,’’ IEEE ENGINEERING IN MEDICINE AND BIOLOGY, July/August.
[9]L. L. Cheng , D. W. Cheung , and S. M. Yiu(2004), “Approximate String Matching in DNA Sequences, ’’ Proceedings of the Eighth International Conference on Database Systems for Advanced Applications(DASFAA’03) IEEE.
[10]C. F. Cheung, J. X. Yu, and H. Lu(2005), “Constructing Suffix Tree for Gigabyte Sequence with Megabyte Memory,’’ IEEE TRANSATCTION ON KNOWLEDGE AND DATA ENGINEERING, Vol. 17, No. 1, JAN.
[11]J. Cohen(2004), “Bioinformatics-An Introduction for Computer Scientists,’’ ACM Computing Surveys, Vol. 36, No.2, June, pp. 122- 158.
[12]D. Gusfield(1997), “Algorithms on Strings, Trees, and Sequences,” Computer Science and Computational Biology, the Press Syndicate of the University of Cambridge, New York.
[13]H. Hammond, L. Jin, Y. Zhong, C. T. Caskey, and R. Chakraborty(1994), “Evaluation of 13 short tandem repeat loci for use in personal identification applications,’’ Am J Hum Gent, 55:175-189.
[14]E. Hunt, M. P. Atkinson, and R. W. Irving(2002), “Database indexing for large DNA and protein sequence collections,’’ the VLDB Journal, 11:256-271/Digital Object Identifier (DOI).
[15]G. M. Landua, J. P. Schmidt, D. Sokol(2001), “An Algorithm for Approximate Tandem Repeats,” Journal of Computer Biology, Vol 8 ,Number 1, May Ann Libert, Inc. pp.1-18.
[16]E. Lin(2003), “Bioinformatic alignment,” MIS Dept., Yuan Ze Univ., Taiwan.
[17]Y. L. Lo and C. Y. Chen(2006), “Fault Tolerant Non-trivial Repeating Pattern Discovering for Music Data,” the 5th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2006), Honolulu, Hawaii, July 10-12, pp. 130-135.
[18]Y. L. Lo and W. L. Lee(2004), “Linear Time for Discovering Repeating Patterns in Music Databases,’’ the 2004 IEEE International Conference on Multimedia and Expo (ICME).
[19]Y. L. Lo, W. L. Lee, and L. H. Chang(2008), “True Suffix Tree Approach for Discovering Non-trivial Repeating Patterns in a Music Object,” Journal of Multimedia Tools and Applications, Springer, Vol. 37, No. 2, April, pp. 169-187.
[20]Y. L. Lo and Y. L. Ssu(2008), “Efficient Approach for Discovering Tandem Arrays,” the 2008 International Conference on Advanced Information Technologies (AIT).
[21]M. Main, and R. Lorentz(1984), “An O(nlogn) Algorithm for Finding All Repetitions in a String,’’ Journal of Algorithms, Vol. 5, pp.422-432.
[22]M. Sammeth and J. Stoye(2006), “Comparing Tandem Repeats with Duplications and Excisions of Variable Degree,” IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), Vol. 3, Issue 4, Oct , pp. 395-407.
[23]J. Stoye, and D. Gusfield(2002), “Simple and flexible detection of contiguous repeats using a suffix tree,’’ Theoretical Computer Science, Vol. 270, Issues 1-2, pp. 843-856.
[24]T. T. Tran, and G. T. Zhou(2004), “Techniques for Detecting Approximate Tandem Repeats in DNA ,” ICASSP.
[25]E. Ukkonen(1993), “Approximate string matching over suffix trees, ” Proc 4th Annual Symp Combinatorial Pattern Matching ,pp.228-242.
[26]E. Ukkonen(1995), “On-Line Construction of Suffix Tree,” In Proc. of the 14th Ann. Symp. on Switching and Automata Theory, pp. 1-11.
[27]P. Weiner(1973), “Linear Pattern Matching Algorithms,” In Proc. of the 14th Ann. Symp. on Switching and Automata Theory, pp. 1-11.
[28]Y. Wexler, and Z. Yakhini, and Y. Kashi and D. Geiger(2004), “Finding Approximate Tandem Repeats in Genomic Sequences,’’ ACM RECOMB’04, March, pp.27-31.
[29]H. E. Williams, and J. Zobel(2002), “Indexing and Retrieval for Genomic Database,’’ IEEE Trans. on Knowledge and Data Engineering, Vol. 14, NO. 1, Jan./Feb.
[30]J. Yang , W. Wang , and P. Yu(2004) , “ Approach Search on Large String Databases,” Proceedings of the 16th International Conference on Scientific and Statistical Database Management(SSDBM’04) IEEE.
[31]H. X. Zhou and H. Yan(2007), “A Fast Method for Determining the Repeat Pattern Size in DNA Sequences,” In Proc. of the 6th Int’l Conf. on Machine Learning and Cybernetics, Vol.6, Hong Kong, Aug, pp. 3314-3318.
[32]DNA Data Bank of Japan(DDBJ),http://www.ddbj.nig.ac.jp/Welcome-j.html
[33]EMBL (EMBL-Bank), http://www.ebi.ac.uk/embl/
[34]GeneBank , http://www.bioversityinternational.org/Themes/Genebanks/index.asp
[35]National Center for Biotechnology Information (NCBI), http://www.ncbi.nlm.nih.gov/
電子全文 電子全文(本篇電子全文限研究生所屬學校校內系統及IP範圍內開放)
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
系統版面圖檔 系統版面圖檔