(3.237.20.246) 您好!臺灣時間:2021/04/15 13:13
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果

詳目顯示:::

我願授權國圖
: 
twitterline
研究生:陳昀慶
研究生(外文):Chen, Yun-Ching
論文名稱:建構節省空間的去氧核醣核酸序列後置樹
論文名稱(外文):Space Economical Suffix Trees for DNA Sequences
指導教授:李素瑛李素瑛引用關係
指導教授(外文):Lee, Suh-Yin
學位類別:碩士
校院名稱:國立交通大學
系所名稱:資訊工程系
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2003
畢業學年度:91
語文別:英文
論文頁數:60
中文關鍵詞:後置樹核醣核酸
外文關鍵詞:Suffix TreesDNA SequencesPrefix TableBit Layout
相關次數:
  • 被引用被引用:0
  • 點閱點閱:97
  • 評分評分:系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔
  • 下載下載:3
  • 收藏至我的研究室書目清單書目收藏:0
近年來由於生物資料大量增加,使得生物資訊學變得越來越重要。因為利用電腦科技才能對這樣大量的資料進行快速的分析。在資訊領域中,後置樹是一個快速且強大的字串分析資料結構,且剛好能解決許多基因體序列的相關問題。但它唯一的致命缺點,就是在處理龐大的基因序列時會耗費太多記憶體空間。因此,我們在這篇論文中利用基因體序列的特性縮小後置樹所需要使用的空間。我們對每一個節點提出了更少資料量的儲存方式。再者,我們利用自己提出的前置陣列取代靠近樹根的眾多節點,讓使用空間更進一步的縮小。其三,我們根據新型後置樹的架構提出了一個加快建構速度的前置處理方法。最後,實驗結果顯示在針對基因體序列的表現上,我們提出的架構使用最少的空間,且在建構速度上也有不錯的表現。

In recent years, bioinformatics becomes an important research field because there are more and more genetic data to be analyzed. The suffix tree is a powerful data structure for string analysis and has many applications on bioinformatics. Besides, its linear construction time, linear construction space and short search time all make it very impressive. However, consuming huge space is a fatal drawback especially while using suffix trees to handle the large amount of DNA sequences. In this thesis, we utilize some characteristics of DNA sequences to reduce the space requirement of suffix trees. A new bit layout is proposed for the node of a suffix tree which requires less space than others. We also use an index table, called “prefix table”, which can reduce the number of internal nodes in suffix trees. In addition, we propose a preprocessing technique to improve the construction time based on our data structure. The experiments shows that our proposed method is the most space-parsimony implementation of suffix trees for DNA sequences and it also has a good performance in construction time.

Abstract (in Chinese)…………………………..……………………………………….i
Abstract (in English)…………………………………………………………………..ii
Acknowledgment……………………………………………………………………..iii
Table of Contents……………………………………………………………………..iv
List of Figures………………………………………………………………………...vi
List of Tables…………………………………………………………………………vii
Chapter 1 Introduction………………………………………………………...………1
1.1 Motivation………………………………………………………...………….1
1.2 Organization……………………………………………………………….....3
1.3 Overview of the Architecture……………………………………………...…4
Chapter 2 Background……………………………………………………………..…..5
2.1 DNA (deoxyribonucleic acid) sequences…………………………………….5
2.2 Suffix trees…………………………………………………………………...6
2.2.1 Basic Definitions……………………………………………………...6
2.2.2 Head Position…………………………………………………………8
2.2.3 The Suffix Tree Construction………………………………………..10
2.2.3.1 A naïve algorithm to build a suffix tree………………………11
2.2.3.2 Using the suffix link to achieve linear time construction…….13
2.3 Simple Implementation of Suffix Trees…………………………………….15
2.3.1 Related Works……………………………………………………….15
2.3.2 A Simple Linked List Implementation………………………………16
Chapter 3 Parsimony-Spaced Suffix Trees for DNA Sequences……………………..19
3.1 Analyzing the Previous Bit Layout of Records……………………………..19
3.1.1 Large Nodes and Small Nodes………………………………………19
3.1.1.1 Some Observations Found by Kurtz…………………………20
3.1.1.2 Analyzing Large Nodes and Small Nodes……………………21
3.1.2 Removing the Suffix Links………………………………………….26
3.2 Proposed Bit Layout………………………………………………………...28
3.3 Reducing the Space Requirement of Internal Nodes………………………..30
3.3.1 Characteristics of DNA Sequences Influencing Internal Nodes…….32
3.3.2 Constructing the Suffix Trees with the Prefix Table………………...33
3.4 Speed Up by Locality for Large Data………………………………………38
3.4.1 Advantage in Spatial Locality……………………………………….38
3.4.2 Implementation Techniques…………………………………………40
3.5 Full Algorithm………………………………………………………………44
Chapter 4 Experiments and Discussion………………………………………………46
4.1 Space Requirement………………………………………………………….46
4.2 Running Time……………………………………………………………….49
4.2.1 Platform……………………………………………………………...49
4.2.2 Result………………………………………………………………...49
4.3 Upper Limits on the Input Sequence Length………………………………..54
4.4 Advantage on Applications………………………………………………….55
Chapter 5 Conclusion and Future Work……………………………………………...57
5.1 Conclusion…………………………………………………………………..57
5.2 Future Work…………………………………………………………………57
Bibliography………………………………………………………………………….59

[1] Hooman H. Rashidi and Lukas K. Buehler, “Bioinformatics Basics,“ 2000
[2] http://www.ornl.gov/TechResources/Human_Genome/
[3] Dan Gusfield, “Algorithms on Strings, Trees, and Sequences,” 1997
[4] D. E. Knuth, J. H. Morris, and V. B. Pratt, “Fast pattern matching in strings,” SIAM Journal on Computing, Vol. 6, pp. 323-50, 1977.
[5] R. S. Boyer and J. S. Moore, “A fast string searching algorithm,” Communications of the ACM, Vol. 20, pp. 762-72, October 1977.
[6] Arthur L. Delcher, Simon Kasif, Robert D. Fleischmann, Jeremy Peterson, Owen White and Steven L. Salzberg, “Alignment of whole genomes,” Nucleic Acids Research, Vol. 27, No. 11, pp. 2369 — 2376, April 1999.
[7] Aurthur L. Delcher, Adam Phillippy, Jane Carlton and Steven L. Salzberg, “Fast algorithms for large-scale genome alignment and comparison,” Nucleic Acids Research, Vol. 30, No. 11, pp. 2478 — 2483, 2002.
[8] U. Manber and E. W. Myers, “Suffix Arrays: A New Method for On-line String Searches,” SIAM Journal on Computing, Vol. 22, No. 5, pp. 935-948, 1993.
[9] A.Andersson and S. Nukssib, “Improved Behavior of Tries by Adaptive Branching,” Information Processing Letters, Vol. 46, pp. 295-300, 1993.
[10] R. W. Irving, “Suffix Binary Search Trees,” Research Report, Department of Computer Science, University of Glasgow, 1996.
[11] J. Karkkainen, “Suffix Cactus: A Cross between Suffix Tree and Suffix Array,” In Proceeding Of the Annual Symposium on Combinatorial Pattern Matching (CPM’95), LNCS 937, pp. 191-204, 1995.
[12] Neil A. Campbell, Lawrence G. Mitchell and Jane B. Reece, “Biology concepts and connections,” Chap. 10, 1997
[13] Kurtz, S, “Reducing the space requirement of suffix trees,” Software Pract. Experience, Vol. 29, pp. 1149-1171, 1999.
[14] P. Weiner, “Linear pattern matching algorithms,” Proceeding Of the 14th IEEE Symposium on Switching and Automata Theory, pp. 1-11, 1973.
[15] E. M. McCreight, “A space-economical suffix tree construction algorithm,” Journal of ACM, Vol 23, pp. 262-272, 1976.
[16] E. Ukkonen, “On-line construction of suffix-trees,” Algorithmica, Vol. 14, No. 3, pp. 249-260, September 1995.
[17] M. Farach, “Optimal Suffix Tree Construction with Large Alphabets,” In Proceedings of the 38th Annual Symposium on the Foundations of Computer Science, FOCS 97, pp. 137-143, New York, 1997.
[18] M. Crochemore and R. Verin, “Direct Construction of Compact Acyclic Word Graphs,” In Proceeding Of the Annual Symposium on Combinatorial Pattern Matching (CPM’97), LNCS 1264, pp. 116-129, 1997.
[19] J. Knight, D. Gusfield, and J. Stoye. The Strmat Software-Package, 1998.
[20] Ela Hunt, Malcolm P. Atkinson and Robert W. Irving, “A Database Index to Large Biological Sequences,” Proceeding Of the 27th VLDB Conference, pp. 139-148, September 2001.

QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
系統版面圖檔 系統版面圖檔