(3.231.166.56) 您好!臺灣時間:2021/03/08 11:12
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果

詳目顯示:::

我願授權國圖
: 
twitterline
研究生:吳珮如
研究生(外文):Wu, Pei-Ju
論文名稱:動態Burrows-Wheeler轉換方法設計與實踐
論文名稱(外文):Dynamic Burrows-Wheeler Transform and Its Practical Implementation
指導教授:韓永楷
指導教授(外文):Hon, Wing-Kai
口試委員:李哲榮盧錦隆
口試委員(外文):Lee, Che-RungLu, Chin-Lung
口試日期:2017-06-23
學位類別:碩士
校院名稱:國立清華大學
系所名稱:資訊工程學系所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2017
畢業學年度:105
語文別:英文
論文頁數:42
中文關鍵詞:塊排序壓縮動態去氧核醣核酸
外文關鍵詞:Burrows–WheelerBWTDynamicDNA
相關次數:
  • 被引用被引用:0
  • 點閱點閱:52
  • 評分評分:系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
Burrows-Wheeler 轉換是一種應用在資料壓縮和生物基因序列比對上的演算法,近年來,由於次世代定序的出現,基因序列比對成為一個受到大家關注的研究問題,次世代定序是目前最新的一種DNA定序方法,它大幅降低了傳統定序法所需的時間和空間的成本,目前許多序列比對工具的核心都使用Burrows-Wheeler 轉換演算法來壓縮資料,像是BWA、Bowtie、SOAP2。然而,在基因定序時,若是需要對小部分的序列做更動,或是加入新的序列,那麼就要將原本全部的序列重新做一次Burrows-Wheeler 轉換。為了改善重新轉換所需的時間和空間成本,我們設計並實作出一個動態Burrows-Wheeler 轉換的演算法,可以用較短的時間就能得到相同的結果,並且不需要全部重新做轉換,除此之外我們也提供一些使用者查詢的功能。
Burrows-Wheeler Transform (BWT) is a popular algorithm applied in data compression and DNA sequence alignment. Due to the emergence of Next Generation Sequencing --- the latest DNA sequencing method that significantly reduces the cost of time and space required by traditional sequencing methods, DNA sequence alignment has become a subject of great concern in recent years. Currently, many of the sequence alignment tools, such as BWA, Bowtie and SOAP2, are based on BWT to reduce the memory requirement. However, it is not convenient for us to re-do the transform of whole collection of sequences while we only add or delete a sequence in the collection. In order to improve the time and space required for re-transform, we designed and implemented a dynamic BWT algorithm. With our method, we can get the same result as the static transform in a shorter time, and do not need to re-do the transform.
Abstract
Contents
1 Introduction 1
1.1 Organization . . . . . . . . . . . . . . . . . . . . 4
2 Preliminaries 5
2.1 Burrows-Wheeler Transform (BWT) . . . . . . . 6
2.1.1 BWT and Backward Search . . . . . . . . 6
2.1.2 Generalized BWT . . . . . . . . . . . . . 9
2.1.3 Rank and Select in BWT . . . . . . . . . 10
2.2 B-Tree . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Methods 14
3.1 Data Structure . . . . . . . . . . . . . . . . . . . 15
3.2 Insertion of New Strings . . . . . . . . . . . . . . 17
3.3 Query . . . . . . . . . . . . . . . . . . . . . . . . 19
13.3.1 Rank . . . . . . . . . . . . . . . . . . . . 19
3.3.2 Select . . . . . . . . . . . . . . . . . . . . 20
3.4 External Memory Implementation . . . . . . . . . 21
4 Experiment 22
4.1 Optimization . . . . . . . . . . . . . . . . . . . . 23
4.1.1 Encoding of Characters . . . . . . . . . . 23
4.1.2 Bit Shift Operations . . . . . . . . . . . . 25
4.1.3 Fragment Size . . . . . . . . . . . . . . . . 27
4.1.4 Fanout Size . . . . . . . . . . . . . . . . . 30
4.2 Comparison between Static and Dynamic Versions 35
5 Conclusion and Future Work 39
[1] R. Arnold and T. Bell: The Canterbury Corpus Home Page.
http://corpus.canterbury.ac.nz/.
[2] T. Beller, M. Zwerger, S. Gog, and E. Ohlebusch: Space-Efficient Construction of the Burrows-Wheeler Transform.
Proceedings of International Symposium on String Processing and Information Retrieval (SPIRE), pages 5–16, 2013.
[3] M. Burrows and D. Wheeler: A Block Sorting Lossless Data Compression Algorithm. Technical Report 124, Digital Equipment Corporation, 1994.
[4] P. Ferragina, G. Manzini, V. Mäkinen, and G. Navarro: An Alphabet-Friendly FM-Index. Proceedings of International Symposium on String Processing and Information Retrieval (SPIRE), pages 150–160, 2004.
[5] Y. Mori: An Implementation of the Induced Sorting Algorithm, 2010.
https://sites.google.com/site/yuta256/.
[6] G. Nong, S. Zhang, and W. H. Chan: Two Effi cient Algorithms for Linear Time Suffi x Array Construction. IEEE Transactions on Computers, 60:1471–1484, 2011.
[7] J. Sirén: Burrows-Wheeler Transform for Terabases. Proceedings of IEEE Data Compression Conference (DCC), pages 211–220, 2016.
[8] National Center for Biotechnology Information (NCBI).
https://www.ncbi.nlm.nih.gov/
[9] The 1000 Genomes Project Consortium: A Map of Human
Genome Variation from Population-Scale Sequencing. Nature, 467(7319):1061–1073, 2010.
[10] Wikipedia: B-tree. https://en.wikipedia.org/wiki/B-tree.
連結至畢業學校之論文網頁點我開啟連結
註: 此連結為研究生畢業學校所提供,不一定有電子全文可供下載,若連結有誤,請點選上方之〝勘誤回報〞功能,我們會盡快修正,謝謝!
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
系統版面圖檔 系統版面圖檔