(3.80.6.131) 您好!臺灣時間:2021/05/17 03:35
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果

詳目顯示:::

我願授權國圖
: 
twitterline
研究生:余政修
研究生(外文):Zheng-XiuYu
論文名稱:基於硬體運算的生物序列比對之研究
論文名稱(外文):Investigation of Sequence Alignment by Using Hardware Operation
指導教授:黃吉川黃吉川引用關係
指導教授(外文):Chi-Chuan Hwang
學位類別:碩士
校院名稱:國立成功大學
系所名稱:工程科學系
學門:工程學門
學類:綜合工程學類
論文種類:學術論文
論文出版年:2016
畢業學年度:104
語文別:中文
論文頁數:51
中文關鍵詞:序列比對布隆過濾器硬體運算動態規劃演算法
外文關鍵詞:sequence alignmentbloom filterhardware operationdynamic programming
相關次數:
  • 被引用被引用:0
  • 點閱點閱:35
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
由於生物技術的快速進步,DNA序列定序所需時間不斷縮短以及各種生物基因體計畫的進行,越來越多的生物序列資料被建構成資料庫,DNA序列資料庫中含有的序列數量已高達數百億對,如何在此龐大的序列資料庫進行正確而快速的搜尋是重要課題之一。
序列比對是生物資訊研究中最重要的研究工具,它可以比較及分析出兩條或多條序列之間的相似度,在序列比對中,兩條序列通常長度都是相當長,因此所需的處理時間相當耗時。許多演算法被研究出來降低處理時間。
本論文旨在提出基於硬體運算模擬實現了一套序列比對的資料處理流程。首先在參考序列上,利用hash函式建立布隆過濾器儲存鹼基,利用候選位置產生演算法找出最佳候選位置,減少查找過程中的比對,在使用動態規劃演算法的遞回函式,計算相似度分數與尋找最高對齊方式,完成序列比對。
Due to the rapid advances in biotechnology, more and more biological sequence is built in database, in which there are tens of billions pairs of DNA sequence in the library. Therefore it is an important issue to properly and quickly search this much sequence in database. In bioinformatics study sequence alignment is the most important research tool, which can compare and analyze the similarity between two or more sequences in the sequence alignment. The lengths of the two sequences are usually quite long, so the processing takes much time. Many algorithms have been found out to reduce processing time. In the thesis, we focus on constructing data processing by performing the hardware-based computing simulation. First, the reference sequence uses the hash function to establish Bloom filter for the storage of bases. Second, candidate position is used to generate algorithm to find the best candidate for the position, which can reduce the process. Finally, dynamic programming algorithm is used to compute the similarity scores and find the highest alignment to complete sequence alignment.
中文摘要 I
Abstract II
誌謝 VII
目錄 IX
圖目錄 XII
第一章 緒論 1
1-1 研究背景 1
1-2 研究動機 4
1-3 研究目的 7
1-4 本文架構 10
第二章 序列比對相關研究 11
2-1 編輯距離 11
2-2 Sequence Alignment 13
2-3 演算法種類 14
2-4 布洛斯-惠勒轉換演算法 16
2-4-1 後綴陣列 16
2-4-2 布洛斯-惠勒轉換編解碼 19
第三章 最佳路徑分析與候選位置對應產生 22
3-1 求解最佳路徑 22
3-1-1 動態規劃演算法 22
3-1-2 最佳路徑分析 26
3-2 索引表及候選位置對應演算法設計 29
3-2-1 標準的布隆過濾器 29
3-2-2 最佳化hash函式個數 31
3-2-3 布隆過濾器建立 33
3-2-4 候選位置對應演算法 36
第四章 實驗結果 39
4-1 模擬流程 39
4-2 模擬之結果 40
第五章 結論與未來展望 46
5-1 結論 46
5-2 未來展望 46
參考文獻 48
參考文獻
[1]F. Baluška, D. Volkmann, and P. W. Barlow, Cell bodies in a cage, Nature, vol. 428, pp. 371-371, 2004.
[2]D. Keilin, The Leeuwenhoek Lecture: The problem of anabiosis or latent life: History and current concept, Proceedings of the Royal Society of London. Series B, Biological Sciences, vol. 150, pp. 149-191, 1959.
[3]I. K. Vasil, A history of plant biotechnology: from the cell theory of Schleiden and Schwann to biotech crops, Plant cell reports, vol. 27, pp. 1423-1440, 2008.
[4]F. H. Crick, The biological replication of macromolecules, in Symp. Soc. Exp. Biol, 1958, pp. 138-163.
[5]张春霆, 生物信息学的现状与展望, 世界科技研究与发展, vol. 22, pp. 17-20, 2000.
[6]F. Sanger, S. Nicklen, and A. R. Coulson, DNA sequencing with chain-terminating inhibitors, Proceedings of the National Academy of Sciences, vol. 74, pp. 5463-5467, 1977.
[7]A. M. Maxam and W. Gilbert, A new method for sequencing DNA, Proceedings of the National Academy of Sciences, vol. 74, pp. 560-564, 1977.
[8]解增言, 林俊华, 谭军, and 舒坤贤, DNA 测序技术的发展历史与最新进展, 生物技术通报, vol. 8, pp. 64-70, 2010.
[9]E. R. Mardis, The impact of next-generation sequencing technology on genetics, Trends in genetics, vol. 24, pp. 133-141, 2008.
[10]权威 and 王亚东, 基于新一代测序数据的比对算法的研究, 智能计算机与应用, vol. 2, pp. 39-41, 2012.
[11]A. J. Gibbs and G. A. McIntyre, The diagram, a method for comparing sequences, European Journal of Biochemistry, vol. 16, pp. 1-11, 1970.
[12]S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, Basic local alignment search tool, Journal of molecular biology, vol. 215, pp. 403-410, 1990.
[13]D. J. Lipman and W. R. Pearson, Rapid and sensitive protein similarity searches, Science, vol. 227, pp. 1435-1441, 1985.
[14]A. Apostolico and Z. Galil, Pattern matching algorithms, 1997.
[15]G. A. Stephen, String searching algorithms vol. 3: World Scientific, 1994.
[16]P. R. Gray, P. Hurst, R. G. Meyer, and S. Lewis, Analysis and design of analog integrated circuits: Wiley, 2001.
[17]C. Mead and L. Conway, Introduction to VLSI systems vol. 1080: Addison-Wesley Reading, MA, 1980.
[18]W. J. Masek and M. S. Paterson, A faster algorithm computing string edit distances, Journal of Computer and System sciences, vol. 20, pp. 18-31, 1980.
[19]V. I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals, in Soviet physics doklady, 1966, pp. 707-710.
[20]J. Setubal and J. Meidanis, Sequence comparison and database search, Introduction to computational molecular biology, 1997.
[21]T. F. Smith and M. S. Waterman, Comparison of biosequences, Advances in applied mathematics, vol. 2, pp. 482-489, 1981.
[22]X. Huang and W. Miller, A time-efficient, linear-space local similarity algorithm, Advances in Applied Mathematics, vol. 12, pp. 337-357, 1991.
[23]S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic acids research, vol. 25, pp. 3389-3402, 1997.
[24]S. B. Needleman and C. D. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of molecular biology, vol. 48, pp. 443-453, 1970.
[25]O. Gotoh, An improved algorithm for matching biological sequences, Journal of molecular biology, vol. 162, pp. 705-708, 1982.
[26]P. Klus, S. Lam, D. Lyberg, M. S. Cheung, G. Pullan, I. McFarlane, et al., BarraCUDA-a fast short read sequence aligner using graphics processing units, BMC research notes, vol. 5, p. 27, 2012.
[27]H. Li and R. Durbin, Fast and accurate short read alignment with Burrows–Wheeler transform, Bioinformatics, vol. 25, pp. 1754-1760, 2009.
[28]H. Li and R. Durbin, Fast and accurate long-read alignment with Burrows–Wheeler transform, Bioinformatics, vol. 26, pp. 589-595, 2010.
[29]B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg, Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome biol, vol. 10, p. R25, 2009.
[30]R. Li, C. Yu, Y. Li, T.-W. Lam, S.-M. Yiu, K. Kristiansen, et al., SOAP2: an improved ultrafast tool for short read alignment, Bioinformatics, vol. 25, pp. 1966-1967, 2009.
[31]H. Li and N. Homer, A survey of sequence alignment algorithms for next-generation sequencing, Briefings in bioinformatics, vol. 11, pp. 473-483, 2010.
[32]W. R. Pearson and W. Miller, [27] Dynamic programming algorithms for biological sequence comparison, Methods in enzymology, vol. 210, pp. 575-601, 1992.
[33]E. Vermij, Genetic sequence alignment on a supercomputing platform, TU Delft, Delft University of Technology, 2011.
[34]B. H. Bloom, Space/time trade-offs in hash coding with allowable errors, Communications of the ACM, vol. 13, pp. 422-426, 1970.
[35]Y. Huayun and G. Jihong, Bloom Filter 研究进展, 电信科学, vol. 26, pp. 31-36, 2010.
[36]A. Rajaraman, J. D. Ullman, and 王斌, 大数据: 互联网大规模数据挖掘与分布式处理, ed: 北京: 人民邮电出版社, 2012.
[37]M. Mitzenmacher, Compressed bloom filters, IEEE/ACM Transactions on Networking (TON), vol. 10, pp. 604-612, 2002.
[38]張柏毅, 以短基因片段重組基因組之演算法設計與硬體實作, 臺灣大學電子工程學研究所學位論文, pp. 1-93, 2012.
連結至畢業學校之論文網頁點我開啟連結
註: 此連結為研究生畢業學校所提供,不一定有電子全文可供下載,若連結有誤,請點選上方之〝勘誤回報〞功能,我們會盡快修正,謝謝!
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top