( 您好!臺灣時間:2021/05/17 03:35
字體大小: 字級放大   字級縮小   預設字形  


論文名稱(外文):Investigation of Sequence Alignment by Using Hardware Operation
指導教授(外文):Chi-Chuan Hwang
外文關鍵詞:sequence alignmentbloom filterhardware operationdynamic programming
  • 被引用被引用:0
  • 點閱點閱:35
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
Due to the rapid advances in biotechnology, more and more biological sequence is built in database, in which there are tens of billions pairs of DNA sequence in the library. Therefore it is an important issue to properly and quickly search this much sequence in database. In bioinformatics study sequence alignment is the most important research tool, which can compare and analyze the similarity between two or more sequences in the sequence alignment. The lengths of the two sequences are usually quite long, so the processing takes much time. Many algorithms have been found out to reduce processing time. In the thesis, we focus on constructing data processing by performing the hardware-based computing simulation. First, the reference sequence uses the hash function to establish Bloom filter for the storage of bases. Second, candidate position is used to generate algorithm to find the best candidate for the position, which can reduce the process. Finally, dynamic programming algorithm is used to compute the similarity scores and find the highest alignment to complete sequence alignment.
中文摘要 I
Abstract II
誌謝 VII
目錄 IX
圖目錄 XII
第一章 緒論 1
1-1 研究背景 1
1-2 研究動機 4
1-3 研究目的 7
1-4 本文架構 10
第二章 序列比對相關研究 11
2-1 編輯距離 11
2-2 Sequence Alignment 13
2-3 演算法種類 14
2-4 布洛斯-惠勒轉換演算法 16
2-4-1 後綴陣列 16
2-4-2 布洛斯-惠勒轉換編解碼 19
第三章 最佳路徑分析與候選位置對應產生 22
3-1 求解最佳路徑 22
3-1-1 動態規劃演算法 22
3-1-2 最佳路徑分析 26
3-2 索引表及候選位置對應演算法設計 29
3-2-1 標準的布隆過濾器 29
3-2-2 最佳化hash函式個數 31
3-2-3 布隆過濾器建立 33
3-2-4 候選位置對應演算法 36
第四章 實驗結果 39
4-1 模擬流程 39
4-2 模擬之結果 40
第五章 結論與未來展望 46
5-1 結論 46
5-2 未來展望 46
參考文獻 48
[1]F. Baluška, D. Volkmann, and P. W. Barlow, Cell bodies in a cage, Nature, vol. 428, pp. 371-371, 2004.
[2]D. Keilin, The Leeuwenhoek Lecture: The problem of anabiosis or latent life: History and current concept, Proceedings of the Royal Society of London. Series B, Biological Sciences, vol. 150, pp. 149-191, 1959.
[3]I. K. Vasil, A history of plant biotechnology: from the cell theory of Schleiden and Schwann to biotech crops, Plant cell reports, vol. 27, pp. 1423-1440, 2008.
[4]F. H. Crick, The biological replication of macromolecules, in Symp. Soc. Exp. Biol, 1958, pp. 138-163.
[5]张春霆, 生物信息学的现状与展望, 世界科技研究与发展, vol. 22, pp. 17-20, 2000.
[6]F. Sanger, S. Nicklen, and A. R. Coulson, DNA sequencing with chain-terminating inhibitors, Proceedings of the National Academy of Sciences, vol. 74, pp. 5463-5467, 1977.
[7]A. M. Maxam and W. Gilbert, A new method for sequencing DNA, Proceedings of the National Academy of Sciences, vol. 74, pp. 560-564, 1977.
[8]解增言, 林俊华, 谭军, and 舒坤贤, DNA 测序技术的发展历史与最新进展, 生物技术通报, vol. 8, pp. 64-70, 2010.
[9]E. R. Mardis, The impact of next-generation sequencing technology on genetics, Trends in genetics, vol. 24, pp. 133-141, 2008.
[10]权威 and 王亚东, 基于新一代测序数据的比对算法的研究, 智能计算机与应用, vol. 2, pp. 39-41, 2012.
[11]A. J. Gibbs and G. A. McIntyre, The diagram, a method for comparing sequences, European Journal of Biochemistry, vol. 16, pp. 1-11, 1970.
[12]S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, Basic local alignment search tool, Journal of molecular biology, vol. 215, pp. 403-410, 1990.
[13]D. J. Lipman and W. R. Pearson, Rapid and sensitive protein similarity searches, Science, vol. 227, pp. 1435-1441, 1985.
[14]A. Apostolico and Z. Galil, Pattern matching algorithms, 1997.
[15]G. A. Stephen, String searching algorithms vol. 3: World Scientific, 1994.
[16]P. R. Gray, P. Hurst, R. G. Meyer, and S. Lewis, Analysis and design of analog integrated circuits: Wiley, 2001.
[17]C. Mead and L. Conway, Introduction to VLSI systems vol. 1080: Addison-Wesley Reading, MA, 1980.
[18]W. J. Masek and M. S. Paterson, A faster algorithm computing string edit distances, Journal of Computer and System sciences, vol. 20, pp. 18-31, 1980.
[19]V. I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals, in Soviet physics doklady, 1966, pp. 707-710.
[20]J. Setubal and J. Meidanis, Sequence comparison and database search, Introduction to computational molecular biology, 1997.
[21]T. F. Smith and M. S. Waterman, Comparison of biosequences, Advances in applied mathematics, vol. 2, pp. 482-489, 1981.
[22]X. Huang and W. Miller, A time-efficient, linear-space local similarity algorithm, Advances in Applied Mathematics, vol. 12, pp. 337-357, 1991.
[23]S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic acids research, vol. 25, pp. 3389-3402, 1997.
[24]S. B. Needleman and C. D. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of molecular biology, vol. 48, pp. 443-453, 1970.
[25]O. Gotoh, An improved algorithm for matching biological sequences, Journal of molecular biology, vol. 162, pp. 705-708, 1982.
[26]P. Klus, S. Lam, D. Lyberg, M. S. Cheung, G. Pullan, I. McFarlane, et al., BarraCUDA-a fast short read sequence aligner using graphics processing units, BMC research notes, vol. 5, p. 27, 2012.
[27]H. Li and R. Durbin, Fast and accurate short read alignment with Burrows–Wheeler transform, Bioinformatics, vol. 25, pp. 1754-1760, 2009.
[28]H. Li and R. Durbin, Fast and accurate long-read alignment with Burrows–Wheeler transform, Bioinformatics, vol. 26, pp. 589-595, 2010.
[29]B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg, Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome biol, vol. 10, p. R25, 2009.
[30]R. Li, C. Yu, Y. Li, T.-W. Lam, S.-M. Yiu, K. Kristiansen, et al., SOAP2: an improved ultrafast tool for short read alignment, Bioinformatics, vol. 25, pp. 1966-1967, 2009.
[31]H. Li and N. Homer, A survey of sequence alignment algorithms for next-generation sequencing, Briefings in bioinformatics, vol. 11, pp. 473-483, 2010.
[32]W. R. Pearson and W. Miller, [27] Dynamic programming algorithms for biological sequence comparison, Methods in enzymology, vol. 210, pp. 575-601, 1992.
[33]E. Vermij, Genetic sequence alignment on a supercomputing platform, TU Delft, Delft University of Technology, 2011.
[34]B. H. Bloom, Space/time trade-offs in hash coding with allowable errors, Communications of the ACM, vol. 13, pp. 422-426, 1970.
[35]Y. Huayun and G. Jihong, Bloom Filter 研究进展, 电信科学, vol. 26, pp. 31-36, 2010.
[36]A. Rajaraman, J. D. Ullman, and 王斌, 大数据: 互联网大规模数据挖掘与分布式处理, ed: 北京: 人民邮电出版社, 2012.
[37]M. Mitzenmacher, Compressed bloom filters, IEEE/ACM Transactions on Networking (TON), vol. 10, pp. 604-612, 2002.
[38]張柏毅, 以短基因片段重組基因組之演算法設計與硬體實作, 臺灣大學電子工程學研究所學位論文, pp. 1-93, 2012.
註: 此連結為研究生畢業學校所提供,不一定有電子全文可供下載,若連結有誤,請點選上方之〝勘誤回報〞功能,我們會盡快修正,謝謝!
第一頁 上一頁 下一頁 最後一頁 top