# 臺灣博碩士論文加值系統

(44.192.20.240) 您好！臺灣時間：2024/02/24 00:56

:::

### 詳目顯示

:

• 被引用:1
• 點閱:323
• 評分:
• 下載:29
• 書目收藏:1
 在本篇論文中，我們將討論兩個在計算生物學上問題。第一個問題是DNA序列重組 (DNA sequence assembly)，第二個問題是2-matching double digest problem。針對第一個問題，我們使用亂數切割的方法將一個DNA序列切成一堆較短的片段，並藉由這些較短的片段重新還原成原始的DNA序列。在過去，DNA序列重組的問題常被視為最短超字串的問題 (shortest superstring problem)，我們認為這個想法裡有兩個瑕疵。首先，我們可以很容易的舉出一個最短超字串並非原始DNA序列的例子。第二，因為最短超字串的問題是很典型的NP-hard的問題，所以我們會認為DNA序列重組也是一個相當困難的問題。事實上，輸入的資料是具有一些特殊的特性。由於這些特性，使得DNA序列重組並不能視為最短超字串的問題。在本篇論文中，我們將提出一個演算法來解決DNA序列重組。我們利用19個不同的DNA序列當作實驗對象。在實驗結果中，我們的方法皆可以成功的還原原始的DNA序列。本演算法還可以應用在蛋白質序列(protein sequences)重組上，實驗結果顯示，我們的方法可以成功的還原蛋白質序列。 在1987年，double digest problem已經被證明為NP-hard的問題 [GW87]。在本篇論文中，我們將提出一個特殊版本的double digest problem。我們會針對這個特殊版本的double digest problem的特性提出一個演算法來幫助我們找到可行解 (feasible solution)。我們要強調對於double digest problem 限制是非常合理的限制。
 In this thesis, we consider two problems: the sequence reassembly problem and the 2-matching double digest problem which are popular research topics in computational biology. For the first problem, we use a shotgun to break a long DNA sequence into small fragments at least twice and later reassembly these fragments into a long sequence which covers all of the produced fragments. Traditionally, this problem is considered as a shortest superstring problem. We point out two flaws in this line of thinking. First of all, we can easily show an example that a shortest superstring may not be the original sequence at all. Secondly, it has been often pointed out that our sequence reassembly problem is difficult because the shortest superstring problem is NP-hard. This is not correct because the input data of our problem have some special properties. Because of these special properties, our problem is not equivalent to the shortest superstring problem. An algorithm was proposed in this thesis. Experimental results show that our approach is both efficient and feasible. We totally tested our algorithm on 19 DNA sequences and we successfully reconstructed the sequences in all cases. In the most difficult case, a string with 1852441 base pairs was cut into 1793 fragments. Our program reconstructed the original string in 945 seconds. This algorithm can also be applied in the reconstruction of protein sequences. We selected some protein sequences in NCBI to perform our method and all of them were reconstructed by our method successfully. The double digest problem was proved to be NP-complete [GW87]. In this thesis, we proposed a special version of the problem. We proved some useful characteristics of this special problem and proposed an algorithm to find a feasible solution of the problem. This algorithm had also been implemented and we also designed a visual displaying tool to display the results. We like to point out that although we made constraints on the double digest problem, our constraints are still quite reasonable.
 List of Figures VI List of Tables VII Chapter 1 Introduction 1 1.1 The Introduction of the DNA Sequence Assembly Problem 1 1.2 The Introduction of the Double Digest Problem 1 1.3 The Framework 2 Chapter 2 Reviews 1 2.1 Superstring and DNA Sequence Assembly 1 2.2 The Reviews of Double Digest Problem 4 Chapter 3 The DNA Sequence Assembly Problem 7 3.1 Preliminaries of DNA Sequence Assembly Problem 7 3.2 Basic Ideas of Our Algorithm 8 3.3 An Algorithm for DNA Sequence Assembly 12 3.4 Applying to Protein Sequences 17 3.5 Experimental Results 18 Chapter 4 The 2-Matching Double Digest Problem 25 4.1 Preliminaries of 2-Matching Double Digest Problem 25 4.2 An Algorithm for 2-Matching Double Digest Problem 28 4.3 Experimental Results 36 Chapter 5 Conclusions and Future Works 44 5.1 Conclusions 44 5.2 Future work 44 Bibliography 45
 [AS96] A 2 2/3 Superstring Approximation Algorithm, Armem, C. and Stein, C., in Proceedings Combinatorial Pattern Matching; Lecture Notes in Comput. Sci. 1075, Springer-Verlag, Berlin, 1996, pp. 87-101.[B88] Construction of Restriction Maps, Bellon, B., CABIOS, Vol. 4, 1988, pp. 111-115.[BJJ97] Rotations of Periodic Strings and Short Superstrings, Breslauer, D., Jiang, T. and Jiang, Z., Journal of Algorithms, Vol. 24, 1997, pp. 340-353.[BJLTY94] Linear Approximation of Shortest Superstrings, Blum, A., Jiang, T., Li, M., Tromp, J. and Yannakakis, M., Journal of the ACM, Vol. 41, 1994, pp. 630-647.[DB84] An Efficient Program to Construct Restriction Maps from Experimental Data with Realistic Error Levels, Durand, R., and Bregegere, F., Nucleic Acids Research, Vol. 12, 1984, pp. 703-716.[DK88] Errors Between Sites in Restriction Site Mapping, Dix, T. I., and Kieronska, D. H., CABIOS, Vol. 4, 1988, pp. 117-122.[FSR83] Mapping the Order of DNA Restriction Fragments, Fitch, W. M., Smith, T. F., and Ralph, W. W., Gene, Vol. 22, 1983, pp. 19-29.[GMS80] On Finding Minimal Length Superstring, Gallant, J., Maier, D. and Storer, J., Journal of Computer and System Sciences, Vol. 20, 1980, pp. 50-58.[GW87] Mapping DNA by Stochastic Relaxation, Goldstein, L. and Waterman, M. S., Advances in Applied Mathematics, Vol. 8, 1987, pp. 194-207.[GM90] Mapping DNA by Stochastic Relaxation : a New Approach to Fragment Sizes, Grigorjev, A. V. and Mironov, A. A., CABIOS, Vol. 6, 1990, pp. 107-111.[GM91] Mapping DNA by Stochastic Relaxation : a Schedule for Optimal Annealing, Grigorjev, A. V. and Mironov, A. A., Journal of DNA Mapping and Sequencing, Vol. 1, 1991, pp. 221-226.[HAY90] Restriction Site Mapping for Three or More Enzymes, Ho, S. T. S., Allison, L. and Yee, C. N., CABIOS, Vol. 6, 1990, pp. 195-204.[HM99] CAP3: A DNA Sequence Assembly Program, Huang, X. and Madan, A., Genome Research, Vol. 9, 1999, pp. 868-877.[KM95] Combinatorial Algorithms for DNA Sequence Assembly, Kececioglu, J. D. and Myers, E. W., Algorithmica, Vol. 13, 1995, pp. 7-51.[KS99] AMASS: A Structured Pattern Matching Approach to Shotgun Assembly, Kim, S. and Segre, A. M., Journal of Computational Biology, Vol. 6, 1999, pp. 163-186.[NMS84] Plasmid Mapping Computer Program, Nolan, G. P., Maina, C. V. and Szalay, A. A., Nucleic Acids Research, Vol. 12, 1984, pp. 717-729.[OS2002] A Divide-and-Conquer Approach to Fragment Assembly, Otu, H. H. and Sayood, K., Bioinformatics, Vol. 19, 2002, pp. 22-29.[P95] DNA Physical Mapping and Alternating Eulerian Cycles in Colored Graphs, Pevzner, P. A., Algorithmica, Vol. 13, 1995, pp. 77-105.[PM87] An Efficient Method for Physical Mapping of DNA Molecules, Pevzner, P. A. and Mironov, A. A., Molecular Biology, Vol. 21, 1987, pp. 788-796.[PTW2001] An Eulerian Path Approach to DNA Fragment Assembly, Pevzner, P., Tang, H. and Waterman, M. S., Proceedings of the National Academy of Sciences USA, Vol. 98, 2001, pp. 9748-9753.[S78] Inferfing DNA Structure from Segmentation Data, Stefik, M., Artificial Intelligence, Vol. 11, 1978, pp. 85-114.[S99] A 2 1/2 - Approximation Algorithm for Shortest Superstring, Sweedyk, Z., SIAM Journal of Computing, Vol. 29, 1999, pp. 954-986.[TDMH88] Restriction Map Construction Using a “Complete Sentences Compatibility” Algorithm, Tuffery, P., Dessen, P., Mugnier, C. and Hazout, S., CABIOS, Vol. 4, 1988, pp. 103-110.[TU88] A Greedy Approximation Algorithm for Constructing Shortest Common Superstrings, Tarhio, J. and Ukkonen, E., Theoretical Computer Science, Vol. 57, 1988, pp. 131-145.[U90] A Linear-Time Algorithm for Finding Approximate Shortest Common Superstrings, Ukkonen, E., Algorithmica, Vol. 5, 1990, pp. 313-323.
 電子全文
 國圖紙本論文
 推文當script無法執行時可按︰推文 網路書籤當script無法執行時可按︰網路書籤 推薦當script無法執行時可按︰推薦 評分當script無法執行時可按︰評分 引用網址當script無法執行時可按︰引用網址 轉寄當script無法執行時可按︰轉寄

 1 隨機最佳化方法於生物資訊科技之應用--以螞蟻演算法應用於最短超字串問題為例 2 成對DNA序列重組問題 3 系統生物資訊運算:DNA序列重組最佳化

 無相關期刊

 1 植基於第一漸增子序列之轉位排序問題 2 用特徵向量及傅葉爾轉換決定DNA序列的相似性 3 適性化自我學習之超媒體課程設計:以英文文法課程為個案研究 4 可調變式的L-System 5 植基於霍夫碼設計之H.263+視訊加密演算法之研究 6 針對網頁式多媒體同步教材之有效瀏覽方法 7 SMIL2.0簡報之資料擷取引擎設計 8 以折反射取像系統進行對焦測距 9 以攝影機自動定位作完整物面三維掃描的錯誤更正碼之探討 10 群聚多重序列排對的解決:區塊排對 11 A*演算法在序列比對問題上之應用 12 符號邏輯於基因調控系統之應用 13 金字塔網路的寬直徑 14 迴旋與在序列分析上之應用 15 利用序列比對演算法辨識抄襲之C程式作業

 簡易查詢 | 進階查詢 | 熱門排行 | 我的研究室