跳到主要內容

臺灣博碩士論文加值系統

(44.192.20.240) 您好!臺灣時間:2024/02/24 00:56
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:盧瑞鵬
研究生(外文):Jui Peng Lu
論文名稱:DNA序列重組
論文名稱(外文):DNA Sequence Assembly
指導教授:李家同李家同引用關係
指導教授(外文):R. C. T. Lee
學位類別:碩士
校院名稱:國立暨南國際大學
系所名稱:資訊工程學系
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2004
畢業學年度:92
語文別:中文
論文頁數:59
中文關鍵詞:DNA序列重組蛋白質序列重組字串比對超字串亂數切割
外文關鍵詞:DNA Sequence AssemblyProtein Sequence AssemblyDouble Digest ProblemString MatchingSuperstringShotgun Method
相關次數:
  • 被引用被引用:1
  • 點閱點閱:323
  • 評分評分:
  • 下載下載:29
  • 收藏至我的研究室書目清單書目收藏:1
在本篇論文中,我們將討論兩個在計算生物學上問題。第一個問題是DNA序列重組 (DNA sequence assembly),第二個問題是2-matching double digest problem。針對第一個問題,我們使用亂數切割的方法將一個DNA序列切成一堆較短的片段,並藉由這些較短的片段重新還原成原始的DNA序列。在過去,DNA序列重組的問題常被視為最短超字串的問題 (shortest superstring problem),我們認為這個想法裡有兩個瑕疵。首先,我們可以很容易的舉出一個最短超字串並非原始DNA序列的例子。第二,因為最短超字串的問題是很典型的NP-hard的問題,所以我們會認為DNA序列重組也是一個相當困難的問題。事實上,輸入的資料是具有一些特殊的特性。由於這些特性,使得DNA序列重組並不能視為最短超字串的問題。在本篇論文中,我們將提出一個演算法來解決DNA序列重組。我們利用19個不同的DNA序列當作實驗對象。在實驗結果中,我們的方法皆可以成功的還原原始的DNA序列。本演算法還可以應用在蛋白質序列(protein sequences)重組上,實驗結果顯示,我們的方法可以成功的還原蛋白質序列。
在1987年,double digest problem已經被證明為NP-hard的問題 [GW87]。在本篇論文中,我們將提出一個特殊版本的double digest problem。我們會針對這個特殊版本的double digest problem的特性提出一個演算法來幫助我們找到可行解 (feasible solution)。我們要強調對於double digest problem 限制是非常合理的限制。
In this thesis, we consider two problems: the sequence reassembly problem and the 2-matching double digest problem which are popular research topics in computational biology. For the first problem, we use a shotgun to break a long DNA sequence into small fragments at least twice and later reassembly these fragments into a long sequence which covers all of the produced fragments. Traditionally, this problem is considered as a shortest superstring problem. We point out two flaws in this line of thinking. First of all, we can easily show an example that a shortest superstring may not be the original sequence at all. Secondly, it has been often pointed out that our sequence reassembly problem is difficult because the shortest superstring problem is NP-hard. This is not correct because the input data of our problem have some special properties. Because of these special properties, our problem is not equivalent to the shortest superstring problem. An algorithm was proposed in this thesis. Experimental results show that our approach is both efficient and feasible. We totally tested our algorithm on 19 DNA sequences and we successfully reconstructed the sequences in all cases. In the most difficult case, a string with 1852441 base pairs was cut into 1793 fragments. Our program reconstructed the original string in 945 seconds. This algorithm can also be applied in the reconstruction of protein sequences. We selected some protein sequences in NCBI to perform our method and all of them were reconstructed by our method successfully.
The double digest problem was proved to be NP-complete [GW87]. In this thesis, we proposed a special version of the problem. We proved some useful characteristics of this special problem and proposed an algorithm to find a feasible solution of the problem. This algorithm had also been implemented and we also designed a visual displaying tool to display the results. We like to point out that although we made constraints on the double digest problem, our constraints are still quite reasonable.
List of Figures VI
List of Tables VII
Chapter 1 Introduction 1
1.1 The Introduction of the DNA Sequence Assembly Problem 1
1.2 The Introduction of the Double Digest Problem 1
1.3 The Framework 2
Chapter 2 Reviews 1
2.1 Superstring and DNA Sequence Assembly 1
2.2 The Reviews of Double Digest Problem 4
Chapter 3 The DNA Sequence Assembly Problem 7
3.1 Preliminaries of DNA Sequence Assembly Problem 7
3.2 Basic Ideas of Our Algorithm 8
3.3 An Algorithm for DNA Sequence Assembly 12
3.4 Applying to Protein Sequences 17
3.5 Experimental Results 18
Chapter 4 The 2-Matching Double Digest Problem 25
4.1 Preliminaries of 2-Matching Double Digest Problem 25
4.2 An Algorithm for 2-Matching Double Digest Problem 28
4.3 Experimental Results 36
Chapter 5 Conclusions and Future Works 44
5.1 Conclusions 44
5.2 Future work 44
Bibliography 45
[AS96] A 2 2/3 Superstring Approximation Algorithm, Armem, C. and Stein, C., in Proceedings Combinatorial Pattern Matching; Lecture Notes in Comput. Sci. 1075, Springer-Verlag, Berlin, 1996, pp. 87-101.
[B88] Construction of Restriction Maps, Bellon, B., CABIOS, Vol. 4, 1988, pp. 111-115.
[BJJ97] Rotations of Periodic Strings and Short Superstrings, Breslauer, D., Jiang, T. and Jiang, Z., Journal of Algorithms, Vol. 24, 1997, pp. 340-353.
[BJLTY94] Linear Approximation of Shortest Superstrings, Blum, A., Jiang, T., Li, M., Tromp, J. and Yannakakis, M., Journal of the ACM, Vol. 41, 1994, pp. 630-647.
[DB84] An Efficient Program to Construct Restriction Maps from Experimental Data with Realistic Error Levels, Durand, R., and Bregegere, F., Nucleic Acids Research, Vol. 12, 1984, pp. 703-716.
[DK88] Errors Between Sites in Restriction Site Mapping, Dix, T. I., and Kieronska, D. H., CABIOS, Vol. 4, 1988, pp. 117-122.
[FSR83] Mapping the Order of DNA Restriction Fragments, Fitch, W. M., Smith, T. F., and Ralph, W. W., Gene, Vol. 22, 1983, pp. 19-29.
[GMS80] On Finding Minimal Length Superstring, Gallant, J., Maier, D. and Storer, J., Journal of Computer and System Sciences, Vol. 20, 1980, pp. 50-58.
[GW87] Mapping DNA by Stochastic Relaxation, Goldstein, L. and Waterman, M. S., Advances in Applied Mathematics, Vol. 8, 1987, pp. 194-207.
[GM90] Mapping DNA by Stochastic Relaxation : a New Approach to Fragment Sizes, Grigorjev, A. V. and Mironov, A. A., CABIOS, Vol. 6, 1990, pp. 107-111.
[GM91] Mapping DNA by Stochastic Relaxation : a Schedule for Optimal Annealing, Grigorjev, A. V. and Mironov, A. A., Journal of DNA Mapping and Sequencing, Vol. 1, 1991, pp. 221-226.
[HAY90] Restriction Site Mapping for Three or More Enzymes, Ho, S. T. S., Allison, L. and Yee, C. N., CABIOS, Vol. 6, 1990, pp. 195-204.
[HM99] CAP3: A DNA Sequence Assembly Program, Huang, X. and Madan, A., Genome Research, Vol. 9, 1999, pp. 868-877.
[KM95] Combinatorial Algorithms for DNA Sequence Assembly, Kececioglu, J. D. and Myers, E. W., Algorithmica, Vol. 13, 1995, pp. 7-51.
[KS99] AMASS: A Structured Pattern Matching Approach to Shotgun Assembly, Kim, S. and Segre, A. M., Journal of Computational Biology, Vol. 6, 1999, pp. 163-186.
[NMS84] Plasmid Mapping Computer Program, Nolan, G. P., Maina, C. V. and Szalay, A. A., Nucleic Acids Research, Vol. 12, 1984, pp. 717-729.
[OS2002] A Divide-and-Conquer Approach to Fragment Assembly, Otu, H. H. and Sayood, K., Bioinformatics, Vol. 19, 2002, pp. 22-29.
[P95] DNA Physical Mapping and Alternating Eulerian Cycles in Colored Graphs, Pevzner, P. A., Algorithmica, Vol. 13, 1995, pp. 77-105.
[PM87] An Efficient Method for Physical Mapping of DNA Molecules, Pevzner, P. A. and Mironov, A. A., Molecular Biology, Vol. 21, 1987, pp. 788-796.
[PTW2001] An Eulerian Path Approach to DNA Fragment Assembly, Pevzner, P., Tang, H. and Waterman, M. S., Proceedings of the National Academy of Sciences USA, Vol. 98, 2001, pp. 9748-9753.
[S78] Inferfing DNA Structure from Segmentation Data, Stefik, M., Artificial Intelligence, Vol. 11, 1978, pp. 85-114.
[S99] A 2 1/2 - Approximation Algorithm for Shortest Superstring, Sweedyk, Z., SIAM Journal of Computing, Vol. 29, 1999, pp. 954-986.
[TDMH88] Restriction Map Construction Using a “Complete Sentences Compatibility” Algorithm, Tuffery, P., Dessen, P., Mugnier, C. and Hazout, S., CABIOS, Vol. 4, 1988, pp. 103-110.
[TU88] A Greedy Approximation Algorithm for Constructing Shortest Common Superstrings, Tarhio, J. and Ukkonen, E., Theoretical Computer Science, Vol. 57, 1988, pp. 131-145.
[U90] A Linear-Time Algorithm for Finding Approximate Shortest Common Superstrings, Ukkonen, E., Algorithmica, Vol. 5, 1990, pp. 313-323.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top