(3.231.166.56) 您好!臺灣時間:2021/03/08 12:11
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果

詳目顯示:::

我願授權國圖
: 
twitterline
研究生:陳慧敏
研究生(外文):Hui-Min Chen
論文名稱:使用編碼技術解決完整字串比對問題
論文名稱(外文):An Exact String Matching Problem Using Data Encoding Scheme
指導教授:李家同李家同引用關係
指導教授(外文):R.C.T. Lee
學位類別:碩士
校院名稱:國立暨南國際大學
系所名稱:資訊工程學系
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2008
畢業學年度:96
語文別:英文
論文頁數:50
中文關鍵詞:字串比對滑動視窗編碼方法
外文關鍵詞:string matchingsliding windowencoding method
相關次數:
  • 被引用被引用:0
  • 點閱點閱:156
  • 評分評分:系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔
  • 下載下載:18
  • 收藏至我的研究室書目清單書目收藏:0
傳統的字串比對問題是在一個較長的字串T中找出一個給定字串P所有發生的位置。在本篇論文中,我們提出一個可以在O(m+n)時間內解決的新編碼技術,其技術藉由取代由特定字元所包圍的子字串的內容為其長度來縮短原本字串的長度。然後再使用字串比對演算法在其編碼後的字串上比對。在本篇論文中可以看到,經過編碼後的字串T和P可以縮短到原本長度的2/|Σ|倍。在實際上,縮短長度比2/|Σ|還要好。例如:在其字串P長度為50,字串T長度為200000的英文文章中,字串P平均縮短為原本長度的6%,字串T平均縮短為原本長度的12.4%。因此,字串的比對問題可以在更短時間內被解決。
The traditional exact string matching problem is to find all locations of a pattern string with length m in a text with length n. Here we propose a new encoding method to shorten the both lengths of pattern and text by substituting the substring between a special character for its length in O(m+n). Then we use an exact matching algorithm to solve the exact string matching problem on the encoding pattern and text. As can be seen、by using the encoding method、the pattern and text can be shortened about 2/|Σ| times the lengths of the original ones. In practice、it performs better than 2/|Σ|. For instance、for an English sentence pattern whose length is 50 and a text whose length is 200000、in average、the pattern is shortened to 6% of its original length and the text is shortened to 12.4% of its original length. Thus、the exact matching can be done in a much shorter time.
Chapter 1 Introduction 1
1.1 Motivations 1
1.2 Definitions 3
1.3 Thesis Organization 3
Chapter 2 String Matching Algorithms 4
2.1 Brute Force Method 4
2.2 Horspool Algorithm 6
2.3 The Quick Search Algorithm 9
Chapter 3 The Encoding Method 13
3.1 Run-length Algorithm 13
Chapter 4 Data Encoding Scheme for Exact String Matching Problem 15
4.1 Basic Idea 15
4.2 The Searching Phase 23
4.3 The Examination Phase 29
4.4 The Data Encoding Algorithm 32
4.5 Analysis 33
Chapter 5 Discussion 36
Chapter 6 Experiments 40
6.1 DNA Type Data 41
6.2 Natural Language Data 43
Chapter 7 Conclusion and Further Work 45
7.1 Conclusion 45
7.2 Further Work 45
Bibliography 46
[A90]Algorithms for finding patterns in strings、Aho、A.V.、in Handbook of Theoretical Computer Science、Volume A、Algorithms and complexity、1990、pp 255-300.

[AB92] Two-dimensional periodicity and its application、Amir、A. and Benson、G.、Proc. 3rd ACM-SIAM Symposium on Discrete Algorithms、1992.

[AB92] Efficient two-dimensional compressed matching、Amir、A. and Benson、G.、Data Compression Conference、1992、pp. 279-288.

[ABL99] Real scaled matching、Amir、A.、Butman、A. and Lewenstein、M.、Information Processing Letters、Vol. 70、1999、pp. 185–190.

[AHS97] Molecular analysis of a Penicillium chrysogenum GATA factor encoding gene (sreP) exhibiting significant homology to the Ustilago maydis urbs1 gene、
Haas、H.、Angermayr、K. and Stoffler、G.、Gene、Vol. 184、1997、pp. 33-37.

[ALS97] Matching for run-length encoded strings、Apostolico、A.、Landau、G.M. and Skiena、S.、Journal of Complexity、Vol. 15、1997、pp. 4-16.

[ALM2002] Edit distance of run-length encoded strings、Arbell、O.、Landau、G.M. and Mitchell、J.S.B.、Information Processing Letters、Vol. 83、2002、pp.307-314.

[AM2000] Maximal Codeword Lengths in Huffman Codes、Abu-Mostafa、Y.S. and McEliece、R.J.、Computers and Mathematics with Applications、Vol. 39、2000、pp. 129-134.

[B77] Two dimensional pattern matching、Bird、R.S.、Information Processing Letters、Vol. 6、1977、pp.168-170.

[B78] A technique for extending rapid exact-match string matching to arrays of more than one dimension、Baker、T.P.、SIAM J. Comput.、Vol. 7、1978、pp. 533-541.

[BBC92] Elements d'algorithmique、Beauquier、D.、Berstel、J. and Chretienne、P.、1992、Chapter 10、pp 337-377、Masson、Paris.

[BEL2004] Scaled and permuted string matching、Butman、A.、Eres、R.、Landau、G.M.、Information Processing Letters、Vol. 92、2004、pp. 293–297.

[BG92] A new approach to text searching、Baeza-yates、R.A. and Gonnet、G.H.、Communication of the ACM、Vol. 35、No. 10、1992、pp. 74-82.

[BM77] A fast string search algorithm、Boyer、R. S. and Moore、J. S.、Comm. ACM、20、1977、pp. 762–772.

[BR92] Average running time of the Boyer-Moore-Horspool algorithm、Baeza-yates、R.A. and Regnier、M.、Theoretical Computer Science、Vol. 92、No. 1、1992、pp.19-31.

[CH2000] A new lossless compression scheme based on Huffman coding scheme for image compression、Hu、Y.C. and Chang、C.C.、Signal Processing: Image Communication、Vol. 16、2000、pp. 367-372.

[CHMV2005] Expression and regulation of the atrial natriuretic factor encoding gene Nppa during development and disease、Houweling、A.C.、van Borren、M.M.、Moorman、A.F. and Christoffels、V.M.、Cardiovascular Research、Vol. 67、2005、pp. 583-593.

[CLP98] A very fast string matching algorithm for small alphabets and long patterns、CHARRAS、C.、LECROQ、T. and PEHOUSHEK、J.D.、Proceedings of the 9th Annual Symposium on Combinatorial Pattern Matching、1998、M. Farach-Colton ed.、Piscataway、New Jersey、Lecture Notes in Computer Science 1448、pp. 55-64.

[CTJ98] Very Fast String Matching Algorithm for Small Alphabets and Long Patterns、Christian、C.、Thierry、L. and Joseph、D.P.、Lecture Notes in Computer Science、Vol. 1448、1998、pp. 55-64.

[CY97] Vectorizations of randomized matching for run-length coded strings、Chung、K.L. and Yan、W.M.、Pattern Recognition Letters、Vol. 18、1997、pp. 63-72.

[H80] Practical fast searching in strings、HORSPOOL、R.N.、Software - Practice and Experience、Vol. 10、1980、pp. 501-506.

[HO95] A Data Compression Scheme for Chinese Text Files Using Huffman Coding and a Two-Level Dictionary、Ong、G.H and Huang、S.Y.、Information Sciences、Vol. 84、1995、pp. 85-99.

[HS91] Fast string searching 、HUME、A. and SUNDAY、D.M.、Software - Practice & Experience、Vol. 21、1991、pp. 1221-1248.

[KMP77] Fast pattern matching in strings、KNUTH、D.E.、MORRIS、J.H. and PRATT、V.R.、SIAM Journal on Computing、Vol. 6、No. 2、1977、pp.323-350.

[KS87] Efficient two-dimensional pattern matching in the presence of errors、Krithivasan、K. and Sitalakshmi、R.、Information Sciences、Vol. 43、1987、pp. 169-184.

[LWL2008] Finding a longest common subsequence between a run-length-encoded string and an uncompressed string、Liu、J.J.、Wang、Y.L. and Lee、R.C.T.、Journal of Complexity、Vol. 24、2008、pp. 173-184.

[LZ2005] An efficient chain code with Huffman coding、Liu、Y.K. and Zalik、B.、Pattern Recognition、Vol. 38、2005、pp. 553-557.

[MNU2003] Approximate matching of run-length compressed strings、Makinen、V.、Navarro、G. and Ukkonen、E.、Algorithmica、Vol. 35、2003、pp. 347-369.

[R92] Tuning the Boyer-Moore-Horspool string searching algorithm、RAITA、T.、Software - Practice & Experience、Vol. 22、No. 10、1992、pp. 879-884.

[S2000] Introduction to Data Compression、Sayoood、K.、second edition、Morgan Kaufmann、2000.

[S90] A very fast substring search algorithm、Sunday、D.M.、Communications of the ACM、Vol. 33、No. 8、1990、pp. 132-142.

[S91] Experiments with a very fast substring search algorithm、SMITH、P.D.、Software - Practice & Experience、Vol. 21、No. 10、1991、pp. 1065-1074.

[S93] String matching algorithms and automata、SIMON、I.、1st American Workshop on String Processing、1993、pp 151-157.

[ZT87] On improving the average case of the Boyer-Moore string matching algorithm、ZHU、R.F. and TAKAOKA、T.、Journal of Information Processing、Vol. 10、No. 3、1987、pp. 173-177 .
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
系統版面圖檔 系統版面圖檔