跳到主要內容

臺灣博碩士論文加值系統

(18.97.9.173) 您好!臺灣時間:2024/12/07 13:42
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:林志杰
研究生(外文):Jyh-Jye Lin
論文名稱:線上實數比例字串搜尋
論文名稱(外文):On-Line Real Scaled Matching
指導教授:王炳豐
指導教授(外文):Biing-Feng Wang
學位類別:碩士
校院名稱:國立清華大學
系所名稱:資訊工程學系
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2001
畢業學年度:89
語文別:英文
論文頁數:24
中文關鍵詞:字串比對
外文關鍵詞:String MatchingPattern MatchingScaled Pattern MatchingApproximate Pattern MatchingGeneralized Pattern Matching
相關次數:
  • 被引用被引用:0
  • 點閱點閱:157
  • 評分評分:
  • 下載下載:4
  • 收藏至我的研究室書目清單書目收藏:0
  令text T = t1 t2 …… tn和pattern P = p1 p2 …… pm代表兩個字串。『實數比例字串搜尋』是指找出所有P在T中出現的位置(含各種依照大於或等於1的實數比例放大的P字串)。
  『實數比例字串搜尋』是由電腦視訊處理所延伸出來的重要問題。在此之前,Amir [3] 等人已經提出了一個方法,可以在O(m + n) 的時間之內將這個問題解了出來。並且在同一篇論文之中,他們提出了一個延申的問題:我們可不可以把T先做一個處理,然後我們可以在O(m + tocc) 的時間之內找出所有字串P在字串T中有出現的位置(含各種依照大於或等於1的實數比例放大的P字串), 當中的tocc代表在T之中有多少個實數比例的P。這個問題我們稱之為『線上實數比例字串搜尋』。
  在這一篇碩士論文之中,我們對『線上實數比例字串搜尋』這個問題做了一些研究。從直覺上來看,因為這個問題可以接受任何大於或等於1的實數比例來放大,因此似乎可能有無限多種可能的pattern。如果真的有無限多種可能的pattern的話,則幾乎不可能存在一個演算法可以在多項式的時間之內把問題給解出來。很幸運地,在這一篇論文之中,我們證明了這樣的pattern最多只有O(n3)個。然後我們提供了一個方法,讓我們可以先花費O(n5)的時間來處理T之後,可以在O(m + tocc)的時間之內把在T之中所有實數比例的P的位置找出來。我們是第一個解答Amir這個問題的人。

  Let T = t1 t2 …… tn and P = p1 p2 …… pm be two strings. The pattern P matches T at position i if P is equal to the substring ti ti+1 ... ti+m—1. For any real number k ³ 1, let k × P denote the string obtained by proportionally enlarging P by a scale of k. We say that P real scaled matches T at position i if there exists a real scale k ³ 1 such that k × P matches T at position i. Moreover, we say that there is a real scaled occurrence of P in T at position I if P real scaled matches T at position i. The problem of real scaled matching refers to the finding of all real scaled occurrences of P in T.
  Real scaled matching is an important problem originally inspired by problems in the field of computer vision. Previously, Amir et al. efficiently solved it in O(m + n) time and asked the following open problem: “Can we efficiently preprocess T in a manner that will enable, upon inputting a pattern P, finding all real scaled occurrences of P in T in O(m + tocc) time, where tocc is the number of real scaled occurrences of P in T?” Such a problem is called the on-line real scaled matching.
  In this thesis, we study the on-line real scaled matching problem. Intuitively, since the scale k can be any real number ³ 1, it seems that there may be exponential patterns that can real scaled match T and thus impossible to have a polynomial-time preprocessing algorithm. Fortunately, in this thesis, we firstly derive a novel property from which we can conclude that there are at most O(n3) patterns that can real scaled match a given text T. Then, we give an O(n5)-time algorithm to preprocess T such that the finding of all real scaled occurrences of any pattern P can be done in O(m + tocc) time. Our result is the first solution to Amir et al.’s open problem.

Abstract ……………………………………………………………i
摘要…………………………………………………………………ii
Content……………………………………………………………iii
List of Figures …………………………………………………iv
List of Tables ……………………………………………………v
Chapter 1 Introduction …………………………………………1
Chapter 2 Preliminaries…………………………………………5
Chapter 3 Real Scaled Indexing Tree (RSIT) ………………8
  3.1 Digital Search Trees …………………………………8
  3.2 RSIT………………………………………………………10
  3.3 Upper Bound of the Size of RSIT …………………15
Chapter 4 Algorithms……………………………………………18
  4.1 Algorithm RSIT…………………………………………18
  4.2 Algorithm Search_the_Pattern………………………22
Chapter 5 Conclusion and Future Work………………………24
References…………………………………………………………25

1.A. V. Aho and M. J. Corasick, Efficient string matching, Communications of the Association for Computing Machinery, Vol.18, No.6, pp.333-340, 1975.
2.C. Allauzen and M. Raffinot, Simple Optimal String Matching Algorithm, Journal of Algorithms, Vol.36, pp.102-116, 2000
3.A. Amir, A. Butman, and M. Lewenstein, Real scaled matching, Information Processing Letters, Vol.70, pp.185-190, 1999.
4.A. Amir and G. Calinescu, Alphabet independent and dictionary scaled matching, in “Proceedings of 7th Annual Symposium on Combinatorial Pattern Matching (CPM 96),” pp.320-334, 1996.
5.A. Amir and M. Farach, Efficient 2-dimensional approximate matching of half-rectangular figures, Information and Computation, Vol.118, pp.1-11, 1995.
6.A. Amir and M. Farach, Adaptive dictionary matching, in “Proceedings of the 32nd IEEE FOCS,” pp.760-766, 1991.
7.A. Amir, M. Farach, R. Giancarlo, Z. Galil, and K. Park, Dynamic dictionary matching, Journal of Computer and System Sciences, Vol.49, pp.208-222, 1994.
8.A. Amir, M. Farach, R. M. Idury, J. A. La Poutre, and A. A. Schaffer, Improved dynamic dictionary matching, Information and Computation, Vol.119, pp.258-282, 1995.
9.A. Amir, M. Farach, and S. Muthukrishnan, Alphabet dependence in parameterized matching, Information Processing Letters, Vol.49, pp.111-115, 1994.
10.A. Amir and G. Landau, Fast parallel and serial multidimensional approximate array matching, Theoretic Computer Science, Vol.81, pp.97-115, 1991.
11.A. Amir, G. M. Landau, and U. Vishkin, Efficient pattern matching with scaling, Journal of Algorithms, Vol.13, pp.2-32, 1992.
12.A. Amir, D. Keselman, G. M. Landau, M. Lewenstein, N. Lewenstein, and M. Rodeh, Text indexing and dictionary matching with one error, Journal of Algorithms, Vol.37, pp.309-325, 2000.
13.A. Apostolico, Improving the worst-case performance of the Hunt-Szymanski strategy for the longest common subsequence of two strings, Information Processing Letters, Vol.23, pp.63-69, 1986.
14.A. Apostolico, Remark on the Hsu-Du new algorithm for the longest common subsequence problem, Information Processing Letters, Vol.25, pp.235-236, 1987.
15.A. Apostolico, String editing and longest common subsequences, in “Handbook of Formal Languages”(G. Rozenberg and A. Salomaa, Eds.), Vol.2, pp.361-398, Springer, Berlin, 1997.
16.A. Apostolico, S. Browne, and C. Guerra, Fast linear-space computations of longest common subsequences, Theoretical Computer Science, Vol.92, pp.3-17, 1992.
17.A. Apostolico, and C. Guerra, A fast linear space algorithm for computing longest common subsequences, in “Proceedings of the 23rd Allerton Conference on Communication, Control and Computing,” Monticello, pp.76-84, 1985.
18.A. Apostolico, and C. Guerra, The longest common subsequence problem revisited, Algorithmica, Vol.2, pp.315-336, 1987.
19.R. A. Baeza-Yates and G. H. Gonnet, A new approach to text searching, Communications of the ACM, Vol.35, No.10, pp.74-82, 1992.
20.A. B. Borodin, M. J. Fischer, D. G. Kirkpatrick, N. A. Lynch, and M. Tompa, A time-space trade-off for sorting on non-oblivious machines, in “20th Annual Symposium on Foundations of Computer Science, San Juan, P. R., 1979,” pp.319-327, IEEE Computer Society, Long Beach, California, 1979.
21.R. S. Boyer and J. S. Moore, A fast string searching algorithm, Communications of the ACM, Vol.20, No.10, pp.762-772, 1977.
22.G. S. Brodal and L. Gasieniec, Approximate dictionary queries, in “Proceedings of 7th Annual Symposium on Combinatorial Pattern Matching (CPM 96),” pp.65-74, Lecture Notes in Computer Science, Vol.1075, Springer-Verlag, Berlin/New York, 1996.
23.M. T. Chen and J. Seiferas, Efficient and elegant subword tree construction, in “Combinatorial Algorithms on Words” (A. Apostolico and Z. Gaili, Eds.), Vol.12, Chap.12, pp.97-107, NATO ASI Series F: Computer and System Sciences, Springer-Verlag, Berlin, 1985.
24.T. H. Cormen, C. E. Leiserson, and R. L. Rivest, “Introduction to algorithms,” McGraw-Hill Book Company, 1990.
25.M. Crochemore, Transducers, and repetitions, Theoretic Computer Science, Vol.45, No.1, pp.63-86, 1986.
26.M. Crochemore, Constant-space string matching, in “Proceedings of the 8th Conference on Foundations of Software Technology and Theoretical Computer Science” (K. V. Nori and S. Kumar, Eds.), Lecture Notes in Computer Science, No.338, pp.80-87, Springer-Verlag, 1988.
27.M. Crochemore, L. Gasieniec, and W. Rytter, Constant-space string matching in sublinear average time, in “Compression and Complexity of Sequences” (B. Carpentieri, A. De Santis, U. Vaccaro, and J. A. Storer, Eds.), pp.230-239, IEEE Computer Society, Los Alamitos, CA, 1998.
28.M. Crochemore and W. Rytter, “Text Algorithm,” Oxford University Press, London, 1994.
29.A. Czumaj, M. Crochemore, L. Gasieniec, S. Jarominek, T. Lecroq, W. Plandowski, and W. Rytter, Speeding up two string-matching algorithms, Algorithmica, Vol.12, pp.247-267, 1994.
30.P. Duris and Z. Galil, Fooling a two way automation or One pushdown store is better than one counter for two way machines (preliminary version), in “Proceedings of the 13th Annual ACM Symposium on Theory of Computing, Milwaukee, Wis., 1981,” pp.177-188, New York, 1981.
31.T. Eilam-Tsoreff and U. Vishkin, Matching patterns in a string subject to multilinear transformations, in “Proceedings of International Workshop on Sequences, Combinatorics, Compression, Security and Transmission,” Salerno, Italy, June 1988.
32.M. Farach, Optimal suffix tree construction with large alphabets, in “Proceedings of the 38th IEEE Symposium on Foundations of Computer Science,” pp.137-143, 1997.
33.P. Ferragina and R. Grossi, Fast increment text editing, in “Proceedings of the 7th ACM-SIAM Symposium on Discrete Algorithms,” pp.531-540, 1995.
34.P. Ferragina and R. Grossi, Optimal on-line search and sublinear time update in string matching, SIAM Journal on Computing, Vol.27, pp.713-736, 1998.
35.P. Ferragina, S. Muthukrishnan, and M. deBerg, Multi-method dispatching: a geometric approach with applications to string matching, in “Proceedings of the 31st Annual Symposium on the Theory of Computing (STOC),” pp.483-491, 1999.
36.N. J. Fine and H. S. Wilf, Uniqueness theorems for periodic functions, in “Proceedings of American Mathematical Society,” Vol.16, No.1, pp.109-114, 1965.
37.M. J. Fischer and M. S. Paterson, String matching and other products, in “Proceedings of SIAM-AMS Conference on Complexity of Computation” (R. M. Karp, Ed.), pp.113-125, American Mathematical Society, Providence, R. I., 1974.
38.K. Fredriksson and E. Ukkonen, A rotation invariant filter for two-dimensional string matching, in “Proceedings of the 9th Annual Symposium on Combinatorial Pattern Matching (CPM 98),” Lecture Notes in Computer Science, Vol.1448, pp.118-125, Springer, Berlin, 1998.
39.Z. Galil, On improving the worst case running time of the Boyer-Moore string matching algorithm, Communications of the ACM, Vol.22, No.9, pp.505-508, 1979.
40.Z. Galil and R. Giancarlo, Parallel string matching with k mismatches, Theoretical Computer Science, Vol.51, pp.341-348, 1987.
41.Z. Galil and R. Giancarlo, Data structures and algorithms for approximate string matching, Journal of Complexity, Vol.4, pp33-72, 1988.
42.Z. Galil and J. Seiferas, Recognizing certain repetitions and reversals within strings, in “Proceedings of 17th Annual Symposium on Foundations of Computer Science, Houston, Texas, 1976,” pp.236-252, IEEE Computer Society, Long Beach, California, 1976.
43.Z. Galil and J. Seiferas, Saving space in fast string matching, SIAM Journal on Computing, Vol.9, No.2, pp.417-438, 1980.
44.Z. Galil and J. Seiferas. Linear-time string matching using only a fixed number of local storage locations, Theoretic Computer Science, Vol.13, No.3, pp.331-336, 1981.
45.Z. Galil and J. Seiferas, Time-space optimal string matching, Journal of Computer and System Sciences, Vol.26, No.3, pp.280-294, 1983.
46.L. Gasieniec, W. Plandowski, and W. Rytter, Constant-space string matching with smaller number of comparisons: Sequential sampling, in “Proceedings of the 6th Annual Symposium on Combinatorial Pattern Matching,” Espoo, Finland, (Z. Galil and E. Ukkonen, Eds.), Lecture Notes in Computer Science, No.937, pp.78-89, Springer-Verlag, Berlin, 1995.
47.H. Goeman and M. Clausen, A new practical linear space algorithm for the longest common subsequence problem, in “Proceedings of Prague Stringology Club Workshop’99,” Report DC-99-05, pp.40-60, Department of Computer Science and Engineering, Czech Technical University, 1999.
48.M. Gu, M. Farach, and R. Beigel, An efficient algorithm for dynamic text indexing, in “Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms,” pp.697-704, 1994.
49.D. S. Hirschberg, A linear-space algorithm for computing maximal common subsequences, Communications of the ACM, Vol.18, No.6, pp.341-343, 1975.
50.D. S. Hirschberg, Algorithms for the longest common subsequence problem, Journal of the ACM, Vol.24, pp.664-675, 1977.
51.D. S. Hirschberg, Serial computations of Levenshtein distances, in “Pattern Matching Algorithms”(A. Apostolico and Z. Galil, Eds.), pp.123-141, Oxford University Press, 1997.
52.E. Horowitz and S. Sahni, “Fundamentals of data structures in PASCAL fourth edition,” W. H. Freeman and Company, 1994.
53.W. J. Hsu and M. W. Du, New algorithms for the LCS problem, Journal of Computer and System Sciences, Vol.29, pp.133-152, 1984.
54.J. W. Hunt and T. G. Szymanski, A fast algorithm for computing longest common subsequences, Communications of the ACM, Vol.20, pp.350-353, 1977.
55.R. M. Idury and A. A. Schaffer, Dynamic dictionary matching with failure functions, in “Proceedings of the 3rd Annual Symposium on Combinatorial Pattern Matching,” pp.273-284, 1992.
56.J. Jaja, “An introduction to parallel algorithms,” Addison Wesley, 1992.
57.D. E. Knuth, J. H. Morris, and V. R. Pratt, Fast pattern matching in strings, SIAM Journal on Computing, Vol.6, pp.323—350, 1977.
58.K. Krithivansan and R. Sitalakshmi, Efficient two-dimensional pattern matching in the presence of errors, Information Sciences, Vol.13, pp.169-184, 1987.
59.S. Kumar and C. Rangan, A linear space algorithm for the LCS problem, Acta Informatica, Vol.24, pp.353-363, 1987.
60.G. M. Landau and U. Vishkin, Efficient string matching with k mismatches, Theoretical Computer Science, Vol.43, pp.239-249, 1986.
61.G. M. Landau and U. Vishkin, Fast parallel and serial approximate string matching, Journal of Algorithms, Vol.10, pp.158-169, 1989.
62.G. M. Landau and U. Vishkin, Pattern matching in a digitized image, Algorithmica, Vol.12, pp.375-408, 1994.
63.R. C. Lyndon and M. P. Schutzenberger, The equation aM = bNcP in a free group, The Michigan Mathematical Journal, Vol.9, No.4, pp.289-298, 1962.
64.E. M. McCreight, A space-economical suffix tree construction algorithm, Journal of the ACM, Vol.23, No.2, pp.262-272, 1976.
65.E. W. Myers, An O(nd) difference algorithm and its variations, Algorithmica, Vol.1, pp.251-266, 1986.
66.N. Nakatsu, Y. Kambayashi, and S. Yajima, A longest common subsequence algorithm suitable for similar text string, Acta Informatica, Vol.18, pp.171-179, 1982.
67.A. Pentland, Invited talk, NSF Institutional Infrastructure Workshop, 1992.
68.C. Rick, A new flexible algorithm for the longest common subsequence problem, in “Proceedings of 6th Annual Symposium on Combinatorial Pattern Matching (CPM 95),” pp.340-351, Lecture Notes in Computer Science, Vol.937, Springer, Berlin, 1995. Also in Nordic Journal of Computing, Vol.2, pp.444-461, 1995.
69.C. Rick, Simple and fast linear space computation of longest common subsequences, Information Processing Letters, Vol.75, pp.275-281, 2000.
70.S. C. Sahinalp and U. Vishkin, Efficient approximate and dynamic matching of patterns using a labeling paradigm, in “Proceedings of the 37th FOCS,” pp.320-328, 1996.
71.J. Seiferas and Z. Galil, Real-time recognition of substring repetition and reversal, Mathematical Systems Theory, Vol.11, No.2, pp.111-146, 1977.
72.E. Ukkonen, Algorithms for approximate string matching, Information and Control, Vol.64, pp.100-118, 1985.
73.P. Weiner, Linear pattern matching algorithm, in “Proceedings of the 14th IEEE Symposium on Switching and Automata Theory,” pp.1-11, 1973.
74.S. Wu and U. Manber, Fast text searching allowing errors, Communications of the ACM, Vol.35, No.10, pp.83-91, 1992.
75.S. Wu, U. Manber, G. Myers, and W. Miller, An O(np) sequence comparison algorithm, Information Processing Letters, Vol.35, pp.317-323, 1990.
76.A. C. Yao, The complexity of pattern matching for a random string, SIAM Journal on Computing, Vol.8, No.3, pp.368-387, 1979.
77.A. C. Yao and F. F. Yao, Dictionary lookup with one error, Journal of Algorithms, Vol.25, pp.194-202, 1997.
78.J. Ziv and A. Lempel, A universal algorithm for sequential data compression, IEEE Transactions on Information Theory, Vol.23, pp.337-343, 1977.

QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top