Fu, Wang and Lou (2002) present the exact distribution and a large deviation approximation for the longest run in a sequence of multi-state Markov dependent trials of order k <= 1. In this thesis, we extend their results to general order k and, in addition, derive the distribution of any pattern. As an application, we derive the distribution of the frequency of n-words in DNA sequences. The finite Markov chain imbedding technique of Fu and Koutras (1994) is used to obtain the exact distributions. For k >= 2, numerical comparisons between the exact distributions and the large deviation approximations of the frequency of n-words in DNA sequences are provided to illustrate the theoretical results. Furthermore, the distributions of the longest run statistics in DNA sequences are demonstrated through real data analysis. The results obtained in this thesis can be applied to protein sequences with minor modifications.
Chapter 1 Introduction
  1.1 DNA sequence
  1.2 Run Statistics
  1.3 Literature Review
Chapter 2 Exact Distribution
  2.1 Markov chain imbedding technique
  2.2 Notations for DNA sequences
  2.3 Distribution of the frequency of n-words
Chapter 3 Large Deviation Approximation
Chapter 4 Data Analysis
  4.1 Real Data Analysis
  4.2 Longest runs
Chapter 5 Conclusion and Further Research
References
