[1] Liao, L., Patterson, D. J., Fox, D., & Kautz, H. (2006). Building personal maps from GPS data. Annals of the New York Academy of Sciences, 1093(1), 249-265.
[2] Stratonovich, R.L. (1960). Conditional Markov Processes. Theory of Probability and Its Applications. 5(2): 156–178. doi:10.1137/1105015
[3] Forney Jr, G. D. (2005). The Viterbi algorithm: A personal history. arXiv preprint cs/0504020.
[4] Baum, L. E., Petrie, T. (1966). Statistical Inference for Probabilistic Functions of Finite State Markov Chains. The Annals of Mathematical Statistics. 37 (6): 1554–1563. doi:10.1214/aoms/1177699147.
[5] Baum, L. E., Eagon, J. A. (1967). An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology. Bulletin of the American Mathematical Society. 73 (3): 360. doi:10.1090/S0002-9904-1967-11751-8.
[6] Baum, L. E., Sell, G. R. (1968). Growth transformations for functions on manifolds. Pacific Journal of Mathematics. 27 (2): 211–227. doi:10.2140/pjm.1968.27.211.
[7] Baum, L. E., Petrie, T., Soules, G., Weiss, N. (1970). A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. The Annals of Mathematical Statistics. 41 (1): 164–171. doi:10.1214/aoms/1177697196.
[8] Baum, L.E. (1972). An Inequality and Associated Maximization Technique in Statistical Estimation of Probabilistic Functions of a Markov Process. Inequalities. 3: 1–8.
[9] Baker, J. (1975). The DRAGON system—An overview. IEEE Transactions on Acoustics, Speech, and Signal Processing. 23: 24–29. doi:10.1109/TASSP.1975.1162650
[10] Bishop, M., Thompson, E. (1986). Maximum Likelihood Alignment of DNA Sequences. Journal of Molecular Biology. 190 (2): 159–165. doi:10.1016/0022-2836(86)90289-5.
[11] McCulloch, W., Pitts, W. (1943). A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics. 5 (4): 115–133. doi:10.1007/BF02478259.
[12] Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model For Information Storage And Organization In The Brain. Psychological Review. 65 (6): 386–408. doi:10.1037/h0042519
[13] Werbos, P. J. (1994). The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting. New York: John Wiley & Sons.
[14] Rumelhart, D. E., Hinton, G. E., Williams, R. J. (1986). Learning representations by back-propagating errors. Nature. 323 (6088): 533–536. doi:10.1038/323533a0.
[15] Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257-286.
[16] Borman, S. (2004). The expectation maximization algorithm-a short tutorial. Submitted for publication, 41.
[17] Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. MIT Press. pp. 338-339.
[18] Davis, R. I., Lovell, B. C., & Caelli, T. (2002, August). Improved estimation of hidden Markov model parameters from multiple observation sequences. In Object recognition supported by user interaction for service robots (Vol. 2, pp. 168-171). IEEE.
[19] Tu, S. (2015). Derivation of Baum-Welch algorithm for hidden Markov models. URL: https://people.eecs.berkeley.edu/~stephentu/writeups/hmm-baum-welch-derivation.pdf
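[20] 顧正偉 (2005). Human action recognition using multiple-observation hidden Markov models. Master's thesis, Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu. Retrieved from https://hdl.handle.net/11296/383tud
[21] Haykin, S. (2010). Neural Networks and Learning Machines, 3/E. Pearson Education India.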
[22] Snyder, J. P. (1987). Map projections: A working manual (Vol. 1395). US Government Printing Office. pp. 60-64.