
National Digital Library of Theses and Dissertations in Taiwan

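Author: 羅建勝
Author (English): LUO, JIAN-SHENG
Title: 利用後驗機率於非參數時間模型分群之研究 (A study of clustering nonparametric time series models using posterior probabilities)
Title (English): Clustering of time series data using nonparametric Bayesian model
Advisor: 林財川
Advisor (English): LIN, TSAIR-CHUAN
Degree: Master's
Institution: National Taipei University
Department: Department of Statistics
Discipline: Mathematics and Statistics
Field: Statistics
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: Chinese
Number of pages: 295
Keywords (translated from Chinese): nonparametric autoregressive model; posterior probability; model-based clustering; clustering; similarity measure
Keywords (English): nonparametric autoregressive models; posterior; model-based clustering; similarity measures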
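Clustering methods for time series data can be divided into non-model-based and model-based approaches. The former include clustering based on the discrete Fourier transform, the discrete wavelet transform, principal component analysis, and LPC cepstra; these methods require the data to satisfy certain conditions or to be preprocessed first. The latter include Bayesian autoregressive clustering, autoregressive moving-average model clustering, and hidden Markov model clustering. Model-based clustering assumes that the data follow a particular model, and to simplify estimation or the derivation of distributions, researchers usually assume that the model is linear and the data are stationary. In practice, however, the data are often nonlinear or nonstationary, so the assumptions fail and the results are poor. To deal with nonstationary data, nonlinear data, and the difficulty of specifying the functional form of a parametric model, this study attempts to cluster time series with a nonparametric model.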
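The study applies Bayesian clustering to a nonparametric additive autoregressive model, in place of the autoregressive model used in existing work, and uses generalized cross-validation (GCV) and the BRUTO procedure to obtain the optimal smoothing parameters and the function estimates. It exploits the fact that a natural cubic spline smoother can be expressed in terms of basis functions. Given suitable assumptions on the dimension of the smoothing basis, the prior probabilities, and the distribution of the data, it computes the likelihood of each clustering model; the best clustering is the one with the maximum posterior probability, or the best value of another clustering information criterion.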
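As a rough illustration of the GCV step described above, the sketch below picks a smoothing parameter by minimizing GCV(λ) = mean squared residual / (1 − tr(S_λ)/n)², where S_λ is the smoother matrix. It is not the thesis's code: for brevity it assumes a second-difference penalty smoother on an equally spaced grid instead of a natural cubic spline, and the names `smoother_matrix`, `gcv_score`, the simulated sine signal, and the λ grid are all hypothetical.

```python
import numpy as np

def smoother_matrix(n, lam):
    """Linear smoother S = (I + lam * D'D)^-1 with a second-difference penalty.

    Assumption: this stands in for the natural cubic spline smoother
    named in the abstract; it is not the thesis's estimator.
    """
    D = np.diff(np.eye(n), n=2, axis=0)  # (n-2) x n second-difference operator
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def gcv_score(y, lam):
    """Generalized cross-validation score for smoothing parameter lam."""
    n = len(y)
    S = smoother_matrix(n, lam)
    resid = y - S @ y
    return np.mean(resid ** 2) / (1.0 - np.trace(S) / n) ** 2

# Hypothetical data: a noisy sine curve.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 100)
signal = np.sin(2.0 * np.pi * t)
y = signal + 0.3 * rng.standard_normal(t.size)

# Search a coarse grid of candidate smoothing parameters.
grid = [10.0 ** k for k in range(-2, 5)]
best_lam = min(grid, key=lambda lam: gcv_score(y, lam))
fitted = smoother_matrix(len(y), best_lam) @ y
print("GCV-selected lambda:", best_lam)
```

In an autoregressive setting the same score would be computed with lagged values of the series as predictors; the grid search here is only the simplest possible way to minimize it.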
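The main contributions of the thesis are: using a nonparametric autoregressive model to reduce clustering error; building a Bayesian clustering method for nonparametric autoregressive models from prior probabilities and assumptions on a specific smoothing function; deriving an information criterion (BIC) for clustering with nonparametric models; and comparing Bayesian autoregressive clustering with nonparametric autoregressive clustering.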
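To make the idea of scoring a clustering with an information criterion concrete, here is a minimal sketch that compares candidate partitions of a few series by BIC. It makes strong simplifying assumptions that are not from the thesis: each cluster shares one linear AR(p) model fitted by least squares (not the nonparametric additive model), the BIC is the generic Gaussian form, and all function names and the simulated data are hypothetical.

```python
import numpy as np

def lagged(x, p):
    """Response vector and design matrix (intercept + p lags) for an AR(p) fit."""
    x = np.asarray(x, dtype=float)
    y = x[p:]
    cols = [np.ones(len(y))] + [x[p - k:len(x) - k] for k in range(1, p + 1)]
    return y, np.column_stack(cols)

def cluster_neg2loglik(series, p):
    """-2 * maximized Gaussian log-likelihood when all series share one AR(p)."""
    ys, Xs = zip(*(lagged(x, p) for x in series))
    y, X = np.concatenate(ys), np.vstack(Xs)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n = len(y)
    return n * (np.log(2.0 * np.pi * rss / n) + 1.0), n

def partition_bic(data, partition, p):
    """BIC of a partition: one AR(p) model (p + 2 parameters) per cluster."""
    total, n_obs = 0.0, 0
    for cluster in partition:
        m2ll, n = cluster_neg2loglik([data[i] for i in cluster], p)
        total += m2ll
        n_obs += n
    return total + len(partition) * (p + 2) * np.log(n_obs)

def simulate_ar1(phi, n, rng):
    """Simulate a zero-mean AR(1) series with unit-variance Gaussian noise."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

# Hypothetical data: two series from one AR(1) model, one from another.
rng = np.random.default_rng(0)
data = [simulate_ar1(0.8, 200, rng),
        simulate_ar1(0.8, 200, rng),
        simulate_ar1(-0.8, 200, rng)]

candidates = [[[0, 1, 2]], [[0, 1], [2]], [[0], [1], [2]]]
scores = {str(c): partition_bic(data, c, p=1) for c in candidates}
best = min(scores, key=scores.get)  # smaller BIC is preferred
print(best, scores)
```

Merging series that share a model saves parameters without much loss of fit, which is why a criterion of this kind can favor the partition that groups them together.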
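When the Bayesian autoregressive clustering method uses the marginal likelihood, it achieves good clustering similarity on linear time series, but its similarity on near-unit-root series is poor, about 0.47 to 0.58; on nonlinear series the similarity is about 0.74 to 0.91. When the same method uses the Bayesian information criterion instead, the similarity is lower than with the marginal likelihood. The Bayesian clustering method for nonparametric autoregressive models, using the marginal likelihood, gives better clustering similarity on linear, near-unit-root, and nonlinear series alike. With series of length 100, the similarity is about 0.79 to 0.91 for linear data, about 0.8 to 0.94 for near-unit-root data, and about 0.81 to 0.97 for nonlinear data. With the Bayesian information criterion, its similarity is again lower than with the marginal likelihood. The simulations show that the nonparametric Bayesian clustering method gives better clustering similarity on nonlinear and near-unit-root time series.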
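The abstract does not define the clustering similarity it reports. One measure commonly used in the time series clustering literature scores a found clustering A against the true clustering G as Sim(G, A) = (1/k) Σᵢ maxⱼ 2|Gᵢ ∩ Aⱼ| / (|Gᵢ| + |Aⱼ|), which equals 1 for a perfect match. The sketch below assumes that definition; whether the thesis uses exactly this measure is an assumption, and `cluster_similarity` is a hypothetical name.

```python
def cluster_similarity(truth, found):
    """Average, over true clusters, of the best overlap with any found cluster.

    Assumption: this definition is not stated in the abstract and may
    differ from the measure used in the thesis.
    """
    def sim(a, b):
        a, b = set(a), set(b)
        return 2 * len(a & b) / (len(a) + len(b))

    return sum(max(sim(g, a) for a in found) for g in truth) / len(truth)

# Hypothetical example: six series, two true clusters of three.
truth = [[0, 1, 2], [3, 4, 5]]
perfect = cluster_similarity(truth, [[0, 1, 2], [3, 4, 5]])
one_misplaced = cluster_similarity(truth, [[0, 1], [2, 3, 4, 5]])
print(perfect, one_misplaced)
```

Note that the measure is not symmetric: swapping the roles of the true and found clusterings can change the value.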
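The study applies the methods to electrocardiogram (ECG) data in an empirical analysis and finds that the nonlinear Bayesian clustering method gives better results than the linear autoregressive Bayesian clustering method.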
In general, time series clustering methods fall into two main classes: non-model-based methods and model-based methods. The former include, for example, clustering methods based on the DFT, DWT, PCA, and LPC. Methods in the latter class usually assume that the series in the same cluster share the same model; for example, autoregressive, autoregressive moving-average, and hidden Markov models are used in various model-based clustering procedures. Methods in both classes usually assume that the underlying models are linear and that each series is stationary. In practice, however, these assumptions may fail and lead to incorrect results. This study therefore proposes a nonparametric clustering method to overcome these problems.
We present a Bayesian formulation of the nonparametric additive model and derive the Bayesian information criterion (BIC) for clustering with nonparametric additive models. The GCV criterion and the BRUTO algorithm are used to select the best smoothing parameters and to estimate the mean functions.
In the simulations, the Bayesian autoregressive (BAR) clustering method attains a similarity of 0.98 when the true models are linear stationary autoregressive models, but low similarity when the true models are near-unit-root autoregressive or nonlinear models. As expected, the Bayesian nonparametric autoregressive (BNPAR) clustering method attains higher similarity than BAR clustering for the three kinds of generated data.
We also applied the proposed clustering method to ECG data and found that it gives better results than Bayesian autoregressive model clustering.
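Table of contents
Chapter 1: Introduction
Section 1: Research background
Section 2: Research objectives
Chapter 2: Literature review
Section 1: Hidden Markov model clustering methods
Section 2: Autoregressive time series model clustering methods
Section 3: Nonparametric model clustering methods
Chapter 3: Bayesian clustering for autoregressive time series models
Section 1: Autoregressive time series models
Section 2: Bayesian clustering method for autoregressive time series models
Chapter 4: Nonparametric autoregressive models
Section 1: Basic theory of nonparametric models
Section 2: Smoothing functions for nonparametric models
Section 3: Selection of the smoothing parameter
Section 4: Solutions satisfying the cubic spline smoothing function
Section 5: Simple nonparametric autoregressive models
Section 6: Nonparametric additive models
Section 7: Additive autoregressive models
Chapter 5: Bayesian clustering for nonparametric autoregressive models
Chapter 6: Simulation analysis
Section 1: Clustering methods compared in the simulations
Section 2: Simulation analysis of the number of clusters
Section 3: Simulation analysis of clustering similarity
Chapter 7: Real data analysis
Section 1: Introduction to electrocardiograms
Section 2: Description of the electrocardiogram data
Chapter 8: Conclusions and future work
Appendix 1: Simulation results
Appendix 2: Program code
References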