Author: Chen, Pei-Wen (陳珮文)
Thesis title: Variable selection in the latent class model and its application in traffic flow prediction (潛在類別模型的變數選擇與在交通流量預測的應用)
Advisor: Huang, Guan-Hua (黃冠華)
Committee members: Chiu, Yen-Feng (邱燕楓); Guo, Chao-Yu (郭炤裕); Hong, Huei-Nian (洪慧念)
Oral defense date: 2017-07-24
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Institute of Statistics (統計學研究所)
Discipline: Mathematics and Statistics
Field: Statistics
Thesis type: Academic thesis
Year of publication: 2017
Graduation academic year: 105 (ROC calendar, i.e. 2016-2017)
Language: Chinese
Number of pages: 44
Keywords: regression extension of latent class analysis; high-dimensional data; variable selection; alternate k-means clustering; traffic flow prediction
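Abstract (translated from Chinese): When estimating the parameters of the regression latent class model, an estimate of the latent class variable can be obtained by clustering the observation units. With high-dimensional data, variable selection in the cluster analysis becomes very important. We propose an alternate K-means clustering algorithm that identifies noise variables and excludes them from the cluster analysis, with the aim of obtaining the best latent class clustering. Treating this clustering result as known, we can then estimate the other model parameters. Using this regression latent class model with variable selection, we develop a method for predicting traffic flow on Taiwan's National Freeway No. 5, which predicts the full-day traffic flow of a consecutive-day holiday three months or more in the future. We also compare the original K-means clustering algorithm with the newly proposed alternate K-means algorithm and examine how their traffic flow predictions differ.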
Abstract (English): Parameters in the regression extension of latent class analysis (RLCA) model can be estimated with clustering methods. For high-dimensional data, variable selection in the cluster analysis becomes an important issue. Here, we adopt an alternate k-means clustering method that first distinguishes clustering variables from noisy variables among the surrogates and then excludes the noisy variables from clustering. Doing so increases the accuracy of the clustering results and thus gives a better estimate of the latent class variable. Treating the estimated latent class as known, one can then estimate the other parameters in the RLCA model. Under this newly developed variable-selecting RLCA model, we offer a method for predicting traffic flow on Freeway No. 5 in Taiwan. The goal is to predict the whole-day traffic flow on holidays that are three months or more in the future. We also compare the results from the traditional k-means algorithm with those from the proposed alternate k-means algorithm.
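The abstract does not spell out the steps of the alternate k-means algorithm, so the following Python sketch only illustrates the general idea of alternating between clustering and variable selection. It is not the thesis's method. The relevance score (the share of each variable's sum of squares explained by the clusters), the `threshold` value, and the function names `kmeans`, `relevance`, and `alternate_kmeans` are all assumptions made for this illustration, and the data are simulated.

```python
import numpy as np


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with random restarts; keeps the lowest-inertia labels."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Squared Euclidean distance from every row to every center.
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        inertia = dist.min(axis=1).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def relevance(X, labels):
    """Per-variable share of the total sum of squares explained by the clusters.

    Assumed scoring rule for this sketch; assumes no column of X is constant.
    """
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for j in np.unique(labels):
        Xj = X[labels == j]
        within += ((Xj - Xj.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - within / total


def alternate_kmeans(X, k, threshold=0.2, max_rounds=20):
    """Alternate between clustering on the selected variables and reselecting them."""
    selected = np.ones(X.shape[1], dtype=bool)  # start with every variable
    for _ in range(max_rounds):
        labels = kmeans(X[:, selected], k)
        new_selected = relevance(X, labels) >= threshold
        # Stop when the selection is stable (or would become empty).
        if not new_selected.any() or np.array_equal(new_selected, selected):
            return labels, selected
        selected = new_selected
    return kmeans(X[:, selected], k), selected


# Simulated data: two informative variables and four pure-noise variables.
rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 100)
signal = rng.normal(size=(200, 2)) + 8.0 * truth[:, None]
noise = rng.normal(size=(200, 4))
X = np.hstack([signal, noise])

labels, selected = alternate_kmeans(X, k=2)
```

Here `selected` is a boolean mask over the columns of `X`, and `labels` holds the estimated class of each row. In the two-stage approach the abstract describes, such labels would then be treated as known when estimating the remaining model parameters.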
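Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
1. Introduction
2. Literature Review
2.1. Latent Class Analysis
2.2. Regression Latent Class Analysis
2.2.1. Parameter Estimation via the EM Algorithm
2.2.2. Parameter Estimation via Clustering Algorithms
2.3. Marginalization of the Regression Latent Class Model
2.3.1. Marginalizing the Effects of Covariates Affecting the Observed Measurements in the Conditional Probabilities
2.3.2. Marginalizing the Effects of Covariates Affecting the Latent Classes in the Latent Prevalences
2.4. K-means Algorithm
3. Methods
4. Parameter Estimation via a Clustering Algorithm: Alternate K-means
4.1. Parameter Estimation without the Covariates (xi, zi)
4.1.1. Measurements Satisfying Assumption 1
4.1.2. Measurements Satisfying Assumption 2
4.1.3. The Alternate K-means Algorithm
4.2. Parameter Estimation with the Covariates (xi, zi)
5. Application to Traffic Flow Prediction
5.1. Data Background
5.2. Regression Latent Class Model: Alternate K-means Results
5.3. Comparison of Alternate K-means and K-means Results
6. Conclusion
References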