National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Author: 王瑜筎
Author (English): Yu-Ju Wang
Title: 具遺失訊息下t因子分析模式之自動學習
Title (English): Automated learning of t factor analysis models with missing information
Advisor: 林宗儀
Advisor (English): Tsung-I Lin
Oral defense committee: 吳宏達, 王婉倫
Oral defense date: 2016-07-07
Degree: Master's
Institution: National Chung Hsing University
Department: Graduate Institute of Statistics
Discipline: Mathematics and Statistics
Academic field: Statistics
Thesis type: Academic thesis
Year of publication: 2016
Academic year of graduation: 104 (2015–2016)
Language: English
Number of pages: 33
Keywords (Chinese): ECM演算法, 最大概似估計, 遺失訊息, 模型選擇, 多變量t分佈, 離群值
Keywords (English): ECM algorithms, maximum likelihood estimation, missing information, model selection, multivariate t distribution, outliers
The t factor analysis (tFA) model has been shown to be a promising tool for robust dimension reduction of high-dimensional data in the presence of heavy-tailed noise. When determining the number of factors in a tFA model, a two-stage procedure is commonly performed: parameter estimation is first carried out for each of a number of candidate models, and the best model is then chosen according to a penalized likelihood criterion such as the Bayesian information criterion (BIC). However, the computational burden of such a procedure can be extremely high, particularly for large datasets. In this thesis, we develop a novel automatic learning method in which parameter estimation and model selection are seamlessly integrated into a one-stage algorithm. The new scheme, called the automatic tFA (AtFA) algorithm, also remains workable when the data contain missing values. In addition, we derive the Fisher information matrix to approximate the asymptotic covariance matrix of the ML estimators of tFA models. Experiments on real and simulated data reveal that the AtFA algorithm not only produces fitting results identical to those of the traditional two-stage procedure but also converges considerably faster, especially in the presence of substantial missing data.
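The two-stage procedure that the abstract criticizes follows a generic pattern: fit every candidate model by maximum likelihood, then rank the candidates by a penalized criterion such as BIC and keep the minimizer. A minimal sketch of that pattern, using two nested univariate Gaussian models as hypothetical stand-ins for the tFA candidates (fitting an actual tFA model is beyond a short example):

```python
import math
import random

def gauss_loglik(x, mu, var):
    """Exact log-likelihood of a sample under N(mu, var)."""
    n = len(x)
    return (-0.5 * n * math.log(2 * math.pi * var)
            - sum((xi - mu) ** 2 for xi in x) / (2 * var))

def bic(loglik, n_params, n):
    """Schwarz (1978): BIC = -2*loglik + n_params*log(n); smaller is better."""
    return -2 * loglik + n_params * math.log(n)

random.seed(1)
x = [random.gauss(3.0, 1.0) for _ in range(500)]
n = len(x)

# Stage 1: fit each candidate model by maximum likelihood.
candidates = {}
# Candidate A: mean fixed at 0, variance free (1 free parameter).
var0 = sum(xi ** 2 for xi in x) / n
candidates["zero-mean"] = (gauss_loglik(x, 0.0, var0), 1)
# Candidate B: mean and variance both free (2 free parameters).
mu_hat = sum(x) / n
var1 = sum((xi - mu_hat) ** 2 for xi in x) / n
candidates["free-mean"] = (gauss_loglik(x, mu_hat, var1), 2)

# Stage 2: choose the candidate minimizing BIC.
best = min(candidates, key=lambda m: bic(candidates[m][0], candidates[m][1], n))
print(best)  # the free-mean model wins: the data are centred at 3, not 0
```

The AtFA algorithm's point is precisely to avoid this loop over candidates: estimation and the choice of the number of factors happen inside a single run.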

1. Introduction 1
2. ML estimation for the tFA model with missing information 4
2.1. The EM algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2. The AECM algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3. The ECM algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3. The AtFA algorithm for learning incomplete data 12
4. Estimating precision of parameter estimates 14
5. Numerical experiments 17
5.1. The cereal data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
5.2. The white leghorn fowl data with authentic missing values . . . . . . 19
5.3. Simulation with synthetic missing values . . . . . . . . . . . . . . . . 23
6. Conclusion 26
Appendices 27
A. Proof of Equation (9) . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
B. Proof of Equation (29) . . . . . . . . . . . . . . . . . . . . . . . . . . 28
C. The score vector and Hessian matrix . . . . . . . . . . . . . . . . . . 30
References 31
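The robustness that the abstract attributes to the tFA model comes from the latent scaling weights in the EM-type algorithms of Chapter 2: under a p-variate t distribution with ν degrees of freedom, each observation receives the E-step weight u = (ν + p)/(ν + δ), where δ is its squared Mahalanobis distance, so observations far in the tails are downweighted. A univariate (p = 1) illustration with made-up parameter values:

```python
def t_weight(x, mu, sigma2, nu, p=1):
    """E-step weight under a p-variate t model: (nu + p) / (nu + delta),
    where delta is the squared Mahalanobis distance (here p = 1)."""
    delta = (x - mu) ** 2 / sigma2
    return (nu + p) / (nu + delta)

# Illustrative values: location 0, scale 1, nu = 4 degrees of freedom.
w_typical = t_weight(0.5, 0.0, 1.0, nu=4)  # near the centre -> weight ~ 1.18
w_outlier = t_weight(8.0, 0.0, 1.0, nu=4)  # far in the tail -> weight ~ 0.07
print(w_typical, w_outlier)
```

As ν → ∞ the weights all tend to 1 and the Gaussian factor analysis model is recovered; small ν is what makes the fit resistant to outliers.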

Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (Eds.) 2nd international symposium on information theory. Akademiai Kiado, Budapest, pp 267–281
Anderson, T.W. (1957) Maximum likelihood estimates for a multivariate normal distribution when some observations are missing. J. Am. Stat. Assoc. 52:200–203.
Baek, J., McLachlan, G.J. (2011) Mixtures of common t-factor analyzers for clustering high-dimensional microarray data. Bioinformatics 27:1269–1276.
Baudry, J.P., Raftery, A.E., Celeux, G., Lo, K., Gottardo, R. (2010) Combining mixture components for clustering. J. Comput. Graph. Stat. 19:332–353.
Biernacki, C., Celeux, G., Govaert, G. (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Patt. Anal. Mach. Intell. 22:719–725.
Biernacki, C., Govaert, G. (1997) Using the classification likelihood to choose the number of clusters. Comput. Sci. Stat, 29:451–457.
Bozdogan, H. (1987) Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions. Psychometrika 52:345–370.
Celeux, G., Soromenho, G. (1996) An entropy criterion for assessing the number of clusters in a mixture model. J. Classific. 13:195–212.
Dempster, A.P., Laird, N.M., Rubin, D.B. (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Stat. Soc. B 39:1–38.
Efron, B., Tibshirani, R. (1993) An Introduction to the Bootstrap, Chapman & Hall, London.
Fang, K.T., Kotz, S., Ng, K.W. (1990) Symmetric Multivariate and Related Distributions. Chapman & Hall, London.
Hannan, E.J., Quinn, B.G. (1979) The determination of the order of an autoregression. J. Roy. Stat. Soc. B 41:190–195.
Hastings, W.K. (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97–109.
Hocking, R.R., Smith, W.B. (1968) Estimation of parameters in the multivariate normal distribution with missing observations. J. Am. Stat. Assoc. 63:159–173.
Ibrahim, J.G., Zhu, H., Tang, N. (2008) Model selection criteria for missing data problems via the EM algorithm. J. Am. Stat. Assoc. 103:1648–1658.
Jamshidian, M. (1997) An EM algorithm for ML factor analysis with missing data. In: Berkane M (ed) Latent variable modeling and applications to causality. Springer, New York, pp 247–258.
Johnson, R.A., Wichern, D.W. (2007) Applied multivariate statistical analysis, 6th edn. Pearson Prentice-Hall, New York.
Jöreskog, K.G. (1967) Some contributions to maximum likelihood factor analysis. Psychometrika 32:433–482.
Keribin, C. (2000) Consistent estimation of the order of mixture models. Sankhyā Ser. A 62:49–66.
Lattin, J., Carroll, J.D., Green, P.E. (2003) Analyzing Multivariate Data. Brooks/Cole, Pacific Grove, CA.
Lawley, D.N., Maxwell, A.E. (1971) Factor Analysis as a Statistical Method, 2nd edn. Butterworth, London.
Ledermann, W. (1937) On the rank of the reduced correlational matrix in multiple-factor analysis. Psychometrika 2:85–93.
Lin, T.I., Lee, J.C., Ho, H.J. (2006) On fast supervised learning for normal mixture models with missing information. Pattern Recogn 39:1177–1187.
Little, R.J.A., Rubin, D.B. (2002) Statistical Analysis with Missing Data, 2nd edn. Wiley, New York.
Liu, M., Lin, T.I. (2015) Skew-normal factor analysis models with incomplete data. J Appl Stat 42:789–805.
Marcus, L. F. (1990) Traditional morphometrics. Pages 77–122 in Proceedings of the Michigan morphometrics workshop, volume 2 (F. J. Rohlf and F. L. Bookstein, eds.). Univ. Michigan Museum of Zoology, Ann Arbor, Michigan.
McLachlan, G.J., Bean, R.W., Jones, L.B.T. (2007) Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution. Comput. Stat. Data Anal. 51:5327–5338.
Meng, X.L., Rubin, D.B. (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80:267–278.
Meng, X.L., van Dyk, D. (1997) The EM algorithm–an old folk-song sung to a fast new tune. J. Roy. Stat. Soc. B 59:511–567.
Kotz, S., Nadarajah, S. (2004) Multivariate t Distributions and their Applications. Cambridge University Press, Cambridge.
Nadarajah, S., Kotz, S. (2005) Mathematical properties of the multivariate t distribution. Acta Appl. Math. 89:53–84.
Rubin, D.B. (1976) Inference and missing data. Biometrika 63:581–592.
Rubin, D.B. (1987) Multiple Imputation for Nonresponse in Surveys. Wiley, New York.
Schwarz, G. (1978) Estimating the dimension of a model. Ann. Statist. 6:461–464.
Sclove, L.S. (1987) Application of model-selection criteria to some problems in multivariate analysis. Psychometrika 52:333–343.
Spearman, C. (1904) General intelligence, objectively determined and measured. Am J Psychol 15:201–292.
Wang, W.L., Lin, T.I. (2013) An efficient ECM algorithm for maximum likelihood estimation in mixtures of t-factor analyzers. Comput. Stat. 28:751–769.
Woodbury, M.A., (1950) Inverting Modified Matrices. Statistical Research Group, Memo. Rep. No. 42. Princeton University, Princeton, New Jersey.
Zhao, J.H., Yu, P.L.H., Jiang, Q. (2008) ML estimation for factor analysis: EM or non-EM? Stat. Comput. 18:109–123.
Zhao, J., Shi, L. (2014) Automated learning of factor analysis with complete and incomplete data. Comput. Stat. Data Anal. 72:205–218.

