National Digital Library of Theses and Dissertations in Taiwan


Detailed record

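Author: 張硯婷
Author (English): Yen-Ting Chang
Thesis title (Chinese): 混合共同偏斜t因子分析器
Thesis title (English): Mixtures of skew-t Factor Analyzers with Common Factor Loadings
Advisor: 林宗儀
Advisor (English): Tsung-I Lin
Oral defense committee: 吳宏達, 王婉倫
Oral defense committee (English): Hong-Dar Isaac Wu, Wan-Lun Wang
Oral defense date: 2016-07-07
Degree: Master's
Institution: National Chung Hsing University (國立中興大學)
Department: Institute of Statistics
Discipline: Mathematics and Statistics
Field: Statistics
Thesis type: Academic thesis
Publication year: 2016
Graduation academic year: 104 (ROC calendar)
Language: English
Number of pages: 38
Keywords: clustering; common factor loadings; data reduction; ECME algorithm; factor analyzer; outliers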
Abstract: The mixture of common t factor analyzers (MCtFA) model has proven effective in robustifying the mixture of common factor analyzers (MCFA) model for model-based clustering of high-dimensional data with heavy tails. However, the MCtFA model may still lack robustness against observations whose distributions are highly asymmetric. In this thesis, we present a further robust extension of the MCFA and MCtFA models by assuming a restricted multivariate skew-t distribution for the common factors. This assumption accommodates severely non-normal (skewed and leptokurtic) random phenomena while preserving the parsimony of the factor-analytic representation and allowing the data to be visualized in low-dimensional plots. The new model is called the mixture of common skew-t factor analyzers (MCstFA) and contains both the MCFA and MCtFA models as limiting and special cases. A computationally feasible Expectation Conditional Maximization Either (ECME) algorithm is developed to carry out maximum likelihood estimation of the parameters iteratively. The number of factors (q) and the number of mixture components (g) are determined simultaneously using commonly used penalized likelihood criteria. The usefulness of the proposed model is illustrated with simulated and real data, and the results show that it outperforms the existing MCFA and MCtFA methods.
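The abstract states that the number of factors (q) and the number of mixture components (g) are chosen with penalized likelihood criteria. As a minimal sketch of that idea, and not the thesis's own implementation, the Python snippet below scores candidate (g, q) pairs with BIC in the "smaller is better" form, m log(n) - 2 log L. Several things here are assumptions: the parameter count is the one commonly quoted for the Gaussian MCFA model (the MCstFA count would also include skewness and degrees-of-freedom parameters, which this record does not spell out), the function names are hypothetical, and the log-likelihood values are made-up numbers used only to show the mechanics.

```python
import math


def mcfa_param_count(p, g, q):
    """Free parameters of a Gaussian MCFA (assumed count, see lead-in):
    mixing proportions (g - 1), diagonal error variances (p),
    common loadings (p * q), factor means (g * q),
    factor covariances (g * q * (q + 1) / 2), minus q^2 constraints."""
    return (g - 1) + p + q * (p + g) + g * q * (q + 1) // 2 - q * q


def bic(loglik, n_params, n_obs):
    """BIC in the 'smaller is better' convention: m * log(n) - 2 * loglik."""
    return n_params * math.log(n_obs) - 2.0 * loglik


def select_model(logliks, p, n_obs):
    """Return the (g, q) pair with the smallest BIC and all BIC scores.

    `logliks` maps (g, q) to the maximized log-likelihood of that fit.
    """
    scores = {
        (g, q): bic(ll, mcfa_param_count(p, g, q), n_obs)
        for (g, q), ll in logliks.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Made-up maximized log-likelihoods, for illustration only.
toy_logliks = {
    (2, 1): -1250.0,
    (2, 2): -1190.0,
    (3, 1): -1180.0,
    (3, 2): -1100.0,
    (4, 2): -1095.0,
}

# p and n_obs are arbitrary illustrative sizes.
best, scores = select_model(toy_logliks, p=7, n_obs=210)
print("selected (g, q):", best)
```

In practice each log-likelihood would come from a converged ECME run for that (g, q) pair, and the penalty would use the parameter count of the model actually fitted. Other criteria such as ICL could be swapped in by replacing the `bic` function.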

Table of contents:
1. Introduction
2. Notation and prerequisites
3. Methodology
3.1. Model formulation
3.2. Parameter estimation via the ECME algorithm
4. Practical issues from computational aspects
4.1. Initialization and stopping rules
4.2. Model selection and performance evaluation
4.3. Identifiability issues
5. Simulation
5.1. Experiment 1
5.2. Experiment 2
6. Applications to Real Data
6.1. Example 1: the seeds data
6.2. Example 2: the human liver cancer data
7. Conclusion
A. Proof of hierarchical representation (8)
B. Proof of Proposition 1
C. Proof of CMQ-steps

