Author: Jyong-Nan Yang (楊烱男)
Thesis title: Maximum Likelihood Inference for Mixtures of Common t-Factor Analyzers with Missing Information (具遺失訊息混合共同 t 因子分析器之最大概似推論)
Advisor: Wan-Lun Wang (王婉倫)
Committee members: Win-Chin Lin (林文欽), Tsung-I Lin (林宗儀)
Oral defense date: 2014-07-09
Degree: Master's
Institution: Feng Chia University (逢甲大學)
Department: Department of Statistics, Master's Program in Statistics and Actuarial Science
Discipline: Mathematics and Statistics
Field: Statistics
Thesis type: Academic thesis
Publication year: 2014
Academic year of graduation: 102
Language: Chinese
Pages: 71
Keywords: Clustering; Common factor loadings; Dimension reduction; ECME algorithm; AECM algorithm; Missing at random
The mixture of common factor analyzers (MCFA) model, a fusion of Gaussian mixture models and factor analysis models, can analyze high-dimensional data from a heterogeneous population. By using a factor-analytic representation of the component covariance matrices together with factor loadings shared across components, the model considerably reduces the number of parameters needed to specify the component covariance matrices. For clustered data that are non-normal or heavy-tailed, a robust extension of MCFA, the mixture of common t-factor analyzers (MCtFA), has become a flexible analytical tool because it is less sensitive to outliers. However, missing values occur frequently in many scientific investigations, and the existing MCtFA model cannot handle them. This thesis provides analysts with a general framework for fitting the MCtFA to incomplete data. Under a missing at random mechanism, we develop efficient ECME and AECM algorithms for maximum likelihood estimation of the model parameters. A visualization tool for clustering, a classification rule for allocating new individuals, a diagnostic guideline for outliers, and an imputation method for filling in missing data are also provided. Simulation studies and real-data analyses demonstrate the usefulness of the proposed methodology and compare the finite-sample performance of the competing models.
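To make the model structure in the abstract concrete, the following is a minimal Python sketch of drawing data from a mixture of common t-factor analyzers and then deleting entries. It assumes the usual formulation in which component i has location A ξ_i and scale matrix A Ω_i Aᵀ + D, with the loading matrix A and the diagonal matrix D shared by all components. The function names, argument names, and the use of a missing-completely-at-random mask (a simple special case of missing at random) are illustrative assumptions; this is not the thesis's ECME or AECM estimation code.

```python
import numpy as np


def sample_mctfa(n, pi, A, xi, omega, d, nu, rng):
    """Draw n observations from a g-component MCtFA (hypothetical helper).

    pi    : (g,) mixing proportions
    A     : (p, q) factor loadings common to all components
    xi    : (g, q) component factor means
    omega : (g, q, q) component factor covariance matrices
    d     : (p,) diagonal of the common error covariance D
    nu    : (g,) component degrees of freedom
    """
    p, _ = A.shape
    labels = rng.choice(len(pi), size=n, p=pi)
    y = np.empty((n, p))
    for j, i in enumerate(labels):
        # Gamma(nu/2, rate nu/2) scale variable; scaling the normal draws
        # by 1/sqrt(tau) yields multivariate t tails.
        tau = rng.gamma(shape=nu[i] / 2.0, scale=2.0 / nu[i])
        u = rng.multivariate_normal(xi[i], omega[i] / tau)  # latent factors
        e = rng.normal(0.0, np.sqrt(d / tau))               # specific errors
        y[j] = A @ u + e
    return y, labels


def mask_completely_at_random(y, rate, rng):
    """Set each entry to NaN with probability `rate` (assumed MCAR mask)."""
    y_miss = y.copy()
    mask = rng.random(y.shape) < rate
    # Keep at least one observed value in every row.
    mask[mask.all(axis=1), 0] = False
    y_miss[mask] = np.nan
    return y_miss


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    p, q, g = 6, 2, 3
    A = rng.normal(size=(p, q))
    xi = rng.normal(scale=3.0, size=(g, q))
    omega = np.stack([np.eye(q)] * g)
    y, labels = sample_mctfa(200, np.full(g, 1 / g), A, xi, omega,
                             np.full(p, 0.5), np.full(g, 4.0), rng)
    y_miss = mask_completely_at_random(y, 0.2, rng)
    print(y_miss[:3])
```

Because all components share A, each observation can be summarized in the q-dimensional factor space, which is what makes low-dimensional cluster plots possible for this family of models.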
Table of Contents
1 Introduction
1.1 Background
1.2 Motivation and Objectives
1.3 Overview
2 Model Framework
2.1 The MCtFA Model
2.2 MCtFA with Missing Data
3 Maximum Likelihood Estimation
3.1 AECM Algorithm
3.2 ECME Algorithm
4 Simulation Study
4.1 Simulation Data Generation Settings
4.2 Model Selection
4.3 Clustering Performance
4.4 Missing-Value Imputation and Fitted-Value Accuracy
5 Real Data Analysis
5.1 Introduction to the Italian Wine Data
5.2 Model Fitting and Results
5.3 Simulations Based on the Italian Wine Data
5.4 Analysis of the Pima Indians Diabetes Data
6 Conclusion
Appendix A Derivation of the Likelihood and Log-Likelihood Functions of the MCtFA Model
Appendix B Derivation of the E-Step for Maximum Likelihood Estimation of the MCtFA Model
Appendix C Derivation of the CM-Steps for Maximum Likelihood Estimation of the MCtFA Model
Appendix D Supplementary Tables for the Real Data Analysis