Author: Che-wen Ku (顧哲文)
Title: Topic Recommendation and Discovery based on Matrix Factorization (基於矩陣分解的主題推薦與發現)
Advisor: Yihuang Kang (康藝晃)
Degree: Master's
Institution: National Sun Yat-sen University (國立中山大學)
Department: Department of Information Management (資訊管理學系研究所)
Discipline: Computing (電算機學門); General Computing (電算機一般學類)
Document type: Academic thesis
Year of publication: 2017
Academic year of graduation: 105 (2016–2017)
Language: English
Pages: 49
Keywords (Chinese): 主題發現, 非負矩陣分解, 推薦, 主題模型
Keywords (English): Non-negative Matrix Factorization, Topic Discovery, Recommendation, Topic Modeling
Record statistics: citations: 0; views: 104; downloads: 13; bookmarks: 1
Abstract: With the growth of the Internet, more and more text documents, such as news stories and articles, are available online, and researchers have used these documents for text analysis. Non-negative Matrix Factorization (NMF) is a non-probabilistic method for factorizing a document corpus. In this thesis, we propose using sparsity-constrained NMF to build a topic model with k topics. In addition, we incorporate an author-related matrix into nonsmooth NMF (nsNMF) to uncover the hidden parts of topics, which provides more information and makes the discovered topics more focused. Deciding the number of topics k is a difficult but unavoidable problem, so we evaluate candidate values of k with mutual information and stability, which gives a reference point for choosing k. Finally, we use the Jensen-Shannon divergence to measure how the terms of each topic change over time: it yields a distance between topics, and the Hungarian algorithm then matches corresponding topics across different time periods.
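Below is a minimal, illustrative sketch of the pipeline the abstract describes, assuming a recent scikit-learn and SciPy. The toy two-slice corpus, the choice k = 2, and all hyperparameters are placeholders rather than the thesis's data or settings; scikit-learn's NMF with an L1 penalty is only a rough stand-in for the nonsmooth NMF (nsNMF) variant the thesis builds on, and the seed-based stability check is a simplified proxy for the mutual-information-and-stability procedure of Section 3.1.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import jensenshannon
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import normalized_mutual_info_score

# --- 1. Document-term matrix with TF-IDF weighting (cf. Sections 4.2.1-4.2.2) ---
# Two tiny, made-up "time slices" of documents; the thesis uses COOL3C news
# and arXiv.ML papers instead.
docs_t1 = [
    "matrix factorization for topic modeling of news articles",
    "sparse nonnegative matrix factorization finds parts based topics",
    "recommend articles to readers based on latent topic weights",
]
docs_t2 = [
    "nonnegative factorization with sparsity constraints for documents",
    "topic based recommendation of machine learning papers",
    "tracking how topic terms drift across time periods",
]
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs_t1 + docs_t2)  # rows = documents, columns = terms
terms = vectorizer.get_feature_names_out()


def fit_topics(X, k, seed=0):
    """Factorize X ~= W @ H into k topics; the L1 penalty on H is a crude
    stand-in for the sparse/nonsmooth constraint of nsNMF."""
    model = NMF(n_components=k, init="nndsvd", max_iter=500,
                random_state=seed, alpha_H=0.05, l1_ratio=1.0)
    W = model.fit_transform(X)   # document-topic weights
    H = model.components_        # topic-term weights
    return W, H


# --- 2. Topic modeling with k topics (cf. Sections 3.2 and 4.2.6) ---
k = 2
W, H = fit_topics(X, k)
for t, row in enumerate(H):
    top_terms = [terms[i] for i in row.argsort()[::-1][:5]]
    print(f"topic {t}: {top_terms}")

# --- 3. A rough stability check for choosing k (cf. Section 3.1) ---
# Refit with a different random seed and compare each document's dominant
# topic across the two runs; a k whose assignments agree (higher NMI) is
# more stable.
W_alt, _ = fit_topics(X, k, seed=1)
nmi = normalized_mutual_info_score(W.argmax(axis=1), W_alt.argmax(axis=1))
print(f"stability proxy for k={k}: NMI = {nmi:.3f}")

# --- 4. Topic discovery across time slices (cf. Sections 3.4 and 4.3.4) ---
# Fit each time slice separately, normalize topic-term rows into probability
# distributions, then match topics with the Hungarian algorithm on a
# Jensen-Shannon distance matrix.
_, H1 = fit_topics(vectorizer.transform(docs_t1), k)
_, H2 = fit_topics(vectorizer.transform(docs_t2), k)
P1 = (H1 + 1e-12) / (H1 + 1e-12).sum(axis=1, keepdims=True)
P2 = (H2 + 1e-12) / (H2 + 1e-12).sum(axis=1, keepdims=True)
cost = np.array([[jensenshannon(p, q) for q in P2] for p in P1])
row_ind, col_ind = linear_sum_assignment(cost)
for i, j in zip(row_ind, col_ind):
    print(f"time-1 topic {i} <-> time-2 topic {j} (JS distance {cost[i, j]:.3f})")
```

linear_sum_assignment implements the Hungarian algorithm, so the final loop reports the one-to-one correspondence between the two time slices' topics that minimizes the total Jensen-Shannon distance (note that SciPy's jensenshannon returns the square root of the divergence).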
Table of Contents:
1. Introduction 1
2. Background and Related works 4
2.1 LDA 4
2.2 SVD 5
2.3 NMF 6
3. Method 9
3.1 How many topics k? 9
3.2 Nonsmooth Non-negative Matrix Factorization (nsNMF) 12
3.3 nsNMF with constraint 14
3.4 Topic Discovery 15
4. Experiment & Result 17
4.1 Data and Preprocessing 17
4.2 COOL3C news 19
4.2.1 Document-Term Matrix 19
4.2.2 TF-IDF 20
4.2.3 SVD 21
4.2.4 nsNMF 22
4.2.5 How many topics k? 24
4.2.6 Topic Modeling 25
4.2.7 Article Recommendation 27
4.3 arXiv.ML papers 29
4.3.1 How many topics k? 29
4.3.2 Topic Modeling 31
4.3.3 nsNMF with constraint 32
4.3.4 Topic Discovery 33
5. Conclusion 35
6. References 37
References:
Aggarwal, C. C., & Zhai, C. (2012). Mining text data. Springer Science & Business Media. Retrieved from https://www.google.com/books?hl=zh-TW&lr=&id=vFHOx8wfSU0C&oi=fnd&pg=PR3&dq=mutual+information+topic+modeling&ots=obag_JmIVy&sig=fQ_MXiuGSe8t_-QXuxA_1deQRg0
Arora, S., Ge, R., & Moitra, A. (2012). Learning Topic Models - Going beyond SVD. arXiv:1204.1956 [Cs]. Retrieved from http://arxiv.org/abs/1204.1956
Baker, L. D., & McCallum, A. K. (1998). Distributional clustering of words for text classification. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 96–103). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=290970
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(Jan), 993–1022.
Cai, D., He, X., Han, J., & Huang, T. S. (2011). Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8), 1548–1560.
Carmel, D., Yom-Tov, E., Darlow, A., & Pelleg, D. (2006). What makes a query difficult? In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 390–397). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=1148238
Choo, J., Lee, C., Reddy, C. K., & Park, H. (2013). Utopian: User-driven topic modeling based on interactive nonnegative matrix factorization. IEEE Transactions on Visualization and Computer Graphics, 19(12), 1992–2001.
Gillis, N. (2014). The why and how of nonnegative matrix factorization. Regularization, Optimization, Kernels, and Support Vector Machines, 12(257). Retrieved from https://www.google.com/books?hl=zh-TW&lr=&id=5Y_SBQAAQBAJ&oi=fnd&pg=PA257&dq=The+Why+and+How+of+Nonnegative+Matrix+Factorization&ots=nwGtxapMBn&sig=TnywuixkEgkwtbnH5t0n5wrj58Y
Gong, L., & Nandi, A. K. (2013). An enhanced initialization method for non-negative matrix factorization. In 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP) (pp. 1–6). https://doi.org/10.1109/MLSP.2013.6661949
Greene, D., & Cross, J. P. (2016). Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach. arXiv:1607.03055 [Cs]. Retrieved from http://arxiv.org/abs/1607.03055
Greene, D., O’Callaghan, D., & Cunningham, P. (2014). How Many Topics? Stability Analysis for Topic Models. arXiv:1404.4606 [Cs]. Retrieved from http://arxiv.org/abs/1404.4606
Grosse, I., Bernaola-Galván, P., Carpena, P., Román-Roldán, R., Oliver, J., & Stanley, H. E. (2002). Analysis of symbolic sequences using the Jensen-Shannon divergence. Physical Review E, 65(4), 41905.
Langville, A. N., Meyer, C. D., Albright, R., Cox, J., & Duling, D. (2014). Algorithms, initializations, and convergence for the nonnegative matrix factorization. arXiv Preprint arXiv:1407.7299. Retrieved from https://arxiv.org/abs/1407.7299
Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791.
Li, Z., Tang, Z., & Ding, S. (2013). Dictionary learning by nonnegative matrix factorization with 1/2-norm sparsity constraint. In Cybernetics (CYBCONF), 2013 IEEE International Conference on (pp. 63–67). IEEE. Retrieved from http://ieeexplore.ieee.org/abstract/document/6617435/
Liu, J., Wang, C., Gao, J., & Han, J. (2013). Multi-view clustering via joint nonnegative matrix factorization. In Proceedings of the 2013 SIAM International Conference on Data Mining (pp. 252–260). SIAM. Retrieved from http://epubs.siam.org/doi/abs/10.1137/1.9781611972832.28
Pascual-Montano, A., Carazo, J. M., Kochi, K., Lehmann, D., & Pascual-Marqui, R. D. (2006). Nonsmooth Nonnegative Matrix Factorization (nsNMF). IEEE Trans. Pattern Anal. Mach. Intell., 28(3), 403–415. https://doi.org/10.1109/TPAMI.2006.60
Stevens, K., Kegelmeyer, P., Andrzejewski, D., & Buttler, D. (2012). Exploring topic coherence over many models and many topics. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (pp. 952–961). Association for Computational Linguistics. Retrieved from http://dl.acm.org/citation.cfm?id=2391052
Xu, W., Liu, X., & Gong, Y. (2003). Document clustering based on non-negative matrix factorization. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval (pp. 267–273). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=860485
Zou, H., Zhou, G., & Xi, Y. (2011). Research on Modeling Microblog Posts Scale Based on Nonhomogeneous Poisson Process. In G. Zhiguo, X. Luo, J. Chen, F. L. Wang, & J. Lei (Eds.), Emerging Research in Web Information Systems and Mining (pp. 99–112). Springer Berlin Heidelberg. Retrieved from http://link.springer.com/chapter/10.1007/978-3-642-24273-1_14
LeftNotEasy. (n.d.). 機器學習中的數學(5)-強大的矩陣奇異值分解(SVD)及其應用 [Mathematics in machine learning (5): The powerful singular value decomposition (SVD) and its applications]. 博客園 (cnblogs). Retrieved November 18, 2016, from http://www.cnblogs.com/LeftNotEasy/archive/2011/01/19/svd-and-applications.html