Graduate Student: 楊智凱
Graduate Student (English): Zhi-kai Yang
Title: 應用自動化文本分類及電子書推薦提升點擊率
Title (English): Applying Automatic Text Classification and E-book Recommendation to Improve Click Rate
Advisor: 楊鎮華
Advisor (English): Stephen J.H. Yang
Degree: Master's
Institution: National Central University
Department: Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Document Type: Academic thesis
Year of Publication: 2018
Graduation Academic Year: 106 (2017–2018)
Language: Chinese
Pages: 46
Keywords (Chinese): 文本分類、自動化文本分類、開放教材資源、電子書、機器學習
Keywords (English): Text classification; Automatic text classification; Open Educational Resources; E-book; Machine learning
Usage statistics:
  • Cited by: 0
  • Views: 89
  • Downloads: 0
  • Bookmarks: 0
In recent years, large numbers of open educational resources (OER) have gradually been integrated into every stage of students' learning. OER not only help students learn on their own but also reduce teachers' lesson-preparation time, allowing teachers to focus on resolving the difficulties students encounter while learning. However, as the number of OER keeps growing, increasing the usage of each type of teaching material so that students can precisely obtain the materials they need has become a problem that OER platforms must solve. Accordingly, this study takes the Ministry of Education's Education Market (教育大市集) as its platform and applies machine learning and text classification techniques to increase the usage of various types of teaching materials, thereby helping students precisely obtain the learning materials they need. The study compares several classification models to select the one best suited to this data set, applies LDA feature extraction to find the best feature set, and finally uses SGD together with a voting mechanism to refine the models and combine them into the final classifier. For the experimental environment, the study uses Spark for distributed processing and the Cassandra database system to store pre-processed data. Teaching materials classified by random forests, support vector machines, logistic regression, neural networks, and other classifiers, together with the resulting recommendation lists, are stored in a MySQL relational database, and the recommendations are finally presented to users through a PHP and JavaScript web interface.
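The pipeline the abstract describes (topic features via LDA, then a voting ensemble over the named classifiers) can be sketched as follows. This is a minimal illustration in scikit-learn, not the thesis's actual Spark implementation; all class and parameter choices here are assumptions for demonstration only.

```python
# Hedged sketch of the abstract's pipeline: bag-of-words -> LDA topic
# features -> hard-voting ensemble of the classifiers the thesis names
# (random forest, SVM, logistic regression, plus an SGD-trained model).
# scikit-learn stands in for Spark MLlib; hyperparameters are illustrative.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import LinearSVC


def build_pipeline(n_topics=10):
    """Classify teaching-material texts from LDA topic features."""
    voter = VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100)),
            ("svm", LinearSVC()),
            ("lr", LogisticRegression(max_iter=1000)),
            ("sgd", SGDClassifier()),  # linear model trained with SGD
        ],
        voting="hard",  # majority vote, i.e. the abstract's voting mechanism
    )
    return Pipeline([
        ("bow", CountVectorizer()),                              # term counts
        ("lda", LatentDirichletAllocation(n_components=n_topics)),  # topic features
        ("clf", voter),                                          # ensemble decision
    ])
```

In use, the pipeline is fit on labeled material texts and then predicts a subject category for each new document; the predicted categories would feed the recommendation lists stored in MySQL.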
Table of Contents
Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
1. Introduction
2. Literature Review
  2.1 Multiclass Classification
    2.1.1 Neural Network
    2.1.2 Logistic Regression
    2.1.3 Support Vector Machine (SVM)
    2.1.4 Random Forests
  2.2 Feature Selection
  2.3 Feature Extraction
  2.4 Gradient Descent
3. System Design
  3.1 System Environment
  3.2 System Architecture
  3.3 Data Collection
    3.3.1 Knowledge Structure
    3.3.2 Mathematics Lexicon
    3.3.3 Text Content of Electronic Teaching Materials
    3.3.4 Data Pre-processing
      3.3.4.1 Data Cleaning
      3.3.4.2 Feature Extraction
      3.3.4.3 Data Transformation
  3.4 Data Storage
  3.5 Information Extraction and Analysis
    3.5.1 Classifier Selection
    3.5.2 Dimensionality Reduction of the Data Set
    3.5.3 Classifier Optimization
    3.5.4 Classifier Combination
  3.6 Information Application
    3.6.1 Building Recommendation-list Paths
    3.6.2 Selecting Recommended Materials
4. Experimental Design
  4.1 Training Set and Metric Design
  4.2 Model Parameter Design
  4.3 Recommendation-list Click-rate Design
5. Results and Discussion
6. Conclusion and Future Work
7. References
Atenas, J., & Havemann, L. (2014). Questions of quality in repositories of open educational resources: a literature review. Research in Learning Technology, 22.
Bosch, A., Zisserman, A., & Munoz, X. (2007, October). Image classification using random forests and ferns. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on (pp. 1-8). IEEE.
Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32.
Chen, M., Mao, S., & Liu, Y. (2014). Big data: a survey. Mobile Networks and Applications, 19(2), 171-209.
Cutler, D. R., Edwards, T. C., Beard, K. H., Cutler, A., Hess, K. T., Gibson, J., & Lawler, J. J. (2007). Random forests for classification in ecology. Ecology, 88(11), 2783-2792.
Davenport, T. H., & Prusak, L. (1998). Working knowledge: How organizations manage what they know. Harvard Business Press.
Dias, P., & Sousa, A. P. (1997). Understanding navigation and disorientation in hypermedia learning environments. Journal of educational multimedia and hypermedia, 6, 173-186.
Díaz-Uriarte, R., & De Andres, S. A. (2006). Gene selection and classification of microarray data using random forest. BMC bioinformatics, 7(1), 3.
García-Peñalvo, F. J., García de Figuerola, C., & Merlo, J. A. (2010). Open knowledge: Challenges and facts. Online Information Review, 34(4), 520-539.
Gardner, J. W., Craven, M., Dow, C., & Hines, E. L. (1998). The prediction of bacteria type and culture growth phase by an electronic nose with a multi-layer perceptron network. Measurement Science and Technology, 9(1), 120.
John, G. H., & Langley, P. (1995, August). Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh conference on Uncertainty in artificial intelligence (pp. 338-345). Morgan Kaufmann Publishers Inc.
Su, J., Shirab, J. S., & Matwin, S. (2011). Large scale text classification using semi-supervised multinomial naive bayes. In Proceedings of the 28th international conference on machine learning (icml-11) (pp. 97-104).
Izenman, A. J. (2013). Linear discriminant analysis. In Modern multivariate statistical techniques (pp. 237-280). Springer New York.
Lawrence, R. L., & Wright, A. (2001). Rule-based classification systems using classification and regression tree (CART) analysis. Photogrammetric engineering and remote sensing, 67(10), 1137-1142.
Liu, C., & Wechsler, H. (2002). Gabor feature based classification using the enhanced fisher linear discriminant model for face recognition. IEEE Transactions on Image processing, 11(4), 467-476.
Lotte, F., Congedo, M., Lécuyer, A., Lamarche, F., & Arnaldi, B. (2007). A review of classification algorithms for EEG-based brain–computer interfaces. Journal of neural engineering, 4(2), R1.
Manek, A. S., Shenoy, P. D., Mohan, M. C., & Venugopal, K. R. (2017). Aspect term extraction for sentiment analysis in large movie reviews using Gini Index feature selection method and SVM classifier. World wide web, 20(2), 135-154.
Maroco, J., Silva, D., Rodrigues, A., Guerreiro, M., Santana, I., & de Mendonça, A. (2011). Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests. BMC research notes, 4(1), 299.
Morgan, N., & Bourlard, H. (1990, April). Continuous speech recognition using multilayer perceptrons with hidden Markov models. In Acoustics, Speech, and Signal Processing, 1990. ICASSP-90., 1990 International Conference on (pp. 413-416). IEEE.
Nijhuis, J. A. G., Ter Brugge, M. H., Helmholt, K. A., Pluim, J. P. W., Spaanenburg, L., Venema, R. S., & Westenberg, M. A. (1995, November). Car license plate recognition with neural networks and fuzzy logic. In Neural Networks, 1995. Proceedings., IEEE International Conference on (Vol. 5, pp. 2232-2236). IEEE.
Quinlan, J. R. (1986). Induction of decision trees. Machine learning, 1(1), 81-106.
Rokach, L., & Maimon, O. (2014). Data mining with decision trees: theory and applications. World scientific.
Samant, A., & Adeli, H. (2000). Feature extraction for traffic incident detection using wavelet transform and linear discriminant analysis. Computer‐Aided Civil and Infrastructure Engineering, 15(4), 241-250.
Sebastiani, F. (2002). Machine learning in automated text categorization. ACM computing surveys (CSUR), 34(1), 1-47.
Romero, C., López, M. I., Luna, J. M., & Ventura, S. (2013). Predicting students' final performance from participation in on-line discussion forums. Computers & Education, 68, 458-472.
Shelton, B. E., Duffin, J., Wang, Y., & Ball, J. (2010). Linking open course wares and open education resources: creating an effective search and recommendation system. Procedia Computer Science, 1(2), 2865-2870.
Statnikov, A., Wang, L., & Aliferis, C. F. (2008). A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification. BMC bioinformatics, 9(1), 319.
Xanthopoulos, P., Pardalos, P. M., & Trafalis, T. B. (2013). Linear discriminant analysis. In Robust Data Mining (pp. 27-33). Springer New York.
Ikonomakis, M., Kotsiantis, S., & Tampakas, V. (2005). Text classification using machine learning techniques. WSEAS transactions on computers, 4(8), 966-974.
Thaoroijam, K. (2014). A Study on Document Classification using Machine Learning Techniques. International Journal of Computer Science Issues (IJCSI), 11(2), 217.
Baldwin, R. A. (2009). Use of maximum entropy modeling in wildlife research. Entropy, 11(4), 854-866.
Yao, Y., Welp, T., Liu, Q., Niu, N., Wang, X., Britto, C. J., ... & Montgomery, R. R. (2017). Multiparameter single cell profiling of airway inflammatory cells. Cytometry Part B: Clinical Cytometry.
Grzymala-Busse, J. W., & Hu, M. (2000, October). A comparison of several approaches to missing attribute values in data mining. In International Conference on Rough Sets and Current Trends in Computing (pp. 378-385). Springer, Berlin, Heidelberg.
Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT'2010 (pp. 177-186). Physica-Verlag HD.
Henson, J. M., Reise, S. P., & Kim, K. H. (2007). Detecting mixtures from structural model differences using latent variable mixture modeling: A comparison of relative model fit statistics. Structural Equation Modeling: A Multidisciplinary Journal, 14(2), 202-226.
Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323-2326.
Zhen, X., Zheng, F., Shao, L., Cao, X., & Xu, D. (2017). Supervised Local Descriptor Learning for Human Action Recognition. IEEE Transactions on Multimedia.
Li, Z., Liu, J., Tang, J., & Lu, H. (2015). Robust structured subspace learning for data representation. IEEE transactions on pattern analysis and machine intelligence, 37(10), 2085-2098.
Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation, 15(6), 1373-1396.
Wang, Q., Wu, Y., Shen, Y., Liu, Y., & Lei, Y. (2015). Supervised sparse manifold regression for head pose estimation in 3D space. Signal Processing, 112, 34-42.