

Graduate student: W K Tharanga Mahesh Gunarathne
Advisor: Timothy K. Shih
Keywords: Open Education Resources (OER), Learning objects, search engine, data visualization, data extraction, data transformation, clustering, probabilistic topic models, multi-label classification, Automatic learning object classification
Open Educational Resources (OERs) offer a strategic opportunity to improve the quality of education. At present, OER users such as students, instructors, and scholars can find OERs through general search engines by means of metadata enrichment and logical combinations of keywords. Yet most users of web search engines today have difficulty finding suitable learning materials. The main goal of this study is to propose an automated learning content classification approach for Open Education Repositories. Because MERLOT II (www.merlot.org) is used by a large number of people both to obtain and to submit learning resources, the MERLOT II repository was chosen as the experimental domain.
In the initial phase, we proposed an enhanced learning object (LO) search engine with a data visualization feature that lets learning object repository (LOR) users navigate LOs through a hierarchical knowledge graph built from a single search keyword. We implemented a web-based solution in which users run a single-keyword search and then view the results on a hierarchical knowledge graph. The back end of the system performs data extraction, data transformation, data clustering, and data visualization. The search and visualization results indicate that the proposed approach can help users obtain a clear overview of the LOs related to a single keyword.
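The abstract does not say which clustering algorithm the back end uses. As an illustration only, the sketch below swaps in average-linkage agglomerative clustering over TF-IDF vectors of learning object titles, which produces the kind of nested hierarchy a knowledge graph could be drawn from. The tokenizer, the small stopword list, and the function names (`tfidf_vectors`, `agglomerate`, `leaves`) are assumptions made for this example, not part of the described system.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Small illustrative stopword list (an assumption); a real pipeline would use a fuller one.
STOPWORDS = {"and", "the", "for", "with", "from", "into"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens longer than two letters, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if len(t) > 2 and t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one unit-length sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF keeps terms that occur in every document non-zero.
        vec = {t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors, i.e. their dot product."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def agglomerate(docs):
    """Average-linkage agglomerative clustering of documents.

    Returns a nested tuple tree whose leaves are document indices, or None
    for an empty input. Each internal node joins the two clusters with the
    highest mean pairwise cosine similarity at that step.
    """
    if not docs:
        return None
    vecs = tfidf_vectors(docs)
    clusters = [(i, [i]) for i in range(len(docs))]  # (subtree, member indices)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ma, mb = clusters[a][1], clusters[b][1]
            sim = sum(cosine(vecs[i], vecs[j]) for i in ma for j in mb) / (len(ma) * len(mb))
            if best is None or sim > best[0]:
                best = (sim, a, b)
        _, a, b = best
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]


def leaves(tree):
    """Flatten a cluster tree back into its document indices."""
    if isinstance(tree, int):
        return [tree]
    return [i for child in tree for i in leaves(child)]


titles = [
    "Introduction to linear algebra matrices",
    "Matrices and vectors in linear algebra",
    "Cell biology and the structure of the cell",
    "Cell membranes in biology",
]
print(agglomerate(titles))  # → ((0, 1), (2, 3))
```

Each internal node of the returned tree is a candidate node for a hierarchical graph; here the two top-level branches separate the linear algebra titles from the cell biology titles. The naive pairwise loop costs O(n³) similarity evaluations overall, which is acceptable for one page of search results but not for an entire repository.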
In the next phase, we returned to our original plan of proposing an automated learning content classification approach for Open Education Repositories. The value of OERs depends mainly on how easily they can be searched for or located through a web search engine. Currently, the MERLOT II metadata repository requires resource providers to choose the relevant discipline category manually when adding material to the repository. This practice is time-consuming and prone to human error. If a member picks an incorrect discipline category, the learning resource is miscategorized in the repository, and it may then fail to appear in the results of a keyword search in "MERLOT Smart Search" or "Advanced search." These observations motivated us to develop an automated learning content classification solution for OER repositories. We built a dataset from the MERLOT collection and carried out initial experiments with well-known classifiers (Logistic Regression, Multinomial Naive Bayes, Linear Support Vector Machine, and Random Forest) to test their accuracy. We then proposed an automated learning content classification model (LCCM) that assigns learning resources to relevant discipline categories as they are added to the MERLOT repository. The work in this phase covers dataset preparation, data preprocessing, feature extraction with the LDA (Latent Dirichlet Allocation) topic model, and semantic similarity calculation with a pre-trained word embedding matrix. Together, these methods provide a basis for classifying learning resources more effectively and quickly.
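The abstract names the ingredients of the LCCM (LDA topic features and semantic similarity from a pre-trained word embedding matrix) but not the exact scoring rule. The sketch below shows one plausible way to combine them, offered as an assumption: a document's LDA topics and their top words are merged into a single weighted-mean embedding, which is compared by cosine similarity with the embedding of each discipline's label words, and the most similar discipline is chosen. The toy 3-dimensional `EMBEDDINGS`, the category names, and all function names are hypothetical. In practice the topic weights could come from an LDA library such as gensim (for example `LdaModel.get_document_topics` and `LdaModel.show_topic`), and the vectors from a real pre-trained model.

```python
import math

# Hypothetical toy embeddings standing in for a pre-trained matrix such as
# word2vec or GloVe; the values are invented for illustration only.
EMBEDDINGS = {
    "cell": [0.9, 0.1, 0.0],
    "gene": [0.8, 0.2, 0.1],
    "protein": [0.85, 0.1, 0.05],
    "biology": [0.9, 0.05, 0.05],
    "matrix": [0.05, 0.9, 0.1],
    "vector": [0.1, 0.85, 0.1],
    "algebra": [0.0, 0.95, 0.05],
    "mathematics": [0.05, 0.9, 0.05],
    "novel": [0.1, 0.05, 0.85],
    "literature": [0.05, 0.1, 0.9],
}
DIM = 3


def weighted_mean(pairs):
    """Weighted mean embedding of (word, weight) pairs; None if no word is known."""
    total, weight_sum = [0.0] * DIM, 0.0
    for word, weight in pairs:
        if word in EMBEDDINGS:  # out-of-vocabulary words are skipped
            for i, x in enumerate(EMBEDDINGS[word]):
                total[i] += weight * x
            weight_sum += weight
    return [x / weight_sum for x in total] if weight_sum else None


def document_vector(doc_topics):
    """Combine LDA output into one vector.

    doc_topics is a list of (topic_probability, [(word, word_probability), ...]);
    each word is weighted by topic_probability * word_probability.
    """
    return weighted_mean((w, tp * wp) for tp, words in doc_topics for w, wp in words)


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(doc_topics, categories):
    """Return (best_category, scores) by cosine similarity to category-label vectors."""
    doc_vec = document_vector(doc_topics)
    if doc_vec is None:
        return None, {}
    scores = {}
    for name, label_words in categories.items():
        cat_vec = weighted_mean((w, 1.0) for w in label_words)
        if cat_vec is not None:
            scores[name] = cosine(doc_vec, cat_vec)
    best = max(scores, key=scores.get) if scores else None
    return best, scores


# Illustrative categories, each described by a few label words (hypothetical).
CATEGORIES = {
    "Biology": ["biology", "cell"],
    "Mathematics": ["mathematics", "algebra"],
    "Literature": ["literature"],
}

doc = [(0.7, [("cell", 0.4), ("protein", 0.3), ("gene", 0.3)]),
       (0.3, [("matrix", 0.5), ("vector", 0.5)])]
best, scores = classify(doc, CATEGORIES)
print(best)  # → Biology
```

Scoring against short label-word lists means the similarity step itself needs no per-category training data, but its results depend heavily on how well the embedding vocabulary covers the category names; unknown words are simply skipped in this sketch.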

Acknowledgement
List of Figures
List of Tables
Explanation of Symbols
Chapter 1 Introduction
1.1 First research issue with MERLOT II – Searching a Material
1.1.1 Research Contribution - 1
1.2 Second research issue with MERLOT II – Adding Materials
1.2.1 Research Contribution - 2
Chapter 2 Literature Review
2.1 Learning Objects and Learning Object Repositories
2.1.1 OER Commons
2.1.2 MERLOT II
2.1.3 OpenStax CNX
2.2 Data Pre-Processing
2.3 Probabilistic Topic Models
2.4 Word2vec
2.5 lda2vec
2.6 Text Classification
2.6.1 k-Nearest Neighbour Classifiers
2.6.2 Decision Tree Classifier
2.6.3 Naive Bayes Classifiers
2.7 Multi-label Document Classification
2.8 Performance Measures
2.9 Performance Measures - Multi-label classification
Chapter 3 Learning Object Search Engine Solution MERLOT II
3.1 Introduction
3.1.1 Exploring the research issue with MERLOT Smart Search
3.1.2 Research Contribution
3.2 Proposed Methodology
3.2.1 Data Extraction and Preparation
3.2.2 Data Transformation
3.2.3 Clustering Algorithm Implementation
3.2.4 Information Visualization Implementation of Data Visualization
3.3 Experimental Setup
3.4 Evaluation of Clustering Results
3.5 Discussion
3.5.1 Methodology Strength
3.5.2 Methodological Limitations
3.6 Conclusion and Future works
Chapter 4 Automated Learning Content Classification Solution – MERLOT II
4.1 Introduction
4.1.1 Research issue with MERLOT II – Adding Material
4.1.2 Research Contribution - 2
4.2 Phase 1 - Proposed Methodology
4.2.1 Approaching the (Multi-Class) Classification Techniques
4.2.2 Data Collection
4.2.3 Data Preprocessing
4.2.4 Experimental Setup
4.2.5 Results and Discussions
4.2.6 Performance Evaluation with Sample Results
4.3 Phase 2 - Proposed Methodology
4.4 Experiment Design and Evaluation
4.4.1 Data Collection
4.4.2 Data Preprocessing
4.4.3 Implementation of Probabilistic Topic Models
4.4.4 Implementation of LDA
4.4.5 Model classification based on similarity scores
4.4.6 Results and Discussion
Chapter 5 Conclusion and Future works