Graduate student: Murman Dwi Prasetio
Thesis title: AN INVESTIGATION OF RELATIONSHIP BETWEEN PREDICTION WORD AND SUBTASK CATEGORY IN TASK ANALYSIS – A NAIVE BAYES BASED MACHINE LEARNING APPROACH
Advisor: Shu-chiang Lin (林樹強)
Oral examination committee: Shu-chiang Lin (林樹強)
Oral examination date: 2012-06-12
Degree: Master's
Institution: National Taiwan University of Science and Technology
Department: Department of Industrial Management
Discipline: Business and Management
Field: Other Business and Management
Thesis type: Academic thesis
Year of publication: 2012
Graduation academic year: 100 (ROC calendar)
Language: English
Number of pages: 73
Keywords: natural language processing; text processing; task analysis; machine learning tool; Rapid Miner; Text Miner
Traditionally, indexing and searching of speech content in several task analyses have been achieved through a combination of separately constructed natural language processing engines. Here, natural language processing operates on speech, which is the primary mode of communication among human beings and the most natural and efficient way for people to exchange information. It is therefore logical that the next technological development in Human-Computer Interaction (HCI) should be natural-language speech recognition. However, with the continuing development of computer systems and their user interfaces, a task analysis of users' current activities is not sufficient to predict which tasks users will perform after their previous ones. In Lin and Lehto's study, a Bayesian-based semi-automated task analysis tool was developed to help task analysts predict the task/subtask categories performed by knowledge agents from telephone conversations in which the agents were helping customers troubleshoot their problems.
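The record does not say how either tool scores a narrative against a subtask category. The following is a minimal Python sketch of the Bayesian classification step described above, assuming a multinomial naive Bayes classifier with add-one smoothing (a common textbook form, not necessarily Rapid Miner's operator) and, for contrast, a max-over-words "fuzzy Bayes" rule, P(S | E) = max_j P(S | w_j). Whether Text Miner implements exactly this fuzzy rule is an assumption. The toy narratives, category names, and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(narratives):
    """Count categories and words; narratives is a list of (words, category) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)  # category -> word -> frequency
    vocab = set()
    for words, category in narratives:
        class_counts[category] += 1
        word_counts[category].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def p_word_given_class(word, category, model):
    """P(w | S) with add-one (Laplace) smoothing so unseen pairs never give zero."""
    _, word_counts, vocab = model
    total = sum(word_counts[category].values())
    return (word_counts[category][word] + 1) / (total + len(vocab))


def naive_bayes_predict(words, model):
    """argmax_S log P(S) + sum_j log P(w_j | S), treating words as independent."""
    class_counts, _, vocab = model
    n = sum(class_counts.values())
    scores = {}
    for category, count in class_counts.items():
        score = math.log(count / n)
        for w in words:
            if w in vocab:  # words never seen in training are skipped
                score += math.log(p_word_given_class(w, category, model))
        scores[category] = score
    return max(scores, key=scores.get)


def fuzzy_bayes_predict(words, model):
    """argmax_S max_j P(S | w_j): each category is scored by its strongest word."""
    class_counts, _, vocab = model
    n = sum(class_counts.values())
    priors = {c: k / n for c, k in class_counts.items()}
    known = [w for w in words if w in vocab]
    scores = {}
    for category, prior in priors.items():
        posteriors = []
        for w in known:
            # P(w) obtained by summing P(w | S') P(S') over all categories S'
            p_w = sum(p_word_given_class(w, c, model) * priors[c] for c in priors)
            posteriors.append(p_word_given_class(w, category, model) * prior / p_w)
        scores[category] = max(posteriors, default=prior)
    return max(scores, key=scores.get)


# Hypothetical toy narratives; the thesis dataset has 5,184 narratives and 71 categories.
data = [
    ("please restart the modem".split(), "troubleshoot"),
    ("restart your router now".split(), "troubleshoot"),
    ("what is your account number".split(), "identify_customer"),
    ("can i have your account name".split(), "identify_customer"),
]
model = train(data)
print(naive_bayes_predict("restart the router".split(), model))   # troubleshoot
print(fuzzy_bayes_predict("your account number".split(), model))  # identify_customer
```

The design difference is the one the abstract's comparison turns on. Naive Bayes multiplies evidence from every word, so many weakly informative words can swamp a single strong one. The fuzzy rule lets the single most diagnostic word decide.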
The purpose of this study is to examine the dataset established by Lin and Lehto (2007) and to further analyze the results of the Bayesian-based task analysis model proposed by Lin and Lehto (2009). It does so by comparing results on the existing dataset between two open-source machine learning programs based on the Bayesian approach: Text Miner and Rapid Miner (Hofmann and Klinkenberg, 2009). In this analysis, the Rapid Miner program generated prediction results for a total of fifteen word combinations drawn from telephone dialogs between call-center agents and customers. The fifteen combinations are single words, pair words, triple words, quadruple words, single-pair words, single-triple words, single-quadruple words, pair-triple words, pair-quadruple words, triple-quadruple words, single-pair-triple words, single-pair-quadruple words, single-triple-quadruple words, pair-triple-quadruple words, and single-pair-triple-quadruple words. To identify the relationship between prediction words and main subtask categories, this study compares the two open-source machine learning programs, Rapid Miner and Text Miner.
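The fifteen word combinations listed above are exactly the non-empty subsets of the four n-gram sizes (single, pair, triple, quadruple), which is why there are 2^4 - 1 = 15 of them. The minimal Python sketch below enumerates them in the abstract's order and pools the n-grams of one combination into a feature list for a narrative. It assumes that pair, triple, and quadruple words are contiguous word sequences. The record does not state this, and the thesis may instead count co-occurring but non-adjacent words. The function names and the "_" join format are hypothetical.

```python
from itertools import combinations

# The four n-gram sizes named in the abstract.
SIZE_NAMES = {1: "single", 2: "pair", 3: "triple", 4: "quadruple"}


def ngrams(words, n):
    """Contiguous n-word sequences joined by '_' (a hypothetical feature format)."""
    return ["_".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def combination_sets():
    """Every non-empty subset of {1, 2, 3, 4}: 2**4 - 1 = 15 combinations."""
    sizes = sorted(SIZE_NAMES)
    return [combo for k in range(1, len(sizes) + 1)
            for combo in combinations(sizes, k)]


def combo_name(combo):
    """Name a combination the way the abstract does, e.g. (1, 2) -> 'single-pair words'."""
    return "-".join(SIZE_NAMES[n] for n in combo) + " words"


def features(words, combo):
    """Pool the n-grams of every size in one combination into a single feature list."""
    feats = []
    for n in combo:
        feats.extend(ngrams(words, n))
    return feats


combos = combination_sets()
print(len(combos))  # 15
for combo in combos:
    print(combo_name(combo))
print(features("please restart the modem".split(), (1, 2)))
# ['please', 'restart', 'the', 'modem', 'please_restart', 'restart_the', 'the_modem']
```

Each combination's feature list could then be passed to a classifier such as the naive Bayes sketch, giving one set of prediction results per combination.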
The results show that Rapid Miner, using naive Bayes, performs worse than Text Miner at predicting the relationship between prediction words and main subtask categories. Both Rapid Miner and Text Miner were applied to 71 subtask categories over a dataset of 5,184 dialog narratives. Rapid Miner's precision rate was 33% on the full dataset, 26% on the testing set, and 35% on the training set, and its average correct-prediction probability across subtask categories was 19.91%. Eleven categories had correct predictions above 50%, and 39 categories had correct predictions below 50%. By comparison, Text Miner's fuzzy Bayesian task analysis of the main subtask categories produced 13 categories with correct predictions of 80% or above and 34 categories with correct predictions of 50% or above. However, because Text Miner is still under development, further analysis with the same dataset is needed to reconfirm these findings and to compare other text-processing tools built on other algorithms or models.
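The abstract reports four kinds of figures: an overall precision rate for each data split, an average correct-prediction probability across subtask categories, and counts of categories above the 50% and 80% thresholds. The record does not define these measures. The minimal Python sketch below assumes that "precision rate" means the fraction of narratives assigned to their correct category, which is micro-averaged precision and equals accuracy for single-label prediction. It also assumes that the average is an unweighted mean of per-category rates. The labels are hypothetical.

```python
from collections import Counter


def per_category_rates(actual, predicted):
    """Share of narratives in each true category whose predicted category is correct."""
    totals = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {c: hits[c] / totals[c] for c in totals}


def summarize(actual, predicted, thresholds=(0.5, 0.8)):
    """Overall rate, unweighted mean of per-category rates, and threshold counts."""
    rates = per_category_rates(actual, predicted)
    summary = {
        # single-label prediction: micro-averaged precision equals accuracy
        "overall": sum(a == p for a, p in zip(actual, predicted)) / len(actual),
        # unweighted mean, so small categories count as much as large ones
        "mean_category_rate": sum(rates.values()) / len(rates),
    }
    for t in thresholds:
        # ">=" matches "X% or above"; use ">" for "over X%"
        summary[f"categories_at_or_above_{t:.0%}"] = sum(r >= t for r in rates.values())
    return rates, summary


# Hypothetical labels for six narratives in three categories.
actual = ["A", "A", "A", "B", "B", "C"]
predicted = ["A", "A", "B", "B", "C", "A"]
rates, summary = summarize(actual, predicted)
print(rates)  # {'A': 0.6666666666666666, 'B': 0.5, 'C': 0.0}
print(summary)
```

An unweighted category mean can fall well below the overall rate when many small categories are rarely predicted correctly. That may explain how a 33% overall precision and a 19.91% category average could coexist, though the record does not confirm it. The abstract also uses both "over 50%" and "50% or above", and these give different counts for a category at exactly 50%.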
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
INTRODUCTION
1.1. Research Background
1.2. Research Objectives
1.3. Research Outline
LITERATURE REVIEW
2.1. Human-Machine Communication through Voice
2.2. Speech Recognition
2.2.1. The Basic Problem of Speech Recognition
2.3. Task Analysis
2.4. The Techniques of Task Analysis
2.4.1. Hierarchical Task Analysis
2.4.2. Cognitive Task Analysis
2.5. Machine Learning
2.6. Text Processing
2.6.1. Text Classification
2.6.2. Information Extraction
2.6.3. Token Identification
2.7. Knowledge Acquisition
2.8. Statistical Methods
2.9. Evaluation Measurement
2.10. Related Work
2.10.1. Data Collection
2.10.2. Pre-processing Data
2.10.3. Defining the Main Subtask Categories
2.10.4. Text Miner
METHODOLOGY
3.1. Introduction
3.2. The Existing Data Results
3.3. The New Proposed Framework
3.4. Hardware and Software Devices
3.5. Machine Learning Tool General Environment
3.5.1. Rapid Miner
3.6. Machine Learning Tool Workspace
3.6.1. Rapid Miner
RESULTS AND DISCUSSION
4.1. Introduction
4.1.1. Transition Matrix Results
4.1.2. Comparison of Machine Learning Tool Environment Results
CONCLUSION AND FUTURE WORK
5.1. Conclusion
5.2. Future Work
REFERENCES
APPENDICES
Appendix A Partial Listing of the Dialog Table between Knowledge Agent and Customer, Containing 5,184 Narratives
Appendix B Partial Listing of the Dialog Table between Knowledge Agent and Customer for All Datasets, Containing 5,184 Narratives
Appendix C Partial Listing of the Dialog Table between Knowledge Agent and Customer for Training Datasets, Containing 3,472 Narratives
Appendix D Partial Listing of the Dialog Table between Knowledge Agent and Customer for Testing Datasets, Containing 1,712 Narratives
Appendix E Partial Listing of the Subtask Category Decomposition Table
Appendix F Partial Listing of the Single-Words Frequency List, Containing 2,535 Occurrences of Single Words
Appendix G Partial Listing of the Pair-Words Frequency List, Containing 59,571 Occurrences of Pair Words
Appendix H Partial Listing of the Triple-Words Frequency List, Containing 315,741 Occurrences of Triple Words
Appendix I Partial Listing of the Quadruple-Words Frequency List, Containing 441,342 Occurrences of Quadruple Words
Appendix J Partial Listing of Combination Words
Annett, J. (2005) "Hierarchical Task Analysis (HTA)." In Stanton et al. (2005), Chapter 33, pp. 33-1–33-7.
Appelt, D.E. (1999) "Introduction to information extraction technology." Tutorial, Int. Joint Conf. on Artificial Intelligence (IJCAI'99). Morgan Kaufmann, San Mateo.
Apte, C., Damerau, F.J. and Weiss, S.M. (1994) "Automated learning of decision rules for text categorization." ACM Trans. Information Systems, Vol. 12, No. 3, pp. 233-251.
Baber, C. (1991) Speech Technology in Control Room Systems: A Human Factors Perspective. Ellis Horwood, New York.
Brin, S. and Page, L. (1998) "The anatomy of a large-scale hypertextual Web search engine." Proc. World Wide Web Conference WWW-7. In Computer Networks and ISDN Systems, Vol. 30, No. 1-7, pp. 107-117.
Cunningham, H. (2002) "GATE, a General Architecture for Text Engineering." Computers and the Humanities, Vol. 36, pp. 223-254.
Davis, J., MacLean, J. and Dampier, D. (2010) "Methods of information hiding and detection in file systems." 2010 Fifth IEEE International Workshop, p. 66.
Fisher, D. (1987) "Knowledge acquisition via incremental conceptual clustering." Machine Learning, Vol. 2, pp. 139-172.
Fomby, T. (2008) "Naive Bayes classifier." April 2008.
Hearst, M.A. (1999) "Untangling text mining." Proc. Annual Meeting of the Association for Computational Linguistics (ACL'99). University of Maryland, June.
Keller, E. (1994) Fundamentals of Speech Synthesis and Speech Recognition. John Wiley & Sons, New York.
Lin, S. and Lehto, M.R. (2009) "A Bayesian based machine learning application to task analysis." In Wang, J. (Ed.), Encyclopedia of Data Warehousing and Mining, 2nd ed., Classification B, pp. 1-7.
Liu, L. and Özsu, M.T. (Eds.) (2009) Encyclopedia of Database Systems. Springer Publishing Company, Incorporated.
Nahm, U.Y. and Mooney, R.J. (2002) "Text mining with information extraction." Proc. AAAI-2002 Spring Symposium on Mining Answers from Texts and Knowledge Bases. Stanford, CA.
Porter, M.F. (1980) "An algorithm for suffix stripping." Program, Vol. 13, No. 3, pp. 130-137.
Rabiner, L. and Juang, B. (1993) Fundamentals of Speech Recognition. Prentice Hall, Englewood Cliffs, NJ.
Sebastiani, F. (2002) "Machine learning in automated text categorization." ACM Computing Surveys, Vol. 34, No. 1, pp. 1-47.
Tkach, D. (Ed.) (1998) "Text mining technology: turning information into knowledge." IBM White Paper, Feb. 17, 1998.
Witten, I.H. and Frank, E. (2000) Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, CA.
Zhou, Z. (2010) "Application of data field clustering in computer forensics." ICIC '10: Proceedings of the 2010 Third International Conference on Information and Computing, Vol. 1.