臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)
Detailed record
Author: 李冠霖
Author (English): Kuan-Lin Lee
Thesis title: 基於問卷資料中問答語意之個人偏好推論
Thesis title (English): Personal Preference Inference Based on Semantics of Question-Answer Pairs in Questionnaire Data
Advisor: 張時中
Advisor (English): Shi-Chung Chang
Oral defense date: 2017-06-27
Degree: Master's
Institution: 國立臺灣大學 (National Taiwan University)
Department: 電機工程學研究所 (Graduate Institute of Electrical Engineering)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2017
Graduation academic year: 105
Language: English
Pages: 91
Keywords (Chinese): 偏好推論、問題-答案組合、語意、問卷、語法結構、關鍵字、依存文法分析器、word2vec、支撐向量機
Keywords (English): preference inference, question-answer pair, semantics, questionnaire, syntax, keyword, dependency parser, word2vec, SVM
Usage statistics:
  • Cited: 0
  • Views: 281
  • Rating: (none)
  • Downloads: 0
  • Bookmarked: 1
Questionnaire surveys are a general-purpose way to learn personal preferences, and techniques that let computers infer personal preferences from questionnaire data quickly and accurately are increasingly important in today's fiercely competitive business environment. A questionnaire survey describes an object or behavior in a question and asks the respondent how much he or she prefers that object or behavior. A question-answer pair thus reveals the respondent's preference for a behavior.
In 2015, Ming-Chieh Sung designed the semantics-based QPIE to analyze questionnaire data and infer personal preferences from it. QPIE first extracts three keywords according to a question's syntax to represent the question's semantics, then converts each keyword into a semantic vector and concatenates the vectors into a semantic vector of the question; finally, the question's semantic vector is classified according to the respondent's answer to that question. Considering question semantics and answers separately, however, is where QPIE can be improved.
The research problem of this thesis is: how can personal preferences be inferred from the semantics of question-answer pairs? The basic idea is to transform the combined semantics of a question and its answer into a Q&A semantic vector that represents a personal preference behavior, and then to use the semantic relationships between known and new preference behaviors to infer a new one, i.e., the answer to a new question. Two challenges must be overcome: (1) each input question and its answer are two separate sentences; how can the semantics of both be extracted? (2) How should the semantic similarity between question-answer pairs be defined and computed so that the preference behavior in a new question can be inferred?
To address these challenges, this thesis designs the Question-and-Answer-based Personal Preference Inference Engine (QAPIE), whose design contains the following parts, each corresponding to one of the challenges above:
1. A method that combines a question and an answer based on their syntactic structure, extracts Q&A keywords, and builds a Q&A semantic vector.
Syntax is what assembles the words of a sentence into the sentence. A personal preference behavior is formed jointly by a question and its answer, which belong to two different sentences. The syntactic structures of the question and the answer are first parsed and then merged, after which the Syntax-based Keyword Extraction Algorithm 2.0 (SKEA2.0) extracts the keywords of the question and answer in the order verb, object, answer, modifier. Finally, the semantic vectors of the keywords are concatenated into the Q&A semantic vector.
2. Preference inference based on the semantic relationships between known preference behaviors and candidate new ones.
The answer option a respondent selects is a positive example of his or her preference, while the options left unselected are negative examples. The semantic vectors of all "known question + selected answer" pairs form the positive set of preference behaviors, and the semantic vectors of all "known question + unselected answer" pairs form the negative set. A new question is combined with its possible answer options into four candidate new Q&A semantic vectors, and an SVM classifier determines which candidate is closest to the positive set and farthest from the negative set; the answer option in that candidate is then inferred to be the respondent's answer to the new question.
To implement and validate QAPIE's novel design, this thesis uses questionnaire data from the Taiwan Communication Survey [TCS13] of the Ministry of Science and Technology, testing QAPIE's inference accuracy with 1,313 respondents and 44 questions that ask how much a respondent prefers a given behavior. Evaluated with the prediction accuracy defined in [Sun15], QAPIE reaches 78.86%, significantly higher than QPIE's 66.65%.
The contribution of this thesis is the novel use of the semantics of both the questions and the answers in a questionnaire to describe personal preference behaviors, and the resulting design of QAPIE, which uses the semantic relationships among preference behaviors to infer new ones, i.e., a respondent's preferred answer to a new question. The specific contributions are:
(1) Answer semantics is incorporated into the inference process to capture the preference implied in it.
(2) The syntax that constructs sentences is used to combine the question and answer sentences and to extract keywords that represent their semantics.
(3) The semantics of a question and its answer are combined and treated as a personal preference behavior, so that the question-plus-answer combination most semantically similar to the known preference behaviors can be identified.
(4) On real questionnaire data, the QAPIE system implemented with the above methods achieves an average preference inference accuracy of 78.86%, significantly higher than the 66.65% of the QPIE baseline.
A questionnaire survey is a widely used approach to acquiring personal preferences. Techniques that let computers infer personal preferences from questionnaire data quickly and accurately are all the more important for businesses in today's competitive market. A questionnaire survey describes certain behaviors or objects in its questions and asks respondents to rate their preference for those behaviors or objects by answering the questions. A question and answer (Q&A) pair therefore reveals a Personal Preference to a Behavior (PPB).
In 2015, Ming-Chieh Sung designed a semantics-based approach, QPIE, to analyze questionnaire data and infer personal preferences from it. QPIE first extracts three keywords, according to the syntax of each question, to represent the question's semantics. It then transforms every keyword into a semantic vector and concatenates the three keyword vectors to form a semantic vector of the question. Finally, the semantic vectors of questions are classified into four categories that represent the four answer options. The main deficiency of QPIE is that it considers a question and its answer separately.
The research problem of this thesis is: how can personal preferences be extracted and inferred from the semantics of each Q&A pair? To solve this problem, the thesis builds on two ideas. First, a Q&A pair is transformed into a Q&A semantic vector that represents a PPB. Second, the answer to a test question is inferred by identifying the answer that forms the Q&A pair most similar to the person's training PPBs. Accordingly, there are two challenges: (1) an input question and its associated answer are essentially two different sentences; how should their semantics be extracted? (2) How should semantic similarity among Q&A pairs be defined and calculated so that the PPB of a test question can be inferred?
To solve the problem, this thesis designs the “Question and Answer-based Personal Preference Inference Engine (QAPIE),” which consists of two parts addressing the two challenges above.
1. Q&A semantic vector construction by extracting and combining keywords from a Q&A pair based on its syntax.
Syntax is the structure that builds a sentence from its words. The syntaxes of the question and the answer are first parsed into dependency trees and merged. The Syntax-based Keyword Extraction Algorithm 2.0 (SKEA2) then identifies the verb, object, answer, and modifier, in that order, as the keywords of the merged Q&A. Finally, the semantic vectors of these keywords are concatenated to form the Q&A semantic vector (a code sketch of this step follows part 2 below).
2. Preference inference based on the semantic similarity between the test PPB and training PPBs.
The answer option a respondent selects for a training question is a positive instance of his or her preference, while the answer options left unselected are negative instances. One PPB set of positive instances is built from every “training question & selected answer option” pair, and another PPB set of negative instances is built from every “training question & unselected answer option” pair. A test question is then combined with its possible answer options to form candidate test PPBs. An SVM classifier then decides which candidate test PPB is most likely to belong to the positive PPB set and most dissimilar from the negative PPB set, so that the answer in that candidate is the inferred answer to the test question (a second sketch below illustrates this step).
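To make part 1 concrete, below is a minimal Python sketch of building a Q&A semantic vector, assuming a pretrained word2vec model loaded with gensim. The model path, the helper name qa_semantic_vector, and the example keywords are illustrative assumptions; the dependency parsing and keyword selection performed by SKEA2 are not reproduced here, so the four keywords are assumed to be already extracted.

```python
# Sketch only: concatenate word2vec vectors of the four ordered keywords
# (verb, object, answer, modifier) into one Q&A semantic vector.
import numpy as np
from gensim.models import KeyedVectors

# Hypothetical path to a pretrained word2vec model; dim must match its vector size.
w2v = KeyedVectors.load_word2vec_format("word2vec.bin", binary=True)

def qa_semantic_vector(verb, obj, answer, modifier, dim=300):
    """Concatenate keyword vectors in the fixed order verb, object, answer, modifier."""
    parts = []
    for word in (verb, obj, answer, modifier):
        if word is not None and word in w2v:
            parts.append(w2v[word])
        else:
            parts.append(np.zeros(dim))  # missing or out-of-vocabulary keyword
    return np.concatenate(parts)         # shape: (4 * dim,)

# Illustrative Q&A pair: "How often do you watch videos online?" answered with "often".
vec = qa_semantic_vector("watch", "videos", "often", "online")
```

In the thesis the keywords would come from SKEA2's merged dependency parse of the question and the chosen answer option; the fixed keyword order is what lets the concatenated vectors of different Q&A pairs be compared position by position.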
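For part 2, the following sketch shows the inference step with scikit-learn's SVC standing in for the SVM implementation used in the thesis (the reference list includes LIBSVM [ChL11]). The function name, the linear kernel, and the four example answer options are assumptions for illustration.

```python
# Sketch only: train a binary SVM on positive vs. negative PPB vectors and
# pick the answer option whose candidate test vector scores most "positive".
import numpy as np
from sklearn.svm import SVC

def infer_answer(positive_vecs, negative_vecs, candidate_vecs, answer_options):
    X = np.vstack([positive_vecs, negative_vecs])
    y = np.array([1] * len(positive_vecs) + [0] * len(negative_vecs))
    clf = SVC(kernel="linear")   # linear kernel chosen for the sketch; the thesis may differ
    clf.fit(X, y)
    scores = clf.decision_function(np.vstack(candidate_vecs))
    return answer_options[int(np.argmax(scores))]

# Example: four candidate Q&A vectors for one test question, one per answer option.
# answer = infer_answer(pos_ppbs, neg_ppbs, candidates, ["never", "seldom", "sometimes", "often"])
```

Here each candidate vector is the test question's keywords concatenated with one possible answer option, mirroring how the training PPB vectors were built.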
To implement and verify QAPIE, this thesis tests the inference accuracy of QAPIE on questionnaire data from the Taiwan Communication Survey [TCS13], covering 1,313 respondents and 44 questions about personal preferences for Internet usage behaviors. In this experiment QAPIE achieves 78.86% inference accuracy, significantly higher than the 66.65% achieved by QPIE, owing to the use of ordinal relationships among PPBs.
The main contribution of this thesis is describing personal preferences by the semantics of both questions and answers, and on that basis designing QAPIE to infer test personal preferences from the semantic relationships among PPBs. The specific contributions are listed below:
(1) Answer semantics is incorporated into the preference inference process to capture the preference implied in it.
(2) The syntax that constructs sentences from words is used to combine the question and answer, which are two separate sentences, and to extract keywords from them.
(3) A question and its answer are bound into a PPB so that the semantics of the answer as well as the question are compared; the answer to a test question is thus inferred by identifying the answer that forms the Q&A pair most similar to the training PPBs.
(4) On real questionnaire data, QAPIE achieves 78.86% inference accuracy, significantly higher than the 66.65% achieved by the QPIE benchmark.
Content
ABSTRACT
中文摘要
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1. INTRODUCTION TO PREFERENCE INFERENCE
1.1 BACKGROUND AND MOTIVATION
1.2 LITERATURE SURVEY
1.2.1 Questionnaire Data from Taiwan Communication Survey (TCS)
1.2.2 Deterministic Personal Preferences Exposed in Questionnaire Data
1.2.3 Ordinal Personal Preference
1.3 PROBLEM SCENARIO
1.4 RESEARCH SCOPE
CHAPTER 2. PERSONAL PREFERENCE INFERENCE BASED ON SEMANTICS OF QUESTION-ANSWER PAIRS IN QUESTIONNAIRE DATA
2.1 REVIEW TO QPIE
2.1.1 Knowledge Abstraction
2.1.2 Numerical Representation of a Single-sentence Question
2.1.3 Semantic Relations among Questions for Preference Inference
2.1.4 Validation of QPIE
2.1.5 Result: Similarities between Questions are Captured in QPIE
2.2 DEFICIENCY OF QPIE
2.3 DEFINITION OF RESEARCH PROBLEM: HOW TO INFER PERSONAL PREFERENCE BASED ON SEMANTICS OF QUESTIONS AND ANSWERS?
2.3.1 Definition of Personal Preference Disclosed by Questionnaire Answers
2.3.2 Sub-Problem 1: How to Construct a Semantic Vector of a Question-Answer Pair from a Question and an Answer According to Their Syntaxes?
2.3.3 Sub-Problem 2: How to Infer Personal Preference Based on Semantic Similarity among PPBs?
CHAPTER 3. SEMANTICS OF PERSONAL PREFERENCE TO BEHAVIOR (PPB) EXTRACTED FROM QUESTION-ANSWER PAIRS
3.1 ORDERED KEYWORDS TO REPRESENT SEMANTICS OF PPB
3.1.1 Syntax as the Basis to Extract Keywords
3.1.2 Syntax-based Keywords Extraction Algorithm 2 (SKEA2)
3.2 ANALYSIS AND EVALUATION OF SKEA2 RESULTS
3.2.1 Results Comparison of SKEA2, SKEA, Human Encoders
3.2.2 Analysis of Difference between SKEA2 and Human Encoder’s Extraction
3.2.3 Main improvement in SKEA2 is Modifiers
3.3 NUMERICAL REPRESENTATION OF PPB SEMANTICS BASED ON ORDER OF KEYWORDS
CHAPTER 4. PREFERENCE INFERENCE BASED ON SEMANTIC SIMILARITY AMONG PERSONAL PREFERENCES TO BEHAVIORS (PPB)
4.1 POSITIVE/NEGATIVE INSTANCES OF PPBS TO REPRESENT PERSONAL PREFERENCES
4.1.1 Personal Preference Assumed as a Single Answer Option
4.1.2 SVM Classifier Trained on Positive/Negative Instances of PPBs to Capture Personal Preferences
4.2 PREFERENCE INFERENCE BASED ON SEMANTIC SIMILARITY AMONG PPBS
4.3 DISCUSSION OF EXTRA PERSONAL PREFERENCE INFORMATION BY CONSIDERING ANSWER SEMANTICS IN ADDITION TO QUESTION SEMANTICS
CHAPTER 5. QAPIE IMPLEMENTATION AND EXPERIMENT RESULT
5.1 OPERATION PROCEDURE OF QAPIE SYSTEM
5.2 QAPIE EXPERIMENT RESULT FROM QUESTIONNAIRE DATA
5.3 FUTURE WORK MOTIVATED BY EXPERIMENT RESULTS
CHAPTER 6. CONCLUSIONS
APPENDIX
A. HOW SEMANTIC VECTORS TRAINED IN NNLM
A.1. Notations in NNLM
A.2. Word Vector is a Combination of Concepts trained in NNLM
B. QUESTIONS SELECTED FROM TCS QUESTIONNAIRE
C. KEYWORDS EXTRACTED BY SKEA2, HUMAN ENCODERS, SKEA
D. CASES STUDIED RESULTS OF SKEA2
REFERENCES
[Sun15] Ming-Chieh Sung, Shi-Chung Chang, Peter B. Luh, 2015, Design of Personal Preference Inference from Questionnaire Data with Exemplary Application. Master Thesis, EE Dept., National Taiwan University, Taipei, Taiwan.
[Hug68] Hughes, G., 1968. On the mean accuracy of statistical pattern recognizers. IEEE transactions on information theory, 14(1), pp.55-63.
[YMS02] Yates, D., Moore, D.S. and Starnes, D.S., 2002. The practice of statistics: TI-83/89 graphing calculator enhanced. Macmillan.
[BDV03] Bengio, Y., Ducharme, R., Vincent, P. and Jauvin, C., 2003. A neural probabilistic language model. Journal of machine learning research, 3(Feb), pp.1137-1155.
[BFK95] Bies, A., Ferguson, M., Katz, K., MacIntyre, R., Tredinnick, V., Kim, G., Marcinkiewicz, M.A. and Schasberger, B., 1995. Bracketing guidelines for Treebank II style Penn Treebank project. University of Pennsylvania, 97, p.100.
[Wal14] Waltz, D.L. ed., 2014. Semantic Structures (RLE Linguistics B: Grammar): Advances in Natural Language Processing (Vol. 23). Routledge.
[EOR07] Erkan, G., Özgür, A. and Radev, D.R., 2007, June. Semi-supervised classification for extracting protein interaction sentences using dependency parsing. In EMNLP-CoNLL (Vol. 7, pp. 228-237).
[JPK11] Han, J., Pei, J. and Kamber, M., 2011. Data mining: concepts and techniques. Elsevier.
[MaJ00] Martin, J.H. and Jurafsky, D., 2000. Speech and language processing. International Edition, 710, p.25.
[MYZ13] Mikolov, T., Yih, W.T. and Zweig, G., 2013, June. Linguistic regularities in continuous space word representations. In hlt-Naacl (Vol. 13, pp. 746-751).
[SaT08] Sagae, K. and Tsujii, J.I., 2008, August. Shift-reduce dependency DAG parsing. In Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1 (pp. 753-760). Association for Computational Linguistics.
[SiA00] Siolas, G. and d'Alché-Buc, F., 2000. Support vector machines based on a semantic kernel for text categorization. In Neural Networks, 2000. IJCNN 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on (Vol. 5, pp. 205-209). IEEE.
[TCS13] Taiwan Communication Survey, phase one, year two, official questionnaire, 2013. [Online]. Available: http://www.crctaiwan.nctu.edu.tw/AnnualSurvey_detail_e.asp?ASD_ID=17
[WZH09] Wu, Y., Zhang, Q., Huang, X. and Wu, L., 2009, August. Phrase dependency parsing for opinion mining. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3-Volume 3 (pp. 1533-1541). Association for Computational Linguistics.
[YLL05] Ye, Q., Lin, B. and Li, Y.J., 2005, August. Sentiment classification for Chinese reviews: A comparison between SVM and semantic approaches. In Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on (Vol. 4, pp. 2341-2346). IEEE.
[Wor] Word2vec. Google. [Online]. Available: code.google.com/archive/p/word2vec/
[TuP10] Turney, P.D. and Pantel, P., 2010. From frequency to meaning: Vector space models of semantics. Journal of artificial intelligence research, 37, pp.141-188.
[Fri96] Friedman, J., 1996. Another approach to polychotomous classification (Vol. 56). Technical report, Department of Statistics, Stanford University.
[Dic] Dictionary.com. Definitions from Dictionary.com. [online] Available at: http://www.dictionary.com/
[Mer] Merriam-Webster Dictionary. Definition of clause by Merriam-Webster. [online] Available at: https://www.merriam-webster.com/dictionary/
[Sch05] Scherer, K.R., 2005. What are emotions? And how can they be measured?. Social science information, 44(4), pp.695-729.
[MYM06] Martin, T.L., Yu, C.T., Martin, G.L. and Fazzio, D., 2006. On choice, preference, and preference for choice. The behavior analyst today, 7(2), p.234.
[Sch15] Schultz, W., 2015. Neuronal reward and decision signals: from theories to data. Physiological Reviews, 95(3), pp.853-951.
[Rey61] Reynolds, G.S., 1961. Relativity of response rate and reinforcement frequency in a multiple schedule. Journal of the Experimental Analysis of Behavior, 4(2), pp.179-184.
[VaC15] Vapnik, V.N. and Chervonenkis, A.Y., 2015. On the uniform convergence of relative frequencies of events to their probabilities. In Measures of complexity (pp. 11-30). Springer International Publishing.
[LaR10] Larson, R.K. and Ryokai, K., 2009. Grammar as science. MIT Press.
[Hal84] Halliday, M.A., 1984. Language as code and language as behaviour: a systemic-functional interpretation of the nature and ontogenesis of dialogue. The semiotics of culture and language, 1, pp.3-35.
[Mar92] Martin, J.R., 1992. English text: System and structure. John Benjamins Publishing.
[EgS05] Eggins, S. and Slade, D., 2005. Analysing casual conversation. Equinox Publishing Ltd.
[HMM14] Halliday, M., Matthiessen, C.M. and Matthiessen, C., 2014. An introduction to functional grammar. Routledge.
[YHL12] Yuan, G.X., Ho, C.H. and Lin, C.J., 2012. Recent advances of large-scale linear classification. Proceedings of the IEEE, 100(9), pp.2584-2603.
[MLS13] Mikolov, T., Le, Q.V. and Sutskever, I., 2013. Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168.
[MSC13] Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S. and Dean, J., 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119).
[ChL11] Chang, C.C. and Lin, C.J., 2011. LIBSVM: a library for support vector machines. ACM transactions on intelligent systems and technology (TIST), 2(3), p.27.
[MRT12] Mohri, M., Rostamizadeh, A. and Talwalkar, A., 2012. Foundations of machine learning. MIT press.
[Pla99] Platt, J., 1999. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in large margin classifiers, 10(3), pp.61-74.
[RaD13] Rahman, M.M. and Davis, D.N., 2013. Addressing the class imbalance problem in medical datasets. International Journal of Machine Learning and Computing, 3(2), p.224.
[MaM08] De Marneffe, M.C. and Manning, C.D., 2008. Stanford typed dependencies manual (pp. 338-345). Technical report, Stanford University.
[Rad04] Radford, A., 2004. English syntax: An introduction. Cambridge University Press.
[McM92] McArthur, T.B. and McArthur, F., 1992. The Oxford companion to the English language. Oxford University Press, USA.
[Kro05] Kroeger, P.R., 2005. Analyzing grammar: An introduction. Cambridge University Press.
[Niv] Nivre, J. et al. Universal Dependencies. [Online]. Available: http://universaldependencies.org/u/dep/
[MMM06] De Marneffe, M.C., MacCartney, B. and Manning, C.D., 2006, May. Generating typed dependency parses from phrase structure parses. In Proceedings of LREC (Vol. 6, No. 2006, pp. 449-454).
[Cry04] Crystal, D., 2004. The Cambridge encyclopedia of the English language. Ernst Klett Sprachen.
[YaH10] Yao, H.C., Hsu, Y.S., 2010. On THSR’s Crisis Communication Strategies and the Effects. Journal of Communications Management.
[LiS06] Lichtenstein, S. and Slovic, P. eds., 2006. The construction of preference. Cambridge University Press.
[Bre56] Brehm, J.W., 1956. Postdecision changes in the desirability of alternatives. The Journal of Abnormal and Social Psychology, 52(3), p.384.
[SMD09] Sharot, T., De Martino, B. and Dolan, R.J., 2009. How choice reveals and shapes expected hedonic outcome. Journal of Neuroscience, 29(12), pp.3760-3765.
[GoL14] Goldberg, Y. and Levy, O., 2014. word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722.
[MiC13] Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
[Har54] Harris, Z.S., 1954. Distributional structure. Word, 10(2-3), pp.146-162.
[Kow90] Kowalski, R.A., 1990. Problems and promises of computational logic. In Computational logic (pp. 1-36). Springer, Berlin, Heidelberg.