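Graduate student: 陳泓志
Graduate student (English name): Hung-Chen Chen
Thesis title (Chinese): 基於深度學習方法根據使用者生成資料進行個性評估
Thesis title (English): A Deep Learning Based Approach for Personality Detection from User Generated Content
Advisor: 魏志平
Oral defense committee: 簡立峰, 楊錦生
Oral defense date: 2019-07-31
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Information Management
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Year of publication: 2019
Academic year of graduation: 107
Language: English
Number of pages: 57
Keywords (translated from Chinese): personality, deep learning, transfer learning, small dataset, user-generated content, text mining
DOI: 10.6342/NTU201901942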
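Many prior studies have shown that a person's personality is strongly related to their life, behavior, and preferences. Given these relationships, knowing a person's personality helps firms manage human resources, find their target customers, and carry out any other task that requires a preliminary understanding of people. To detect personality efficiently, many methods now infer it automatically from user-generated data. With the rapid development of artificial intelligence, many prior studies have begun applying deep learning methods to extract complex semantic features from text and build stronger classification models. However, training a deep learning model usually requires a very large amount of data, and labeled data in this field are increasingly hard to obtain. Deep learning methods must therefore contend with insufficient data. Long documents also need special handling in this field: user-generated data is sometimes a very long document, but some deep learning architectures, such as recurrent neural networks (RNNs), cannot retain such long content and so may produce unsatisfactory results. In our research, we propose a model architecture that combines deep learning, traditional predefined features, and an extreme gradient boosting (XGBoost) classifier. We use transfer learning to handle the lack of data for deep learning. We use two different ways of selecting important sentences to enlarge our data and to solve the long-document problem. The final results show that every part of our model helps improve its performance. Our method also achieves higher accuracy than existing techniques.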
Human personality has been shown to be highly correlated with an individual's life, behaviors, and preferences. Because of these relationships, knowing a person's personality helps firms manage human resources effectively, find their target customers, and carry out other tasks that user profiles can support. To detect a person's personality traits efficiently, several methods have been proposed that infer personality automatically from user-generated content (UGC). With the rapid development of AI, prior studies have started to exploit deep learning to discover latent and complex linguistic features and to develop more effective classification models. However, training a deep learning model usually requires a very large set of training data, and in this specific task labeled data are hard to obtain. The use of deep learning methods for personality prediction must therefore address the limited training data problem. Another problem in this task is that UGC is sometimes a long document, while some deep learning models, such as recurrent neural networks (RNNs), cannot retain such a long context.
In this work, we propose a hybrid model structure that combines deep learning, traditional hand-crafted features, and an XGBoost classifier. We employ transfer learning to address the insufficient training data problem for deep learning models. We propose two sentence selection schemes to enlarge our training data set and, at the same time, to address the long document problem. Our empirical evaluation results show that each part of our proposed method helps to improve prediction effectiveness and that the method as a whole outperforms our benchmark method.
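To make the sentence selection idea concrete, here is a minimal Python sketch of how two selectors could both shorten a long document and turn one labeled document into two training samples. It is an illustration only: the abstract does not say what the two schemes are, so the keyword-hit selector, the leading-sentences selector, the `keywords` set, and all function names below are assumptions rather than the thesis's method.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def select_by_keywords(text, keywords, max_sentences=3):
    """Hypothetical selector 1: keep the sentences with the most keyword hits.

    The kept sentences are returned in their original order so the shortened
    document still reads naturally.
    """
    sentences = split_sentences(text)
    scored = []
    for idx, sent in enumerate(sentences):
        tokens = re.findall(r"[a-z']+", sent.lower())
        hits = sum(1 for tok in tokens if tok in keywords)
        scored.append((hits, idx))
    # Highest hit count first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:max_sentences]
    keep = sorted(idx for _, idx in top)
    return [sentences[i] for i in keep]


def select_leading(text, max_sentences=3):
    """Hypothetical selector 2: keep only the first few sentences."""
    return split_sentences(text)[:max_sentences]


def augment(text, label, keywords, max_sentences=3):
    """Build one shortened training sample per selector from a single document."""
    by_keywords = " ".join(select_by_keywords(text, keywords, max_sentences))
    leading = " ".join(select_leading(text, max_sentences))
    return [(by_keywords, label), (leading, label)]


if __name__ == "__main__":
    doc = ("I love parties. The weather is cold. "
           "My friends and I love to party together! It rained.")
    trait_words = {"love", "friends", "party", "parties"}  # assumed keyword list
    for sample, label in augment(doc, 1, trait_words, max_sentences=2):
        print(label, sample)
```

Each selector caps the input length, which matters for models that cannot retain a long context, and applying both to the same document doubles the number of labeled samples.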
Chapter 1 Introduction
1.1 Background
1.2 Research Motivations and Objectives
Chapter 2 Literature Review
2.1 Overview of Human Personality
2.2 Measuring Human Personality by Questionnaire
2.3 Measuring Human Personality by UGC
2.3.1 Leveraging the User Generated Content
2.3.2 Open Dataset
2.3.3 Overview of Personality Prediction Methods
2.3.4 Previous Research Using Deep Learning
Chapter 3 Methodology
3.1 Overall Process of Our Proposed Method
3.2 Text Preprocessing
3.2.1 Data Cleaning and Sentence Splitting
3.2.2 Sentence Selection
3.2.3 Padding
3.3 Mairesse’s Features
3.4 Fine-tune Language Model
3.5 Train Personality Classifiers
3.5.1 Classifier Model Structure
3.5.2 Language Model to Document Representation
3.5.3 XGBoost Model
3.5.4 Using Data Augmentation
Chapter 4 Empirical Evaluations
4.1 Empirical Setup
4.1.1 Dataset
4.1.2 Mairesse’s Features
4.1.3 Sentence Selection
4.1.4 Language Model
4.1.5 Feature Concatenation and Classification
4.1.6 Benchmarks
4.1.7 Variants of Our Method
4.1.8 Evaluation Criterion and Procedure
4.2 Evaluation Results
4.3 Other Experimental Results
4.3.1 Effects of Fine-tuning Language Model
4.3.2 Effects of Document Length
4.3.3 Effects of Sentence Selection Schemes
4.3.4 How to Merge Two Sentence Selectors
4.3.5 Effects of Mairesse’s Features
Chapter 5 Conclusions and Future Works
5.1 Contributions
5.2 Future Works
References
Appendix
A Top 1% Chi-square Words for Each Factor
A.1 Extraversion
A.2 Neuroticism
A.3 Agreeableness
A.4 Conscientiousness
A.5 Openness