National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

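Author: Tsai, Yun-Sheng (蔡昀陞)
Thesis title: Suicide Risk Assessment using Word-Level Model with Dictionary-Based Risky Posts Selection (使用字典選擇的高風險文章的詞級模型進行自殺風險評估)
Advisor: Chen, Arbee L.P. (陳良弼)
Committee members: Fan, Yao-Chung (范耀中); Hsu, Jia-Lien (徐嘉連)
Oral defense date: 2022-07-28
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Information Systems and Applications
Discipline: Computer Science
Field: Systems Design
Thesis type: Academic thesis
Publication year: 2022
Graduation academic year: 110 (ROC calendar)
Language: English
Pages: 34
Keywords (translated from Chinese): deep learning; suicide detection; natural language processing; social media
Keywords (English): Deep Learning; Suicide Detection; NLP; Social Media; Reddit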
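Abstract (translated from Chinese):
Suicide is a serious issue worldwide. According to the US CDC, about 12 million people in the United States have had suicidal thoughts, 3.2 million have made a suicide plan, and 1.2 million have attempted suicide. With the rapid growth of information technology, and because social media offers relatively good anonymity, many people are willing to record and share on social platforms the thoughts they rarely disclose to others, so data from social forums is widely used in research on suicide risk. However, not every social media post carries a suicide signal, and even a user at high risk of suicide may have only a few posts that clearly describe suicidal thoughts. Previous studies used a post-level attention mechanism to focus on high-risk posts, but this does not always work. The problem is more severe with feature-extracted post vectors, because the posts are converted to vectors before the model training stage and the word-level information has already been lost.
In this thesis, we first use a dictionary of suicide-related words to select high-risk posts that the post-level attention mechanism may overlook, and then apply a word-level model to the selected posts to recover the information lost in the feature-extracted post vectors. We also show that the FScore adopted in previous studies can be simplified to a function of accuracy, so it cannot truly reflect model performance on imbalanced data. In this study we additionally adopt the macro F1 score as an evaluation metric. Experimental results show that our method outperforms previous studies: adding the word-level model not only yields a better FScore than prior work but also raises the macro F1 score by nearly 4 percentage points, to 42.3%.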
Suicide is a serious issue around the world. According to the US Centers for Disease Control and Prevention, an estimated 12.2 million adults in the United States seriously thought about suicide, 3.2 million made a plan, and 1.2 million attempted suicide. With the rapid development of information technology and the relative anonymity of social media, more and more people use social media to share their inner feelings, which has made social media data widely used in research on suicide risk assessment. However, not all social media posts are suicide-related. Even a person at high risk of suicide may have only a few posts that contain risk signals. Previous research addressed this problem with a post-level attention mechanism, but such a mechanism may fail to identify the relevant posts. The problem is more serious with feature-based post embeddings, because posts are converted to embeddings before the model is trained and the word-level information has already been lost.
In this thesis, we use a suicide keyword dictionary to select risky posts that the post-level attention mechanism may miss, and then build a word-level model over those posts to recover the information lost in the feature-based post embeddings. We also demonstrate that the FScore used in previous studies can be reduced to a function of accuracy, so it does not reflect model performance on imbalanced datasets. We therefore additionally adopt the macro F1 score as an evaluation metric. Experimental results show that our model not only outperforms previous studies on FScore but also achieves a macro F1 score of 42.3%, an improvement of nearly 4 percentage points over previous studies.
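The two ideas in the abstract, dictionary-based selection of risky posts and the gap between accuracy and macro F1 on imbalanced labels, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the keyword list, the function names `select_risky_posts`, `accuracy`, and `macro_f1`, and the toy posts and labels are hypothetical and are not taken from the thesis. The FScore definition used in earlier work is not given in this record, so the sketch compares plain accuracy with macro F1 instead.

```python
import re

# Hypothetical keyword list, for illustration only. The thesis's actual
# suicide keyword dictionary is not reproduced in this record.
SUICIDE_KEYWORDS = ("suicide", "kill myself", "end my life")


def select_risky_posts(posts, keywords=SUICIDE_KEYWORDS):
    """Return the posts that mention at least one dictionary keyword."""
    patterns = [
        re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        for keyword in keywords
    ]
    return [post for post in posts if any(p.search(post) for p in patterns)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, so rare classes count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Toy data (made up): one user's posts, and an imbalanced label set where a
# classifier always predicts the majority class.
posts = [
    "I want to end my life",
    "Went hiking with friends today",
    "Thinking about SUICIDE again",
]
risky = select_risky_posts(posts)

y_true = ["low"] * 8 + ["high"] * 2
y_pred = ["low"] * 10

print(risky)
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

On the toy labels, the majority-class predictor is correct for 8 of 10 users (accuracy 0.8) yet scores an F1 of zero on the minority class, which pulls macro F1 down to about 0.44. This is the kind of gap that an accuracy-like metric hides on imbalanced data.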
Abstract (Chinese)
Abstract
Acknowledgement
Table of Contents
List of Figures
List of Tables
1. Introduction
2. Related Work
2.1 Suicide Risk Detection on Social Media
2.2 NLP for Suicide Risk Severity
3. Dataset
4. Method
4.1 Post-Level Model
4.2 Word-Level Model
4.2.1 Post Selection Layer
4.2.2 Word Embeddings
4.2.3 Self-Attention Layer
4.3 The PLWL Model
5. Experiments
5.1 Experimental Setup
5.1.1 Experiment Settings
5.1.2 Evaluation Metrics
5.1.3 Preprocessing
5.2 Results and Analysis
5.3 Ablation Studies
5.3.1 PLWL with One-Stage Training
5.3.2 Different Oversampling Proportions
5.3.3 Filtering out Users without Risky Posts
5.3.4 Different Embeddings
5.4 Case Studies
6. Conclusion and Future Work
Reference
Electronic full text (public online release date: 2027-08-29)