National Digital Library of Theses and Dissertations in Taiwan

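Author: 李奇
Author (English): Chi Lee
Thesis title: 網路論壇持續性之研究:使用文本內容分析法 (A Study of the Sustainability of Online Forums Using Text Content Analysis)
Thesis title (English): Examining the Sustainability of Online Communities: A Content Analysis Approach
Advisor: 楊錦生
Advisor (English): Chin-Sheng Yang
Committee members: 顧宜錚, 林志麟
Committee members (English): Yi-Cheng Ku, Jun-Lin Lin
Oral defense date: 2018-06-21
Degree: Master's
Institution: Yuan Ze University
Department: Department of Information Management
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Publication year: 2018
Graduation academic year: 106 (ROC calendar)
Language: Chinese
Number of pages: 113
Keywords (Chinese): 持續性 (sustainability); 網路論壇 (online forums); 文本內容分析 (text content analysis); 機器學習 (machine learning)
Keywords (English): Sustainability; Online Forums; Content Analysis; Machine Learning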
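Chinese abstract (translated): In recent years the Internet has continued to develop, and people increasingly rely on it to stay in touch with one another. Online forums of all kinds have multiplied quickly: cars, electronic products, or indeed any single topic can draw a group of people into a virtual community. Among such varied forums, some see increasingly lively discussion while others grow quiet, so the sustainability of forums is uneven, and understanding it has become an important issue. What factors allow a forum to keep running in the long term? Do positive remarks in discussion threads drive the liveliness of the whole board, or is it the other way around? To answer these questions, this study uses content analysis to explore possible causes, analyzing text with six methods: readability, vocabulary richness, vocabulary uniqueness, and sentiment analysis as traditional text analysis methods, and emoticons/emoji and slang from Internet language as novel text analysis methods, to assess the sustainability of online forums. To verify whether these analyses have an important effect on forum sustainability, the variables are grouped into traditional features, emoticon features, and slang features as feature sets. Correlation coefficients and machine learning algorithms are then used to compare whether these text feature indicators have predictive power for the target variables, namely the number of replies in a thread and the duration of a thread.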
As the Internet grows, communication between people over the Internet has become increasingly common, and the platforms that host these interactions are called online communities, also known as forums. Forums are virtual communities that let participants discuss not only a variety of problems but also the latest topics of the moment. Forums with more discussion are therefore considered more popular among participants, and according to prior research, longer discussion threads are viewed as a sign of a forum's sustainability. Because activity differs so much from forum to forum, their sustainability is uneven. To address this problem, this research applies six content analysis methods (i.e., readability, vocabulary richness, vocabulary uniqueness, sentiment analysis, emoticon/emoji, and slang) to predict the sustainability of Reddit, one of the biggest forums in Western countries, and discusses whether emoticons, emojis, and Internet slang have a strong impact on sustainability. The study focuses on the content of discussions and uses three categories of features (i.e., Traditional, Emoticons/Emojis, Slang) as independent variables to predict two dependent variables (i.e., number of comments and discussion thread duration) using linear regression and novel machine learning algorithms, with the aim of improving the accuracy of forum sustainability prediction.
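To make the kind of content features named in the abstract more concrete, here is a minimal Python sketch that computes a readability score, a vocabulary-richness score, and an emoticon count for one piece of text. It is an illustration only, under several assumptions: the record does not state which formulas the thesis uses, so the Automated Readability Index stands in for readability and the type-token ratio stands in for vocabulary richness, and the `EMOTICONS` set and the `text_features` function are hypothetical names invented for this example.

```python
import re

# Hypothetical mini-lexicon for illustration; the thesis's actual
# emoticon/emoji list is not given in this record.
EMOTICONS = {":)", ":-)", ":(", ":-(", ":D", ";)"}

# A word is a run of letters/digits, optionally joined by apostrophes ("don't").
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def text_features(text: str) -> dict:
    """Return simple content features for one post or thread."""
    tokens = text.split()
    emoticon_count = sum(1 for t in tokens if t in EMOTICONS)

    # Drop emoticon tokens so that ":D" is not counted as the word "D".
    plain = " ".join(t for t in tokens if t not in EMOTICONS)
    words = WORD_RE.findall(plain)

    # A sentence is a chunk between ., ! or ? that contains a letter or digit.
    sentences = [s for s in re.split(r"[.!?]+", plain)
                 if re.search(r"[A-Za-z0-9]", s)]

    if not words or not sentences:
        return {"words": len(words), "sentences": len(sentences),
                "ari": 0.0, "ttr": 0.0, "emoticons": emoticon_count}

    chars = sum(len(w) for w in words)
    # Automated Readability Index (assumed here as the readability measure).
    ari = 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43
    # Type-token ratio (assumed here as the vocabulary-richness measure).
    ttr = len({w.lower() for w in words}) / len(words)

    return {"words": len(words), "sentences": len(sentences),
            "ari": ari, "ttr": ttr, "emoticons": emoticon_count}


if __name__ == "__main__":
    print(text_features("I love this forum. It is great! :)"))
```

Features like these, computed per thread, could then serve as the independent variables in a regression against the number of comments or the thread duration, which is the general setup the abstract describes.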
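Title page
Thesis oral defense committee approval form
Authorization form
Chinese abstract
English abstract
Acknowledgments
Table of contents
List of tables
List of figures
Chapter 1 Introduction
1.1 Research background
1.2 Research motivation
1.3 Research objectives
1.4 Research structure
Chapter 2 Literature review
2.1 Sustainability of online communities
2.2 Text features applied to online communities
2.2.1 Readability analysis
2.2.2 Vocabulary richness
2.2.3 Vocabulary uniqueness
2.2.4 Sentiment analysis
Chapter 3 Research method and framework
3.1 Introduction to Reddit
3.2 Data preprocessing
3.3 Feature selection
3.3.1 Text features
3.3.2 Forum sustainability variables
3.4 Model construction
3.4.1 Model overview
Chapter 4 Experimental results and evaluation
4.1 Original dataset
4.2 Experimental procedure and design
4.3 Evaluation metrics
4.4 Experimental results
4.4.1 Correlation coefficient results
4.4.2 Predicting the number of replies in a thread
4.4.3 Predicting thread duration
Chapter 5 Conclusion and future work
5.1 Conclusion
5.2 Research limitations and future work
References
Appendix A Descriptive statistics of the discussion boards
Appendix B Algorithm results for predicting the number of replies in a thread
Appendix C Model comparison results for predicting discussion boards
Appendix D Algorithm results for predicting thread duration
Appendix E Model comparison results for thread duration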
Electronic full text: access is restricted to the on-campus systems and IP range of the author's university.