Graduate student: Chen, Shih-He (陳世和)
Thesis title: Publishing-time-based Topic Derivation in Twitter
Thesis title (Chinese): 基於發布時間之推特主題推論方法
Advisor: Wang, Kuo-Chen (王國禎)
Oral examination committee: Wang, Kuo-Chen (王國禎); Huang, Ching-Yao (黃經堯); Fang, Ki-Ten (方凱田)
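Oral examination date: 2017-09-21
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: College of Computer Science, Computer Science Program (資訊學院資訊學程)
Field of study: Computer science
Subfield: General computer science
Thesis type: Academic thesis
Year of publication: 2017
Academic year of graduation: 106 (ROC calendar, i.e., 2017–2018)
Language: English
Number of pages: 28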
Keywords: topic derivation; publishing time; Twitter; NMF (non-negative matrix factorization)
Abstract:
Since social media such as Twitter and Facebook emerged, they have become an essential part of daily life for most people. Social media have thrived further with the development of mobile communications and mobile phones, and browsing them has become an important way to make daily life more convenient, for example through carpooling, price comparison, group buying, and product recommendation. Data scientists try to capture trends by deriving topics from social media. However, previous research that derives topics from the text and interaction features of Twitter has difficulty achieving high purity. Even though the related work tNMijF takes a temporal factor into consideration, its clustering accuracy is still biased by the proportion of the mention feature among tweet interactions. We propose the Publishing-time-based Topic Derivation in Twitter (PTD) approach, which builds on intJNMF and adds a temporal factor based on publishing time. PTD uses Twitter interactions and Twitter content to identify topics, and it uses publishing time to reflect the freshness of content in order to achieve high purity. Compared with intJNMF, a representative related work, PTD improves purity by 19.01% to 23.13% and F-measure by 31.76% to 32.66% for 20 to 100 topics. Compared with tNMijF, PTD improves purity by 7.46% to 10.32% and F-measure by 10.16% to 12.17% for 20 to 100 topics. The overhead of PTD is a slight increase in running time, caused by computing the time difference between publishing time and current time. The performance results show that the purity and F-measure of PTD can be further improved by strengthening the interaction features between tweets with the time difference between current time and publishing time.
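The abstract does not give PTD's exact formulas, so the following is only a rough, hypothetical sketch of the general idea: weight tweet interactions by freshness, factorize the result with NMF, and score the clusters by purity. Several assumptions are made here and are not taken from the thesis. Freshness is modeled as exponential decay `exp(-decay * age)` with a hypothetical `decay` parameter. Each interaction is "strengthened" by the hypothetical rule of multiplying it by one plus the mean freshness of the two tweets. Clustering uses plain Lee–Seung multiplicative-update NMF on the interaction matrix alone. That last point is a simplification, since intJNMF jointly factorizes content and interaction matrices and this sketch does not reproduce that. Purity follows the standard textbook definition from Manning et al., *Introduction to Information Retrieval*.

```python
import math
import random
from collections import Counter


def time_weights(publish_hours, now_hours, decay=0.1):
    """Hypothetical freshness weight per tweet: exp(-decay * age_in_hours)."""
    return [math.exp(-decay * (now_hours - t)) for t in publish_hours]


def strengthen(interactions, weights):
    """Assumed rule: scale each interaction by 1 + mean freshness of its two tweets."""
    n = len(interactions)
    return [[interactions[i][j] * (1.0 + (weights[i] + weights[j]) / 2.0)
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2 for i in range(len(x)) for j in range(len(x[0])))


def nmf(x, k, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates minimizing ||X - WH||_F^2."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h


def assign_clusters(w):
    """Put each tweet in the latent topic with the largest weight in its row of W."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


def purity(clusters, labels):
    """Purity = (1/N) * sum over clusters of the count of its most common true label."""
    members = {}
    for c, y in zip(clusters, labels):
        members.setdefault(c, []).append(y)
    hits = sum(Counter(ys).most_common(1)[0][1] for ys in members.values())
    return hits / len(labels)


if __name__ == "__main__":
    # Toy data: tweets 0-2 interact with each other, and so do tweets 3-5.
    # Self-interactions are set to 1 so that each block is dense.
    a = [[1, 1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0, 0],
         [0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1]]
    published = [0, 1, 2, 20, 21, 22]  # publishing times in hours (toy values)
    x = strengthen(a, time_weights(published, now_hours=24))
    w, _ = nmf(x, k=2)
    clusters = assign_clusters(w)
    print(clusters, purity(clusters, ["A", "A", "A", "B", "B", "B"]))
```

Only `time_weights` and `strengthen` encode the temporal assumption. To try a different freshness curve or strengthening rule, change those two functions and leave the factorization and evaluation code as they are.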
Table of contents:
Abstract (in Chinese)
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Motivation
  1.2 Problem statement
  1.3 Contribution
  1.4 Thesis outline
Chapter 2 Related Work
  2.1 Classification
  2.2 Comparison
Chapter 3 Publishing-time-based Topic Derivation in Twitter
  3.1 Measuring relationships between tweets
  3.2 Incorporating the temporal factor
  3.3 Clustering tweets
  3.4 Capturing correlation between terms
  3.5 Inferring representative words
Chapter 4 Evaluation
  4.1 Experiment setup
  4.2 Evaluation metrics
  4.3 Experiment results
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future work
Bibliography