National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

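Graduate student: 羅聖崴 (Sheng-Wei Luo)
Thesis title: 利用機器學習與特徵選取技術之影片流行度影響的關鍵因素探討
Thesis title (English): Utilizing machine learning and feature selection techniques to determine key factors for the video popularity
Advisors: 蔡垂雄 (Chwei-Shyong Tsai), 林冠成 (Kuan-Cheng Lin)
Oral defense committee member: 吳憲珠
Oral defense date: 2019-07-03
Degree: Master's
Institution: 國立中興大學 (National Chung Hsing University)
Department: 資訊管理學系所 (Information Management)
Discipline: Computer science
Field: General computer science
Thesis type: Academic thesis
Year of publication: 2019
Graduation academic year: 107 (ROC calendar)
Language: Chinese
Number of pages: 55
Keywords (Chinese): 用戶生成內容; 機器學習; 資料分析; 流行度預測; 行為分析
Keywords (English): User Generated Content; Machine Learning; Data Analysis; Popularity Prediction; Behavior Analysis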
In recent years, online platforms built on user-generated content (UGC), such as YouTube, Facebook, Twitter, and Instagram, have grown dramatically compared with a decade ago. Users of these platforms can share many kinds of content with other users at any time. Today's UGC differs from earlier Video on Demand (VoD) services (Cha, Kwak, Rodriguez, Ahn, & Moon, 2009) in three ways:
1. Platforms provide diverse social functions, such as subscriptions, likes/dislikes, comment sections, and community areas.
2. Platforms have very large user bases that generate heavy traffic. For example, YouTube has more than one billion users, who watch more than one billion hours of video every day (YouTube 新聞中心, 2019).
3. Content is highly varied. For example, YouTube hosts categories such as gaming, food, news, sports, music, shows, and travel and events.
YouTube is the most representative UGC platform, and business models built around it are developing rapidly. Competing platforms, traditional media, and academic researchers have all begun to study the key factors behind the success of UGC platforms, and a large share of those factors relates to digital marketing. Because UGC is a form of user behavior that spreads through communities and media channels on open platforms, earlier analyses of it were limited: both data collection and analysis methods were incomplete, which showed up as low accuracy in video popularity prediction and a lack of data features. This study takes YouTube as its subject and proposes a more suitable method and workflow for defining the features of UGC data. It quantifies video features, applies data analysis techniques, and uses machine learning to build a classification model that predicts video popularity. Finally, it applies more advanced machine learning techniques to optimize the features and the model, identifies the best feature subset, and draws on the concept of statistical process control (SPC) to make the model more interpretable. It also gives users guidance on how to apply the model. The model is expected to help the platform and its creators increase the advertising value of their videos.
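The abstract mentions pairing the best feature subset with the concept of statistical process control to make the model easier to interpret. This record does not say how that is done, so the Python sketch below is only one possible reading, under stated assumptions: it computes Shewhart-style control limits (mean plus or minus three sample standard deviations) for a single feature among videos labelled popular, then flags new videos whose value falls outside those limits. The feature (title length), the numbers, and the function names are hypothetical and are not taken from the thesis.

```python
from statistics import mean, stdev


def control_limits(values, k=3.0):
    """Return (lower, center, upper): the mean and mean +/- k sample standard deviations."""
    center = mean(values)
    spread = stdev(values)
    return center - k * spread, center, center + k * spread


def out_of_control(values, limits):
    """Return the indices of values that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical data: title lengths (in characters) of videos labelled popular.
popular_title_lengths = [28, 31, 30, 29, 33, 27, 32, 30]
limits = control_limits(popular_title_lengths)

# Hypothetical new videos to compare against the popular-video limits.
new_title_lengths = [30, 55, 29]
flagged = out_of_control(new_title_lengths, limits)

print("limits (lower, center, upper):", limits)
print("indices outside the limits:", flagged)
```

Read this way, a creator could see which features of a planned video sit outside the range typical of popular videos, which is easier to act on than a bare class label.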
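Abstract (Chinese)
Abstract (English)
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 User-Generated Content and Popularity
2.1.1 User-Generated Content
2.1.2 Founding Background and Brief History of YouTube
2.1.3 YouTube's Business Model
2.1.4 Definition of Popularity
2.1.5 Relationship Between View Count and Subscriber Count
2.1.6 Common Problems in Running a YouTube Channel
2.1.7 Principles for Running a YouTube Channel
2.1.8 Research Areas Related to User-Generated Content: The Case of YouTube
2.2 Related Work on Video Popularity Prediction
2.2.1 Studies of Video Popularity Features
2.2.2 Studies of Popularity Prediction with Machine Learning
2.3 Machine Learning
2.3.1 Support Vector Machine
2.3.2 Decision Tree
2.3.3 Random Forest
2.3.4 Logistic Regression
2.3.5 Linear Regression
2.3.6 Feature Selection Methods
2.4 Evaluation Methods and Metrics for Machine Learning Models
2.4.1 Holdout Validation
2.4.2 K-Fold Cross-Validation
2.4.3 Evaluation Metrics for Classification Models
2.4.4 Evaluation Metrics for Regression Models
2.5 Statistical Process Control
2.6 Digital Marketing
Chapter 3 Research Method
3.1 Research Process
3.2 YouTuber Video Dataset
3.3 Interviews with Creators
3.4 Statistical Analysis of the Data
3.5 Data Preprocessing
3.5.1 Missing Value Handling
3.5.2 Feature Selection Methods
3.6 Popularity Prediction with Machine Learning Algorithms
3.6.1 Support Vector Machine
3.6.2 Other Classification Algorithms
3.6.3 Linear Regression
Chapter 4 Experimental Results and Discussion
4.1 Experimental Environment
4.2 Preprocessing of the Research Data
4.3 Descriptive Statistics of the Research Data
4.4 Regression Correlation Analysis
4.5 Classification Experiment Results
4.5.1 Classification Experiment 1
4.5.2 Classification Experiment 2
4.5.3 Classification Experiment 3
4.5.4 Classification Experiment 4
4.5.5 Best Feature Subset from Feature Selection
4.5.6 Analysis of the Best Feature Subset
Chapter 5 Conclusions and Recommendations
5.1 Conclusions
5.2 Research Limitations and Suggestions for Future Research
References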
Chinese-language sources
李航(2012)。統計學習方法。北京:清華大學出版社。
張翔、廖崇智(2016)。提綱挈領學統計。臺北市:鼎茂圖書出版股份有限公司。
劉文良(2018)。網路網路行銷(第六版)。臺北市:碁峰資訊股份有限公司。
劉立民、吳建華(譯)(2016)。Python機器學習(原作者:S. Raschka)。新北市:博碩文化股份有限公司。
陳瑞陽(2006)。網路行銷。臺北市:學貫行銷股份有限公司。
Online resources
何宛芳(2009)。數位行銷 就是現在。取自: https://www.bnext.com.tw/article/12152/BN-ARTICLE-12152
林宗勳(2000)。Support Vector Machines簡介。取自:http://www.cmlab.csie.ntu.edu.tw/~cyy/learning/tutorials/SVM2.pdf
阿福筆記(2018)。如何成為月入10萬的YouTube網紅。取自:https://affnotes.com/youtube-make-money/
陳加忠(2012)。決定係數R2之判斷標準。取自:http://amebse.nchu.edu.tw/new_page_535.htm/
黃衍明(2002)。基因演算法之基本概念、方法與國內相關研究概況。取自:http://myweb.ncku.edu.tw/~ftlin/course/CAAD/CourseInformation/document/genetictaiwan.pdf
葉名山、吳承瑋(2018)。以羅吉斯回歸模式分析影響汽車道路駕駛考驗通過因素及汽車道路駕駛考驗實施現況探討。取自:http://ts.cpu.edu.tw/bin/downloadfile.php?file=WVhSMFlXTm9MekUwTDNCMFlWOHhOVFUwTTE4eE9ERXlNVFUzWHpNNE1USXhMbkJrWmc9PQ==&fname=A5RPVXUXYXUTNPOPKLIHSXRP45YXOPQPHD01SX4545QPFH45HDRPSXCHA5VT01FHML01EDOP35YXGHLO45A531VXKLMLNLRPGHA5HDCHHDRP50SXED11UWOP45VT11DDA5EDCDVXHDZX5055STML40B5MLVTCDRPSTRPYXB5UXSTCDDD51RPUWEHKLMLCDSXML01VTCHKLOP5040RLMLQPA5HDZX11RPA5VTWXPK45QPDG4551JHKPVXSTOPWX4515EDHD45SXUTZXEHCDSTQPQP35VTVTRPRLNLOPUTA5UTUXEHCDQP50EHUTVTGHCDHDEDWT55UXQP14QP45A531VX14OP01SXIHJHEHCDHDRPGHVT51FDEDRPIHVT11VT45A5HDPK15QPCDOLSTIH31B535VTCD40RLNLZXQP45YXJH55CDQP31EHB1ML0150HDED45CD15UTCDLOGHIHUWQPB1YX11QP35EDGH45KLQP45VT5101ED45IHVTTWRP45A5HDPKHDMLGDA5140140B551ZXVTOPHD01JD45SXRP45QL51RP31RPCDOPKLOPHDJHEDCHJDA5GHQLRLMLKOVX
維基百科(2019a)。用戶生成內容。取自:https://zh.wikipedia.org/wiki/%E7%94%A8%E6%88%B7%E7%94%9F%E6%88%90%E5%86%85%E5%AE%B9
維基百科(2019b)。YouTube。取自:https://zh.wikipedia.org/wiki/YouTube
維基百科(2019c)。機器學習。取自:https://zh.wikipedia.org/wiki/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0
維基百科(2019d)。支援向量機。取自:https://zh.wikipedia.org/wiki/%E6%9C%80%E8%BF%91%E9%84%B0%E5%B1%85%E6%B3%95
維基百科(2019e)。決策樹。取自:https://zh.wikipedia.org/wiki/%E5%86%B3%E7%AD%96%E6%A0%91
維基百科(2019f)。隨機森林。取自:https://zh.wikipedia.org/wiki/%E9%9A%8F%E6%9C%BA%E6%A3%AE%E6%9E%97
維基百科(2019g)。線性回歸。取自:https://zh.wikipedia.org/wiki/%E7%B7%9A%E6%80%A7%E5%9B%9E%E6%AD%B8
維基百科(2019h)。特徵選擇。取自:https://zh.wikipedia.org/wiki/%E7%89%B9%E5%BE%81%E9%80%89%E6%8B%A9#cite_note-guyon-intro-3
維基百科(2019i)。遺傳演算法。取自:https://zh.wikipedia.org/wiki/%E9%81%97%E4%BC%A0%E7%AE%97%E6%B3%95
維基百科(2019j)。交叉驗證。取自:https://zh.wikipedia.org/wiki/%E4%BA%A4%E5%8F%89%E9%A9%97%E8%AD%89#cite_note-4
維基百科(2019k)。混淆矩陣。取自:https://zh.wikipedia.org/wiki/%E6%B7%B7%E6%B7%86%E7%9F%A9%E9%98%B5
維基百科(2019l)。皮爾遜積差相關係數。取自:https://zh.wikipedia.org/wiki/%E7%9A%AE%E5%B0%94%E9%80%8A%E7%A7%AF%E7%9F%A9%E7%9B%B8%E5%85%B3%E7%B3%BB%E6%95%B0
維基百科(2019m)。均方根誤差。取自:https://zh.wikipedia.org/wiki/%E5%9D%87%E6%96%B9%E6%A0%B9%E8%AF%AF%E5%B7%AE
維基百科(2019n)。管制圖。取自:https://zh.wikipedia.org/wiki/%E7%AE%A1%E5%88%B6%E5%9C%96
維基百科(2019o)。數位行銷。取自:https://zh.wikipedia.org/wiki/%E6%95%B8%E4%BD%8D%E8%A1%8C%E9%8A%B7
簡書(2018)。RF和Feature Importance函數。取自:https://www.jianshu.com/p/d289697b0436
Amazon Machine Learning (2019)。將資料分割為訓練和評估資料。取自:https://docs.aws.amazon.com/zh_tw/machine-learning/latest/dg/splitting-the-data-into-training-and-evaluation-data.html
Big Data in Finance (2017)。機器學習經典算法優缺點總結。取自:https://bigdatafinance.tw/index.php/392-2017-06-01-13-30-40
CSDN (2015)。機器學習算法---隨機森林實現(包括回歸和分類)。取自:https://blog.csdn.net/jiede1/article/details/78245597
d0evi1 (2015)。sklearn中的特徵提取。取自:http://d0evi1.com/sklearn/feature_selection/
MBAlib (2018)。統計製程控制。取自:https://wiki.mbalib.com/zh-tw/SPC
Medium (2018a)。交叉驗證(Cross-validation, CV)。取自:https://medium.com/@chih.sheng.huang821/%E4%BA%A4%E5%8F%89%E9%A9%97%E8%AD%89-cross-validation-cv-3b2c714b18db
Medium (2018b)。Confusion Matrix(混淆矩陣)。取自:https://medium.com/@c824751/confusion-matrix-%E6%B7%B7%E6%B7%86%E7%9F%A9%E9%99%A3-f6ddf6e6aa58
Medium (2018c)。機器學習統計方法:模型評估-驗證指標(validation index)。取自:https://medium.com/@chih.sheng.huang821/%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92-%E7%B5%B1%E8%A8%88%E6%96%B9%E6%B3%95-%E6%A8%A1%E5%9E%8B%E8%A9%95%E4%BC%B0-%E9%A9%97%E8%AD%89%E6%8C%87%E6%A8%99-b03825ff0814
Medium (2017d)。資料前處理(Missing data, One-hot encoding, Feature Scaling)。取自:https://medium.com/jameslearningnote/%E8%B3%87%E6%96%99%E5%88%86%E6%9E%90-%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92-%E7%AC%AC2-4%E8%AC%9B-%E8%B3%87%E6%96%99%E5%89%8D%E8%99%95%E7%90%86-missing-data-one-hot-encoding-feature-scaling-3b70a7839b4a
PicSee 短網址 (2018)。【YouTube大數據】訂閱多真的觀看就多嗎?。取自:https://medium.com/picsee-official/youtube%E5%A4%A7%E6%95%B8%E6%93%9A-%E8%A8%82%E9%96%B1%E4%BA%BA%E6%95%B8%E8%B7%9F%E5%BD%B1%E7%89%87%E8%A7%80%E7%9C%8B%E6%95%B8%E5%88%B0%E5%BA%95%E6%9C%89%E6%B2%92%E6%9C%89%E9%97%9C%E4%BF%82-a89ebad990cc
YouTube 新聞中心(2019)。YouTube 統計數據。取自:https://www.youtube.com/yt/about/press/
YouTube說明中心(2019)。問題搜尋。取自:https://support.google.com/youtube/?hl=zh-Hant#topic=9257498
English-language sources
Borghol, Y., Mitra, S., Ardon, S., Carlsson, N., Eager, D., & Mahanti, A. (2011). Characterizing and modelling popularity of user-generated videos. Performance Evaluation, 68(11), 1037-1055.
Bandari, R., Asur, S., & Huberman, B. A. (2012). The pulse of news in social media: Forecasting popularity. ICWSM, 12, 26-33.
Ben-Hur, A., Horn, D., Siegelmann, H. T., & Vapnik, V. (2001). Support vector clustering. Journal of machine learning research, 2(Dec), 125-137.
Bermingham, M. L., Pong-Wong, R., Spiliopoulou, A., Hayward, C., Rudan, I., Campbell, H., ... & Haley, C. S. (2015). Application of high-dimensional feature selection: evaluation for genomic prediction in man. Scientific reports, 5, 10312.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine learning, 20(3), 273-297.
Cha, M., Kwak, H., Rodriguez, P., Ahn, Y. Y., & Moon, S. (2007, October). I tube, you tube, everybody tubes: analyzing the world's largest user generated content video system. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement (pp. 1-14). ACM.
Cha, M., Kwak, H., Rodriguez, P., Ahn, Y. Y., & Moon, S. (2009). Analyzing the video popularity characteristics of large-scale user generated content systems. Ieee/Acm Transactions On Networking (Ton), 17(5), 1357-1370.
Ding, W., Shang, Y., Guo, L., Hu, X., Yan, R., & He, T. (2015, October). Video popularity prediction by sentiment propagation via implicit network. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (pp. 1621-1630). ACM.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern recognition letters, 27(8), 861-874.
Figueiredo, F., Almeida, J. M., Gonçalves, M. A., & Benevenuto, F. (2014). On the dynamics of social media popularity: A YouTube case study. ACM Transactions on Internet Technology (TOIT), 14(4), 24.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of machine learning research, 3(Mar), 1157-1182.
Hoiles, W., Aprem, A., & Krishnamurthy, V. (2017). Engagement and Popularity Dynamics of YouTube Videos and Sensitivity to Meta-Data. IEEE Transactions on Knowledge & Data Engineering, (7), 1426-1437.
Jia, A. L., Shen, S., Epema, D. H., & Iosup, A. (2016). When game becomes life: The creators and spectators of online game replays and live streaming. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 12(4), 47.
Jia, A. L., Shen, S., Li, D., & Chen, S. (2018). Predicting the implicit and the explicit video popularity in a User Generated Content site with enhanced social features. Computer Networks, 140, 112-125.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). New York: springer.
Kaytoue, M., Silva, A., Cerf, L., Meira Jr, W., & Raïssi, C. (2012, April). Watch me playing, i am a professional: a first study on video game live streaming. In Proceedings of the 21st International Conference on World Wide Web (pp. 1181-1188). ACM.
Khosla, A., Das Sarma, A., & Hamid, R. (2014, April). What makes an image popular?. In Proceedings of the 23rd international conference on World wide web (pp. 867-876). ACM.
Li, H., Ma, X., Wang, F., Liu, J., & Xu, K. (2013, October). On popularity prediction of videos shared in online social networks. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management (pp. 169-178). ACM.
Pires, K., & Simon, G. (2015, March). YouTube live and Twitch: a tour of user-generated live streaming systems. In Proceedings of the 6th ACM Multimedia Systems Conference (pp. 225-230). ACM.
Pinto, H., Almeida, J. M., & Gonçalves, M. A. (2013, February). Using early view patterns to predict the popularity of youtube videos. In Proceedings of the sixth ACM international conference on Web search and data mining (pp. 365-374). ACM.
Roy, S. D., Mei, T., Zeng, W., & Li, S. (2013). Towards cross-domain learning for social video popularity prediction. IEEE Transactions on multimedia, 15(6), 1255-1267.
Rizoiu, M. A., Xie, L., Sanner, S., Cebrian, M., Yu, H., & Van Hentenryck, P. (2017, April). Expecting to be HIP: Hawkes intensity processes for social media popularity. In Proceedings of the 26th International Conference on World Wide Web (pp. 735-744). International World Wide Web Conferences Steering Committee.
Szabo, G., & Huberman, B. A. (2008). Predicting the popularity of online content. Available at SSRN 1295610.
Trzciński, T., & Rokita, P. (2017). Predicting popularity of online videos using support vector regression. IEEE Transactions on Multimedia, 19(11), 2561-2570.
Vallet, D., Berkovsky, S., Ardon, S., Mahanti, A., & Kafaar, M. A. (2015, October). Characterizing and predicting viral-and-popular video content. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (pp. 1591-1600). ACM.
Xu, J., Van Der Schaar, M., Liu, J., & Li, H. (2015). Forecasting popularity of videos using social media. IEEE Journal of Selected Topics in Signal Processing, 9(2), 330-343.
Yu, H., Xie, L., & Sanner, S. (2014, November). Twitter-driven youtube views: Beyond individual influencers. In Proceedings of the 22nd ACM international conference on Multimedia (pp. 869-872). ACM.
Yamada, M., Jitkrittum, W., Sigal, L., Xing, E. P., & Sugiyama, M. (2014). High-dimensional feature selection by feature-wise kernelized lasso. Neural computation, 26(1), 185-207.
Zhu, C., Cheng, G., & Wang, K. (2017). Big data analytics for program popularity prediction in broadcast TV industries. IEEE Access, 5, 24593-24601.
Online resources
Alexa (2018). Top Sites in Taiwan. https://www.alexa.com/topsites/countries/TW
DATA VERSITY (2017). Machine Learning Algorithms: Introduction to Random Forests. https://www.dataversity.net/machine-learning-algorithms-introduction-random-forests/
HOOTSUITE (2019). 23 Smart Ways to Promote Your YouTube Channel. https://blog.hootsuite.com/how-to-promote-your-youtube-channel/
I failed the Turing Test (2017). Feature Engineering. https://vinta.ws/code/feature-engineering.html
Jekyll (2013). Differences between L1 and L2 as Loss Function and Regularization. http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/
scikit-learn (2019a). sklearn.svm.SVC. https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
scikit-learn (2019b). Decision Trees. https://scikit-learn.org/stable/modules/tree.html
scikit-learn (2019c). sklearn.ensemble.RandomForestClassifier. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
Search Engine Journal (2018). YouTube SEO from Basic to Advanced: How to Optimize Your Videos. https://www.searchenginejournal.com/youtube-seo-video-optimization/260757/