National Digital Library of Theses and Dissertations in Taiwan



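Graduate student:

曾威箖

Graduate student (English name):

Wei-Lin Tseng

Thesis title:

以使用者瀏覽行為的情境感知學習於網頁類別預測

Thesis title (English):

Learning User Browsing Behaviors for Context-Aware Web Page Category Prediction

Advisor:

陳信希

Oral examination committee:

鄭卜任, 盧文祥

Oral defense date:

2011-07-13

Degree:

Master's

Institution:

National Taiwan University

Department:

Graduate Institute of Computer Science and Information Engineering

Discipline:

Engineering

Field:

Electrical Engineering and Computer Science

Thesis type:

Academic thesis

Year of publication:

2011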

Language:

Chinese

Keywords (Chinese):

user intent, web page category prediction, access log, information retrieval, click behavior

Keywords (English):

User Intent, Web Page Category Prediction, User Browsing Log, User Click Behavior

Usage statistics:

Cited by: 0
Views: 391
Downloads: 0
Bookmarked: 2

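Abstract:

The Internet and web services are growing very rapidly. Most web page category prediction approaches rely on users' keyword queries on portals and search engines, together with their clicks on pages related to the query results, and mine the relationships between the two to predict user intent. Analyzing users' access data and outcomes on websites can help improve the accuracy of the results returned by search engines; improve search engine performance through web page caching and prefetching of clicked pages; support query-related web page recommendation and personalized website ranking; and serve product recommendation in commercial advertising as well as information filtering. Predicting user intent is therefore an important issue and a significant challenge.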

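Most studies analyze user intent and browsing behavior by observing users' query keywords and their clicks on the related result pages. This thesis instead observes users' web page access logs together with the categories of the pages they visit, and infers user intent by predicting the category of the page a user will click next. We implement two models for this prediction: a Top-Level Domain Model, which uses the top-level domain name of each URL, and a Hidden Markov Model.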
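As a minimal sketch of the idea behind a Top-Level Domain Model, assume it estimates a distribution over page categories for each top-level domain from labeled training URLs and predicts the most frequent category for a new URL. The thesis's actual features and training procedure are not described in this record; the `(url, category)` log format, the function names, the `"Unknown"` fallback, and the category labels below are all hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def top_level_domain(url: str) -> str:
    """Return the last label of the URL's hostname, e.g. 'tw' for www.ntu.edu.tw.

    Simplification: IP-address hosts are not special-cased.
    """
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def train_tld_model(log):
    """Estimate P(category | TLD) from an iterable of (url, category) pairs.

    The input format is a hypothetical stand-in for the thesis's browsing log.
    """
    counts = defaultdict(Counter)
    for url, category in log:
        counts[top_level_domain(url)][category] += 1
    model = {}
    for tld, cat_counts in counts.items():
        total = sum(cat_counts.values())
        model[tld] = {c: n / total for c, n in cat_counts.items()}
    return model


def predict_tld(model, url, default="Unknown"):
    """Predict the most probable category for a URL from its TLD alone."""
    dist = model.get(top_level_domain(url))
    if not dist:
        return default  # TLD never seen in training
    return max(dist, key=dist.get)


# Usage with made-up training data.
training_log = [
    ("http://www.ntu.edu.tw/", "Education"),
    ("http://www.mit.edu/", "Education"),
    ("http://shop.example.com/", "Shopping"),
    ("http://news.example.com/", "News"),
    ("http://www.example.com/cart", "Shopping"),
]
model = train_tld_model(training_log)
print(predict_tld(model, "http://www.harvard.edu/"))  # Education
```

Taking only the last hostname label keeps the sketch dependency-free; a real system would more likely consult the IANA Root Zone Database to validate TLDs and handle hosts given as IP addresses.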

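Building on these two models, we propose a Mixture Model that combines the Hidden Markov Model with the Top-Level Domain Model of the browsed URLs and is further optimized using domain associations. The experiments show that: (1) for specific top-level domains, information from the URL itself does improve the accuracy of web page category prediction; (2) categories predicted from context-aware information about the user's browsing behavior are more accurate; and (3) the more of a user's previous access records are observed, the higher the accuracy (comparing HMM 1-gram, HMM 2-gram, HMM 3-gram, and HMM 4-gram).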
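The interplay of a context-aware sequence model and a TLD model can be sketched as follows, under two loudly stated simplifications. First, the thesis's Hidden Markov Model is replaced by a visible-state Markov chain (an n-gram) over page categories, where `order`, the number of previous categories used as context, loosely corresponds to the HMM n-gram settings compared above. Second, the two models are combined by linear interpolation with a hypothetical weight `lam`; the thesis's actual mixture and its domain-association optimization may differ. Session data and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_category_ngram(sequences, order):
    """Count next-category frequencies given the previous `order` categories.

    `sequences` is a list of per-session category lists; `order` must be >= 1.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            context = tuple(seq[i - order:i])
            counts[context][seq[i]] += 1
    return counts


def ngram_distribution(counts, history, order):
    """P(next category | last `order` categories); empty dict if context unseen."""
    context = tuple(history[-order:])
    cat_counts = counts.get(context)
    if not cat_counts:
        return {}
    total = sum(cat_counts.values())
    return {c: n / total for c, n in cat_counts.items()}


def mixture_predict(seq_dist, tld_dist, lam=0.7):
    """Linearly interpolate the sequence-model and TLD distributions.

    `lam` is a hypothetical weight; in practice it would be tuned on held-out data.
    Returns None when neither model has any evidence.
    """
    categories = set(seq_dist) | set(tld_dist)
    if not categories:
        return None
    scores = {
        c: lam * seq_dist.get(c, 0.0) + (1 - lam) * tld_dist.get(c, 0.0)
        for c in categories
    }
    return max(scores, key=scores.get)


# Usage with made-up browsing sessions.
sessions = [
    ["News", "Sports", "News", "Sports"],
    ["News", "Sports", "Shopping"],
    ["Shopping", "News", "Sports"],
]
bigram = train_category_ngram(sessions, order=1)
seq_dist = ngram_distribution(bigram, ["Shopping", "News"], order=1)
tld_dist = {"Shopping": 0.6, "News": 0.4}  # e.g. a TLD model's output for the next URL
print(mixture_predict(seq_dist, tld_dist))  # Sports
```

When the browsing context has not been seen in training, `seq_dist` is empty and the prediction falls back entirely to the URL-based distribution, which is one plausible reason a combination can beat either model alone.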

English abstract:

Web activities and services are growing rapidly. In recent years, most approaches have predicted user intent from the relationship between query keywords and the result pages returned by a search engine or portal. Analyzing users' access data and activities on websites can help web service providers improve the accuracy of the result pages for query keywords; improve website performance by caching query result pages and prefetching web pages; improve web page recommendation and personalized web page ranking; improve product advertising; and support information filtering. Capturing the context of a user's previous browsing behavior to predict user intent is therefore an important issue and challenge.

Most studies predict user intent from users' query keywords and from the relationship between those keywords and the pages subsequently clicked in the result list. We implement two models: a Top-Level Domain (TLD) model trained on URL-based features, and a Hidden Markov Model (HMM) trained on context-aware category sequences derived from the URLs users browse. We then propose a mixture model that combines TLD and HMM to predict the category of the next page a user will access. Finally, we apply the proposed context-aware web page category prediction model to two filtering applications: objectionable web content filtering and web security threat prevention.
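To illustrate how a next-category prediction could feed a filtering application such as the two mentioned above, here is a hypothetical sketch that flags a session when the predicted probability mass on blocked categories reaches a threshold. The category names, the blocked set, and `threshold` are assumptions for illustration, not the thesis's design.

```python
from typing import Dict, Set, Tuple


def should_warn(
    next_category_dist: Dict[str, float],
    blocked_categories: Set[str],
    threshold: float = 0.5,  # hypothetical cut-off
) -> Tuple[bool, float]:
    """Return (flag, risk), where risk is the predicted probability that the
    user's next page falls into a blocked category."""
    risk = sum((p for c, p in next_category_dist.items() if c in blocked_categories), 0.0)
    return risk >= threshold, risk


# Example: a next-category distribution such as a mixture model might output.
dist = {"Adult": 0.35, "Gambling": 0.2, "News": 0.45}
flag, risk = should_warn(dist, {"Adult", "Gambling"})
print(flag)  # True
```

Summing probability mass over all blocked categories, rather than checking only the single most likely category, lets the filter react when risk is spread across several objectionable categories, none of which is individually the top prediction.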

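Table of contents:

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Thesis Organization
Chapter 2 Related Work
2.1 User Access Patterns
2.2 User Intent
Chapter 3 Web Page Category Prediction Models
3.1 User Access Trails
3.2 Categories of Browsed Web Pages
3.3 Top-Level Domain Model
3.3.1 Top-Level Domain Names
3.3.2 Training Phase of the Top-Level Domain Model
3.3.3 Testing Phase of the Top-Level Domain Model
3.4 Hidden Markov Model
3.4.1 Hidden Markov Model
3.4.2 Training Phase of the Hidden Markov Model
3.4.3 Testing Phase of the Hidden Markov Model
3.5 Mixture Model
3.5.1 Training Phase of the Mixture Model
3.5.2 Testing Phase of the Mixture Model
Chapter 4 Experimental Datasets
4.1 Data Format
4.2 Statistics
4.3 Training and Test Datasets
4.3.1 Training Dataset
4.3.2 Test Dataset
Chapter 5 Experiments and Performance Evaluation
5.1 Experimental Dataset Settings
5.2 Evaluation Criteria
5.3 Experimental Results of the Naive Bayes Classifier, Hidden Markov Model, and Top-Level Domain Model
5.4 Experimental Results of the Mixture Model
Chapter 6 Model Applications
6.1 Objectionable Web Content Filtering
6.2 Prevention of Web Security Threats
Chapter 7 Conclusions and Future Work
7.1 Conclusions
7.2 Future Work
References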

References:

Anti-Phishing Working Group, available online at www.antiphishing.org/
Autonomous System (AS), available online at http://en.wikipedia.org/wiki/Autonomous_system_(Internet)
E. Baykan, M. Henzinger, L. Marian, I. Weber. (2009). “Purely URL-based topic classification”, WWW '09: Proceedings of the 18th international conference on World Wide Web, pages 1109–1110.
D. S. Fava, S. R. Byers, S. J. Yang. (2008). “Projecting Cyberattacks Through Variable-Length Markov Models”, IEEE Transactions on Information Forensics and Security Volume 3 Issue 3, pages 359–369.
F. Sebastiani. (2002). “Machine learning in automated text categorization”, ACM Computing Surveys (CSUR) Volume 34 Issue 1, pages 1–47.
H. Zuo, W. Hu, O. Wu. (2010). “Patch-based skin color detection and its application to pornography image filtering”, WWW '10: Proceedings of the 19th international conference on World Wide Web, pages 1227–1228.
Internet Assigned Numbers Authority (IANA), available online at http://www.iana.org/
IP to Country mapping, available online at http://www.ip2nation.com/
J. Z. Kolter, M. A. Maloof. (2006). “Learning to Detect and Classify Malicious Executables in the Wild”, The Journal of Machine Learning Research Volume 7, pages 2721–2744.
P. Y. Lee, S. C. Hui, A. C. M. Fong. (2002). “Neural Networks for Web Content Filtering”, IEEE Intelligent Systems Volume 17 Issue 5, pages 48–57.
P. Y. Lee, S. C. Hui, A. C. M. Fong. (2003). “A Structural and Content-Based Analysis for Web Filtering”, Internet Research: Electronic Networking Applications and Policy Volume 13 Issue 1, pages 27–37.
L. Wenyin, G. Huang, L. Xiaoyue, Z. Min, X. Deng. (2005). “Detection of phishing webpages based on visual similarity”, WWW '05: Special interest tracks and posters of the 14th international conference on World Wide Web, pages 1060–1061.
M. Deshpande, G. Karypis. (2004). “Selective Markov models for predicting Web page accesses”, ACM Transactions on Internet Technology (TOIT) Volume 4 Issue 2, pages 163–184.
Natural Language Toolkit (NLTK), available online at http://www.nltk.org/
Open Directory Project (ODP), available online at http://www.dmoz.org/
Platform for Internet Content Selection (PICS), available online at http://www.w3.org/PICS/
PhishTank, available online at www.phishtank.com/
R. Lempel, S. Moran. (2003). “Predictive caching and prefetching of query results in search engines”, WWW '03: Proceedings of the 12th international conference on World Wide Web, pages 19–28.
R. W. White, P. Bailey, L. Chen. (2009). “Predicting user interests from contextual information”, SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 363–370.
Root Zone Database, available online at http://www.iana.org/domains/root/db/
S. Gündüz, M. T. Özsu. (2003). “Recommendation models for user accesses to web pages”, ICANN/ICONIP '03: Proceedings of the 2003 joint international conference on Artificial neural networks and neural information processing, pages 1003–1010.
Trend Micro Inc., available online at http://tw.trendmicro.com/tw/about/
Trend Micro™ URL Filtering Module (Engine), available online at http://emea.trendmicro.com/imperia/md/content/uk/products/datasheets/ds02urlf070804gb.pdf
Trend Micro Inc. WTP (Web Threat Protection), available online at http://www.trendmicro.com.tw/wtp/micro/index.asp
X. Shen, S. Dumais, E. Horvitz. (2005). “Analysis of topic dynamics in web search”, Proceedings of the International Conference on World Wide Web, pages 1102–1103.
Yahoo Directory, available online at http://dir.yahoo.com/
Z. Cheng, B. Gao, T. Y. Liu. (2010). “Actively predicting diverse search intent from user browsing behaviors”, WWW '10: Proceedings of the 19th international conference on World Wide Web, pages 221–230.


Related theses:

1.	Context-Aware Phishing Threat Detection
2.	Exploring User Browsing Behavior for Objectionable Content Filtering


