National Digital Library of Theses and Dissertations in Taiwan

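Graduate student: Chun-Chao Chen (陳君櫂)
Thesis title: The Application of Semantic Analysis Information in Relevance Feedback for Document Classification (應用語意分析資訊於相關回饋以進行文件分類之方法)
Advisor: Shih-Chieh Chou (周世傑)
Degree: Master's
Institution: National Central University (國立中央大學)
Department: Department of Information Management
Discipline: Computer Science
Sub-discipline: General Computer Science
Thesis type: Academic thesis
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: Chinese
Number of pages: 57
Keywords: Information Retrieval; Relevance Feedback; LDA; Word2Vec; Semantic Analysis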
In information retrieval, relevance feedback algorithms extract important terms from the list of relevant documents returned by the user and use them as feedback features. The vector space model is commonly used to represent a document's term features, but it considers only term frequency: it ignores the semantic relationships between terms and documents and makes no use of the semantic information in the original query terms. Research on semantic search, proposed in recent years, aims to uncover the implicit semantic relationships among terms. This study therefore develops a document feature extraction method based on semantic information. It uses a topic model to extract the topic information implicit in relevant and non-relevant documents and selects the topic words that best represent the user's information needs. It then uses the neural network model Word2Vec to analyze the semantic information between the original query terms and the topic words, also takes into account the term-appearance situation of the topic words, and finally assigns an appropriate weight to each topic word.
The experimental results show that the classification precision of the proposed method is 27 percentage points higher than that of the baseline, and that the method can identify representative, important topic words and thereby retrieve documents that better match the user's information needs.
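The weighting idea described in the abstract, scoring each topic word by how semantically close it is to the original query terms, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the toy two-dimensional vectors stand in for trained Word2Vec embeddings, the topic-word probabilities stand in for topic-model output, the function names are hypothetical, and the combination rule (topic probability times mean cosine similarity to the query terms) is only one plausible choice. The term-appearance factor is left out because the abstract does not define it.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def weight_topic_words(query_terms, topic_words, embeddings):
    """Weight each topic word by its semantic closeness to the query.

    topic_words maps a word to its (assumed) topic probability.
    The weight is probability * mean cosine similarity to the query
    terms; this rule is an illustrative assumption, not the thesis's
    formula. Words without an embedding get weight 0.0.
    """
    known_query = [q for q in query_terms if q in embeddings]
    weights = {}
    for word, prob in topic_words.items():
        if word not in embeddings or not known_query:
            weights[word] = 0.0
            continue
        sims = [cosine(embeddings[word], embeddings[q]) for q in known_query]
        mean_sim = sum(sims) / len(sims)
        # Clamp negative similarity to zero so weights stay non-negative.
        weights[word] = prob * max(mean_sim, 0.0)
    return weights


# Toy data (hypothetical): 2-D vectors in place of real Word2Vec output.
embeddings = {
    "car": [0.8, 0.6],
    "engine": [0.6, 0.8],
    "jungle": [0.0, 1.0],
}
topic_words = {"engine": 0.5, "jungle": 0.5}

weights = weight_topic_words(["car"], topic_words, embeddings)
for word, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(word, round(weight, 3))
```

With these toy inputs, "engine" is ranked above "jungle" because its vector lies closer to the query term "car", even though both words start with the same topic probability. In practice the embeddings would come from a trained Word2Vec model and the probabilities from a fitted topic model.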
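Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background and Motivation
1-2 Research Objectives
1-3 Research Scope and Limitations
1-4 Thesis Organization
2. Literature Review
2-1 Vector Space Model
2-2 Relevance Feedback
2-3 Topic Extraction
2-4 Term Sensitivity
2-5 Word2Vec
2-6 Studies Applying Relevance Feedback Information to Document Classification
2-6-1 Applying the Vector Space Model
2-6-2 Feature Extraction Using Association Rules and Latent Semantic Analysis
3. Research Method
3-1 System Architecture
3-2 Method Design
3-2-1 Extracting Important Topic Words
3-2-2 Adjusting Topic Word Weights
3-2-3 Document Classifier
4. Experimental Design
4-1 Experimental Data
4-2 Evaluation Metrics
4-3 Parameter Settings
4-4 Experimental Design and Procedure
4-4-1 Experiment 1: Extracting Important Topic Words
4-4-2 Experiment 2: Adjusting Topic Word Weights
4-5 Experimental Results
4-5-1 Results of Experiment 1
4-5-2 Results of Experiment 2
4-6 Analysis and Discussion of Results
5. Conclusion
5-1 Conclusions and Contributions
5-2 Future Research Directions
References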