National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

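Graduate student: 蔡子龍
Graduate student (English name): Zih-Long Cai
Thesis title: 以概念階層為基礎之互動式網路查詢擴展系統 (An Interactive Web Query Expansion System Based on Concept Hierarchy)
Thesis title (English): Interactive Web Query Expansion Using Concept Hierarchy
Advisor: 曾守正
Advisor (English name): Frank S.C. Tseng
Degree: Master's
Institution: National Kaohsiung First University of Science and Technology (國立高雄第一科技大學)
Department: Graduate Institute of Information Management
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Year of publication: 2009
Graduation academic year: 97 (ROC calendar; 2008-2009)
Language: Chinese
Pages: 147
Keywords (Chinese, translated): Information Distance; Knowledge Structure; Web Mining; Query Expansion
Keywords (English): Knowledge Structure; Web Mining; Query Expansion; Information Distance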
Usage statistics:
  • Cited by: 2
  • Views: 521
  • Rating:
  • Downloads: 88
  • Saved to reading lists: 2
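Abstract (translated from Chinese):
In conventional online data retrieval environments, matching is performed with Boolean operations. Under this model, comparing two text strings yields only one of two results: 0 (the strings differ) or 1 (the strings are identical). In practice, such a polarized outcome does not suit real-world use, because a fuzzy matching space lies between the two extremes. How to analyze and process that fuzzy space is the core concern of semantic analysis.
This study improves on the binary Boolean matching used in earlier query processing. The analysis draws on a knowledge structure that is constructed automatically through Web mining (Automatic Constructed Knowledge Structure) and that records, for each term, its concept hierarchy, link type, and information distance. From this information the system infers additional results that are more likely to be relevant or that are potentially associated with the query, thereby achieving query expansion.
As for expansion strategies, the study proposes several types of query expansion so that users without domain expertise or extensive experience can express their search concept more conveniently and quickly. These strategies help users narrow or broaden the search scope, bring the results closer to their information needs, and shorten the time spent retrieving data on the Web.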
Abstract (English):
Traditional data retrieval processes usually rely on Boolean operations to filter out unnecessary data. This approach produces only a ‘0’ or ‘1’ evaluation result, where ‘0’ means the two terms differ and ‘1’ means they are equivalent. Boolean matching is not practical enough, however, because a fuzzy space lies between these two outcomes. How to analyze that fuzzy space is a core topic of semantic analysis.
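The contrast between crisp and graded matching can be made concrete with a minimal Python sketch. The function names are hypothetical, and `difflib.SequenceMatcher` is only a stand-in for a graded score: it measures character overlap, whereas the thesis relies on a semantic measure (information distance).

```python
from difflib import SequenceMatcher


def boolean_match(a: str, b: str) -> int:
    """Crisp comparison: 1 if the two terms are identical, otherwise 0."""
    return 1 if a == b else 0


def graded_match(a: str, b: str) -> float:
    """Graded comparison in [0, 1].

    SequenceMatcher is an illustrative placeholder (an assumption of this
    sketch), not the similarity measure used in the thesis.
    """
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    pair = ("retrieval", "retrieve")
    # The crisp test rejects the pair outright; the graded score lies
    # strictly between 0 and 1, leaving room for a relevance threshold.
    print(boolean_match(*pair))
    print(round(graded_match(*pair), 2))
```

A graded score lets a system rank near matches instead of discarding them, which is the gap in Boolean matching that the abstract points to.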
Our objective in this thesis is to improve on the crisp analysis produced by Boolean matching. We apply Web Mining techniques to build an automatically constructed Knowledge Structure that records useful information about terms, including Concept Hierarchy, Link Type, and Information Distance. Using this information, our approach introduces a Query Expansion process that extends the search results with items potentially associated with the user’s search concept.
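One way to picture an entry in such a knowledge structure is sketched below in Python. The abstract does not state how information distance is computed; the sketch assumes the Normalized Google Distance, since the thesis outline devotes a section to Google-based similarity. The `TermLink` record, its field names, the link-type labels, and all page counts are hypothetical illustrations, not the author's implementation.

```python
import math
from dataclasses import dataclass


def normalized_google_distance(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google Distance from page counts.

    fx, fy: number of pages containing term x / term y
    fxy:    number of pages containing both terms
    n:      total number of indexed pages (assumed larger than fx and fy)
    """
    if min(fx, fy, fxy) <= 0:
        # Terms that never occur (or never co-occur) are treated as unrelated.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


@dataclass(frozen=True)
class TermLink:
    """One edge of the (hypothetical) term knowledge structure."""
    source: str
    target: str
    link_type: str   # e.g. "synonym", "hypernym", "hyponym", "related"
    distance: float  # information distance; smaller means closer


if __name__ == "__main__":
    # Made-up page counts, for illustration only.
    d = normalized_google_distance(fx=1000, fy=100, fxy=50, n=10**6)
    link = TermLink("query expansion", "relevance feedback", "related", d)
    print(link)
```

Keeping the link type and the distance on the same record lets an expansion step filter candidates by relation and rank them by closeness in a single pass.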
In addition, for users who are inexperienced or lack professional domain knowledge, we provide several types of Query Expansion strategies that help them narrow or broaden the search scope. With our approach, users can spend less time and effort on online data retrieval while obtaining more search results, together with useful information closer to their needs.
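The narrowing and broadening strategies can be illustrated with a small Python sketch over an in-memory term table. Everything here is assumed for illustration: the sample terms and distances are invented, and mapping hyponyms to "narrow" and hypernyms plus synonyms to "broaden" is one plausible reading of the abstract, not a rule confirmed by it.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical knowledge structure: term -> [(link_type, target, distance)].
KNOWLEDGE: Dict[str, List[Tuple[str, str, float]]] = {
    "database": [
        ("hypernym", "information system", 0.42),
        ("hyponym", "relational database", 0.21),
        ("hyponym", "graph database", 0.35),
        ("synonym", "data store", 0.18),
    ],
}

# Assumed mapping from a user-selected strategy to the link types it follows.
STRATEGY_LINKS: Dict[str, Set[str]] = {
    "narrow": {"hyponym"},
    "broaden": {"hypernym", "synonym"},
}


def expand_query(term: str, strategy: str, max_distance: float = 0.4) -> List[str]:
    """Return expansion terms for `term`, closest first.

    Only links whose type belongs to the chosen strategy and whose
    distance does not exceed `max_distance` are kept.
    """
    allowed = STRATEGY_LINKS[strategy]
    candidates = [
        (distance, target)
        for link_type, target, distance in KNOWLEDGE.get(term, [])
        if link_type in allowed and distance <= max_distance
    ]
    return [target for _, target in sorted(candidates)]


if __name__ == "__main__":
    print(expand_query("database", "narrow"))
    print(expand_query("database", "broaden", max_distance=0.5))
```

In an interactive setting the returned terms would be shown as suggestions, and the user would choose which ones to add to the query.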
Chinese Abstract i
ABSTRACT ii
Acknowledgments iii
Table of Contents iv
List of Figures viii
List of Tables x
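1. Introduction 1
1.1. Research Background and Motivation 1
1.2. Research Objectives 3
1.3. Research Scope and Limitations 4
1.4. Research Contributions 5
1.5. Research Process 6
1.6. Thesis Organization 7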
2. Literature Review 8
2.1. Web Mining 8
2.1.1. Web Content Mining 11
2.1.2. Web Structure Mining 13
2.1.3. Web Usage Mining 13
2.2. Query Expansion 14
2.2.1. Classification of Query Expansion 17
2.2.1.1. Manual Query Expansion 17
2.2.1.2. Automatic Query Expansion 17
2.2.1.3. Interactive Query Expansion 18
2.2.2. Knowledge Structures for Query Expansion 18
2.2.2.1. Based on Search Results 19
2.2.2.2. Based on Knowledge Structure 19
2.3. Web Query Expansion System 20
2.4. Google Information Distance 30
2.4.1. Information Distance 30
2.4.2. Normalized Compression Distance 31
2.4.3. Google-Based Similarity Computation 32
2.4.3.1. Google Semantic Analysis 33
2.4.3.2. Google Semantic Similarity Computation 34
2.5. Association Rules 35
2.5.1. Definition of Association Rules 36
2.5.2. Applications of Association Rules in Related Fields 37
2.5.3. Dynamic Association Rule Mining 38
2.6. Similarity Computation 41
2.7. Term Link Type 43
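3. Research Method 47
3.1. System Architecture 47
3.2. System Flow 48
3.2.1. Query Initialization Phase 48
3.2.2. Query Concept Analysis Phase 49
3.2.3. Query Refinement Phase 51
3.3. System Components 53
3.3.1. Concept Analysis Engine 53
3.3.2. Web Mining Agent 55
3.3.2.1. Types of Web Content 56
3.3.2.2. Content Extraction 57
3.3.2.3. Term Processing 58
3.3.2.3.1. Chinese Word Segmentation 58
3.3.2.3.2. Chinese Word Segmentation Rules 62
3.3.2.3.3. English Stemming 63
3.3.2.3.4. Stop-List Term Filtering 64
3.3.2.3.5. Keyword Extraction 65
3.3.3. Inter-Keyword Association Inference Engine 66
3.3.3.1. Terms Knowledge Structure 66
3.3.3.2. Query Expansion Strategies 67
3.3.3.3. Association Evaluation Unit 68
3.3.3.3.1. Synonym Evaluation and Extraction Method 69
3.3.3.3.2. Hypernym and Hyponym Evaluation and Extraction Method 70
3.3.3.3.3. Concept-Related Term Evaluation and Extraction Method 72
3.3.3.3.4. Concept Clustering Process 76
3.3.3.3.5. User Query Expansion Strategies 79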
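4. System Design 83
4.1. Server-Client Interaction Flow 83
4.2. User Query Expansion Flow 85
4.3. System Data Collection and Knowledge Structure Construction Flow 86
4.3.1. Web Mining Text Processing 86
4.3.1.1. Data Collection Phase 87
4.3.1.2. Text Content Extraction Phase 87
4.3.1.3. Language Processing Phase 88
4.3.1.4. Keyword Extraction Phase 89
4.3.1.5. Similar Term Analysis Phase 90
4.3.2. Term Concept Hierarchy Induction Analysis 90
4.3.2.1. Membership Level Analysis Phase 91
4.3.2.2. Concept Hierarchy Term Set Update Phase 93
4.3.2.3. Concept Subgroup Analysis Phase 93
4.3.3. Knowledge Structure Construction 94
4.3.3.1. Term Pre-Analysis Phase 95
4.3.3.2. Term Link Type Analysis Phase 96
4.3.3.2.1. Synonym Link Extraction Algorithm 97
4.3.3.2.2. Hypernym and Hyponym Link Extraction Algorithm 98
4.3.3.2.3. Concept-Related Term Link Extraction Algorithm 101
4.3.3.3. Association Rule Analysis Phase 103
4.3.3.3.1. Association Rule Data Update Algorithm 104
4.3.3.3.2. Association Rule Mining Algorithm 107
4.3.3.4. Hidden Association Inference 109
4.3.3.4.1. Hidden Association Inference between Synonyms and Synonyms 111
4.3.3.4.2. Hidden Association Inference between Synonyms and Hypernyms 112
4.3.3.4.3. Hidden Association Inference between Synonyms and Hyponyms 112
4.3.3.4.4. Hidden Association Inference between Synonyms and Concept-Related Terms 113
4.3.3.4.5. Hidden Association Inference between Concept-Related Terms and Concept-Related Terms 114
4.3.3.4.6. Hidden Association Inference between Hypernyms and Hypernyms 115
4.3.3.4.7. Hidden Association Inference between Hyponyms and Hyponyms 115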
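5. Conclusion 117
6. Future Research Directions 118
6.1. Cross-Language Information Retrieval System 118
6.1.1. Differences between Cross-Language Retrieval and Machine Translation 118
6.1.1.1. Differences in Vocabulary Content 118
6.1.1.2. Differences in Evaluation Criteria 119
6.1.1.3. Differences in Knowledge Sources for Language Conversion 119
6.1.2. Language Conversion Knowledge Structures for Cross-Language Information Retrieval Systems 120
6.1.2.1. Manual Knowledge Structure Construction 120
6.1.2.2. Automatic Knowledge Structure Construction 120
6.1.2.2.1. Third-Party Language Analysis 121
6.1.2.2.2. Hyperlink Structure & Context Analysis 122
6.1.2.2.3. Latent Semantic Analysis 123
6.2. Applying Semantic Similarity Matching to a Patent Design-Around Analysis System 124
6.2.1. Research Motivation 124
6.2.2. Research Objectives 124
6.2.3. Research Method 125
6.2.4. System Architecture 126
References 128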