National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

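Graduate student: 禹良治 (Liang-Chih Yu)
Thesis title: A Study on Semantic Dependencies and HAL Modeling for Web Text Mining and Information Retrieval
Thesis title (Chinese): 應用語意相依關係及超空間模擬語言模型於網頁文本探勘及資訊檢索之研究
Advisor: 吳宗憲 (Chung-Hsien Wu)
Degree: Doctoral
Institution: National Cheng Kung University (國立成功大學)
Department: Department of Computer Science and Information Engineering, master's and doctoral program (資訊工程學系碩博士班)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Graduation academic year: 96 (ROC calendar, i.e. 2007-2008)
Language: English
Number of pages: 87
Keywords (translated from Chinese): text mining; information retrieval; natural language processing
Keywords (English): Information Retrieval; Text Mining; Natural Language Processing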
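Abstract (translated from Chinese)

The goal of information retrieval is to help users find useful information quickly and effectively. Traditional information retrieval systems typically represent user queries and documents with a bag-of-words model, so retrieval uses only word-level information and ignores higher-level structural information. Documents, however, usually contain multiple topics, and this topic information helps in understanding a user's query needs in depth and thereby yields more precise retrieval results. This dissertation therefore proposes text mining methods to extract topic information from user queries and documents, and uses the extracted topic information to improve retrieval precision.

The experimental corpus consists of Chinese online psychiatric documents. Each document is made up of a depression-related problem posted by an Internet user and the corresponding expert advice. These documents mainly contain three kinds of depression-related topic information: negative life events, depressive symptoms, and relations between symptoms. The aim of the dissertation is to build a psychiatric document retrieval system with depression-topic analysis that helps users quickly find documents related to their own depressive problems.

Each of the three topics is identified with a different method. Depressive symptoms are usually expressed in a single sentence or in several sentences, so the dissertation uses semantic dependencies to analyze the semantic structure of sentences and identifies the symptoms sentence by sentence. Relations between symptoms, such as cause-effect and temporal relations, are identified with a domain ontology. Negative life events are composed of semantic patterns, defined as combinations of words that can semantically express a negative life event; the dissertation therefore combines the Hyperspace Analog to Language (HAL) model with evolutionary computation to extract semantic patterns automatically from unannotated online psychiatric documents.

Finally, the dissertation proposes a retrieval model that computes the similarity between a user query and a document from the negative life events, depressive symptoms, and symptom relations they contain. In the experimental evaluation, word-based retrieval models such as the vector space model (VSM) and the Okapi model serve as baselines, and the results show that taking topic information into account yields more precise retrieval results.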
Information retrieval (IR) attempts to retrieve documents relevant to a user’s query from a large collection of documents. Rather than relying on keyword-based approaches, recent IR systems accept natural language queries, so users can express their information needs naturally. These systems usually adopt a bag-of-words approach to represent a query and a document, which means that they can exploit only word-level information during the retrieval process, and the high-level structural information in documents is often neglected. However, a document is generally structured; that is, it can be characterized as a set of topics and inter-topic relations. Such topic information helps in understanding users’ information needs and thus in obtaining more precise retrieval results. Therefore, this dissertation proposes the use of text mining techniques to extract the topic information contained in both queries and documents to improve retrieval performance.
The experimental corpora used herein are Chinese psychiatry web resources, a large collection of psychiatric documents produced by Internet users and psychiatrists. Each psychiatric document contains a user’s depressive problems and an expert’s suggestions for alleviating them. The psychiatric documents thus contain rich depression-related topic information, including negative life events, depressive symptoms, and semantic relations between symptoms. Therefore, this dissertation attempts to help people locate the psychiatric documents relevant to their depressive problems efficiently and effectively, based on this depression-related topic information.
The topics are extracted using different approaches. For depressive symptoms, the information is often embedded in a single sentence or in multiple sentences. This dissertation proposes a text mining framework that integrates the semantic dependencies within a sentence (intra-sentence) and the strength of lexical cohesion between sentences (inter-sentence) to mine the symptoms. Once the symptoms are identified, the semantic relations between them, such as cause-effect and temporal relations, can be identified; this dissertation uses a domain ontology to mine these relations between the extracted symptoms. For negative life events, the information is often represented by meaningful patterns, where a pattern refers to a semantically plausible combination of words. Therefore, this dissertation proposes a framework that integrates a cognitively motivated model, Hyperspace Analog to Language (HAL), with evolutionary computation to induce variable-length patterns from unannotated psychiatry web resources.
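To make the HAL component more concrete, the following is a minimal sketch of how a HAL co-occurrence space is commonly built: a fixed-size window slides over the token sequence, and each preceding word inside the window adds a weight that decreases with its distance from the target word. The abstract does not give the dissertation's exact weighting or window size, so the function name `build_hal`, the `window` parameter, and the linear weight `window - distance + 1` are assumptions for illustration only, and the English tokens stand in for segmented Chinese text.

```python
from collections import defaultdict


def build_hal(tokens, window=5):
    """Build a HAL-style weighted co-occurrence table (illustrative sketch).

    hal[target][context] accumulates a weight for every occurrence of
    `context` within `window` positions BEFORE `target`. Closer words
    get larger weights: window - distance + 1 (an assumed, commonly
    described scheme, not necessarily the dissertation's).
    """
    hal = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break  # ran off the start of the sequence
            hal[target][tokens[j]] += window - distance + 1
    return hal


# Hypothetical example sentence; a real corpus would be word-segmented text.
tokens = ["lost", "my", "job", "feel", "sad"]
hal = build_hal(tokens, window=2)
print(dict(hal["job"]))
```

Each row of the resulting table can be read as a context vector for a word; a pattern induction step such as the evolutionary inference named above would then operate on these vectors to score candidate word combinations.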
Finally, a retrieval model is designed to calculate the similarity between input queries and psychiatric documents by combining the similarities of negative life events, depressive symptoms and semantic relations within them. Experiments are conducted to compare the performance of the topic-aware model and conventional word-based models such as the vector space model (VSM) and Okapi model. Experimental results show that the use of topic information can provide more precise information about users’ depressive problems, thus improving the retrieval precision.
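The idea of scoring a query against a document by combining per-topic similarities can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify how the three similarities are computed or combined, so the cosine measure over item counts, the equal weights, the dictionary keys, and the function names `cosine` and `topic_similarity` are all hypothetical choices, not the dissertation's model.

```python
import math
from collections import Counter


def cosine(items_a, items_b):
    """Cosine similarity between two bags of hashable items."""
    va, vb = Counter(items_a), Counter(items_b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty topic contributes no similarity
    return dot / (norm_a * norm_b)


def topic_similarity(query, doc, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of event, symptom, and relation similarities.

    `query` and `doc` are dicts with the (assumed) keys 'events',
    'symptoms', and 'relations'; the weights are illustrative.
    """
    keys = ("events", "symptoms", "relations")
    return sum(w * cosine(query[k], doc[k]) for w, k in zip(weights, keys))


# Hypothetical extracted topics; relations are (symptom, type, symptom) tuples.
query = {
    "events": ["lost job"],
    "symptoms": ["insomnia", "fatigue"],
    "relations": [("insomnia", "cause-effect", "fatigue")],
}
doc = {
    "events": ["lost job"],
    "symptoms": ["insomnia"],
    "relations": [],
}
print(round(topic_similarity(query, doc), 3))
```

A word-based baseline such as the vector space model would instead apply a single cosine over term weights for the whole text; splitting the score by topic is what lets a match on events or relations count even when the surface wording differs.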
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1. INTRODUCTION
  1.1. Motivation
  1.2. The Approach of this Dissertation
  1.3. The Organization of this Dissertation
CHAPTER 2. FRAMEWORK OVERVIEW
CHAPTER 3. SEMANTIC DEPENDENCIES MODELING
  3.1. Introduction
  3.2. Semantic Dependency Model
    Semantic Label Inference
    Discourse Segment Identification
  3.3. Semantic Relation Discovery
    Symptom Chain Construction
  3.4. Experimental Results
    Experiments Setup
    Evaluation on Semantic Label Inference
    Evaluation on Discourse Segment Identification
    Evaluation on Semantic Relation Discovery
  3.5. Summarization of this Chapter
CHAPTER 4. HYPERSPACE ANALOG TO LANGUAGE (HAL) MODEL
  4.1. Introduction
  4.2. HAL Space Construction
  4.3. Evolutionary Inference Algorithm
    Individuals Representation
    Initial Population
    Fitness Function
    Variation Operators
    Relevance Feedback
  4.4. Association Pattern Mining
    Find frequent word sets
    Generate association patterns from frequent word sets
  4.5. Experimental Results
    Parameter Setting
    Effect of Relevance Feedback
    Comparative Evaluation
    Discussion
  4.6. Summarization of this Chapter
CHAPTER 5. RETRIEVAL MODEL
  5.1. Similarity of events and symptoms
  5.2. Similarity of relations
  5.3. Experimental Results
    Experiment setup
    Retrieval results
  5.4. Summarization of this Chapter
CHAPTER 6. CONCLUSION AND FUTURE WORKS
REFERENCE