National Digital Library of Theses and Dissertations in Taiwan

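Author: 賴育昇 (Yu-Sheng Lai)
Title (Chinese): 自然語言處理於網際網路常用問答集檢索之研究
Title (English): A Study on Natural Language Processing for Internet FAQ Retrieval
Advisor: 吳宗憲 (Chung-Hsien Wu)
Degree: Ph.D.
Institution: National Cheng Kung University
Department: Computer Science and Information Engineering (Master's and Ph.D. programs)
Field: Engineering
Discipline: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar; 2001-2002)
Language: English
Pages: 152
Chinese keywords (translated): Internet; question comparison; unknown word detection; FAQ retrieval; FAQ mining; phrase-like unit; text categorization; natural language processing
English keywords: FAQ Retrieval; FAQ; Phrase-Like Unit; Natural Language Processing; Internet; Frequently Asked Questions; NLP; Unknown Word Detection; PLU; Question Comparison; FAQ Mining; Text Categorization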
Usage statistics:
  • Cited: 14
  • Views: 1515
  • Downloads: 358
  • Bookmarked: 5
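Open-domain question answering is a new challenge for both commercial applications and academic research. In terms of how answers are produced, question-answering research falls into two categories: (1) generating new answers and (2) finding answers in an answer collection. AI experts tend to adopt the first approach, which requires the support of a knowledge base; with current technology, knowledge bases can be built only for certain specific or relatively small domains. Building a knowledge base usually takes considerable labor and time, and the result is not easily ported to other domains. For the second approach, answer collections are of two types: those that do not include questions and those that do. Any database can serve as the first type, as long as answers can be found in it. FAQ collections belong to the second type: every answer they provide is accompanied by a corresponding question, so a question-answering system can obtain the answer a user needs by comparing the user's question with the questions in the collection.
With current technology, FAQ retrieval makes open-domain question answering more feasible. Even so, FAQ retrieval still faces many difficulties. The first is building the answer collection: how can enough FAQs be gathered to meet users' needs? With the rapid growth of the Web, many websites now provide FAQs. Because Web pages are designed in many different languages, this dissertation proposes a method, independent of the page language, for mining FAQs from websites.
Furthermore, a large number of FAQs must be properly categorized to improve both retrieval speed and accuracy, so text categorization is another topic we investigate. A new word is usually coined to describe a specific event, person, and so on, or to serve as terminology in a particular domain. In other words, such words usually occur more frequently in one particular domain than in others. This dissertation exploits this property of unknown words to improve the performance of traditional text classifiers based on characters, words, and N-grams.
The unknown word problem is especially severe for the computer processing of Chinese and most other Asian languages, whose written text contains no symbols marking the boundaries between words. This makes it hard to identify even known words, let alone unknown ones. Based on the morphological assumption that “an unknown word is composed of known words and undefined characters,” this dissertation introduces a statistical method for detecting unknown words. We propose a formula that measures the likelihood that an arbitrary string is an unknown word and combine it with an algorithm that finds unknown words in sentences. Because the detected unknown words have no length limit and some of them are phrases or word groups, we call them phrase-like units (PLUs).
For FAQ retrieval, the intuition is that if a user's question is close in meaning to some questions in an FAQ collection, the answers to those questions are likely to answer the user's question. This dissertation therefore also examines the semantic comparison of one question with another. By analyzing the types of Chinese questions, we build a semantic grammar and, together with a partial parser for Chinese questions, parse each question into two parts: an intention segment and a keyword string. An intention segment is “a combination of certain components of a question that still expresses the most direct surface meaning of the question without the question's other auxiliary or functional clauses.” The comparison of two questions combines the comparison results for these two parts with appropriate weights.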
Open-domain question answering is a new challenge for both commercial applications and academic research. From the viewpoint of answer generation, research on question answering can take two approaches: (1) generating new answers and (2) seeking answers in a large data collection. AI experts tend to focus on generating new answers, but this kind of approach is currently feasible only for domain-specific systems. The data collections used for seeking answers can be further divided into two types: question-dissociated and question-associated. Any text data in a collection can be viewed as question-dissociated answers. By contrast, in a question-associated collection, such as a set of frequently asked questions (FAQs), each answer refers to at least one question.
With today's technology, FAQ retrieval is a more feasible way to answer open-domain questions. However, many problems remain to be overcome. The first is data collection, that is, FAQ collection: how can enough FAQ files be gathered to meet user requirements? Many websites create and maintain FAQ pages for customer service, advertising, and other purposes. Because Web pages come in many styles and languages, this dissertation proposes a language-independent approach to mining FAQs from the WWW.
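The dissertation's own list-detection method (Chapter 3, built on a similarity measure between RTCSs) is not reproduced in this record. As a rough illustration of why markup alone can support language-independent FAQ mining, the sketch below, which is our own simplification and not the author's algorithm, reduces each HTML fragment to its sequence of tag names, discards the text entirely, and reports runs of consecutive fragments with similar tag sequences as candidate lists. The names `tag_signature`, `similarity`, and `detect_lists`, the use of `difflib.SequenceMatcher` as the similarity measure, and the `threshold`/`min_items` defaults are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical helper: captures the name of every start or end tag. Text between
# tags is never inspected, so the page language does not matter.
TAG_RE = re.compile(r"<\s*(/?)\s*([A-Za-z][A-Za-z0-9]*)")


def tag_signature(fragment):
    """Return the tag sequence of an HTML fragment, e.g. ['p', 'b', '/b', '/p']."""
    return [slash + name.lower() for slash, name in TAG_RE.findall(fragment)]


def similarity(sig_a, sig_b):
    """Similarity of two tag sequences in [0, 1] (difflib ratio, an assumed stand-in)."""
    return SequenceMatcher(None, sig_a, sig_b).ratio()


def detect_lists(fragments, threshold=0.8, min_items=3):
    """Return (start, end) index pairs, end exclusive, of runs of at least
    min_items consecutive fragments whose tag signatures are similar to the
    first fragment of the run."""
    sigs = [tag_signature(f) for f in fragments]
    runs, start = [], 0
    for i in range(1, len(sigs) + 1):
        # Extend the current run while the next fragment is structurally alike.
        if i < len(sigs) and sigs[i] and similarity(sigs[start], sigs[i]) >= threshold:
            continue
        # The run ended at i; keep it only if it is long enough to be a list.
        if i - start >= min_items:
            runs.append((start, i))
        start = i
    return runs
```

Because only tag names are compared, English, Chinese, and German question items are grouped alike. A practical FAQ miner would still have to pair question items with their answers inside each detected list.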
Moreover, a large collection of FAQs should be categorized appropriately to shorten search time and improve accuracy, so text categorization is another topic of interest. New words are usually coined to describe specific events, people, and so on, or to serve as proper nouns. Consequently, they tend to occur much more frequently in a few specific domains than in others. Exploiting this characteristic of unknown words, we improve the performance of traditional text classifiers based on characters, words, and N-grams.
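A minimal sketch of the idea that terms concentrated in a few categories are good category indicators. It assumes documents are already segmented into terms (for instance PLUs), keeps only terms whose occurrences fall mostly into one category, and classifies a document by summing the category shares of the selected terms it contains. The function names, the share-based weighting, and the `min_share`/`min_count` thresholds are hypothetical stand-ins for the dissertation's own discriminative term selection, indexing machine, and classification function.

```python
from collections import Counter, defaultdict


def select_discriminative_terms(labelled_docs, min_share=0.7, min_count=2):
    """labelled_docs: iterable of (category, terms) pairs, terms already segmented.

    Keeps a term only if it occurs at least min_count times and at least
    min_share of its occurrences fall in a single category. Returns
    {term: {category: share_of_occurrences}}.
    """
    counts = defaultdict(Counter)  # term -> Counter(category -> occurrences)
    for category, terms in labelled_docs:
        for term in terms:
            counts[term][category] += 1
    model = {}
    for term, by_category in counts.items():
        total = sum(by_category.values())
        if total >= min_count and max(by_category.values()) / total >= min_share:
            model[term] = {c: n / total for c, n in by_category.items()}
    return model


def classify(terms, model):
    """Sum the category shares of the selected terms found in a document and
    return the highest-scoring category, or None if no selected term occurs."""
    scores = Counter()
    for term in terms:
        for category, share in model.get(term, {}).items():
            scores[category] += share
    return scores.most_common(1)[0][0] if scores else None
```

On toy data, a word shared evenly by two sports categories (such as 比賽, "game") is filtered out, while domain-specific words such as 全壘打 ("home run") and 三分球 ("three-pointer") are kept and decide the category.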
Unlike Western languages, Chinese and most other Asian languages have no delimiters between written words, which makes unknown words a serious problem for these languages. Based on the assumption that unknown words consist of known words and undefined characters, this dissertation proposes a statistical approach to unknown word detection: a formula that measures the likelihood that a string is an unknown word, and an algorithm that detects unknown words in sentences.
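The likelihood formula itself is not given in this record, so the sketch below substitutes a standard measure, pointwise mutual information (PMI) approximated with token counts, to illustrate the general idea. Starting from a segmentation into known words and single characters, it repeatedly merges the adjacent pair of units that co-occurs far more often than chance; merged units can merge again, so detected units have no fixed length. `detect_units`, `merge_pair`, and the `min_count`/`min_score`/`max_merges` parameters are illustrative assumptions, not the dissertation's PLU-based likelihood ratio or PLU selection algorithm.

```python
import math
from collections import Counter


def merge_pair(tokens, pair):
    """Replace each non-overlapping adjacent occurrence of pair (a, b) by a + b."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def detect_units(corpus, min_count=2, min_score=1.0, max_merges=50):
    """corpus: sentences pre-segmented into known words and single characters.

    Greedily merges the adjacent pair with the highest PMI,
    log(c(ab) * N / (c(a) * c(b))), while the pair occurs at least min_count
    times and its PMI exceeds min_score. Returns the re-segmented corpus.
    """
    corpus = [list(sentence) for sentence in corpus]
    for _ in range(max_merges):
        unigrams = Counter(t for s in corpus for t in s)
        bigrams = Counter(p for s in corpus for p in zip(s, s[1:]))
        n = sum(unigrams.values())
        best, best_score = None, min_score
        for (a, b), c in bigrams.items():
            if c < min_count:
                continue  # too rare to trust
            score = math.log(c * n / (unigrams[a] * unigrams[b]))
            if score > best_score:
                best, best_score = (a, b), score
        if best is None:
            break  # no pair is frequent and cohesive enough
        corpus = [merge_pair(s, best) for s in corpus]
    return corpus
```

On a toy corpus where 奇異果 ("kiwifruit") is split into the single characters 奇, 異, 果, the characters are merged in two steps into one unit, while the known words 蘋果 and 便宜 are left untouched.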
FAQ retrieval is based on the intuition that if a user query is semantically similar to the questions of some question-answer pairs, then the answers of those pairs may also answer the user query. This dissertation therefore also investigates the semantic comparison of two questions. Based on an analysis of Chinese question types, we create a semantic grammar. Using a partial parser for Chinese questions, a question is parsed into two parts: an intention segment and a keyword string. The intention segment is defined as “a combination of component segments in a question that conveys the surface purpose and need not comprise other auxiliary or functional clauses.” Two questions are compared by matching their intention segments and keyword strings separately and combining the weighted results.
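The weighted combination of the two partial comparisons can be sketched as follows. The dissertation compares words semantically with HowNet and a broad-sense thesaurus (Section 6.2); this sketch replaces that with plain token overlap, assumes each question has already been parsed into an intention segment and a keyword list, and uses an arbitrary weight `w_intention = 0.4`. All function names here are hypothetical.

```python
from difflib import SequenceMatcher


def jaccard(a, b):
    """Set overlap of two keyword lists (0.0 when both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def question_similarity(q1, q2, w_intention=0.4):
    """q1, q2: (intention_segment_tokens, keywords) tuples.

    Combines an order-sensitive score for the intention segments with a set
    overlap score for the keywords, using an assumed weight w_intention.
    """
    intention_score = SequenceMatcher(None, q1[0], q2[0]).ratio()
    keyword_score = jaccard(q1[1], q2[1])
    return w_intention * intention_score + (1 - w_intention) * keyword_score


def retrieve(query, faq):
    """faq: list of (parsed_question, answer) pairs; return the answer whose
    question is most similar to the parsed query."""
    _, answer = max(faq, key=lambda pair: question_similarity(query, pair[0]))
    return answer
```

Keeping the two scores separate makes the weight explicit: raising `w_intention` favors FAQ entries that ask the same kind of question, while lowering it favors entries about the same topic.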
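Acknowledgments
Abstract (in Chinese)
Chapter 1 Introduction (in Chinese)
Chapter 2 Preliminaries (in Chinese)
Chapter 3 Mining Internet FAQs (in Chinese)
Chapter 4 Detection of Phrase-Like Units (in Chinese)
Chapter 5 PLU-Based Text Categorization (in Chinese)
Chapter 6 Intention-Aided Question Comparison (in Chinese)
Chapter 7 Conclusion and Future Work (in Chinese)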
Abstract
Contents
List of Figures
List of Tables
List of Abbreviations
List of Notations
Chapter 1 Introduction
1.1 What is FAQ Retrieval?
1.2 The TREC Question Answering Track
1.3 Our Problems
1.3.1 FAQ Mining
1.3.2 Unknown Word Detection
1.3.3 Text Categorization
1.3.4 Question Comparison
1.4 Organization of this Dissertation
Chapter 2 Preliminaries
2.1 Chinese Characters and Words
2.2 Word Segmentation
2.3 N-grams
2.3.1 Simple N-grams
2.3.2 Sparse Data Problem and Smoothing Algorithms
2.4 Representation of Languages
2.4.1 Grammars
2.4.2 Parse Trees
Chapter 3 FAQ Mining from the WWW
3.1 Web FAQs
3.1.1 Web Languages
3.1.2 Taxonomy of Web FAQs
3.1.3 Domain Knowledge
3.2 List Detection
3.2.1 Terms for Our Approach
3.2.2 What is a List?
3.2.3 Measurement of the Similarity between two RTCSs
3.2.4 List Detection Algorithm
3.3 Experimental Results
3.4 Summary
Chapter 4 Detection of Phrase-Like Unit
4.1 Formation of Unknown Words
4.2 Phrase-Like Unit
4.2.1 PLU-Based Likelihood Ratio
4.2.2 Definition of PLU
4.2.3 Reduction of Computational Complexity
4.3 PLU Detection from Sentences
4.3.1 Chinese Word Segmentation
4.3.2 Candidate PLU Extraction
4.3.3 PLU Selection
4.4 Experimental Results
4.4.1 Corpus
4.4.2 Experiments
4.5 Summary
Chapter 5 PLU-Based Text Categorization
5.1 Approaches to Text Categorization
5.2 Overview of the Proposed Approach
5.3 Term Extraction and Selection
5.3.1 PLU Extraction
5.3.2 Further Purification
5.3.3 Discriminative Term Selection
5.4 Indexing and Classification
5.4.1 The Indexing Machine
5.4.2 Indexing Method and Classification Function
5.5 Experimental Results
5.5.1 Corpus
5.5.2 Performance Evaluation
5.5.3 Meaningful Terms vs. Practicability
5.5.4 Distribution of Category Frequencies
5.5.5 Consistency between Training and Testing Data
5.6 Discussion
5.6.1 Meaningful Term Extraction
5.6.2 Discriminative Term Selection
5.6.3 Term Adaptation
5.7 Summary
Chapter 6 Intention-Aided Question Comparison
6.1 Intention Extraction
6.1.1 Intention
6.1.2 Grammatical Forms of Chinese Questions
6.1.3 Semantic Grammar
6.1.4 Intention Segments in Various Questions
6.2 Semantic Comparison of Chinese Words
6.2.1 A Brief Overview of HowNet
6.2.2 Feature Similarities
6.2.3 A Broad-Sense Thesaurus
6.3 Overall Matching Task
6.3.1 Question Matcher
6.3.2 Answer Matcher
6.3.3 Scoring Strategy
6.4 Experimental Results
6.4.1 Experimental Data
6.4.2 Evaluation Metrics
6.4.3 Adjusting the Depth Effect Factor
6.4.4 Comparative Performance Evaluation
6.4.5 Out-of-Grammar Ratio vs. Performance
6.5 Summary
Chapter 7 Conclusion and Future Work
Appendix A Some Experimental Results of PLU-Based Word Segmentation
A.1 Baseball
A.2 Basketball
Bibliography
Author's Biographical Notes
Publications