Author: Chun-Liang Chen (陳俊良)
Thesis title: PAT-tree-based Natural Language Processing and Applications under Internet Environment
Thesis title (Chinese): 網際網路環境下以派樹為基礎之自然語言處理及應用
Advisors: Lin-Shan Lee (李琳山), Lee-Feng Chien (簡立峰)
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 1999
Academic year of graduation: 87 (ROC calendar)
Language: English
Number of pages: 140
Keywords: Internet; Natural Language Processing; PAT Tree; Document Classification; Online Term Extraction; Language Model; Text Verification; Information Retrieval
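As the Internet has become an indispensable part of human life, natural language processing research has entered a new era along with it. Countless electronic resources are produced and distributed around the world every day, at a pace far beyond what we imagine. These ever-growing electronic text collections offer an excellent research environment for natural language processing: they serve as a source of corpora and also reflect the dynamic changes in how people use language. They also raise both the importance and the difficulty of natural language research. How to extract the resources needed for language processing from the network efficiently and methodically, turn them into useful linguistic knowledge, and broaden the range of applications has therefore become a pressing and important problem.
This thesis attempts to propose a software framework that meets these needs by combining network resources with natural language processing research. It also verifies the effectiveness and feasibility of the framework when combined with NLP-related applications, covering three major topics (information retrieval, natural language processing, and speech recognition) through tasks such as online automatic document classification, online automatic extraction of technical terms, automatic detection and correction of wrongly written characters, and network language models for speech recognition. The framework is in fact a complete set of data indexing structures formed by the PAT tree family, which extends the PAT tree data structure proposed in Hung's 1996 thesis. The PAT tree family is the operating core of the framework so that network corpora that benefit language processing can be acquired dynamically and efficiently, and then organized, managed, and used. Beyond that, the framework is intended to serve as a medium and bridge for interaction and cooperation across language processing topics in the network environment.
In terms of applications, this thesis examines Chinese and English document classification. The classification method built around the PAT tree is in no way inferior to commonly used statistical methods. Its advantage is that it uses dynamic N-gram features of arbitrary length as the basis for classification, and the classification feature vectors can be updated and adjusted dynamically at any time. This makes the method suitable for online dynamic classification, and it also allows important domain-specific terms to be extracted from each classified document; the resulting term lexicon can be used to track dynamic information on the network and can be applied in other natural language processing applications. The thesis also uses a language model built from a large related training corpus to help detect errors in Chinese documents and proposes a method for automatically detecting and correcting wrongly written characters; applied to the post-processing of optical character recognition and speech recognition output, the method achieves good results in both cases. Finally, for speech recognition, the thesis compares the perplexity of a PAT-tree-based language model with arbitrary N-gram length against bigram, trigram, and four-gram language models, and proposes an adaptable language model processing architecture suited to the network environment, in order to strengthen the robustness of the language processing component of a speech recognition system used in that environment. In terms of extensibility, the PAT tree can be combined with higher-level linguistic information, such as part-of-speech or semantic parsing results, and developed further toward a high-level linguistic knowledge base. With the properties of the tree data structure (dynamic growth, a tree size that can be adjusted to the environment, and so on), combining it with Internet resources to develop a distributed language processing architecture is within reach.
The family built around PAT tree technology has five derived members: CPAT (compressed PAT tree), EPAT (English PAT tree), WPAT (word PAT tree), SPAT (syllable PAT tree), and UPAT (universal PAT tree). Except for UPAT, every member is an application-oriented data structure. The lowest-level structure of every tree in the family inherits the properties of the original PAT tree; the data structure is adjusted only according to the content and unit being indexed and the environment and resources in which it is used. The individual properties, appropriate uses, and target applications of each member are discussed in detail in the thesis. The PAT-tree-based natural language processing applications proposed here plant a workable seed for the automated acquisition, processing, management, and application of network information, and show the feasibility of combining network resources with natural language processing on top of this framework. In sum, the thesis also indirectly bears out the importance of Bill Gates's remark that "the way you gather, manage, and use information determines whether you win or lose."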

Natural language processing (NLP) entered an entirely new era when the Internet became an inseparable part of human life. In daily life, a huge amount of Internet resources is created at a rapid pace and disseminated all over the world. These ever-growing, dynamic Internet resources not only enrich the corpora available for natural language processing but also make the research more challenging than before. As a result, the first lesson we must learn is how to use Internet resources effectively and transform them into useful linguistic knowledge, so as to advance natural language processing and its related applications.
The main purpose of this thesis is to propose an efficient framework for natural language processing in the Internet environment and to show its feasibility on a number of practical and important NLP applications, such as online document classification, domain-specific term extraction, text verification, and Internet-based language models for speech recognition. The proposed framework is built on PAT-tree-based working structures, which are motivated by the previous work of Hung in 1996 [Hung’96]. PAT-tree-based data structures are taken as the working structures of the framework in order to manage, organize, and then use Internet resources efficiently and dynamically as corpora for linguistic processing and many related applications. The framework is also intended to serve as a bridge for interdisciplinary interaction among the different NLP fields associated with the Internet.
As for practical applications, a PAT-tree-based classification approach is proposed to serve as a "live" discriminator or learning device for both Chinese and English document classification. Compared with existing statistical approaches, the proposed approach performs at almost the same level without much effort spent on weight tuning. The advantage of the PAT-tree data structure is its dynamic adaptation of variable N-gram feature vectors when an incoming document is dispatched to its most promising category. At the same time, a set of analysis methods, including statistical and morphological rules, is applied to extract new domain-specific terms from the incoming document. The incrementally extracted domain-specific lexicons may be very useful for capturing the dynamic properties of language usage over the Internet and for NLP-related applications. In addition, a PAT-tree-based language model trained on a large, related corpus was built to serve as a text verifier in the post-processing of Chinese OCR-ed documents and of Mandarin speech recognition output. Its performance in OCR-ed text verification was outstanding, even better than that of a commercial product. Finally, a perplexity experiment was performed to analyze the room for improvement between low-level and high-level variable N-gram language models. Furthermore, a distributed client-server architecture is proposed to make language processing more robust in a dynamic network environment, so that speech-interfaced applications can work well over the Internet. As for scalability, the PAT tree can be combined with high-level linguistic information, such as syntactic or semantic information, pointing the way toward a knowledge base. Given the merits of the PAT tree (dynamic adaptation, ease of dissemination, and so on), developing a powerful distributed linguistic processing framework in the foreseeable future looks very promising.
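For readers unfamiliar with the measure, perplexity is conventionally defined as below. The notation ($w_i$ for the $i$-th token, $N$ for the number of tokens in the test text, $h_i$ for the history the model conditions on) is assumed here and is not taken from the thesis.

```latex
\mathrm{PP}(w_1,\dots,w_N)
  = P(w_1,\dots,w_N)^{-1/N}
  = \exp\!\Big(-\frac{1}{N}\sum_{i=1}^{N}\ln P(w_i \mid h_i)\Big)
```

For a fixed-order n-gram model, $h_i$ is the preceding $n-1$ tokens; for a variable N-gram model the length of $h_i$ may differ from position to position. Lower perplexity means the model assigns higher probability to the test text.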
The PAT-tree-based core technology involves a family of five task-oriented tree structures: CPAT (Compact PAT tree), EPAT (English PAT tree), WPAT (Word-based PAT tree), SPAT (Syllable PAT tree), and UPAT (Universal PAT tree). Except for UPAT, all of them are derived from the basic PAT tree data structure, with variations made according to usage and environment. Each type of data structure and its versatility in different applications are introduced in detail in the following chapters. The PAT-tree-based core technology developed in this thesis shows that Internet information management and processing is possible and that the preliminary framework for natural language processing in the Internet era is feasible.
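As a rough illustration of why an index over all suffixes supports N-grams of arbitrary length, the sketch below counts any substring with a binary search over sorted suffixes. It deliberately uses a plain suffix array (the array counterpart of the PAT tree, cf. [Gonnet’92]) rather than a PAT tree, and it is not the thesis's implementation. The function names `build_suffix_array`, `_bound`, and `count_ngram` are hypothetical.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of `text`, in lexicographic order."""
    # Naive construction by sorting whole suffixes; adequate only for
    # a short illustration, not for a large corpus.
    return sorted(range(len(text)), key=lambda i: text[i:])


def _bound(text, sa, pattern, upper):
    """Binary search for the first suffix whose length-n prefix is
    >= pattern (or > pattern when `upper` is true)."""
    n = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + n]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def count_ngram(text, sa, pattern):
    """Number of occurrences of `pattern` (an n-gram of any length) in `text`."""
    return _bound(text, sa, pattern, True) - _bound(text, sa, pattern, False)


text = "abracadabra"
sa = build_suffix_array(text)
for gram in ("a", "bra", "abra", "zz"):
    print(gram, count_ngram(text, sa, gram))
```

Because Python strings are sequences of Unicode characters, the same sketch works unchanged on Chinese text with characters as the unit. A real PAT tree additionally supports incremental insertion and deletion, which a static sorted array does not.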

Chapter 1: Introduction and Background Review
1.1 THE CONSIDERING PROBLEMS
1.1.1 Internet Era
1.1.2 Information Retrieval
1.1.3 Natural Language Processing
1.1.4 Speech Recognition
1.1.5 Interdisciplinary Interactions
1.2 SUMMARY OF THE THESIS
Chapter 2: PAT Tree Data Structure and Its Extensions
2.1 EVOLUTION OF PAT TREE
2.2 PAT TREE DATA STRUCTURE
2.3 DELETION FUNCTION
2.4 PAT TREE FAMILY
Chapter 3: Automatic On-line Term Extraction and Document Classification
3.1 DOCUMENT CLASSIFICATION
3.1.1 Overview of the proposed approach
3.1.2 Data Sets
3.1.3 EPAT Tree Data Structure
3.1.4 Approaches and Experimental Results
3.1.5 Discussions
3.1.6 Conclusion
3.2 AUTOMATIC ON-LINE TERM EXTRACTION
3.2.1 Introduction
3.2.2 Problem Specification and Overview of the Proposed Approach
3.2.3 Corpus storage and classification
3.2.4 Incremental Term Extraction
3.2.5 Discussions
Chapter 4: Text Verification
4.1 OCR-ED TEXT VERIFICATION
4.1.1 Introduction
4.1.2 CPAT Tree Data Structure
4.1.3 Proposed Verification Approach
4.1.4 Experimental Results
4.1.5 System Overview
4.2 MANDARIN DICTATION OUTPUT VERIFICATION
4.2.1 Introduction
4.2.2 SPAT Tree Data Structure
4.2.3 Proposed Verification Approach
4.2.4 Experimental Results
4.2.5 Discussions
Chapter 5: Speech Recognition Applications
5.1 STATISTICAL N-GRAM LANGUAGE MODELING
5.2 PERPLEXITY EXPERIMENTS
5.2.1 Experimental Environment
5.2.2 Experimental Results
5.3 SEARCH STRATEGY
5.3.2 Lattice Search Using Multiple Stacks
5.3.3 Performance Evaluation
5.3.4 Advanced Speed-up Approaches
5.4 DISTRIBUTED CLIENT-SERVER ARCHITECTURE FOR SPEECH RECOGNITION APPLICATIONS OVER THE INTERNET
5.5 CONCLUSION AND DISCUSSIONS
Chapter 6: Conclusion and Prospections
6.1 INTEGRATED FRAMEWORK FOR NLP APPLICATIONS OVER THE INTERNET
6.2 TOWARD THE UNIVERSAL PAT TREE
6.3 CONCLUDING REMARKS
Bibliography

[Ahmad’87] Ahmad Khurshid, Rogers Margaret and Thomas Patricia, “Term Banks: A Case Study in Knowledge Representation and Development”, in Terminology and Knowledge Engineering: Selected Papers from the First International Congress on Terminology and Knowledge Engineering, 1987.
[Ahmad’95a] Ahmad, K., “Research Issues in Terminology” (A deliverable of the EC-sponsored POINTER Project.) 28 pages, 1995.
[Ahmad’95b] Ahmad, K. Keynote Address — “Workbenches and the Engineering of Special Languages” In Terminology in Advanced Microcomputer Applications: Proc. of the 3rd TermNet Symposium; Recent Advances and User Reports. TermNet: Vienna, Austria. pp.7-52. (ISBN 3-901010-12-2),1995.
[Bai’98a] Bo-Ren Bai, “Speech/Text Information Retrieval with Speech Queries for Mandarin Chinese”, Ph.D. dissertation, Department of Electrical Engineering, National Taiwan University, 1998.
[Bai’98b] Bo-Ren Bai, Chun-Liang Chen, Lee-Feng Chien, Lin-Shan Lee, "Intelligent Retrieval of Dynamic Networked Information from Mobile Terminals Using Spoken Natural Language Queries", IEEE Transactions on Consumer Electronics, Vol. 44, No. 1, pp. 62-72, February 1998.
[Barbosa’95] E. F. Barbosa, G. Navarro, R. Baeza-Yates, C. Perleberg, and N. Ziviani, “Optimized binary search and text retrieval”, in Algorithms - ESA’95, Third Annual European Symposium, pp. 311-326, Greece, September 1995.
[Cavnar’94] Cavnar, William B., and Trenkle, John M., "N-Gram-Based Text Categorization", the Proceedings of the 1994 Symposium On Document Analysis and Information Retrieval, University of Nevada, Las Vegas, April 1994.
[Chang’94] Chao-Huang Chang “A Pilot Study on Automatic Chinese Spelling Error Correction”, Communication of COLIPS,Vol 4, No 2, pp.143-149, 1994.
[Charniak’93] Eugene Charniak, “Statistical Language Learning”, The MIT Press, 1993.
[Chen’96] Kuang-hua Chen, "Natural Language Processing for Information Retrieval" Bulletin of the Library Association of China, No. 57, pp. 141-153, Dec. 1996.
[Chen’97] Chen, A. et al., “Chinese Text Retrieval without using a Dictionary”, Proceedings of ACM SIGIR’97, pp. 42-49. 1997.
[Chen’98a] Keh-Jiann Chen, Wen Tsuei, Lee-Feng Chien, “PAT-Trees with the Deletion Function as the Learning Device for Linguistic Patterns”, 17th International Conference on Computational Linguistics, COLING’98, 1998.
[Chen’98b] Chen, Chun-Liang, Bai, Bo-Ren, et al., “PAT-tree-based Language Modeling with Initial Application of Chinese Speech Recognition Output Verification”, Proceedings of the 1998 International Symposium on Chinese Spoken Language Processing (ISCSLP), Best Student Paper Award, pp. 139-144, Singapore, 1998.
[Chen’98c] Z. C. Chen and W. L. Xu, “結合統計語規則的多層次中文斷詞系統”, pp. 63-72, ROCLING XI, 1998.
[Chen’98d] C. L. Chen, B. R. Bai, L. F Chien and L. S. Lee, "CPAT-Tree-Based Language Models with an Application for Text Verification in Chinese", ROCLING XI, pp. 189-203, 1998.
[Chien’96] Lee-Feng Chien and H. T. Pu, “Important Issues on Chinese Information Retrieval”, Computational Linguistics and Chinese Language Processing, Vol. 1, no.1, pp. 205-221, August 1996.
[Chien’97a] Lee-Feng Chien et al., “PAT-Tree-Based Keyword Extraction for Chinese Information Retrieval”, Proceedings of ACM SIGIR’97, Philadelphia, USA, pp. 50-58, 1997.
[Chien’97b] Lee-Feng Chien et al., “Internet Chinese Information Retrieval Using Unconstrained Mandarin Speech Queries Based on a Client-Server Architecture and a PAT-tree-based Language Model”, Vol. 2, pp.1155-1158, ICASSP’97, 1997.
[Chien’99] Lee-Feng Chien et al., “PAT-Tree-Based Adaptive Keyphrase Extraction for Intelligent Chinese Information Retrieval”, to appear on Information Processing and Management, Elsevier Press, 1999.
[Church’90] Church, K. & Hanks, P. “Word Association Norms, Mutual Information and Lexicography”, Computational Linguistics, 16/1, 1990.
[Clark’96] Clark, D.R., and Munro, J. I. “Efficient Suffix Trees on Secondary Storage”, ACM-SIAM Symposium on Discrete Algorithm, 1996.
[Coffman’98] K. G. Coffman and A. M. Odlyzko, “The Size and Growth of the Internet”, First Monday 3(10) (October 1998), http://www.firstmonday.dk/
[Craven’98] M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery, “Learning to Extract Symbolic Knowledge from the World Wide Web”, Technical report CMU-CS-98-122 (extended version of AAAI-98 paper). September 1998.
[Flajolet’86] Flajolet, P. and R. Sedgewick, “Digital search trees revisited”, SIAM J. Computing, 15: 748-767, 1986.
[Frakes’92] Frakes, W. B. and Baeza-Yates, R. (Eds.) “Information Retrieval: Data Structures and Algorithms”, Englewood Cliffs, NJ: Prentice Hall, 1992.
[George’97] George Demetriou et al., "Large Scale Lexical Semantics for Speech Recognition Support", EuroSpeech’97, Vol. 5, pp. 2755 - 2758,1997.
[Gibbon’97] Gibbon et al., “Eaglet:eagles termbank for spoken language systems”. DRAFT EAGLES Interim Deliverable, University of Bielefeld, 1997.
[Gonnet’92] Gonnet, G. H., Baeza-Yates, R. et al., “New Indices for Text: PAT Trees and PAT Arrays”, in Information Retrieval: Data Structures & Algorithms, pp. 66-82, Prentice Hall, 1992.
[Good’53] I.J. Good, “The population frequencies of species and the estimation of population parameters”, Biometrika, 40(3 and 4):237-264, 1953.
[Ho’98] T.H. Ho, K.C. Yang, K.H. Huang, L.S. Lee, "Improved Search Strategy for large Vocabulary Continuous Mandarin Speech Recognition", ICASSP’98, pp.825-828, 1998.
[Huang’96] Chu-Ren Huang and Keh-Jiann Chen, “Issues and Topics in Chinese Natural Language Processing”, Journal of Chinese Linguistics, monograph series No.9, pp.1-22, 1996.
[Hung’96] Hung J. C., “Dynamic Language Modeling for Mandarin Speech Retrieval for Home Page Information”, Master's thesis, Department of Computer Science, National Taiwan University, 1996.
[Jelinek’80] F. Jelinek and Robert L. Mercer, “Interpolated estimation of Markov source parameters from sparse data”, in Proceedings of the Workshop on Pattern Recognition in Practice, Amsterdam, The Netherlands: North-Holland, May 1980.
[Johannes’98] Johannes Fürnkranz, Tom Mitchell, and Ellen Riloff, “A Case Study in Using Linguistic Phrases for Text Categorization on the WWW”, in M. Sahami (ed.), Learning for Text Categorization: Papers from the 1998 AAAI/ICML Workshop, pp. 5-13, Madison, WI, AAAI Press, 1998.
[Kaki’98] S. Kaki, E. Sumita and H. Iida, "A Method for Correcting Errors in Speech Recognition Using the Statistical Features of Character Co-occurrence", COLING’98, pp.653-657, 1998.
[Karen’94a] Karen Sparck Jones, “Natural Language Processing: a Historic Review”, in Current Issues in Computational Linguistics: in Honour of Don Walker, 1994.
[Karen’94b] Karen Sparck Jones, “Natural Language Processing: She Needs Something Old and Something New (Maybe Something Borrowed and Something Blue, too)”,
[Karen’97] Karen Sparck Jones and Peter Willett, “Readings in Information Retrieval”, CA: Morgan Kaufmann Publishers, 1997.
[Katz’87] S.M. Katz, “Estimation of probabilities from sparse data for the language model component of a speech recognizer”, IEEE Transactions on Acoustics, Speech and Signal Processing, 35(3):400-401, 1987.
[Lee’97] Lin-shan Lee, “Voice Dictation of Mandarin Chinese”, IEEE Signal Processing Magazine, Vol.14, No.4, pp.63-101, July 1997.
[Lee’98a] Lin-Shan Lee, “Spoken Language Processing for Mandarin Chinese- Present and Future”, invited paper, 1998 Symposium on Image, Speech, Signal Processing and Robotics, Hong Kong, pp.II-229-II-234, Sept 1998.
[Lee’98b] Lin-Shan Lee, “Structural Features of Chinese Language — Why Chinese Spoken Language Processing is Special and Where We Are”, keynote Speech, 1998 International Symposium on Chinese Spoken Language Processing, Singapore, pp.1-15, Dec 1998.
[Lewis’96] Lewis, David D. and Sparck Jones, Karen, “Natural Language Processing for Information Retrieval”, Communications of the ACM, Vol. 39, No. 1, pp. 92-101, Jan. 1996.
[Lin’97] Chase Lin, "Blame Assignment for Errors Made by Large Vocabulary Speech Recognizer", EuroSpeech’97, Vol. 2, pp. 815 - 818,1997.
[Liu’97] Yuhsiang Liu, Zhili Guo, Chiching Hsu, Shauyi He and Naipo Lee, “Checking Chinese Text Errors in the Unicode Environment”, 11th International Unicode Conference and Global Computing Showcase, San Jose, CA., Sep. 1997.
[Lochovsky’97] A.F. Lochovsky and K.H. Chung, "Homonym Resolution for Chinese Phonetic Input", Communications of COLIPS, Vol. 7, No. 1, pp.5-15, JUN 1997.
[MEPG’90] Coding of moving pictures and associated audio. Committee Draft of Standard ISO11172: ISO/MPEG/90/176, Dec. 1990.
[Milind’99] Milind Mahajan, Doug Beeferman and X. D. Huang, “Improved Topic-Dependent Language Modeling Using Information Retrieval Techniques”, ICASSP’99, pp. 541-544,1999.
[Mladenic’98] Mladenic, D., Grobelnik, M. “Word sequences as features in text-learning” Proceedings of the Seventh Electrotechnical and Computer Sc. Conference ERK'98 Ljubljana, Slovenia: IEEE section, 1998.
[Morrison’68] Morrison, D., “PATRICIA: Practical Algorithm to Retrieve Information Coded in Alphanumeric”, JACM, pp. 514-534, 1968.
[Nagao’94] Nagao, M. and Mori, S., “A New Method of n-gram statistics for large number of n and automatic extraction of words and phrases from large text of Japanese”, Proceedings of COLING-94, pp. 611-615, 1994.
[Paolo’96] Paolo Ferragina and Roberto Grossi “Fast String Searching in Secondary Storage: Theoretical Developments and Experimental Results”, ACM-SIAM Symposium on Discrete Algorithm, 1996.
[Pierre’97] Pierre Dupont and Ronald Rosenfeld, “Lattice based language models”, Tech. report CMU-CS-97-173, September 1997.
[Rabiner’93] Lawrence Rabiner and Biing-Hwang Juang, “Fundamentals of Speech Recognition”, Prentice-Hall Inc., 1993.
[Ricardo’92] A. Baeza-Yates, “Text Retrieval: Theory and Practice”, Information Processing 92, Vol. I, pp. 465-483,1992.
[Ringger’96] E.K. Ringger and J.F. Allen, "A Fertility Channel Model for Post-Correction of Continuous Speech Recognition", ICSLP'96, Vol.2 pp.524-527, 1996.
[Ringger’97] E.K. Ringger and J.F. Allen, "Robust Error Correction of Continuous Speech Recognition", Proceedings of the ESCA-NATO Workshop on Robust speech Recognition for Unknown Communication Channels, 1997.
[Ronald’96] Ronald A. Cole et al., “Survey of the State of the Art in Human Language Technology”, sponsored by the National Science Foundation and the European Commission, 1996.
[Salton’83] Salton and McGill, “Introduction to Modern Information Retrieval”, McGraw-Hill, NY, USA, 1983.
[Salton’89] Salton, G. “Automatic Text Processing: the Transformation, Analysis and Retrieval of Information by Computer”, Addison-Wesley Publishing Company, Inc. USA. 1989.
[Sato’97] T. Sato, “Fast Full Text Retrieval Using Gram Based Tree Structure”, proceedings of 17-th International Conference on Computer Processing of Oriental Languages, pp.572-577, ICCPOL’97.
[Schutze’98] Schutze, Hinrich, “The Hypertext Concordance: A Better Back-of-the-Book Index”, Proceedings of the First Workshop on Computational Terminology (Computerm’98), pp. 101-104, 1998.
[Shi’92] Shi et al., “A Statistical Method for Locating Typo in Chinese Sentence”, Computer and Telecommunication, pp. 19-26, August 1992.
[Shishibori’97] M. Shishibori, K. Morita, K. Ando and J-I. Aoe, “The Design of a Compact Data Structure for Binary Tries”, proceedings of 17-th International Conference on Computer Processing of Oriental Languages, pp.606-611, ICCPOL’97.
[Smadja’93] Smadja, F., “Retrieving Collocations from Text: Xtract”, Computational Linguistics, 19 (1), pp. 143-177, 1993.
[Stanley’96] Stanley F. Chen, “Building Probabilistic Models for Natural Language”, Ph. D Dissertation, Harvard University, May, 1996.
[Su’96] Keh-Yih Su, Tung-Hui Chiang, Jing-Shin Chang, “An Overview of Corpus-Based Statistics-Oriented (CBSO) Techniques for Natural Language Processing”, Computational Linguistics and Chinese Language Processing, Vol. 1, no.1, pp. 101-157, August 1996.
[Sun’98] Sun Maosong, Shen Dayang, Benjamin K. Tsou, “Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data”, COLING’98, pp.1265-1271,1998.
[Wan’97] Wan, T. L., Evens, M. et al., “Experiments with Automatic Indexing and a Relational Thesaurus in a Chinese Information Retrieval System”, Journal of the American Society for Information Science, 48(12), pp. 1068-1096, 1997.
[Weng’98] Weng F. L, Stolcke, A., and Sankar, A. (1998). “Efficient Lattice Representation and Generation”, In the Proceedings of International Conference on Spoken Language Processing, Sydney, Australia, Nov. 30 - Dec. 4, 1998.
[Wong’98] Wong, K.F., Li, W., "Intelligent Chinese Information Retrieval -- Why Is It So Difficult?", In Proceedings of 1st Asian Digital Library Workshop, Hong Kong, August 6-7, 1998.
[Wu’95] Wu, Z., Tseng, G., “ACTS: An Automatic Chinese Text Segmentation System for Full Text Retrieval”. Journal of the American Society for Information Science, 46 (2), pp. 83-96, 1995.
[Xia’96] Y. Xia, X. G. Chang, S. P. Ma, X. Y. Zhu and Y. J. Jin “Co-occurrence Probability Between Chinese Characters” Communications of COLIPS, Vol. 6, No.1, pp.19-23, JUNE 1996.
[Yahoo] Yahoo search engine, http://www.yahoo.com
[Yamamoto’98] Yamamoto, M. and Church, K., “Using Suffix Arrays to Compute Term Frequency and Document Frequency for All Substrings in a Corpus”, Proceedings of the Sixth Workshop on Very Large Corpora, Montreal, pp. 28-37, 1998.
[Yang’97] Yang, Y., Pedersen J.P. “A Comparative Study on Feature Selection in Text Categorization”, Proceedings of the Fourteenth International Conference on Machine Learning (ICML'97), 1997.
[Yang’98] Kae-Cherng Yang, “Further Studies for Practical Chinese Language Modeling”, Master's thesis, Department of Electrical Engineering, National Taiwan University, 1998.
[Yong’ 97] Steve Young and Gerrit Bloothooft, “Corpus-Based Methods in Language and Speech Processing”, KLUWER ACADEMIC PUBLISHERS, 1997.
[Zamir’98] Zamir, Oren and Etzioni Oren, “Web Document Clustering: A Feasibility Demonstration”, Proceedings of SIGIR’98, pp. 46-53, 1998.
[Zernik’91] Zernik, Uri, “Lexical Acquisition: Exploiting On-line Resources to Build a Lexicon”, Lawrence Erlbaum Associates, Publishers, 1991.
