[1] Bergroth, L., H. Hakonen and T. Raita, “A Survey of Longest Common Subsequence Algorithms,” Proceedings of the 7th International Symposium on String Processing Information Retrieval (SPIRE), 2000, pp.39-48
[2] Chien, L. F., “PAT-Tree-Based Keyword Extraction for Chinese Information Retrieval,” Proceedings of the 1997 ACM SIGIR, pp.50-58.
[3] Dai, Y., Loh, T. E. and Khoo, C. S. G., “A New Statistical Formula For Chinese Text Segmentation Incorporating Contextual Information,” Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, 1999, pp.82-89.
[4] Dey, D., S. Sarkar, and P. De, “A Distance-Based Approach to Entity Reconciliation in Heterogeneous Databases,” IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 3, 2002, pp.567-582.
[5] Domingo, J. and V. Torra, “Distance-based and Probabilistic Record Linkage for Re-identification of Records with Categorical Variables,” Butlleti de l'Associacio Catalana d'Intelligencia Artificial, No. 28, Fall 2002, pp.243-250.
[6] F. Beil, M. Ester, and X. Xu, “Frequent Term-based Text Clustering,” In Proc. 8th Int. Conf. on Knowledge Discovery and Data Mining (KDD'2002), Edmonton, Alberta, Canada, 2002. http://www.cs.sfu.ca/~ester/publications.html.
[7] Fox, C., “A Stop List for General Text,” ACM SIGIR Forum, Vol. 24, No. 1-2, 1989, pp.19-35.
[8] Frank Tseng S.C. and Chou A.Y.H., “The Concept of Document Warehousing for Content Management of Enterprise Business Intelligence,” Decision Support Systems, 2006, Vol. 42, pp.727-744.
[9] Fung, BCM, Wang, K, Ester M. “Hierarchical Document Clustering Using Frequent Itemsets,” Proceedings of the 2003 SIAM Intl. Conf. on Data Mining (SIAM'03).
[10] Gravano, L., Panagiotis G. Ipeirotis, H.V. Jagadish, Nick Koudas, S. Muthukrishnan, and Divesh Srivastava, “Approximate String Joins in a Database (Almost) for Free,” Proceedings of the 27th International Conference on Very Large Databases (VLDB 2001), 2001, pp.491-500.
[11] Gravano, L., Panagiotis G. Ipeirotis, Nick Koudas and Divesh Srivastava, “Text Joins in an RDBMS for Web Data Integration,” Proceedings of the 12th international conference on World Wide Web, 2003, pp.90-101.
[12] Guan, Y., A. Ghorbani and N. Belacel, “Y-means: a Clustering Method for Intrusion Detection,” Technical Report, National Research Council of Canada, 2003, pp.1083-1086.
[13] Han, H., X. L. Lu, J. Lu, C. Bo and R. L. Yong, “Data Mining Aided Signature Discovery in Network-based Intrusion Detection System,” ACM SIGOPS Operating Systems Review, Vol. 36, Issue 4, pp. 7-13, 200.
[14] Han, Jiawei and Micheline Kamber, Data Mining: Concepts and Techniques, San Francisco: Morgan Kaufmann, 2003.
[15] Hernandez, M. A., and S. J. Stolfo, “The Merge/Purge Problem for Large Databases,” ACM SIGMOD Record, Vol. 24, No. 2, 1995, pp. 127-138.
[16] Jain, A. K., M. N. Murty and P. J. Flynn, “Data Clustering: a Review,” ACM Computing Surveys, Vol. 31, 1999, pp. 264-323.
[17] Jardine, N. and van Rijsbergen, C.J., “The use of hierarchical clustering in information retrieval,” Information Storage and Retrieval, Vol. 7, 1971, pp. 217-240.
[18] Johnson, R. A. and D. W. Wichern, Applied Multivariate Statistical Analysis, New Jersey: Prentice-Hall, 1998.
[19] Jude Carroll, “A Handbook for Deterring Plagiarism in Higher Education,” Oxford: The Oxford Centre for Staff and Learning Development (2002), pp. 96, ISBN 1–873576–56–0.
[20] Kang, In-Ho and GilChang Kim, “Query Type Classification for Web Document Retrieval,” Proceedings of the 26th annual international ACM SIGIR conference, July 28-August 1, 2003, pp.64-71.
[21] Kim, D. J., Y. W. Park and D. J. Park, “A Novel Validity Index for Determination of The Optimal Number of Clusters,” IEICE Transactions on Information and Systems Society, Vol. E84-D, No. 2, 2001, pp.281-285.
[22] Laan, M. J. and Pollard K. S., “A New Algorithm for Hybrid Hierarchical Clustering with Visualization and The Bootstrap,” Journal of Statistical Planning and Inference, Vol. 117, No.2, 2003, pp.275-303.
[23] Lee, W., J. Stolfo and Mok K. W., “A Data Mining Framework for Building Intrusion Detection Models,” Proceedings of the IEEE Symposium on Security and Privacy, 1999, pp.120-132.
[24] Liu, Y., Liu Qun, Zhang Xiang and Chang Baobao, “A Hybrid Approach to Chinese-English Machine Translation,” Proceedings. Int. Conference. Intelligent Processing Systems, 1997, pp.1146-1150.
[25] Mannila, H., Toivonen H. and Verkamo A. I., “Discovery of Frequent Episodes in Event Sequences,” Data Mining and Knowledge Discovery, Vol.1, No.3, 1997, pp.259-289.
[26] Mannila, H. and Toivonen H., “Discovering Generalized Episodes using Minimal Occurrences,” Proceedings of the Second Int’l Conference on Knowledge Discovery and Data Mining, 1996, pp.146-151.
[27] MetaTexis, http://www.metatexis.net/.
[28] Morrison, D. R., “PATRICIA- Practical Algorithm to Retrieve Information Coded in Alphanumeric,” Journal of the ACM, Vol.15, No.4, Oct 1968, pp.514-534.
[29] Nie, J. Y., Brisebois, M. and Ren, X., “On Chinese text retrieval,” Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 1996, pp.225-233.
[30] Oh, S. J. and Kim J. Y., “A Hierarchical Clustering Algorithm for Categorical Sequence Data,” Information Processing Letters, Vol. 91, No. 3, 2004, pp.135-140.
[31] Omega, T, http://www.omegat.org/omegat/omegat_en/omegat.html.
[32] Palmer, D. D. and Hearst M. A., “Adaptive Multilingual Sentence Boundary Disambiguation,” Computational Linguistics, 23/3, 1997, pp.241-267.
[33] Plagiarism.org, http://www.plagiarism.org/.
[34] Plagiarism Tools, http://www.shambles.net/pages/staff/ptools/.
[35] Popesuc, A. R., “Implementation of Term Weighting in a Simple IR System,” Kursprojekt, June 2001.
[36] Porter, M. F., “An Algorithm for Suffix Stripping,” Program, Vol.14, No.3, 1980, pp. 130-137.
[37] Porter, E. H. and Winkler, W. E. Approximate String Comparison and Its Effect on an Advanced Record Linkage System, US Bureau of the Census, 1997.
[38] Reynar, J. C. and Ratnaparkhi, A., “A Maximum Entropy Approach to Identifying Sentence Boundaries,” Proceedings of the Fifth ACL Conference on Applied Natural Language Processing (ANLP'97), 1997, pp.16-19.
[39] Riley, M. D., “Some Applications of Tree-based Modeling to Speech and Language Indexing,” Proceedings of the DARPA Speech and Natural Language Workshop, 1989, pp.339-352.
[40] Salton, G. and McGill M. J., Introduction to Modern Information Retrieval, New York: McGraw-Hill Company, 1983.
[41] Salton, G., Automatic Text Processing: the Transformation, Analysis, and Retrieval of Information by Computer, Mass.Wokingham: Addison-Wesley Publishing Company, 1988.
[42] Salton, G., Singhal, A., Mitra and Buckley, C., “Automatic Text Structuring and Summarization,” Information Processing and Management, Vol.33, 1997, pp.193-204.
[43] Shannon, C. E., “Prediction and Entropy of Printed English,” Bell System Technical Journal, 1951, pp. 50-64.
[44] Swan, R. and J. Allan, “Automatic generation of overview timelines,” Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, Athens, Greece, 2000.
[45] Trados Translators Workbench, http://www.trados.com/Default.asp/.
[46] Vijaya, P. A., M. Murty and D. K. Subramanian, “Leaders–Subleaders: An Efficient Hierarchical Clustering Algorithm for Large Data Sets,” Pattern Recognition Letters, Vol. 25, No. 4, 2004, pp. 505-513.
[47] Webb, Lynn E., “Advantages and Disadvantages of Translation memory: A Cost/Benefit Analysis,” A thesis of MA in Translation of German Graduate Division California: Monterey Institute of International Studies, 1999.
[48] Wong, K. F. and Li, W. J., “Intelligent Chinese Information Retrieval: Why is it so Difficult?” Proceedings of the First Asia Digital Library Workshop, 1998, pp. 47-56.
[49] Wordfast, http://www.wordfast.net/.
[50] Xie, X. L. and G. Beni, “A Validity Measure for Fuzzy Clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 8, 1991, pp. 841-847.
[51] Yancey, W. E., “An Adaptive String Comparator for Record Linkage,” Statistical Research Division U.S. Bureau of the Census Washington D.C., 2003, pp.1-22.
[52] Yang, Y. and Pedersen, J. O., “A Comparative Study on Feature Selection in Text Categorization,” International Conference on Machine Learning, 1997, pp.412-420.
[53] Yeh, C. L. and Lee, H. J., “Rule-Based Word Identification for Mandarin Chinese Sentences-A Unification Approach,” Computer Processing of Chinese and Oriental Languages, Vol. 5, No. 2, March 1991, pp. 97-118.
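[54] Chinese Knowledge and Information Processing (CKIP) Group, Institute of Information Science, Academia Sinica, http://rocling.iis.sinuca.edu.tw/CKIP/
[55] 陳振南 and 吳毓傑, “A Comparison of Feature Selection and Weight Assignment in Chinese News Classification,” Proceedings of the 13th International Conference on Information Management, Tamkang University, 2002, pp. 721-728.
[56] 陳淑美, “A Study on Automatic Classification of Financial News,” Master's thesis, Graduate Institute of Library Science, National Taiwan University, 1992.
[57] 曹偉駿, 楊景隆 and 劉經緯, “Analysis and Implementation of Intrusion Behavior Using a Network Honeypot System,” Conference on E-Commerce and Digital Life, 2005.
[58] 曾元顯, “Automatic Keyword Extraction Techniques and Related-Term Feedback,” Bulletin of the Library Association of China, No. 59, November 1997.
[59] 曾元顯, “A Discussion of Automatic Keyword Extraction Techniques,” Newsletter of the Library Association of China, No. 106, September 1997.
[60] 曾守正 and 曾有德, “Building a Document Warehouse through a Conference Management System to Realize a Knowledge Management Platform,” Proceedings of the 2nd Conference on Digital Content Management and Applications (DCMA 2007), hosted by National University of Tainan, June 1-2, 2007.
[61] 陳永德, “The Use of Long-Word-First, Word-Frequency-Contrast, and Preceding-Word-First Rules in Chinese Word Segmentation,” Doctoral dissertation, Graduate Institute of Psychology, National Taiwan University, 1997.
[62] 蔡嘉嘉 and 曾守正, “Fuzzy-Based Multi-Categorization of Chinese Documents,” Journal of Information Management, Vol. 12, No. 4, 2005, pp. 75-106.
[63] Wikipedia (Chinese edition), http://zh.wikipedia.org/wiki/
[64] 魏玲玉 and 曾守正, “Dynamic Clustering and Multi-Document Summarization Based on the Document Warehousing Concept: A Case Study of Chinese Electronic News,” Journal of Information Management, 2005.
[65] 謝清俊, 陳淑美, 楊允言 and 陳克健, “Auto classification of Texts,” Workshop on Conducting Research with Large Corpora, hosted by the Computational Linguistics Society, 1992.
[66] 顧皓光, “Automatic Classification of Web Documents,” Master's thesis, Graduate Institute of Information Management, National Taiwan University, 1996.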