National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Graduate Student: 楊顓溥
Graduate Student (English): Chuan-Pu Yang
Thesis Title: 中文縮寫詞研究
Thesis Title (English): A Study of Chinese Abbreviations
Advisor: 黃純敏
Advisor (English): Chuen-Min Huang
Degree: Master's
Institution: 國立雲林科技大學 (National Yunlin University of Science and Technology)
Department: 資訊管理系碩士班 (Graduate Institute of Information Management)
Discipline: Computer Science
Academic Field: General Computer Science
Thesis Type: Academic thesis
Year of Publication: 2005
Graduation Academic Year: 93 (2004-2005)
Language: English
Pages: 53
Keywords (Chinese): 特徵選取, 最大熵, 中文縮寫詞, 最長共同子序列
Keywords (English): Longest Common Subsequence, Maximum Entropy Principle, Chinese abbreviation, Feature Selection
Usage statistics:
  • Cited by: 3
  • Views: 1077
  • Downloads: 59
  • Bookmarked: 0
In Chinese documents, words frequently appear in abbreviated form; for example, 「台灣鐵路局」 (Taiwan Railway Administration) is shortened to 「台鐵局」. This high degree of "abbreviability" saves time and adds convenience, but it also poses challenges for Chinese text processing. In a keyword-based information retrieval system, the abbreviated form and the original form of a query term are treated by the search engine as two different words, so the returned results miss a great deal of relevant information. Abbreviations likewise degrade system performance in Chinese word segmentation, automatic document clustering, and term-weight computation.

To address these problems, this study proposes a mechanism for mapping between Chinese abbreviations and their original forms. The mechanism links each abbreviation in a document to its corresponding original form without relying on any fixed dictionary; in effect, it uses the corpus to build a dynamic abbreviation lookup table, and it can easily be ported to other languages. A sketch of the core matching idea follows.
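The keyword list above names the longest common subsequence (LCS), which the thesis reviews in Chapter 2 as a sequence-similarity measure. Below is a minimal Python sketch, not the thesis's actual implementation, of how an LCS check can screen whether a full form is a plausible expansion of an abbreviation; the function names `lcs_length` and `is_plausible_pair` are illustrative, and only the example pair comes from the abstract.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b,
    computed with the standard O(len(a)*len(b)) dynamic program."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


def is_plausible_pair(abbrev: str, full: str) -> bool:
    """A full form is a plausible expansion when every character of the
    abbreviation occurs in it in order, i.e. the LCS of the two strings
    is the abbreviation itself."""
    return lcs_length(abbrev, full) == len(abbrev)


# The example from the abstract: '台鐵局' abbreviates '台灣鐵路局'.
print(is_plausible_pair('台鐵局', '台灣鐵路局'))  # True
print(is_plausible_pair('台鐵局', '高速公路局'))  # False: '台' and '鐵' are absent
```

A check like this only narrows the candidate set; choosing the single best candidate is the job of the maximum entropy model described next.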

In this thesis, we design several experiments over a corpus of 8,500 electronic news documents. The work has two major parts: selecting the best candidate terms, and extracting correct abbreviation/original-form pairs. Each experiment runs as a dual process, mapping original forms to abbreviations and abbreviations to original forms. For best-candidate selection, we extract contextual information to train a maximum entropy model; the resulting selection precision averages 80%-90%. For the mapping experiments, precision reaches up to 70% in the abbreviation-to-original-form direction and up to 80% in the original-form-to-abbreviation direction.
Abbreviation is common in Chinese text: for instance, '台灣鐵路局' is often shortened to '台鐵局'. The transformation is time-saving and convenient, but this merit also brings challenges for Chinese text processing. In a keyword-based information retrieval system, using the abbreviated form and the original form as query terms usually returns different results, even though the two forms share the same meaning. Abbreviation also has a marked influence on Chinese word segmentation, automatic document clustering, and term weighting.

To resolve this ambiguity, we propose an approach that connects the two forms and constructs an abbreviation list automatically from a corpus, without any fixed dictionary.

In this study, we conduct three major experiments on 8,500 documents collected from a news website. Each experiment is a dual process, running from original form to abbreviated form and back. In the first experiment, we employ a maximum entropy model that draws on many contextual features to locate the best candidate. In the second experiment, we recover original forms from their abbreviations; the third finds abbreviations from their original forms. The precision ratios reach 80%-90%, 70%, and 80%, respectively.
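The abstract's candidate-selection step trains a maximum entropy model on contextual features; the actual feature templates are given in Section 3.5.2 of the thesis. As a rough sketch of the idea only, the snippet below uses scikit-learn's logistic regression (equivalent to a conditional maximum entropy classifier) in place of the thesis's trainer, and the feature names, context words, and training pairs are all made up for illustration.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def context_features(abbrev, candidate, left_word, right_word):
    """Hypothetical contextual features for one (abbreviation, candidate)
    pair; the thesis's real feature templates are in Section 3.5.2."""
    return {
        'len_ratio': len(abbrev) / len(candidate),
        'first_char_match': int(abbrev[0] == candidate[0]),
        'left=' + left_word: 1,   # word to the left of the occurrence
        'right=' + right_word: 1, # word to the right of the occurrence
    }

# Toy training set: label 1 marks the correct original form.
X = [context_features('台鐵局', '台灣鐵路局', '搭乘', '列車'),
     context_features('台鐵局', '台北監理局', '前往', '辦理')]
y = [1, 0]

vec = DictVectorizer()
model = LogisticRegression(max_iter=1000)  # multinomial logit == conditional maxent
model.fit(vec.fit_transform(X), y)

# Score a new candidate in a new context; the most probable class wins.
test = context_features('台鐵局', '台灣鐵路局', '昨日', '宣布')
print(model.predict_proba(vec.transform([test])))
```

Under this scheme, every surviving candidate is scored with a probability conditioned on its context, and the highest-probability candidate is selected, mirroring the step the abstract reports at 80%-90% precision.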
Table of Contents
Table Index VII
Figure Index IX
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 2
1.3 Research Limitation 2
1.4 Research Contribution 2
1.5 Research Framework 3
Chapter 2 Literature Review 4
2.1 Chinese word segmentation 4
2.2 Sequence Similarity 4
2.2.1 Longest Common Subsequence, LCS 4
2.2.2 Longest Common Consecutive Subsequence, LCCS 7
2.3 Expansion of Abbreviations 7
2.4 Principle of Maximum Entropy 8
2.4.1 Overview 8
2.4.2 Maximum Entropy for NLP 9
Chapter 3 Experiments 13
3.1 Overview 13
3.2 Part-of-Speech Tag and Chinese Word Segmentation 14
3.3 Extract Possible Abbreviations and Original Forms 14
3.4 Find the Candidates of Abbreviations and Original Forms 15
3.5 Choosing the Best Candidates of Abbreviations and of Original Forms 16
3.5.1 Procedure 17
3.5.2 Feature Template 18
3.5.3 Extract Contextual Information 20
Chapter 4 Experiments Design and Results 22
4.1 Evaluation Criteria 22
4.2 Resource 23
4.3 Results and Analysis 23
4.3.1 Choosing the Best Candidate 23
4.3.1.1 Choosing the Best Candidate of Original Forms 24
4.3.1.2 Choosing the Best Candidate of Abbreviation 28
4.3.1.3 Summary 31
4.3.2 Finding Corresponding Original forms of Abbreviations 32
4.3.2.1 Summary 36
4.3.3 Finding Corresponding Abbreviations of Original forms 36
4.3.3.1 Summary 38
Chapter 5 Conclusion 40
5.1 Research Contributions 40
5.2 Future Works 41
Reference 42
Appendix 44
1. Terada, A., & Tokunaga, T. (2001). Automatic disabbreviation by using context information. In Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium Workshop on Automatic Paraphrasing: Theories and Applications, 21-28.
2. Berger, A., Della Pietra, S., & Della Pietra, V. (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22(1), 39-71.
3. Borthwick, A., Sterling, J., Agichtein, E., & Grishman, R. (1998). Exploiting diverse knowledge sources via maximum entropy in named entity recognition. In Proceedings of the Sixth Workshop on Very Large Corpora.
4. Brown, P., Della Pietra, S., Della Pietra, V., & Mercer, R. (1991). A statistical approach to sense disambiguation in machine translation. In Proceedings of the DARPA Workshop on Speech and Natural Language.
5. Darroch, J. N., & Ratcliff, D. (1972). Generalized iterative scaling for log-linear models. The Annals of Mathematical Statistics, 43(5), 1470-1480.
6. Elmi, M. A., & Evens, M. (1998). Spelling correction using context.
7. Greiff, W. R., & Ponte, J. M. (2000). The maximum entropy approach and probabilistic IR models. ACM Transactions on Information Systems, 18, 246-287.
8. Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review, 106(4), 620-630.
9. Kehler, A. (1997). Probabilistic coreference in information extraction. In Proceedings of the Second Conference on Empirical Methods in Natural Language Processing.
10. Larkey, L., Ogilvie, P., Price, A., & Tamilio, B. (2000). Acrophile: An automated acronym extractor and server. In Proceedings of the ACM Digital Libraries Conference, 205-214.
11. Lin, H., & Yuan, C. F. (2002). Chinese part-of-speech tagging based on maximum entropy method. In Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing.
12. Park, Y., & Byrd, R. J. (2001). Hybrid text mining for finding abbreviations and their definitions. In Proceedings of EMNLP 2001.
13. Pavlov, D. (2003). Sequence modeling with mixtures of conditional maximum entropy distributions. In Proceedings of the Third IEEE International Conference on Data Mining (ICDM '03).
14. Pavlov, D., Popescul, A., Pennock, D., & Ungar, L. (2003). Mixtures of conditional maximum entropy models. In Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003).
15. Della Pietra, V., Della Pietra, S., & Lafferty, J. (1995). Inducing features of random fields. Technical Report CMU-CS-95-144, School of Computer Science, Carnegie Mellon University.
16. Ratnaparkhi, A. (1996). A maximum entropy model for part-of-speech tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
17. Ratnaparkhi, A. (1997). A simple introduction to maximum entropy models for natural language processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.
18. Ratnaparkhi, A., Reynar, J., & Roukos, S. (1994). A maximum entropy model for prepositional phrase attachment. In Proceedings of the Human Language Technology Workshop (ARPA, 1994), 250-255.
19. Reynar, J. C., & Ratnaparkhi, A. (1997). A maximum entropy approach to identifying sentence boundaries. In Proceedings of the Fifth Conference on Applied Natural Language Processing, 16-19.
20. Taghva, K., & Gilbreth, J. (1999). Recognizing acronyms and their definitions. International Journal on Document Analysis and Recognition (IJDAR), 191-198.
21. Terada, A., Tokunaga, T., & Tanaka, H. (2004). Automatic expansion of abbreviations by using context and character information. Information Processing and Management, 31-45.
22. Toole, J. (2000). A hybrid approach to the identification and expansion of abbreviations. In Proceedings of RIAO 2000, 1, 725-736.
23. 賴育佐 (2003). A probabilistic model of Chinese abbreviations (中文縮寫詞之機率統計模式). Master's thesis, Department of Computer Science and Information Engineering, National Chi Nan University.