跳到主要內容

臺灣博碩士論文加值系統

(216.73.217.76) 您好!臺灣時間:2026/04/23 21:36
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:劉景民
研究生(外文):Jin-Min Liu
論文名稱:基於大字符集柏洛菲勒轉換之中文文本資料壓縮方法
論文名稱(外文):A Chinese Text Compression Scheme Based on Large-Alphabet BW-Transform
指導教授:古鴻炎古鴻炎引用關係
指導教授(外文):Hung-Yan Gu
學位類別:碩士
校院名稱:國立臺灣科技大學
系所名稱:電機工程系
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2004
畢業學年度:92
語文別:中文
論文頁數:53
中文關鍵詞:文本資料壓縮大字符集柏洛菲勒轉換算術編碼
外文關鍵詞:text compressionlarge alphabetBWT(Burrows-Wheeler transform)MTFarithmetic coding
相關次數:
  • 被引用被引用:1
  • 點閱點閱:545
  • 評分評分:
  • 下載下載:14
  • 收藏至我的研究室書目清單書目收藏:1
本論文提出一種基於大字符集柏洛-菲勒轉換(Burrows-Wheeler transform, BWT) 之中文文本資料的壓縮方法,先以Big-5加上ASCII形成的大字符集(alphabet)來剖析輸入的中文文字檔案,再接著進行BWT、MTF(move to front)、和算術編碼的處理。我們也研究了,在大字符集要求下能夠適用於BWT、MTF和算術編碼處理上的實作方法,以提升處理的速度。我們已經將這個壓縮方法製作成可以實際使用之軟體程式,對於中文文字檔案的測試實驗,結果顯示我們方法獲得的壓縮率,比一般常被使用的Win-ZIP好約12.9%,比Win-RAR好約4.7%,而比原始的基於BWT的壓縮軟體BZIP2的壓縮率好約1.7%。
In this thesis, a Chinese text compression scheme based on large alphabet Burrows-Wheeler transform(BWT) is proposed. First, an inputted Chinese text file is parsed with a large alphabet consisting of characters from BIG-5 and ASCII codes. Then, the parsed token stream is processed by BWT, MTF(Move To Front), and arithmetic coding. To improve the speed of the proposed scheme, we have also studied a few ways for practical implementations of BWT, MTF and arithmetic coding under large-alphabet parsing condition. According to the compression scheme, a practically executable program is developed. When compared with other compression programs, i.e., Win-ZIP, Win-RAR, and BZIP2, our program is shown, in Chinese text file compression experiments, to have better compression rates. Rate improvements are 12.9%, 4.7%, and 1.7%, respectively.
目錄
摘要...................................................I
Abstract..............................................II
致謝.................................................III
目錄..................................................IV
圖表目錄..............................................VI
第一章 緒論..........................................1
1.1 研究動機......................................1
1.2 壓縮方法之回顧................................3
1.2.1 中文資料壓縮方法回顧..........................4
1.2.2 基於BWT之壓縮方法回顧.........................7
1.3 本論文之研究方法..............................8
1.4 論文架構......................................9
第二章 大字符集剖析及BWT轉換........................10
2.1 大字符集簡介.................................10
2.2 大字符集的剖析...............................11
2.3 柏洛菲勒轉換(Burrows-Wheeler transform)......12
2.4 BWT實作方法..................................15
2.5 BWT序列之特性................................17
第三章 Heap MTF轉換.................................18
3.1 Heap樹結構之MTF轉換..........................18
3.2 MTF移動之策略................................23
第四章 可變長度碼編碼...............................26
4.1 Heap MTF序列特性.............................26
4.2 霍夫曼編碼 27
4.3 算術編碼 33
第五章 測試實驗與結果 39
5.1 字符集大小實驗 39
5.2 Heap樹架構MTF轉換實驗 40
5.3 Heap MTF之移動策略實驗 42
5.4 調適性霍夫曼編碼之實驗 44
5.5 不同階數之算術編碼實驗 45
5.6 與其他壓縮法比較 46
5.7 英文文章壓縮率實驗 48
第六章 結論與未來方向 49
參考文獻 51
圖表目錄
圖 1 1. 本論文之大字符集BWT轉換壓縮流程 9
圖 2 1. ASCII碼與BIG-5碼合成之大字符集 11
圖 3 1. 字符集元素分成單棵HEAP樹之結構 20
圖 3 2.多棵大小平均之HEAP樹 21
圖 3 3. HEAP MTF位置值分佈圖 22
圖 3 4. 漏斗式HEAP樹分割方式 23
圖 4 1.可調適性霍夫曼編碼主流程圖 30
圖 4 2. 可調適霍夫曼樹串列範例 31
圖 4 3. 可調適霍夫曼樹更新流程 32
圖 4 4. 區間窄化之範例(輸入A1,A2,A3) 34
圖 4 5. n階預測式算術編碼之流程圖 37
表 4 1. MTF輸出位置值之分群 34
表 5 1. 測試檔檔案說明 39
表 5 2. 字符集大小對壓縮率的影響 40
表 5 3. 原始MTF與HEAP MTF處理時間比較 41
表 5 4. HEAP劃分方式與算術編碼的分群作法之組合 41
表 5 5. 各種HEAP MTF結構壓縮率實驗 42
表 5 6. 各種MTF移動策略之壓縮率比較 44
表 5 7. 各種MTF移動策略之處理時間比較 44
表 5 8. 調適性霍夫曼編碼之壓縮率 45
表 5 9. 不同階數算術編碼之壓縮率比較 46
表 5 10.數種壓縮法之壓縮率比較 47
表 5 11.數種壓縮法之壓縮時間比較 47
表 5 12.英文文章壓縮率之比較 48
[1] T. C. Bell, J. G. Cleary, and I. H. Witten, Text Compression, Prentice- Hall Inc., Englewood Cliffs, New Jersey, 1990.
[2] Sayood, Khalid, Introduction to Data Compression, 2''nd ed., Morgan Kaufmann, 2000.
[3] M. Burrows and D. J. Wheeler, A Block-sorting Lossless Data Compression Algorithm, SRC Research Report 124, Digital System Research Center, Palo Alto, USA, May 1994.
[4] Julian Seward, "The bzip2 and libbzip2 home page", http://source.redhat.com/bzip2.
[5] G. H. Ong and S. Y. Hunag, "Compreesion of Chinese text files using a multiple four-bit coding scheme", Communications on the move ICCS/ISITA, Singapore, 1992, pp.1062-1066.
[6] K. T. Lua, "Compression of Chinese text", Interational Conference on Chinese Computing ''94(ICCC94), Singapore, June 1994, pp. 367-375.
[7] G. H. Ong, W. T. Chong, "Compression Chinese text files using an adaptive Huffman coding scheme and a static dictionary of character pairs", Communications and Networks for the Year 2000, Proceedings of IEEE Singapore International Conference on, Volume 2, 6-11 Sept. 1993, pp. 808 -812.
[8] H. Y. Gu, "A new Chinese compression scheme combining dictionary coding and adaptive alphabet grouping", Computer Processing of Chinese and Oriental Languages, 10(3), 321-335(1997).
[9] H. Y. Gu, http://guhy.ee.ntust.edu.tw/text-compress/.
[10] 宋嘉宏,多串列區域調整式資料壓縮法之設計與應用,國立中正大學,碩士論文,民國85年。
[11] L. Y. Tseng and T. H. Huang, "A predictive coding method for Chinese text file", Journal of Computers, 2(3), 18-23(1990).
[12] J.N. Chang, J.S. Chang, and M.H. Chen, "Using word-segmentation model for compression of Chinese text", Computer Processing of Chinese and Oriental Languages, 9(1), 17-30(1995).
[13] H. Y. Gu, "Large alphabet Chinese text compression using adaptive Markov model and arithmetic coder", Computer Processing of Chinese and Oriental Language, 9(2), 111-124(1995).
[14] M. K. Tsay, C. H. Kuo, R. H. Ju, and M.K. Chou, "Modeling for Chinese text file compression", International Conference on Industrial Applications of Computing, Alexandria, Egypt, 1992, pp. 205-208.
[15] P. Vines, J. Zobel, "Compression Techniques for Chinese Text", Software Practice and Experience, Vol 28(12), pp1299-1314, Oct 1998.
[16] B. Balkenhol, S Kurtz, Y. M. Shtarkov, "Modifications of the Burrows and Wheeler Data Compression Algorithm". Proceedings of the IEEE Data Compression Conference, 1999, 188-197.
[17] B. Balkenhol, Y. M. Shtarkov, "One attempt of a compression algorithm using BWT", Preprint 99-133, SFB343, Discrete Structures in Mathematics, Falculty of Mathematics, University of Bielefeld, 1999.
[18] Ziya Arnavut. "Move-to-Front and Inversion Coding". Data Compression Confereence 2000, 193-202.
[19] Binder Edgar, "BWT and distances", http://groups.google.com /groups?selm=390B6254.D5113AD2%40T-Online.de.
[20] N. J. Larsson, "The Context Trees of Block Sorting Compression", IEEE Data Compression 1998.
[21] R.Y. K. Isal, A. Moffat, A. C. H. Ngni, "Enhanced Word-Based Block-Sorting Text Compression", Twenty-Fifth Australasian Computer Science Conference(ACSC2002).
[22] Graham, "Unicode: What Is It and How Do I Use It?", Markup Technologies ''98 Conference.
[23] J. Bentley and R. Sedgewick, "Fast Algorithm for Sorting and Searching String", Proc 8th ACM-SIAM Symposium on Discrete Algorithms, pp. 360-369, 1997.
[24] N. J. Larsson, K. Sadakane. Faster suffix sorting. Technical report LU-CS-TR:99-214., Dept. of Computer Science, Lund University, Sweden, 1999.
[25] R. G. Gallagher. Variations on a Theme by Huffman. IEEE Transcations on Information Theory, IT-24(6):668-674, November 1978.
[26] Mark Nelson, "Arithmetic Coding+Statistical Modeling = Data Compression",http://www.dogma.net/markn/articles/arith/part1.htm.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top