|
This dissertation presents three efficient architectures for motion estimation in video compression systems. First, a data-interlacing architecture with two-dimensional (2-D) data reuse is proposed for the full-search block-matching algorithm (FSBMA). Built on a one-dimensional processing element (PE) array and two data-interlacing shift-register arrays, the architecture reuses data efficiently, which reduces external memory accesses and pin count. It also achieves 100% hardware utilization and a high throughput rate. In addition, identical chips can be cascaded to support different block sizes, search ranges, and pixel rates while still fully exploiting data reuse.
Second, an efficient 9-cell array architecture with data rings is proposed for the 3-step hierarchical search (3SHS) block-matching algorithm. With efficient data rings and memory organization, a regular raster-scanned data flow simplifies the internal input/output control scheme, and a comparator-tree structure reduces latency. A three-half-search-area scheme further reduces external memory accesses and the interconnections between memory modules and processing elements. The architecture also provides a high normalized throughput for the 3SHS.
Finally, the data-ring concept is extended to the FSBMA in a high-throughput scalable architecture. The number of PEs can be scaled according to the algorithm parameters, such as block size, search range, and frame size, and to the performance required by different video compression applications, which reduces cost. Efficient PE rings and an intelligent memory-interleaving organization increase the efficiency of the architecture. Techniques for reducing interconnections and external memory accesses are also presented. Our results demonstrate that these architectures are flexible, high-performance solutions for video applications, offering low latency, low memory bandwidth, and low cost.
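For readers unfamiliar with the two block-matching algorithms named above, the following is a minimal software sketch, not the hardware architectures proposed in this dissertation. It assumes the sum of absolute differences (SAD) as the matching criterion and the classic step sizes 4, 2, 1 for the three-step search; the abstract specifies neither. All function names, the frame size, and the synthetic test frames are hypothetical and chosen only for illustration.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n current block at (bx, by) and the
    reference block displaced by (dx, dy). SAD is an assumed criterion."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def in_bounds(ref, bx, by, dx, dy, n):
    """True if the displaced n x n block lies fully inside the reference frame."""
    h, w = len(ref), len(ref[0])
    return 0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w


def full_search(cur, ref, bx, by, n, p):
    """Exhaustive search over every displacement in [-p, p] x [-p, p]."""
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if in_bounds(ref, bx, by, dx, dy, n):
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best  # (cost, dx, dy)


def three_step_search(cur, ref, bx, by, n, step=4):
    """Evaluate nine candidates around the current centre, move the centre
    to the best one, halve the step, and repeat until the step reaches 1."""
    cx = cy = 0
    best = None
    while step >= 1:
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = cx + dx, cy + dy
                if in_bounds(ref, bx, by, mx, my, n):
                    cost = sad(cur, ref, bx, by, mx, my, n)
                    if best is None or cost < best[0]:
                        best = (cost, mx, my)
        _, cx, cy = best  # recentre on the best candidate so far
        step //= 2
    return best  # (cost, dx, dy)


# Synthetic frames (illustrative only): random texture, with the current
# frame built so that the true motion vector is (dx, dy) = (-3, 2).
rng = random.Random(0)
H = W = 24
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[(y + 2) % H][(x - 3) % W] for x in range(W)] for y in range(H)]

print(full_search(cur, ref, 8, 8, 4, 7))
print(three_step_search(cur, ref, 8, 8, 4))
```

With a search range of p = 7, the exhaustive search evaluates up to 225 displacements per block, while the three-step sketch performs at most 27 SAD evaluations (25 distinct positions). The price is that the three-step search can settle on a worse match when the error surface is not smooth, as with the random texture used here. The nine candidates per step are presumably what the 9-cell array evaluates in parallel, although the abstract does not spell this out.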
|