|
Multimedia applications such as video-on-demand, video conferencing, virtual reality, and 3D games are increasingly common in the market. However, almost all of these applications are supported by dedicated hardware. To let a general-purpose microprocessor process them efficiently, we design multimedia instructions and data types that are compatible with Intel MMX. We also design a multimedia functional unit (MFU) that executes these instructions within the general-purpose microprocessor. In this thesis, we present the design of the ALU, multiplier, and shifter of the multimedia functional unit. These units adopt the SIMD (Single Instruction, Multiple Data) technique, which processes several multimedia data elements in parallel and thereby significantly increases the performance of multimedia applications. The ALU executes arithmetic and logic operations. The arithmetic operations include packed addition and subtraction with optional saturation, comparison of two operands, and two instructions, PAVGW and PDISTW, which compute the average and the absolute difference of the operands, respectively. The multiplier unit performs packed multiplication and packed multiply-add operations. The shifter unit performs logical left, logical right, and arithmetic right shifts of packed operands. Because the requirements and operations of these functional units differ from those of conventional functional units, we propose some novel methods to improve on conventional designs. The adder in the ALU is realized with the Conditional Carry Selection (CCS) approach, and the architecture of the conventional CCS adder is modified to form a SIMD adder. In addition, subunits for saturation and wraparound, comparison, averaging, and absolute difference are added to the ALU. The multiplier is also a SIMD functional unit.
A Wallace tree is adopted for each 16 × 16 multiplier, and an additional 4-2 adder placed between two 16 × 16 multipliers performs the accumulation needed for the multiply-add instruction. The adder that sums the two operands produced by the Wallace tree is also designed with the CCS technique. As a result, the multiplier is organized as a balanced two-stage pipeline whose performance and area are better than those of a conventional multiplier with a separate adder for accumulation. The shifter is a barrel shifter. Because a conventional shifter is not suitable for SIMD operation, we add multiplexers to it so that it can shift operands of different packed data types. The MFU is developed with a 0.6 μm cell library. The synthesis and simulation results show that the MFU substantially increases the performance of multimedia applications, while the circuit area of the developed functional units is almost the same as that of conventional functional units without a SIMD structure. In addition, since the maximum delay among these functional units is less than 10 ns, the maximum clock rate of the MFU is higher than 100 MHz. We therefore expect multimedia applications in general to be accelerated by this MFU.
|
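To make the packed operations named in the abstract concrete, the following is a minimal behavioral sketch in Python. It is not the hardware design of the thesis. It assumes MMX-style 64-bit operands holding four 16-bit lanes. The function names are hypothetical. The rounding of the average (round half up) and the per-lane, unsummed absolute difference are assumptions, since the abstract does not specify them. The multiply-add follows the MMX convention of summing adjacent 16 × 16 signed products into 32-bit results.

```python
LANES = 4            # assumption: four lanes per 64-bit operand
LANE_BITS = 16       # assumption: 16-bit (word) lanes
MASK = (1 << LANE_BITS) - 1


def unpack(word):
    """Split a 64-bit integer into unsigned 16-bit lanes, lane 0 lowest."""
    return [(word >> (LANE_BITS * i)) & MASK for i in range(LANES)]


def pack(lanes):
    """Join lanes into one 64-bit integer, keeping the low 16 bits of each."""
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & MASK) << (LANE_BITS * i)
    return word


def to_signed(x):
    """Reinterpret an unsigned 16-bit value as two's complement."""
    return x - (1 << LANE_BITS) if x & (1 << (LANE_BITS - 1)) else x


def padd_wrap(a, b):
    """Packed add with wraparound: carries never cross a lane boundary."""
    return pack([x + y for x, y in zip(unpack(a), unpack(b))])


def padd_sat_signed(a, b):
    """Packed signed add, clamped to [-32768, 32767] in each lane."""
    lo, hi = -(1 << 15), (1 << 15) - 1
    return pack([max(lo, min(hi, to_signed(x) + to_signed(y)))
                 for x, y in zip(unpack(a), unpack(b))])


def pavg(a, b):
    """Packed unsigned average; rounding half up is an assumption."""
    return pack([(x + y + 1) >> 1 for x, y in zip(unpack(a), unpack(b))])


def pdist(a, b):
    """Packed unsigned absolute difference per lane (no summation assumed)."""
    return pack([abs(x - y) for x, y in zip(unpack(a), unpack(b))])


def pmadd(a, b):
    """Signed 16x16 products, adjacent pairs summed into two 32-bit lanes."""
    p = [to_signed(x) * to_signed(y) for x, y in zip(unpack(a), unpack(b))]
    low = (p[0] + p[1]) & 0xFFFFFFFF
    high = (p[2] + p[3]) & 0xFFFFFFFF
    return (high << 32) | low


def psll(a, n):
    """Packed logical left shift of each lane by n bits (0 <= n < 16)."""
    return pack([x << n for x in unpack(a)])


def psrl(a, n):
    """Packed logical right shift: zeros enter from the top of each lane."""
    return pack([x >> n for x in unpack(a)])


def psra(a, n):
    """Packed arithmetic right shift: the sign bit of each lane is copied."""
    return pack([to_signed(x) >> n for x in unpack(a)])
```

The sketch models only the lane-level results. In hardware the same effect comes from blocking carries at lane boundaries in the adder and from multiplexers in the shifter, as the abstract describes.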