跳到主要內容

臺灣博碩士論文加值系統

(23.20.20.52) 您好!臺灣時間:2022/01/24 18:26
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:唐立人
研究生(外文):Lee-Ren Ton
論文名稱:具有非連續摺疊能力之爪哇堆疊運算摺疊
論文名稱(外文):Java Stack Operations Folding with Discontinuous Folding Capability
指導教授:鍾崇斌
指導教授(外文):Chung-Ping Chung
學位類別:博士
校院名稱:國立交通大學
系所名稱:資訊工程系
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2002
畢業學年度:90
語文別:英文
論文頁數:115
中文關鍵詞:爪哇虛擬機器爪哇處理器堆疊運算摺疊POC摺疊模型堆疊運算堆疊
外文關鍵詞:Java Virtual MachineJava ProcessorStack Operations FoldingPOC Folding ModelStack Operation Stack
相關次數:
  • 被引用被引用:0
  • 點閱點閱:155
  • 評分評分:
  • 下載下載:14
  • 收藏至我的研究室書目清單書目收藏:1
近年來,爪哇 (Java) 成為網際網路 (Internet) 上使用最為廣泛的語言之一。因為爪哇的執行平台 — 爪哇虛擬機器 (Java Virtual Machine; JVM) 為一基於堆疊運算的結構,其機器碼 — 位元組碼 (Bytecode) 之執行效能將受限於存取堆疊資料時的真實資料相依性 (True Data Dependency)。在本研究論文中,我們將探討在單一功能單元 (Functional Unit) 的爪哇處理器中各種不同的堆疊運算摺疊 (Stack Operation Folding) 技術以加強爪哇虛擬機器的執行效能。這些堆疊運算摺疊技術包含:
一、 基於樣板 (Template-based) 的連續位元組碼堆疊運算摺疊。
二、 基於法則 (Rule-based) 的連續位元組碼堆疊運算摺疊。
三、 基於法則的全功能位元組碼堆疊運算摺疊。
每個Java的指令將被分類成生產者 (Producer)、運算者 (Operator) 以及消費者 (Consumer) 三類 (POC) 中的一類以進行摺疊之可能性分析。我們首先提出第一個摺疊技術 — POC-樣板連續 (POC-TC),以計算被摺疊堆疊運算樣本的出現百分比來決定是否將該樣本建入Java處理器之指令解碼器 (Instruction Decoder) 之中。依據樣板大小,我們提出三種不同的堆疊運算摺疊策略:雙摺疊、三摺疊與四摺疊,以滿足不同的效能與成本考量。統計資料顯示這三種不同的堆疊運算摺疊策略分別可以消除67%、78%與79%的堆疊運算存取指令,而相對於未使用摺疊技術的堆疊機器來說,其整體的程式執行分別達到1.42、1.52與1.52的加速比。
在第二個摺疊技術中,我們在連續的爪哇位元組碼流 (Bytecode Stream) 中推導出堆疊運算摺疊之法則,稱之為POC-法則連續 (POC-RC) 摺疊模型。其基本觀念為檢查連續的兩個堆疊運算指令N與N+1的POC類別、運算元來源與目的、資料類別與寬度等資訊以判斷它們是否可以摺疊,若它們可被摺疊在一起則會產生一個被摺疊的位元組碼指令 (Folded Bytecode Instruction; FBI),此FBI亦具有由前兩個指令N與N+1所合成的POC類別資訊,並再度成為新的指令N’以繼續和後續之指令N’+1以同一個POC-法則連續摺疊模型檢查進一步摺疊之可能,此步驟將重複到POC-法則連續摺疊模型送出一個結束摺疊檢查的狀態為止。統計資料顯示有69%、82%與83%的堆疊運算存取指令分別被雙摺疊、三摺疊與四摺疊的摺疊策略所消除,而相對於未使用摺疊技術的堆疊機器來說,其整體的程式執行分別達到1.42、1.53與1.54的加速比。
更進一步的堆疊運算摺疊效能可由第三個摺疊技術 — POC-法則全功能 (POC-RA) 摺疊模型來提昇。在此摺疊模型中,在POC-法則連續摺疊模型中被序列執行的非連續位元組碼將透過新設計的硬體結構 — 堆疊運算堆疊 (Stack Operation Stack; SOS) 來進一步完成摺疊。在具有堆疊運算堆疊的爪哇處理器中,一連串解碼過的位元組碼將在被發派 (Issue) 執行前被依序 (Sequentially) 登錄 (Log) 到堆疊運算堆疊中,而指令解碼器中的堆疊運算摺疊單元將同時檢查堆疊運算堆疊與指令緩衝器 (Instruction Buffer) 中指令的POC類別以判斷是否可以進行摺疊。由於有堆疊運算堆疊的結構,對於那些非連續摺疊的堆疊運算可延長它們在堆疊運算摺疊單元中摺疊窗 (Folding Window) 的時間,提供了多次的摺疊機會以達更高的執行效能。統計資料顯示在指令緩衝區大小為7位元組以及堆疊運算堆疊的項目 (Entry) 個數為8時,分別有91%、97%與99%的堆疊運算存取指令被雙摺疊、三摺疊與四摺疊的摺疊策略所消除,而相對於未使用摺疊技術的堆疊機器來說,其整體的程式執行分別達到1.71、1.73與1.74的加速比。
在介紹過各種堆疊運算摺疊技術之後,我們可依據不同的效能與成本的需求來設計單一管線爪哇處理器的堆疊運算摺疊單元俾以開發更高的每週期指令數 (Instructions per Cycle; IPC) 的執行效能。在論文最後我們亦討論在更高效能的多管線爪哇處理器中堆疊運算摺疊技術的效能提昇應用。目前本研究為世界之首,無論在方法及架構上均有最佳之貢獻,不但獲得美國及中華民國專利,且被學術界其他研究所參考,如Austin Kim之Advanced POC摺疊模型即是參考本研究之POC摺疊模型而來,此外,部分指令摺疊之觀念及原則將被工研院電腦與通訊工業研究所之下一個爪哇處理機設計所引用,我們希望在此論文中的成果能進一步對未來其它高效能與低成本的爪哇處理器提供設計的參考。

In recent years, Java becomes the most popular language used over Internet. With the stack-based architecture of Java Virtual Machine (JVM), the execution performance is limited by the true data dependencies caused by stack accesses. In this dissertation, various stack operations folding techniques for Java processors are studied to enhance the execution performance of the JVM. The stack operations folding techniques include:
i) template-based folding for continuous bytecodes,
ii) rule-based folding for continuous bytecodes, and
iii) rule-based folding for all-purpose bytecodes.
Each bytecode is mapped to one of the Producer, Operator, and Consumer (POC) types for folding analysis. In the first folding technique — the template-based folding for continuous bytecodes (POC-TC), we count the percentages of folded bytecode patterns to determine the foldable templates built in the instruction decoder. With different template sizes, there are three folding strategies: 2-foldable, 3-foldable, and 4-foldable, available for various performance/cost issues. Statistical data shows that 67%, 78% and 79% of stack operations can be eliminated by 2-, 3-, and 4-foldable strategies, respectively. Furthermore, each strategy has an overall program speedup of 1.42, 1.52 and 1.52, respectively, as compared to a traditional stack machine without folding. Considering the design tradeoffs between instruction decoder width and the folding performance, a cost-effective folding unit is recommended.
In the second folding technique, we derive the folding rules for continuous bytecode stream called a POC-RC folding model. The basic concept of POC-RC folding model is that it checks the bytecode I and I + 1 to see whether they are foldable or not (based on the POC type, operand sources, operand destination, data type and width). If they are foldable, a Folded Bytecode Instruction (FBI) is generated and becomes the new bytecode I to check the further foldability with the following bytecode I + 1. This process repeats until an ending case of the POC-RC folding states is encountered. Statistical data shows that 69%, 82% and 83% of stack operations can be eliminated by 2-, 3-, and 4-foldable strategies, respectively. Furthermore, each strategy has an overall program speedup of 1.42, 1.53 and 1.54, respectively, as compared to a traditional stack machine without folding. Considering the design tradeoffs between instruction decoder width and the folding performance, a cost-effective POC-RC folding unit is recommended.
Further folding performance can be achieved by the third folding technique called POC-RA folding model. In this model, discontinuous bytecodes that are issued in series are further folded with the proposed SOS (Stack Operation Stack). With the SOS structure, all stack operations are logged into the SOS before being issued sequentially. With an instruction buffer size of 7 bytes and the SOS size of 8 entries, statistical data shows that 91%, 97% and 99% of stack operations can be eliminated by 2-, 3-, and 4-foldable strategies, respectively. Furthermore, each strategy has an overall program speedup of 1.71, 1.73 and 1.74, respectively, as compared to a traditional stack machine without folding. Considering the design tradeoffs between instruction decoder width and the folding performance, a cost-effective POC-RA folding unit is recommended.
Having dealt with the stack operations folding techniques discussed in this dissertation, a cost-effective folding unit to exploit more IPC (Instructions per Cycle) for a high performance single-pipelined Java processor can be built. We hope the efforts in this research can contribute to the design of future cost-effective high performance Java processors.

Chapter 1 Introduction
1.1 Background and Motivation
1.2 Dissertation Objectives and Research Procedure
1.3 Dissertation Organization
Chapter 2 Current Stack Operations Folding Techniques
2.1 Classification of Various Stack Operations Folding Techniques
2.2 Template-based Folding for Continuous Bytecodes
2.3 Rule-based Folding for Continuous Bytecodes
2.4 Template-based Folding for All Bytecodes
2.5 Rule-based Folding for All Bytecodes
Chapter 3 Continuous Bytecodes Folding: Template-based
3.1 Bytecode Classifications and Formulas for Folding Performance Evaluation
3.2 POC-TC Folding Strategies
3.3 Performance Evaluation and Analysis
3.3.1 Simulation Environment
3.3.2 Simulation Results
3.4 Chapter Summary and Discussion
Chapter 4 Continuous Bytecodes Folding: Rule-based
4.1 POC-RC Folding Model
4.2 Constraints in Continuous Bytecodes Folding
4.3 Performance Evaluation and Analysis
4.4 Chapter Summary and Discussion
Chapter 5 All-purpose Bytecodes Folding: Rule-based
5.1 Observing the POC-based Stack Access Behavior
5.2 A Folding Example for Comparison
5.3 Achieving 100% Folding Coverage
5.4 Performance Comparison of Various Folding Models
5.5 Chapter Summary and Conclusion
Chapter 6 Conclusion
6.1 Research Summary
6.2 Discussions and Conclusions
References

[Case 1996] Brian Case, "Implementing the Java Virtual Machine," Microprocessor Report, pp. 12-17, March 1996.
[Chang 1998a] L. C. Chang, L. R. Ton, M. F. Kao and C. P. Chung, "Stack Operations Folding in Java Processors," Proceedings of IEE Computers and Digital Techniques, vol. 145, no. 5, pp. 333-340, September 1998.
[Chang 1998b] L. C. Chang, L. R. Ton, M. F. Kao and C. P. Chung, "A Method of Instruction Folding and Its Implementation," CASES'98 - Workshop on Compiler and Architecture Support for Embedded Systems, December 1998.
[Chang 2000] Chang. L. C., Ton L. R., Kao M. F., and Chung C. P., 2000. Enhancing Java Processor Performance with Smart Dynamic Folding, Journal of the Chinese Institute of Engineers, vol. 23, no. 6, 711-719.
[Chen 1998] C. C. Chen, L. R. Ton, L. C. Chang, M. F. Kao and C. P. Chung,"ILP Analysis for Java Bytecodes with Extensive Folding," CASES98 - Workshop on Compiler and Architecture Support for Embedded Systems, December 1998.
[Cramer 1997] T. Cramer, R. Friedman, T. Miller, D. Seberger, R. Wilson and M. Wolczko, "Compiling Java Just-In-Time," IEEE Micro, vol. 17, no. 3, pp. 36-43, May/June 1997.
[El-Kharashi 2000a] M. W. El-Kharashi, F. ElGuibaly and K. F. Li, A Quantitative Study for Java Microprocessor Architectural Requirements. Part I: Instruction Set Design, Microprocessors and Microsystems, Vol. 24 (2000) 225-236.
[El-Kharashi 2000b] M. W. El-Kharashi, F. ElGuibaly and K. F. Li, A Quantitative Study for Java Microprocessor Architectural Requirements. Part II: High-Level Language Support, Microprocessors and Microsystems, Vol. 24 (2000) 237-250.
[Gosling 1996] J. Gosling, B. Joy and G. Steele, editors, The Java Language Specification, Addison-Wesley Publishing Company, Inc., September 1996.
[Hsieh 1996] C.-H. A. Hsieh, J. C. Gyllenhaal and W. Mei. Hwu, "Java Bytecode to Native Code Translation: The Caffeine Prototype and Preliminary Results," Proceedings of the 29th Annual International Symposium on Microarchitecture (MICRO-29), pp. 90-97, December 1996.
[JDK] Sun JDK 1.1.8, http://java.sun.com/products/jdk/1.1/.
[Kemal 1997] E. Kemal, A. Erik and H. Erdem, "A Java ILP Machine Based on Fast Dynamic Compilation," MASCOTS'97 - International Workshop on Security and Efficiency Aspects of Java, 1997.
[Kim 2000a] Austin Kim and Morris Chang, An Advanced Instruction Folding Mechanism for a Stackless Java Processor, Proceeding of the International Conference on Computer Design (ICCD), (September 2000) 565-566.
[Kim 2000b] Austin Kim and Morris Chang, Advanced POC Model-based Java Instruction Folding Mechanism, Proceedings of the 26th EUROMICRO Conference, Vol. 1 (September 2000) 332-338.
[Krall 1997] A. Krall and R. Grafl, "CACAO - A 64 bit JavaVM Just-In-Time Compiler," Concurrency: Practice and Experience, vol. 9, no. 11, pp. 1017-1030, 1997.
[Krall 1998] A. Krall, "Efficient JavaVM Just-In-Time Compilation," Proceedings of International Conference on Parallel Architectures and Compilation Techniques, pp. 205-212, 1998.
[Lentczner 1996] Mark Lentczner, "Java‘s Virtual World," Microprocessor Report, pp. 8-11, March 1996.
[Lindholm 1996] Tim Lindholm and Frank Yellin, The Java Virtual Machine Specification, Addison-Wesley Publishing Company, Inc., September 1996.
[McGhan 1998] H. McGhan and M. O’Connor, picoJava: A Direct Execution Engine for Java Bytecode, IEEE Computer, (1998) 22-30.
[Muller 1996] G. Muller, B. Moura, F. Bellard and C. Consel. "JIT vs. Offline Compilers: Limits and Benefits of Bytecode Compilation," Technical Report 1063, IRISA, France, http://www.irisa.fr/, December 1996.
[NMI] NMI’s Java Bytecode Viewer 5.0, http://njcv.htmlplanet.com/.
[O’Connor 1997] J. Michael O’Connor and Marc Tremblay, "PicoJava-I: The Java Virtual Machine in Hardware," IEEE Micro, pp. 45-53, March/April 1997.
[SPEC 1998] SPEC Corporation, SPEC JVM98, http://www.spec.org/osg/jvm98/
[SUN 1996] Sun Microelectronics, "picoJava-I Microprocessor Core Architecture," http://www.sun.com/sparc/whitepapers/wpr-0014-01/index.html, October 1996.
[SUN 1997] Sun Microsystems, "Sun Unveils Its First Java Processor microJava 701 Looks to Post Industry's Highest Caffeinemarks," http://www.sun.com/smi/Press/ sunflash/9710/sunflash.971015.1.html, October 1997.
[SUN 1998] Sun Microsystems Inc., microJavaÔ-701 User Guide, January 1998.
[SUN 1999a] Sun Microsystems Inc., picoJava-IIÔ Microarchitecture Guide, March 1999.
[SUN 1999b] Sun Microsystems Inc., picoJava-IIÔ Programmer’s Reference Manual, March 1999
[Ton 1997] Lee-Ren Ton, et al., “Instruction Folding in Java Processor,” Proceedings of the International Conference on Parallel and Distributed Systems, December 1997, pp. 238-143.
[Ton 2000] Lee-Ren Ton, Lung-Chung Chang and Chung-Ping Chung, Exploiting Java Bytecode Parallelism by Dynamic Folding Model, Proceedings of the 6th International Euro-Par Parallel Processing Conference, Lecture Notes in Computer Science, Vol. 1900 (August 2000) 994-997.
[Tseng 1997] Han-Min Tseng, et al., “Performance Enhancement by Folding Strategies of a Java Processor,” Proceedings of the International Conference on Computer System Technology for Industrial Applications — Internet and Multimedia, April 1997, pp. 286-293.
[Turley 1996] Jim Turley, "Sun Reveals First Java Processor Core," Microprocessor Report, vol. 10, no. 14, October 1996.
[Venners 1998] Venners B., 1998. Inside the Java Virtual Machine, McGraw Hill, 1st Edition.
[Vijaykrishnan 1998] N. Vijaykrishnan, N. Ranganathan and R. Gadekarla, Object-Oriented Architectural Support for a Java Processor, Proceedings of the ECOOP’98, Lecture Notes in Computer Science, Vol. 1445 (July 1998) 330-354.

QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top