National Digital Library of Theses and Dissertations in Taiwan

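Graduate student: 盧能彬
Graduate student (English name): Neng-Pin Lu
Title (Chinese): 超純量多重處理中推測性記憶體存取技術之研究
Title (English): Speculative Memory Accesses in Superscalar Multiprocessing
Advisor: 鍾崇斌
Advisor (English name): Chung-Ping Chung
Degree: Doctoral
Institution: National Chiao Tung University
Department: Department of Computer Science and Information Engineering
Field: Engineering
Discipline: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2000
Academic year of graduation: 88 (ROC calendar, i.e., academic year 1999-2000)
Language: English
Number of pages: 96
Keywords (Chinese): 超純量處理; 多重處理; 推測性取出; 推測性存入; 記憶體一致模式; 快取記憶體一致性規約
Keywords (English): superscalar processing; multiprocessing; speculative load; speculative store; memory consistency model; cache coherence protocol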
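Superscalar multiprocessor systems can exploit coarse-grained and fine-grained parallelism in programs at the same time to improve overall system performance. However, the ever-widening speed gap between processors and memory gradually offsets the parallelism that superscalar multiprocessor systems exploit. Many methods have been proposed to address memory latency, such as multithreaded architectures, relaxed memory consistency models, and data prefetching. In superscalar processing, speculative execution can also hide memory latency in addition to exploiting instruction-level parallelism. In a superscalar multiprocessor system, however, the dynamic scheduling of each superscalar processor must obey a specific memory consistency model to guarantee correct program execution, which limits both instruction-level parallelism exploitation and memory latency hiding. Speculative memory access is therefore an important technique for breaking the constraints of the memory consistency model.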
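To make the scheduling restriction concrete, the following Python sketch encodes the commonly cited ordering rules of the four consistency models covered in this dissertation (sequential, processor, weak, and release consistency) as a predicate: may a later access in program order be performed before an earlier one? It assumes the two accesses touch different addresses, ignores write atomicity, and treats synchronization operations under PC as plain reads and writes. The names `Kind` and `may_bypass` are hypothetical, and the rules are the usual textbook characterization, not necessarily the dissertation's own summary of access constraints.

```python
from enum import Enum


class Kind(Enum):
    LOAD = "load"
    STORE = "store"
    ACQUIRE = "acquire"  # synchronization read, e.g. lock acquire
    RELEASE = "release"  # synchronization write, e.g. lock release


SYNC = {Kind.ACQUIRE, Kind.RELEASE}


def may_bypass(model: str, earlier: Kind, later: Kind) -> bool:
    """Return True if `later` (younger in program order) may be performed
    before `earlier` under `model`. Different addresses are assumed."""
    if model == "SC":
        # Sequential consistency: every access stays in program order.
        return False
    if model == "PC":
        # Processor consistency: only a later read may bypass an earlier write.
        # Sync operations are treated here as their underlying read or write.
        earlier_is_write = earlier in (Kind.STORE, Kind.RELEASE)
        later_is_read = later in (Kind.LOAD, Kind.ACQUIRE)
        return earlier_is_write and later_is_read
    if model == "WC":
        # Weak consistency: ordinary accesses reorder freely; any sync is a fence.
        return earlier not in SYNC and later not in SYNC
    if model == "RC":
        # Release consistency (RCsc flavor): nothing moves above an acquire,
        # a release moves above nothing, and sync operations stay mutually ordered.
        if earlier is Kind.ACQUIRE or later is Kind.RELEASE:
            return False
        if earlier in SYNC and later in SYNC:
            return False
        return True
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    for m in ("SC", "PC", "WC", "RC"):
        print(m, may_bypass(m, Kind.STORE, Kind.LOAD))  # SC False, then True for PC, WC, RC
```

Under SC the predicate is always false, which is exactly the restriction that speculative loads and stores try to work around without violating the model's visible behavior.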
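This dissertation explores the design of speculative memory access techniques in superscalar multiprocessing. First, to simulate superscalar multiprocessor systems accurately, we developed a superscalar multiprocessor simulator called SMINT. Using this simulator, we studied parallelism exploitation in superscalar and multiprocessing systems. The simulation results show that a moderate degree of superscalar processing combined with a high degree of multiprocessing is enough to fully exploit the parallelism in programs. Based on these parallelism measurements, we then explore the design of speculative memory access techniques. Speculative loads can hide part of the memory access latency, but loads speculated too early are prone to failure. We therefore propose the speculative store technique. Because a store overwrites data in memory, we designed a recoverable speculative store cache so that speculative stores can be undone, and we also designed a cache coherence protocol that supports lookahead data transfer to realize speculative communication between processors.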
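The key requirement for a recoverable speculative store is that it must not destroy committed data until the speculation is confirmed. A minimal Python sketch of that idea follows, assuming each store is tagged with a program-order sequence number and buffered apart from committed memory. `SpeculativeWriteCache` and its methods are hypothetical illustrations of a recoverable store buffer, not the dissertation's cache organization, and the lookahead coherence protocol is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SpeculativeWriteCache:
    """Hypothetical recoverable buffer for speculative stores.

    Committed data lives in `memory`; speculative stores are held in
    `pending`, tagged with program-order sequence numbers, so a
    mis-speculation can discard them without touching committed state."""

    memory: Dict[int, int] = field(default_factory=dict)
    pending: List[Tuple[int, int, int]] = field(default_factory=list)  # (seq, addr, value)

    def spec_store(self, seq: int, addr: int, value: int) -> None:
        # Buffer the store instead of overwriting memory.
        self.pending.append((seq, addr, value))

    def load(self, seq: int, addr: int) -> int:
        # Forward from the youngest buffered store that is older than this load.
        older = [(s, v) for s, a, v in self.pending if a == addr and s < seq]
        if older:
            return max(older)[1]
        return self.memory.get(addr, 0)  # unwritten locations read as 0 in this sketch

    def commit_through(self, seq: int) -> None:
        # Speculation confirmed up to `seq`: apply those stores in program order.
        for s, a, v in sorted(self.pending):
            if s <= seq:
                self.memory[a] = v
        self.pending = [p for p in self.pending if p[0] > seq]

    def squash_from(self, seq: int) -> None:
        # Mis-speculation at `seq`: drop that store and all younger ones.
        self.pending = [p for p in self.pending if p[0] < seq]


if __name__ == "__main__":
    c = SpeculativeWriteCache(memory={0x10: 1})
    c.spec_store(5, 0x10, 42)
    print(c.load(6, 0x10), c.memory[0x10])  # 42 1
    c.squash_from(5)
    print(c.load(6, 0x10))  # 1
```

A younger load sees the speculative value through forwarding while `memory` still holds the committed value, so rolling back is just a matter of dropping buffered entries.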
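Through this research, we developed a superscalar multiprocessor simulator, investigated parallelism exploitation in superscalar multiprocessor systems, and proposed the speculative store technique. We hope these results will contribute to the development of superscalar multiprocessor systems.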
Superscalar multiprocessors can exploit both coarse-grained and fine-grained parallelism in programs. However, the continually widening gap between processor and memory speeds can quickly offset any performance gains expected from exploiting parallelism through superscalar processing and multiprocessing. Several mechanisms have been proposed to address the memory latency problem, such as multithreading, relaxed memory consistency models, and data prefetching. In superscalar processing, speculative execution not only exploits instruction-level parallelism but also hides memory latencies. However, to guarantee correct program execution in a superscalar multiprocessor system, the dynamic scheduling of each superscalar processor must respect the access constraints imposed by the system's memory consistency model. As a result, parallelism exploitation and latency hiding may be severely constrained. Speculative memory accesses are therefore vital for breaking these constraints in superscalar multiprocessing.
In this dissertation, we investigate speculative memory access techniques in superscalar multiprocessing. First, to enable accurate simulation of multiprocessing systems, we developed SMINT, a simulator for superscalar multiprocessing systems. With this simulator, we examined the exploitable parallelism in superscalar processing and multiprocessing. We found that the parallelism in programs is best exploited with a moderate degree of superscalar processing and a high degree of multiprocessing. Based on these parallelism measurements, we considered the design of the load/store unit of superscalar processors to support speculative memory access techniques. Finally, we investigated memory design issues for speculative memory accesses in superscalar multiprocessing. Previous research has shown that speculative execution of load instructions can hide memory access latencies, but early speculation is likely to fail. Speculating store instructions as well helps increase the success rate of speculative loads. Stores, however, are destructive to the memory system because they overwrite data. To support speculative stores, we use a speculative write cache and design a new cache coherence protocol that supports cache-to-cache lookahead data transfer and reduces interprocessor communication latencies.
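Why early speculative loads tend to fail can be sketched with a small tracker in the spirit of the speculative-load technique from the prior research mentioned above: a load performed ahead of program order stays tracked until it retires, and an incoming invalidation to the same cache line before retirement means another processor may have written that line, so the load and everything younger must be re-executed. The class, its methods, and the 32-byte `LINE_BYTES` are hypothetical assumptions; this is not the dissertation's load/store unit design.

```python
from typing import List, Optional, Tuple

LINE_BYTES = 32  # hypothetical cache-line size used to match invalidations


class SpeculativeLoadBuffer:
    """Hypothetical tracker for loads performed ahead of program order."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, int]] = []  # (seq, line) of unretired loads

    def perform_load(self, seq: int, addr: int) -> None:
        # Loads may perform in any order; record the line each one read.
        self.entries.append((seq, addr // LINE_BYTES))

    def retire(self, seq: int) -> None:
        # Loads up to and including `seq` are no longer speculative.
        self.entries = [e for e in self.entries if e[0] > seq]

    def on_invalidate(self, addr: int) -> Optional[int]:
        """Return the sequence number to squash from, or None if no
        still-speculative load read the invalidated line."""
        line = addr // LINE_BYTES
        hits = [s for s, l in self.entries if l == line]
        if not hits:
            return None
        squash = min(hits)  # oldest affected load
        # Re-execution restarts at `squash`, so younger tracked loads are dropped too.
        self.entries = [e for e in self.entries if e[0] < squash]
        return squash
```

The longer a load stays speculative, the wider the window in which a conflicting invalidation can arrive, which is one way to see why very early speculation is more likely to be squashed.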
Through this dissertation, we built a superscalar multiprocessor simulator, investigated parallelism exploitation in superscalar multiprocessing, and developed the speculative store technique. We hope this research contributes to the exploitation of parallelism in multiprocessing systems.
COVER
Abstract in Chinese
Abstract
Acknowledgement
Contents
List of Figures
List of Tables
1. Introduction
2. Related Research of Memory Consistency Models
2.1 Memory Consistency Models
2.1.1 Sequential Consistency
2.1.2 Processor Consistency
2.1.3 Weak Consistency
2.1.4 Release Consistency
2.1.5 Summary of Access Constraints
2.2 Techniques for Hiding Consistency Latency
2.2.1 Hardware Prefetching
2.2.2 Speculative Load
2.2.3 Speculative Retirement
2.2.4 Aggressive Speculation of Sequential Consistency
2.3 Speculative Execution of Store Instructions
3. Simulation Environment
3.1 Superscalar Multiprocessor Simulator--SMINT
3.1.1 MINT--A RISC Multiprocessor Simulator
3.1.2 Superscalar Processing Core
3.2 Benchmark Programs
3.2.1 FFT
3.2.2 LU
3.2.3 Ocean
3.2.4 Radix
3.3 Instruction-Level Parallelism of the Benchmarks
3.3.1 Inherent Instruction-Level Parallelism
3.3.2 Branch Prediction Affecting Instruction-Level Parallelism
3.3.3 Reorder Buffer Entries Affecting Instruction-Level Parallelism
3.3.4 Speculation Depth Affecting Instruction-Level Parallelism
3.3.5 Load/Store Port Numbers Affecting Instruction-Level Parallelism
3.4 Summary
4. Parallelism Exploitation in Superscalar Multiprocessing
4.1 Parallelism of Multiprocessing
4.2 Parallelism of Superscalar Multiprocessing
4.2.1 Multiprocessing Speedup
4.2.2 Superscalar Processing Speedup
4.2.3 Overall Processing Speedup
4.3 Discussion
4.4 Summary
5. Ordering and Dependencies of Speculative Memory Accesses
5.1 Load/Store Ordering of Superscalar Processors and Access Constraints of Memory Consistency Models
5.2 Ordering of Speculative Memory Accesses Within Local Processors
5.2.1 Speculation of Load Instructions
5.2.2 Speculation of Store Instructions
5.3 Dependencies of Speculative Memory Accesses Between Processors
5.3.1 Read After Read
5.3.2 Read After Write
5.3.3 Write After Read
5.3.4 Write After Write
5.4 Discussion
6. Speculative Memory Accesses Within Local Processors
6.1 Performance of Memory Consistency Models
6.1.1 Simulation Parameters
6.1.2 Simulation Results
6.2 Hiding Memory Consistency Latencies
6.2.1 Hardware Prefetching
6.2.2 Speculative Execution of Load Instructions
6.2.3 Simulation Results
6.3 Summary
7. Speculative Memory Accesses Between Processors
7.1 Recoverable Memory System that Supports Speculative Loads and Stores
7.1.1 Data Cache
7.1.2 Speculative Write Cache
7.1.3 Load/Store Buffer
7.1.3.1 Speculative Execution of Store Instructions
7.1.3.2 Speculative Execution of Load Instructions
7.1.4 Cache Coherence Protocol
7.2 Performance Evaluation
7.3 Summary
8. Conclusion and Future Work
References
OTHERS