National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

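Graduate student: 王紹仲
Graduate student (romanized name): Wang, Shao-Chumg
Thesis title: 異質多核心上之程式設計模型評估與設計
Thesis title (English): Evaluation and Design of Programming Models for Heterogeneous Multi-Core Systems
Advisor: 李政崑
Advisor (romanized name): Lee, Jenq Kuen
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Department of Computer Science
Discipline: Engineering
Field: Electrical and computer engineering
Thesis type: Academic thesis
Year of publication: 2010
Academic year of graduation: 98
Language: English
Number of pages: 36
Keywords (Chinese): 異質多核心 (heterogeneous multi-core); 程式設計模型 (programming model); 遠端程序呼叫 (remote procedure call); 遠端串流程序呼叫 (streaming remote procedure call); 軟體快取 (software cache); 立體視覺 (stereo vision)
Keywords (English): multi-core; heterogeneous; programming model; remote procedure call; streaming; software cache; stereo vision; belief propagation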
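Writing parallel programs on a heterogeneous multi-core architecture raises several problems: the MPU and the DSP use different instruction sets, the memory architecture is non-uniform, and the MPU runs an operating system and therefore sees virtual memory, while the DSP sees physical memory.
Programming in this environment takes considerable engineering effort to reach good performance. We therefore propose a programming model for heterogeneous multi-core systems, consisting of the Multicore Software API (MSA) and the Software Cache API, to help programmers write parallel programs.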

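MSA is a middle layer that hides the underlying hardware details. It consists of three modules: a remote procedure call module, a message-passing module, and a streaming module.
The remote procedure call module provides an API that lets users offload programs to the DSP and invoke procedures on the DSP by function name. The message-passing and streaming modules provide non-streaming and streaming data transfer, respectively, so that procedures can set up communication channels and exchange data.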

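In embedded systems, heavy access to external memory degrades performance.
The Software Cache API is designed to cope with increasingly complex memory hierarchies.
It provides an API that helps programmers manage the movement of data between external and internal memory,
reducing the difficulty of software development while still meeting high-performance requirements.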
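
A software-managed cache of this kind can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the thesis's Software Cache API: the class name `SoftwareCache`, its methods, the LRU replacement policy, and the write-back behavior are all chosen for the example. A Python list stands in for external memory, and a small ordered table maps block numbers to cached blocks held in "internal memory".

```python
from collections import OrderedDict


class SoftwareCache:
    """Toy write-back cache with LRU replacement (hypothetical design)."""

    def __init__(self, external, block_size=4, num_lines=2):
        self.external = external      # list standing in for external memory
        self.block_size = block_size  # elements per cached block
        self.num_lines = num_lines    # blocks that fit in internal memory
        # Lookup table: block number -> [data, dirty], ordered LRU to MRU.
        self.lines = OrderedDict()
        self.external_reads = 0
        self.external_writes = 0

    def _load(self, block):
        """Return the cached entry for a block, fetching it on a miss."""
        if block in self.lines:
            self.lines.move_to_end(block)  # mark as most recently used
            return self.lines[block]
        if len(self.lines) >= self.num_lines:
            self._evict()
        start = block * self.block_size
        data = self.external[start:start + self.block_size]  # slice copies
        self.external_reads += 1
        entry = [data, False]
        self.lines[block] = entry
        return entry

    def _evict(self):
        """Drop the least recently used block, writing it back if dirty."""
        victim, (data, dirty) = self.lines.popitem(last=False)
        if dirty:
            start = victim * self.block_size
            self.external[start:start + len(data)] = data
            self.external_writes += 1

    def read(self, addr):
        entry = self._load(addr // self.block_size)
        return entry[0][addr % self.block_size]

    def write(self, addr, value):
        entry = self._load(addr // self.block_size)
        entry[0][addr % self.block_size] = value
        entry[1] = True  # block must be written back before it is dropped

    def flush(self):
        """Write back all dirty blocks and empty the cache."""
        while self.lines:
            self._evict()
```

Repeated accesses to the same block touch external memory only once, and a modified block reaches external memory only when it is evicted or flushed. That deferred traffic is the saving a software cache is meant to provide when external accesses are expensive.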

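Finally, we implemented MSA and the Software Cache API on a SID-based multi-core simulator
and use a stereo vision application to demonstrate how to develop parallel programs with our programming model.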
Abstract
Contents
List of Figures
List of Tables
1 Introduction
1.1 Introduction
1.2 Overview of the Thesis
2 The Design of Multi-core Software API
2.1 Invoking Remote Procedure
2.2 Task Communication
2.3 Guided Programming Sample Using RPC and Communication Module
3 Software-Managed Cache Support
3.1 Internal Memory Management
3.2 Address Lookup Table
3.3 Replacement Policy
3.4 Stride and Fast Block Access
3.5 Guided Programming Sample
4 Case Study
4.1 NAS Parallel Benchmarks
4.1.1 Overview of NAS Parallel Benchmarks
4.1.2 NAS Parallel Benchmarks in MSA
4.2 Stereo Vision with Belief Propagation
4.2.1 Overview of the Belief Propagation
4.2.2 Parallelize Belief Propagation in MSA
5 Experiment Results
5.1 Experiment Environment
5.2 Experiment
5.2.1 Performance of NAS Benchmark
5.2.2 Performance of Belief Propagation with Stereo Vision
5.2.3 Performance of Software Cache
6 Conclusion
6.1 Summary
6.2 Future Work