National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

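Author: 蕭明章
Author (English): Ming Chang Shiao
Thesis title: 合作性加強式學習應用於高速多媒體網路之擁塞控制
Thesis title (English): A Cooperative Reinforcement Learning Approach to Congestion Control of High-Speed Multimedia Networks
Advisor: 吳承崧
Advisor (English): Cheng Shong Wu
Degree: Doctoral
Institution: 國立中正大學 (National Chung Cheng University)
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2005
Academic year of graduation: 93
Language: Chinese
Number of pages: 129
Keywords (Chinese): 擁塞控制
Keywords (English): Congestion Control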
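In recent years, advanced communication technologies have provided ever higher transmission bandwidth. However, the rapid growth in the number of network users has made that bandwidth insufficient, giving rise to network congestion. Congestion reduces throughput and increases the packet loss rate and transmission delay. The bandwidth shortage can be addressed by improving congestion control so that network resources are used more efficiently. Traditional congestion control methods continuously monitor the queue length and use it as the basis for adjusting the source rate. This thesis investigates a proposed congestion control method built on a reinforcement learning architecture with a cooperative mechanism; unlike traditional AIMD control, it is intended to adapt to a rapidly changing network environment. The reinforcement learning architecture, constructed from neural networks, consists of two subsystems: an expectation-return predictor, which serves as a long-term evaluator, and a short-term action selector, which comprises an action-value evaluator and a stochastic action selector. The research is divided into three parts. The first part studies a reinforcement learning architecture for congestion control in multimedia networks. Using reinforcement signals generated by an immediate reward evaluator, the proposed architecture can take the best action to control the source flows while achieving both high throughput and a low packet loss rate. The second part studies a cooperative reinforcement learning architecture for congestion control in dynamic high-speed networks. On receiving cooperative reinforcement signals generated by a cooperative fuzzy reward evaluator, the architecture learns to take correct, adaptive actions in a time-varying environment. The last part studies learning-based cooperative congestion control in multimedia networks. A cooperative fuzzy reward evaluator generates cooperative reinforcement signals based on game theory in order to make full use of the bandwidth. Simulation results show that the proposed methods simultaneously raise system utilization and lower the data loss rate.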
In recent years, advanced communication technologies have supplied more and more network bandwidth. However, the rapid increase in Internet users has exhausted that bandwidth. When too many users are present on the Internet, performance degrades; this situation is called congestion. Congestion results in low throughput, a high packet loss rate, and long transmission delay. The problem of insufficient bandwidth can be alleviated by enhancing the congestion control mechanism so that the network works more efficiently. Traditional congestion control methods continuously monitor the queue length, on which the source rate depends. This thesis explores a proposed congestion control method based on a cooperative reinforcement learning (RL) scheme that, unlike AIMD control, adapts to a varying network environment. The RL scheme, implemented mainly with artificial neural networks (ANNs), consists of two subsystems: an expectation-return predictor, which is a long-term policy evaluator, and a short-term action selector, which is composed of an action-value evaluator and a stochastic action selector. The study is divided into three applications. The first applies an RL scheme to congestion control in multimedia networks. The proposed RL scheme receives reinforcement signals generated by an immediate reward evaluator and takes the best action, in the sense of state-value evaluation, to control source rates with system performance in mind. The second is an adaptive multi-agent RL scheme for congestion control problems in dynamic high-speed networks. After receiving cooperative reinforcement signals generated by a cooperative fuzzy reward evaluator, the proposed cooperative multi-agent congestion controller learns to take correct actions adaptively in time-varying environments. The last is a learning-based cooperative congestion control scheme for multimedia networks. To make the best of link utilization, it includes a cooperative fuzzy reward evaluator that provides cooperative reinforcement signals based on game theory. Simulation results show that the proposed approaches increase system utilization and decrease packet losses simultaneously.
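For context, the conventional queue-threshold AIMD rule that the abstract contrasts with the proposed RL scheme might look roughly like the sketch below. It is an illustration only, not code from the thesis: the function names, the threshold, the increase and decrease constants, and the single-queue toy model are all assumptions chosen for readability.

```python
def aimd_step(rate, queue_len, threshold,
              increase=1.0, decrease=0.5,
              min_rate=1.0, max_rate=100.0):
    """One AIMD update of the source rate (all constants are assumed values).

    Additive increase while the monitored queue is at or below the threshold,
    multiplicative decrease once it exceeds the threshold.
    """
    if queue_len > threshold:
        new_rate = rate * decrease      # multiplicative decrease on congestion
    else:
        new_rate = rate + increase      # additive increase otherwise
    # Keep the rate inside the allowed range.
    return max(min_rate, min(max_rate, new_rate))


def simulate(steps, capacity=10.0, threshold=20.0, buffer_size=50.0):
    """Toy single-queue model: one source feeding one fixed-capacity link."""
    rate, queue, lost = 1.0, 0.0, 0.0
    history = []
    for _ in range(steps):
        # Arrivals minus service during one time slot.
        queue = max(0.0, queue + rate - capacity)
        if queue > buffer_size:
            lost += queue - buffer_size   # overflow counts as packet loss
            queue = buffer_size
        rate = aimd_step(rate, queue, threshold)
        history.append((rate, queue))
    return history, lost


if __name__ == "__main__":
    history, lost = simulate(200)
    print("final rate and queue:", history[-1], "lost:", lost)
```

The rule reacts only to the current queue length, which is the kind of fixed, reactive behavior the learning-based controllers described in the abstract are meant to improve on.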
Chapter 1. Introduction
1.1. Effects of Congestion in Communication Networks
1.2. Motivation
1.3. Organization of the Thesis
Chapter 2. Backgrounds of Research
2.1. Introduction to Congestion Control
2.1.1. Congestion Control Approaches
2.1.2. Service Category in Multimedia Networks
2.2. Introduction to Reinforcement Learning
2.2.1. Elements of Reinforcement Learning
2.2.2. Value Functions
2.2.3. Actor-Critic Methods
2.2.4. Temporal Difference (TD) with Eligibility Traces Learning
Chapter 3. A Reinforcement Learning Approach to Congestion Control of High-Speed Multimedia Networks
3.1. RLCC Technique
3.1.1. System Model
3.1.2. System Implementation
3.2. On-Line Algorithms of RLCC
3.3. Simulations and Comparisons
3.3.1. One Node Controller Scenario
3.3.2. Multinode Controller Scenario
Chapter 4. Cooperative Multiagent Congestion Control for High-Speed Networks
4.1. Theoretical Framework
4.1.1. MDP for a Single-Agent
4.1.2. Cooperative Learning for Multiagent
4.1.3. Architecture of CMCC
4.2. On-Line Learning of CMCC
4.2.1. Associative Input Space Partitioning
4.2.2. Cooperative Fuzzy Reward Evaluator
4.2.3. On-Line Algorithms
4.3. Simulation and Comparisons
4.3.1. A Single Agent Control
4.3.2. Cooperative and Non-Cooperative Multiagent Control
Chapter 5. Cooperative Congestion Control for Multimedia Networks Based on Learning Approach
5.1. Cooperative Congestion Control Framework
5.1.1. System Configuration
5.1.2. Cooperative Congestion Control Operation
5.2. Theoretical Analysis
5.2.1. Game Theory for Cooperative Learning
5.2.2. Architecture of CCC
5.3. On-Line Learning of CCC
5.3.1. Cooperative Fuzzy Reward Evaluator
5.3.2. Cooperative Learning Algorithms
5.4. Simulation and Comparisons
5.4.1. Simulation Environment
5.4.2. Simulation Results and Discussion
Chapter 6. Conclusions
6.1. Summaries
6.2. Future Work
References
Vita
Publication List