Graduate student (English name): Hao-Yu Yang
Thesis title (English): Developing Freeway Mainline Metering Policy by Deep Reinforcement Learning Combined with Ramp Metering: A Case Study of Freeway No.5
Advisor (English name): Tien-Pen Hsu
Oral defense committee (English names): Chien-Sheng Wu; Shou-Ren Hu
Keywords (English): Deep reinforcement learning; Mainline metering; Ramp metering
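This study proposes a joint metering model that combines deep reinforcement learning with ALINEA ramp metering, and learns the policy in the Vissim traffic simulation software. The mainline metering agent observes flow, speed, and density data for each freeway section and makes the best decision in real time. It works alongside an independently operating ramp metering system, and together they control the freeway metering strategy. The objective is to minimize vehicle travel time, so the negative of the number of vehicles in the network serves as the learning reward. To prevent mainline metering from causing severe queue spillback on the freeway mainline, the number of vehicles waiting on the mainline is added as a learning penalty. The joint metering model converged after 500 training episodes and was then compared with the Freeway Bureau's current strategy and with the MRC PI-ALINEA model. Compared with the Freeway Bureau's current strategy, the proposed joint metering strategy lowers passenger car efficiency by 1.03% on average and raises bus efficiency by 16.53% on average. Compared with the MRC PI-ALINEA model, passenger car efficiency falls by 25.13% on average and bus efficiency rises by 25.09% on average. These results show that the proposed strategy widens the travel time difference between passenger cars and buses. The space-time speed diagram further shows that, one hour after metering is activated, the strategy effectively relieves congestion inside the Xueshan Tunnel and brings the average section speed to 70 kph.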
Freeway No. 5 is the most important road connecting to eastern Taiwan. Because it passes directly through Xueshan (Snow Mountain), it greatly reduces travel time to the east compared with other routes. As a result, many tourists use this freeway to reach eastern Taiwan, causing recurrent congestion on weekends. Traffic congestion reduces the efficiency of freeway operation and imposes large social costs. To alleviate this problem, the Freeway Bureau has introduced many traffic demand management strategies, such as ramp metering, shoulder bus lanes, mainline metering, and high-occupancy vehicle regulations.
However, the Freeway Bureau's current control policy for ramp metering and mainline metering is a dynamic look-up method based on the length of the traffic queue. This research argues that the current measure can be improved by developing a metering control policy with deep reinforcement learning, in order to relieve or avoid congestion in the Xueshan Tunnel. Deep reinforcement learning (DRL) can tackle dynamic and complex traffic control problems. Moreover, it removes the need to assume a traffic flow model, and it can feed traffic data with temporal and spatial characteristics into the model through neural networks.
This research developed a joint freeway metering policy that combines a mainline metering policy controlled by a DRL agent with a ramp metering policy controlled by the ALINEA algorithm. The traffic simulation software Vissim was used for model learning and evaluation. The DRL agent makes the best decision in real time based on the flow, speed, and density data of each freeway section, and works alongside the ramp metering system, which operates independently. The learning target is to minimize vehicle travel time. The DRL agent's reward is therefore defined as a weighted sum of two terms: the negative number of vehicles in the network and the negative number of vehicles waiting on the freeway mainline. The model converged after 500 training episodes. Finally, the proposed model was compared with the current strategy and with the MRC PI-ALINEA model. Compared with the current strategy, the Q-ratio of passenger cars decreased by 1.03% and the Q-ratio of buses increased by 16.53%. Compared with the MRC PI-ALINEA model, the Q-ratio of passenger cars decreased by 25.09% and the Q-ratio of buses increased by 25.13%. These results show that the proposed joint metering control policy widens the travel time difference between buses and passenger cars. Furthermore, the average speed in the Xueshan Tunnel reaches 70 kph one hour after metering starts, which indicates that the policy effectively alleviates traffic congestion.
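The two control components described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the weights `w_network` and `w_queue`, the gain of 70 veh/h per percent occupancy, and the rate bounds are all assumptions chosen for illustration. The reward follows the weighted-sum form stated above, and the ramp update uses the standard integral form of ALINEA, r(k) = r(k-1) + K_R (o_target - o(k)).

```python
def joint_reward(network_vehicles: int, mainline_queue: int,
                 w_network: float = 1.0, w_queue: float = 1.0) -> float:
    """Weighted sum of two negative terms (weights are hypothetical).

    network_vehicles: vehicles currently in the simulated network.
    mainline_queue:   vehicles waiting upstream of the mainline meter.
    """
    return -(w_network * network_vehicles + w_queue * mainline_queue)


def alinea_rate(prev_rate: float, occupancy: float, target_occupancy: float,
                gain: float = 70.0, r_min: float = 200.0,
                r_max: float = 1800.0) -> float:
    """One ALINEA step: integral feedback on downstream occupancy.

    Rates are in veh/h and occupancies in percent. The gain and the
    bounds r_min and r_max are illustrative assumptions.
    """
    rate = prev_rate + gain * (target_occupancy - occupancy)
    # Clamp to the feasible metering range.
    return max(r_min, min(r_max, rate))


if __name__ == "__main__":
    print(joint_reward(100, 20, w_queue=0.5))
    print(alinea_rate(900.0, occupancy=30.0, target_occupancy=25.0))
```

In this sketch a larger `w_queue` makes the agent more reluctant to hold vehicles on the mainline, which mirrors the role of the waiting-vehicle penalty in the abstract. The ramp controller needs no learning and can run independently of the agent.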
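Acknowledgements
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Research Scope
1.5 Research Procedure
Chapter 2 Literature Review
2.1 Mainline Metering and Joint Metering
2.2 Ramp Metering Policies and Algorithms
2.2.1 Metering Policy
2.2.2 Ramp Metering Algorithms
2.3 Multi-Agent Reinforcement Learning
2.4 Conclusion
Chapter 3 Analysis of Current Traffic Conditions on Freeway No. 5 and Network Construction
3.1 Analysis of Current Traffic Conditions on Freeway No. 5
3.1.1 Data Description
3.1.2 Analysis of Mainline Space-Time Speed Diagrams
3.1.3 Analysis of Section Bottlenecks
3.1.4 Traffic Performance Analysis
3.2 Freeway No. 5 Network Construction
3.2.1 Data Collection for the Study Area
3.2.2 Vissim Network Construction
3.2.3 Calibration of Driving Behavior Parameters
3.2.4 Network Validation
Chapter 4 Methodology
4.1 Reinforcement Learning
4.2 Neural Network
4.3 Deep Q-Network
4.4 Independent DQN (IDQN) for Multi-Agent Learning
4.5 Joint Freeway Mainline and Ramp Metering Model
4.5.1 Mainline Metering Model for Joint Metering
4.5.2 Ramp Metering Model for Joint Metering
4.6 Baseline Model
Chapter 5 Joint Metering Model Training Results and Scenario Analysis
5.1 Model Training Settings and Training Results
5.1.1 Model Training Settings
5.1.2 Model Training Results
5.2 Performance Evaluation and Comparison with the Actual Scenario
5.3 Performance Evaluation and Comparison with the Baseline Model (MRC PI-ALINEA)
Chapter 6 Conclusions and Recommendations
6.1 Conclusions
6.2 Research Limitations
6.3 Recommendations
References
Appendix 1
Appendix 2
Appendix 3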
