National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 林衍百
Author (English): Lin, Yan-Bai
Title (Chinese): 利用高公局隧道內外即時影像估測行車速度
Title (English): Vehicle Speed Estimation Using Freeway CCTVs Inside and Outside Tunnels
Advisor: 張明峰
Advisor (English): Chang, Ming-Feng
Committee Members: 謝文生, 彭文志, 張明峰
Committee Members (English): Hsieh, Wen-Sheng; Peng, Wen-Chih; Chang, Ming-Feng
Oral Defense Date: 2020-07-27
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Institute of Data Science and Engineering
Discipline: Computing
Field: Software Development
Thesis Type: Academic thesis
Publication Year: 2020
Graduation Academic Year: 108 (2019-2020)
Language: Chinese
Pages: 46
Keywords (Chinese): 高公局、行車速度、即時影像、卷積神經網路、深層殘差網路
Keywords (English): Freeway Bureau; vehicle speed; CCTV; convolutional neural network; deep residual convolutional networks
Usage statistics:
  • Cited by: 0
  • Views: 310
  • Downloads: 44
  • Bookmarked: 0
Vehicle speed estimation is a key and much-studied problem in the growing field of traffic information systems (TIS). Many useful kinds of traffic information can be derived from vehicle speeds, such as whether traffic is free-flowing or congested and estimates of travel time, so real-time speed information is of great benefit to road users. This thesis applies deep learning to the CCTV (closed-circuit television) video published by the Taiwan Freeway Bureau to estimate vehicle speeds from the video, providing road users with more timely traffic information.

For the dataset, we use raw CCTV video as input, the average of the lane speeds reported by VDs (vehicle detectors) as the ground truth, and MAE (mean absolute error) as the error metric. For data preprocessing, we first classify the CCTV footage with agglomerative hierarchical clustering and exclude videos that cannot serve as input. We then build a mask from a generated ROI (region of interest) so that only the one-way roadway of interest is extracted as input.
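As a rough illustration of this preprocessing step, here is a minimal sketch using scikit-learn and OpenCV; the thumbnail features, cluster count, random frames, and ROI polygon are all hypothetical stand-ins, not values from the thesis.

```python
import cv2
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def frame_feature(frame, size=(32, 32)):
    """Downsample a frame to a small grayscale thumbnail used as a clustering feature."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, size).flatten().astype(np.float32) / 255.0

# Hypothetical stand-ins for one representative frame per CCTV clip.
frames = [np.random.randint(0, 255, (240, 352, 3), np.uint8) for _ in range(20)]

# Group similar camera views; outlier clusters tend to correspond to
# abnormal views (camera off, rotated, heavy glare) and can be excluded.
features = np.stack([frame_feature(f) for f in frames])
labels = AgglomerativeClustering(n_clusters=4, linkage="ward").fit_predict(features)

# ROI mask: keep only the one-way roadway; the polygon here is made up.
roi_polygon = np.array([[40, 230], [150, 60], [210, 60], [330, 230]], np.int32)
mask = np.zeros((240, 352), np.uint8)
cv2.fillPoly(mask, [roi_polygon], 255)
masked = [cv2.bitwise_and(f, f, mask=mask) for f in frames]
```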

For the network architecture, we use a deep residual network (ResNet) as the backbone. Each 3D-CNN layer is split into a spatial layer and a temporal layer that are processed separately ((2+1)D); the combined architecture is denoted R(2+1)D. We moderately adjust the temporal dimensions of the early convolution and pooling layers to avoid losing temporal information from the beginning of a clip. We design residual networks of four different depths to investigate how long a spatio-temporal feature sequence the CNN should learn for the best performance, and we also design a two-stream convolutional network whose two inputs are the raw video and its optical flow. In our experiments, the residual networks of depths 10 and 18 give the best results for estimating the mean speed. Compared with existing architectures such as plain 3D-CNN and 3D-CNN+GRU, on the 10 CCTVs outside tunnels and 6 inside tunnels that we tested, our best architecture reduces MAE by 0.6% to 65.2%. In addition, most CCTVs suffer from data imbalance between free-flow and congested conditions; when the training data are large enough, the MAE of our speed estimation reaches 2 to 5 km/hr.
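A minimal PyTorch sketch of the (2+1)D factorization described above; the intermediate channel width follows the parameter-matching rule of thumb from Tran et al. [10], and all sizes here are illustrative rather than the thesis's exact configuration.

```python
import torch
import torch.nn as nn

class Conv2Plus1D(nn.Module):
    """Factorize a k_t x k x k 3D convolution into a spatial 1 x k x k
    convolution followed by a temporal k_t x 1 x 1 convolution."""
    def __init__(self, in_ch, out_ch, k_spatial=3, k_temporal=3, stride=(1, 1, 1)):
        super().__init__()
        # Intermediate width chosen so the parameter count roughly matches
        # the equivalent full 3D convolution (Tran et al. [10]).
        mid = (k_temporal * k_spatial**2 * in_ch * out_ch) // (
            k_spatial**2 * in_ch + k_temporal * out_ch)
        self.spatial = nn.Conv3d(in_ch, mid, (1, k_spatial, k_spatial),
                                 stride=(1, stride[1], stride[2]),
                                 padding=(0, k_spatial // 2, k_spatial // 2),
                                 bias=False)
        self.bn = nn.BatchNorm3d(mid)
        self.relu = nn.ReLU(inplace=True)
        self.temporal = nn.Conv3d(mid, out_ch, (k_temporal, 1, 1),
                                  stride=(stride[0], 1, 1),
                                  padding=(k_temporal // 2, 0, 0),
                                  bias=False)

    def forward(self, x):  # x: (N, C, T, H, W)
        return self.temporal(self.relu(self.bn(self.spatial(x))))

# Example: a 16-frame, 112x112 clip through one (2+1)D convolution.
clip = torch.randn(1, 3, 16, 112, 112)
out = Conv2Plus1D(3, 64)(clip)  # -> (1, 64, 16, 112, 112)
```

Inserting the ReLU between the spatial and temporal convolutions is what gives (2+1)D its extra nonlinearity over a plain 3D convolution of the same depth.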
Traffic speed estimation plays an important role in traffic information systems (TIS). Traffic speeds can be used to derive road network traffic conditions and travel time predictions. In addition, knowing the instantaneous traffic conditions ahead greatly benefits motorists, who can be pre-alerted when approaching traffic incidents. This thesis uses the CCTV videos provided by the Taiwan Freeway Bureau to estimate traffic speeds on Taiwan's national highways, in order to provide real-time traffic speed information.

For our dataset, we use the original CCTV videos and the vehicles' optical flow derived from them as inputs, the 1-minute average speeds measured by VDs (vehicle detectors) as the ground truth, and MAE (mean absolute error) as the error function. For data preprocessing, we first use agglomerative hierarchical clustering to classify CCTV video frames and exclude abnormal videos. We then generate a region-of-interest (ROI) mask to retrieve only the one-way traffic measured by the corresponding VD.

We implement a network architecture based on the 3D residual network (3D ResNet). We split each 3D CNN into a spatial 2D convolution followed by a temporal 1D convolution, i.e., a (2+1)D CNN; the combination of the two is referred to as R(2+1)D. We moderately lower the temporal filter sizes and use no temporal pooling in the beginning layers of the network to preserve more temporal features. We use four different depths of residual networks, test different filter sizes, and construct models for different frame rates of CCTV videos. Our experimental results show that R(2+1)D of depths 10 and 18 generates the best results. Compared with existing architectures, such as a plain 3D CNN and a combination of 3D CNN and GRUs, the MAE of our method is reduced by 0.6% to 65.2%. We tested on 10 CCTV videos outside tunnels and 6 inside tunnels; most CCTVs have data imbalance between free-flow and congestion data. When the training data are large enough, the MAE of our speed estimation is about 2 to 5 km/hr.
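The optical-flow stream and the MAE error function can be sketched as follows. Farnebäck's dense optical flow [14] is the estimator cited by the thesis, but the OpenCV parameter values below are common defaults rather than the thesis's settings, and the frames and speed values are dummies.

```python
import cv2
import numpy as np

def dense_flow(prev_bgr, next_bgr):
    """Dense optical flow between consecutive CCTV frames (Farneback [14])."""
    prev = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    nxt = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    # Returns an H x W x 2 array of per-pixel (dx, dy) displacements.
    return cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)

def mae(pred_kmh, true_kmh):
    """Mean absolute error between estimated and VD-measured speeds (km/hr)."""
    return float(np.mean(np.abs(np.asarray(pred_kmh) - np.asarray(true_kmh))))

# Hypothetical example with random frames and dummy speed values.
f0 = np.random.randint(0, 255, (240, 352, 3), np.uint8)
f1 = np.random.randint(0, 255, (240, 352, 3), np.uint8)
flow = dense_flow(f0, f1)               # input to the optical-flow stream
print(mae([78.0, 52.5], [80.0, 50.0]))  # -> 2.25
```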
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
1. Introduction
  1.1 Research Background
  1.2 Research Motivation
  1.3 Research Objectives
  1.4 Summary
2. Related Work
  2.1 3D Convolutional Neural Networks
  2.2 Deep Residual Convolutional Networks (ResNet)
  2.3 (2+1)D Convolutional Networks
  2.4 Two-Stream Convolutional Networks
  2.5 Long-term Recurrent Convolutional Networks
  2.6 Summary
3. Methodology
  3.1 Data Preprocessing
    3.1.1 Processing CCTV Data
    3.1.2 Processing VD Information
  3.2 Model Architecture Design
    3.2.1 R(2+1)D Convolution
    3.2.2 Two-Stream Convolutional Networks
  3.3 Neural Network Training
    3.3.1 Training with Different Numbers of Frames
    3.3.2 Training Parameters and Error Computation
4. Experimental Results
  4.1 Effect of Adding Masks
  4.2 Effect of Adding Noise
  4.3 Estimation of Different Target Values
  4.4 Comparison of Different Architectures
    4.4.1 R(2+1)D vs. R3D at Different Depths
    4.4.2 Two-Stream Convolutional Networks
  4.5 Overall Comparison
    4.5.1 Frame-Count Intervals and Traffic Conditions Outside Tunnels
    4.5.2 Testing CCTVs Inside Tunnels
    4.5.3 Comparison with Existing Architectures
  4.6 Other Modifications and Comparisons
    4.6.1 Comparison of Shallower R(2+1)D Networks
    4.6.2 Modifications to Added Noise
    4.6.3 Effect of the Distance Between CCTV and VD
5. Conclusions and Future Work
References
[1] J. Zhang, F.-Y. Wang, K. Wang, W.-H. Lin, X. Xu, and C. Chen, "Data-driven intelligent transportation systems: A survey," IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 4, pp. 1624-1639, 2011, doi: 10.1109/TITS.2011.2158001.
[2] Freeway Bureau, Ministry of Transportation and Communications, "Traffic Database," http://tisvcloud.freeway.gov.tw/
[3] X. Qimin, L. Xu, W. Mingming, L. Bin, and S. Xianghui, "A methodology of vehicle speed estimation based on optical flow," in Proc. 2014 IEEE International Conference on Service Operations and Logistics, and Informatics, Qingdao, 2014, pp. 33-37, doi: 10.1109/SOLI.2014.6960689.
[4] Z. Tang, G. Wang, H. Xiao, A. Zheng, and J. Hwang, "Single-camera and inter-camera vehicle tracking and 3D speed estimation based on fusion of visual and semantic features," in Proc. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, 2018, pp. 108-1087, doi: 10.1109/CVPRW.2018.00022.
[5] S. Hua, M. Kapoor, and D. C. Anastasiu, "Vehicle tracking and speed estimation from traffic videos," in Proc. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, 2018, pp. 153-1537, doi: 10.1109/CVPRW.2018.00028.
[6] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, "Learning spatiotemporal features with 3D convolutional networks," in Proc. IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 2015.
[7] K. Hara, H. Kataoka, and Y. Satoh, "Learning spatio-temporal features with 3D residual networks for action recognition," 2017.
[8] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
[9] Z. Qiu, T. Yao, and T. Mei, "Learning spatio-temporal representation with pseudo-3D residual networks," in Proc. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, 2017, pp. 5534-5542, doi: 10.1109/ICCV.2017.590.
[10] D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun, and M. Paluri, "A closer look at spatiotemporal convolutions for action recognition," in Proc. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, 2018, pp. 6450-6459, doi: 10.1109/CVPR.2018.00675.
[11] C. Feichtenhofer, A. Pinz, and A. Zisserman, "Convolutional two-stream network fusion for video action recognition," in Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 1933-1941, doi: 10.1109/CVPR.2016.213.
[12] J. Carreira and A. Zisserman, "Quo vadis, action recognition? A new model and the Kinetics dataset," in Proc. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 4724-4733, doi: 10.1109/CVPR.2017.502.
[13] G. Wang, X. Yuan, A. Zheng, H.-M. Hsu, and J.-N. Hwang, in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019, pp. 382-390.
[14] G. Farnebäck, "Two-frame motion estimation based on polynomial expansion," in Proc. Scandinavian Conference on Image Analysis, Halmstad, Sweden, 2003, pp. 363-370.
[15] S. Ji, W. Xu, M. Yang, and K. Yu, "3D convolutional neural networks for human action recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231, 2013.
[16] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, "Striving for simplicity: The all convolutional net," 2014.
[17] C.-H. Wang, "Vehicle speed estimation using freeway CCTVs," https://etd.lib.nctu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dallcdr&s=id=%22GT070656545%22.&searchmode=basic
[18] J. Donahue, L. A. Hendricks, M. Rohrbach, et al., "Long-term recurrent convolutional networks for visual recognition and description," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
[19] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556, 2014.
[20] A. Chao, "NCHC scenario test 'Task A: Visual-Speed Detection'," GitHub repository, 2018, https://github.com/TW-NCHC/functionality-scenario-test-A-2018.