National Digital Library of Theses and Dissertations in Taiwan

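Author: 李曼妤
Author (English): Man-Yu Lee
Thesis title: 用於即時跨相機追蹤系統之具困難樣本校正之非監督式行人重識別
Thesis title (English): Unsupervised Person Re-Identification with Hard Samples Rectification in a Real-time Multi-Camera Tracking System
Advisor: 簡韶逸
Advisor (English): Shao-Yi Chien
Committee members: 莊永裕, 曹昱, 陳祝嵩, 鮑興國
Committee members (English): Yung-Yu Chuang, Yu Tsao, Chu-Song Chen, Hsing-Kuo Pao
Oral defense date: 2020-07-13
Degree: Master's
Institution: 國立臺灣大學 (National Taiwan University)
Department: 電子工程學研究所 (Graduate Institute of Electronics Engineering)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2020
Academic year of graduation: 108
Language: Chinese
Pages: 61
Keywords (Chinese): 深度學習, 非監督學習, 行人重識別, 跨相機追蹤
Keywords (English): Deep learning, Unsupervised learning, Person re-identification, Multi-camera tracking
DOI: 10.6342/NTU202001791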
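Abstract (translated from Chinese): Multi-camera tracking is a key technology for smart cities; its goal is to track every pedestrian who appears within a camera network. Because the multi-camera tracking problem is so difficult, a new research topic has grown out of its final step, pedestrian matching: person re-identification. Person re-identification aims to use appearance information to distinguish each person across different cameras. Although the rise of convolutional neural networks has allowed supervised learning methods to achieve very good results, unsupervised domain adaptation remains highly challenging because the target domain lacks labelled data. Beyond matching accuracy, being able to implement a workable multi-camera tracking system is also a key part of IoT surveillance applications. Moreover, as IoT devices become widespread, multi-camera systems will need to run on edge devices to reduce network latency and the volume of data transmitted, and to enable more efficient real-time applications.
In this thesis, we propose an unsupervised algorithm with hard samples rectification (HSR) for person re-identification, which addresses the problem that clustering is easily affected by hard samples and therefore performs poorly. The proposed HSR has two facets: one is inter-camera hard positive sample mining, which helps recognize the same person under different cameras; the other distinguishes different people with similar appearance, that is, hard negative samples, by examining local homogeneity. With this two-faceted training method, the hard samples can be corrected and the model trained with accurate labels to improve performance. We conduct extensive experiments to show that our method outperforms current state-of-the-art unsupervised methods.
In addition, we propose an efficient multi-camera tracking system architecture and run it on IoT hardware to demonstrate that the system is feasible on edge devices. We exploit the system's pipeline and the characteristics of each computing module in the system to reduce the large amount of computing resources that a tracking system requires. By allocating computing resources effectively, the proposed architecture achieves good tracking results and runs in real time on edge devices. We provide comprehensive experiments to illustrate the correlation between the components of the multi-camera system and to demonstrate the practicality of the proposed system.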
Abstract (English): Multi-Camera Tracking (MCT) is a crucial technology for the envisioned smart city; it aims to track multiple people through a network of cameras. While MCT is a notoriously difficult problem to solve, a popular research topic has been derived from the final step of its matching scheme: person re-identification (re-ID), which addresses the problem of recognizing people across cameras by their visual appearance. Although person re-ID has improved greatly owing to the rise of Convolutional Neural Networks (CNNs) with supervised learning methods, the task of unsupervised cross-domain re-ID is still challenging because of the lack of labelled data in the target domain. In addition to matching accuracy, being able to implement a workable MCT system is also a critical factor for IoT surveillance applications. Moreover, as IoT devices become more widespread, the MCT system will need to run on edge devices to reduce network latency and data transmission and to enable more efficient real-time applications.
In this thesis, we propose an unsupervised learning scheme, Hard Samples Rectification (HSR), for person re-ID. It resolves the weakness of the original clustering-based methods, which are vulnerable to the hard positive and hard negative samples in the dataset. The proposed HSR contains two learning facets: an inter-camera mining technique that helps recognize the same person under different camera views (hard positives), and a part-based homogeneity technique that helps the re-ID model distinguish different people with similar appearance (hard negatives) by examining local homogeneity. By jointly rectifying the hard samples with this dual-faceted learning scheme, the re-ID model can learn from more accurately labelled hard cases and improve its performance. Extensive experiments on two large-scale benchmarks demonstrate the superiority of HSR over state-of-the-art methods.
Furthermore, we implement a multi-camera tracking system on real-world hardware with an efficient framework to demonstrate its viability on edge devices. Specifically, we leverage the system pipeline and the characteristics of each operator of the MCT system to eliminate the need for a tremendous amount of computational resources. By effectively allocating computing power, the proposed framework achieves favorable performance and is able to run in real time on mobile hardware. Comprehensive experiments are conducted to illustrate the correlation between the components of MCT and to show the utility of the proposed MCT system.
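The two HSR facets described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not give the exact criteria, so the mutual-nearest-neighbour rule for hard positives, the purity threshold `min_purity` for hard negatives, and the function names `cross_camera_mutual_pairs` and `split_by_part_labels` are all assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cross_camera_mutual_pairs(features, cameras):
    """Hard-positive candidates (assumed rule): index pairs (i, j), i < j,
    that are each other's nearest neighbour among samples taken by a
    different camera."""
    def nearest_other_camera(i):
        candidates = [j for j in range(len(features)) if cameras[j] != cameras[i]]
        if not candidates:
            return None
        return min(candidates, key=lambda j: euclidean(features[i], features[j]))

    nearest = [nearest_other_camera(i) for i in range(len(features))]
    pairs = {
        (min(i, j), max(i, j))
        for i, j in enumerate(nearest)
        if j is not None and nearest[j] == i
    }
    return sorted(pairs)


def split_by_part_labels(cluster_labels, part_labels, min_purity=0.8):
    """Hard-negative handling (assumed rule): a global cluster whose members
    disagree on a part-level label is treated as mixing similar-looking
    people and is split by that part-level label."""
    members = {}
    for idx, label in enumerate(cluster_labels):
        members.setdefault(label, []).append(idx)

    refined = [None] * len(cluster_labels)
    next_id = 0
    for label in sorted(members):
        idxs = members[label]
        counts = Counter(part_labels[i] for i in idxs)
        purity = counts.most_common(1)[0][1] / len(idxs)
        if purity >= min_purity:
            groups = [idxs]  # locally homogeneous: keep the cluster whole
        else:
            by_part = {}
            for i in idxs:
                by_part.setdefault(part_labels[i], []).append(i)
            groups = [by_part[p] for p in sorted(by_part)]
        for group in groups:
            for i in group:
                refined[i] = next_id
            next_id += 1
    return refined


# Toy data: two people, each seen by camera 0 and camera 1.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
cameras = [0, 1, 0, 1]
pairs = cross_camera_mutual_pairs(features, cameras)

# Toy data: global cluster 0 mixes two part-level groups, cluster 1 does not.
refined = split_by_part_labels([0, 0, 0, 0, 1, 1], ["a", "a", "b", "b", "c", "c"])
```

In a clustering-based pipeline, pairs found by the first function could be used to merge clusters that were split by camera view, and the refined labels from the second could replace the pseudo-labels before the next training round; both uses are assumptions about how such rules would plug in.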
Abstract
List of Figures
List of Tables
1 Introduction
1.1 Re-identification
1.2 Multi-Camera Tracking
1.3 Contribution
2 Hard Samples Rectification for Unsupervised Person Re-identification
2.1 Related Work
2.2 Proposed Method
2.2.1 Overview of HSR learning scheme
2.2.2 Inter-Camera Mining
2.2.3 Part-based Homogeneity
2.2.4 Optimization Procedure
2.3 Experiments
2.3.1 Datasets and Evaluation Protocol
2.3.2 Implementation Details
2.3.3 Comparison with State-of-the-arts
2.3.4 Ablation Study
2.4 Conclusion
3 Multi-Camera Tracking
3.1 Offline Multi-Camera Tracking
3.1.1 MCT framework
3.1.2 Experiments
3.2 Online MCT System
3.2.1 System Overview
3.2.2 Experiments
3.3 Proposed Framework of Real-time MCT System
3.3.1 System Design
3.3.2 Experiments
4 Conclusion
Reference