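Author: Tsung-Han Lin (林宗翰)
Title: Unsupervised Spatio-temporal Feature Learning and Supervised Recursive Neural Network Learning for Motion Segmentation from a Moving Stereo Camera
Title (Chinese): 以移動立體視覺相機搭配無監督式時間與空間特徵與監督式應用遞迴類神經網路進行動態物體偵測
Advisor: Chieh-Chih (Bob) Wang (王傑智)
Oral defense committee: Cheng-Yuan Liou (劉長遠), Li-Chen Fu (傅立成), Ming-Sui Lee (李明穗), Feng-Li Lian (連豊力)
Oral defense date: 2013-07-26
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Field: Engineering
Discipline: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2013
Graduation academic year: 101 (ROC calendar)
Language: English
Pages: 38
Keywords (translated from Chinese): deep learning; recursive neural network; moving stereo camera; moving object detection; spatio-temporal feature learning
Keywords (English): Deep Learning; Autoencoders; Motion Segmentation; Moving Object Detection; Reconstruction Independent Component Analysis; Recursive Neural Network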
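Abstract (translated from Chinese): This thesis proposes using deep learning to learn features of moving objects directly from raw data, without supervision. Specifically, deep learning techniques such as convolution, pooling, and stacking are used to learn a hierarchical representation of spatio-temporal features. Learning is based on data from a moving stereo camera. Once the spatio-temporal features are combined with a Recursive Neural Network, moving objects can be identified in the images (motion segmentation). Experimental results show that, compared with methods that use point features together with egomotion estimation, the proposed method can extract features that aid moving object detection in regions where point features are hard to detect.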

In this work, deep learning is used to learn features directly from raw data without supervision. Instead of hand-engineering features for each new type of sensor input, the system adapts to new data through unsupervised learning. More specifically, the deep learning techniques of convolution, pooling, and stacking are used to learn a hierarchical representation of spatio-temporal features from unlabeled stereo video data. The spatio-temporal features are learned with a Reconstruction Independent Component Analysis (RICA) autoencoder. The learned features are then applied to motion segmentation of moving objects in images from a moving stereo camera. To do so, the spatio-temporal features are extracted from image segments, and a Recursive Neural Network recursively builds up a segmentation tree to segment moving objects out of the scene. To our knowledge, this is the first time deep learning has been applied to learning spatio-temporal features together with motion segmentation (scene parsing). Compared with moving object detection methods that use point features with egomotion estimation, we show that our features can be extracted in situations where good point features are not detectable. The system is evaluated on real-world data, with results similar to the state of the art and better detection in certain situations.
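
The abstract names a Reconstruction ICA (RICA) autoencoder as the feature-learning module but this record gives no formulas. As a rough illustration only, the sketch below computes the commonly used RICA objective: a tied-weight reconstruction error plus a smooth L1 sparsity penalty on the filter responses. The function name `rica_cost`, the array layout, and the values of `lam` and `eps` are assumptions made for this example and are not taken from the thesis.

```python
import numpy as np


def rica_cost(W, X, lam=0.1, eps=1e-6):
    """Return an RICA-style objective and its gradient with respect to W.

    W   : (k, n) array, one filter per row (layout assumed for this sketch).
    X   : (n, m) array, one preprocessed input patch per column.
    lam : weight of the reconstruction term (illustrative value).
    eps : smoothing constant for the L1-like sparsity penalty.
    """
    m = X.shape[1]
    Z = W @ X                   # filter responses, shape (k, m)
    R = W.T @ Z - X             # tied-weight reconstruction residual, shape (n, m)
    S = np.sqrt(Z ** 2 + eps)   # smooth approximation of |Z|

    cost = (lam / m) * np.sum(R ** 2) + np.sum(S)

    # W appears twice in W.T @ W @ X, so the reconstruction term yields two
    # gradient pieces; the sparsity term contributes (Z / S) @ X.T.
    grad = (2.0 * lam / m) * (Z @ R.T + W @ R @ X.T) + (Z / S) @ X.T
    return cost, grad
```

The returned cost and gradient can be handed to any gradient-based optimizer. Comparing `grad` against finite differences on small random inputs is a quick sanity check before training on real patches.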

ABSTRACT
LIST OF FIGURES
CHAPTER 1. Introduction
CHAPTER 2. Related Work
CHAPTER 3. Unsupervised Spatio-temporal Feature Learning
3.1. Deep Learning Concepts
3.1.1. Autoencoders
3.1.2. Convolution
3.1.3. Pooling
3.1.4. Stacking
3.2. Reconstruction ICA Learning Module
3.3. Stacked Architecture
3.4. Spatio-temporal Features Analysis and Visualization
CHAPTER 4. Motion Segmentation - Recursive Neural Network
4.1. Generating Features for Individual Segment
4.2. Cost Function and Max-Margin Estimation
4.3. Greedy Structure Prediction
CHAPTER 5. Experiments
5.1. Training
5.1.1. Unsupervised Spatio-temporal Feature Training
5.1.2. Recursive Neural Network Parsing
5.2. Dataset
5.3. Moving Object Detection with Libviso2 Egomotion Estimation
5.4. Results
5.5. Analysis
CHAPTER 6. Conclusion and Future Work
BIBLIOGRAPHY
