Research on motion recognition has received increasing attention in recent years because the need for computer vision is growing in many domains, such as surveillance systems, multimodal human-computer interfaces, and traffic control systems. Most existing approaches separate recognition into spatial feature extraction and time-domain recognition. However, we believe that the information of motion resides in the space-time domain and is not restricted to the time domain or the space domain alone. Consequently, it seems more reasonable to integrate feature extraction and classification in the space and time domains together. We propose a Space-Time Delay Neural Network (STDNN) that can deal with 3-D dynamic information, such as motion recognition. For the motion recognition problem that we focus on in this paper, the STDNN is a unified structure in which low-level spatiotemporal feature extraction and space-time recognition are embedded. It possesses the spatiotemporal shift-invariant recognition abilities inherited from the time delay neural network (TDNN) and the space displacement neural network (SDNN). Unlike the multilayer perceptron (MLP), TDNN, and SDNN, the STDNN is constructed from vector-type nodes and matrix-type links so that spatiotemporal information can be gracefully represented in a neural network. Experiments are conducted to evaluate the performance of the proposed STDNN. In the moving Arabic numerals (MAN) experiments, which simulate an object moving in the space-time domain with image sequences, the STDNN shows its generalization ability for spatiotemporal shift-invariant recognition. In the lipreading experiment, the STDNN recognizes lip motions from real image sequences. The results show that the STDNN outperforms the existing TDNN-based system, especially in generalization ability. Although lipreading is a specific application, the STDNN can be applied to other applications since no domain-dependent knowledge is used in the experiment.
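To make the idea of vector-type nodes and matrix-type links more concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): each node carries a spatial feature vector for one time step, each link is a weight matrix, and weight sharing over a sliding time-delay window provides the TDNN-style shift invariance. All names, dimensions, and the choice of nonlinearity below are illustrative assumptions.

```python
import numpy as np

def stdnn_like_layer(inputs, link_matrices, bias):
    """Illustrative layer with vector-type nodes and matrix-type links.

    inputs:        (T, D_in) sequence of vector-type nodes, one per time step.
    link_matrices: (K, D_out, D_in) one matrix-type link per tap of a
                   length-K time-delay window (weights shared across time).
    bias:          (D_out,) bias vector.
    Returns:       (T - K + 1, D_out) sequence of output vector-type nodes.
    """
    T, D_in = inputs.shape
    K, D_out, _ = link_matrices.shape
    outputs = []
    for t in range(T - K + 1):
        # Accumulate matrix-link responses over the delay window, as in a
        # TDNN, but each tap maps a whole feature vector rather than a scalar.
        acc = bias.copy()
        for k in range(K):
            acc += link_matrices[k] @ inputs[t + k]
        outputs.append(np.tanh(acc))  # squashing nonlinearity (assumed)
    return np.stack(outputs)

# Toy usage: 10 time steps of 16-D spatial feature vectors, 3-tap delay window.
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 16))
W = rng.normal(scale=0.1, size=(3, 8, 16))
b = np.zeros(8)
y = stdnn_like_layer(x, W, b)
print(y.shape)  # (8, 8): shift along time is absorbed by shared matrix links
```

A full STDNN would apply the same weight-sharing idea along the spatial dimensions as well (as in an SDNN), so that a pattern shifted in space or time activates the same matrix-type links; this sketch only shows the temporal direction.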