A breakthrough in the understanding of dynamic 3D shape recognition was the discovery that our visual system can extract 3D shape from inputs containing only sparse motion cues, such as (i) point-light displays and (ii) random-dot displays representing rotating 3D shapes, phenomena known as biological motion (BM) processing and structure from motion (SFM) processing, respectively. Previous psychological and computational modeling studies treated these as separate phenomena and could not fully identify the shared visual processing mechanisms underlying them. Using a series of simulation studies, we describe the operations of dynamic deep network models that explain the mechanisms underlying both SFM and BM processing. In simulation-1, the proposed Structure from Motion Network (SFMNW) is trained on displays of 5 rotating surfaces (cylinder, cone, ellipsoid, sphere, and helix) and tested on its shape recognition performance under a variety of conditions: (i) varying dot density, (ii) eliminating local feature stability by introducing a finite dot lifetime, (iii) orienting shapes, (iv) occluding boundaries and intrinsic surfaces, and (v) embedding shapes in static and dynamic noise backgrounds. Our results indicate that lower dot densities, oriented shapes, occluded boundaries, and dynamic noise backgrounds reduced the model's performance, whereas eliminating local feature stability, occluding intrinsic surfaces, and static noise backgrounds had little effect on shape recognition, suggesting that the motion of high-curvature regions such as shape boundaries provides strong cues for shape recognition. In simulation-2, the proposed Biological Motion Network (BMNW) is trained on 6 point-light actions (crawl, cycle, walk, jump, wave, and salute) and tested on its action recognition performance under various conditions: (i) inverted actions, (ii) scrambled actions, (iii) tilted actions, (iv) masked actions, and (v) actions embedded in static and dynamic noise backgrounds.
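The classic SFM stimulus described above, a random-dot display of a rotating surface, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual stimulus code: the function name `cylinder_sfm_frames` and all parameter values (dot count, rotation speed, lifetime) are hypothetical, and only the cylinder case is shown.

```python
import numpy as np

def cylinder_sfm_frames(n_dots=200, n_frames=60, radius=1.0, height=2.0,
                        omega=np.pi / 30, lifetime=None, seed=0):
    """Generate 2D dot positions for an orthographically projected
    rotating cylinder, a standard structure-from-motion stimulus.

    lifetime: if set, each dot is respawned at a random surface position
    after that many frames, removing local feature stability (a condition
    the abstract reports has little effect on recognition).
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, n_dots)         # angle on the surface
    y = rng.uniform(-height / 2, height / 2, n_dots)  # vertical position
    age = rng.integers(0, lifetime, n_dots) if lifetime else None

    frames = []
    for _ in range(n_frames):
        # Orthographic projection: x = r*sin(theta), depth is discarded,
        # so any single static frame is just a flat cloud of dots; shape
        # is only recoverable from the motion across frames.
        frames.append(np.column_stack([radius * np.sin(theta), y]))
        theta = (theta + omega) % (2 * np.pi)         # rigid rotation about y
        if lifetime:
            age += 1
            dead = age >= lifetime
            theta[dead] = rng.uniform(0, 2 * np.pi, dead.sum())
            y[dead] = rng.uniform(-height / 2, height / 2, dead.sum())
            age[dead] = 0
    return frames
```

Varying `n_dots` and `lifetime` reproduces, in spirit, conditions (i) and (ii) of simulation-1.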
Model performance dropped significantly for inverted and tilted actions. In contrast, high accuracy was attained for scrambled and masked actions and for actions presented against static and dynamic noise backgrounds, suggesting that critical joint movements and the movement patterns they generate in the course of an action (the actor's configuration) play a key role in action recognition. We also presented the two models with mixed stimuli (point-light actions embedded in rotating shapes) and achieved significantly high accuracies. Based on these results, we hypothesize that the visual motion circuitry supporting robust SFM processing is also involved in BM processing. The proposed models provide new insights into the relationships between these two visual motion phenomena, viz., SFM and BM processing.
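The inversion, scrambling, and tilting manipulations applied to point-light actions in simulation-2 can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name `transform_action`, the trajectory layout `(frames, joints, 2)`, and the scrambling offset range are all assumptions.

```python
import numpy as np

def transform_action(traj, mode, tilt_deg=45.0, seed=0):
    """Apply one of the abstract's test manipulations to a point-light
    action given as an array of shape (frames, joints, 2).

    mode: 'inverted'  - flip the display vertically (upside-down actor)
          'scrambled' - shift each joint's whole trajectory by a random
                        offset, preserving per-joint motion while
                        destroying the actor's global configuration
          'tilted'    - rotate the display in the image plane
    """
    rng = np.random.default_rng(seed)
    out = traj.astype(float).copy()
    if mode == 'inverted':
        out[..., 1] = -out[..., 1]
    elif mode == 'scrambled':
        # One fixed offset per joint, applied to every frame.
        offsets = rng.uniform(-1.0, 1.0, (1, traj.shape[1], 2))
        out += offsets
    elif mode == 'tilted':
        a = np.deg2rad(tilt_deg)
        rot = np.array([[np.cos(a), -np.sin(a)],
                        [np.sin(a),  np.cos(a)]])
        out = out @ rot.T
    return out
```

Note that scrambling leaves each joint's frame-to-frame displacements intact, which is consistent with the abstract's finding that local joint movements alone can support recognition when global configuration is disrupted.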