LiCaNet: Further Enhancement of Joint Perception and Motion Prediction based on Multi-Modal Fusion

Khalil, Yasser; Mouftah, Hussein T.

doi:10.36227/techrxiv.16553580

Cited by 2 publications

(12 citation statements)

References 33 publications

(60 reference statements)


“…The two input LIDAR representations employed in [2] are BEV and RV images. Lastly, recently proposed LiCaNet [1] extends [2] with camera image fusion. LiCaNet records excellent performance for both perception and motion prediction compared to its predecessor.…”

Section: B. Perception and Motion Prediction (mentioning)

confidence: 99%

“…Empty cells are filled with a value of −1. Further details on the projection algorithm can be found in [1].…”

Section: B. LiCaNext Architecture (mentioning)

confidence: 99%

“…They consist of spatio-temporal information sourced from two representations (BEV and RV), physical object dimensions encoded in the input BEV images, occlusion information provided from RV images, and rich semantics signified in a camera image. When these features are inserted into MotionNet backbone network, they yield accurate pixel-wise joint perception and motion prediction in real-time.…”

Algorithm fragment interleaved with the statement above (truncated in the source):
Output: [1], :] += 1
end
P /= count // average (avoid division by 0)
mask = (count == 0)
P[mask, :] = -1 // assign -1 to empty cells
end

Section: B. LiCaNext Architecture (mentioning)

confidence: 99%
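The averaging step quoted above (accumulate per-cell sums and counts, divide, then assign −1 to empty cells) can be illustrated with a minimal sketch. This is not the projection algorithm from [1]; the function name `average_into_grid`, the `(row, col)` cell indexing, and the list-based grid layout are assumptions made only for illustration.

```python
def average_into_grid(cells, features, height, width, empty_value=-1.0):
    """Average feature vectors per grid cell; fill empty cells with empty_value.

    cells    -- list of (row, col) indices, one per point (hypothetical layout)
    features -- list of equal-length feature vectors, one per point
    """
    channels = len(features[0]) if features else 0
    sums = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    count = [[0] * width for _ in range(height)]

    # Accumulate feature sums and hit counts for every in-bounds point.
    for (row, col), feat in zip(cells, features):
        if 0 <= row < height and 0 <= col < width:
            for c in range(channels):
                sums[row][col][c] += feat[c]
            count[row][col] += 1

    # Divide only where count > 0 (avoids division by zero);
    # cells that received no points get empty_value in every channel.
    grid = []
    for r in range(height):
        grid_row = []
        for c in range(width):
            n = count[r][c]
            if n == 0:
                grid_row.append([empty_value] * channels)
            else:
                grid_row.append([s / n for s in sums[r][c]])
        grid.append(grid_row)
    return grid
```

For example, two points falling into cell (0, 0) are averaged, while a cell that receives no points holds −1 in every channel.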

“…The first experiment records the performance of the original MotionNet model, which acts as the primary baseline. LiCaNet [1] proved that engaging RV and camera images into the fusion process outperforms the baseline, which depends merely on BEV images as input. Our proposed LiCaNext expands on LiCaNet by incorporating residual images pushing the performance even further.…”

Section: Experimental Evaluation, A. Dataset (mentioning)

confidence: 99%

“…Following [1], the learning rate is initialized at 1.6 × 10⁻³ and terminated at 0.8 × 10⁻³, with a decay factor of 0.5 every 10 epochs. The time span between consecutive LIDAR sweeps used to construct the historical BEVs and the sequential residual range images is 0.2 s.…”

Section: Training Setup (mentioning)

confidence: 99%
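The learning-rate schedule described in the statement above can be sketched as a step decay. This is one possible reading, not the authors' training code: treating the terminal value as a lower bound is an assumption, and the function name `step_decay_lr` and its parameter names are hypothetical.

```python
def step_decay_lr(epoch, initial_lr=1.6e-3, final_lr=0.8e-3, factor=0.5, step=10):
    """Step-decay learning rate: multiply by `factor` every `step` epochs.

    Assumption: `final_lr` acts as a floor, so the rate never drops below it.
    """
    lr = initial_lr * factor ** (epoch // step)
    return max(lr, final_lr)
```

Under this reading, the rate stays at 1.6 × 10⁻³ for epochs 0 to 9 and at 0.8 × 10⁻³ from epoch 10 onward.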
