“…Recently, neural radiance fields (NeRF) [28,13,19,32,33,35,36,47,50,52,53] have shown photo-realistic novel view synthesis results in per-scene optimization settings. To avoid expensive per-scene training and improve practicality, generalizable NeRFs [36,52,47] have been proposed, which use image-conditioned, pixel-aligned features to achieve feed-forward view synthesis from sparse input views [36,52]. However, directly applying these methods to complex, non-rigid human motion is not straightforward, and naive solutions suffer from significant artifacts, as shown in Fig.…”