“…Auto-detecting objects with their semantics and context from camera views (e.g., [50,55,68]) can assist visual perception for various interested groups and information processing, e.g., robotic affordance and different types of disability. Automation through a comprehensive dataset that provides a granularity of object classes is critical to infer necessary information from semantics.…”