Generating natural language tags for video information management

Khan, Muhammad Usman; Gotoh, Yoshihiko

doi:10.1007/s00138-017-0825-7

Cited by 3 publications

(2 citation statements)

References 38 publications

“…The image description is generated by creating visual dependency representation of natural images in [38]. Natural language description generation is also done for video for their retrieval purpose in [39], in which they capture relations between keywords associated with videos. Evaluation of machine translation with human‐generated description is also necessary.…”

Section: Related Work (mentioning)

confidence: 99%

SUGAMAN: describing floor plans for visually impaired by annotation learning and proximity‐based grammar

et al. 2019

In this study, the authors propose a framework SUGAMAN (Supervised and Unified framework using Grammar and Annotation Model for Access and Navigation). SUGAMAN is a Hindi word meaning ‘easy passage from one place to another’. SUGAMAN synthesises textual description from a given floor plan image, usable by visually impaired to navigate by understanding the arrangement of rooms and furniture. It is the first framework for describing a floor plan and giving direction for obstacle‐free movement within a building. The model learns five classes of room categories from 1355 room image samples under a supervised learning paradigm. These learned annotations are fed into a description synthesis framework to yield a holistic description of a floor plan image. Authors demonstrate the performance of various supervised classifiers on room learning and provided a comparative analysis of system generated and human‐written descriptions. The contribution of this study includes a novel framework for description generation from document images with graphics while proposing a new feature representing the floor plans, text annotations for a publicly available data set, and an algorithm for door to door obstacle avoidance navigation. This work can be applied to areas like understanding floor plans and design of historical monuments, and retrieval.

“…Image description is generated by creating visual dependency representation of natural images in [25]. Natural language description generation is also done for video for their retrieval purpose in [26], in which they capture relations between keywords associated with videos. Evaluation of machine translation with human generated description is also necessary.…”

Section: B Image Description Generation (mentioning)

confidence: 99%

SUGAMAN: Describing Floor Plans for Visually Impaired by Annotation Learning and Proximity based Grammar

Goyal, Bhavsar, Patel et al. 2018

Preprint

In this paper, we propose SUGAMAN (Supervised and Unified framework using Grammar and Annotation Model for Access and Navigation). SUGAMAN is a Hindi word meaning "easy passage from one place to another". SUGAMAN synthesizes textual description from a given floor plan image for the visually impaired. A visually impaired person can navigate in an indoor environment using the textual description generated by SUGAMAN. With the help of a text reader software the target user can understand the rooms within the building and arrangement of furniture to navigate. SUGAMAN is the first framework for describing a floor plan and giving direction for obstacle-free movement within a building. We learn 5 classes of room categories from 1355 room image samples under a supervised learning paradigm. These learned annotations are fed into a description synthesis framework to yield a holistic description of a floor plan image. We demonstrate the performance of various supervised classifiers on room learning. We also provide a comparative analysis of system generated and human written descriptions. SUGAMAN gives state of the art performance on challenging, real-world floor plan images. This work can be applied to areas like understanding floor plans of historical monuments, stability analysis of buildings, and retrieval.
