Text data often carries valuable information about how the words in a corpus relate to one another. The relationships sought here are “has-a” and “is-a”, from which one can build a hierarchical representation of words. Since each language has its own rules and syntax, extracting these relationships ultimately comes down to understanding the syntax of the particular language and using the relevant features in the process.
The paper presents a machine-learning model for understanding language syntax and deducing the relationships between the words encountered. Specifically, a sequence-modeling approach is followed: the model receives a sequence of words and uses various properties of the words to build a hierarchical graph. The algorithm is intended to be independent of the language, and the model should be versatile enough to be trained for different languages. The paper also describes how this information can be used to build better topic models from a corpus of text.
This is a new algorithm that can be applied to contours with discontinuous boundaries to obtain their 1-pixel-thick equivalent and complete the shape of the contour, after which a region-filling operation can be applied. Conventional thinning algorithms often modify the shape of the contour, so the resulting skeleton does not resemble the original image; for this reason, thinning is not used in the proposed algorithm.
After the boundary thickness is reduced to 1 pixel, the shape is completed by traversing the boundary and joining each black pixel p that has no black pixel in its 8-neighbourhood N8(p) to its nearest black pixel. Completing the shape is necessary because, if a region-filling algorithm is applied directly, it will either blacken the entire image or have no effect, depending on the algorithm used. Hence, after the discontinuities are linked, an existing region-filling algorithm such as Boundary Fill is employed to fill the region.
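The completion and filling steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions. The image is a list of lists with 1 for black and 0 for white. "No black pixel in N8(p)" is read as "no unvisited black pixel" during the traversal. Joins are drawn as Bresenham line segments. The last pixel reached is joined back to the starting pixel. The function names `draw_line`, `complete_contour`, and `boundary_fill` are hypothetical, and the input is assumed to be already reduced to 1-pixel thickness.

```python
from math import hypot

# 4-neighbours first, then diagonals, so the walk prefers straight steps.
N8 = [(0, 1), (1, 0), (0, -1), (-1, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]


def draw_line(img, p, q):
    """Blacken every pixel on the Bresenham line from p to q (row, col)."""
    (r0, c0), (r1, c1) = p, q
    dx, dy = abs(c1 - c0), -abs(r1 - r0)
    sx = 1 if c0 < c1 else -1
    sy = 1 if r0 < r1 else -1
    err = dx + dy
    while True:
        img[r0][c0] = 1
        if (r0, c0) == (r1, c1):
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            c0 += sx
        if e2 <= dx:
            err += dx
            r0 += sy


def complete_contour(img):
    """Walk the boundary; where the walk dead-ends, join to the nearest
    unvisited black pixel (assumed reading of the linking step)."""
    out = [row[:] for row in img]
    unvisited = {(r, c) for r, row in enumerate(img)
                 for c, v in enumerate(row) if v}
    if not unvisited:
        return out
    start = current = min(unvisited)  # first black pixel in raster order
    unvisited.discard(start)
    while unvisited:
        r, c = current
        nxt = next(((r + dr, c + dc) for dr, dc in N8
                    if (r + dr, c + dc) in unvisited), None)
        if nxt is None:
            # Dead end: bridge the gap to the nearest unvisited black pixel.
            nxt = min(unvisited, key=lambda p: hypot(p[0] - r, p[1] - c))
            draw_line(out, current, nxt)
        unvisited.discard(nxt)
        current = nxt
    draw_line(out, current, start)  # assumption: close the loop at the start
    return out


def boundary_fill(img, seed):
    """4-connected boundary fill: blacken white pixels reachable from seed."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    stack = [seed]
    while stack:
        r, c = stack.pop()
        if 0 <= r < rows and 0 <= c < cols and out[r][c] == 0:
            out[r][c] = 1
            stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return out


# Example: a 5x5 square outline inside a 7x7 image, with a two-pixel gap.
img = [[0] * 7 for _ in range(7)]
for i in range(1, 6):
    img[1][i] = img[5][i] = img[i][1] = img[i][5] = 1
img[1][3] = img[1][4] = 0  # introduce the discontinuity

closed = complete_contour(img)
filled = boundary_fill(closed, (3, 3))  # seed assumed to lie inside the shape
```

In this toy example, calling `boundary_fill(img, (3, 3))` on the uncompleted image would leak through the gap and blacken the whole image, which is the failure the completion step is meant to prevent. The greedy nearest-pixel walk is only a sketch; contours with branches or several separate shapes would need a more careful traversal.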