Tongue diagnosis is an important means of monitoring human health status in traditional Chinese medicine. Segmenting and identifying the tongue body in tongue images is a key step toward automatic tongue diagnosis, and the major challenges to doing so robustly and accurately lie in the large variations in tongue appearance, e.g., tongue texture and tongue coating, caused by different diseases in different patients. To cope with these challenges, we propose a novel end-to-end model for multi-task learning of tongue localization and segmentation, named TongueNet, in which pixel-level prior information is used for supervised training of a deep convolutional neural network. First, we introduce a feature pyramid network built on specially designed context-aware residual blocks to extract multi-scale tongue features. Then, the regions of interest (ROIs) of tongue candidates are located in advance from the extracted feature maps. Finally, finer localization and segmentation of the tongue body are performed on the feature maps of the ROIs. Quantitative and qualitative comparisons on real-world datasets show that the proposed TongueNet achieves state-of-the-art performance for tongue body segmentation in terms of both robustness and accuracy.
Tongue diagnosis plays a key role in Traditional Chinese Medicine (TCM). Tongue image segmentation lays a solid foundation for quantitative tongue analysis and diagnosis. However, segmenting the tongue body is challenging because of factors such as large variation between individuals in the color, texture, and shape of the tongue body, as well as weak edges caused by the similar color of the tongue body and neighboring tissues, especially the lips. Existing segmentation methods usually use only a single color component and simple prior knowledge, which leads to inaccuracy and instability. To alleviate these issues, a patch-driven segmentation method with sparse representation is proposed in this paper. Specifically, each patch in the testing image is sparsely represented by patches in a spatially varying dictionary, which is constructed from the local patches of training images. The derived sparse coefficients are then used to estimate the tongue probability. Finally, the hard segmentation is obtained by applying the maximum a posteriori (MAP) rule to the tongue probability map and is further refined with morphological operations. The proposed method has been extensively evaluated on a tongue image dataset of 290 subjects using 10-fold cross-validation, as well as on 10 additional unseen testing subjects. The proposed method achieves more accurate segmentation results than state-of-the-art methods.
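The patch-driven pipeline described in this abstract (sparse coding of a test patch over labeled dictionary patches, a tongue probability derived from the coefficients, then a MAP decision) can be illustrated with a minimal sketch. The abstract does not specify the sparse solver or the exact probability formula, so everything here is an assumption for illustration: orthogonal matching pursuit stands in for the sparse coder, the probability is taken as the share of absolute coefficient weight on tongue-labeled atoms, and the names `omp`, `tongue_probability`, and `map_label` are hypothetical. The toy dictionary is a fixed orthonormal basis rather than a spatially varying set of training patches, and the morphological refinement step is omitted.

```python
import numpy as np


def omp(D, y, k):
    """Greedy orthogonal matching pursuit (assumed solver, not from the paper).

    Approximates y using at most k columns (atoms) of the dictionary D.
    """
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(k):
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = -np.inf  # never reselect an atom
        j = int(np.argmax(corr))
        if corr[j] <= 1e-12:  # nothing left to explain
            break
        support.append(j)
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    if support:
        coef[support] = sol
    return coef


def tongue_probability(coef, atom_labels):
    """Share of absolute coefficient weight on tongue-labeled atoms (assumed formula)."""
    weights = np.abs(coef)
    total = weights.sum()
    if total == 0.0:
        return 0.5  # no evidence either way
    return float(weights[np.asarray(atom_labels) == 1].sum() / total)


def map_label(p_tongue):
    """Two-class MAP decision: pick the class with the larger posterior."""
    return 1 if p_tongue > 0.5 else 0


# Toy dictionary: 4 orthonormal atoms; the first two are labeled tongue (1),
# the last two background (0). Real dictionaries would hold training patches.
D = np.eye(4)
labels = np.array([1, 1, 0, 0])

tongue_like = np.array([0.8, 0.5, 0.1, 0.0])
background_like = np.array([0.1, 0.0, 0.9, 0.4])

for patch in (tongue_like, background_like):
    p = tongue_probability(omp(D, patch, k=2), labels)
    print(p, map_label(p))
```

Applying these functions to the patch centered on each pixel would yield a tongue probability map and a hard mask; in the described method the dictionary would additionally change with the patch location.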