“…With respect to non-linear modeling power, various network structures have been exploited to represent contexts for segmentation disambiguation, including multi-layer perceptrons on five-character windows (Zheng et al., 2013; Pei et al., 2014; Chen et al., 2015a), as well as LSTMs on characters (Chen et al., 2015b; Xu and Sun, 2016) and words (Morita et al., 2015; Cai and Zhao, 2016; Zhang et al., 2016b). For structured learning and inference, CRF has been used for character sequence labelling models (Pei et al., 2014; Chen et al., 2015b) and structural beam search has been used for word-based segmentors (Cai and Zhao, 2016; Zhang et al., 2016b).…”
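
As a rough illustration of the "five-character window" context mentioned in the excerpt, the sketch below collects, for each character, the two characters on either side plus the character itself. This is only a minimal sketch of the general idea, not the feature extraction used in any of the cited papers; the function name `five_char_windows` and the `<PAD>` boundary symbol are assumptions introduced here for illustration.

```python
from typing import List

# Hypothetical padding symbol for positions beyond the sentence boundary.
PAD = "<PAD>"


def five_char_windows(sentence: str, size: int = 5) -> List[List[str]]:
    """Return one window of `size` characters centred on each character."""
    half = size // 2
    # Pad both ends so that every character has a full-width window.
    padded = [PAD] * half + list(sentence) + [PAD] * half
    return [padded[i:i + size] for i in range(len(sentence))]


windows = five_char_windows("我爱北京")
for centre, window in zip("我爱北京", windows):
    print(centre, window)
```

In a window-based model, each such window would typically be mapped to embeddings and passed to a classifier that predicts a segmentation label for the centre character.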