“…This has become increasingly evident in work that has sought solutions to the REG problem in naturalistic scenes, as part of a broader research focus on the vision-language interface (Kazemzadeh et al., 2014; Mao et al., 2016; Yu et al., 2016). However, context dependence is also a central concern for approaches to REG that assume a more structured input representation where entities and their properties are available, but the extent to which a property applies to a referent is not necessarily an all-or-none decision (Horacek, 2005; van Deemter, 2006; Turner et al., 2008; Williams and Scheutz, 2017). Under these conditions, it is no longer possible to assume that properties are crisp or Boolean, or even that both sender and receiver necessarily assume the same semantics for those properties.…”