“…The implementation of various RSA services differs in three key areas: (i) the communication medium between users and remote sighted assistants (earlier prototypes used audio [56], images [57,20], one-way video from portable digital cameras [58,59] or webcams [59], whereas more recent ones use two-way video on smartphones [60,61,24,25]); (ii) the instruction form, e.g., text [62], synthetic speech [56], natural conversation [60,24,25], or vibrotactile feedback [63,64]; and (iii) the localization technique, e.g., GPS sensors, crowdsourced images or videos [65,20,66,67], sensor fusion [65], or CV as discussed in the next subsection.…”