Large foundation models can exhibit unique capabilities depending on the domain of data they are trained on. While these domains are generic, they may only barely overlap. For example, visual-language models (VLMs) are trained on Internet-scale image captions, whereas large language models (LMs) are further trained on Internet-scale text with no images (e.g., from spreadsheets to SAT questions). As a result, these models store different forms of commonsense knowledge across different domains. In this work, we show that this model diversity is symbiotic and can be leveraged to build AI systems with structured Socratic dialogue, in which new multimodal tasks are formulated as a guided language-based exchange between different pre-existing foundation models, without additional finetuning. In the context of egocentric perception, we present a case study of Socratic Models (SMs) that can provide meaningful results for complex tasks such as generating free-form answers to contextual questions about egocentric video, by formulating video Q&A as short story Q&A, i.e., summarizing the video into a short story, then answering questions about it. Additionally, SMs can generate captions for Internet images, and are competitive with the state of the art on zero-shot video-to-text retrieval with 42.8 R@1 on MSR-VTT 1k-A. SMs demonstrate how to compose foundation models zero-shot to capture new multimodal functionalities, without domain-specific data collection. Prototypes are available at socraticmodels.github.io.
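The idea of formulating video Q&A as short story Q&A can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, prompt wording, and the toy stand-ins for the VLM and LM are all assumptions made for illustration, and a real system would replace the stubs with calls to pre-trained models.

```python
from typing import Callable, List

# Hypothetical interfaces: a real system would wrap a pre-trained VLM and LM.
Captioner = Callable[[str], str]      # frame identifier -> text caption
LanguageModel = Callable[[str], str]  # text prompt -> text completion


def video_to_story(frames: List[str], caption: Captioner, lm: LanguageModel) -> str:
    """Turn a video into a short story: caption each frame, then summarize."""
    captions = [f"{i}: {caption(frame)}" for i, frame in enumerate(frames)]
    prompt = "Summarize these events as a short story:\n" + "\n".join(captions)
    return lm(prompt)


def answer_question(story: str, question: str, lm: LanguageModel) -> str:
    """Answer a question about the video using only its text story."""
    prompt = f"Story: {story}\nQ: {question}\nA:"
    return lm(prompt)


def toy_captioner(frame: str) -> str:
    """Toy stand-in for a VLM (assumption): describes the frame label."""
    return f"I see {frame}"


def toy_lm(prompt: str) -> str:
    """Toy stand-in for an LM (assumption): deterministic string handling."""
    lines = prompt.splitlines()
    if prompt.startswith("Summarize"):
        # "Summarize" by joining the caption lines into one sentence.
        return " ".join(lines[1:])
    # "Answer" by echoing the story text from the first prompt line.
    return lines[0][len("Story: "):]


if __name__ == "__main__":
    story = video_to_story(["a kitchen", "a cup"], toy_captioner, toy_lm)
    print(story)
    print(answer_question(story, "What did I see?", toy_lm))
```

The point of the sketch is the data flow: the models exchange only text, so either stub can be swapped for a different pre-trained model without retraining the other.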
A novel compliant robot is proposed for traversing unstructured terrain. The robot consists of modules, each containing a link and an active wheel pair, and neighboring modules are connected by a passive joint. Robots of this type are lighter and more durable because they have no link actuators. However, they have limited climbing ability because they tend to tip over while climbing large obstacles. To overcome this disadvantage, the use of compliant joints is proposed in this work. The stiffness of each compliant joint is estimated by formulating an optimization problem whose objective is to minimize the link joint moments while maintaining static equilibrium. This is one of the key novelties of the proposed work. A design methodology is also proposed for developing an n-module compliant robot that can climb a given height on a known surface. The efficacy of the proposed formulation is illustrated using numerical simulations of three-module and five-module robots. The robot successfully climbs maximum heights of up to three and six times the wheel diameter using three and five modules, respectively. A working prototype was developed, and the simulation results were successfully validated on it.
We study the problem of deploying a large number of low-cost, low-complexity robots inside a known environment, with the objective that at least one robotic platform reaches each of N preassigned goal locations. Our study is inspired by SensorFly, a micro-aerial vehicle successfully used for mobile sensor network applications. SensorFly nodes feature limited on-board sensors, so one has to rely on simple navigation strategies and improve performance through redundancy in the team. We introduce a simple, fully scalable deployment algorithm that exploits the limited capabilities offered by the SensorFly platform, and we explore its performance by feeding the simulation system with parameters extracted from the real SensorFly platform.
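The role of redundancy can be made concrete with a small back-of-the-envelope calculation. This is not the deployment algorithm from the abstract: it assumes, purely for illustration, that each robot sent to a goal reaches it independently with probability `p_success`, so that k robots reach it at least once with probability 1 - (1 - p_success)^k. The function names and parameters are hypothetical.

```python
def robots_needed(p_success: float, target: float) -> int:
    """Smallest k such that 1 - (1 - p_success)**k >= target.

    Assumes independent, identically likely successes (an illustrative
    simplification, not a property claimed for the real platform).
    """
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    k = 0
    p_all_fail = 1.0  # probability that none of the k robots has arrived
    while 1.0 - p_all_fail < target:
        k += 1
        p_all_fail *= 1.0 - p_success
    return k


def team_size(n_goals: int, p_success: float, target: float) -> int:
    """Total robots if each goal gets its own independent group."""
    return n_goals * robots_needed(p_success, target)


if __name__ == "__main__":
    # With a 50% per-robot success rate, 4 robots per goal give >= 90% coverage.
    print(robots_needed(0.5, 0.9))
    print(team_size(3, 0.5, 0.9))
```

Under this simplified model the required team grows only logarithmically as the target reliability approaches 1, which is why many unreliable, inexpensive robots can stand in for a few capable ones.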