Robot Ego‐Noise Suppression with Labanotation‐Template Subtraction

Jaroslavceva, Jekaterina; Wake, Naoki; Sasabuchi, Kazuhiro; Ikeuchi, Katsushi

doi:10.1002/tee.23523

Cited by 6 publications

(4 citation statements)

References 23 publications

(33 reference statements)


“…As part of our efforts to develop a realistic robotic operation system, we have integrated the proposed system with a learning-from-observation system (Fig. 17) that includes a speech interface [44], [45], a visual teaching interface [33], a reusable library of robot actions [46], and a simulator for testing robot execution [47]. Please refer to the respective papers for the results of robot execution, as it is beyond the scope of this paper.…”

Section: Discussion: Towards More General Robotic Applications (mentioning)

confidence: 99%

“…The user sends a query to the robot system via text or microphone input. Microphone input is noise-suppressed to prevent the robot's ego noise from interfering with recognition [7], [8] and then converted to text using a third-party text-to-speech technology [9]. The robot system then generates a prompt for the GPT-3/ChatGPT model based on this input.…”

Section: Pipeline (mentioning)

confidence: 99%


A Learning-from-Observation Framework: One-Shot Robot Teaching for Grasp-Manipulation-Release Household Operations

Wake, Arakawa, Yanokura et al. 2021

2021 IEEE/SICE International Symposium on System Integration (SII)

Self Cite


This paper aims to provide a specific example of how OpenAI's ChatGPT can be used in a few-shot setting to convert natural language instructions into a sequence of executable robot actions (Fig. 1). Generating programs for robots from natural language instructions is an attractive goal, but the practical application using ChatGPT is still in its early stages, and there is no established methodology yet. Here, we have designed easy-to-customize input prompts for ChatGPT that meet common requirements in many practical applications, including: 1) easy integration with robot execution systems or visual recognition programs, 2) applicability to various environments, and 3) the ability to provide long-step instructions while minimizing the impact of ChatGPT's token limit. Specifically, the prompts encourage ChatGPT to 1) output a sequence of predefined robot actions with explanations in a readable JSON format, 2) represent the operating environment in a formalized style, and 3) infer and output the updated state of the operating environment as the result of each operation, which will be input with the next instruction to allow ChatGPT to work based solely on the memory of the latest operations. Through experiments, we confirmed that the proposed prompts allow ChatGPT to act in accordance with the requirements in various environments. Additionally, we observed that ChatGPT's conversational ability allows users to adjust its output with natural language feedback, which is crucial for developing an application that is both safe and robust while providing a user-friendly interface. Users can easily customize the prompts as templates. The contribution of this paper is to provide and publish the prompts, which are generic enough to be easily modified to fit the requirements of each experimenter, thereby providing practical knowledge to the robotics research community. 
Our prompts and source code for using them are open-source and publicly available at https://github.com/microsoft/ChatGPT-Robot-Manipulation-Prompts. Fig. 1. This paper shows practical prompts for ChatGPT to generate sequences of executable robot actions from multi-step human instructions in various environments.


A Learning-from-Observation Framework: One-Shot Robot Teaching for Grasp-Manipulation-Release Household Operations

Wake, Arakawa, Yanokura et al. 2021

2021 IEEE/SICE International Symposium on System Integration (SII)

Self Cite


“…As part of our efforts to develop a realistic robotic operation system, we have integrated our proposed task planner with a learning-from-observation system (Fig. 19) incorporating a speech interface [45], [46], a visual teaching interface [47], a reusable robot skill library [48], [49], and a simulator [50]. The code for the teaching system is available at: https://github.com/microsoft/cohesion-based-robot-teaching-interface.…”

Section: Connection With Vision Systems and Robot Controllers (mentioning)

confidence: 99%

ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application

Wake, Kanehira, Sasabuchi et al. 2023

IEEE Access

Self Cite


This paper introduces a novel method for translating natural-language instructions into executable robot actions using OpenAI's ChatGPT in a few-shot setting. We propose customizable input prompts for ChatGPT that can easily integrate with robot execution systems or visual recognition programs, adapt to various environments, and create multi-step task plans while mitigating the impact of the token limit imposed on ChatGPT. In our approach, ChatGPT receives both instructions and textual environmental data, and outputs a task plan and an updated environment. These environmental data are reused in subsequent task planning, thus eliminating the extensive record-keeping of prior task plans within the prompts of ChatGPT. Experimental results demonstrated the effectiveness of these prompts across various domestic environments, such as manipulations in front of a shelf, a fridge, and a drawer. The conversational capability of ChatGPT allows users to adjust the output via natural-language feedback. Additionally, a quantitative evaluation using VirtualHome showed that our results are comparable to previous studies. Specifically, 36% of task planning met both executability and correctness, and the rate approached 100% after several rounds of feedback. Our experiments revealed that ChatGPT can reasonably plan tasks and estimate postoperation environments without actual experience in object manipulation. Despite the allure of ChatGPT-based task planning in robotics, a standardized methodology remains elusive, making our work a substantial contribution. These prompts can serve as customizable templates, offering practical resources for the robotics research community. Our prompts and source code are open source and publicly available at https://github.com/microsoft/ChatGPT-Robot-Manipulation-Prompts.
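The state-carrying loop described in this abstract (instruction plus textual environment in, task plan plus updated environment out, with only the updated environment reused for the next step) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the JSON field names `task_sequence` and `environment_after`, the helper names, and the `fake_planner` stand-in for the language model are all assumptions made for the example.

```python
import json
from typing import Callable, Dict, List, Tuple


def build_prompt(instruction: str, environment: Dict) -> str:
    """Combine one instruction with the current environment only (no plan history)."""
    return json.dumps({"environment": environment, "instruction": instruction})


def run_steps(
    instructions: List[str],
    environment: Dict,
    planner: Callable[[str], str],
) -> Tuple[List[List[str]], Dict]:
    """Plan each instruction in turn, feeding the updated environment forward."""
    plans = []
    for instruction in instructions:
        reply = json.loads(planner(build_prompt(instruction, environment)))
        plans.append(reply["task_sequence"])      # hypothetical field name
        environment = reply["environment_after"]  # hypothetical field name
    return plans, environment


def fake_planner(prompt: str) -> str:
    """Hypothetical stand-in for the language model: handles 'grab X' / 'release X'."""
    request = json.loads(prompt)
    env = dict(request["environment"])
    verb, obj = request["instruction"].split(maxsplit=1)
    if verb == "grab":
        env["holding"] = obj
        actions = [f"move_hand({obj})", f"grasp({obj})"]
    else:
        env["holding"] = None
        actions = [f"release({obj})"]
    return json.dumps({"task_sequence": actions, "environment_after": env})


plans, final_env = run_steps(["grab cup", "release cup"], {"holding": None}, fake_planner)
```

Because each prompt contains only the latest environment, prompt size stays roughly constant as the number of steps grows, which is the property the abstract attributes to reusing the environmental data.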


“…Most prior research on the factors that affect ASR has focused on noise, speaker accent, speaker age, and multiple speakers. Noise persists as a significant hurdle to developing ASR, and many approaches have been proposed to enhance the robustness of ASR systems [18,19]. Our previous study suggested a method to associate the Articulation Index to estimate the influence of stationary noise on the ASR word accuracy (ACC) [20].…”

Section: Introduction (mentioning)

confidence: 99%

Effect of Face Masks on Automatic Speech Recognition Accuracy for Mandarin

Li, Ni, Huang 2024

Applied Sciences


Automatic speech recognition (ASR) has been widely used to realize daily human–machine interactions. Face masks have become everyday wear in our post-pandemic life, and speech through masks may have impaired the ASR. This study explored the effects of different kinds of face masks (e.g., surgical mask, KN95 mask, and cloth mask) on the Mandarin word accuracy of two ASR systems with or without noises. A mouth simulator was used to play speech audio with or without wearing a mask. Acoustic signals were recorded at distances of 0.2 m and 0.6 m. Recordings were mixed with two noises at a signal-to-noise ratio of +3 dB: restaurant noise and speech-shaped noise. Results showed that masks did not affect ASR accuracy without noise. Under noises, masks did not significantly influence ASR accuracy at 0.2 m but had significant effects at 0.6 m. The activated-carbon mask had the most significant impact on ASR accuracy at 0.6 m, reducing the accuracy by 18.5 percentage points compared to that without a mask, whereas the cloth mask had the least effect on ASR accuracy at 0.6 m, reducing the accuracy by 0.9 percentage points. The acoustic attenuation of masks on the high-frequency band at around 3.15 kHz of the speech signal attributed to the effects of masks on ASR accuracy. When training ASR models, it may be important to consider mask robustness.
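The abstract does not say how the recordings were mixed with noise, so the following is only a sketch of one common way to reach a target signal-to-noise ratio such as +3 dB: scale the noise so that the ratio of mean speech power to mean noise power matches the target. The function names and the toy sample values are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def mean_power(samples: List[float]) -> float:
    """Average squared amplitude of a signal."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(
    speech: List[float], noise: List[float], snr_db: float
) -> Tuple[List[float], float]:
    """Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    target_ratio = 10 ** (snr_db / 10)
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * target_ratio))
    mixed = [s + gain * n for s, n in zip(speech, noise)]
    return mixed, gain


# Toy signals (assumed values, not data from the study).
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixed, gain = mix_at_snr(speech, noise, 3.0)
achieved_db = 10 * math.log10(mean_power(speech) / mean_power([gain * n for n in noise]))
```

Here `achieved_db` recomputes the ratio from the scaled noise as a sanity check on the gain.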
