“…Specifically, it consists of three steps: (1) collect sensory-motor information (e.g., camera images, joint angles, and torques) from the robot as training data while a human teleoperates the robot or performs direct teaching; (2) input the sensor information x_t at time t into the model, output the predicted sensor information ŷ_{t+1} for the next time t + 1, and update the weights of the model to minimize the error between the predicted value ŷ_{t+1} and the true value x_{t+1}; and (3) at execution time, have the robot generate sequential motions by inputting the robot's sensor information x_t into the model and sending the predicted value to the robot as the motion command for the next time step. This method can be used to perform various tasks, such as flexible object handling, that are difficult to accomplish with conventional methods [31,32].…”
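
The three steps quoted above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain linear model ŷ_{t+1} = W x_t trained by gradient descent in place of whatever network the cited work uses, and a synthetic two-dimensional trajectory in place of real teleoperation data. The function names (collect_demonstration, train_next_step_model, generate_motion) are hypothetical. Because no robot is attached, step (3) feeds each prediction back in as the next input, whereas on hardware the next input would be the robot's measured sensor state.

```python
import numpy as np


def collect_demonstration(steps=200, seed=0):
    """Step 1 stand-in: synthetic 'demonstration' data (assumption, not real sensor logs)."""
    rng = np.random.default_rng(seed)
    theta = 0.1
    # Slowly decaying rotation plays the role of the demonstrated motion.
    A = 0.99 * np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta), np.cos(theta)]])
    x = np.zeros((steps, 2))
    x[0] = [1.0, 0.0]
    for t in range(steps - 1):
        x[t + 1] = A @ x[t] + 0.001 * rng.standard_normal(2)
    return x


def train_next_step_model(x, lr=0.5, epochs=500):
    """Step 2: fit W so that y_hat_{t+1} = W x_t approximates x_{t+1}."""
    inputs, targets = x[:-1], x[1:]
    W = np.zeros((x.shape[1], x.shape[1]))
    for _ in range(epochs):
        err = inputs @ W.T - targets              # y_hat_{t+1} - x_{t+1}
        grad = 2.0 * err.T @ inputs / len(inputs)  # gradient of the mean squared error
        W -= lr * grad
    return W


def generate_motion(W, x0, steps):
    """Step 3: roll the model forward, using each prediction as the next command."""
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(steps - 1):
        traj.append(W @ traj[-1])  # on a robot, the measured state would replace traj[-1]
    return np.stack(traj)


demo = collect_demonstration()
W = train_next_step_model(demo)
one_step_mse = float(np.mean((demo[:-1] @ W.T - demo[1:]) ** 2))
rollout = generate_motion(W, demo[0], steps=50)
```

Here one_step_mse measures the error that step (2) minimizes, and rollout is the generated motion sequence. A recurrent or convolutional model would replace W when the inputs include camera images, but the data flow between the three steps would stay the same.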