A team of researchers at the Microsoft Autonomous Systems and Robotics Research Group including Sai Vemprala, Rogerio Bonatti, Arthur Bucker, and Ashish Kapoor has extended the capabilities of ChatGPT to robotics, and controlled multiple platforms such as robot arms, drones, and home assistant robots intuitively with language.
Have you ever wanted to tell a robot what to do using your own words, like you would to a human? Wouldn’t it be amazing to just tell your home assistant robot: “Please warm up my lunch”, and have it find the microwave by itself? Even though language is the most intuitive way for us to express our intentions, we still rely heavily on hand-written code to control robots. Our team has been exploring how we can change this reality and make natural human-robot interactions possible using OpenAI’s new AI language model, ChatGPT.
ChatGPT is a language model trained on a massive corpus of text and human interactions, allowing it to generate coherent and grammatically correct responses to a wide range of prompts and questions. Our goal with this research is to see if ChatGPT can think beyond text, and reason about the physical world to help with robotics tasks. We want to help people interact with robots more easily, without needing to learn complex programming languages or details about robotic systems. The key challenge here is teaching ChatGPT how to solve problems considering the laws of physics, the context of the operating environment, and how the robot’s physical actions can change the state of the world.
It turns out that ChatGPT can do a lot by itself, but it still needs some help. Our technical paper describes a series of design principles that can be used to guide language models towards solving robotics tasks. These include, but are not limited to, special prompting structures, high-level APIs, and human feedback via text. We believe that our work is just the start of a shift in how we develop robotics systems, and we hope to inspire other researchers to jump into this exciting field. Continue reading for more technical details on our methods and ideas.
Challenges in robotics today, and how ChatGPT can help
Current robotics pipelines begin with an engineer or technical user who needs to translate the task’s requirements into code for the system. The engineer sits in the loop, meaning that they need to write new code and specifications to correct the robot’s behavior. Overall, this process is slow (the user needs to write low-level code), expensive (it requires highly skilled users with deep knowledge of robotics), and inefficient (it requires multiple interactions to get things working properly).
ChatGPT unlocks a new robotics paradigm, and allows a (potentially non-technical) user to sit on the loop, providing high-level feedback to the large language model (LLM) while monitoring the robot’s performance. By following our set of design principles, ChatGPT can generate code for robotics scenarios. Without any fine-tuning, we leverage the LLM’s knowledge to control different robot form factors for a variety of tasks. In our work we show multiple examples of ChatGPT solving robotics puzzles, along with complex robot deployments in the manipulation, aerial, and navigation domains.
Robotics with ChatGPT: design principles
Prompting LLMs is a highly empirical science. Through trial and error, we built a methodology and a set of design principles for writing prompts for robotics tasks:
- First, we define a set of high-level robot APIs or a function library. This library can be specific to a particular robot, and should map to existing low-level implementations from the robot’s control stack or a perception library. It’s very important to use descriptive names for the high-level APIs so ChatGPT can reason about their behaviors;
- Next, we write a text prompt for ChatGPT which describes the task goal while also explicitly stating which functions from the high-level library are available. The prompt can also contain information about task constraints, or how ChatGPT should form its answers (specific coding language, using auxiliary parsing elements);
- The user stays on the loop to evaluate ChatGPT’s code output, either through direct inspection or using a simulator. If needed, the user uses natural language to provide feedback to ChatGPT on the answer’s quality and safety.
- When the user is happy with the solution, the final code can be deployed onto the robot.
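The first three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the prompts or APIs used in the paper: the function names (`fly_to`, `get_position`, `take_photo`), the `LIBRARY` contents, and the prompt wording are all hypothetical. The sketch shows a small function library with descriptive names, a prompt that lists exactly which functions are available, and a simple static check that supports the "direct inspection" step by flagging any call the library does not define.

```python
import ast
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class RobotFunction:
    """One entry in the high-level function library (hypothetical names)."""
    signature: str
    description: str


# Hypothetical library; a real one would map to the robot's control stack.
LIBRARY: Dict[str, RobotFunction] = {
    "fly_to": RobotFunction("fly_to(x, y, z)", "fly the drone to a position in meters"),
    "get_position": RobotFunction("get_position()", "return the drone's current (x, y, z)"),
    "take_photo": RobotFunction("take_photo()", "capture an image with the front camera"),
}


def build_prompt(task: str, library: Dict[str, RobotFunction], constraints: List[str]) -> str:
    """Describe the task, list the available functions, and state the constraints."""
    lines = ["You are helping to control a robot.", "You may only use these functions:"]
    lines += [f"- {fn.signature}: {fn.description}" for fn in library.values()]
    if constraints:
        lines.append("Constraints:")
        lines += [f"- {c}" for c in constraints]
    lines.append("Answer with Python code only.")
    lines.append(f"Task: {task}")
    return "\n".join(lines)


def calls_outside_library(code: str, library: Dict[str, RobotFunction]) -> Set[str]:
    """Return the names of directly called functions that the library does not define.

    Built-ins such as range() are flagged too, so a person reviews every
    call the prompt did not explicitly allow.
    """
    called = {
        node.func.id
        for node in ast.walk(ast.parse(code))
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    return called - set(library)
```

A user could send `build_prompt("Inspect the top shelf", LIBRARY, ["Stay below 3 meters"])` to the model, then pass the returned code through `calls_outside_library` before reading it in detail. The check does not replace human review or simulation; it only surfaces unexpected calls early.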
Enough theory… What exactly can ChatGPT do?
Let’s take a look at a few examples…
Zero-shot task planning
We gave ChatGPT access to functions that control a real drone, and it proved to be an extremely intuitive language-based interface between the non-technical user and the robot. ChatGPT asked clarification questions when the user’s instructions were ambiguous, and wrote complex code structures for the drone such as a zig-zag pattern to visually inspect shelves. It even figured out how to take a selfie! 📷
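To make the zig-zag inspection pattern concrete, here is a minimal sketch of the kind of waypoint logic such a sweep involves. It is not the code ChatGPT produced in the experiments; the function name `zigzag_waypoints`, its parameters, and the coordinate convention (x along the shelf, y as a fixed standoff distance, z as height) are assumptions made for illustration.

```python
from typing import List, Tuple

Waypoint = Tuple[float, float, float]


def zigzag_waypoints(width: float, height: float, rows: int, standoff: float) -> List[Waypoint]:
    """Sweep a vertical width-by-height plane row by row, alternating direction.

    x runs along the shelf, y is the fixed distance from the shelf,
    and z is the height of the current row.
    """
    if rows < 2:
        raise ValueError("need at least two rows to cover the shelf")
    step = height / (rows - 1)
    waypoints: List[Waypoint] = []
    for i in range(rows):
        z = i * step
        # Even rows go left to right, odd rows right to left.
        xs = (0.0, width) if i % 2 == 0 else (width, 0.0)
        for x in xs:
            waypoints.append((x, standoff, z))
    return waypoints
```

Each waypoint would then be handed to whatever high-level motion function the robot's library exposes, with a photo or inspection step at each stop.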
We also used ChatGPT in a simulated industrial inspection scenario with the Microsoft AirSim simulator. The model was able to effectively parse the user’s high-level intent and geometrical cues to control the drone accurately.
PromptCraft, a collaborative open-source tool for LLM+Robotics research
Good prompt engineering is crucial for the success of LLMs such as ChatGPT for robotics tasks. Unfortunately, prompting is an empirical science, and there is a lack of comprehensive and accessible resources with good (and bad) examples to help researchers and enthusiasts in the field. To address this gap, we introduce PromptCraft, a collaborative open-source platform where anyone can share examples of prompting strategies for different robotics categories. We release all of the prompts and conversations used in this study. We invite readers to contribute more!
Besides prompt design, we hope to also include multiple robotics simulators and interfaces to allow users to test their ChatGPT-generated algorithms. As a start, we also release an AirSim environment with ChatGPT integration that anyone can use to get started with these ideas. We welcome contributions of new simulators and interfaces as well.
Bringing robotics out of labs, and into the world
We are excited to release these technologies with the aim of bringing robotics within the reach of a wider audience. We believe that language-based robotics control will be fundamental to bringing robotics out of science labs, and into the hands of everyday users.
That said, we do emphasize that the outputs from ChatGPT are not meant to be deployed directly on robots without careful analysis. We encourage users to harness the power of simulations in order to evaluate these algorithms before potential real-life deployments, and to always take the necessary safety precautions. Our work represents only a small fraction of what is possible at the intersection of large language models and robotics, and we hope to inspire much of the work to come.
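As a very small stand-in for the simulate-before-deploy advice, the sketch below runs generated code against recording stubs rather than a real robot, then checks the recorded calls against a safety limit. Everything here is an assumption for illustration: the `fly_to` name, the altitude rule, and the use of `exec` with restricted globals, which is not a security sandbox and is no substitute for a physics simulator such as AirSim.

```python
from typing import Callable, Dict, List, Tuple

Call = Tuple[str, tuple]


def dry_run(code: str, function_names: List[str]) -> List[Call]:
    """Execute generated code with stub functions that only record their calls.

    Warning: exec() with restricted globals is NOT a security sandbox. It only
    guarantees that this sketch never reaches real hardware.
    """
    log: List[Call] = []

    def make_stub(name: str) -> Callable[..., None]:
        def stub(*args: object) -> None:
            log.append((name, args))
        return stub

    env: Dict[str, object] = {name: make_stub(name) for name in function_names}
    env["__builtins__"] = {"range": range}  # expose only what the code may use
    exec(code, env)
    return log


def altitude_violations(log: List[Call], max_z: float) -> List[Call]:
    """Return every recorded fly_to(x, y, z) call whose z exceeds max_z."""
    return [call for call in log if call[0] == "fly_to" and call[1][2] > max_z]


# Hypothetical model output under review.
generated = "for i in range(3):\n    fly_to(0, 0, i * 5)\n"
calls = dry_run(generated, ["fly_to"])
violations = altitude_violations(calls, max_z=6)
```

Here `violations` contains the single call that climbs to 10 meters, which the user could report back to the model in natural language before trying again.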
Citation
@techreport{vemprala2023chatgpt,
author = {Vemprala, Sai and Bonatti, Rogerio and Bucker, Arthur and Kapoor, Ashish},
title = {ChatGPT for Robotics: Design Principles and Model Abilities},
institution = {Microsoft},
year = {2023},
month = {February},
url = {https://www.microsoft.com/en-us/research/publication/chatgpt-for-robotics-design-principles-and-model-abilities/},
number = {MSR-TR-2023-8},
}
Source: Microsoft