Physical Intelligence
Assembly Line
Inside the Billion-Dollar Startup Bringing AI Into the Physical World
Physical Intelligence believes it can give robots humanlike dexterity and understanding of the physical world by feeding sensor and motion data, collected from robots performing vast numbers of demonstrations, into its master AI model. “This is, for us, what it will take to ‘solve’ physical intelligence,” Hausman says. “To breathe intelligence into a robot just by connecting it to our model.”
There is simply no internet-scale repository of robot actions similar to the text and image data available for training LLMs. Achieving a breakthrough in physical intelligence might require exponentially more data anyway.
Physical Intelligence hopes to gather a lot more data by working with other companies such as ecommerce and manufacturing firms that have robots doing a variety of things. The startup also hopes to develop custom hardware, such as the webcam-equipped pincer; it hasn’t said how this will be used, but it could perhaps enable crowdsourced training with people performing everyday tasks.
Robot AI startup Physical Intelligence raises $400 mln from Bezos, OpenAI
Physical Intelligence, a startup that is developing foundational software for robots, said it has raised $400 million in early-stage funding from Amazon’s Jeff Bezos, OpenAI, and the venture capital firms Thrive Capital and Lux Capital. The new funds were raised at a $2 billion valuation, PitchBook data showed.
Multiple startups are making forays into the robotic AI space, including Vicarious (acquired by Alphabet-owned Intrinsic in 2022), Universal Robots, Seegrid, and Covariant.
π0: Our First Generalist Policy
Over the past eight months, we’ve developed a general-purpose robot foundation model that we call π0 (pi-zero). We believe this is a first step toward our long-term goal of developing artificial physical intelligence, so that users can simply ask robots to perform any task they want, just as they can ask large language models (LLMs) and chatbot assistants. Like LLMs, our model is trained on broad and diverse data and can follow various text instructions. Unlike LLMs, it spans images, text, and actions, and it acquires physical intelligence by training on embodied experience from robots, learning to directly output low-level motor commands via a novel architecture. It can control a variety of different robots, and it can either be prompted to carry out the desired task or fine-tuned to specialize it for challenging application scenarios.
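To make the model’s input/output structure concrete, here is a minimal, hypothetical sketch (in Python with NumPy) of the kind of interface such a generalist policy exposes: camera images, robot state, and a language instruction in; a short chunk of continuous low-level motor commands out. The class and field names are illustrative assumptions, not Physical Intelligence’s actual API, and the model internals are stubbed out.

```python
# Hypothetical sketch of a generalist vision-language-action policy interface.
# Names (Observation, GeneralistPolicy, act) are illustrative, not the real API.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Observation:
    """One timestep of multimodal robot input."""
    images: List[np.ndarray]  # RGB frames from one or more cameras, HxWx3
    proprio: np.ndarray       # robot state, e.g. joint angles and gripper position
    instruction: str          # natural-language task prompt


class GeneralistPolicy:
    """Maps (images, text, robot state) to a short chunk of motor commands."""

    def __init__(self, action_dim: int, chunk_len: int = 50):
        self.action_dim = action_dim
        self.chunk_len = chunk_len

    def act(self, obs: Observation) -> np.ndarray:
        # A real VLA model would encode obs.images and obs.instruction with a
        # pretrained vision-language backbone and decode continuous actions;
        # this placeholder just returns a zero-filled action chunk.
        return np.zeros((self.chunk_len, self.action_dim))


if __name__ == "__main__":
    policy = GeneralistPolicy(action_dim=14)  # e.g. two 7-DoF arms
    obs = Observation(
        images=[np.zeros((224, 224, 3), dtype=np.uint8)],
        proprio=np.zeros(14),
        instruction="fold the shirt on the table",
    )
    actions = policy.act(obs)  # shape: (chunk_len, action_dim)
    print(actions.shape)
```

The design point the text emphasizes is that the same policy, with the same weights, is meant to control different robots and tasks; only the prompt (and optionally a fine-tuning stage) changes.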
We compared π0 to other robot foundation models proposed in the academic literature on our tasks: OpenVLA, a 7B-parameter VLA model that uses discretized actions, and Octo, a 93M-parameter model that uses diffusion outputs. These tasks are very difficult compared to those typically used in academic experiments. For example, the tasks in the OpenVLA evaluation typically consist of single-stage behaviors (e.g., “put eggplant into pot”), whereas our simplest bussing task consists of sorting multiple objects into either a garbage bin or a bussing bin, and our more complex tasks may require multiple stages, manipulation of deformable objects, and the ability to deploy one of many possible strategies given the current configuration of the environment.

These tasks are evaluated according to a scoring rubric that assigns a score of 1.0 for a fully successful completion, with “partial credit” for partially correct execution (e.g., bussing half the objects yields a score of 0.5). The average scores across 5 zero-shot evaluation tasks are shown below, comparing the full π0 pre-trained model; π0-small, a 470M-parameter model that does not use VLM pre-training; OpenVLA; and Octo. Although OpenVLA and Octo attain non-zero performance on the easiest of these tasks (“Bussing Easy”), π0 is by far the best-performing model across all of the tasks. The small version, π0-small, attains the second-best performance, but using our full-size architecture with VLM pre-training yields more than a 2x improvement.
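As a concrete illustration of how this rubric aggregates, the short Python sketch below assigns partial-credit scores per episode and averages them across tasks. The task names and scores here are made-up placeholders; only the rubric itself (1.0 for a fully successful completion, fractional credit otherwise) comes from the evaluation described above.

```python
# Illustrative scoring-rubric aggregation; task names and scores are made up.
from statistics import mean


def bussing_score(items_sorted_correctly: int, total_items: int) -> float:
    """Partial credit: fraction of objects placed into the correct bin."""
    return items_sorted_correctly / total_items


# Hypothetical per-task scores for one model across 5 zero-shot tasks.
task_scores = {
    "bussing_easy": bussing_score(6, 6),  # 1.0: fully successful
    "bussing_hard": bussing_score(3, 6),  # 0.5: half the objects bussed
    "task_c": 0.75,
    "task_d": 0.40,
    "task_e": 0.20,
}

overall = mean(task_scores.values())
print(f"average score across {len(task_scores)} tasks: {overall:.2f}")
```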