Foundation Model Assembly Line
π0: Our First Generalist Policy
Over the past eight months, we’ve developed a general-purpose robot foundation model that we call π0 (pi-zero). We believe this is a first step toward our long-term goal of developing artificial physical intelligence, so that users can simply ask robots to perform any task they want, just like they can ask large language models (LLMs) and chatbot assistants. Like LLMs, our model is trained on broad and diverse data and can follow various text instructions. Unlike LLMs, it spans images, text, and actions and acquires physical intelligence by training on embodied experience from robots, learning to directly output low-level motor commands via a novel architecture. It can control a variety of different robots, and can either be prompted to carry out the desired task, or fine-tuned to specialize it to challenging application scenarios.
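To make the input-output contract concrete, here is a minimal sketch of the kind of interface such a vision-language-action policy exposes. All class and method names below are illustrative assumptions, not π0's actual architecture or API:

```python
# Illustrative sketch of a vision-language-action (VLA) policy interface.
# Names, shapes, and the action-chunk convention are hypothetical.
from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    images: list[np.ndarray]   # one RGB frame per camera, H x W x 3
    instruction: str           # free-form text command, e.g. "fold the shirt"
    proprio: np.ndarray        # joint positions / gripper state

class GeneralistPolicy:
    """Maps (images, text, proprioception) directly to low-level motor commands."""

    def __init__(self, action_dim: int = 7, horizon: int = 50):
        self.action_dim = action_dim   # e.g. 6-DoF arm plus gripper
        self.horizon = horizon         # actions predicted per inference call

    def act(self, obs: Observation) -> np.ndarray:
        # A real model would encode obs with a VLM backbone and decode
        # continuous actions; here we return a placeholder chunk.
        return np.zeros((self.horizon, self.action_dim))

policy = GeneralistPolicy()
obs = Observation(
    images=[np.zeros((224, 224, 3), dtype=np.uint8)],
    instruction="put the dishes in the bussing bin",
    proprio=np.zeros(7),
)
action_chunk = policy.act(obs)  # (50, 7) array of motor commands
```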
We compared π0 on our tasks against other robot foundation models proposed in the academic literature: OpenVLA, a 7B-parameter VLA model that uses discretized actions, and Octo, a 93M-parameter model that uses diffusion outputs. These tasks are very difficult compared to those typically used in academic experiments — for example, the tasks in the OpenVLA evaluation typically consist of single-stage behaviors (e.g., “put eggplant into pot”), whereas our simplest bussing task consists of sorting multiple objects into either a garbage bin or a bussing bin, and our more complex tasks might require multiple stages, manipulation of deformable objects, and the ability to deploy one of many possible strategies given the current configuration of the environment. These tasks are evaluated according to a scoring rubric that assigns a score of 1.0 for fully successful completion, with “partial credit” for partially correct execution (e.g., bussing half the objects yields a score of 0.5). The average scores across 5 zero-shot evaluation tasks are shown below, comparing the full π0 pre-trained model; π0-small, a 470M-parameter model that does not use VLM pre-training; OpenVLA; and Octo. Although OpenVLA and Octo attain non-zero performance on the easiest of these tasks (“Bussing Easy”), π0 is by far the best-performing model across all of the tasks. π0-small attains the second-best performance, but the full-size architecture with VLM pre-training yields more than a 2x improvement over it.
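The partial-credit rubric is easy to state precisely. Below is a minimal sketch that assumes each task decomposes into equally weighted steps; the equal weighting is our assumption, since the post does not publish per-task rubrics:

```python
def rubric_score(steps_completed: int, steps_total: int) -> float:
    """Partial-credit task score: 1.0 for full success, fractional otherwise.

    Assumes equal weight per step, e.g. bussing 5 of 10 objects -> 0.5.
    Actual per-task rubrics may weight steps differently.
    """
    if steps_total <= 0:
        raise ValueError("steps_total must be positive")
    return min(steps_completed, steps_total) / steps_total

# Average score across evaluation episodes of one task
episodes = [(10, 10), (5, 10), (7, 10)]  # (completed, total) per episode
task_score = sum(rubric_score(c, t) for c, t in episodes) / len(episodes)
print(f"task score: {task_score:.2f}")  # 0.73
```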
Introducing Waymo's Research on an End-to-End Multimodal Model for Autonomous Driving
At Waymo, we have been at the forefront of AI and ML in autonomous driving for over 15 years, and are continuously contributing to advancing research in the field. Now, we are sharing our latest research paper on an End-to-End Multimodal Model for Autonomous Driving (EMMA).
Powered by Gemini, a multimodal large language model developed by Google, EMMA employs a unified, end-to-end trained model to generate future trajectories for autonomous vehicles directly from sensor data. Trained and fine-tuned specifically for autonomous driving, EMMA leverages Gemini’s extensive world knowledge to better understand complex scenarios on the road.
Our research demonstrates how multimodal models, such as Gemini, can be applied to autonomous driving and explores pros and cons of the pure end-to-end approach. It highlights the benefit of incorporating multimodal world knowledge, even when the model is fine-tuned for autonomous driving tasks that require good spatial understanding and reasoning skills. Notably, EMMA demonstrates positive task transfer across several key autonomous driving tasks: training it jointly on planner trajectory prediction, object detection, and road graph understanding leads to improved performance compared to training individual models for each task. This suggests a promising avenue of future research, where even more core autonomous driving tasks could be combined in a similar, scaled-up setup.
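As a rough illustration of what joint training across driving tasks looks like, here is a toy sketch in which three task heads share one backbone, so every task's gradients update the shared features. The architecture, dimensions, and losses are placeholders, not EMMA's actual formulation:

```python
# Toy multi-task model: three driving-task heads over one shared backbone.
import torch
import torch.nn as nn

class SharedDrivingModel(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(512, feat_dim), nn.ReLU())
        # One lightweight head per task, all reading the shared features
        self.trajectory_head = nn.Linear(feat_dim, 2 * 10)  # 10 future (x, y) waypoints
        self.detection_head = nn.Linear(feat_dim, 4)        # one toy box per scene
        self.roadgraph_head = nn.Linear(feat_dim, 8)        # toy road-graph embedding

    def forward(self, x):
        h = self.backbone(x)
        return self.trajectory_head(h), self.detection_head(h), self.roadgraph_head(h)

model = SharedDrivingModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x = torch.randn(8, 512)  # stand-in for encoded sensor input
targets = (torch.randn(8, 20), torch.randn(8, 4), torch.randn(8, 8))

traj, det, graph = model(x)
# Joint loss: gradients from every task flow into the shared backbone,
# which is where positive transfer between tasks can arise.
loss = (nn.functional.mse_loss(traj, targets[0])
        + nn.functional.mse_loss(det, targets[1])
        + nn.functional.mse_loss(graph, targets[2]))
opt.zero_grad()
loss.backward()
opt.step()
```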
China’s Baowu Launches Self-Developed AI Tool for Steel Industry
Chinese steel giant China Baowu Group yesterday unveiled its first large language model for the steel sector, designed to increase efficiency and refine operations across key links of the steel industry chain, raising the bar for vertical artificial intelligence models in the country.
xIn³Plat has a three-tier architecture comprising a foundation model, an industry-specific vertical model, and an application-scenario domain model, the Shanghai-based firm said on its WeChat account yesterday.
It covers key areas in the R&D, production, operations, and services of the steel industry, it said. This includes lean manufacturing, refined operations management, precise production and sales services, and intelligent maintenance services, as well as green, low-carbon, and energy-saving scenarios.
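The announcement names the three tiers but not how they compose. One plausible reading, sketched below with entirely hypothetical interfaces, is that each tier scopes the one beneath it to a narrower context:

```python
# Hypothetical composition of xIn3Plat's three announced tiers; the actual
# interfaces are not public, so all names and methods here are illustrative.
class FoundationalModel:
    def infer(self, prompt: str) -> str:
        return f"[general model answer to: {prompt}]"

class VerticalModel:
    """Industry tier: wraps the foundational model with steel-domain context."""
    def __init__(self, base: FoundationalModel):
        self.base = base
    def infer(self, prompt: str) -> str:
        return self.base.infer(f"steel-industry context | {prompt}")

class ScenarioModel:
    """Application tier: further scopes the vertical model to one scenario."""
    def __init__(self, vertical: VerticalModel, scenario: str):
        self.vertical, self.scenario = vertical, scenario
    def infer(self, prompt: str) -> str:
        return self.vertical.infer(f"scenario={self.scenario} | {prompt}")

maintenance = ScenarioModel(VerticalModel(FoundationalModel()),
                            "intelligent maintenance")
print(maintenance.infer("estimate remaining life of a rolling-mill bearing"))
```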
Baowu’s AI tool has achieved a 30 percent increase in R&D efficiency. In lean manufacturing, the annual efficiency gains of a production line where the LLM was adopted have topped CNY10 million (USD1.4 million), significantly better than the results achieved with manual processes.
Introducing the Imubit Foundation Process Model™: A New Paradigm in Plant Optimization
Imubit’s Optimizing Brain™ Solution leads the market in Closed Loop AI Optimization, transforming engineers into AI pioneers with its Foundation Process Model™ and reinforcement learning. Similar to how ChatGPT, the pioneering Foundation Model, was trained on vast datasets to power diverse applications, our model integrates extensive process data with engineering expertise. This evergreen asset continuously updates with live process data, offering clients a versatile tool for optimizing operations, training teams, and diagnosing process issues.
Historically, industrial plants relied on local models and logic to address specific, isolated problems. These models ranged from Linear Programming (LP) to first-principle process models, Advanced Process Control (APC) applications, scheduling models, and operator training systems. While each served its purpose, they often required extensive manual input, fine-tuning, and optimization specific to the task at hand.
Now, we are seeing a shift. Much like how foundation models in AI were trained on broad datasets for wide applicability, plants are training deep learning models on large amounts of process data, investing multidisciplinary efforts to create a single model that is applied to a broad set of applications.
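As a toy illustration of closed-loop optimization with a learned process model (not Imubit's actual algorithm), the loop below repeatedly asks a surrogate model which nearby setpoint it predicts is most profitable, then writes that setpoint back to the plant:

```python
# Toy closed loop: a learned process model predicts the effect of setpoint
# changes, and a simple local search picks the best candidate each step.
import random

def learned_process_model(setpoint: float) -> float:
    """Stand-in for a deep model trained on historical process data:
    predicts plant profit per hour as a function of one setpoint."""
    return -(setpoint - 72.0) ** 2 + 500.0  # unknown-to-us optimum at 72.0

def choose_setpoint(current: float, n_candidates: int = 32) -> float:
    # Propose nearby setpoints and keep the one the model predicts is best
    candidates = [current + random.uniform(-2.0, 2.0) for _ in range(n_candidates)]
    return max(candidates, key=learned_process_model)

setpoint = 65.0
for step in range(20):
    setpoint = choose_setpoint(setpoint)  # write back to the plant (closed loop)
    # In a live deployment, fresh plant measurements would now be used to
    # keep the process model up to date ("evergreen" retraining).
print(f"converged setpoint: {setpoint:.1f}")  # approaches 72.0
```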
X-Bow Raises More Than $70 Million Series B to Accelerate Expansion of Hypersonic and Solid Rocket Motor Technologies
X-Bow Systems Inc. (X-Bow), the nation’s leading non-traditional producer of advanced manufactured solid rocket motors (SRMs) and hypersonics technologies, announced the successful completion of the initial close of its Series B financing, raising over $70 million. This private capital will accelerate the rapid growth of its hypersonic-capable vehicles, strategic and tactical scale solid rocket motor programs, and completion of its Luling, TX gigafactory campus. The company will also be expanding its engineering and R&D facilities across New Mexico. The round was led by National Security technology-focused growth equity firm Razor’s Edge, with additional participation from Lockheed Martin Ventures, Boeing Ventures, Crosslink Capital and Balerion Space Ventures.
With this funding now secured, X-Bow will accelerate its growth trajectory as the nation’s third supplier of Solid Rocket Motors and continue in its quest to rapidly innovate and deliver agile, affordable solutions for SRMs, hypersonics and associated adjacent markets.
Skild AI Raises $300M Series A To Build A Scalable AI Foundation Model For Robotics
Skild AI, an AI robotics company building a scalable foundation model for robotics, announced it has closed a $300M Series A funding round. The round was led by Lightspeed Venture Partners, Coatue, SoftBank Group, and Jeff Bezos (through Bezos Expeditions), with participation from Felicis Ventures, Sequoia, Menlo Ventures, General Catalyst, CRV, Amazon, SV Angel, and Carnegie Mellon University. The funding brings the company to a valuation of $1.5B. The capital will be used to continue scaling the company’s model and training datasets for future commercial deployment of its technology, in addition to hiring for roles across AI, robotics, engineering, operations, and security.
Skild AI is building intelligence that is grounded in the physical world. The company is breaking the data barrier in robotics, training its model on at least 1,000X more data points than competing models. As opposed to vertically designed robots that are built for specific applications, Skild’s model serves as a shared, general-purpose brain for a diverse set of robot embodiments, scenarios, and tasks, including manipulation, locomotion, and navigation. From resilient quadrupeds mastering adverse physical conditions to vision-based humanoids performing dexterous manipulation of objects for complex household and industrial tasks, the company’s model will enable the use of low-cost robots across a broad range of industries and applications.
Introducing Aurora: The first large-scale foundation model of the atmosphere
A recent study by Charlton-Perez et al. (2024) underscored the challenges faced by even the most advanced AI weather-prediction models in capturing the rapid intensification and peak wind speeds of Storm Ciarán. To help address those challenges, a team of Microsoft researchers developed Aurora, a cutting-edge AI foundation model that can extract valuable insights from vast amounts of atmospheric data. Aurora presents a new approach to weather forecasting that could transform our ability to predict and mitigate the impacts of extreme events—including being able to anticipate the dramatic escalation of an event like Storm Ciarán.
Aurora’s effectiveness lies in its training on more than a million hours of diverse weather and climate simulations, which enables it to develop a comprehensive understanding of atmospheric dynamics. This allows the model to excel at a wide range of prediction tasks, even in data-sparse regions or extreme weather scenarios. By operating at a high spatial resolution of 0.1° (roughly 11 km at the equator), Aurora captures intricate details of atmospheric processes, providing more accurate operational forecasts than ever before—and at a fraction of the computational cost of traditional numerical weather-prediction systems. We estimate that the computational speed-up that Aurora can bring over the state-of-the-art numerical forecasting system Integrated Forecasting System (IFS) is ~5,000x.
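The quoted 11 km figure follows directly from the Earth's geometry; a quick check:

```python
import math

EARTH_RADIUS_KM = 6371.0
km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360  # ~111.2 km along a meridian
grid_spacing = 0.1 * km_per_degree                   # Aurora's 0.1-degree grid
print(f"{grid_spacing:.1f} km")  # ~11.1 km, matching the quoted figure
# Along a parallel, spacing shrinks by cos(latitude), so 11 km holds at the equator.
```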
Unlocking new value in industrial automation with AI
Working with the robotics team at NVIDIA, we have successfully tested NVIDIA robotics platform technologies, including NVIDIA Isaac Manipulator foundation models, for a robot grasping skill with the Intrinsic platform. This prototype features an industrial application specified by one of our partners and customers, Trumpf Machine Tools. This grasping skill, trained with 100% synthetic data generated by NVIDIA Isaac Sim, can be used to build sophisticated solutions that perform adaptive and versatile object-grasping tasks in simulation and in the real world. Instead of hard-coding specific grippers to grasp specific objects in a certain way, the foundation model and synthetic training data are used to auto-generate efficient code for a particular gripper and object to complete the task.
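The contrast between hard-coded and model-driven grasping can be sketched as follows; the function names are hypothetical and do not reflect the Isaac Manipulator API:

```python
# Conceptual contrast between hard-coded and model-driven grasping.
HARD_CODED_GRASPS = {
    ("parallel_jaw", "sheet_metal_bracket"): {"approach": "top", "width_mm": 40},
    # ...an entry must be hand-authored for every (gripper, object) pair
}

def grasp_hard_coded(gripper: str, obj: str) -> dict:
    # Fails with KeyError for any combination nobody anticipated
    return HARD_CODED_GRASPS[(gripper, obj)]

def grasp_with_foundation_model(gripper: str, object_scan) -> dict:
    """A grasp model trained on (synthetic) data generalizes, so unseen
    gripper/object combinations need no new hand-written logic."""
    # Placeholder for the model call; a real system would return a 6-DoF pose.
    return {"gripper": gripper, "pose_6dof": [0.0] * 6, "approach": "inferred"}

print(grasp_hard_coded("parallel_jaw", "sheet_metal_bracket"))
print(grasp_with_foundation_model("suction_cup", object_scan=None))
```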
Together with Google DeepMind, we’ve demonstrated some novel and high value methods for robotic programming and orchestration — many of which have practical applications today:
- Multi-robot motion planning with machine learning
- Learning from demonstration, applied to two-handed dexterous manipulation
- Foundation models for perception: enabling a robotic system to understand its next task and the physical objects involved requires a real-time, accurate, and semantic understanding of the environment.
Archetype AI Introduces Foundation Model to Pioneer Physical AI
Archetype AI, a physical AI company helping humanity make sense of the world, announced its emergence from stealth and the introduction of Newton™, a first-of-its-kind foundation model that understands the physical world. With Newton, Archetype AI is on a mission to use the power of artificial intelligence to solve real-world problems – empowering people and organizations with an understanding of the physical environment that wasn’t previously possible.
In support of this mission, Archetype AI has raised a $13 million seed funding round led by Venrock, with participation from Amazon Industrial Innovation Fund, Hitachi Ventures, Buckley Ventures, Plug and Play Ventures and several angel investors. In conjunction with the financing, Ganesh Srinivasan, Partner at Venrock, will join the board.
With Newton, Archetype AI is introducing a first-of-its-kind physical AI foundation model that is capable of perceiving, understanding, and reasoning about the world. Newton fuses multimodal temporal data – including signals from accelerometers, gyroscopes, radars, cameras, microphones, thermometers, and other environmental sensors – with natural language to unlock insights about the physical world in real time.
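Here is a minimal sketch of what fusing timestamped multimodal sensor streams with a language query might look like; all names are illustrative assumptions, not Newton's actual API:

```python
# Hypothetical multimodal sensor fusion: align heterogeneous timestamped
# streams onto a common window, then answer a natural-language query.
from dataclasses import dataclass

@dataclass
class SensorReading:
    sensor: str       # "accelerometer", "radar", "thermometer", ...
    timestamp: float  # seconds
    value: tuple      # raw channel values

def window(readings, t0: float, t1: float):
    """Align heterogeneous streams onto a common time window before fusion."""
    return [r for r in readings if t0 <= r.timestamp < t1]

def answer(query: str, readings) -> str:
    # Placeholder for a model that co-embeds sensor windows and text;
    # a real system would reason jointly over both modalities.
    sensors = sorted({r.sensor for r in readings})
    return f"answering {query!r} from {len(readings)} readings ({', '.join(sensors)})"

stream = [
    SensorReading("accelerometer", 0.1, (0.0, 0.0, 9.8)),
    SensorReading("thermometer", 0.4, (21.5,)),
    SensorReading("radar", 0.7, (3.2,)),
]
print(answer("is the machine vibrating abnormally?", window(stream, 0.0, 1.0)))
```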
NVIDIA Announces Project GR00T Foundation Model for Humanoid Robots and Major Isaac Robotics Platform Update
NVIDIA announced Project GR00T, a general-purpose foundation model for humanoid robots, designed to further its work driving breakthroughs in robotics and embodied AI.
As part of the initiative, the company also unveiled a new computer, Jetson Thor, for humanoid robots based on the NVIDIA Thor system-on-a-chip (SoC), as well as significant upgrades to the NVIDIA Isaac™ robotics platform, including generative AI foundation models and tools for simulation and AI workflow infrastructure.
The SoC includes a next-generation GPU based on the NVIDIA Blackwell architecture with a transformer engine delivering 800 teraflops of 8-bit floating-point AI performance to run multimodal generative AI models like GR00T. With an integrated functional safety processor, a high-performance CPU cluster, and 100GB of Ethernet bandwidth, it significantly simplifies design and integration efforts.
Robots powered by GR00T, which stands for Generalist Robot 00 Technology, will be designed to understand natural language and emulate movements by observing human actions — quickly learning coordination, dexterity and other skills in order to navigate, adapt and interact with the real world. In his GTC keynote, NVIDIA founder and CEO Jensen Huang demonstrated several such robots completing a variety of tasks.
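Learning to emulate movement from human demonstrations is commonly framed as imitation learning. Below is a minimal behavior-cloning sketch; the stand-in data and dimensions are our assumptions, not NVIDIA's training recipe:

```python
# Minimal behavior cloning: regress demonstrated actions from observations.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 12))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

# Stand-ins for (observation, action) pairs extracted from human demonstrations
obs = torch.randn(256, 64)      # e.g. encoded video frames of a person
actions = torch.randn(256, 12)  # e.g. retargeted humanoid joint commands

for epoch in range(10):
    loss = nn.functional.mse_loss(policy(obs), actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
```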
Covariant Announces a Universal AI Platform for Robots
Covariant is announcing RFM-1, which the company describes as a robotics foundation model that gives robots the “human-like ability to reason.” “Foundation model” means that RFM-1 can be trained on more data to do more things—at the moment, it’s all about warehouse manipulation because that’s what it’s been trained on, but its capabilities can be expanded by feeding it more data. “Our existing system is already good enough to do very fast, very variable pick and place,” says Covariant co-founder Pieter Abbeel. “But we’re now taking it quite a bit further. Any task, any embodiment—that’s the long-term vision. Robotics foundation models powering billions of robots across the world.” From the sound of things, Covariant’s business of deploying a large fleet of warehouse automation robots was the fastest way for them to collect the tens of millions of trajectories (how a robot moves during a task) that they needed to train the 8-billion-parameter RFM-1 model.
Saudi Aramco unveils industry’s first generative AI model
Aramco’s AI model is a pioneering technology in the industrial sector. It has 250 billion parameters that are adjustable during training to generate outputs or make predictions. The AI was trained on seven trillion data points spanning more than 90 years of company history.
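Taken together, the two headline numbers imply a data-to-parameter ratio that can be checked in one line; note that "data points" need not correspond one-to-one to tokens, so this is only a rough scale check:

```python
parameters = 250e9   # 250 billion adjustable parameters
data_points = 7e12   # seven trillion training data points
print(f"{data_points / parameters:.0f} data points per parameter")  # 28
```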
Amin H Nasser, CEO of Saudi Aramco, said the AI model would analyse drilling plans, geological data, and historical drilling times and costs, as well as recommend optimal well options. He added that for the company’s downstream business, “metabrain will have the capability to provide precise forecasts for refined products, including pricing trends, market dynamics and geopolitical insights”.
Aramco plans to develop a version with 1 trillion parameters by the end of this year.
Foundation Models for Materials Discovery: Our Investment in Orbital Materials
Fortunately, innovations in artificial intelligence have led to the emergence of foundation models, which are trained on vast amounts of data, yielding models that can be used across numerous applications. These foundation models have the potential to enable inverse design, a method of material development that expedites the process by using the specific required properties as an input and generating the new material design as an output. This approach has the potential to revolutionize material development across industries, which is why we are excited to announce Toyota Ventures’ investment in Orbital Materials through our Frontier Fund.
The team has trained a 3D foundation model, named LINUS, for crystal structures and small molecules. Instead of screening millions of materials in hopes of finding one with a specific property, LINUS generates a material based on a given property in a single calculation. To do this, the team has developed a new version of the “transformer”, a model typically used for natural language processing, to allow the model to learn the relationships between the 3D structures of materials and their properties. Advanced absorbent and catalytic materials are crucial in industries such as carbon capture, sustainable fuels, water treatment, biofeedstock upgrading, and battery recycling.
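The difference between forward screening and inverse design can be sketched as an interface: properties in, candidate structure out. Everything below is a hypothetical illustration, not LINUS's actual inputs or outputs:

```python
# Conceptual inverse-design interface; all names and fields are illustrative.
from dataclasses import dataclass

@dataclass
class TargetProperties:
    co2_adsorption: float  # e.g. mmol CO2 per gram at 1 bar
    stability_ev: float    # formation-energy tolerance, eV/atom

@dataclass
class CandidateMaterial:
    composition: str
    positions: list        # 3D atomic coordinates

def inverse_design(target: TargetProperties) -> CandidateMaterial:
    """Forward screening asks 'what are this material's properties?' millions
    of times; inverse design asks 'which material has these properties?' once."""
    # Placeholder for a generative 3D transformer call.
    return CandidateMaterial(composition="placeholder sorbent",
                             positions=[(0.0, 0.0, 0.0)])

candidate = inverse_design(TargetProperties(co2_adsorption=6.0, stability_ev=0.05))
print(candidate.composition)
```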
NASA and IBM Openly Release Geospatial AI Foundation Model for NASA Earth Observation Data
A public/private partnership involving NASA and IBM Research has led to the release of NASA’s first open-source geospatial artificial intelligence (AI) foundation model for Earth observation data. Built using NASA’s Harmonized Landsat and Sentinel-2 (HLS) dataset, the release of the HLS Geospatial Foundation Model (HLS Geospatial FM) is a milestone in the application of AI for Earth science. The model has a wide range of potential applications, including tracking changes in land use, monitoring natural disasters, and predicting crop yields. The HLS Geospatial FM is available at Hugging Face, a public repository for open-source machine learning models.
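Since the model is hosted on Hugging Face, fetching the weights takes a few lines with the huggingface_hub client; the repository id below is illustrative, so check the hub listing for the exact published name:

```python
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

# Repo id is an assumption for illustration; verify the actual name on the hub.
local_dir = snapshot_download(repo_id="ibm-nasa-geospatial/Prithvi-100M")
print(f"model files downloaded to: {local_dir}")
```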
NASA’s Interagency Implementation and Advanced Concepts Team (IMPACT) played a major role in this work. Located at NASA’s Marshall Space Flight Center in Huntsville, Alabama, IMPACT is a component of NASA’s Earth Science Data Systems (ESDS) Program and is charged with expanding the use of NASA Earth observation data through innovation, partnerships, and technology, including the application of AI to these data.