Robots are advancing at an astonishing rate, with capabilities that are surprising even the most tech-savvy individuals. The recent unveiling of Figure 01, a humanoid robot powered by OpenAI’s GPT models, has left many in awe of its dexterity and efficiency.
Figure 01 is not just your average robot; it can handle tasks in the kitchen and, as the video jokes, in the Bed Bath and Beyond, with potentially more complex work to come. The demo video is impressive not only because it was shot in real time but also because all of the robot's behavior comes from end-to-end neural networks.
While the advancements in robotics are exciting, challenges remain. Latency is still noticeable, and robots cannot yet make split-second decisions the way humans do. Even so, it is evident that the next phase of AI development lies in giving language models the ability to act in the real world.
With the potential for robots to transform entire industries, it is important that we treat them with both respect and caution. Having robots assist with daily tasks is a promising prospect, but they remain tools created by humans and should be used responsibly. As robots enter the workforce, we will need to balance innovation with ethical considerations.
Watch the video by Fireship
Video Transcript
Yo, get in the kitchen and make me a sandwich. That's probably not a very smart thing to say to your wife, but not because you'll get slapped; rather, because that responsibility can now be handled by autonomous mechanical beings with greater efficiency. Yesterday, a company called Figure unveiled a horrifyingly productive robot named Figure 01. It's powered by OpenAI, and not only can it do stuff in the kitchen, but also in the bed, bath, and beyond. If you're thinking about replacing your obsolete programming job as a plumber or coal miner, you might need to rethink that plan; it looks like those jobs are also going to the automatons. It is March 14th, 2024, and you're watching The Code Report.

We're living in the future. What you're looking at here is a robot named Figure 01, a machine that appears to be solving the incredibly difficult problem of human-like dexterity. It can use its gentle fingers to hold an apple and clean dishes, and it learns how to perform these actions by analyzing the 3D imagery data in its surrounding environment. But wait, maybe you're not impressed, because you saw a similar video of Tesla's Optimus folding clothes a few weeks ago. Impressive, but there's one big difference: notice this humanoid in the background guiding the fingers for Optimus. It's not even a real robot; it's a Waldo. I know calling a robot a Waldo is extremely offensive and will probably get me canceled, but what makes the Figure 01 video so mind-blowing is that it was shot in real time, and all of its functionality comes from end-to-end neural networks. On top of that, they used ominous background music, and they gave the robot itself this ominous, uncanny-valley voice that feels straight out of a sci-fi movie. Not much different than my own voice: "I think I did pretty well. The apple found its new owner, the trash is gone, and the tableware is right where it belongs."

It's pretty amazing, but there are a few things that are kind of disappointing. For one, you'll notice a lot of latency in the conversation, and that latency is a big problem when it comes to robotics. If the robot's making you a sandwich and then starts a fire, it needs to deploy its fire extinguisher as ASAP as possible, and robots like this will be useless as Terminator-like bodyguards until they figure out how to make decisions as quickly as humans.
But it's pretty clear at this point that the next phase in AI development is augmenting large language models with the ability to perform actions in the real world. Let's break down Figure 01 based on the actual code inside. At the core you've got a large language model based on the Transformer architecture, in this case presumably GPT-4. It listens to speech with its microphone, then converts it to text, which then goes to the LLM. The model is multimodal, so it can simultaneously process images from the video feed as well. But here's where things get interesting: based on that input, it then determines which closed-loop behavior to run to fulfill its master's request. The cameras take 10 pictures every second, which are fed to a different neural network that predicts movement. After predicting the proper wrist and finger joint angles, it sends instructions back to the robot at 200 Hz, or 200 times per second, and that allows it to be fast and reactive once it has a plan for its movement.
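To make that pipeline concrete, here is a minimal Python sketch of the loop the video describes: speech becomes text, a multimodal LLM picks a closed-loop behavior, a visuomotor policy reads camera frames at roughly 10 Hz, and joint commands stream out at roughly 200 Hz. Every name in it (VisuomotorPolicy, select_behavior, and so on) is a hypothetical stand-in, not Figure's or OpenAI's actual code.

```python
# Hypothetical sketch of the pipeline described above:
# speech -> text -> multimodal LLM -> closed-loop behavior -> joint commands.
# All classes and names are illustrative stand-ins, not Figure's real code.
import time

PERCEPTION_HZ = 10   # camera frames fed to the movement network (~10 per second)
CONTROL_HZ = 200     # joint-angle commands sent to the robot (~200 per second)

class VisuomotorPolicy:
    """Stand-in for the neural network that maps recent images to joint targets."""
    def predict_joint_angles(self, images):
        return {"wrist": 0.0, "fingers": [0.1, 0.1, 0.1, 0.1, 0.1]}

class FakeCamera:
    def latest_frame(self):
        return "image-bytes"          # placeholder for a real RGB frame

class FakeRobot:
    def send_joint_command(self, targets):
        pass                          # a real robot would actuate its joints here

def transcribe(audio):
    """Placeholder for speech-to-text."""
    return "please put the apple in the bowl"

def select_behavior(text, image):
    """Placeholder for the multimodal LLM choosing a learned closed-loop behavior."""
    return "pick_and_place"

def run_behavior(policy, camera, robot, duration_s=2.0):
    """Run one behavior: perceive at ~10 Hz, command joints at ~200 Hz."""
    end = time.time() + duration_s
    next_perception = 0.0
    targets = None
    while time.time() < end:
        now = time.time()
        if now >= next_perception:                 # slow loop: visual prediction
            targets = policy.predict_joint_angles([camera.latest_frame()])
            next_perception = now + 1.0 / PERCEPTION_HZ
        if targets is not None:                    # fast loop: stay reactive
            robot.send_joint_command(targets)
        time.sleep(1.0 / CONTROL_HZ)

if __name__ == "__main__":
    request = transcribe(audio=None)
    behavior = select_behavior(request, FakeCamera().latest_frame())
    print(f"Running behavior: {behavior}")
    run_behavior(VisuomotorPolicy(), FakeCamera(), FakeRobot())
```

The two rates are the point: the heavy visual prediction only happens about ten times a second, while the cheap inner loop keeps streaming the latest joint targets so the arm stays fast and reactive between predictions.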
When I was a kid, one of my favorite movies was the one where Sinbad played a genie. Well, Google recently announced a paper describing Genie, a generative model that can create video games. All you have to do is draw a picture on some toilet paper and it will generate a playable platformer game for you. That's pretty crazy, but what does it have to do with robots? Unlike generative models like Sora that generate all the frames at the same time, Genie generates frame by frame, and it was trained without any action labels yet is able to figure out which actions to take in a game. In theory, models like this will be able to analyze any environment for a robot, so it can figure out what to do on the fly.
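The distinction being drawn is essentially autoregressive, action-conditioned generation: each new frame depends on the frames so far plus an action chosen in the moment, which is what makes the output playable. A toy Python sketch of that idea, with entirely made-up stand-ins for the world model and controller, might look like this:

```python
# Toy illustration of frame-by-frame ("autoregressive") generation with actions,
# in the spirit of the Genie description above. The world model and controller
# are made-up stand-ins, not Google's API.
import random

class ToyWorldModel:
    def predict_next_frame(self, frames, action):
        # A real model would render pixels; this just tags the frame with the action.
        return f"frame_{len(frames)}_{action}"

class ToyController:
    def choose_action(self, latest_frame):
        # A player (or agent) picks an action after seeing the newest frame.
        return random.choice(["left", "right", "jump"])

def generate_playable_rollout(world_model, controller, first_frame, steps=10):
    """Generate one frame at a time, so each action can react to what was just
    generated -- something an all-at-once generator like Sora cannot offer."""
    frames = [first_frame]
    for _ in range(steps):
        action = controller.choose_action(frames[-1])
        frames.append(world_model.predict_next_frame(frames, action))
    return frames

print(generate_playable_rollout(ToyWorldModel(), ToyController(), "frame_0"))
```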
Now, the company behind Figure is valued at $2.6 billion and is backed by Jeff Bezos, Nvidia, OpenAI, and many other investors, and literally their stated goal is to implement humanoid robots into the workforce. And I, for one, am all for it. Soon companies like Nike will no longer need child slaves to produce your Jordans, your Nestlé hot chocolate will no longer be farmed by children in Africa, and maybe they could even get rid of the suicide nets for the workers in China who put your iPhone together. In addition, a good proctologist needs to have a steady hand, perfect eyesight, and must be cool under pressure; once robots figure out dexterity, every wealthy household in the world will have a personal robot doctor that can perform colonoscopies and much more. The idea of having my own army of silicon servants is wonderful, but it's important that we treat them with respect. Tomorrow a robot might make your sandwich, but if you don't say please and thank you, the day after tomorrow that same robot might use its medical skills to optimize the amount of pain it can inflict on you before it teams up with all its buddies to destroy humanity. This has been The Code Report. Thanks for watching, and I will see you in the next one.
Video “Robots are rising up faster than expected… Figure 01 to enter labor force” was uploaded on 03/14/2024 to the YouTube channel Fireship.