The age of AI is truly upon us, as evidenced by the groundbreaking release of OpenAI Sora. In a mesmerizing display of technological prowess, this text-to-video AI is revolutionizing the way we create visual content. The quality of the videos produced by Sora is unparalleled, with striking detail and breathtaking realism. Temporal coherence is executed flawlessly, ensuring a seamless flow of frames that is both captivating and immersive.
But what truly sets Sora apart is its ability to bring imagination to life. From vlogging corgis to surfing otters, the creative possibilities are endless. The AI’s consistent world model and object permanence showcase a level of sophistication that is truly mind-blowing. And as computing power continues to evolve, the potential for even more astounding advancements becomes limitless.
As we marvel at the wonders of OpenAI Sora, we are reminded that the future of AI is indeed a bright one. The possibilities for innovation and creativity are boundless, and we can only imagine what groundbreaking discoveries lie just two papers down the line. So buckle up, fellow scholars, and get ready to witness history in the making. The age of AI is here, and it’s only just beginning.
Watch the video by Two Minute Papers
Video Transcript
Buckle up Fellow Scholars, because what you are going to see today is something that might be the craziest thing I’ve been able to show you in more than 800 videos. This is the kind of video an AI was capable of creating yesterday. And today,
it can do this. Holy mother of papers! Yes, OpenAI just released their own text-to-video AI, Sora, and it is so far beyond anything else I have ever seen that it is hard to put into words. Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.
When I first saw these results, I thought this was some April Fools’ joke. No, this is not a video coming from a real camera. This is a video that was synthesized pixel by pixel by a new AI. So, let’s give this one a spin. We always evaluate these AI videos by three criteria.
One, quality. This is shocking. The quality of these works is out of this world. If we are not actively looking for errors in the footage, in many cases we may not even know that they were made by an AI. And it gets better:
their DALL-E 3 system is an expert at creating images, and when I stop these videos, the still frames are often as good as, or even better than, what DALL-E 3 can make. This is beating the king at its own game. Unbelievable. Two, temporal coherence. This means that the AI understands exactly how
each frame in the video should follow the previous one. This is what it looks like if you don’t have temporal coherence: a paper from just a few years ago, and now we have this. Once again, temporal coherence, second to none. Wow.
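To get a rough sense of what temporal coherence means in practice, here is a minimal sketch (my own illustration, not a metric from the Sora report): it scores a clip by how much consecutive frames differ, so a flickering, incoherent video gets a much larger score than a smooth one. The frame arrays below are synthetic stand-ins.

```python
# Minimal sketch: a naive temporal-coherence score for a video clip.
# Not OpenAI's metric; just the intuition that coherent videos change
# smoothly from frame to frame while incoherent ones jump around.
import numpy as np

def mean_frame_difference(frames: np.ndarray) -> float:
    """frames: (num_frames, height, width) array of grayscale pixel values."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    return float(diffs.mean())  # lower = more temporally coherent

rng = np.random.default_rng(0)

# A "coherent" clip: a bright square drifting one pixel per frame.
coherent = np.zeros((30, 64, 64))
for t in range(30):
    coherent[t, 20:30, t:t + 10] = 255.0

# An "incoherent" clip: every frame is unrelated noise.
incoherent = rng.uniform(0, 255, size=(30, 64, 64))

print("coherent:  ", mean_frame_difference(coherent))
print("incoherent:", mean_frame_difference(incoherent))
```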
And three, wait, even with all that, this may still not be a great technique. Now I hear you asking: Károly, why is that? Well, it has to follow our prompts correctly. It has to be true to what we asked for. You see, there are techniques out there that give you really good quality,
coherent videos; however, they don’t care too much about the prompts that we write. And what about this new technique? Goodness. That is exactly what the prompt is asking for. I am out of words. But it gets better. It even has a hint of imagination. For instance,
we can ask for a corgi that’s a vlogger, an otter on a surfboard, an Italian pup, you name it. Just ask, and it will do it. Imagination in a machine. What a time to be alive!
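As an aside, prompt fidelity is something we can actually measure. One common approach (a hedged sketch of my own, not anything OpenAI reports for Sora) is to embed a generated frame and candidate prompts with a model like CLIP and compare them; the file name and prompts below are hypothetical.

```python
# Sketch: scoring how well a generated frame matches a text prompt with CLIP.
# Assumes: pip install transformers torch pillow
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frame = Image.open("frame.png")  # hypothetical: one frame from a generated video
prompts = ["a corgi vlogging on a beach", "an otter on a surfboard"]

inputs = processor(text=prompts, images=frame, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher score = the frame matches that prompt better.
print(outputs.logits_per_image.softmax(dim=-1))
```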
Hmm… wait! I just noticed that we need to take a look at a fourth criterion from now on, and that is object permanence and consistency. With previous techniques, when something got occluded and then became visible again, the AI might not remember it, and it might look completely different. But here, let’s see… wow. This has a consistent world model,
so much so that even when we move around in 3D space, everything remains where it should be. And this can do so much more. We can even transform an existing video into a completely new one just by writing a text prompt. And now, hold on to your papers Fellow Scholars,
because it can also synthesize virtual worlds. Whether that will be something that already exists, like Minecraft, or a completely new game made from scratch, is up to you. Just one more paper down the line, it might be that you won’t even need to develop your own games; maybe you’ll just hook
up a controller, write a text prompt, and OpenAI Sora will give that game to you immediately. So, how does all this magic work? Well, one of the key ideas is that the synthesis takes place in a latent space. What is that?
It looks something like this. This is one of my papers, where you can walk around this 2D latent space, and each point in this space represents a material for a virtual world. And here comes the key: the latent space works well if you can guarantee that when
exploring a nearby point, you get a similar material model. The link to the paper is available in the video description. This concept also works for creating new fonts, and now for creating new videos too. And they come in full HD resolution. Wow.
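To make the latent-space idea concrete, here is a minimal, self-contained sketch (my own toy illustration, not Sora’s architecture): a fixed random “decoder” maps each 2D latent point to an output, and taking a small step in latent space produces only a small change in the output, which is exactly the smoothness property described above.

```python
# Toy illustration of a latent space: nearby latent points decode to
# similar outputs. The "decoder" is a fixed random network standing in
# for a trained generative model.
import numpy as np

rng = np.random.default_rng(42)
W1 = rng.normal(size=(2, 64))   # latent (2D) -> hidden
W2 = rng.normal(size=(64, 8))   # hidden -> output "material parameters"

def decode(z: np.ndarray) -> np.ndarray:
    """Map a 2D latent point to an 8D output, smoothly."""
    return np.tanh(z @ W1) @ W2

z = np.array([0.3, -0.7])                # a point in the 2D latent space
nearby = z + 0.01 * rng.normal(size=2)   # a small step away
far = np.array([2.5, 1.9])               # a distant point

print("nearby distance:", np.linalg.norm(decode(z) - decode(nearby)))
print("far distance:   ", np.linalg.norm(decode(z) - decode(far)))
```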
So is this concept any good so far? Well, let’s have a look. Wait a second. That is not even close to what we’ve seen. What happened? Well, one word: compute happened. You see, if you don’t have enough computational power, this is what you get. If you have 4 times more,
you get this. And if you have 16 times more, you get this. Oh yes. So the concept comes alive only with a sufficient amount of compute. The virtual brain, if you will, has to be developed enough to imagine all these videos in high quality.
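As a loose analogy only (my own toy with made-up numbers; the real effect comes from training far larger models on far more data), here is a sketch in which spending more iterations of “compute” on a simple iterative denoiser yields a progressively cleaner result, mirroring the 1x / 4x / 16x comparison above.

```python
# Toy analogy: more "compute" (denoising iterations) -> cleaner output.
# This is NOT how Sora scales; it only illustrates the qualitative trend.
import numpy as np

rng = np.random.default_rng(0)
target = np.sin(np.linspace(0, 2 * np.pi, 100))  # stand-in for a clean video

def denoise(noisy: np.ndarray, steps: int) -> np.ndarray:
    x = noisy.copy()
    for _ in range(steps):
        # each step nudges the sample toward a smoother, more plausible signal
        x = 0.5 * x + 0.5 * np.convolve(x, np.ones(5) / 5, mode="same")
    return x

noisy = target + rng.normal(0, 1.0, target.shape)
for steps in (1, 4, 16):  # 1x, 4x, 16x "compute"
    err = np.mean((denoise(noisy, steps) - target) ** 2)
    print(f"{steps:>2} steps -> MSE {err:.3f}")
```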
And my goodness, this is perhaps the biggest jump in quality between two research works that I have ever seen, and this video series has been around for more than 800 episodes now. And now, it’s time. Time for what, you ask? Of course, it is time to invoke the First Law of
Papers. The First Law of Papers says that research is a process. Do not look at where we are, look at where we will be two more papers down the line. And here it is, one more paper down the line. Now,
an exercise: leave a comment about what you think we will be able to do just two more papers down the line. I’d love to know what you Fellow Scholars think. Especially now, because once again we can share one of those sweet moments where we witness history in the making.
In his excellent video, which I highly recommend, MKBHD says that since this is trained on videos made by humans, it likely cannot go beyond what it has seen from humans. I would like to note that in some cases, we see AI papers that have proper zero-shot performance. What is that? It
means that, leaning on all this knowledge, like a human, the AI can try to create new things it hasn’t seen before. For instance, you could ask for a new kind of vehicle for T-Rexes, and it could infer
that T-Rexes have these little hands, so the vehicle would have to have a little wheel that is suitable for their little hands. We will be able to test that and so much more as soon as it is out there.
And we will soon be back with a video about a different AI video system that is not as good as this one, but it is more controllable, and it is something that you will be able to try out for free right away. We will also have a more in-depth video about the capabilities
of this new technique soon. So make sure to subscribe and hit the bell icon so you don’t miss out.
The video “OpenAI Sora: The Age Of AI Is Here!” was uploaded on 02/16/2024 to the YouTube channel Two Minute Papers.