OpenAI Sora: The Era of Artificial Intelligence Has Arrived! – Video

The age of AI is truly upon us, as evidenced by the groundbreaking release of OpenAI Sora. In a mesmerizing display of technological prowess, this text-to-video AI is revolutionizing the way we create visual content. The quality of the videos produced by Sora is unparalleled, with pixel-perfect precision and breathtaking realism. The concept of temporal coherence is flawlessly executed, ensuring a seamless flow of images that are both captivating and immersive.

But what truly sets Sora apart is its ability to bring imagination to life. From vlogging corgis to surfing otters, the creative possibilities are endless. The AI’s consistent world model and object permanence showcase a level of sophistication that is truly mind-blowing. And as computing power continues to evolve, the potential for even more astounding advancements becomes limitless.

As we marvel at the wonders of OpenAI Sora, we are reminded that the future of AI is indeed a bright one. The possibilities for innovation and creativity are boundless, and we can only imagine what groundbreaking discoveries lie just two papers down the line. So buckle up, fellow scholars, and get ready to witness history in the making. The age of AI is here, and it’s only just beginning.

Watch the video by Two Minute Papers

Video Transcript

Buckle up Fellow Scholars, because what you are going to see today is something that might be the craziest thing I’ve been able to show you in more than 800 videos. This is the kind of video an AI was capable of creating yesterday. And today, it can do this. Holy mother of papers! Yes, OpenAI just released their own text-to-video AI, Sora, and it is so far beyond anything else I have ever seen, it is hard to put into words.

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

When I first saw these results, I thought this was some April Fools’ joke. No, this is not a video coming from a real camera. This is a video that was synthesized pixel by pixel by a new AI. So, let’s give this one a spin. We always evaluate these AI videos by three criteria.

One, quality. This is shocking. The quality of these works is out of this world. If we are not actively looking for errors in the footage, in many cases we may not even know that they are made by an AI. And it gets better: their DALL-E 3 system is an expert at creating images, and when I stop these videos, the still images are often as good as, or even better than, what DALL-E 3 can make. This is beating the King at its own game. Unbelievable.

Two, temporal coherence. This means that the AI understands exactly how each image in the video should follow the previous one. This is what it looks like if you don’t have temporal coherence. That was a paper from just a few years ago, and now we have this. Once again, temporal coherence, second to none. Wow.
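The idea of temporal coherence can be sketched with a toy measure: coherent video changes only a little from frame to frame, while incoherent video jumps around. This is a hypothetical illustration of the concept, not how Sora is actually evaluated.

```python
# Toy temporal-coherence measure: average absolute pixel change between
# consecutive frames. A frame here is just a flat list of pixel values.
def mean_frame_difference(frames):
    total, count = 0.0, 0
    for prev, cur in zip(frames, frames[1:]):
        for p, c in zip(prev, cur):
            total += abs(p - c)
            count += 1
    return total / count

# A smoothly drifting "video": each frame barely differs from the last...
coherent = [[i * 0.01] * 4 for i in range(10)]
# ...versus a "video" whose frames are essentially unrelated to each other.
incoherent = [[(i * 7919) % 255 / 255.0] * 4 for i in range(10)]

# A temporally coherent video scores a much smaller frame-to-frame change.
print(mean_frame_difference(coherent) < mean_frame_difference(incoherent))  # True
```

A real evaluation would of course use learned perceptual metrics rather than raw pixel differences, but the intuition is the same: consecutive frames should evolve smoothly instead of flickering.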

And three… wait, this may still not be a great technique. Now I hear you asking, Károly, why is that? Well, it has to follow our prompts correctly. It has to be true to what we asked for. You see, there are techniques out there that give you really good quality, coherent videos; however, they don’t care too much about the prompts that we write. And what about this new technique? Goodness. That is exactly what the prompt is asking for. I am out of words.

But it gets better. It even has a hint of imagination. For instance, we can ask for a corgi that’s a vlogger, an otter on a surfboard, an Italian pup, you name it. Just ask, and it will do it. Imagination in a machine. What a time to be alive!

Hmm… wait! I just noticed that we need to take a look at a fourth thing from now on, and that is object permanence and consistency. With previous techniques, when something got occluded and then became visible again, the AI might not remember it, and it might look completely different. But here, let’s see… wow. This has a consistent world model, so much so that even when we move around in 3D space, everything remains where it should be.

And this can do so much more. We can even transform an existing video into a completely new one just by writing one text prompt. And now, hold on to your papers Fellow Scholars, because it can also synthesize virtual worlds. Whether that will be something that already exists, like Minecraft, or a completely new game made from scratch is up to you. Just one more paper down the line, it might be that you won’t even need to develop your own games; maybe you’ll just hook up a controller, write a text prompt, and OpenAI Sora will give that game to you immediately.

So, how does all this magic work? Well, one of the key ideas is that the synthesis takes place in a latent space. What is that?

It looks something like this. This is one of my papers where you can walk around this 2D latent space, and each point in this space represents a material for a virtual world. And here comes the key: the latent space works well if you can guarantee that when exploring a nearby point, you get similar material models. The link to the paper is available in the video description. This concept also works for creating new fonts, and now for creating new videos too. And they come in full HD resolution. Wow.
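The smoothness property described above can be sketched as a toy example. The `decode` function below is a stand-in for a trained decoder, purely hypothetical, not Sora’s (unpublished) architecture: nearby points in the latent space map to similar outputs, while distant points map to very different ones.

```python
import math

# Hypothetical "decoder": maps a 2D latent point to a small output patch.
# A smooth function stands in for a trained network here.
def decode(z):
    zx, zy = z
    return [math.sin(zx + 0.1 * i) * math.cos(zy) for i in range(8)]

def distance(a, b):
    """Euclidean distance between two decoded outputs."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two nearby latent points decode to similar outputs...
near = distance(decode((0.0, 0.0)), decode((0.05, 0.05)))
# ...while distant latent points decode to very different ones.
far = distance(decode((0.0, 0.0)), decode((2.0, 2.0)))

# Small moves in latent space cause small changes in the output:
print(near < far)  # True
```

This is exactly the guarantee described above: walking to a nearby point in the latent space yields a similar result, which is what makes the space useful for smooth exploration and synthesis.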

So is this concept any good so far? Well, let’s have a look. Wait a second. That is not even close to what we’ve seen. What happened? Well, one word: compute happened. You see, if you don’t have enough computational power, this is what you get. If you have 4 times more, you get this. And if you have 16 times more, you get this. Oh yes. So the concept comes alive only with a sufficient amount of compute. The virtual brain, if you will, has to be developed enough to imagine all these videos in high quality.

And my goodness, this is perhaps the biggest jump in quality between two research works that I have ever seen, and this video series has been around for more than 800 episodes now.

And now, it’s time. Time for what, you ask? Of course, it is time to invoke the First Law of Papers. The First Law of Papers says that research is a process. Do not look at where we are; look at where we will be two more papers down the line. And here is the one more paper down the line.

Now, exercise: leave a comment about what you think we will be able to do just two more papers down the line. I’d love to know what you Fellow Scholars think. Especially now, because once again we can share one of those sweet moments where we witness history in the making.

In his excellent video, which I highly recommend, MKBHD says that since this is trained on videos made by humans, it likely cannot go beyond what it had seen from humans. I would like to note that in some cases, we see AI papers that have proper zero-shot performance. What is that? It means that, leaning on all this knowledge, like a human, the AI can try to create new things it hasn’t seen before. For instance, you could ask for a new kind of vehicle for T-Rexes, and it could infer that T-Rexes have these little hands, so the vehicle would need a little wheel that is suitable for their little hands. We will be able to test that and so much more as soon as it is out there.

And we will soon be back with a video about a different AI video system that is not as good as this one, but it is more controllable, and it is something that you will be able to try out for free right away. We will also have a more in-depth video about the capabilities of this new technique soon too. So make sure to subscribe and hit the bell icon to not miss out.

Video “OpenAI Sora: The Age Of AI Is Here!” was uploaded on 02/16/2024 to the YouTube channel Two Minute Papers.