Sora AI’s Problems [And Solutions]
OpenAI’s latest AI tool, Sora, has caused quite a buzz in the tech and creative industries. With the ability to generate photorealistic videos from just a few sentences, Sora has opened up a whole new world of possibilities. But with this exciting new technology comes a set of challenges and limitations that need to be addressed.
One of the main limitations of Sora is the need for huge compute power to generate the videos. This means that currently, only those with access to high-end computing resources can take advantage of Sora’s capabilities. Additionally, the system still has some issues with logical concepts and cause-and-effect relationships, leading to some unrealistic outputs.
Furthermore, as Sora becomes more democratized and accessible to the general public, there are concerns about the potential misuse of the technology. The ability to create realistic videos from scratch raises questions about the authenticity of video evidence and the spread of misinformation and fake news.
Despite these challenges, Sora also presents exciting opportunities for creatives and storytellers. The tool makes it easier for individuals to bring their ideas to life and tell compelling stories through video. However, there is a fear that the abundance of AI-generated content may lead to what some are calling “AI fatigue,” where the value of true creative work is diminished in the eyes of the audience.
In conclusion, while Sora AI offers incredible potential, it also brings to light a range of issues that must be addressed. As we move forward with this technology, it is crucial to find solutions that ensure its responsible and ethical use in society.
Watch the video by ColdFusion
Video Transcript
Hi, welcome to another episode of ColdFusion. This is a Reddit thread from 3 years ago discussing AI imagery. The top user says, “Imagine in a few years when we can make photorealistic videos from just a few sentences. AI is crazy.” He gets downvoted, and the reply comment laughs at him, saying that it’s not going to happen in our lifetime; our great-grandkids might have such technology. Well, 3 years later, and it’s here. It is a beautiful drone shot, the kind of video that you might see in a travel video, right? Except it’s not real. There is no drone, there is no camera, you can’t travel, because the video was generated by AI. It’s from a new tool just announced a few hours ago by OpenAI called Sora. All it takes is typing in a short text, a prompt, and in minutes it spits out a 60-second video clip of pretty much anything you can imagine.
Over the past few days, you’ve probably all heard of and seen Sora, a new tool by OpenAI that turns text into photorealistic video. It’s not perfect, but it’s a large step up from what we’ve seen before. But what most people don’t know is that Sora can do more than just create videos from scratch. It can combine separate videos into one scene, animate still images, modify non-AI videos seamlessly depending on the user prompt, and much more, which we’ll get into later. We’re going to split this video into two parts. The first is what Sora can do, how Google accidentally made this possible, and Sora’s limitations. Part two will be on the implications for society and some solutions to the problems that may arise from this. In this episode, let’s explore all of that. You are watching ColdFusion TV.
So first up, what can OpenAI’s newest model do? I’ll show some examples, including some newer ones that have just been released by those with early access. Note that all the cutscenes, camera angles, and movement are all, quote unquote, creative choices of the AI, if you want to call it that. Videos can be up to a minute long and in 1080p resolution. OK, so cool, it makes videos. But to understand the context here, as Marques Brownlee pointed out, this is a viral clip of where text-to-video AI was a year ago. But even the state of the art now is nowhere near close. I tested the same prompts on Runway ML, and here are the results. The difference with Sora is it’s coherent. Previous video AI systems have a characteristic morphing quality as the video progresses. With Sora, that’s vastly reduced or gone altogether. Objects remain stable even when obscured by things in the foreground. It’s a much more robust system. But not only this:
Sora can animate images, such as cartoons or this Shiba Inu. We’ve seen stuff similar to this in research since 2019, but what is new is the ability to combine two videos together in one scene. Let’s take a look at that. It can also simultaneously make up different camera angles of a single scene with just one prompt.
OK, so how did they do it? Well, I’m not going to spend much time here, but basically Sora is based off similar tech to OpenAI’s DALL-E 3. Google’s involvement in how Sora was built is kind of interesting. Back in 2017, Google invented something called the Transformer architecture, and they published their findings on it. You don’t need to know exactly what that means, but a Transformer is basically something that makes AI better at generating text. OpenAI would build on Google’s tech to create their own text models. The eventual result was ChatGPT. We’ve done a previous episode on this. But later on, Google noticed something strange. They modified the Transformer not just to find patterns in text, but patterns in videos too, and it worked really well. OpenAI saw that, said thank you very much, and ran with the idea. After some tinkering, they would release Sora.
So how was Sora trained? There’s no public info on the training data, but OpenAI did partner with Shutterstock last year, so there’s a wealth of copyright-free data for their AI to chew on, and that might be a clue. So this is cool and all, but before we get ahead of ourselves, what are some limitations?
While these videos look good, aside from the cherry-picked examples and a handful of selected public users, we can’t get a full grasp of how robust the system is. And I do want to stress that point: we don’t know the full picture of its capabilities. Although I have to say, even the failures look cool. For example, this man on a treadmill: it looks like something I’d see a VFX artist in 2012 post as part of his demo. Its creators say Sora has trouble distinguishing between left and right and also struggles with some logical concepts and cause-and-effect relationships. For example, this chair morphing from the sand and subsequently floating. It’s a complete failure, but it is still cool in a surrealist way, in my opinion. Another limitation is that, for now, generating such videos requires huge compute power.
So Sora isn’t perfect now, but if we extrapolate a couple of years and the failure rate of unrealistic outputs drops, what then? What happens when this technology becomes democratized beyond the boundaries of just OpenAI? You’ve all probably thought of some implications of this. One is the reduced need for stock footage. But of course, an obvious thing people love to gravitate towards is misinformation and fake news: people using AI to create events that never happened. We’ve already seen this with AI images when they were brand new, but if it’s now video, will there be issues with law enforcement? Forensic video experts may face challenges in distinguishing between genuine and fabricated or modified video evidence. Criminals may also deny video evidence; they could claim that the implicating footage was AI-generated. These issues require the development of new standards for verifying video authenticity. I’ll expand on this in a second.
But on the positive side, tools like Sora make it easier for creatives to tell stories. Videographers might be sweating, because it gives a similar capacity to those who have never picked up a camera. That being said, it’s not as cut and dry as videographers disappearing overnight. There’ll always be a need for them in certain situations, like if you’re filming a particular event or people. But the future could turn out something like this: the higher tier of videographers that do custom work will remain, but the lowest rung, those who take out their cameras just to film something for stock footage purposes or things of that nature, will start to see their work be impacted. Again, it’s not now, but we can see the trajectory in a couple of years. For example, although hard data is scarce, anecdotal evidence suggests tools like Midjourney are already having an impact on those in the graphic arts industry.
Tools like Sora could have a strange effect on human psychology. It’s an effect that we came up with on the ColdFusion podcast last year. Other people have probably noticed it too, but we called it AI fatigue. It’s the concept of AI being able to produce stunning imagery in such volume that it lowers the specialty or visual value of true creative work. For example, on social media you could see a crazy video that would have made our jaws drop just a few years ago, but now you just think, “Meh.” The reason being, you’ve been overexposed to it. Every visual medium, anything you can imagine, can be done easily with AI now. This is already happening. There are people on X posting a 2011 Bollywood movie and calling it a Sora video. It’s going viral and had some people confused. Love or hate the movie, these scenes were the hard work of a production team and crew, but now people just think it was a guy on a computer that typed a few sentences. We’re just not that impressed anymore.
Beyond this, collectively, human perception of what we think is real will be altered. People aren’t going to believe anything they see. If someone does a creative athletic feat or an amazing 3D animation, the number one thought people could have is, “Isn’t that just AI?” Imagine doing all of that hard work for people just to think, once again, that you just sat at a computer and typed in some prompts. If you’re a creative specializing in unique visual content, how do you feel about such a future world? I’d like to hear from you in the comments. A lot of people think that I’m an AI and that ColdFusion videos are written by an AI, and even though that’s not true, I kind of know what it feels like.
Let’s take a look at 2 to 5 years in the future, when this tech will become trivial. Along with AI fatigue comes the further erosion of trust, for example in journalism and media production. While a tool like Sora can enable faster and cheaper video creation, it may also challenge traditional notions of authenticity and trust in media. A simple website or app will be able to generate photorealistic videos for you. When it’s democratized, there’s going to be a lot of people that use it for nefarious reasons, and just like deepfakes before it, there’s a potential for chaos. Only the law and defamation lawsuits could be a deterrent. And don’t even get me started on scammers. They’re going to have a field day with this. They could create adverts for products that don’t exist, investment opportunities for things that don’t exist. YouTube could be filled with AI-generated trash. So what can be done about this future scenario? A watermark isn’t going to be enough, as that can just be cropped out.
Wouldn’t it just be great if we could render AI videos that contain some kind of digital marker that tells us that it’s AI-generated? The good news is that this exists. Well, kind of. So what is this digital marker? Well, in February of 2021, the BBC, Microsoft, Adobe, and a few other companies got together and realized, “Hey, we might have a little problem with generative AI and misinformation on the horizon.” Their solution was the C2PA standard, a technical marker that embeds metadata into media and is used for verifying its origin. The C2PA standard is also being adopted by camera manufacturers, news stations, and of course OpenAI and Sora. The metadata also can’t be edited without anyone else knowing. However, there’s a problem: something as simple as taking a screenshot and resaving the image can destroy that metadata. It’s a tricky one, but of all the companies, TikTok looks like they might be on to something.
So TikTok issues warnings to viewers when a video might be AI-generated. But how does it know? When a user uploads an AI video, they have the option to tag it. Now, normally this wouldn’t work very well and you’d miss a whole bunch, but here’s where the clever part comes in. The hope is that there’s enough honest labels on these AI videos that you can train another AI to learn the patterns of what makes an AI video look like an AI video, and it’s eventually going to do that better than any human could. If it’s actually going to work or not is another question, but regardless, I think we need some kind of automatic AI detection to be built into all social media platforms. This is simply so AI-generated videos don’t even get the chance to start to spread. It’s not perfect, but it’s something.
So in all of this, there’s one question remaining: what does Will Smith eating spaghetti look like now, using Sora? Let’s see. Honestly, pretty good. But there’s actually a catch: that’s the real Will Smith at the bottom. It’s not a Sora-generated video. Will Smith himself posted this meme to play along with the trend of his viral pasta monstrosity resurfacing. A lot of people were fooled. Ladies and gentlemen, hold on to your seat belts. Things are getting weird.
So overall, we shouldn’t be too negative. At the end of the day, this is an amazing tool with countless applications. It can streamline the process for animators, visual effects artists, and videographers. Those who utilize this tool well will succeed, just like Photoshop versus an analog camera, or CGI overtaking cel animation, and digital audio workstations enveloping recording studios. In 5 years, will we see an AI-generated feature-length film with just a couple of people and zero budget? In this new world, real movies will still have their place, as an actor’s performance combined with a great director and brilliant cinematographer will all come together to make things that people still want to see. Perhaps the bar will just be raised. Only time will tell.
At the time of writing, Sora is not publicly available, and it’s only accessible to a small group of researchers and creative professionals for feedback and testing. But this containment isn’t going to last. To avoid a world where we have no idea what’s real or not, we need to start working on robust detection systems built into the very platforms where these videos spread. I want to leave you by looping back to the start of this episode, with the two Reddit comments discussing AI from 3 years ago. This is a great example of how humans fail to grasp exponential progress.
Oh, and by the way, Sam Altman wants to dominate the entire supply chain of AI. He’s asking investors to cough up seven trillion dollars to make his own AI chips, and as mentioned in a previous episode, Sam wants to build an AI phone with Jony Ive of Apple fame. He seems like he’s out of control, or in complete control. OpenAI was supposed to be open-source AI for the benefit of all. Now they’re closed off and only for profit. Who is this Sam guy anyway, and what does he want? Well, today is your lucky day. I’ve done a full episode on his story, so I’ll leave that in the link below. If you want to see the ColdFusion podcast episode on Sora, I’ll leave a link for that below.
Anyway, that’s about it from me. I do want to just give a quick shout-out to those who came to see me at the Everything Electric show in Sydney. It was really, really cool to meet some of you guys and just have some general conversations. It really made me see that these videos that I do make do have an impact, so thanks for that, and thanks to all of you who are watching, commenting, and supporting. It helps keep me going. So with that, my name is Dagogo, and you’ve been watching ColdFusion, and I’ll catch you again soon for the next episode. Cheers, guys, have a good one. ColdFusion, it’s new thinking.
Video “Sora AI’s Problems [And Solutions]” was uploaded on 02/20/2024 to the YouTube channel ColdFusion.