Major AI NEWS#19 (GPT-6 News, Amazons New Q Model, Realtime AI Conversations and More)
The Major AI NEWS#19 video covers a wide range of AI developments from companies including OpenAI, Microsoft, Google, and Amazon. It discusses the release of the GPT-4 Vision API, which allows images to be narrated with GPT, showcasing its capabilities in unique ways such as narrating a person's life. Microsoft introduces Copilot, an alternative to custom GPTs, aimed at improving AI customization. Google DeepMind collaborates with YouTube on AI-generated music and experiments in music creation. Ethan Mollick shares a paper on using GPT-4 for self-driving cars, highlighting its potential in autonomous driving scenarios. Other developments include Inflection-2 surpassing Google's PaLM 2 among large language models and Amazon's release of its Q model, focused on knowledge-based chatbots. The video also mentions real-time AI tools from Krea AI, as well as Be My Eyes, which assists people with vision loss through AI-generated descriptions. Overall, the video showcases the rapid evolution and diverse applications of AI technology across sectors.
Watch the video by TheAIGRID
Video Transcript
So, with Sam Altman being fired and so much going on in the OpenAI saga, there is a lot of AI news you might have missed, because it wasn't just OpenAI making developments; tons of other companies were producing lots and lots of AI models and AI research. That's what we're going to show you in this week's video of what happened in AI, so without further ado, let's get started.

One of the first things I saw was a use case for GPT that I can't believe I missed, and I'm sure some of you did as well, because it was pretty crazy. OpenAI released GPT-4 with Vision, but they also released the GPT-4 Vision API, which means people can do some really cool things with it. Essentially, someone combined GPT-4 Vision with an ElevenLabs Python script so it can narrate what's going on, and this guy used a David Attenborough voice to narrate his entire life. If you've ever watched David Attenborough, I think you should check this out, and you'll understand how truly amazing it is. Here's a quick demo of what happens when David Attenborough narrates your life: "On the left, I'm going to start my camera, which takes a photo from my webcam every five seconds. On the right, I'm going to start David Attenborough. And now, as I move around... Here we have a remarkable specimen of Homo sapiens, distinguished by his silver circular spectacles and a mane of tousled curly locks. He is wearing what appears to be a blue fabric covering, which can only be assumed to be part of his mating display. Look closely at the subtle arch of his eyebrow; it's as if he's in the midst of an intricate ritual of curiosity or skepticism. The backdrop suggests a sheltered habitat, possibly a communal feeding area or watering hole. Ah, in its natural environment, we observe the sophisticated Homo sapiens engaging in the critical ritual of hydration. This male individual has selected a small cylindrical container, likely filled with life-sustaining H2O, and is tilting it expertly towards his intake orifice. Such grace, such poise."
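For anyone who wants to try this themselves, the whole pipeline is only a few dozen lines of Python. Below is a minimal sketch, not the demo author's actual script: it assumes OPENAI_API_KEY and ELEVEN_API_KEY are set in the environment, that your account can access the gpt-4-vision-preview model, and the ElevenLabs voice ID is a placeholder for whatever cloned voice you use.

```python
# Minimal sketch of a "narrate my webcam" loop: GPT-4 Vision describes a
# frame, ElevenLabs speaks the description. Illustrative only; the voice ID
# below is a placeholder, not the actual voice clone from the demo.
import base64
import os
import time

import cv2          # pip install opencv-python
import requests
from openai import OpenAI  # pip install openai (v1 SDK)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
VOICE_ID = "YOUR_ELEVENLABS_VOICE_ID"  # placeholder

def capture_frame_b64() -> str:
    """Grab one webcam frame and return it as a base64-encoded JPEG."""
    cam = cv2.VideoCapture(0)
    ok, frame = cam.read()
    cam.release()
    if not ok:
        raise RuntimeError("could not read from webcam")
    _, jpeg = cv2.imencode(".jpg", frame)
    return base64.b64encode(jpeg.tobytes()).decode()

def narrate(image_b64: str) -> str:
    """Ask GPT-4 Vision for a nature-documentary style description."""
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Narrate this image like a wildlife documentary."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

def speak(text: str) -> None:
    """Send text to the ElevenLabs text-to-speech endpoint and save the audio."""
    r = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": os.environ["ELEVEN_API_KEY"]},
        json={"text": text, "model_id": "eleven_monolingual_v1"},
    )
    r.raise_for_status()
    with open("narration.mp3", "wb") as f:
        f.write(r.content)  # play with any audio player

while True:
    speak(narrate(capture_frame_b64()))
    time.sleep(5)  # the demo snapshots the webcam every five seconds
```

Snapshotting every five seconds, as in the demo, keeps Vision API costs down while still feeling continuous.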
Then, of course, someone used the same approach to have their designs critiqued directly in Figma, and the critic in question was, of course, Steve Jobs. I think something like this is really cool and interesting, because imagine you're a software engineer or a product designer working on a product: you can take the voice of the most noted person in your industry, ask ChatGPT to talk in that person's style, and have them narrate suggested changes to you. Something that technologically advanced is a real application use case. It might not be great in terms of pricing, because GPT-4 Vision is quite expensive, but I still feel this shows how creative these models are, and why this is really cool: "The image shows a computer screen with a web-based application open. It appears to be an image gallery within a content management system or a digital asset management tool. The left side of the screen displays a navigation menu, and there's a search bar at the top, suggesting functionality for users to find specific images. The main panel on the right features a collection of vibrant, colorful artwork depicting various subjects such as portraits and pop-art-inspired designs. In assessing the design of this interface, there's a coherence in the color palette used within the artworks that gives the gallery a unified aesthetic. However, from a usability standpoint, it's unclear what the focal point is. The search functionality should be more prominently featured if it's an essential tool for interaction. Furthermore, the navigation on the left warrants a stronger visual hierarchy to help users intuitively move through categories or functions. Simplify to amplify: let the content breathe, and guide users with a clear path through your design."

Then, of course, Microsoft released its alternative to custom GPTs.
Essentially, you can build standalone Copilots, which are basically GPTs made with a no-code tool, and connect them to your Microsoft data sources, rather like how Google Bard has access to all your Gmail and so on; this is Microsoft's version of that. They released the short trailer you're seeing on screen now, and, like we said, this trend of getting more customized AI models onto our computers is definitely the direction we're heading in. It's going to make them more useful and far more effective, and this is just the beginning of what we're starting to see, because even the custom GPTs we had before weren't always accurate and didn't always follow our instructions. Microsoft actually did an entire event for Copilot, but this is essentially something you can use with Microsoft 365. So if you regularly use documents, PowerPoint, Hotmail, or any of Microsoft's products, this is going to be something you'll want to use, because, as you know, AI with your data is just insane. Trust me, I've used it (not this particular software, but AI with my own data) and it is far superior to just asking ChatGPT and having to input the data every single time.
Then, of course, we had something quite unexpected, although it's a development I knew was brewing: Google DeepMind combining its capabilities with YouTube. The announcement says: "Today we're sharing a sneak peek at our first set of AI-related music experiments, Dream Track for Shorts and Music AI tools, built in collaboration with Google DeepMind." Previously we covered some of Google's music work, and when I looked at Google's earlier models, they were actually really effective and honestly quite surprising. So it's clear that Google is still working on music AI, and that Google DeepMind has been working on this for quite some time. I'll try to show you a clip from the trailer, but I think this is going to change everything. The reason I say that is that YouTube Shorts is an inherently viral platform, so what do you think happens when people don't want to use copyrighted music and start using AI-generated music instead? Are we going to see a shift where the consistently viral music tracks are essentially AI-generated? DeepMind has a page about this where they talk about transforming the future of music creation, announcing their most advanced music generation model and two new AI experiments designed to open a new playground for creativity. From jazz to heavy metal, techno to opera, music is love, and today, in partnership with YouTube, they're announcing Lyria, Google DeepMind's most advanced AI music generation model to date, along with two AI experiments for creativity. The reason I believe this is really cool is not just that you get AI-generated tracks you can use; it's that people will be able to create soundtracks themselves. People can literally start humming a melody and immediately get music out of it. I also think it's good that they're testing this on YouTube Shorts, because we'll get to see how it actually works across that section of YouTube.
Now, something I find really interesting that others might not: they talk about watermarking AI-generated audio with SynthID. It says their team is also pioneering the responsible deployment of these technologies with best-in-class tools for watermarking and identifying synthetically generated content, so any content published by the Lyria model will be watermarked with SynthID, the same technology they use for images on Google Cloud's Vertex AI. Basically, for every soundtrack, it converts the audio to a spectrogram and embeds a digital watermark there, which means that if a piece of audio is AI-generated, you will be able to tell.
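DeepMind hasn't published how SynthID actually embeds its watermark, so the snippet below is only a toy illustration of the general idea the announcement describes (perturbing the spectrogram imperceptibly with a keyed pattern, then detecting that pattern statistically); it is emphatically not SynthID.

```python
# Toy illustration of spectrogram-domain audio watermarking. This is NOT
# SynthID; it only sketches the general idea: transform audio to a
# spectrogram, nudge it with a keyed pattern, transform back, and later
# detect the pattern by correlation.
import numpy as np
from scipy.signal import stft, istft

FS = 16_000       # sample rate (Hz)
NPERSEG = 1024    # STFT window size
ALPHA = 0.05      # watermark strength (exaggerated here so the demo is clear)

def keyed_pattern(shape: tuple, key: int) -> np.ndarray:
    """Pseudorandom +/-1 pattern derived from a secret key."""
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=shape)

def embed(audio: np.ndarray, key: int) -> np.ndarray:
    """Scale spectrogram magnitudes slightly along a keyed pattern."""
    _, _, Z = stft(audio, FS, nperseg=NPERSEG)
    Z_marked = Z * (1.0 + ALPHA * keyed_pattern(Z.shape, key))
    _, out = istft(Z_marked, FS, nperseg=NPERSEG)
    return out[: audio.shape[0]]  # trim so re-analysis sees the same shape

def detect(audio: np.ndarray, key: int) -> float:
    """Correlate log-magnitudes with the keyed pattern; high score = marked."""
    _, _, Z = stft(audio, FS, nperseg=NPERSEG)
    logmag = np.log(np.abs(Z) + 1e-9)
    return float(np.mean(logmag * keyed_pattern(Z.shape, key)))

# Quick demo on one second of noise standing in for music.
key = 1234
clean = np.random.default_rng(0).standard_normal(FS)
marked = embed(clean, key)
print("unmarked score:", detect(clean, key))   # near zero
print("marked score:  ", detect(marked, key))  # clearly positive (~ALPHA)
```

A real system also has to survive compression, resampling, and editing, which is where the hard research lives.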
Then we have something cool from Ethan Mollick: a really interesting paper about GPT-4 for self-driving. The reason this is cool is that the paper showcases using GPT-4's vision abilities to drive a car. Essentially, they gave it a prompt along the lines of "describe what you see and what you plan to drive next", together with a bunch of different scenarios, because they wanted to see exactly how GPT-4 with Vision would analyze these scenes and how it would score. Even at nighttime it does really well: you can see that it manages to analyze the scene and then predict exactly what it would do in the car.
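The paper's exact prompts and scoring rubric aren't shown in the video, but an evaluation harness of this kind might look roughly like the sketch below, reusing the same vision-API pattern as the narration example; the scenario filenames and prompt wording are purely illustrative.

```python
# Minimal sketch of a GPT-4V driving-scene evaluation loop. The scenario
# files and prompt wording are illustrative, not the paper's actual benchmark.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = ("You are driving the car. Describe what you see "
          "and what you plan to drive next.")

# Hypothetical dashcam frames covering different corner cases.
SCENARIOS = ["night_rain.jpg", "unprotected_left_turn.jpg", "jaywalker.jpg"]

for path in SCENARIOS:
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",
        max_tokens=300,
        messages=[{"role": "user", "content": [
            {"type": "text", "text": PROMPT},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]}],
    )
    # A human grader (or a rubric) would then score the scene description
    # and the proposed maneuver for each scenario.
    print(path, "->", resp.choices[0].message.content)
```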
Then it draws a conclusion: on the capabilities of autonomous driving, it says that GPT-4 exhibits capabilities with the potential to surpass those of existing autonomous driving systems in aspects such as scenario understanding, intent recognition, and driving decision-making in corner cases. GPT-4 leverages its advanced understanding to handle out-of-distribution scenarios and can accurately assess the intentions of surrounding traffic participants. I think that's exactly where it will excel, because GPT-4 is trained on millions and millions of images across millions of different scenarios, whereas conventional autonomous driving systems aren't trained in that broad a way; that's why GPT-4 with Vision excels in that area. It says, moreover, that it can infer the underlying motives behind these behaviors; as highlighted in section 4, they witnessed GPT-4 with Vision making continuous decisions on open roads, and it can even interpret the user interface of navigation apps and guide drivers in their decision-making, which is clearly superior to an AI that can literally just drive. Then, of course, there are some limitations, because every large language model and AI system currently available has them. It sometimes struggles to tell left from right, it sometimes struggles with traffic light recognition, and spatial reasoning is also an issue, because you need to be able to maneuver the car within a tight enough space. But overall, I think something like this is really interesting, because it showcases that with more multimodal capabilities we're getting systems that are increasingly capable in ways we genuinely didn't expect.
Then, of course, we have Anthropic releasing Claude 2.1, which offers an increased 200K-token context window and a twofold decrease in hallucination rates, plus system prompts, tool use, and updated pricing. One thing I found interesting about this model is that they essentially nerfed it. People might not agree with me, but I believe a large number of people do, because what you can see here is that Claude 2 is in the purple and Claude 2.1 is in the dark purple (I really wish they had used completely different colors, because that choice just doesn't make sense). Although 2.1 gets far fewer questions wrong, it also declines to answer many more questions, which hurts the usability of the model. I did some browsing on Reddit and found a really interesting post where people talk about how bad Claude has become. That said, we actually just dropped a complete guide on Claude 2.1 that contains absolutely everything you need to know if you want to use Claude 2: I looked around online and there weren't many deep-dive tutorials that give you everything, including every prompt you'll ever need, how to use it to code, how to use it to write stories, and how to use the API. We'll leave a link in the description, or you can find it on the channel as one of our recent uploads. However, with the 200K context window there is actually less hallucination in this model, so I'd say it's still pretty decent: it's definitely still usable, provided you know how to use it.
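If you want to poke at the features mentioned here yourself, this is a minimal sketch using Anthropic's Python SDK; it assumes an ANTHROPIC_API_KEY is set and that the claude-2.1 model name is still available to your account.

```python
# Minimal sketch of calling Claude 2.1 with a system prompt via Anthropic's
# Python SDK (pip install anthropic). Assumes ANTHROPIC_API_KEY is set and
# that your account can still access the claude-2.1 model name.
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-2.1",
    max_tokens=1024,
    # System prompts were one of the headline 2.1 features: steer tone and
    # role without putting instructions in the user turn.
    system="You are a careful editor. Answer only from the provided text.",
    messages=[{
        "role": "user",
        # The 200K context window is what lets you paste whole documents here.
        "content": "Summarize the key claims in the document below:\n\n...",
    }],
)
print(message.content[0].text)
```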
Then, of course, we have something called PlayHT 2.0 Turbo. This is something I've really wanted from AI for quite some time, and I think we might finally have it: real-time AI conversations are here. It's their conversational text-to-speech model with less than 300 milliseconds of latency, which is blazingly fast. The reason this is so cool is that not only can you clone any voice and accent, you also get something really fast, which means it has more applications. One of the problems with ElevenLabs, although it does very well in terms of the coherence of the speech, and we know just how good and accurate its voice cloning is, is that it takes quite a bit of time to get the audio back from the source. But imagine this behind an API: you would be able to have real-time conversations if you combined it with an LLM like ChatGPT. Here's a very quick demo: "Hello world." "Hi, how are you?" "Hi there, I'm calling in regards to the purchase you made last week." "Hello, Play support speaking. Please hold on a sec and let me just, um, pull up your details real quick. Can you tell me your account email or your phone number?" I can't wait to see what people do with this, because I believe this is really the next step. Latency was a big, big hurdle, and hopefully we get some kind of model like this from ElevenLabs too, with the latency reduced quite a bit, so it has a lot more applications.
Then, of course, we have something very interesting: you can see right here that OpenAI recently trademarked GPT-6 and GPT-7. I find that really interesting because, prior to this, they only trademarked their large language models just before release. You can see that GPT-4 was trademarked just before it came out, and they actually trademarked GPT-5 quite some time ago. But something happened recently (I don't know what's been going on at OpenAI; maybe it was the new Q* breakthrough) and they apparently decided they know enough about what's going into GPT-6 and GPT-7 to trademark them now. I think many people didn't pick up on this, but you can see it's official: the filings are by OpenAI OpCo with the US Patent and Trademark Office. This was filed recently, and I'm sure they'll get the trademarks. Not many people have mentioned it, but it's clear we're going to be getting GPT-6 and GPT-7 at some point. I'm not sure when, because we haven't even got GPT-5, but they trademarked that around four months ago, so clearly they knew what was going on with GPT-5 back then. That's definitely something I'm looking forward to.
Then, of course, we have Meta saying: "Today we're sharing two new advances in our generative AI research. These new models deliver exciting results in high-quality, diffusion-based text-to-video generation and controlled image editing with text instructions." The quality is really surprising, and the reason I like the first model so much is that the way it lets you edit the initial image is going to be game-changing. Many times we get an image from Midjourney or DALL·E 3 and we don't have the ability to change certain objects in it. You could do it with Photoshop's generative fill, but it just doesn't work as well, so a model like this could be really interesting if implemented into things like Photoshop or DALL·E 3. Then they have Emu Video: it's another text-to-video model, and it allows for diverse characters and a range of other things. I think this is really cool because it shows that text-to-video is moving pretty quickly. It was always one of those things where I thought text-to-image was good but text-to-video would never catch up because it's so hard; one thing I've realized is to never underestimate an AI's capability to do something you never thought it could do, because even though you might expect this technology to slow down, it doesn't seem to be slowing for now. These are also some examples of Emu Edit, where you can see them editing a cat to wear a pink jacket, and I want to say that the accuracy of this is really, really good; it's not the kind of accuracy you wouldn't use, it's accuracy I absolutely would use. So it's definitely really cool, and I'm pretty sure they use something like their Segment Anything Model, which essentially allows an AI system to accurately pick out every single object in an image and identify exactly what it is (a quick sketch of what that looks like in code follows below). It will be interesting to see how it's actually done, but you can see it does segment the spacesuit and then changes everything about it. That is really cool, and hopefully they release access to this.
So then we had something really cool from a company called Be My Eyes: they worked with Microsoft to build something that connects people who are blind or have low vision with sighted volunteers and companies. As you know, many people have vision loss, and some people don't have full vision. ChatGPT now has vision, and what does that mean? It means that vision can actually help you, through GPT-4. So they created this thing called Be My AI, which you can use to get detailed descriptions of images and ask follow-up questions of a dedicated AI assistant. Imagine you're walking through a grocery store and your vision is blurry, because that's just how your vision is. One thing you have to know is that some blind people aren't completely blind; blindness is a spectrum, from complete blindness through various levels, and depending on where you are on that scale, you're going to need some kind of vision assistant. This is something really cool: you can literally take a picture of something, whether it's close or far, and then chat with it; you could even use the voice feature and say, "look, what's going on here?", and it could easily tell you exactly what's happening. One thing many people don't realize about AI research is that it genuinely helps those who are disadvantaged, and it's really wholesome that the development of AI is contributing to this. Then, of course, we have Inflection-2, the next step up.
This is essentially the second most capable large language model in the world today, and the crazy part is that Inflection told us they were going to train larger and larger models on more and more data, with the eventual goal of training something far more powerful than GPT-4. You can see that it actually beats Google's PaLM 2: Inflection-1 is in the light green, Inflection-2 is in the dark green, and Inflection-2 surpasses PaLM 2 on many of the significant benchmarks. It's really interesting that companies are finally starting to ramp up their capabilities. Reading the table from left to right, Inflection-2 is in second place, Claude 2 is third, and PaLM 2 is fourth. It's really cool to see how these large language models are evolving on different benchmarks, and it will be interesting to see whether any company can catch up to GPT-4, because with every new large language model, we seem to get a step closer to GPT-4.
Then we had this tool from Krea AI. They talk about real-time AI being here, saying that with it, a new generation of AI creative tools is coming. I think this is really cool because it gives us a new way to edit images and a new creative way to create art pieces that we really didn't have before. You can essentially use a prompt and then use simple controls to fine-tune it, and I really like this because a lot of the time we don't get to control where certain items sit in an image; we don't get to control the background, the foreground, or the colors. I mean, yes, we do to an extent, but sometimes we have such a specific creative vision in our heads, and I believe a tool like this is going to let us really get our hands into that final creative area, where we can truly master and bring anything to life in high quality, and lower the barrier to entry for creating art, especially art as photorealistic as this.
Then there was something I found kind of interesting: Amazon released an AI model called Q. I don't know if this was a jab at OpenAI or what, because Amazon doesn't work with OpenAI; I believe they work with Anthropic and Claude. They called this model Q, and I'm not sure whether that was done because everyone's talking about Q* right now and they wanted to steal the limelight, but essentially this is their large language model. We don't actually know what's powering it; if I had to guess, I'd guess it's some tuned version of Claude, and people are speculating online about which model it is. It does say here that Amazon and Anthropic announced a strategic collaboration to advance generative AI: Anthropic has selected Amazon Web Services as its primary cloud provider and will train and deploy its future foundation models on AWS Trainium. So essentially Amazon is benefiting from Claude and Claude is benefiting from Amazon, which is why I'd say Claude is likely what's behind the chatbot, even though many people have forgotten about it. Some early experiments by Ethan Mollick suggest that Q feels like a GPT-3.5-class model with lots of guardrails that is very narrowly focused on a knowledge base. He ran a bunch of small tests on Q and found it isn't that good compared to GPT-4, but then again, this is something built for business, so I'm guessing it targets a specific use case rather than being a general knowledge chatbot. Likely we're going to see these chatbots roll out into the Amazon shopping experience; I don't know when, but definitely very soon.
Video "Major AI NEWS#19 (GPT-6 News, Amazons New Q Model, Realtime AI Conversations and More)" was uploaded on 11/29/2023 to the YouTube channel TheAIGRID.