GPT 4 Level Open Source in 2024..(Llama 3 Leaks and Mistral 2.0)
In the world of open-source AI, the race to catch up to GPT-4 is on. With recent announcements from notable CEOs and companies, 2024 could be the year we see a breakthrough. Mistral, a rising AI startup, has declared that it will release an open-source GPT-4-level model in 2024. Mistral’s efficient and powerful models, such as Mixtral, have been performing exceptionally well in benchmarks, exceeding expectations despite their size and resource limitations. Mistral’s approach to transparency and ethical AI practices sets them apart from larger competitors like OpenAI. With only 22 employees, Mistral has managed to disrupt the industry and gain unicorn status. Their recent funding round of €385 million is a testament to their potential to challenge the established players in the AI field. With Mistral’s innovative approach and dedication to democratizing access to advanced generative technology, the future of AI in 2024 looks promising.
Watch the video by TheAIGRID
Video Transcript
So open-source AI is fast approaching the level of GPT-4, but will we be able to get that in 2024? With some recent announcements from notable CEOs and many different companies, I think it’s safe to say that 2024 might just be that year. Funnily enough, Sam Altman recently said in an interview that it’s actually pretty much impossible to catch up to GPT-4, but that it’s developers’ job to try, so let’s take a look at what people are doing and exactly how they’re catching up to GPT-4. The main things we know about open source: our favorite models, and some of the most popular ones, include Llama, the family of large language models released by Meta AI that are open source, but the one that steals the show is of course Mistral. Arthur Mensch, the CEO of Mistral, declared on French radio that Mistral will release an open-source GPT-4-level model in 2024. Now, some of you might be thinking: wait a minute, I haven’t heard of this guy, it’s not Meta’s Llama, it’s not OpenAI, it’s not Google, who is this team? For those of you who are eagle-eyed and do pay attention to the AI space, you’re going to know exactly who this is. Mistral is a French AI startup that specializes in compute-efficient, powerful, and useful AI models. The company focuses on challenging problems to make AI models more efficient, helpful, and trustworthy, and Mistral is known for its strong research orientation and for providing open models, which means they offer transparent access to their model weights, allowing for full customization by their users. The company’s products include generative AI platforms and models for generation and embeddings, and one of their notable models is Mixtral, which is reported to be six times faster than comparable models while matching or outperforming Llama 2 70B on most benchmarks. Mixtral supports multiple languages, has natural coding abilities, and can handle sequences up to 32,000 tokens in length. Mistral AI provides access to its models through an API, or allows users to deploy the models themselves under an Apache 2.0 license, and of course their first LLM, Mistral 7B, is available for free download and use. Despite being free, it’s not open source in the traditional sense, as the training datasets remain private, and the company’s business model seems to revolve around providing a highly permissive license for their models while maintaining private development and funding.
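As an aside, here is a minimal sketch of what calling one of Mistral’s hosted models over their API might look like. The endpoint path, model name, and response shape below are assumptions based on their OpenAI-style chat API, not details taken from the video:

```python
import os
import requests

# Hypothetical minimal call to Mistral's hosted chat API; the endpoint,
# model name, and response shape are assumptions, not from the video.
API_URL = "https://api.mistral.ai/v1/chat/completions"
API_KEY = os.environ["MISTRAL_API_KEY"]  # assumed environment variable

payload = {
    "model": "mistral-tiny",  # assumed name of the hosted Mistral 7B tier
    "messages": [
        {"role": "user", "content": "In one sentence, what is the Apache 2.0 license?"}
    ],
}

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```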
Essentially, this company, Mistral AI, has positioned itself as a European alternative to larger AI companies like OpenAI, with a civic-minded ethos and a focus on ethical AI practices; they aim to democratize access to advanced generative technology and mitigate societal risks from AI. Overall, what we have here is a game-changing emerging player in the AI field, challenging larger companies by offering transparent, efficient, and powerful AI models and services, with a particular emphasis on ethical practices and community engagement. One of the key things about Mistral is that their team is 22 employees, which is incredibly small considering the amount they’ve done in the AI space. You can see here it says that despite its rise to unicorn status, Mistral AI remains a relatively small company with just 22 employees, including co-founder and CEO Arthur Mensch. The rest of their names I honestly can’t pronounce, but they do have experience at Meta and Google DeepMind, so it’s definitely an efficient and comprehensive team of accomplished AI engineers and researchers working there who have been able to disrupt the entire industry. And if we compare that to the behemoth that is OpenAI, with around 770 employees, it’s no wonder people are surprised that Mistral is able to catch up to what that company is doing. One thing that is interesting, though, is whether OpenAI will release some of the stuff that they’ve been working on this year; that is something we’ll have to see.
Now, in terms of actually comparing the models on certain benchmarks, one of the benchmarks that many people have been looking at, and where Mistral has been exceeding expectations, is the Arena Elo. If you don’t know what the Arena Elo is, I’m going to break it down for you. You know how you log on to ChatGPT and have a general conversation? The Arena Elo is a bit different: you log on to a similar system, except every time you put in a message you get two responses, and all you have to do is rate which of the two responses you think is better. Whichever one is rated better, the Elo of that AI system essentially goes up a bit. That’s just a very simple explanation of how this entire Elo leaderboard is put together, and you can see the vote counts on the side.
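For the curious, the math behind such a leaderboard is simple: after each head-to-head vote, the winner takes rating points from the loser in proportion to how surprising the result was. Here is a minimal sketch, assuming the classic Elo update rule; the ratings and K-factor are illustrative, and the actual Chatbot Arena computation differs in its details:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise vote."""
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_won else 0.0
    rating_a += k * (s_a - e_a)
    rating_b += k * ((1.0 - s_a) - (1.0 - e_a))
    return rating_a, rating_b

# Example: an upset (the lower-rated model wins) moves ratings more.
print(elo_update(1200.0, 1300.0, a_won=True))  # approximately (1220.5, 1279.5)
```

The key property is that thousands of small pairwise votes from real users converge toward a stable ranking, which is why this kind of leaderboard is hard to game.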
Now, of course, you can see GPT-4 Turbo and GPT-4 taking the top three spots, but interestingly enough, above Claude’s models, above Google’s Gemini Pro, and above GPT-3.5 Turbo, we can see Mistral Medium coming in at number four, which is honestly rather shocking considering this small AI team of only 22 employees has managed to create a model that is pretty much on the level of, or above, some of these other large language model systems. And not only that: their Mixtral 8x7B Instruct v0.1 is also above Gemini Pro, Claude 2.1 (which was recently released), and GPT-3.5 Turbo. All of those things combined give us the sense that Mixtral and Mistral are pretty comprehensive models. What’s absolutely crazy about this is that these models are extraordinarily smaller than some of their competition, which means what we have here is a very innovative, fast-moving company that is able to deploy models, run them efficiently, make changes, open-source them, and really disrupt the entire industry in terms of what we thought was normal. And if we look at the organizations on this leaderboard, a lot of them are ones we already know: OpenAI, Anthropic, and Google, and of course Mistral is right there above some of them.
Now, of course, you might be thinking: wait a minute, these aren’t really objective benchmarks, they’re just subjectively rated by users. But I think that is a very important benchmark, because one issue that has come up recently is that people sometimes quickly fine-tune models on evals just to beat the high score, and although that technically does beat the eval score, it isn’t the best way to assess models: in the end it’s real people who are going to be interacting with them, not fixed test sets, so I do think leaderboards like this are definitely very important. Now, the CEO did recently talk about raising €385 million, and this is crazy, because €385 million at their recent funding round is a huge amount; that’s going to pay for training models, for more GPUs, for more server costs, and it goes to show that Mistral is going to be one of those companies that could really disrupt OpenAI’s position. There are some other things I also want to discuss, but first take a look at this video clip, which I actually translated with ElevenLabs: “In just a few months we created models that…”
So, one of the additional things that really puts this to the test is the cost-effectiveness of Mistral Medium. By comparison, we can see that Mistral AI’s Medium model is nearly as good as GPT-4. Of course, GPT-4 is exceedingly good, but if Mistral Medium is nearly as good as GPT-4 at a fraction of GPT-4’s cost, that is going to really disrupt the industry, because one of the key things holding GPT-4 back is the rate limits on how much we can use it. Even in the normal chat user interface, not just on the API, we have the standard limits of only around 30 messages every 3 hours, and I remember when GPT-4 was first released it was around 25 messages every 4 hours or so; that’s not much of an improvement given how long GPT-4 has existed. It’s pretty surprising, based on how long we’ve had the model, that the price hasn’t come down yet, so maybe there are some inefficiencies on GPT-4’s end that OpenAI simply hasn’t solved. If you recently watched the interview with Bill Gates, Sam Altman actually did talk about getting the cost-effectiveness of this down, because essentially they need to if it’s to become scalable for applications and use cases, and if people actually want to use it on a day-to-day basis for many different things. If you’re using an AI system and you love it, and then every 3 hours you max out your messages, it’s not going to be that great, because you’re going to have to keep switching models; it’s much easier to use something that you can use cost-effectively and constantly. So this shows us that costing literally a couple of cents, a fraction of GPT-4’s price, is going to pose really big problems for OpenAI if they don’t get that cost down and if Mistral manages to keep encroaching on their lead. Now, what was also interesting was their Mixtral 8x7B model.
So this was a cutting-edge AI model, and what was crazy about it, and why it took the industry by storm, is its architecture. Think of it like a highly specialized team of experts, where each member is really good at handling specific types of problems. In conventional AI models like GPT, every part of the model handles everything equally, but in Mixtral it’s like having different specialists for different tasks: imagine you have a team and each person is an expert in a different field. That’s what Mixtral does with its tasks. It has a special system, the router, that decides which expert should handle each piece of information, and the model is unique because it can select from eight different groups of these experts for each bit of information it processes. This selection is sparse, meaning it only chooses a few experts for each task, making it more efficient overall. This AI system is pretty impressive: it has a 32,000-token context length, it handles multilingual text, and because of how it’s designed it’s great for tasks that require quick thinking, that is, fast-inference-related tasks, and also for helping to find information from large databases, which is RAG, or retrieval-augmented generation. It’s also customizable, meaning it can be trained for specific tasks or industries. The reason I brought this up is that the recent benchmarks on Mistral Medium have been absolutely crazy, but also that the same architecture we just talked about was apparently used in GPT-4.
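To make the routing idea concrete, here is a toy sketch of a sparse mixture-of-experts layer, assuming a top-2-of-8 gate like the one Mixtral is described as using. The shapes, the tiny linear “experts”, and the softmax-over-selected-logits detail are all illustrative, not Mixtral’s actual implementation:

```python
import torch
import torch.nn.functional as F

def moe_layer(x, gate_w, experts, top_k=2):
    """Sparse mixture-of-experts layer for a batch of token vectors.

    x:       (tokens, d_model) token representations
    gate_w:  (d_model, n_experts) router weights
    experts: list of n_experts small feed-forward modules
    Only the top_k experts chosen by the router run for each token.
    """
    logits = x @ gate_w                       # (tokens, n_experts) router scores
    weights, idx = torch.topk(logits, top_k, dim=-1)
    weights = F.softmax(weights, dim=-1)      # renormalize over the chosen experts
    out = torch.zeros_like(x)
    for slot in range(top_k):
        for e, expert in enumerate(experts):
            mask = idx[:, slot] == e          # tokens routed to expert e in this slot
            if mask.any():
                out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
    return out

# Toy usage: 8 experts, each a small linear map standing in for an FFN block.
d, n = 16, 8
experts = [torch.nn.Linear(d, d) for _ in range(n)]
x = torch.randn(4, d)
y = moe_layer(x, torch.randn(d, n), experts)
print(y.shape)  # torch.Size([4, 16])
```

The efficiency win is that although all eight experts exist in memory, each token only pays the compute cost of two of them.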
So, in this interview, George Hotz actually discusses the GPT-4 architecture, and the reason I found it so fascinating is that it wasn’t just George Hotz who pretty much confirmed that this is how GPT-4 works, although we don’t have an official statement from OpenAI. It’s fascinating because it goes to show that the cat might now be out of the bag, and if this is how GPT-4 has managed to be so effective, so efficient, and able to beat a lot of other AI systems in terms of benchmarks, then it means these other AI companies could now realize this and, of course, train their models in the same way. Here’s the quote:
“We could build, like, the biggest training clusters today… I know less about how GPT-4 was trained; I know some rough numbers on the weights and stuff. [A trillion parameters?] Well, okay, so GPT-4 is 220 billion in each head, and then it’s an eight-way mixture model. Mixture models are what you do when you’re out of ideas. So, you know, it’s a mixture model; they just train the same model eight times, and they have some little trick, they actually do 16 inferences. But no, it’s not like… the multimodality is just a vision model kind of glommed on. I mean, the multimodality is, like, obvious what it is too: you just put the vision model in the same token space as your language model. Oh, did people think it was something else? No, no, the mixture has nothing to do with the vision or language aspect of it. It just has to do with: well, okay, we can’t really make models bigger than 220 billion parameters, and we want it to be better. Well, how can we make it better? Well, we can train it longer, and okay, we’ve actually already maxed that out, we’re getting diminishing returns there. Okay, mixture of experts, yeah, we’ll train eight of them. Right? So, you know, the real truth is, whenever a company is secretive, with the exception of Apple, it’s because they’re hiding something that’s not that cool. And people have this wrong idea over and over again: they think they’re hiding it because it’s really cool, it must be amazing, it’s a trillion parameters. No, it’s a little bigger than GPT-3, and they did an eight-way mixture of experts. Like, all right, dude, anyone can spend eight times the money and get that. But yeah, coming back to what I think is actually going to happen: yeah, people are going to train smaller models for longer and fine-tune them and find all these tricks. OpenAI used to publish stuff about how much better the training has gotten, holding compute constant, and it’s gotten a lot better.” Now, beyond that, other people also did confirm that GPT-4 is a mixture of experts. The co-founder of PyTorch at Meta reaffirmed that leak, saying: “I might have heard the same. I guess info like this is passed around but nobody wants to say it out loud. GPT-4: 8 x 220B experts trained with different data/task distributions and 16-iter inference. Glad that Geohot said it out loud.”
Then, of course, we additionally have this part of the article, where it asks: what do all the tweets mean? Essentially, GPT-4 is not one large model but a union of smaller models sharing their expertise, and each of these models is rumored to be 220 billion parameters. What’s crazy is that, like I said, if that architecture is out there and other companies are going to be using it, it means we could eventually get systems that are on the level of GPT-4. Additionally, there are some other things that show us this is going to be the case.
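To put the rumored numbers together, here is a quick back-of-the-envelope check. Every figure comes from the leak quoted above, none of it is confirmed by OpenAI, and the naive total ignores any parameters the experts might share:

```python
# All figures are from the rumor quoted above; OpenAI has confirmed none of them.
params_per_expert = 220e9   # "220 billion in each head"
n_experts = 8               # "an eight-way mixture model"
gpt3_params = 175e9         # GPT-3, for Hotz's "a little bigger than GPT-3"

naive_total = params_per_expert * n_experts
print(f"naive total: {naive_total / 1e12:.2f}T params")                # ~1.76T
print(f"one expert vs GPT-3: {params_per_expert / gpt3_params:.2f}x")  # ~1.26x
```

So each expert really would be only “a little bigger than GPT-3”, while the naive total lands near the trillion-plus figures that circulate, which is consistent with Hotz’s framing.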
Crazily enough, as I was researching and making this video, you could see that just literally a couple of hours ago, “Hermes 2 beats Mixtral Instruct (mixture of experts) and becomes the new best open-source model.” Open-source AI continues to make strides on a daily basis; this latest release from the Nous Research team beats the previous best open-source model. That is pretty incredible, and it goes to show that every single day there are strides being made across every company, which leads us to greater and greater models. Elon Musk also made a comment saying we’ll have “GPT-4 level AI on a laptop before too long,” pretty much stating that it won’t be long before we get these GPT-4 levels of AI running locally on our laptops, and running models locally is something people have been doing for quite some time now.
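For context, running open models locally is already routine. Here is a minimal sketch using the llama-cpp-python bindings with a quantized GGUF checkpoint; the file path is a placeholder, and you would first download a real quantized model, for example a 4-bit Mistral 7B build, which fits in typical laptop RAM:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Placeholder path: substitute any quantized GGUF checkpoint you have locally.
llm = Llama(model_path="./mistral-7b-instruct.Q4_K_M.gguf", n_ctx=4096)

out = llm(
    "Q: Why do 4-bit quantized models run well on laptops? A:",
    max_tokens=128,
    stop=["Q:"],  # stop before the model invents another question
)
print(out["choices"][0]["text"])
```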
Then, of course, we also got this tweet, and this one actually takes a different look at things, because, as you know, there are always both sides of any argument to consider, and this person brings up some pointed arguments about open-source models versus GPT-4. Bear in mind that this video is in depth, so please do take a look at everything before you take one kind of stance. Essentially, he says that if you think open-source models will beat GPT-4 this year, you’re completely wrong: “I worked at top AI research labs and built open-source libraries with more than 5 million monthly downloads. GPT-4 is 1 year old and so far no model matches it, and here’s why. Number one is the talent: OpenAI recruited top AI engineers with salaries above $1 million. Number two is data: massive proprietary data and human-annotated datasets. Number three is team structure: in-person, centralized teams work better than decentralized open-source teams. Number four is model versus product: GPT-4 is not just a model, it’s a product; you can’t beat it with a better model. Number five is infrastructure: public cloud infrastructure is terrible compared to what Google DeepMind/OpenAI has, and it’s very hard for open-source teams to iterate at the same speed.” When we actually take a look at his points, he does make some very good ones.
I mean, the one thing he does talk about is the talent, and that is really true. One thing that isn’t talked about enough is that although some of these AI labs are able to create really comprehensive AI systems that are effective and on par with the top labs, the talent and the genius at OpenAI is simply incredible. I mean, the amount of persuading it took from Elon Musk and Sam Altman to get Ilya Sutskever to go from Google to OpenAI, that push and pull, was absolutely incredible, and I know that for top talent, the salaries and compensation on offer are simply very competitive out there. So although these open AI systems are good, I don’t think some of these independent labs can compare on talent, but that doesn’t mean they aren’t able to still get it done, because like I said, some of their employees have actually been at some of the other top labs too, which goes to show that it is still possible for it to be done. Now, another thing as well is the setup: the fourth point he makes here is that GPT-4 is not just a model, it’s a product, and you can’t beat it with a better model.
I think that is pretty true. For Mistral, or any of their new models, or any other company’s models like Llama (and we’re going to talk about that later) to win, you also need to make sure the product actually works. One thing that was good about the way OpenAI went about ChatGPT is that maybe previously, when they were an open-source nonprofit, they could have been beaten, but now that they’re a company operating for profit, and now closed source, I do think it’s going to be pretty hard to beat them, because they have a lot of product-focused people working there who make the model much more usable and much more user-friendly, and that entire distribution allows it to be much more effective in terms of adoption, which is something the other AI systems just don’t currently have. I think that is a key point: even if the others do beat the benchmarks, they need effective distribution too, and being able to distribute the product effectively is something they may lack. So I think this is definitely an important tweet, because it goes to show that while certain models might beat GPT-4 on certain benchmarks, the adoption curve might not be there for those models as well.
Then, of course, we have Llama 3, and this was essentially an overheard conversation. According to a first rumor, Llama 3 will be able to compete with GPT-4 but will still be freely available under the Llama license. This was overheard by OpenAI engineer Jason Wei, formerly of Google Brain, at a generative AI group social event organized by Meta. Wei says he picked up on a conversation that Meta now has enough computing power to train Llama 3 and 4. Llama 3 is planned to reach the performance level of GPT-4 but will remain freely available, and that is going to be pretty incredible, because the ramifications of an AI system operating on the level of GPT-4 while being open source are going to be pretty crazy, and I can only hope there aren’t bad actors out there who will abuse that. Of course, we additionally do know that jumping from Llama 2 to Llama 3 may therefore be more challenging than simply scaling through more training, and may take longer than moving from Llama 1 to Llama 2.
Because if GPT-4 is a mixture-of-experts architecture, then it’s likely that these other open-source teams are going to be moving in that direction. What’s also interesting is that Meta recently released Code Llama, which is based on Llama 2, and through fine-tuning it achieves GPT-3.5 and GPT-4 level results, depending on the type of measurement, on the HumanEval coding benchmark.
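For a sense of what HumanEval actually measures: each problem is a function signature plus a docstring, the model generates the body, and the completion only counts as solved if hidden unit tests pass. A simplified illustration with a made-up problem follows; the real harness also samples multiple completions per problem and reports pass@k:

```python
# Simplified illustration of a HumanEval-style check, not the real harness.
problem = '''
def running_max(xs):
    """Return a list where element i is the max of xs[:i+1]."""
'''

model_completion = '''
    out, best = [], float("-inf")
    for x in xs:
        best = max(best, x)
        out.append(best)
    return out
'''

namespace = {}
exec(problem + model_completion, namespace)   # build the candidate function

# The completion "passes" only if the hidden tests all hold.
assert namespace["running_max"]([3, 1, 4, 1, 5]) == [3, 3, 4, 4, 5]
assert namespace["running_max"]([-2, -5]) == [-2, -2]
print("pass")
```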
What’s even crazier is that the Financial Times reported in mid-July that the main goal of Meta’s Llama models is to break OpenAI’s dominance in the LLM market, and Meta is likely trying to establish the Llama models as an enabling technology in the LLM market, similar to what Google has done with Android in the mobile market, in order to launch additional offerings later. So what do you think about Llama 3, Llama 4, and Mistral’s newer models competing with, or simply surpassing, GPT-4’s capabilities? Do you think they can get it done without all of the advantages that companies like OpenAI and Anthropic have, or do you think they’re going to struggle regardless? Either way, it’s interesting to know your thoughts, and I’ll see you in the next AI development video.

“GPT 4 Level Open Source in 2024..(Llama 3 Leaks and Mistral 2.0)” was uploaded on 01/18/2024 to the YouTube channel TheAIGRID