The defense industry lost the ability to make weapons when a crisis hit. The same pattern is eroding software engineering skills. The timelines are identical.
Open source models? https://huggingface.co/ + https://lmstudio.ai/
The useful ones are still provided by big companies because the rest of us can’t afford the hardware to train them.
AI won’t be “democratized” anytime soon, the way the rest of the computer software world has been.
We have computing power in our pockets a million times more powerful than what was used to send men to the moon; why do you think we’ll never have enough power?
I have already pointed out https://eurollm.io/
The EuroLLM project includes Instituto Superior Técnico, the University of Edinburgh, Instituto de Telecomunicações, Université Paris-Saclay, Unbabel, Sorbonne University, Naver Labs, and the University of Amsterdam. Together they created EuroLLM-22B, a multilingual AI model supporting all 24 official EU languages. Developed with support from Horizon Europe, the European Research Council, and EuroHPC, this open-source LLM aims to enhance Europe’s digital sovereignty and foster AI innovation. Trained on the MareNostrum 5 supercomputer, EuroLLM outperforms similar-sized models. It is fully open source and available via Hugging Face.
So long as there are people who don’t want to rely on big tech, there will be people pushing for independence, just like Linux users such as myself.
There are 700B+ parameter open weight models now. Frontier models are in the trillions.
And even that model apparently took a supercomputer to train. I don’t have a supercomputer so I can’t train my own models like I can compile my own software. This is not comparable to running Linux where you can just compile your own kernel or even whole operating system (former Gentoo user here).
I’ve tried running the models my 8 GB card can handle. They’re OK for a quick question, but they won’t be doing anything useful for me.
Not the person you replied to, but I have thoughts on this point in particular:
Consumer devices have started to slow down their performance improvements because we’re bumping up against the limits of physics.
People/corporations with way more money than the average consumer will always be able to run something orders of magnitude more powerful. Any advances that improve things for the average consumer will improve things for rich people/corporations even more.
Training an LLM isn’t really even about compute speed, it’s about access to good training material. The average consumer can’t afford to buy (or pirate) every book in existence like a rich person/corporation can. An average person doesn’t have the ability or time to curate their own training data, but rich people/corporations do.
Consumers can’t train from scratch, but they can fine-tune/modify open-weights models with their own data. There is significant open source training material available, without needing (or wanting) Harry Potter knowledge.
Because companies are using so much computing power that it requires as much electricity as a city. Or you can take your pocket computing resources and see how long it takes to train an LLM.
You’re falling for the fallacy that larger parameter size = more useful. That’s just a marketing gimmick by the AI industry to justify larger investments.
These “flagship” models with hundreds of billions of parameters and giant context windows have not achieved proportional gains in benchmarks.
You know what does significantly improve accuracy, even on smaller-sized LLMs? Retrieval-Augmented Generation. There’s literally no reason to use giant, resource-intensive models. You can just give a smaller one access to databases and libraries with all the information it needs to report on.
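As a rough illustration of that retrieval step, here is a minimal sketch, assuming the sentence-transformers package and a few placeholder documents; the retrieved text just gets prepended to whatever prompt you send your local model:

```python
# Minimal RAG sketch: embed a small document store, retrieve the most relevant
# chunks for a question, and prepend them to the prompt for a local model.
# The documents and model names here are placeholders, not a real setup.
from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small CPU-friendly embedding model

documents = [
    "Invoices are stored in the erp_invoices table, keyed by invoice_id.",
    "The nightly sync job runs at 02:00 UTC and writes to /var/log/sync.log.",
    "Refunds must reference the original invoice_id and a reason code.",
]
doc_vecs = embedder.encode(documents, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the question (cosine similarity)."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec                  # normalized vectors -> dot product = cosine
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

question = "Where do refunds get their invoice reference from?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # this prompt would then be sent to whatever small local model you run
```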
A 24 billion parameter model is easily self-hostable on consumer hardware, and can be quantized to further reduce hardware requirements for a marginal loss in accuracy. At 8-bit quantization, that only requires 24GB of RAM + overhead. A lot by some standards, but by no means unachievable for a hobbyist. For a medium-sized business, that’s downright negligible. And you can easily expand your context window using a swapfile.
If you don’t have that much RAM, a 12 or 14 billion parameter model, even at 8-bit quantization, is fine if you do RAG and use swap to expand the context window. You can run that even if you only have 16GB RAM total on your hardware.
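The RAM figures above follow from simple arithmetic: the weights take roughly (parameters × bits per weight ÷ 8) bytes, plus KV-cache and runtime overhead, which varies by runtime. A rough calculator:

```python
# Back-of-envelope memory estimate for running a quantized model locally.
# Real usage varies by runtime (llama.cpp, ollama, vLLM) and by KV-cache settings;
# treat these numbers as rough ballpark figures, not guarantees.

def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory needed just for the weights, in GB (1 GB = 1e9 bytes here)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params, bits in [(24, 8), (24, 4), (14, 8), (12, 4)]:
    print(f"{params}B @ {bits}-bit ≈ {weight_memory_gb(params, bits):.0f} GB"
          " + KV cache / runtime overhead")

# 24B @ 8-bit ≈ 24 GB, matching the figure above; at 4-bit the same model fits in
# roughly 12 GB, which is why 12-14B models are workable on 16 GB machines.
```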
Most of the models on huggingface are pre-trained and list their datasets. You can fine-tune and align them yourself, which uses far less resources than pre-training does. You can even use LoRA to further reduce resource needs.
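For the fine-tuning route, a minimal LoRA sketch with the Hugging Face peft library looks roughly like this; the base model name is a placeholder and the target modules depend on the architecture:

```python
# Sketch of LoRA fine-tuning with Hugging Face transformers + peft.
# The base model name below is a placeholder; pick any open-weights model you can load.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "your-org/your-open-weights-model"          # placeholder, not a real repo
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # which modules to adapt; model-dependent
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()         # typically well under 1% of the full model

# From here you would run a normal Trainer/SFT loop on your own data; only the small
# adapter weights are updated, which is what keeps the hardware requirements far
# below full pre-training.
```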
Don’t fall for the lie that commercial LLM APIs are the only viable option.
It’s not proportional, but the big models are still far more useful. And the giant context window is great when working with a nasty complex codebase (hey, I didn’t write the bloody thing, I do modules for an open source ERP). The entire codebase would be tens or hundreds of millions of tokens, but any useful portion of it is still tens or hundreds of thousands.
Larger LLMs still have RAG.
And a 24 GB graphics card is like a thousand euros. I could afford it, especially since I can claim VAT back, but the majority of the planet’s population cannot, unfortunately. You could run it on system RAM, but when I run partially GPU/VRAM, partially CPU/RAM, it gets pretty slow for me. And I’m not even talking about a 24 GB model; I’m talking about qwen3:8b, because I unfortunately only have an 8 GB card right now. I built this PC for gaming, not LLMs, and it’s a few years old.
Maybe once I upgrade to DDR5, it’ll be bearable on CPU though.
But that’s what I’m talking about: we can only really run pre-trained models locally in the foreseeable future. Pre-trained by whom? Usually some AI company (DeepSeek, Qwen, etc). Training is hard even without the hardware requirements; you also need to hoover up tons of data. So we’ll be able to run the current, already existing models, and thanks to RAG they won’t ever really be fully out of date like the early iterations of ChatGPT were, but they won’t compete all that well with whatever models we’ll have in a few years. And even then we’re relying on commercial search engines that the LLMs can use to get access to new data.
They may not be, but 20€ a month for an LLM that responds quite fast is more affordable than shelling out tons for a better graphics card to get a slow running and less capable LLM. Okay, sure, that’s opex vs capex, but the graphics card also takes a bunch of electricity to run so it kinda evens out on the opex as well.
Commercial AI is currently the only thing that can help me do a day’s worth of refactoring in 30 minutes. Yes, I’ll still have to check the output critically, but overall it can handle such things, as long as there is good test coverage and you understand what the code is supposed to do. I’ve tried using a local LLM for this and it took several minutes to essentially tell me it has no idea what I want it to do. That I believe was qwen2.5-coder:7b with the default context window, though I’ve since upgraded to qwen3:8b and increased the context window from the default 4096 that ollama thinks is appropriate for my system (causing the aforementioned offload to CPU/system RAM, making it even slower) to… well, 64k in the config, but it’s only really showing up as 40960 in ollama ps, so I’m guessing I hit the limit for either the model or my hardware.
If you can show me a local LLM working properly on consumer-grade hardware and doing useful coding tasks in a reasonable amount of time, I may just upgrade my hardware to run a local LLM, but until then I’m thinking of getting an additional subscription to complement Claude Code when running out of tokens. Probably Mistral, because fuck ClosedAI. I did Google people’s experiences with running Claude Code and the like on local LLMs, and it appears even people with good hardware and better models are having subpar experiences.
For just asking questions though, you’re right, a local LLM works great. It’s just not what I need out of an LLM much of the time.
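For what it’s worth, the context window can also be requested per call through ollama’s REST API; a sketch below, reusing the qwen3:8b tag and the 64k figure from above. The server or the model may still cap the effective value, which would explain 40960 showing up in ollama ps:

```python
# Sketch: requesting a larger context window per request via ollama's REST API.
# The model tag and num_ctx value mirror the comment above; the server (or the
# model itself) may silently cap the value you ask for.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3:8b",
        "prompt": "Summarise the refactoring plan for module X.",
        "stream": False,
        "options": {"num_ctx": 65536},   # requested context length in tokens
    },
    timeout=600,
)
print(resp.json()["response"])
```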
So how would I create such an “Open Source” model? They don’t share the data used to create them do they? Let’s not even get started on how much computing power I would need to train one of those things. These selfhosted models solve nothing except some data privacy issues. Sure you no longer send all your code to a shady AI company but you are still 100% dependent on them sharing their models.
No, and going by the OSI definition of “open source AI” they don’t have to, acknowledging that the training material is often copyrighted and can’t be shared.
It’s a strange definition of “open source”, one where you’re not actually allowed to see the source.
https://ethz.ch/en/news-and-events/eth-news/news/2025/09/press-release-apertus-a-fully-open-transparent-multilingual-language-model.html
The model is named Apertus – Latin for “open” – highlighting its distinctive feature: the entire development process, including its architecture, model weights, training data and methods, is openly accessible and fully documented.
There is also a move towards synthetic and human-created training data, so we will have to see where training data goes, copyright-wise, in the future.
Do you build your own Linux from scratch? If so why would you assume you can build an LLM from scratch?
It’s mad easy to build your own Linux from scratch in comparison to building an LLM. You can have your own distro running in like an hour. With buildroot you can have it in even less than that.
That doesn’t sound like “from scratch”
I agree, but given the context of the discussion and the commonly accepted definition of Linux from Scratch, what else do you think they could have meant other than building a complete Linux based operating system from source?
The thing that literally everything other than apparently Linux means when someone says building it from scratch?
Does that actually match with the discussion in your opinion? The discussion about building open source projects? Does the information I provided not help in understanding my response?
Are you being serious or just trying to be pedantic…?
Yes, it does match the discussion. They were discussing creating an LLM from scratch, and no one was implying that the process would be as easy as hitting a button and having it created automatically for them.
So really you’re the one who took this off on a tangent.
I have no idea what you’re talking about
… Then why did you use it as an example?
Because the average person is not building Linux from scratch nor would they know how to
The average person wouldn’t be building an open source LLM either. I don’t think I follow. I was just saying that your comparison wasn’t going to hit correctly at all due to how easy it actually is to build Linux and a full Linux distribution.
Yeah that’s why I’m saying:
The OP is basically saying it’s not really open source unless I can personally build it! Which I am saying I don’t think is a requirement of open source software (your personal ability to compile software does not detract from its open-sourceness).
tbh I wouldn’t have any idea how to build either; they are way above my skill level. I have no idea how to make a Linux distro either, but I’m certain most are open source.
https://unsloth.ai/docs/new/studio
Today, we’re launching Unsloth Studio (Beta): an open-source, no-code web UI for training, running and exporting open models in one unified local interface.
This was only recently released; maybe in the future we’ll have training material heavily compressed down into an open source format that anyone with the skill and knowledge can use, and different ‘distro’ releases of LLMs. We already have tons of smaller models, especially from European universities and others.
https://digital-strategy.ec.europa.eu/en/policies/ai-factories
The EuroHPC Joint Undertaking (JU) provides access to the computing time and support services offered by the EuroHPC AI Factories. The AI Factories are open to European users from various sectors, including industry, research, academia and public authorities.
We are only like 3-4 years into AI going mainstream, if that; afaik the heat death of the universe is at least 1000 years away, so we have lots of time to work on and improve them. I can only wonder where they will be in 100 years, so I try not to make any damning Facebook-boomer-tier statements about the future.
Look at the state of software today. Every corporation and government is blindly sticking with Microsoft, Google or similar. Even though there are some ideas to move away and embrace OSS, I doubt it will happen with governments, even less with corporations. I foresee something similar in the future with AI.
Are you sure?
https://www.rfi.fr/en/france/20260417-france-to-remove-windows-from-government-computers-in-sovereignty-push
https://tuta.com/blog/countries-ditching-microsoft-choosing-linux-digital-sovereignty
It does not take much for things to change, you might like this:
We’ve Hit A Wall With Transport. Here’s Why | Black Swans 3 | If You’re Listening
https://youtu.be/o1R6Aq19A6Y?t=1281
Great, all we need is a few decades and a world superpower becoming world-threateningly corrupt
Sure, but it’s mostly been that way for a while. The players on the board shift, but it’s almost always Java, or Microsoft’s flavor of the decade, or classic C, or Objective-C, or Swift, or whatever. Are you arguing that big tech will lock their API documentation and proprietary languages down behind their own AIs, so that developers are forced to “vibe code” against them through AI interaction only, and open source models will be unable to train on them?
In the long run, deep-pocketed companies will not have any distinct advantage in developing core models, but they will always have an advantage in computing infrastructure (both for training and for serving queries) and in access to content (either by owning major content sources like social media properties or by having exclusive license access to key sources).
For which you still need massive amounts of memory and compute to run reliably. That, and the fact that chatbots and agents nowadays rely on all sorts of proprietary customizations, outside of the realm of LLMs, to perform certain tasks.
The gap will take decades to close, if it ever does.
Previously, to benefit from the full power of LLMs, you had to skew to higher parameter models. Recent developments in models like Gemma 4 and Qwen-3.6-35B-A3B demonstrate advanced capabilities such as tool-calling, which enable LLMs to search the web, interact with external APIs and file systems, troubleshoot live systems and fundamentally reason about topics that lie outside of their initial training data.
2026’s average gaming PC is massive amounts of memory and compute, apparently.
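A sketch of what that tool-calling loop looks like against an OpenAI-compatible endpoint, which several local runtimes expose; the endpoint, model tag and the search tool here are placeholders, and the point is the shape of the exchange rather than any specific stack:

```python
# Sketch of a tool-calling exchange through an OpenAI-compatible local endpoint.
# Endpoint, model tag and the tool itself are placeholders/assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed-locally")

tools = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web and return a short text summary.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "What changed in the latest EuroLLM release?"}]
resp = client.chat.completions.create(model="local-model", messages=messages, tools=tools)

for call in resp.choices[0].message.tool_calls or []:
    # A real agent would dispatch to an actual search function here and feed the
    # result back as a "tool" message so the model can reason over fresh data.
    print(call.function.name, call.function.arguments)
```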
lol there are plenty of open source models in the top 100 with multiple SOTA models released in the last few months alone
There are also smaller LLMs being made, like https://eurollm.io/, which excel in their own ways.
Funny that just came up: https://discourse.ubuntu.com/t/the-future-of-ai-in-ubuntu/81130?=0
😁
Any model that can run on 16GB or less is not going to come anywhere close, in real-world tasks, to a cloud-based model. It just cannot be. There are people out there running Qwen on a Mac Studio with 96GB, and it falls short of cloud-based models in both performance and speed.
The top 100 of what, exactly? Many blended benchmark results are notoriously biased, and LLMs “cheat” on benchmarks at every single opportunity, so it is still hard to tell, outside of real-world tasks and speed, which models are actually better than others.
But regardless, the main point of the gap is resources. Even if the average gaming computer were really enough to run meaningful models, the vast majority of the world wouldn’t have access to it, even more so in this day and age, when a whole monthly salary can’t buy a single RAM stick in most parts of the world.
What makes you think we won’t have the resources in the future?
Well, you can compare Gemma 4 running in LM Studio on an average gaming PC to ChatGPT 3.5 and tell me? Or is your benchmark based purely on this very moment, open source models today vs cloud models today?
For reference, Gemma 4 is 26 billion parameters; GPT-3 was thought to be over 175 billion and of course had no optimisations like MoE, so it activated its entire parameter set for every single question and was rather slow as well.
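The dense-versus-MoE difference comes down to how many parameters are touched per token; a tiny illustration with made-up numbers (not the real figures for any of the models named here):

```python
# Dense vs mixture-of-experts (MoE): a dense model runs every parameter for every
# token, while an MoE model routes each token through only a few experts.
# All numbers below are illustrative assumptions, not real model figures.

def active_params_moe(total_b: float, n_experts: int, experts_per_token: int,
                      shared_fraction: float = 0.2) -> float:
    """Very rough estimate of parameters touched per token, in billions."""
    shared = total_b * shared_fraction                      # attention, embeddings, etc.
    per_expert = total_b * (1 - shared_fraction) / n_experts
    return shared + per_expert * experts_per_token

dense_gpt3_like = 175.0                                     # all 175B active every token
moe_like = active_params_moe(total_b=35.0, n_experts=32, experts_per_token=2)
print(f"dense: {dense_gpt3_like:.0f}B active per token")
print(f"MoE:   {moe_like:.1f}B active per token")           # a small fraction of the total
```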
We know as well that there is no slowdown in the push for optimisations; DeepSeek’s initial release was the first big driver of the idea that you don’t have to just scale up using hardware alone.
https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/
They’re also pushing Chinese native chips from Huawei, trying to diversify away from Nvidia holding the crown.
The problem I’ve got is that you all have a god of the gaps; the conversation I was having 3 years ago was different from the one 2 years ago, which was different from the one 1 year ago. I was told AI could never do songs well enough, then suddenly people were worried they couldn’t tell the difference; then they said it could never do movies, and now apparently not only is it good enough, it’s hilarious.
https://www.youtube.com/watch?v=fgHn7PI55J4
The open source LLMs we have today are incredible, and in the last few months we’ve had Qwen, GLM, Nemotron/Nvidia, Mistral, Google and heaps of others released. It feels like you’re just looking for a reason to be dour and pessimistic, but that’s just me.
Anyway, I’m off to sleep, have a good one :)
And I guess the problem I have with you, is that you seem to think that you can get results with 16GB, competitive with models that run on a Blackwell 6000 with 96GB, while ignoring the fact that the vast majority of the people in the world are running GPUs with 4 to 8 GB of VRAM, if they even have access to GPUs, at all.
That’s the gap. Most people don’t have the kind of money you think they do, and even those who do have some money, they will never achieve the same results as with cloud models, because if there’s a state of the art optimization that makes models 10 times smaller, cloud models will become 10 times bigger with that advantage. It’s pretty simple.