(guess who's the ~~owner~~ moderator of the lemmy qwen community hehe >v< )
but yesssss, smaller models struggle with anything that isn't task-oriented.
i've tried the recent gemma 26B MoE and it seems to show more actual understanding, so that might be an option if u've got 32GB of RAM (yes, regular RAM; it has some speed to it)
i've found that switching from ollama to llama.cpp improved speed drastically, to the point that qwen3.6-35B-A3B now runs at 9 tok/sec instead of 4 (thanks to multi-token prediction doing its wonders, apparently)
It can run on my GPU. I have Qwen 3.6 27b and 25b, but they need RAM and I was too lazy to free up some RAM for them, especially for something that seemed rather trivial. I'm sadly stuck with ollama right now for (stupid) reasons. The other one is definitely the way to go in the future.
oooh i see.
Though I’m waiting for the real OS AI !fosai@lemmy.world
whaaaat, what are the stupid reasons?