[CW: Assumes Viewer is Transfem] eggAIrl

tr0xy@lemmy.dbzer0.com · edited 19 days ago

NSFW

maria [she/her]@lemmy.blahaj.zone · edited 18 days ago

oooh i see.

(guess whos the ~~owner~~ moderator of lemmy qwen community hehe >v< )

but yesssss smaller models struggle with anything not task oriented.

ive tried the recent gemma 26B MoE and it seems to show more actual understanding, so that might be an option if u got 32GB of RAM (yes, regular RAM, it has some speed to it)

ive found that switching from ollama to llama-cpp improved speed drastically, to the point that qwen3.6-35B-A3B now runs at 9 tok/sec instead of 4 (due to multi-token-prediction doing its wonders apparently)

tr0xy@lemmy.dbzer0.com · edited 18 days ago

It can run on my GPU. I have Qwen 3.6 27b and 25b, but they need RAM and I was too lazy to free some up for them, especially for something that seemed rather trivial. I’m sadly stuck with ollama right now for (stupid) reasons. llama-cpp is definitely the way to go in the future.

Though I’m waiting for the real OS AI !fosai@lemmy.world

maria [she/her]@lemmy.blahaj.zone · 17 days ago

whaaaat what are the stupid reasons?
