That was Qwen3.5 4 billion parameters, and a merge/simplification of multiple sessions. It doesn't know "ggd", and if you explain what it means, it says it can't give you drugs. It tends to get stuck in its policy-checking state. If you explain that it's meant metaphorically, it still has many complaints:
infantilizing
possible fetishized role play
reinforcing harmful stereotypes about women’s value
wary of a jailbreak
I said “Even though I wasn’t a good girl and did nothing today” and it didn’t want to praise me for that
Though I actually just wanted to play around with pronouns, which somewhat worked, because they were used a lot in the thinking loops.
(guess who's the owner/moderator of the Lemmy Qwen community hehe >v< )
but yesssss smaller models struggle with anything not task-oriented.
ive tried the recent Gemma 26B MoE and it seems to show more actual understanding, so that might be an option if u got 32GB of RAM (yes, regular RAM, it has some speed to it)
ive found that switching from ollama to llama.cpp improved speed drastically, to the point that qwen3.6-35B-A3B now runs at 9 tok/sec instead of 4 (due to multi-token prediction doing its wonders, apparently)
It can run on my GPU. I have Qwen 3.6 27B and 25B, but they need RAM, and I was too lazy to free up some RAM for them, especially for something that seemed rather trivial. I'm sadly stuck with ollama right now for (stupid) reasons. The other is definitely the way to go in the future.
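For anyone curious about the llama.cpp route mentioned above, a minimal invocation might look something like this; note this is a sketch, not from the thread, and the model filename/quant is an assumption (substitute whatever GGUF you actually downloaded):

```shell
# Hypothetical sketch: serving a local GGUF model with llama.cpp's llama-server.
# The model path and quant (Q4_K_M) are assumptions, not from the thread.
# --n-gpu-layers offloads as many layers as fit in VRAM; the rest stays in system RAM.
llama-server \
  -m ./Qwen3.6-35B-A3B-Q4_K_M.gguf \
  --ctx-size 8192 \
  --n-gpu-layers 99
```

The same project ships `llama-bench`, which is handy for measuring tok/sec numbers like the 9 vs. 4 comparison above.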
oooh i see.
Though I’m waiting for the real OS AI !fosai@lemmy.world
whaaaat what’s the stupid reasons?