• 1 Post
  • 393 Comments
Joined 2 years ago
Cake day: March 22nd, 2024




  • JXL is working alright for me.

    As an example with a lot of dynamic range, here’s a JXL:

    JXL image should be here

    AVIF:

    AVIF image should be here

    Both render just fine in my desktop and iPhone browsers. I bet at least one renders for you. And I made them from RAWs from a really old camera!

    The problem is, as you say… arbitrary lack of support. For example, I can’t upload either file to Lemmy. Brand-new social media software, and it doesn’t recognize JXL or AVIF as valid image types, even though they should render just fine? Most image hosts won’t take JXL either, hence I had to upload them to litterbox since catbox is down!

    An HEIF, on the other hand, has basically 0 support outside of Apple:

    HIF Image should be here

    All three of these render correctly on my phone, but only the top two do on other devices.
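Part of the upload problem is plain format detection. As a rough sketch (not how Lemmy or any particular image host actually validates uploads), an upload handler could recognize these formats from their first few bytes: JPEG XL has its own signatures, while AVIF and HEIF/HEIC are ISOBMFF files whose leading `ftyp` box lists brand codes. The brand sets below are a common subset, not an exhaustive list.

```python
import struct
from typing import List, Optional

JXL_CODESTREAM = b"\xff\x0a"  # bare JPEG XL codestream
JXL_CONTAINER = b"\x00\x00\x00\x0cJXL \x0d\x0a\x87\x0a"  # JPEG XL container signature

# Common (not exhaustive) ISOBMFF brand codes.
AVIF_BRANDS = {b"avif", b"avis"}
HEIC_BRANDS = {b"heic", b"heix", b"heim", b"heis", b"hevc", b"hevx"}
GENERIC_HEIF_BRANDS = {b"mif1", b"msf1"}


def ftyp_brands(header: bytes) -> List[bytes]:
    """Major + compatible brands from a leading ISOBMFF 'ftyp' box, or []."""
    if len(header) < 16 or header[4:8] != b"ftyp":
        return []
    (box_size,) = struct.unpack(">I", header[:4])
    end = min(box_size, len(header))
    brands = [header[8:12]]  # major brand; bytes 12..16 are minor_version
    for off in range(16, end - 3, 4):  # compatible brands, 4 bytes each
        brands.append(header[off:off + 4])
    return brands


def sniff_image_format(header: bytes) -> Optional[str]:
    """Guess 'jxl', 'avif', 'heic', or 'heif' from the first bytes of a file."""
    if header.startswith(JXL_CONTAINER) or header.startswith(JXL_CODESTREAM):
        return "jxl"
    brands = set(ftyp_brands(header))
    if brands & AVIF_BRANDS:
        return "avif"
    if brands & HEIC_BRANDS:
        return "heic"
    if brands & GENERIC_HEIF_BRANDS:
        return "heif"
    return None
```

Reading the first few dozen bytes usually covers the `ftyp` box. A real service would still need to decode the image to know it is valid; sniffing only decides whether to try.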


  • All the HEIC files from my camera are still busted.

    To be fair, it’s a tricky issue. It’s the camera makers’ fault for using a format no one else wants to touch, and for encoding the images as HDR files instead of SDR with gain maps, as is standard practice for smartphones.

    …But still, it’s annoying. They render fine out of the box on my iPhone, on Windows, and on KDE Linux. But they’re completely garbled in Immich :(




  • Because at a cursory glance, it doesn’t always look like spam.

    A classic example I see has “I built a…” in the title, a wall of text in the description, and a promise to do something genuinely interesting. Only after inspecting the code closely (or trying it yourself) does it become clear that it’s hallucinated nonsense.

    And it’s not always malicious, either. A lot of devs get deep into AI psychosis and truly believe they’re building something revolutionary with their vibe-coding agent.

    And sometimes these projects are interesting!


    Hence it would be EXTREMELY helpful to have this tagged up front. To me, an [AIP] tag is a gigantic red flag that warrants extra caution, but not necessarily a smoking gun, and it would help “regular” homebuilt projects stand out from the vibe-coded ones.

    And [AIT] is just nice to have. Some users don’t want to see any AI in /c/selfhosted, period, so AI discussion posts get reported as spam. The tag would clarify that nebulous distinction while giving those users an easy way to filter AI posts out.
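If the tags were adopted, filtering on the reader's side could be as simple as matching the title. A minimal sketch, assuming the tags appear literally in titles as [AIT] (AI topic) and [AIP] (AI-built project, as the tags seem to be used in this thread); `ai_tags` and `filter_posts` are hypothetical helpers, not a Lemmy feature.

```python
import re
from typing import Iterable, List, Set

# Matches [AIT] or [AIP] anywhere in a title, case-insensitively.
AI_TAG = re.compile(r"\[(AIT|AIP)\]", re.IGNORECASE)


def ai_tags(title: str) -> Set[str]:
    """Return the AI tags present in a post title, normalized to upper case."""
    return {tag.upper() for tag in AI_TAG.findall(title)}


def filter_posts(titles: Iterable[str], hide: Iterable[str] = ("AIT", "AIP")) -> List[str]:
    """Keep only titles that carry none of the tags listed in `hide`."""
    hidden = {h.upper() for h in hide}
    return [t for t in titles if not (ai_tags(t) & hidden)]
```

Passing `hide=("AIT",)` would drop AI discussion posts while still showing tagged projects, which matches the "filter by either" idea.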




  • Failure to provide a disclosure after using the tag would mean removing the post. It could be locked, but I would have to assume the majority of the spam-type postings that happened to make it past the rule 7 criteria are the ones who will not provide the requested disclosure. I think it makes for a good filter this way, but please comment if you think otherwise.

    Sounds reasonable to me!

    I think the major choice is for y’all (the mod team), as enforcing a tagging system is going to increase the moderation workload. Though I guess it would cut back on AI reports, like you said.

    I have no recommendations for an existing bot.


    …You could use an embeddings model for a little extra automation though.

    This is a pre-LLM technique, but basically you could feed a script new untagged posts, use an embeddings model to compare the text of their bodies to a keyword (“AI”?), and spit out a number as a rough “similarity” metric. If it’s above a certain threshold (e.g., if the post seems AI-related), send a message to the moderation team to check it, or maybe even post a rules reminder in the comments.

    And FYI, embeddings models are tiny, so they don’t need special resources to run.
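A minimal sketch of that pipeline, under stated assumptions: `toy_embed` is a hashed bag-of-words stand-in so the snippet runs without any model (swap in your embeddings model's encode call, which returns a dense vector), the reference is a short made-up phrase rather than the single keyword because a bag-of-words stand-in can do little with one word, and the 0.3 threshold is arbitrary. The scoring step, cosine similarity against a reference vector, is the part that carries over to a real model.

```python
import math
import re
import zlib
from typing import Callable, List, Sequence, Tuple

Embedder = Callable[[str], List[float]]


def toy_embed(text: str, dim: int = 1024) -> List[float]:
    """Stand-in for a real embeddings model: hashed bag-of-words counts."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flag_untagged_posts(
    posts: List[Tuple[str, str]],  # (title, body) pairs
    reference: str,
    embed: Embedder = toy_embed,
    threshold: float = 0.3,
) -> List[Tuple[str, float]]:
    """Return (title, score) for untagged posts whose body looks AI-related."""
    ref_vec = embed(reference)
    flagged = []
    for title, body in posts:
        if "[AIT]" in title or "[AIP]" in title:
            continue  # already tagged; nothing for the mods to check
        score = cosine(embed(body), ref_vec)
        if score >= threshold:
            flagged.append((title, round(score, 3)))
    return flagged


if __name__ == "__main__":
    posts = [
        ("I built a media server", "I vibe coded this with an LLM agent and Claude"),
        ("NAS case recommendations", "Looking for a quiet case with 8 drive bays"),
        ("[AIP] My dashboard", "Claude wrote most of this LLM agent dashboard"),
    ]
    for title, score in flag_untagged_posts(posts, "AI LLM model vibe coded Claude agent"):
        print(f"check: {title} (similarity {score})")  # e.g. message the mod team here
```

With a real model, the threshold would need tuning against a handful of known AI and non-AI posts.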


  • brucethemoose@lemmy.world to Selfhosted@lemmy.world · Selfhosted & AI · ↑ 2 · 5 days ago

    They 100% do. They’re probably serving “naive” FP8 via vLLM, which is worse than you’d think, especially if they flip on the awful FP8 KV cache.


    In a local quant, you can keep quantized models from falling apart at longer context (CTX) by leaving the attention layers at a higher-precision quantization. For example, with MiMo 2.5, I have all the MoE MLP layers at IQ3_KT, the dense experts at Q6_K, and all the attention layers at Q8_0.

    For Qwen 27B, I’m still experimenting, but I’m leaning towards IQ4_KT for the MLPs, Q6_K for attention, and Q8_0 for the small, very sensitive KV heads. Or a similar scheme as an exl3 quant.
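To make the per-layer schemes concrete, here is a hypothetical rule table that maps tensor names to quant types, first match wins. Tensor names follow the common GGUF `blk.N.<name>.weight` convention (`attn_k`, `ffn_up_exps`, `ffn_gate_shexp`, and so on); the regexes, the fallback type, and treating "dense experts" as the dense/shared-expert FFN tensors are illustrative assumptions. The output is just labels, not input for any specific quantize tool.

```python
import re
from typing import List, Tuple

Rules = List[Tuple[str, str]]

# MiMo-style scheme: routed experts low, dense FFN mid, all attention high.
MIMO_RULES: Rules = [
    (r"\.attn_(q|k|v|output)\.weight$", "Q8_0"),
    (r"\.ffn_(gate|up|down)_exps\.weight$", "IQ3_KT"),    # routed MoE experts
    (r"\.ffn_(gate|up|down)(_shexp)?\.weight$", "Q6_K"),  # dense / shared-expert FFN
]

# Qwen-27B-style scheme: K/V highest, rest of attention Q6_K, MLPs IQ4_KT.
QWEN_RULES: Rules = [
    (r"\.attn_(k|v)\.weight$", "Q8_0"),
    (r"\.attn_(q|output)\.weight$", "Q6_K"),
    (r"\.ffn_(gate|up|down)\.weight$", "IQ4_KT"),
]


def pick_quant(tensor_name: str, rules: Rules, default: str = "Q6_K") -> str:
    """Return the quant type of the first rule whose regex matches the name."""
    for pattern, qtype in rules:
        if re.search(pattern, tensor_name):
            return qtype
    return default  # embeddings, output head, etc. fall through to the default


if __name__ == "__main__":
    for name in ["blk.0.attn_k.weight", "blk.0.ffn_up_exps.weight", "blk.0.ffn_down_shexp.weight"]:
        print(name, "->", pick_quant(name, MIMO_RULES))
```

Ordering matters: the attention rules come first so they win regardless of what the FFN patterns would match.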


    That said, even unquantized models sometimes fall apart in certain long-context scenarios because the advertised max context is a lie. You just have to test them and see, but Qwen has certainly done this in the past.
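One common way to "test and see" is a needle-in-a-haystack probe: bury a fact at some depth in filler text, grow the filler until the prompt approaches the context length you care about, and check whether the model can retrieve it. The sketch below only builds the prompt and grades the reply; sending it to the model is left out because it depends on your backend. The filler text and passphrase are made up, and passing this fairly weak test does not mean the model reasons well over long context.

```python
FILLER = "The grass is green. The sky is blue. The sun is warm. "


def build_haystack(n_chunks: int, depth: float, secret: str) -> str:
    """Build a long prompt with one 'needle' sentence at `depth` (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    chunks = [FILLER] * n_chunks
    chunks.insert(round(depth * n_chunks), f"The secret passphrase is {secret}. ")
    question = "\n\nWhat is the secret passphrase? Reply with the passphrase only."
    return "".join(chunks) + question


def passed(model_reply: str, secret: str) -> bool:
    """Crude grading: did the reply contain the passphrase?"""
    return secret.lower() in model_reply.lower()
```

Try several depths at each prompt length (counting tokens with your backend's tokenizer); if the advertised context is overstated, that is where misses would be expected to start showing up.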



  • brucethemoose@lemmy.world to Selfhosted@lemmy.world · Selfhosted & AI · ↑ 2 · 5 days ago

    It drops off, but not as much as you’d think.

    MiMo uses a 5:1 ratio of sliding-window attention (SWA) to global attention layers, so its compute doesn’t grow as catastrophically with context as it does in older models. That, and most of the “slowness” comes from the MoE layers being on the CPU, whereas the attention layers that get heavier at high context are all on the 3090.
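A back-of-the-envelope for why the 5:1 pattern helps: per generated token, a global attention layer reads all `ctx` cached positions, while a sliding-window layer reads at most `window` of them. The sketch counts positions read across all layers. The 48 layers, the 4096-token window, and placing the global layer last in each group of six are placeholder assumptions, not MiMo's published configuration.

```python
def attended_positions(ctx: int, n_layers: int, window: int, swa_per_global: int = 5) -> int:
    """Cached positions read per new token, summed over all layers.

    Every (swa_per_global + 1)-th layer is treated as global attention; the
    rest only see the last `window` positions. swa_per_global=0 makes every
    layer global, like a classic dense-attention model.
    """
    period = swa_per_global + 1
    total = 0
    for layer in range(n_layers):
        is_global = layer % period == period - 1
        total += ctx if is_global else min(ctx, window)
    return total


if __name__ == "__main__":
    ctx = 85_000
    swa = attended_positions(ctx, n_layers=48, window=4096)
    full = attended_positions(ctx, n_layers=48, window=4096, swa_per_global=0)
    print(swa, full, round(full / swa, 2))
```

With these placeholder numbers, the 5:1 layout reads roughly 4.8x fewer cached positions per token at 85K context than an all-global model would. In backends that trim the sliding-window caches, KV-cache memory shrinks by a similar factor.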

    That’s the beauty of these MoEs: they’re just the right size for the “compute-lite” parts to stay in CPU RAM.

    I will measure it tomorrow. It is a constant ~9-10TPS for short queries, but definitely slower near my current max context of 85K.


    And do you mean prompt compaction? I don’t automate that. When I use that particular model, I tend to run it in Mikupad, aka “raw” notepad mode, and manipulate the context directly. That way I can chop out conversations, pick different tokens from the logprobs, or edit its own replies/thinking and continue mid-reply.

    I like handling this manually because, with a local model, prompts are cached. Streaming starts quickly if most of the prompt stays cached, which is actually a really nice advantage over APIs.
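Why edits near the end of the context are cheap: many local servers keep the KV cache from the previous request and only recompute from the first token that differs. A simplified sketch of that prefix matching, assuming plain lists of token IDs (the IDs below are made up); real backends differ in the details, and some can reuse more than a strict prefix.

```python
from typing import Sequence


def reusable_prefix(cached: Sequence[int], new: Sequence[int]) -> int:
    """Number of leading tokens shared by the cached prompt and the new prompt."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def tokens_to_recompute(cached: Sequence[int], new: Sequence[int]) -> int:
    """Tokens that must go through prompt processing again."""
    return len(new) - reusable_prefix(cached, new)


if __name__ == "__main__":
    cached = list(range(10_000))           # pretend 10K-token conversation
    edit_end = cached[:9_900] + [1, 2, 3]  # rewrite the tail of the last reply
    edit_start = [-1] + cached[1:]         # change the very first token
    print(tokens_to_recompute(cached, edit_end))    # only the new tail
    print(tokens_to_recompute(cached, edit_start))  # the whole prompt
```

This is also why chopping out an early part of a conversation costs more than editing the latest reply: everything after the first change has to be reprocessed.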




  • brucethemoose@lemmy.world to Selfhosted@lemmy.world · Selfhosted & AI · ↑ 2 · 5 days ago

    I have a single 3090!

    That’s the dream GPU, these days.

    And I have 128GB of CPU RAM. So the best model I can run is MiMo 2.5 (a 300B model) at around 10 tokens/sec, using hybrid CPU+GPU inference.

    …But that’s the worst-case scenario for speed. It’s an IQ3_KT quant (a high-quality “trellis” quantization type, but very slow on CPU) of a gigantic model that barely fits in my RAM+VRAM combined, with no DFlash or any other kind of speculative decoding turned on. I could tune it to be much faster, but for this model I mostly just want “max quality, fast enough to read as it streams, barely fits in memory.”

    For speed, or for prompts with lots of thinking or context (like agentic use), I just run Qwen 3.6 27B now. That would fit in your 3090 no matter how much CPU RAM you have, but you have to be smart about the backend and quantization you pick. If you just use Ollama, it’s gonna tell you it won’t fit, or use some horrible default that spits out garbage.
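The "barely fits" and "would fit in a 3090" claims come down to simple arithmetic: weight size is roughly parameters × bits per weight ÷ 8. A minimal sketch; 8.5 bits per weight for Q8_0 and 4.5 for Q4_K are the nominal rates of those GGUF types, while ~3.1 for IQ3_KT and the overhead allowances for KV cache, OS, and buffers are rough assumptions. Mixed-precision quants like the schemes above land somewhere between their component rates.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes) at a given average bit rate."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, budget_gb: float, overhead_gb: float) -> bool:
    """True if the weights plus an overhead allowance fit in the memory budget."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= budget_gb


if __name__ == "__main__":
    vram, ram = 24, 128  # one 3090 plus 128 GB of system RAM
    print("300B @ ~3.1 bpw:", weights_gb(300, 3.1), "GB, fits RAM+VRAM:", fits(300, 3.1, vram + ram, 20))
    print("27B @ Q8_0:", weights_gb(27, 8.5), "GB, fits 3090:", fits(27, 8.5, vram, 3))
    print("27B @ Q4_K:", weights_gb(27, 4.5), "GB, fits 3090:", fits(27, 4.5, vram, 3))
```

The 27B case shows why quant choice matters: at Q8_0 the weights alone overflow 24 GB, while a ~4.5-bit quant leaves room for context.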


    …This is what I meant to emphasize.

    It’s not just the hardware. You kinda have to be part developer, part enthusiast to even follow this stuff, set it up optimally, and keep it up to date. If you just Google “best LLM for 3090,” you will get absolute garbage.


  • brucethemoose@lemmy.world to Selfhosted@lemmy.world · Selfhosted & AI · ↑ 3 · 5 days ago

    You don’t even need Claude anymore. GLM 5.2 API is good enough for 95% of the same things and vastly cheaper.

    MiMo 2.5 Pro and Kimi are also very good. And then there’s Cerebras API if you just want simple things done quick.

    The thing with self-hosting, while awesome, is that it requires a lot of hardware and a considerable time investment for what’s essentially a “base tier model,” or at best one step down from what’s still a very cheap API. I still love it, especially for the privacy and control, but you aren’t running Claude at home unless you’ve got a Threadripper or server hardware collecting dust.

    …Hence I can understand why people don’t pursue it, especially since a cursory Google search will lead you to trying the DeepSeek distillation on Ollama (which is awful).


  • brucethemoose@lemmy.world to Selfhosted@lemmy.world · Selfhosted & AI · ↑ 3 · 6 days ago

    Oh, both! Yeah. I didn’t even think of that, but [AIT]/[AIP] as separate tags makes a lot of sense.

    I’d like being able to filter by either, actually.

    I guess two tags run the risk of “rules too complex for some to follow,” but that’s more of a moderation-load question. I have no say in that, heh.


  • brucethemoose@lemmy.world to Selfhosted@lemmy.world · Selfhosted & AI · ↑ 3 · 6 days ago

    For what it’s worth, I asked my self-hosted LLM (MiMo 2.5, no network access outside my desktop), and it came up with [AIT] (AI-Topic).

    …I think that’s my favorite so far. [AIP] would work too.

    I feel like that “obfuscates” the tag enough to blunt impulse downvotes in /new and feeds, without being deceptive or anything.