  • j4k3@lemmy.world to Selfhosted@lemmy.world · Consumer GPUs to run LLMs
    Anything under 16 GB of VRAM is a no-go. Your number of CPU cores is important too. Use Oobabooga Textgen for an advanced llama.cpp setup that splits inference between the CPU and GPU. You'll need at least 64 GB of system RAM, or be willing to offload layers to NVMe with DeepSpeed. I can run up to a 72b model with 4-bit quantization in GGUF on a 12700 laptop with a mobile 3080Ti, which has 16 GB of VRAM (mobile is like that).
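
    As a rough sanity check on those sizes: load memory scales with parameter count times bits per weight. The sketch below is back-of-the-envelope, not an exact llama.cpp figure — the effective bits/weight and the overhead factor for context buffers are assumptions.

```python
def model_memory_gb(params_billion, bits_per_weight=4.5, overhead=1.2):
    """Rough GGUF load-size estimate: parameters * bits/8, padded ~20%
    for context buffers and runtime overhead (both figures are assumptions;
    4-bit quants average a bit over 4 bits/weight in practice)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8 * overhead
    return total_bytes / 1e9

# A 72b model at ~4.5 bits/weight lands near 48 GB, which is why
# 16 GB of VRAM alone is nowhere near enough and layers must spill
# to system RAM or NVMe.
print(round(model_memory_gb(72), 1))
```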

    I prefer to run an 8×7b mixture-of-experts model because only 2 of the 8 experts are ever running at the same time. I run it as a 4-bit quantized GGUF, and it takes 56 GB total to load. Once loaded, it is about like a 13b model for speed but has roughly 90% of the capabilities of a 70b. The streaming speed is faster than my fastest reading pace.
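
    The "13b for speed" claim follows from the active-parameter count: per token, only the routed experts plus the always-on shared layers are touched. The split below is illustrative — the per-expert and shared sizes are assumptions, not published figures:

```python
def moe_active_params_b(n_active_experts, expert_params_b, shared_params_b):
    """Parameters touched per token in a mixture-of-experts model:
    the routed experts plus the shared attention/embedding layers
    that run regardless of routing."""
    return n_active_experts * expert_params_b + shared_params_b

# Assuming ~5.5b routed params per expert and ~2b shared (illustrative),
# 2 active experts land near 13b -- hence 13b-class token speed even
# though all 8 experts must sit in memory.
print(moe_active_params_b(2, 5.5, 2.0))  # → 13.0
```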

    A 70b model streams at my slowest tenable reading pace.

    Both of these options are far more capable than any of the smaller model sizes, even if you screw around with training. Unfortunately, this streaming speed is still pretty slow for most advanced agentic stuff. Maybe if I had 24 to 48 GB of VRAM it would be different; I cannot say.

    If I were building now, I would be looking at which hardware options have the largest L1 cache and the most cores with the most advanced AVX instructions. Generally, anything with efficiency cores drops the advanced AVX instructions, and because the CPU schedulers in kernels are usually unable to handle that asymmetry, consumer junk has poor AVX support. It is quite likely that many of the problems Intel has had in recent years have been due to how they tried to block consumer parts from accessing the advanced P-core instructions, which were only blocked in microcode. Using them requires disabling the e-cores or setting up CPU set isolation on Linux or BSD.
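
    One lightweight way to get that CPU-set isolation, short of disabling e-cores in firmware or using `isolcpus` at boot, is per-process affinity. A minimal Linux-only sketch using Python's `os.sched_setaffinity` — the idea of "the first N logical CPUs being P-cores" is an assumption; check `lscpu --extended` for your actual topology:

```python
import os

def pin_to_cores(core_ids):
    """Restrict the current process (pid 0 = self) to the given CPU ids,
    e.g. only the P-cores, so the scheduler never lands AVX-heavy
    inference threads on an e-core. Returns the resulting affinity set.
    Linux-only: sched_setaffinity is not available on macOS/Windows."""
    os.sched_setaffinity(0, set(core_ids))
    return os.sched_getaffinity(0)

# Illustrative: pin to the first available logical CPU; on a 12700 you
# would instead pass the full list of P-core sibling ids.
available = sorted(os.sched_getaffinity(0))
print(pin_to_cores(available[:1]))
```

    The same effect is achievable with `taskset -c` or cgroup cpusets without touching code; `isolcpus` at boot is the heavier hammer the comment alludes to.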

    You need good Linux support even if you run Windows. Most good and advanced stuff with AI will be done in WSL if you haven't ditched doz for whatever reason. Use https://linux-hardware.org/ to check device support.

    The reason I mention avoiding consumer e-cores is that articles have been popping up lately about all-P-core hardware.

    The main constraint for the CPU is the L2 to L1 cache bus width. Researching this deeply may be beneficial.

    Splitting the load between multiple GPUs may be an option too. As of a year ago, the cheapest way to get a 16 GB GPU in a machine was a second-hand 12th-gen Intel laptop with a 3080Ti, by a considerable margin once everything is added up. It is noisy, gets hot, and I have hated it many times, wishing I had gotten a server-style setup for AI, but I have something, and that is what matters.









  • There are several ways. Whitelists are a PITA, especially for the fediverse. One could argue they are even more valuable on the fediverse, as every computer you connect to implies some degree of trust.

    The cheapest method is to set up the firewall on your local machine. The issue is that several applications are capable of bypassing this type of local implementation unless you get really into the weeds and use stuff like SELinux to restrict file permissions based on the file's access context, in addition to PAM users/groups. Fedora is the only desktop distro I know of that ships with SELinux integrated and running, but regular user processes run unconfined by default, so it is not actually restricting much for this purpose unless you set up all of the access contexts.
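
    To make the local-whitelist idea concrete, here is a hedged sketch that renders an outbound-whitelist nftables ruleset as text. The table and chain names and the address are placeholders; review before applying with `nft -f`, since a default-drop output policy cuts off everything not listed, including DNS:

```python
def build_whitelist_ruleset(allowed_addrs):
    """Render an nftables ruleset that drops all outbound traffic except
    loopback, already-established flows, and explicitly whitelisted
    destination addresses. Returns the ruleset as plain text."""
    lines = [
        "table inet egress {",
        "  chain output {",
        "    type filter hook output priority 0; policy drop;",
        '    oif "lo" accept',
        "    ct state established,related accept",
    ]
    for addr in allowed_addrs:
        lines.append(f"    ip daddr {addr} accept")
    lines += ["  }", "}"]
    return "\n".join(lines)

# Placeholder documentation address (RFC 5737), not a real endpoint.
print(build_whitelist_ruleset(["203.0.113.10"]))
```

    A rule set like this is exactly what a containerized or misbehaving app can sidestep if it gains enough privilege, which is why the router-level firewall discussed below is the stronger position.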

    If you are ever interested in this rabbit hole and struggling to wrap your head around the complexity, grab an old phone that is compatible with LineageOS, install it, but do not add a root binary. Then start playing around with the underlying Linux system over the USB ADB bridge. Try figuring out the proprietary modules for the SoC, because it is fun. Also try making a script, and try adding or changing any files while you are there. Even if you know bash and GNU/Linux, the BusyBox-like implementation of the minimal Unix commands is an interesting experience if you have not tried it. If you're used to commands like compgen, you'll quickly see you need your own script, or an aliased command that parses the PATH locations, just to see all available commands.

    You will quickly begin to see how a fully configured SELinux implementation works with user/group permissions if you try to make any kind of persistent script in this environment. This is how and why Android is secured well enough for idiot users to connect all kinds of high-risk stuff like financials: that Linux system is very locked down unless you know and understand a CVE for the specific orphaned kernel.
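
    The compgen point can be made concrete: enumerating available commands is just walking PATH for executables. A sketch in Python (on the actual device it would be a shell loop over `$PATH`, since Python is not present there):

```python
import os

def list_commands(path_env=None):
    """Enumerate executable names found on PATH -- a stand-in for bash's
    `compgen -c`, which BusyBox-style shells typically lack."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    names = set()
    for directory in path_env.split(os.pathsep):
        if not os.path.isdir(directory):
            continue  # skip missing or inaccessible PATH entries
        for entry in os.listdir(directory):
            full = os.path.join(directory, entry)
            if os.path.isfile(full) and os.access(full, os.X_OK):
                names.add(entry)
    return sorted(names)
```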

    The thing that sucks is that this area is generally difficult to wrap your head around. I have not found a single FOSS project that makes a whitelist firewall easy. There are a couple of options among the OpenWrt add-on packages, but these are very limited and still not easy to deal with. The last one I tried only allowed around 500 entries, was slow to parse the list, and was buggy to edit.

    The better option is to run the firewall on the router, because your devices can't effectively bypass it from inside things like Docker or Podman.

    For me, I like pcWRT. It is a guy in Texas selling routers preconfigured with OpenWrt and his own replacement front end that makes pretty much all of the advanced OpenWrt features easy. He also automatically updates and maintains the device. It seemed sketchy to me at first, but it has actually worked out well. You need to keep occasional backups, though, especially with a whitelist, as major updates have wiped my setup a couple of times.

    I also modify all of my routers with a CH340 or FT232 USB-to-UART module, hot-glued in place with a little hole for access from the outside. This gives bootloader-level access to the chip and to the kernel logs. Most of what pcWRT runs is done in scripts that can be audited over this connection. There is one binary file; skimming it in Vi, I did not see any web-address strings, though strings are present, so it is not encrypted. I certainly could have missed something, but I did run the same version of OpenWrt on an identical router and compared the file systems side by side.

    The only nuance with the firewall in pcWRT is to create multiple profiles and make the default profile block everything. Then you only need to enable the whitelist and add each address with the port number you want to connect to. The interface also has full Pi-hole-like functionality, which might be another option. The hardware is reasonably priced too, at around $100-$150 last time I checked.


  • I block it, along with Google static, fonts, captcha, everything. Sites that rely on that JavaScript are the only things that become annoying or nonfunctional. I will never enable anything for a site that is generalized and nonspecific. Very little of the total internet actually breaks from blocking all of this nonsense, stalkerware, and ambiguity. I have multiple reasons for blocking like this. Primarily, my bad scripts and code cannot escape to nmap the internet, and I will never again have a sketchy PDF datasheet download for vintage hardware dial out. Your firewall is like the front door of your digital home. You can live with none if you choose. I like to know who is in my house.




  • It is dogma mixed with the unfamiliar. It is very similar to how people who do not GAF about me and my physical disability like to say I'm fine and just lazy, or that it is all in my head. That is the most destructive bullshit anyone can say to me, but I have learned to translate it as them saying "I am a worthless homicidal fuck that wants to kill you," and that seems to help keep things in perspective. There are a whole lot of stupid people in the world. Average is a very low bar, and half of the world is below it.

    “Never argue with stupid people. They bring you down to their level, and beat you with experience.”

    The best part of that quote is when they agree and take it as a compliment.



    I blocked NSQ because it has an active bot as a mod.

    Lemmy in general does not handle conceptual abstractions well at all. I think it is great to question seemingly obvious subjects and to poll user depth and intelligence regularly. I hate getting blindsided by someone asking a stupid question like this in real life and having to take the time to think out which of many angles I would like to address it from. I find it useful and healthy to see how others address such a question and how people respond to the various approaches. This is fundamental to the intuitive usefulness of NSQ, and when that utility is hampered, it effectively renders the community useless.

    I rather ineffectively volunteered to take over the community myself when I encountered poor moderation from a bot with no accountable individual to address. Instead, I block the community and consider its existence an embarrassment.