• 0 Posts
  • 7 Comments
Joined 1 year ago
cake
Cake day: July 30th, 2023

help-circle
  • If you’re trying to do VDI in the cloud, that can get expensive fast on account of the GPU processing needed

    Most of the protocols I know of the run CPU-only (and I’m perfectly happy to be proven wrong and introduced to something new) tend to fray at high latency or high resolution. The usual top two I’ve seen are VNC and RDP (XRDP project on Linux), with NoMachine and plain x11 over SSH being right behind that. I think NoMachine had the best performance of those three, but it’s been a hot minute since I’ve personally used it. XRDP is the one I’ve used the most often, but getting login/lock/unlock working was fiddly at first but seems to be stable holding.

    Jumping from the “basic connection, maybe barely but not always suitable for video” to “ultra high grade high speed”, we have Parsec and Sunshine+Moonlight. Parsec is currently limited to only Windows/Mac hosting (with Linux client available), and both Parsec and Sunshine require or recommend a reasonable GPU to handle the encoding stage (although I believe Sunshine may support an X264 encoder which may exert a heavy CPU tax depending on your resolution). The specific problem of sourcing a GPU in the cloud (since you mention EC2) becomes the expensive part. This class of remote access tends to fray at high resolution and frame rate less because it’s designed to transport video and games, rather than taking shortcuts to get a minimum desktop visible.





  • I think you’re asking too much from ZFS. Ceph, Gluster, or some other form of cluster native filesystem (GFS, OCFS, Lustre, etc) would handle all of the replication/writes atomically in the background instead of having replication run as a post processor on top of an existing storage solution.

    You specifically mention a gap window - that gap window is not a bug, it’s a feature of using a replication timer, even if it’s based on an atomic snapshot. The only way to get around that gap is to use different tech. In this case, all of those above options have the ability to replicate data whenever the VM/CT makes a file I/O - and the workload won’t get a write acknowledgement until the replication has completed successfully. As far as the workload is concerned, the write just takes a few extra milliseconds compared to pure local storage (which many workloads don’t actually care about)

    I’ve personally been working on a project to convert my lab from ESXi vSAN to PVE+Ceph, and conversions like that (even a simpler one like PVE+ZFS to PVE+Ceph would require the target disk to be wiped at some point in the process.

    You could try temporarily storing your data on an external hard drive via USB, or if you can get your workloads into a quiet state or maintenance window, you could use the replication you already have and rebuild the disk (but not the PVE OS itself) one node at a time, and restore/migrate the workload to the new Ceph target as it’s completed.

    On paper, (I have not yet personally tested this), you could even take it a step farther: for all of your VMs that connect to the NFS share for their data, you could replace that NFS container (a single point of failure) with the cluster storage engine itself. There’s not a rule I know of that says you can’t. That way, your VM data is directly written to the engine at a lower latency than VM -> NFS -> ZFS/Ceph/etc



  • My server rack has

    • 3x Dell R730
    • 1x Dell R720
    • 2x Cisco Catalyst 3750x (IP Routing license)
    • 2x Netgear M4300-12x12f
    • 1x Unifi USW-48-Pro
    • 1x USW-Agg
    • 3x Framework 11th Gen (future cluster)
    • 1x Protectli FE4B

    All together that draws… 0.1 kWh… in 0.327s.

    In real time terms, measured at the UPS, I have a running stable state load of 900-1100w depending on what I have at load. I call it my computationally efficient space heater because it generates more heat than is required for my apartment in winter except for the coldest of days. It has a dedicated 120v 15A circuit