• 1 Post
  • 4 Comments
Joined 2 years ago
Cake day: June 15th, 2023

  • Great answer!!

    After thinking about all this for a while, I’ve gone with a basic binary tree (leaning towards an AVL tree, as I expect my use case to be read-heavy).

    In my use case, multiple ‘intervals’ can merge together without major penalty (and should be merged together). It looks like a lot of these interval trees (including PH-trees) are best when the intervals need to be kept separate.

    There is a part of my algorithm where PH-trees might be useful, though. I’ll have to give it some thought.


    I’m kind of shocked that a basic binary tree ended up being so usable. It’s a classic for a reason, lol. I guess I saw the intervals and got confused and overcomplicated things… (rough sketch of the merge-on-insert idea below)
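
    For what it’s worth, here’s a minimal sketch of that merge-on-insert idea, using std::map as the balanced BST (typically red-black rather than AVL, but the same ordered, read-heavy access pattern). The names IntervalMap and insert_interval are just illustrative:

        #include <algorithm>
        #include <cstdint>
        #include <iterator>
        #include <map>

        // Ordered map of start -> end (inclusive), kept non-overlapping by
        // merging on insert. (Sketch only; ignores the end == UINT64_MAX edge case.)
        using IntervalMap = std::map<uint64_t, uint64_t>;

        void insert_interval(IntervalMap& m, uint64_t start, uint64_t end) {
            // Find the last existing interval starting at or before 'start'.
            auto it = m.upper_bound(start);
            if (it != m.begin()) {
                auto prev = std::prev(it);
                if (prev->second + 1 >= start)   // overlaps or touches [start, end]
                    it = prev;
            }
            // Absorb every interval that overlaps or touches the new one.
            while (it != m.end() && it->first <= end + 1) {
                start = std::min(start, it->first);
                end   = std::max(end, it->second);
                it = m.erase(it);
            }
            m.emplace(start, end);
        }

    Lookups are then just an upper_bound plus one comparison, which fits the read-heavy case nicely.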


  • And typical RAM speeds are 100GB/second for CPUs and 500GB/second on GPUs, meaning 512MB operations are literally on the order of 5 milliseconds on CPU and 1ms on GPU.

    Below certain sizes, those ‘billions of intervals’ take up more memory than the damn bitmask. Seriously, at 8 bytes per interval (aka one pointer and 0 data), that’s 8GB for the data structure.

    Instead of storing a billion 32-bit intervals (4GB of RAM at the minimum), it’s obviously a better move to store a 500-million-byte bitmask. And modern GPUs can crush that in parallel with like 3 lines of CUDA anyway (sketch below).
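
    For example, something like this (kernel name and launch geometry are just for illustration): the union of two interval sets becomes a word-wise OR over the flat masks.

        #include <cstddef>
        #include <cstdint>

        // Word-wise OR of two bitmasks: the union of two interval sets in one pass.
        __global__ void or_masks(uint32_t* dst, const uint32_t* src, size_t n_words) {
            size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
            if (i < n_words) dst[i] |= src[i];
        }

        // A 2^32-bit mask is 128M 32-bit words (512MB):
        //   size_t n_words = (1ull << 32) / 32;
        //   or_masks<<<(n_words + 255) / 256, 256>>>(d_dst, d_src, n_words);

    A pass like this is purely memory-bandwidth-bound, which is where the millisecond-scale numbers above come from.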


  • Because CUDA and ROCm/HIP are far easier to program.

    The Khronos competitor to CUDA/ROCm is SYCL, not OpenCL.


    SYCL vs these other options is a fun theoretical debate, but only Intel seems to be pushing SYCL at all. OpenCL got stuck at OpenCL 1.2 (the 2.0 release was effectively dead; OpenCL 3.0+ makes the 2.0 features optional, but it’s too late, and OpenCL is seen as dead-end tech these days).

    The biggest issue is that OpenCL is a separate language, while CUDA/HIP/SYCL are ‘just’ C++ extensions. This means that if you ever share data between CPU and GPU in OpenCL (or DirectX or Vulkan, for that matter), you have to carefully write and rewrite your structs to keep the layouts lined up between the two languages.

    Meanwhile, CUDA/HIP support passing structs, classes and more between CPU and GPU (subject to conditions of course: you can’t pass function pointers or vtables across that boundary, for example, but CPU-only classes can still have vtables). A tiny sketch of that is below.
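
    The names here (Interval, total_length) are made up for illustration, but the point is that one struct definition, with a __host__ __device__ method, compiles for both sides and can be memcpy’d across as-is:

        #include <cstdio>
        #include <cuda_runtime.h>

        // One struct definition shared by host and device code; no second copy
        // of the layout to keep in sync, unlike OpenCL C vs host C++.
        struct Interval {
            unsigned int lo, hi;
            __host__ __device__ unsigned int length() const { return hi - lo; }
        };

        __global__ void total_length(const Interval* iv, int n, unsigned long long* out) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) atomicAdd(out, (unsigned long long)iv[i].length());
        }

        int main() {
            Interval h[3] = {{0, 10}, {20, 25}, {40, 100}};
            Interval* d_iv; unsigned long long* d_out;
            cudaMalloc(&d_iv, sizeof(h));
            cudaMalloc(&d_out, sizeof(unsigned long long));
            cudaMemcpy(d_iv, h, sizeof(h), cudaMemcpyHostToDevice);  // same layout on both sides
            cudaMemset(d_out, 0, sizeof(unsigned long long));
            total_length<<<1, 32>>>(d_iv, 3, d_out);
            unsigned long long total = 0;
            cudaMemcpy(&total, d_out, sizeof(total), cudaMemcpyDeviceToHost);
            printf("total length = %llu\n", total);  // 10 + 5 + 60 = 75
            cudaFree(d_iv); cudaFree(d_out);
        }

    Try doing that with an OpenCL C kernel string and you’re back to maintaining two copies of the struct by hand.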