• 1 Post
  • 4 Comments
Joined 2 years ago
Cake day: June 15th, 2023

  • Great answer!!

    After thinking about all this for a while, I’ve gone with a basic binary tree (leaning towards an AVL tree, as I expect my use case to be read-heavy).

    In my use case, multiple ‘intervals’ can merge together without major penalty (and should be merged together). It looks like a lot of these interval trees (including PH-trees) are best when the intervals need to be kept separate.

    There is a part of my algorithm where PH-trees might be useful, though. I’ll have to give it some thought.


    I’m kind of shocked that a basic binary tree ended up being so usable. It’s a classic for a reason, lol. I guess I saw the intervals and got confused and overcomplicated things… (rough sketch of the merge-on-insert idea below)
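
    For what it’s worth, here’s a minimal sketch of that merge-on-insert idea, using std::map as the balanced BST (typically red-black rather than AVL, but the same ordered, read-heavy access pattern). The names IntervalMap and insert_interval are just illustrative:

        #include <algorithm>
        #include <cstdint>
        #include <iterator>
        #include <map>

        // Ordered map of start -> end (inclusive), kept non-overlapping by
        // merging on insert. (Sketch only; ignores the end == UINT64_MAX edge case.)
        using IntervalMap = std::map<uint64_t, uint64_t>;

        void insert_interval(IntervalMap& m, uint64_t start, uint64_t end) {
            // Find the last existing interval starting at or before 'start'.
            auto it = m.upper_bound(start);
            if (it != m.begin()) {
                auto prev = std::prev(it);
                if (prev->second + 1 >= start)   // overlaps or touches [start, end]
                    it = prev;
            }
            // Absorb every interval that overlaps or touches the new one.
            while (it != m.end() && it->first <= end + 1) {
                start = std::min(start, it->first);
                end   = std::max(end, it->second);
                it = m.erase(it);
            }
            m.emplace(start, end);
        }

    Lookups are then just an upper_bound plus one comparison, which fits the read-heavy case nicely.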


  • And typical RAM speeds are 100GB/second for CPUs and 500GB/second on GPUs, meaning 512MB operations are literally on the order of 5 milliseconds on CPU and 1ms on GPU.

    Below certain sizes, those ‘billions of intervals’ take up more memory than the damn bitmask. Seriously, at 8 bytes per interval (aka one pointer and 0 data), that’s 8GB for the data structure.

    Instead of storing a billion 32-bit intervals (4GB of RAM at the minimum), it’s obviously a better move to store a 500-million-byte bitmask. And modern GPUs can crush that in parallel with like 3 lines of CUDA anyway (sketch below).
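
    For example, something like this (kernel name and launch geometry are just for illustration): the union of two interval sets becomes a word-wise OR over the flat masks.

        #include <cstddef>
        #include <cstdint>

        // Word-wise OR of two bitmasks: the union of two interval sets in one pass.
        __global__ void or_masks(uint32_t* dst, const uint32_t* src, size_t n_words) {
            size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
            if (i < n_words) dst[i] |= src[i];
        }

        // A 2^32-bit mask is 128M 32-bit words (512MB):
        //   size_t n_words = (1ull << 32) / 32;
        //   or_masks<<<(n_words + 255) / 256, 256>>>(d_dst, d_src, n_words);

    A pass like this is purely memory-bandwidth-bound, which is where the millisecond-scale numbers above come from.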


  • Because CUDA and ROCm/HIP are far easier to program.

    The Khronos competitor to CUDA/ROCm is SYCL, not OpenCL.


    SYCL vs these other options is a fun theoretical debate, but only Intel seems to be pushing SYCL at all. OpenCL got stuck at OpenCL 1.2 (the 2.0 release was effectively dead; OpenCL 3.0+ makes the 2.0 features optional, but it’s too late, and OpenCL is seen as dead-end tech these days).

    The biggest issue is that OpenCL is a separate language, while CUDA/HIP/SYCL are ‘just’ C++ extensions. This means that if you ever share data between CPU and GPU in OpenCL (or DirectX or Vulkan, for that matter), you have to carefully write and rewrite your structs to keep the layouts lined up between the two languages.

    Meanwhile, CUDA/HIP support passing structs, classes and more between CPU and GPU (subject to conditions of course: you can’t pass function pointers or vtables across that boundary, for example, but CPU-only classes can still have vtables). A tiny sketch of that is below.
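
    The names here (Interval, total_length) are made up for illustration, but the point is that one struct definition, with a __host__ __device__ method, compiles for both sides and can be memcpy’d across as-is:

        #include <cstdio>
        #include <cuda_runtime.h>

        // One struct definition shared by host and device code; no second copy
        // of the layout to keep in sync, unlike OpenCL C vs host C++.
        struct Interval {
            unsigned int lo, hi;
            __host__ __device__ unsigned int length() const { return hi - lo; }
        };

        __global__ void total_length(const Interval* iv, int n, unsigned long long* out) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) atomicAdd(out, (unsigned long long)iv[i].length());
        }

        int main() {
            Interval h[3] = {{0, 10}, {20, 25}, {40, 100}};
            Interval* d_iv; unsigned long long* d_out;
            cudaMalloc(&d_iv, sizeof(h));
            cudaMalloc(&d_out, sizeof(unsigned long long));
            cudaMemcpy(d_iv, h, sizeof(h), cudaMemcpyHostToDevice);  // same layout on both sides
            cudaMemset(d_out, 0, sizeof(unsigned long long));
            total_length<<<1, 32>>>(d_iv, 3, d_out);
            unsigned long long total = 0;
            cudaMemcpy(&total, d_out, sizeof(total), cudaMemcpyDeviceToHost);
            printf("total length = %llu\n", total);  // 10 + 5 + 60 = 75
            cudaFree(d_iv); cudaFree(d_out);
        }

    Try doing that with an OpenCL C kernel string and you’re back to maintaining two copies of the struct by hand.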