That's $2000, but for just 3.5-4.25 tokens/s? I'm hesitant to say 4 tokens/s is useless, but that's a tremendous downgrade (although perhaps some smaller model would still be usable).
Right, but that's CPU only; there are no tensor cores on a GPU getting lit up for those 4 t/s. So the minimum to actually run DeepSeek is $2000, and the max is basically whatever you can afford, based on your needs. If you're only running single prompts at any given time, you only need enough GPUs to fit the model plus the context (as I mentioned), and even at that minimum your outlay is going to be on the order of $130,000 in just GPUs. Rough math is sketched below.
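To make the "model plus context" sizing concrete, here's a back-of-the-envelope sketch in Python. Every number in it (parameter count, bytes per weight, KV-cache allowance, per-card VRAM, and especially the price per card) is an illustrative assumption, not a quote, so treat the output as ballpark only:

```python
import math

def gpus_needed(params_b=671,            # DeepSeek-V3/R1-class parameter count, in billions (assumed)
                bytes_per_param=1.0,     # ~1 byte/param at FP8 (assumed quantization)
                kv_cache_gb=100,         # rough allowance for context + activations (assumed)
                vram_per_gpu_gb=80,      # e.g. an 80 GB data-center card
                price_per_gpu_usd=20_000):  # illustrative price, not a real quote
    """Back-of-the-envelope: how many GPUs to hold the weights plus context, and what that costs."""
    weights_gb = params_b * bytes_per_param       # 671B params * 1 byte ≈ 671 GB of weights
    total_gb = weights_gb + kv_cache_gb           # weights plus room for the context you want
    n_gpus = math.ceil(total_gb / vram_per_gpu_gb)
    return n_gpus, n_gpus * price_per_gpu_usd

if __name__ == "__main__":
    n, cost = gpus_needed()
    print(f"~{n} GPUs, on the order of ${cost:,} in cards alone")
```

Plug in whatever card and quant you're actually looking at; the point is just that the card count is driven by weights plus context, and the dollar figure scales directly with it.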
If I can find it later (I couldn't find it last night when I replied), there's an article that explains how to start adding consumer GPUs, or even one or two Nvidia A100 80GB cards, to the EPYC build to speed that up. I have a vague recollection that it can get you up to 20 t/s or thereabouts, but don't quote me on that; it's been a while.
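I can't speak for exactly what that article did, but the usual mechanism is partial layer offload: keep most of the model in system RAM on the EPYC box and push as many transformer layers as fit onto the GPU. A minimal sketch with llama-cpp-python, where the model path and layer count are placeholders and the right layer count depends entirely on your quant and your card:

```python
# pip install llama-cpp-python (built with CUDA support)
from llama_cpp import Llama

llm = Llama(
    model_path="/models/deepseek-r1-q4.gguf",  # hypothetical local GGUF quant, not a real path
    n_gpu_layers=20,   # number of layers to offload to the GPU; raise it until VRAM runs out
    n_ctx=8192,        # context window to reserve
)

out = llm("Explain KV-cache sizing in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```

The win comes from the offloaded layers running on the GPU while the rest stays on the CPU, so the speedup scales with how many layers you can cram into VRAM.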