For others not as familiar, this is pointing out that DeepSeek-V3/DeepSeek-R1 are natively FP8, so selecting "Q8_0" is effectively the same as not quantizing those models at all (though you'll need ~1 TB of memory to run them unquantized at full context). Importantly, this does not apply to the "DeepSeek" distills of other models, which natively retain the precision of the base model they were distilled from.
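
For a sense of where that ~1 TB figure comes from, here's a rough back-of-envelope sketch. The 671B total parameter count is DeepSeek-V3's published figure; the 1 TB budget and the KV-cache/overhead split are just assumptions for illustration and vary by inference engine:

```python
# Rough back-of-envelope for the "~1 TB to run it unquantized" figure.
# Assumptions (not measurements): 671B total parameters, native FP8 weights
# (1 byte each), and an assumed 1 TB total memory budget on top of that.

total_params = 671e9      # DeepSeek-V3/R1 total parameter count (published)
bytes_per_weight = 1      # native FP8 -> 1 byte per weight

weights_gb = total_params * bytes_per_weight / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB")   # ~671 GB

# Whatever remains in a 1 TB budget goes to the KV cache at full context,
# activation buffers, and runtime overhead; the exact split depends on the
# engine and how it handles DeepSeek's MLA KV-cache compression.
budget_gb = 1000
print(f"Headroom for KV cache + overhead: ~{budget_gb - weights_gb:.0f} GB")
```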
I expect more and more worthwhile models to natively have <16-bit weights as time goes on, but for the moment it's pretty much "8-bit DeepSeek and some research/testing models of various parameter widths".
I wish the DeepSeek distills were somehow branded differently. The amount of confusion I've come across from otherwise technical folk, or the outright mislabeling ("I'm running R1 on my MacBook!"), is shocking. It's my new pet peeve.