You don't need 16-bit precision. The accuracy drop from quantizing most models to 8-bit is less than 5%.
Even 4-bit is fine.
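If you want to check the quality hit yourself, one common route (not the only one) is 4-bit NF4 loading through bitsandbytes in Hugging Face transformers. A minimal sketch, assuming you have transformers, accelerate, and bitsandbytes installed and enough VRAM; the model id is just an example, swap in whatever you actually want to test:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "meta-llama/Llama-2-7b-hf"  # example id; any causal LM works

    # 4-bit NF4 weights; matmuls still run in 16-bit compute.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",
    )

    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))

Run the same prompts at 16-bit and 4-bit and compare; for most chat-style use the outputs are hard to tell apart.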
To be more precise, it's not that there's no decrease in quality; it's that the RAM savings let you fit a much better model. E.g. with LLaMA, if you start with the 70B and quantize it more and more aggressively, it still performs considerably better at 3-bit than LLaMA 33B running at 8-bit.
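A back-of-the-envelope weight-memory estimate (params × bits ÷ 8, ignoring KV cache, activations, and quantization-group overhead) shows why the trade works out: the 3-bit 70B actually needs less RAM than the 8-bit 33B. A quick sketch:

    # Rough weight memory in GB: params * bits-per-weight / 8 bytes.
    # Ignores KV cache, activations, and per-group quantization overhead.
    def weight_gb(n_params_billion: float, bits_per_weight: float) -> float:
        return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

    print(f"70B @ 16-bit: {weight_gb(70, 16):.1f} GB")  # ~140 GB
    print(f"33B @  8-bit: {weight_gb(33, 8):.1f} GB")   # ~33 GB
    print(f"70B @  3-bit: {weight_gb(70, 3):.1f} GB")   # ~26 GB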
True. The one problem with lower-bit quantization, though, is that the model starts failing to follow long prompts.