For coding and other integrations you pay per token via an API key, not a subscription. Claude Code costs a few $ per task on your code - it gets expensive quite quickly.
But API access to something comparable to a locally hosted model in the 32-70B range costs pennies on the dollar compared to Claude, will be 50x faster than your GPU, and comes with a much larger context window.
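To make "pennies on the dollar" concrete, here's a back-of-envelope cost calculator. All per-token prices and token counts below are illustrative assumptions, not actual published rates - check current provider pricing before relying on the numbers.

```python
# Rough per-task API cost estimate. Prices are HYPOTHETICAL placeholders,
# expressed in USD per 1M tokens as (input, output).
PRICES = {
    "frontier-model": (3.00, 15.00),   # assumed frontier-tier pricing
    "hosted-32b":     (0.20, 0.60),    # assumed budget-tier hosted model
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for one task at the assumed rates."""
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

# An agentic coding task can easily churn through hundreds of thousands of
# input tokens (repeated context re-reads) and tens of thousands of output.
for model in PRICES:
    print(f"{model}: ${task_cost(model, 200_000, 20_000):.3f} per task")
```

At these assumed rates the frontier model lands around $0.90 per task versus roughly $0.05 for the cheaper hosted model, which is the gap being described.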
Local hosting on GPU only really makes sense if you're doing many hours of training/inference daily.
...or working for a company which forbids sending IP over the wire to somewhere.
Also, "many hours of inference daily" may just mean doing your usual work while some background processing job runs for hours or days, or having put together some reactive automation that fires constantly.
ps. local training rarely makes sense.
ps. 2. not sure where you got 50x slower from; a 4090 is actually faster than an A100, for example, and a 5090 is ~75% faster than a 4090