pornel 8 days ago

AI will be cheap to run.

The hardware for AI is getting cheaper and more efficient, and the models are getting less wasteful too.

Just a few years ago, GPT-3.5 was secret sauce running on the most expensive GPU racks; now models beating it are available with open weights and run on high-end consumer hardware. A few iterations down the line, good-enough models will run on average hardware.

When that XCOM game came out, filmmaking, 3D graphics, and machine learning required super-expensive hardware out of reach of most people. Now you can find objectively better hardware literally in the trash.

cardanome 8 days ago

I wouldn't be so optimistic.

Moore's law is withering away due to physical limitations. Energy prices are going up because of the end of cheap fossil fuels and rising climate-change costs. Furthermore, the global supply chain is under threat from rising geopolitical tensions.

Depending on US tariffs, how the Taiwan situation plays out, and many other risks, compute might get MORE expensive in the future.

While there is room for optimization on the generative-AI front, we still have not reached the point where generative AI is actually good at programming. We have promising toys, but for real productivity we need models that are orders of magnitude bigger. Just look at how GPT-4.5 is already barely economically viable at its price per token.

Sure, if humanity survives long enough to deploy fusion energy widely, compute might become practical and cheap again, but that will be a long and rocky road.

pornel 8 days ago

LLMs on GPUs have a lot of computational inefficiencies and untapped parallelism. GPUs were designed for more diverse workloads with much smaller working sets. LLM inference is ridiculously DRAM-bound: we currently have 10×-200× more compute available than the DRAM bandwidth can feed. Even without improvements in transistors, we can get more efficient hardware for LLMs.
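As a back-of-the-envelope sketch of that imbalance (all numbers are assumed, roughly H100-class hardware and an 8B fp16 model, not measurements):

    # Single-stream decoding must stream every weight from DRAM once
    # per token, so memory bandwidth caps throughput long before the
    # ALUs run out. All figures below are rough, assumed values.
    PARAMS = 8e9           # 8B-parameter model
    BYTES_PER_PARAM = 2    # fp16 weights
    DRAM_BW = 3.35e12      # ~3.35 TB/s HBM (H100-class)
    PEAK_FLOPS = 990e12    # ~990 TFLOPS dense bf16 on the same card

    weight_bytes = PARAMS * BYTES_PER_PARAM
    tokens_bw = DRAM_BW / weight_bytes          # ~209 tok/s, bandwidth-bound
    tokens_compute = PEAK_FLOPS / (2 * PARAMS)  # ~62k tok/s if compute-bound

    print(f"bandwidth limit: {tokens_bw:.0f} tok/s")
    print(f"compute limit:   {tokens_compute:.0f} tok/s")
    print(f"idle compute:    {tokens_compute / tokens_bw:.0f}x")

On those numbers the compute surplus at batch size 1 is roughly 300×; batching amortizes the weight reads across requests, which is how the ratio lands in the 10×-200× range in practice.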

The way we use LLMs is also primitive and inefficient. RAG is a hack, and in most LLM architectures the cost of attention grows quadratically with the context length (with the KV cache growing linearly on top of that), in a workload that is already DRAM-bound, on hardware that already doesn't have enough RAM.
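For concreteness, a minimal sketch of both scaling behaviours, using an assumed Llama-3-8B-like shape (32 layers, 8 KV heads of dimension 128, fp16):

    # Each decoded token appends one K and one V vector per layer per
    # KV head, so the cache grows linearly with context length, while
    # the total attention work (each new token scoring all previous
    # tokens) grows quadratically.
    LAYERS, KV_HEADS, HEAD_DIM, BYTES = 32, 8, 128, 2

    def kv_cache_bytes(context_len: int) -> int:
        per_token = LAYERS * KV_HEADS * HEAD_DIM * 2 * BYTES  # K and V
        return context_len * per_token

    for n in (8_192, 131_072):
        print(f"{n:>7} tokens: {kv_cache_bytes(n) / 2**30:.0f} GiB of KV cache")
    # 8192 tokens -> 1 GiB; 131072 tokens -> 16 GiB, per sequence.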

> Depending on US tariffs […] end of fossil fuels […] global supply chain

It does look pretty bleak for the US.

OTOH China is rolling out more than a gigawatt of renewables a day, has the largest and fastest-growing HVDC grid, a dominant position in battery and solar production, and all the supply chains. With the US going back to mercantilism and isolationism, China is going to have Taiwan too.

joshjob42 8 days ago

Costs for a given amount of intelligence, as measured by various benchmarks, have been falling by 4-8x per year for a couple of years, largely from smarter models from better training at a given size. I think there's still a decent amount of headroom there, and, as others have mentioned, dedicated inference chips are likely to be significantly cheaper than running inference on GPUs. I would expect to see Gemini 2.5 Pro levels of capability in models that cost <$1/Mtok by late next year, or plausibly sooner.
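Compounding those quoted rates from an illustrative starting price shows how quickly the trend bites, if it holds:

    # Purely illustrative: price after n years at a 4x-8x annual decline.
    start = 10.0  # $/Mtok, assumed starting price
    for rate in (4, 8):
        for year in (1, 2):
            print(f"{rate}x/yr, year {year}: ${start / rate**year:.2f}/Mtok")
    # 4x/yr: $2.50 then $0.62; 8x/yr: $1.25 then $0.16 -- i.e. under
    # $1/Mtok within one to two years from a $10/Mtok start.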

jamil7 8 days ago

I think there’s a huge amount of inefficiency all the way through the software stack, due to decades of cheap energy and rapidly improving hardware. I would expect that, with hardware and energy constraints, we will need to look for deeper optimisations in software.