jxjnskkzxxhx 8 days ago

I've used JAX quite a bit and it's so much better than TF/PyTorch.

Now for the life of me, I still haven't been able to understand what a TPU is. Is it Google's marketing term for a GPU? Or is it something entirely different?

mota7 8 days ago

There's basically a difference in philosophy. GPU chips have a bunch of cores, each of which is semi-capable, whereas TPU chips have (effectively) one enormous core.

So GPUs have ~120 small systolic arrays, one per SM (aka a tensor core), plus passable off-chip bandwidth (aka 16 lanes of PCIe).

Whereas TPUs have one honking big systolic array, plus large amounts of off-chip bandwidth.

This roughly translates to GPUs being better if you're doing a bunch of different small-ish things in parallel, but TPUs are better if you're doing lots of large matrix multiplies.
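
A minimal JAX sketch of the two workload shapes (the sizes are made up, just to contrast the access patterns):

    import jax
    import jax.numpy as jnp

    # One big matmul: keeps a TPU's single large systolic array busy.
    a = jnp.ones((8192, 8192), dtype=jnp.bfloat16)
    b = jnp.ones((8192, 8192), dtype=jnp.bfloat16)
    big = jax.jit(jnp.matmul)(a, b)

    # Many small independent matmuls: a better fit for a GPU's ~100+ SMs.
    xs = jnp.ones((128, 64, 64), dtype=jnp.bfloat16)
    ys = jnp.ones((128, 64, 64), dtype=jnp.bfloat16)
    small = jax.jit(jax.vmap(jnp.matmul))(xs, ys)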

317070 8 days ago

Way back when, most of a GPU was for graphics. Google decided to design a completely new chip, which focused on the operations for neural networks (mainly vectorized matmul). This is the TPU.

It's not a GPU, as there is no graphics hardware there anymore. Just memory and very efficient cores, capable of doing massively parallel matmuls on the memory. The instruction set is tiny, basically only capable of doing transformer operations fast.
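
For what it's worth, from the software side a TPU just shows up as another XLA backend. On a Cloud TPU VM, something like this runs unmodified (a sketch, assuming JAX with TPU support is installed):

    import jax
    import jax.numpy as jnp

    # On a TPU host this lists the TPU cores, e.g. [TpuDevice(id=0), ...].
    print(jax.devices())

    @jax.jit  # XLA compiles this for whatever backend is present
    def layer(x, w):
        # matmul + ReLU: the kind of op the TPU's matrix unit is built for
        return jnp.maximum(jnp.dot(x, w), 0)

    x = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
    print(layer(x, x).shape)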

Today, I'm not sure how much graphics an A100 GPU still can do. But I guess the answer is "too much"?

kcb 8 days ago

Less and less with each generation. The A100 has 160 ROPs, a 5090 has 176, the H100 and GB100 have just 24.

JLO64 8 days ago

TPUs (short for Tensor Processing Units) are Google's custom AI accelerator hardware, completely separate from GPUs. I remember Google introduced them around 2015, but I imagine they're really starting to pay off with Gemini.

https://en.wikipedia.org/wiki/Tensor_Processing_Unit

jxjnskkzxxhx 8 days ago

Believe it or not, I'm also familiar with Wikipedia. It says that they're optimized for low precision and high throughput. To me this sounds like a GPU with a specific optimization.

flebron 8 days ago

Perhaps this chapter can help? https://jax-ml.github.io/scaling-book/tpus/

It's a chip (and associated hardware) that can do linear algebra operations really fast. XLA and TPUs were co-designed, so as long as what you are doing is expressible in XLA's HLO language (https://openxla.org/xla/operation_semantics), the TPU can run it, and in many cases run it very efficiently. TPUs have different scaling properties than GPUs (think sparser but much larger communication), no graphics hardware inside them (no shader hardware, no raytracing hardware, etc), and a different control flow regime ("single-threaded" with very-wide SIMD primitives, as opposed to massively-multithreaded GPUs).
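
You can actually inspect the IR that XLA receives, straight from JAX (a small sketch; the function here is just an illustrative example):

    import jax
    import jax.numpy as jnp

    def f(x, w):
        return jax.nn.relu(x @ w)  # matmul + nonlinearity

    x = jnp.ones((128, 512))
    w = jnp.ones((512, 256))

    # Print the lowered (Stable)HLO that XLA compiles; as long as your
    # program lowers to this, the TPU can run it.
    print(jax.jit(f).lower(x, w).as_text())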

jxjnskkzxxhx 7 days ago

Thank you for the answer! You see, up until now I had never appreciated that a GPU does more than matmuls... And that first reference, what a find :-)

Edit: And btw, another question I'd had before was what the difference is between a tensor core and a GPU; based on your answer, my speculative answer would be that the tensor core is the part inside the GPU that actually does the matmuls.

jibal 8 days ago

You asked a question, people tried to help, and you lashed out at them in a way that makes you look quite bad.

kgwgk 8 days ago

Did you also read just after that "without hardware for rasterisation/texture mapping"? Does that sound like a _G_PU?

crazygringo 8 days ago

I mean, yes. But GPUs also have a specific optimization, for graphics. This is a different optimization.