Believe it or not, I'm also familiar with Wikipedia. It says they're optimized for low-precision, high-throughput computation. To me that sounds like a GPU with a specific optimization.
Perhaps this chapter can help? https://jax-ml.github.io/scaling-book/tpus/
It's a chip (and associated hardware) that can do linear algebra operations really fast. XLA and TPUs were co-designed, so as long as what you are doing is expressible in XLA's HLO language (https://openxla.org/xla/operation_semantics), the TPU can run it, and in many cases run it very efficiently. TPUs have different scaling properties than GPUs (think sparser but much larger communication), no graphics hardware inside them (no shader hardware, no raytracing hardware, etc), and a different control flow regime ("single-threaded" with very-wide SIMD primitives, as opposed to massively-multithreaded GPUs).
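To make "expressible in XLA's HLO" concrete, here's a minimal JAX sketch (my own illustration; the toy "layer" function and the shapes are arbitrary): jax.jit lowers a plain Python function to HLO, and you can print the module XLA will compile for whichever backend is attached.

    # Minimal sketch: lower a function to XLA HLO and inspect it.
    import jax
    import jax.numpy as jnp

    def layer(x, w):
        # A matmul plus an elementwise op -- both are ordinary HLO operations.
        return jnp.tanh(x @ w)

    x = jnp.ones((128, 256))
    w = jnp.ones((256, 512))

    # jax.jit lowers the Python function to HLO; XLA then compiles that
    # HLO for whatever backend is attached (CPU, GPU, or TPU).
    print(jax.jit(layer).lower(x, w).as_text())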
Thank you for the answer! You see, up until now I had never appreciated that a GPU does more than matmuls... And that first reference, what a find :-)
Edit: And btw, another question I'd had before was what the difference is between a tensor core and a GPU. Based on your answer, my speculative answer would be that the tensor core is the part inside the GPU that actually does the matmuls.
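To check my own understanding with a small sketch (my example; the dtype and shapes are arbitrary): my understanding is that on an NVIDIA GPU, XLA will typically dispatch a low-precision matmul like this to tensor-core kernels, while the elementwise tanh runs on the general-purpose SIMT cores.

    import jax.numpy as jnp

    a = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
    b = jnp.ones((1024, 1024), dtype=jnp.bfloat16)

    # On a GPU backend this matmul is typically lowered to kernels that
    # use the tensor cores; the tanh runs on the ordinary CUDA cores.
    c = jnp.tanh(a @ b)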
You asked a question, people tried to help, and you lashed out at them in a way that makes you look quite bad.
Did you also read just after that "without hardware for rasterisation/texture mapping"? Does that sound like a _G_PU?
I mean yes. But GPUs also have a specific optimization, for graphics. This is a different optimization.