From a data center perspective:
The article says this is about "sustained output token generation". For sustained usage, power is a huge factor in "real world performance". The H100 has a peak power draw of 700W, while each RTX 5090 has a peak power draw of 575W, for a total of 1150W.
According to the article, the H100 gets 78 tokens per second versus 80 tokens per second for the dual RTX 5090s. So you take on an extra 450W of power draw in exchange for only two extra tokens per second.
Long story short, there is a reason data centers aren't using dual RTX 5090s over the H100. For sustained usage, you will pay for it in electricity, in the extra infrastructure to support that increased electrical draw, and in the extra heat generation and cooling.
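To put numbers on it, here's a back-of-the-envelope efficiency comparison using the figures above; this assumes peak power draw as a proxy for sustained draw, which (as noted downthread) it may not be:

```python
# Throughput and peak power from the article / comment above.
h100_tps, h100_watts = 78, 700
dual_5090_tps, dual_5090_watts = 80, 2 * 575  # two cards at 575 W each

# Tokens per second per watt = tokens per joule.
h100_eff = h100_tps / h100_watts
dual_5090_eff = dual_5090_tps / dual_5090_watts

print(f"H100:        {h100_eff:.3f} tok/J")   # ~0.111
print(f"2x RTX 5090: {dual_5090_eff:.3f} tok/J")  # ~0.070
```

By this (crude, peak-power) measure the H100 delivers roughly 60% more tokens per joule.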
Might make sense for a local personal hobby setup though.
> Might make sense for
Individual or small-team use, personal or professional, when 32GB of VRAM (or 32+32) is sufficient, at a cost of $3.5k (or $3.5k + $3.5k) instead of $25k.
The RTX 5090 has a VRAM cost of ~$110/GB; the H100, ~$310/GB.
(And even professional use in a small team will probably not have the card run at full throttle and peak consumption all day, outside NN training projects.)
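The $/GB figures fall straight out of the prices quoted above; a quick sketch (assuming an 80GB H100, which is what the ~$310/GB figure implies):

```python
# Price per GB of VRAM from the list prices quoted above.
rtx5090_per_gb = 3500 / 32    # 32 GB card at ~$3.5k
h100_per_gb = 25000 / 80      # assumed 80 GB H100 at ~$25k

print(f"RTX 5090: ${rtx5090_per_gb:.0f}/GB")  # ~$109/GB
print(f"H100:     ${h100_per_gb:.0f}/GB")     # ~$312/GB
```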
I would be surprised if the 5090 uses more power per FLOP; peak power isn't always representative. After all, almost the entirety of the power goes into matrix multiplication, and that depends mostly on the process node and the architecture version, and the 5090 is ahead on both.