That only works for inference, not training.
Why so?
Because training usually requires bigger batches, a backward pass on top of the forward pass, optimizer states kept in memory, and so on. That means it needs far more GPU memory than inference, so much more that you can't fit it on a single GPU.
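To make that concrete, here's a rough back-of-envelope sketch (the per-parameter byte counts are assumptions for mixed-precision training with Adam, and activations aren't even counted) of how the memory compares:

```python
# Rough memory estimate for a 7B-parameter model (illustrative numbers).
# Assumptions: fp16 weights for inference; mixed-precision training with
# Adam, which keeps fp16 weights + fp16 gradients + fp32 master weights
# + two fp32 Adam moments per parameter (~16 bytes/param).

params = 7e9  # 7 billion parameters

inference_bytes = params * 2                    # fp16 weights only
training_bytes = params * (2 + 2 + 4 + 4 + 4)   # weights, grads, master copy, Adam m and v

gib = 1024 ** 3
print(f"Inference weights: ~{inference_bytes / gib:.0f} GiB")  # ~13 GiB
print(f"Training states:   ~{training_bytes / gib:.0f} GiB")   # ~104 GiB
```

Even before activations, that's far beyond the 24 GB of a high-end consumer card, which is why the training states end up spread across many GPUs.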
If you're training on more than one GPU, the speed at which you can exchange data between them suddenly becomes your bottleneck. To alleviate that, you need an extremely fast, direct GPU-to-GPU interconnect, something like NVLink, and consumer GPUs don't provide that.
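As a rough illustration of why the interconnect matters (the bandwidth figures below are assumed ballpark peaks, not measurements), in data-parallel training every GPU has to exchange the full set of gradients on every step:

```python
# Back-of-envelope time to exchange gradients each training step.
# Assumptions: 7B parameters, fp16 gradients (2 bytes each), and rough
# peak bandwidths: ~32 GB/s for PCIe 4.0 x16 vs ~600 GB/s for NVLink
# on datacenter GPUs. Real all-reduce traffic and throughput differ,
# but the ratio is the point.

grad_bytes = 7e9 * 2  # fp16 gradients for a 7B-parameter model

for name, bandwidth_gb_s in [("PCIe 4.0 x16", 32), ("NVLink", 600)]:
    seconds = grad_bytes / (bandwidth_gb_s * 1e9)
    print(f"{name}: ~{seconds:.2f} s per gradient exchange")
```

If the compute for a step takes only a fraction of a second, spending nearly half a second moving gradients over PCIe dominates the step time, while NVLink keeps the exchange in the tens of milliseconds.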
Even if you could train on a single GPU, you probably wouldn't want to, because of the sheer amount of time it would take.
But does this prevent a cluster of consumer GPUs from being used for training? Or does it just make it slower and less efficient?
Those are genuine questions, not rhetorical ones.