Item 44002285

riffraff • 1 day ago

But doesn't this mean they have twice the costs in training? I was under the impression that was still the most expensive part of these companies' balance sheets.

kcorbitt • 1 day ago

It's very unlikely that they're doing their own pre-training, which is the longest and most expensive part of creating a frontier model (if they were, they'd likely brag about it).

Most likely they built this as a post-train of an open model that is already strong on coding, like Qwen 2.5.

rfoo • 21 hours ago

Mid/post-training does not cost that much, except maybe large-scale RL, but even that is more of an infra problem. If anything, the cost is mostly in running various experiments (i.e. the process of doing research).

It is very puzzling why "wrapper" companies don't (and religiously say they won't ever) do something on this front. The only barrier is talent.

anshumankmr • 21 hours ago

You might be underestimating the barrier to hiring the really smart people. OpenAI, Google, etc. would be hiring and poaching people like crazy, offering cushy bonuses and TCs that would blow your mind (like, say, Noam Brown at OpenAI). And some of the more ambitious ones would start their own ventures (like, say, Ilya).

That being said, I am sure a lot of the so-called wrapper companies are paying insanely well too, but competing with FAANGMULA might be trickier for them.

whywhywhywhy • 20 hours ago

Any half-decent, methodical software engineer can fine-tune/repurpose a model if they have the data and the money to burn on compute and experiment runs, which they do.

anshumankmr • 18 hours ago

Fine-tuning, distilling, etc. are fine. I was speaking to the original commenter's point about research, which is where things are trickier. Fine-tuning is something even I have managed, and Unsloth has removed even more barriers to training some of the more commonly used open-source models.

brookst • 18 hours ago

They can absolutely do it, but they will get poorer results than someone who really understands LLMs. There is still a huge amount of taste and art in the sourcing and curation of data for fine-tuning.

NitpickLawyer • 20 hours ago

FAANGMULA ... Microsoft, Uber?, L??, Anthropic? Who's the L?

riffraff • 15 hours ago

A is Airbnb, afair.

Archonical • 20 hours ago

Lyft.