Seems that given enough compute everyone can build a near-SOTA LLM. So what is this craze about securing AI dominance?
AI dominance is secured through legal and regulatory means, not technical methods.
So for instance, a basic strategy is to rapidly develop AI and then say “Oh wow AI is very dangerous we need to regulate companies and define laws around scraping data” and then make it very difficult for new players to enter the market. When a moat can’t be created, you resort to ladder kicking.
I believe in China they have been trying to make all data usable as training data
https://www.forbes.com/councils/forbestechcouncil/2024/04/18...
> everyone
Let's not disrespect the team working on Qwen, these folks have shown that they are able to ship models that are better than everybody else's in the open weight category.
But fundamentally yes, OpenAI has no other moat than the ChatGPT trademark at this point.
They have the moat of being able to raise larger funding rounds than everybody else: access to capital.
Many of these labs have more funding in theory than OpenAI. FAIR, GDM, and Qwen are all subsidiaries of companies with tens of billions of dollars in annual profits.
Do they have more access to capital than the CCP, if the latter decided to put its efforts behind Alibaba on this? Genuine question.
Maybe some truth here, but Microsoft also didn't lead their latest round, which isn't a great sign for their moat.
But access to capital is highly dependent on how interesting you look to investors.
If you don't manage to create a technological gap when you are better funded than your competitors, then your attractiveness will start being questioned. They have squandered their "best team" asset with internal drama, and now that they see their technological lead being demolished by competitors, I'm not too convinced about their prospects for a new funding round unless they show that they can make money from the consumer market, which is where their branding is an unmatched asset (in which case it's not even clear that investing in having the state-of-the-art model is a good business decision).
> But fundamentally yes, OpenAI has no other moat than the ChatGPT trademark at this point.
That's like saying that Coca-Cola has no other moat than the Coca-Cola trademark.
That's an extremely powerful moat to have indeed.
And perhaps exclusive archival content deals from publishers – but that probably works only in an American context.
It just shows that they're unimaginative and good at copying.
What’s wrong with copying?
If they can only copy, which I'm not saying is the case, then their progress would be bounded by whatever the leader in the field is producing.
In much the same way with an LLM, if it can only copy from its training data, then it's bounded by the output of humans themselves.
1) spreading AI dominance FUD is a good way to get government subsidies
2) not everyone with compute can make LLMs; they also need data. Conveniently, the U.S. has been supplying infinite tokens to China through TikTok.
>Conveniently, the U.S. has been supplying infinite tokens to China through Tiktok
How is this not FUD? What competitive advantage is China seeing in LLM training through dancing videos on TikTok?
you get video tokens through those seemingly dumb tiktok shorts
Of all the types of tokens in the world video is not the one that comes to mind as having a shortage.
By setting up a few thousand security cameras in various high-traffic places you can get almost infinite footage.
Instagram, YouTube and Snapchat have no shortage of data either.
Except: 1) TikTok is video stream data many orders of magnitude larger than any security-cam data, and it's attached to real identities; 2) China doesn't have direct access to Instagram Reels and Shorts, so yeah.
Why does tying it to identity help LLM training?
It's pretty unclear that having orders of magnitude more video data of dancing is useful. Diverse data is much more useful!