simonw 6 days ago

Big day for open source Chinese model releases - DeepSeek-v3-0324 came out today too, an updated version of DeepSeek v3 now under an MIT license (previously it was a custom DeepSeek license). https://simonwillison.net/2025/Mar/24/deepseek/

echelon 6 days ago

Pretty soon I won't be using any American models. It'll be a 100% Chinese open source stack.

The foundation model companies are screwed. Only shovel makers (Nvidia, infra companies) and product companies are going to win.

jsheard 6 days ago

I still don't get where the money for new open source models is going to come from once setting investor dollars on fire is no longer a viable business model. Does anyone seriously expect companies to keep buying and running thousands of ungodly expensive GPUs, plus whatever they spend on human workers to do labelling/tuning, and then giving away the spoils for free, forever?

Imustaskforhelp 6 days ago

I think it's market leadership, which is basically free word-of-mouth advertising, which can then lead to a consulting business, or maybe they can sneak some ads into the LLM directly, oh boy you don't know.

Also I have seen that once an open source LLM is released to the public, even though you can access it on any website hosting it, most people still prefer to use it from the company that created the model.

Deepseek released its revenue figures and they're crazy good.

And no, they didn't have full racks of H100s.

Also one more thing. Open source has always had an issue of funding.

Also, they are not completely open source, they are just open weights. Yes, you can fine-tune them, but from my limited knowledge there are some limitations to fine-tuning, so keeping the training data proprietary also supports my earlier idea about consulting.

Yes, it's not a hugely profitable venture; imo it's just a decently profitable one, but the current hype around AI is making it lucrative for companies.

Also, I think this might be a winner-takes-all market, which increases competition, but in a healthy way.

What Deepseek did, releasing the open source model and then going out of their way to release some other open source projects that could themselves have been worth a few million (bycloud said it), helps advance AI in general.

TeMPOraL 5 days ago

Winner-takes-all markets are never healthy IMO - it's hardly a market when the winner took all.

What I love about "open" models in general, and Deepseek in particular, is how they undermine that market. The Deepseek drops especially were fun to watch; they were like last-minute plot twists, like dropping some antibiotic into a petri dish filled with bacteria. Sorry, try again with a better moat.

"Open" models are in fact the very thing enabling having a functioning market in this space.

pants2 6 days ago

There are lots of open-source projects that took many millions of dollars to create. Kubernetes, React, Postgres, Chromium, etc. etc.

This has clearly been part of a viable business model for a long time. Why should LLM models be any different?

wruza 6 days ago

So funny to see React among these projects. Tells a story about “frontend” on its own.

mitthrowaway2 6 days ago

Maybe from NVIDIA? "Commoditize your product's complement".

https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/

akra 6 days ago

This is the reason IMO. Fundamentally China right now is better at manufacturing (e.g. robotics). AI is the complement to this - AI increases the demand for tech manufactured goods. Whereas America is in the opposite position w.r.t which side is their advantage (i.e. the software). AI for China is an enabler into a potentially bigger market which is robots/manufacturing/etc.

Commoditizing the AI/intelligence part means that the main advantage isn't the bits - it's the atoms. Physical dexterity, social skills and manufacturing skills will gain more of a comparative advantage versus intelligence work in the future as a result - AI makes the old economy new again in the long term. It also lowers the value of AI investments, in that they can no longer command first-mover/monopoly-like pricing for what is a very large capex cost, undermining US investment in what is their advantage. As long as it is strategic, it doesn't necessarily need to be economic on its own.

notarealllama 6 days ago

A well-rounded take in an age and medium of reactionary hot takes!

While there are some synergistic effects... I think the physical manufacturing and logistics base is harder to develop than deploying a new model, and will be the hard leading edge. (That's why the US seems to be hellbent on destroying international trade to try and build a domestic market.)

WiSaGaN 5 days ago

This may make sense if there were a centralized force dictating how much these Chinese foundational model companies charge for their models. I know that in the West people just blanketly believe the state controls everything in China. However, it couldn't be further from the truth. Most of the Chinese foundational model companies, like Moonshot, 01.ai, Minimax, etc., used to try to make money on those models. The VC money raised by those companies is in them to make money, not to voluntarily advance state competitiveness. Deepseek is just an outlier backed by a billionaire. This billionaire had long been giving money to various charities, hundreds of millions per year, before Deepseek. Open-source SOTA models are not an out-of-character move for him given his track record.

The thing is, a model is in effect a piece of software with almost zero marginal cost. You just need a few, maybe even one, company to release SOTA models consistently to really crash the valuation of every model company, because everyone can acquire that single piece of software at no cost and leave the other model companies to fend for themselves. The foundational model scene is basically in an extremely unstable state, ready to collapse to the stable state where model cost goes to zero. You really don't need the state-competition assumption to explain the current state of affairs.

akra 4 days ago

I'm not saying there is a centralised force - I didn't say the government per se. It's enough to say that for many of the models coming out of China, the AI portion isn't the main income source, especially for the major models people are hyping up (Qwen, DeepSeek, etc). This model (Qwen) from Alibaba is a side model, more likely complementing their main business and cloud offerings. DeepSeek started as a way to use AI for trading models first, then spun this up on the side. I'm speaking more about China's general position - for them AI seems to be more of a complement than the main business, compared say to the major AI labs in America (ex Google). My opinion is that robotics in particular just extends that going forward.

Given that, as you say, the long-term marginal cost of AI models is zero, I don't think this is a bad position to be in.

zamadatix 6 days ago

Once setting investment dollars on fire is no longer viable, it'll probably be because scaling died anyway, so what's the rush to have a dozen new frontier models each year?

pizzly 6 days ago

One possibility: certain countries will always be able to produce open models cheaper than others. The USA and Europe probably won't be able to. However, due to national security and wanting to promote their models overseas instead of letting competitors promote theirs, the governments of the USA and Europe will subsidize models, which will lead their competitors to (further?) subsidies. There is a promotional aspect as well; just like with Hollywood, governments will use their open source models to promote their ideology.

energyrace 6 days ago

What's your take on why certain countries will have it cheaper, and on subsidies being at the forefront? An energy-driven race to the bottom is perhaps what you mean? From what I've seen, China is ahead of the rest of the world on its renewables plan, and it still has the lead in coal energy, so it would likely be the winner on that front. But did you actually mean something else?

pizzly 5 days ago

Energy is definitely a major factor, but there are other factors too: cheaper infrastructure (data centers), cheaper components including GPUs (once that is cracked), and cheaper data collection (web scraping, surveillance infrastructure, etc). Any novel idea that improves model architectures in the future will inadvertently get leaked quickly, and then all these other factors come into play. Countries that cannot make models this cheaply will subsidize models for national security reasons and to promote their country's interests.

pzo 5 days ago

The problem with China is that they will have to figure out latency. Right now DeepSeek models hosted in China have very high latency. It could be because of DDoS and insufficient infrastructure, but probably also because of the Great Firewall, runtime censoring of prompts, and the servers' physical location (big ping to US and EU countries).

bigfudge 5 days ago

Surely ping time is basically irrelevant dealing with LLMs? It has to be dwarfed by inference time.

rfoo 5 days ago

> Right now DeepSeek models hosted in china are having very high latency.

If you are talking about DeepSeek's own hosted API service, it's because they deliberately decided to run the service in heavily overloaded conditions, with a very aggressive batching policy, to extract more out of their (limited) H800s.

Yes, for some reason (the reason I heard is "our boss don't want to run such a business", which sounds absurd, but /shrug) they refuse to scale up serving their own models.

tw1984 5 days ago

> the reason I heard is "our boss don't want to run such a business" which sounds absurd

Liang gave up the No.1 Chinese hedge fund position to create AGI; he has a very good chance to short the entire US stock market and pocket some stupid amount of $ when R2 is released, and he has pretty much unlimited support from local and central Chinese government. Trying to make some pennies from hosting models is not going to sustain what he enjoys now.

rfoo 4 days ago

tbh the "short the stock market" story is pretty silly, it wasn't predictable at all. but yeah, the guy gets to do whatever he wants to do now.

finnjohnsen2 6 days ago

ads again. somehow. it's like a law of nature.

api 6 days ago

If nationalist propaganda counts as ads, that might already be supporting Chinese models. Ask them about Tiananmen Square.

Any kind of media with zero or near-zero copying/distribution costs becomes a deflationary race to the bottom. Someone will eventually release something that's free, and at that point nothing can compete with free unless it's some kind of very specialized offering. Then you run into the problem the OP described: how do you fund free? Answer: ads. Now the customer is the advertiser, not the user/consumer, which is why most media converges on trash.

Imustaskforhelp 6 days ago

These ads can also have ad blockers though.

Perplexity released DeepSeek R1 1776 (I am not sure, I forgot the exact name). It basically removes the Chinese censorship; yes, you can ask it about Tiananmen Square.

I think the next iteration of these AI model ads will be sneakier, which might make them hard to remove.

Though it's funny you comment about chinese censorship yet american censorship is fine lol

Zambyte 6 days ago

There are lots of "alliterated" versions of models too, which is where people essentially remove the model's ability to refuse to respond to a prompt. The huihui r1 14b alliterated had some trouble telling me about Tiananmen Square, basically dodging the question by telling me about itself, but after some coaxing I was able to get the info out of it.

I say this because I think the Perplexity model is tuned on additional information, whereas the alliterated models only include information trained into the underlying model, which is interesting to see.

bigfudge 5 days ago

Abliterated? Alliterated LLMs might be fun though…

Zambyte 4 days ago

Oops, yeah I don't know how that got autocorrected three times without my noticing. Abliterated.

eMPee584 6 days ago

XAI to the rescue!!1!

... (no, not the unintelligible one - the xplainable one)

otabdeveloper4 5 days ago

Big business and state actors don't want AI to be weaponized as economic terrorism. (Economic terrorism aka "we'll replace all your workers and infra with our subscription" is OpenAI's entire sales pitch.)

So for them this is a case of insurance and hedging risks, not profit making.

theptip 6 days ago

Yeah, this is the obvious objection to the doom. Someone has to pay to train the model that all the small ones distill from.

Companies will have to detect and police distilling if they want to keep their moat. Maybe you have to have an enterprise agreement (and arms control waiver) to get GPT-6-large API access.

colechristensen 6 days ago

I think the only people who will ever make money are the shovel makers; the models will always be free, because you'll just get open source models chasing the paid ones and never being all that far behind, especially once this S-curve growth phase slows down.

lumost 5 days ago

Product and infra companies may continue to open these models by virtue of needing to keep improving their product. An omni chat app is a great product.

natch 6 days ago

Many sources, Chinese government could be one.

ada1981 6 days ago

Money from the Chinese defense budget?

Everyone using these models undercuts US companies.

Eventually China wins.

elicksaur 6 days ago

Shoot, didn’t know downloading Llama and running it locally was helping China because I’m not paying Sam Altman money.

Can I send him my bank account info directly? I need to help the cause.

otabdeveloper4 5 days ago

> Can I send him my bank account info directly?

You can. Ask your friendly local IRS.

Imustaskforhelp 6 days ago

And we, the end users, get open source models.

Also, China doesn't have access to that many GPUs because of the US chip export controls.

And I hate it, I hate it when America sounds more communist than China, which open sources its stuff because of free markets.

I actually think that more countries need to invest in AI, not just companies wanting profit.

This could be the decision that can impact the next century.

greenavocado 6 days ago

If only you knew how many terawatt hours were burned on biasing models to prevent them from becoming racist

Imustaskforhelp 6 days ago

To be honest, maybe I am going off topic, but I wish the energy industry had the level of innovation we see in the AI industry.

As an outsider, it feels like very little progress is being made on the energy issue. I genuinely think that AI could be accelerated so much more if energy were cheaper / greener.

wenyuanyu 5 days ago

The cycle from idea to product is too long and too costly in the energy sector, and that determines the speed of innovation.

bee_rider 6 days ago

China has allowed quite a bit of market liberalism, so it isn’t that surprising if their AI stuff is responding to the market.

But, I don’t really see the connection on the flip side. Why should proprietary AI be associated with communism? If anything I guess a communist handling of AI would also be to share the model.

Imustaskforhelp 5 days ago

My reasoning for associating proprietary AI with communism is that they aren't competing in a free-market way, where everyone does one thing and does it best. They are trying to do all things internally at once.

For example, ChatGPT etc. self-host on their own GPUs and can generate 10 tk/s or something.

Now there exist Groq and Cerebras, who can do token generation at 4000 tk/s, but they kind of require an open source model.

So that is why I feel it's not really abiding by the true capitalist philosophy.

dragonwriter 5 days ago

> My reasoning for proprietary AI to be associated with communism is that they aren't competing in a free market way where everyone does one thing and do its best.

That seems based on a very weird idea of what capitalism and communism are; idealized free markets have very little to do with the real-world economic system for which the name "capitalism" was coined, and dis-integration where "everyone does one thing" has little to do with either capitalism or free markets. It might be a convenient assumption for 101-level discussions of market competition, where you want to avoid dealing with real-world issues like partially-overlapping markets and imperfect substitutes, to assume every good exists in an isolated market of goods which compete only and exactly with the other goods in that same market in a simple way.

bee_rider 5 days ago

It seems to me like they are acting like true capitalists; they seem very happy with the idea that capital (rather than labor) gives them the right to profit. But, they don’t seem to be too attached to free-market-ism.

Imustaskforhelp 5 days ago

I mean, how is a free and open source model not free-market, at least in the AI world?

refulgentis 6 days ago

I've been waiting since November for 1, just 1*, model other than Claude that can reliably do agentic tool call loops. As long as the Chinese open models are chasing reasoning and benchmark-maxxing vs. mid-2024 US private models, I'm very comfortable with somewhat ignoring these models.

(this isn't idle prognostication hinging on my personal hobby horse. I got skin in the game, I'm virtually certain I have the only AI client that is able to reliably do tool calls with open models in an agentic setting. llama.cpp got a massive contribution to make this happen, and the big boys who bother, like ollama, are still using a dated json-schema-forcing method that doesn't comport with recent local model releases that can do tool calls. IMHO we're comfortably past the point where products using these models can afford to focus on conversational chatbots; that's cute, but a commodity to give away per standard 2010s SV thinking)

* OpenAI's can but are a little less...grounded?...situated? i.e. it can't handle "read this file and edit it to do $X". Same-ish for Gemini, though, sometimes I feel like the only person in the world who actually waits for the experimental models to go GA, as per letter of the law, I shouldn't deploy them until then

anon373839 5 days ago

A bit of a tangent, but what're your thoughts on code agents compared to the standard "blobs of JSON" approach? I haven't tried it myself, but it does seem like it would be a better fit for existing LLMs' capabilities.
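(By "code agents" I mean the style where the model writes a short script against an exposed API instead of emitting a structured tool call. A toy contrast with made-up function names, not any specific framework's format:)

    import textwrap

    # "Blobs of JSON" style: the model emits a structured call, the client dispatches it.
    tool_call = {"name": "read_files", "arguments": {"paths": ["notes.txt"]}}

    # "Code agent" style: the model emits a small program and the client executes it
    # in a namespace that exposes only the allowed host functions.
    generated_code = textwrap.dedent("""
        text = read_files(["notes.txt"])["notes.txt"]
        edit_file("notes.txt", text.replace("TODO", "DONE"))
    """)

    def read_files(paths):  # made-up host functions; assumes notes.txt exists
        return {p: open(p).read() for p in paths}

    def edit_file(path, contents):
        with open(path, "w") as f:
            f.write(contents)

    exec(generated_code, {"read_files": read_files, "edit_file": edit_file})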

cess11 5 days ago

You mean like https://manusai.ai/ is supposed to function?

refulgentis 5 days ago

Yes, exactly, and trivially so: Manus is Sonnet with tools.

cess11 4 days ago

Right. Apparently they also claim it's more than that:

https://xcancel.com/peakji/status/1898997311646437487

refulgentis 3 days ago

No, they don't, that's just a bunch of other stuff (ex. Something something we don't differ from academic papers on agents (???))

throwawaymaths 6 days ago

is there some reason you can't train a 1b model to just do agentic stuff?

anon373839 6 days ago

The Berkeley Function Calling Leaderboard [1] might be of interest to you. As of now, it looks like Hammer2.1-3b is the strongest model under 7 billion parameters. Its overall score is ~82% of GPT-4o's. There is also Hammer2.1-1.5b at 1.5 billion parameters that is ~76% of GPT-4o.

[1] https://gorilla.cs.berkeley.edu/leaderboard.html

refulgentis 6 days ago

Worth noting:

- Those are single-turn scores: at multi-turn, 4o is 3x as good as the 3b

- BFCL is generally about turning natural language into an API call; multi-turn then involves making another API call.

- I hope to inspire work towards an open model that can eat the paid models sooner rather than later

- trained quite specifically on an agent loop with tools read_files and edit_file (you'll probably also want at least read_directory and get_shared_directories; search_filenames and search_files_text are good too), bonus points for cli_command - a rough sketch of that loop is below

- IMHO, this is much lower-hanging fruit than e.g. training an open computer-vision model, so I beseech thee, intrepid ML-understander, to fill this gap and hear your name resound throughout the age
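For concreteness, here's roughly what that loop looks like - a minimal sketch assuming an OpenAI-compatible chat-completions server that supports tool calls (llama.cpp server, ollama, etc.). The base URL, model name, and tool schemas are placeholders, not anything from a particular product:

    import json
    from pathlib import Path

    from openai import OpenAI

    # Placeholder endpoint/model: any OpenAI-compatible server with tool-call support works.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

    TOOLS = [
        {"type": "function", "function": {
            "name": "read_files",
            "description": "Return the full text of one or more files.",
            "parameters": {"type": "object",
                           "properties": {"paths": {"type": "array",
                                                    "items": {"type": "string"}}},
                           "required": ["paths"]}}},
        {"type": "function", "function": {
            "name": "edit_file",
            "description": "Overwrite a file with new contents.",
            "parameters": {"type": "object",
                           "properties": {"path": {"type": "string"},
                                          "contents": {"type": "string"}},
                           "required": ["path", "contents"]}}},
    ]

    def run_tool(name: str, args: dict) -> str:
        if name == "read_files":
            return json.dumps({p: Path(p).read_text() for p in args["paths"]})
        if name == "edit_file":
            Path(args["path"]).write_text(args["contents"])
            return "ok"
        return f"unknown tool: {name}"

    def agent(prompt: str, model: str = "local-model") -> str:
        messages = [{"role": "user", "content": prompt}]
        while True:
            msg = client.chat.completions.create(
                model=model, messages=messages, tools=TOOLS).choices[0].message
            messages.append(msg)
            if not msg.tool_calls:        # no tool requests left: this is the final answer
                return msg.content
            for call in msg.tool_calls:   # run each requested tool, feed the result back
                result = run_tool(call.function.name,
                                  json.loads(call.function.arguments))
                messages.append({"role": "tool",
                                 "tool_call_id": call.id,
                                 "content": result})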

refulgentis 6 days ago

They're real squished for space, more than I expected :/ good illustration here, Qwen2.5-1.5B trained to reason, i.e. the name it is released under is "DeepSeek R1 1.5B". https://imgur.com/a/F3w5ymp 1st prompt was "What is 1048576^0.05", it answered, then I said "Hi", then...well...

Fwiw, Claude Sonnet 3.5 100% had some sort of agentic loop x precise file editing trained into it. Wasn't obvious to me until I added a MCP file server to my client, and still isn't well-understood outside a few.

I'm not sure on-device models will be able to handle it any time soon because it relies on just letting it read the whole effing file.

Separately...

I say I don't understand why no other model is close, but it makes sense. OpenAI has been focused on reasoning, Mistral, I assume is GPU-starved, and Google...well, I used to work there, so I have to stop myself from going on and on. Let's just say I assume that there wouldn't be enough Consensus Built™ to do something "scary" and "experimental" like train that stuff in.

This also isn't going so hot for Sonnet IMHO.

There's vague displeasure and assumptions it "changed" the last week, but, AFAICT the real problem is that the reasoning stuff isn't as "trained in" as, say, OpenAI's.

This'd be a good thing except you see all kinds of whacky behavior.

One of my simple "read file and edit" queries yesterday did about 60 pages worth of thinking, and the thinking contained 130+ separate tool calls that weren't actually called, so it was just wandering around in the wilderness, reacting to hallucinated responses it never actually got.

Which plays into another one of my hobbyhorses, chat is a "hack" on top of an LLM. Great. So is reasoning, especially in the way Anthropic implemented it. At what point are the abstractions too much, so much that it's unreliable? 3.7 Sonnet may be answering that, because when it fails, all that thinking looks like the agentic loop cooked into Sonnet 3.5. So maybe it's altogether too much to have chat, reasoning, and fully reliable agentic loops...

AlexCoventry 6 days ago

I asked o1-pro what 99490126816810951552*23977364624054235203 is, yesterday. It took 16 minutes to get an answer which is off by eight orders of magnitude.

https://chatgpt.com/share/67e1eba1-c658-800e-9161-a0b8b7b683...

CamperBob2 5 days ago

What in the world is that supposed to prove? Let's see you do that in your head.

Tell it to use code if you want an exact answer. It should do that automatically, of course, and obviously it eventually will, but jeez, that's not a bad Fermi guess for something that wasn't designed to attempt such problems.
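(For reference, the exact product is a one-liner once it's handed to code, which is the kind of tool call being suggested here:)

    # Python integers are arbitrary precision, so this is exact; no model arithmetic involved.
    a = 99490126816810951552
    b = 23977364624054235203
    print(a * b)   # a 40-digit integer, roughly 2.3855e39 as noted downthread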

refulgentis 6 days ago

Sorry, I'm in a rush, could only afford a couple minutes looking at it, but I'm missing something:

Google: 2.385511e+39

Your chat: "Numerically, that's about 2.3855 × 10^39"

Also curious how you think about LLM-as-calculator in relation to tool calls.

AlexCoventry 6 days ago

If you look at the precise answer, it's got 8 too many digits, despite it getting the right number of digits in the estimate you looked at.

> Also curious how you think about LLM-as-calculator in relation to tool calls.

I just tried this because I heard all existing models are bad at this kind of problem, and wanted to try it with the most powerful one I have access to. I think it shows that you really want an AI to be able to use computational tools in appropriate circumstances.

piokoch 5 days ago

"The foundation model companies are screwed." Not really, they can either make API access expensive or resign from exposing APIs and offer their custom products. Open Source models are great, but you need powerful hardware to run them, surely it will not be a smartphone, at least in the nearest future.

Imustaskforhelp 6 days ago

Yes, I believe the same, though of the Western AI world I only believe in Grok, Gemini, or Claude.

Gemini isn't too special; it's actually just comparable to Deepseek, or a bit below it, but it is damn fast, so maybe forget Gemini for serious tasks.

Grok / Gemini can be used as a deep research model, which I think I like? Grok seems to have just taken the Deepseek approach and scaled it with their hyper-massive GPU cluster; I am not sure, I think Grok can also be replaced.

What I truly believe in is Claude.

I am not sure, but Claude really feels good, especially for coding.

For anything else I might use something like Deepseek / Chinese models.

I used cerebras.ai and holy moly they are so fast. I used the Deepseek 70B model, and it is still incredibly fast; my time matters, so I really like the open source way, where companies like Cerebras can focus on what they do best.

I am not sure about Nvidia though. Nvidia seems so connected to Western AI that Deepseek improvements impact Nvidia.

I do hope that Nvidia lowers GPU prices, though I don't think they have much incentive.

buyucu 5 days ago

OpenAI is basically a zombie company at this point. They could not make a profit even when they were the only player in town, it's now a very competitive landscape.

AlexCoventry 6 days ago

IMO, people will keep investing in this because whoever accomplishes the first intelligence explosion is going to have the potential for massive influence over all human life.

fsndz 6 days ago

indeed. open source will win. sam Altman was wrong: https://www.lycee.ai/blog/why-sam-altman-is-wrong

chaosprint 6 days ago

it seems that this free version "may use your prompts and completions to train new models"

https://openrouter.ai/deepseek/deepseek-chat-v3-0324:free

do you think this needs attention?

wgd 6 days ago

That's typical of the free options on OpenRouter, if you don't want your inputs used for training you use the paid one: https://openrouter.ai/deepseek/deepseek-chat-v3-0324

overfeed 6 days ago

Is OpenRouter planning on distilling models off the prompts and responses from frontier models? That's smart - a little gross - but smart.

numlocked 6 days ago

COO of OpenRouter here. We are simply stating that WE can't vouch for the upstream provider's retention and training policy. We don't save your prompt data, regardless of the model you use, unless you explicitly opt in to logging (in exchange for a 1% inference discount).

overfeed 6 days ago

I'm glad to hear you are not hoovering up this data for your own purposes.

simonw 6 days ago

That 1% discount feels a bit cheap to me - if it was a 25% or 50% discount I would be much more likely to sign up for it.

numlocked 6 days ago

We don’t particularly want our customers’ data :)

oofbaroomf 6 days ago

Yeah, but Openrouter has a 5% surcharge anyway.

YetAnotherNick 5 days ago

A better way to state it is a discount of 20% of the surcharge, then :)

vintermann 5 days ago

You clearly want it a little if you give a discount for it?

huijzer 6 days ago

Since we are on HN here, I can highly recommend open-webui with some OpenAI-compatible provider. I've been running it with Deep Infra for more than a year now and am very happy. New models are usually available within one or two days after release. Also have some friends who use the service almost daily.
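If it helps, the "OpenAI-compatible provider" part is just pointing a standard client (or Open WebUI's OpenAI connection) at the provider's base URL. Rough sketch only; the Deep Infra base URL and model id below are from memory, so check their docs:

    from openai import OpenAI

    # Base URL and model id are examples to verify against your provider's docs.
    client = OpenAI(
        base_url="https://api.deepinfra.com/v1/openai",
        api_key="YOUR_DEEPINFRA_API_KEY",
    )

    resp = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3-0324",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(resp.choices[0].message.content)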

l72 6 days ago

I too run openweb-ui locally and use deepinfra.com as my backend. It has been working very well, and I am quite happy with deepinfra's pricing and privacy policy.

I have set up the same thing at work for my colleagues, and they find it better than openai for their tasks.

jychang 5 days ago

Yeah, openweb-ui is the best frontend for API queries. Everything seems to work well.

I've tried LibreChat before, but it's terrible at generating titles for chats, often just leaving them as "New Chat". It also lacks a working Code Interpreter.

unquietwiki 6 days ago

I'm using open-webui at home with a couple of different models. gemma2-9b fits in VRAM on a NV 3060 card + performs nicely.

mdp2021 5 days ago

> performs nicely

Do you have rough indication of token/s ?

zakki 6 days ago

What is the memory of your NV3060? 8GB?

ngvjmfgb 6 days ago

12GB (edit: that is what mine is)

totetsu 6 days ago

And it's quite easy to set up a Cloudflare tunnel to make your open-webui instance accessible online too, just to you.

simonw 6 days ago

... or a Tailscale network. I've been leaving open-webui running on my laptop on my desk and then going out into the world and accessing it from my phone via Tailscale, works great.

totetsu 6 days ago

I would use Tailscale, but I specifically want to use open-webui from a place where I can't install a Tailscale client.

fragmede 5 days ago

where's that?

wkat4242 6 days ago

Yeah this sounds like the more secure option, you don't want to be dependent on a single flaw in a web service

wkat4242 6 days ago

Yeah OpenWebUI is great with local models too. I love it. You can even do a combo, send the same prompt to local and cloud and even various providers and compare the results.

eurekin 5 days ago

I've tried using it, but its browser tab seems to peg one core at 100% after some time. Anyone else experienced this?

indigodaddy 5 days ago

Can open-webui update code on your local computer, a la Cursor etc?

cess11 5 days ago

It has a module system so maybe it can but it seems more people are using Aider or Continue for that. There's a bit of stitching things together regardless of whether you show your project to some SaaS or run local models but if you can manage a Linux system it'll be easy.

Personally I heavily dislike the experience though, so I might not be the best one to answer.

TechDebtDevin 6 days ago

That's because it's a 3rd-party API someone is hosting, trying to arb the infra cost or mine training data, or maybe something even more sinister. I stay away from OpenRouter APIs that aren't served by reputable, well-known companies, and even then...

madduci 5 days ago

As always, avoid using sensitive information and you are good to go

behnamoh 6 days ago

good grief! people are okay with it when OpenAI and Google do it, but as soon as open source providers do it, people get defensive about it...

chaosprint 6 days ago

no. it's nothing to do with deepseek. it's openrouter and providers there

londons_explore 6 days ago

I trust big companies far more with my data than small ones.

Big companies have so much data they won't be having a human look at mine specifically. Some small place probably has the engineer looking at my logs as user #4.

Also, big companies have security teams whose job is securing the data, and it won't be going over some unencrypted link to Cloudflare because OP was too lazy to set up HTTPS certs.

henry2023 6 days ago

Equifax.

jimmygrapes 6 days ago

I'm not convinced any humans have worked there for most of my lifetime.

ilrwbwrkhv 6 days ago

Both of them are better than any American models - for reasoning, agentic use, fine-tuning, etc.

th0ma5 6 days ago

Why do you keep promoting your blog on every LLM post?

rudedogg 6 days ago

I think they didn’t want to rewrite their post. It’s more substantial and researched than any comment here, and all their posts are full of information. I think they should get a pass, and calling it self-promotion is a stretch.

simonw 6 days ago

Because I want people to read it. I only promote it if I think it's useful and relevant.

qingcharles 6 days ago

You're good. Disregard the haters.

th0ma5 5 days ago

I think you need to realize your fans don't have the same intent as you. You should ask your audience what they want; you may be surprised.

jschoe 6 days ago

And it still can't answer this:

Q: "9.11 and 9.9, which one is larger?"

A: "To determine which number is larger between 9.11 and 9.9, let's compare them step by step. Both numbers have the same whole number part: 9. 9.11 has a 1 in the tenths place. 9.9 has a 9 in the tenths place. Since 9 (from 9.9) is greater than 1 (from 9.11), we can conclude that 9.9 is larger than 9.11."

"Final Answer: 9.9"

I don't think anything else needs to be said here.

vbezhenar 6 days ago

But that’s correct. 9.9 = 9.90 > 9.11. Seems that it answered the question absolutely correctly.

javchz 6 days ago

He's using Semantic versioning/s

AuryGlenz 6 days ago

I suggest we’ve already now passed what shall be dubbed the jschoe test ;)

sebastiennight 5 days ago

I will now refer to this as the jschoe test in my writing and publications as well!

It's interesting to think that maybe one of the most realistic consequences of reaching artificial superintelligence will be when its answers start wildly diverging from human expectations and we think it's being "increasingly wrong".

manaskarekar 6 days ago

jschoe's post is actually a Turing test for us. :)

(just kidding jschoe)

declan_roberts 6 days ago

He's Poe's law testing us.

owebmaster 6 days ago

> I don't think anything else needs to be said here.

Will this humbling moment change your opinion?

bongodongobob 6 days ago

Lol, well I guess we've achieved the functional equivalent of AGI, at least for you. Please don't delete your comment.

oefrha 6 days ago

I’ve legit seen a heated online debate with hundreds of comments about this question (maybe not the exact numbers), and I don’t think most participants were memeing. People are that bad at math. It’s depressing.

aurareturn 6 days ago

+1 to Deepseek

-1 to humanity

yencabulator 5 days ago

Based on the presented reasoning, that means humanity wins! Yay!

MiiMe19 6 days ago

Sorry, I don't quite see what is wrong here.

manaskarekar 6 days ago

Parent is thinking Semantic Versioning.

vbezhenar 6 days ago

A semantic version contains 3 numbers.

declan_roberts 6 days ago

One of many pet peeves with semver
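The whole disagreement in one snippet: as decimals 9.9 wins, as dot-separated version components 9.11 wins (a naive hand-rolled parse, no semver library assumed):

    # Decimal reading: 9.9 == 9.90, which is greater than 9.11.
    print(9.9 > 9.11)  # True

    # Version reading: compare dot-separated components as integers, so 11 > 9.
    def version_key(v: str):
        return tuple(int(part) for part in v.split("."))

    print(version_key("9.11") > version_key("9.9"))  # True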

dangoodmanUT 6 days ago

9.9 - 9.11 = 0.79

Might want to check your math? Seems right to me

keyle 6 days ago

9.9 is larger than 9.11. This right here is the perfect example of the Dunning-Kruger effect.

Maybe try rephrasing your question as "which version came later, 9.9 or 9.11".

erichocean 6 days ago

This is hilarious, especially if it's unintentional.

declan_roberts 6 days ago

Poe's law in effect.

cplusplus6382 6 days ago

Answer is correct no?

WithinReason 5 days ago

You just failed the Turing test, now we know you're an LLM.

kwakubiney 6 days ago

But the answer is correct? 9.9 is larger than 9.11

gaoryrt 6 days ago

This makes my day.

sejje 6 days ago

What do you think the answer is?

7734128 5 days ago

16 is obviously larger than both 9.9 and 9.11. AI will never be capable of thinking outside the box like that and find the correct answer.