Did Claude's quality drop recently?

At least subjectively, Claude has been outperforming other LLMs for me for a while.

They seemed to be having significant availability issues this week.

As of this morning things are working for me but the quality of the responses seems to be much worse.

I frequently ask for lists of ideas and then drill down in follow-up prompts. Today Claude keeps responding to my requests for bulleted lists with something like, “I should not use lists unless explicitly requested and should instead write in paragraphs”, and then responds in paragraphs.

Is anyone else having a similar experience?

wqaatwt • 3 days ago

Yes. It seems to be almost incapable of communicating in anything but terse (up to 5 words or so) bullet points.

And even when you force it to write in coherent sentences the output still seems markedly worse than it used to.

blackeyeblitzar • 3 days ago

I wonder if it is falling back to concise mode as a means to handle load?

ativzzz • 2 days ago

I have noticed it's a lot more concise than it was a few weeks ago. I used to glaze over way too much detail in response to my questions; now it's the opposite: too little context.

wqaatwt • 3 days ago

When I switch it back to full I just get longer bullet point lists so perhaps they are doing that silently.

Maxamillion96 • 2 days ago

It explicitly said so to me a few days ago when I queried it.

muzani • 3 days ago

Sounds like it might be switching to Claude Haiku instead of Claude Sonnet.

Sonnet 3.5 always has this issue for me though. It excessively follows the original instructions, even in vague ways. It's likely 3.5 (new) is even worse. We use 3.0 in production because of this one quirk.

soul_grafitti • 1 day ago

Maybe it has to do with preparing for this: https://www.anthropic.com/news/model-context-protocol

devonsolomon • 2 days ago

Addressed on Lex Fridman with Dario Amodei (CEO) and Amanda Askell of Anthropic; they both insist the answer is no.

I interpret their explanation for “no” as follows: these are probabilistic outputs, and so given any changes, for some inputs, some outputs will be worse some of the time.

The argument goes that, given they’re probabilistic, even without changes, for some inputs, some outputs will be worse than the last time you gave it that input, some of the time.

To be fair to them, it makes sense that any change would then be met with some vocal users who are genuinely experiencing worse output, but are not generally using a worse product.
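The sampling argument above can be illustrated with a toy simulation. This is only a sketch under a loud assumption: the "quality score" drawn from a fixed normal distribution is a made-up stand-in for response quality, not anything Anthropic measures, and the function name is hypothetical. With an unchanged model, a retry should score worse than the earlier response roughly half the time.

```python
import random


def fraction_worse_on_retry(trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of prompts whose second response scores lower than the first.

    Both scores come from the same fixed distribution, standing in for an
    unchanged model (a hypothetical toy model of response quality).
    """
    rng = random.Random(seed)
    worse = 0
    for _ in range(trials):
        first = rng.gauss(0.0, 1.0)   # quality of the earlier response
        second = rng.gauss(0.0, 1.0)  # quality of the retry, same "model"
        if second < first:
            worse += 1
    return worse / trials


if __name__ == "__main__":
    print(f"{fraction_worse_on_retry():.3f}")
```

The point is only that "this answer is worse than the one I got last week" is expected often even when nothing changed, so single comparisons say little about whether the product got worse.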

jotjotzzz • 1 day ago

I have never used Claude, and I gave it a try today. It recommended a book that never existed. When I asked to tell me more, it apologized for giving me a fake book recommendation. I'm like, WTF!

philshem • 3 days ago

Yes, there are some reports. For example: https://news.ycombinator.com/item?id=42215912

MeetingsBrowser • 3 days ago

Those are mostly about the availability issues.

I’m not having trouble getting responses as of today, but the quality of the responses seems to be much worse.

wswope • 3 days ago

The underlying implication of the linked comment is that Anthropic is using quantization or similar quality-reducing strategies to help keep their service online due to the same shortage that has been causing availability issues.

ldjkfkdsjnv • 3 days ago

The new model is almost certainly a cheaper version of the older model, where they tried to maintain quality.

quibono • 3 days ago

Interesting, but isn't the older model still available?

Alifatisk • 2 days ago

Maybe through Poe.com?

patrickhogan1 • 3 days ago

Well, first off there is no such thing as Claude, as there are multiple models that you can select from. You did not list which model you were using. In my opinion the Claude 3.5 Sonnet model is spectacular. It’s the best model yet for coding, both on leaderboards and empirically in projects I’ve had it help me with.

This topic is discussed in a recent Lex Fridman interview with the CEO of Anthropic, where he very clearly walks through how these claims of it being dumber are not true. It’s a great interview and after listening to it I’m even more bullish on Anthropic.

There was a small degradation in performance, and they posted an alert about it at the top of the page 2 nights ago. It didn’t affect the quality of the responses I got, but it did cause somewhat of a slowdown in response speed.

MeetingsBrowser • 3 days ago

> Well, first off there is no such thing as Claude as there are multiple models that you can select from.

Apologies, I assumed people would infer that I am referring to 3.5 Sonnet.

> In my opinion the Claude 3.5 Sonnet model is spectacular.

Mine as well, until this morning.

> There was a small degradation in performance … 2 nights ago. It didn’t affect the quality of the responses I got…

Also same, but as of this morning the performance is fine but the quality seems to have gotten worse.

> This topic is discussed in a recent Lex Fridman interview with the CEO of Anthropic, where he very clearly walks through how these claims of it being dumber are not true

Could you elaborate on what was said?

MeetingsBrowser • 3 days ago

I found the interview [1].

TL;DR they don’t change the weights, but they sometimes run A/B tests and modify the system prompt. The underlying model is very sensitive to changes. Even a small change can have broad impacts.

[1]: https://lexfridman.com/dario-amodei-transcript#chapter8_crit...

patrickhogan1 • 3 days ago

I hope you get it figured out!

One thing that has helped me when I can’t quickly get to the expected result is using the Anthropic prompt generator in the dev console.

This isn’t a critique of your prompt—it’s likely solid since you use the system frequently. However, for troubleshooting, the prompt generator can be useful because it creates very long and specific prompts. You can compare the results from your prompt to the ones generated to see where there might be differences.

sk11001 • 3 days ago

I think so, it was bad enough for me to cancel my subscription.

IAmGraydon • 1 day ago

I think it may have something to do with the launch of Windsurf, which uses Sonnet 3.5. Within days of launch, I noticed lots of problems with Anthropic, not only with Sonnet but also with Haiku. I use it for coding assistance and the quality of the code went way down. I was also constantly getting rate limited. Keep in mind that Copilot has also added Sonnet as an available model, causing more problems with overhead.

So basically...yeah, probably growing pains.

JLCarveth • 2 days ago

It's been terrible for me the past two weeks. Every day I get a message about the site being at high-capacity, or I get rate-limited well before the supposed 45 message limit.

Today, Claude's responses have been so error-prone and incorrect that it's quite disappointing; the LLM is now struggling to give correct answers to questions that wouldn't have been a problem previously. For example, it kept insisting that I use `chmod` to take ownership of a directory.

I am seriously considering cancelling my subscription, since the service has only deteriorated since I have subscribed.
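For anyone unsure why the `chmod` suggestion above is wrong: `chmod` only edits permission bits, while ownership is set with `chown`. A minimal sketch using Python's standard `os` module, assuming a POSIX system (`os.chown` is not available on Windows); the function name and the temporary directory are just for illustration.

```python
import os
import stat
import tempfile


def chmod_vs_chown_demo() -> dict:
    """Show that chmod edits permission bits while chown sets the owner."""
    with tempfile.TemporaryDirectory() as path:
        owner_before = os.stat(path).st_uid

        # chmod: changes who may read/write/enter, not who owns the directory.
        os.chmod(path, 0o750)
        after_chmod = os.stat(path)

        # chown: the call that actually sets ownership. Here it is a no-op
        # because we already own the temp dir; taking over someone else's
        # directory would normally need root. -1 leaves the group unchanged.
        os.chown(path, os.getuid(), -1)
        after_chown = os.stat(path)

        return {
            "owner_unchanged_by_chmod": after_chmod.st_uid == owner_before,
            "mode_after_chmod": stat.S_IMODE(after_chmod.st_mode),
            "owner_after_chown": after_chown.st_uid,
        }


if __name__ == "__main__":
    print(chmod_vs_chown_demo())
```

On the command line the equivalent distinction is between `chmod` and `chown`.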

inquisitor27552 • 2 days ago

Yes, the common sentiment on AI Twitter is that they cut down on inference costs.

I don't know about you, but maybe Claude didn't get dumber; maybe you just got smarter.