4o and 4.1 are not very good at coding
My best results are usually with o4-mini-high; o3 is sometimes pretty good
I personally don’t like the canvas. I prefer the output in the chat
And a lot of times I say: provide the full code for this file, or provide a drop-in replacement (when I don’t want to deal with all the diffs). But at around 300-400 lines of code it usually starts getting bad, and then I need to refactor to break things up into multiple files (unless I can focus on just one method inside a file)
o3 is shockingly good, actually. I can’t use it often due to rate limiting, so I save it for the odd occasion. Today I asked it how I could integrate a tree of Swift binary packages within an SDK and detect internal version clashes, and it gave a very well-researched and sensible overview. And gave me a new idea that I’ll try.
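For what it's worth, the clash-detection half of that can be done mechanically by walking the resolved dependency graph. Here's a minimal sketch, assuming you pipe in the JSON from `swift package show-dependencies --format json`; the `name`/`version`/`dependencies` field names are an assumption based on current SwiftPM output, so adjust them to whatever your tooling actually emits:

```swift
// Sketch: flag packages resolved at more than one version in a SwiftPM
// dependency tree. Reads the JSON of `swift package show-dependencies
// --format json` from stdin. Field names are assumed, not guaranteed.
import Foundation

struct Node: Decodable {
    let name: String
    let version: String   // the root node reports "unspecified"
    let dependencies: [Node]
}

// Recursively collect every (package name -> set of versions) pair.
func collect(_ node: Node, into versions: inout [String: Set<String>]) {
    versions[node.name, default: []].insert(node.version)
    for dep in node.dependencies { collect(dep, into: &versions) }
}

let data = FileHandle.standardInput.readDataToEndOfFile()
guard let root = try? JSONDecoder().decode(Node.self, from: data) else {
    fatalError("could not parse dependency JSON")
}

var versions: [String: Set<String>] = [:]
collect(root, into: &versions)

// Any package that shows up at more than one version is a clash candidate.
for (name, vs) in versions.sorted(by: { $0.key < $1.key }) where vs.count > 1 {
    print("clash: \(name) -> \(vs.sorted().joined(separator: ", "))")
}
```

Running it against each package in the tree gives a quick report of anything resolved at conflicting versions, though binary (XCFramework) clashes may still need manual inspection since they can hide outside the SwiftPM graph.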
I use o3 for anything math or coding related. 4o is good for things like "my knee hurts when I do this and that -- what might it be?"
In ChatGPT, at this point I use 4o pretty much only for image generation; it's the one feature that's unique to it and is mind-blowingly good. For everything else, I default to o3.
For coding, I stick to Claude 3.5 / 3.7 and recently Gemini 2.5 Pro. I sometimes use o3 in ChatGPT when I can't be arsed to fire up Aider, or really need to use its search features to figure out how to do something (e.g. pinouts for some old TFT screens for ESP32 and Raspberry Pi, most recently).
Drop-in replacement files per update are best done with the heavy test-time-compute models.
o1-pro and o1-preview can generate updated full-file responses into the 1k LOC range.
It's something about their internal verification methods that makes it an actually viable development method.
True. Also, the APIs don't care too much about restricting output length; they might actually be more verbose to charge more
It's interesting how the same model, served through different interfaces (chat vs API), can behave differently based on the economic incentives of the provider