necovek 8 days ago

The premise might possibly be true, but as an actually seasoned Python developer, I've taken a look at one file: https://github.com/dx-tooling/platform-problem-monitoring-co...

All of it smells of a (lousy) junior software engineer: from configuring the root logger at the top of a module (which relies on module import caching not to be reapplied), through building a config file parser by hand instead of using the stdlib one, to a race condition in load_json, which checks for file existence with an if and then carries on as if the file is certainly there...
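To sketch the race being described (the real load_json is in the linked repo; the bodies here are illustrative, not the repo's code):

```python
import json
from pathlib import Path


def load_json_racy(path: str) -> dict:
    # TOCTOU race: the file can be deleted between the existence
    # check and the open, so this can still blow up.
    if Path(path).exists():
        with open(path) as f:
            return json.load(f)
    return {}


def load_json_safe(path: str) -> dict:
    # EAFP: just attempt the open and handle the failure.
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

The second version is both shorter and actually correct under concurrent access.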

In a nutshell, if the rest of it is like this, it simply sucks.

23
milicat 8 days ago

The more I browse through this, the more I agree. I feel like one could delete almost all comments from that project without losing any information – which means, at least the variable naming is (probably?) sensible. Then again, I don't know the application domain.

Also…

  def _save_current_date_time(current_date_time_file: str, current_date_time: str) -> None:
    with Path(current_date_time_file).open("w") as f:
      f.write(current_date_time)
there is a lot of obviously useful abstraction being missed, wasting lines of code that will all need to be maintained.
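For illustration, the whole family of single-purpose wrappers like the one above could collapse into one generic helper (save_text is a hypothetical name; pathlib already provides the body):

```python
from pathlib import Path


def save_text(file_path: str, content: str) -> None:
    # One generic helper replaces N near-identical wrappers
    # (_save_current_date_time, _save_whatever_else, ...).
    Path(file_path).write_text(content)
```

At which point the wrapper is thin enough to question entirely: Path(f).write_text(s) at the call site says the same thing in one line.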

The scary thing is: I have seen professional human developers write worse code.

Aurornis 8 days ago

> I feel like one could delete almost all comments from that project without losing any information

I'm far from a heavy LLM coder, but I've noticed a massive excess of unnecessary comments in most output. I'm always deleting the obvious ones.

But then I started noticing that the comments seem to help the LLM navigate additional code changes. It’s like a big trail of breadcrumbs for the LLM to parse.

I wouldn’t be surprised if vibe coders get trained to leave the excess comments in place.

cztomsik 8 days ago

More tokens -> more compute involved. Attention-based models work by attending every token to every other token, so more tokens means not only more time to "think" but also the ability to think "better". That is also at least part of the reason why o1/o3/R1 can sometimes solve what other LLMs could not.
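The all-pairs interaction is visible in a minimal scaled dot-product attention sketch (NumPy, single head, no masking; a simplification of what real models do):

```python
import numpy as np


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # scores is (n_tokens, n_tokens): every token attends to every other,
    # so compute and memory grow quadratically with sequence length.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Numerically stable row-wise softmax over the scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (n_tokens, d)
```

Double the tokens and the score matrix quadruples, which is where the "more tokens -> more compute" comes from.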

Anyway, I don't think any of the current LLMs are really good for coding. What they're good at is copy-pasting (with some minor changes) from the massive code corpus they were pre-trained on. For example, give one some Zig code and it's straight-up unable to solve even basic tasks. Same if you give it a really unique task, or if you simply ask for potential improvements to your existing code. Very, very bad results, no signs of out-of-the-box thinking whatsoever.

BTW: I think what people are missing is that LLMs are really great at language modeling. I had great results, and boosts in productivity, just by being able to prepare the task specification, and do quick changes in that really easily. Once I have a good understanding of the problem, I can usually implement everything quickly, and do it in much much better way than any LLM can currently do.

Workaccount2 7 days ago

I have tried getting gemini 2.5 to output "token efficient" code, i.e. no comments, keep variables to 1 or 2 letters, try to keep code as condensed as possible.

It didn't work out that great. I think that all the context in the verbose coding it does actually helps it to write better code. Shedding context to free up tokens isn't so straightforward.

lolinder 8 days ago

It doesn't hurt that the model vendors get paid by the token, so there's zero incentive to correct this pattern at the model layer.

thesnide 8 days ago

or the model gets trained on teaching code, which naturally contains lots of comments.

the dev is just too lazy to include them anymore, whereas the model doesn't really need to be lazy, as it's paid by the token

dkersten 8 days ago

What’s worse, I get a lot of comments left saying what the AI did, not what the code does or why. Eg “moved this from file xy”, “code deleted because we have abc”, etc. Completely useless stuff that should be communicated in the chat window, not in the code.

nostromo 8 days ago

LLMs are also good at commenting on existing code.

It’s trivial to ask Claude via Cursor to add comments to illustrate how some code works. I’ve found this helpful with uncommented code I’m trying to follow.

I haven’t seen it hallucinate an incorrect comment yet, but sometimes it will comment a TODO that a section should be made more clear. (Rude… haha)

pastage 8 days ago

I have seldom seen insightful comments from LLMs. They're usually no better than "comment what the line does": useful for getting a hint about undocumented code, but not by much. My experience is limited, but with what I have, I do agree. As long as you keep to the beaten path it is ok. Comments are not such a thing.

ramesh31 8 days ago

>The scary thing is: I have seen professional human developers write worse code.

This is kind of the rub of it all. If the code works, passes all relevant tests, is reasonably maintainable, and can be fitted into the system correctly with a well defined interface, does it really matter? I mean at that point it's kind of like looking at the output of a bytecode compiler and being like "wow what a mess". And it's not like they can't write code up to your stylistic standards; it's just literally a matter of prompting for that.

mjr00 8 days ago

> If the code works, passes all relevant tests, is reasonably maintainable, and can be fitted into the system correctly with a well defined interface, does it really matter?

You're not wrong here, but there's a big difference in programming one-off tooling or prototype MVPs and programming things that need to be maintained for years and years.

We did this song and dance pretty recently with dynamic typing. Developers thought it was so much more productive to use dynamically typed languages, because it is in the initial phases. Then years went by, those small, quick-to-make dynamic codebases ended up becoming unmaintainable monstrosities, and those developers who hyped up dynamic typing invented Python/PHP type hinting and Flow for JavaScript, later moving to TypeScript entirely. Nowadays nobody seriously recommends building long-lived systems in untyped languages, but they are still very useful for one-off scripting and more interactive/exploratory work where correctness is less important, i.e. Jupyter notebooks.
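The gradual-typing payoff is easy to see in miniature; a hypothetical example of the kind of bug annotations let a checker like mypy flag before anything runs:

```python
def total_cents(prices: list[float]) -> int:
    # The signature documents the contract: dollar amounts in,
    # an integer number of cents out.
    return round(sum(prices) * 100)


# Fine when called correctly:
#   total_cents([19.99, 5.00])
# The dynamic-typing classic, caught statically instead of in production:
#   total_cents("19.99")   # mypy: incompatible type "str"; expected "list[float]"
```

In an untyped codebase, the bad call only fails when that code path actually executes, which in a years-old system might be long after the author left.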

I wouldn't be surprised to see the same pattern happen with low-supervision AI code; it's great for popping out the first MVP, but because it generates poor code, the gung-ho junior devs who think they're getting 10x productivity gains will wisen up and realize the value of spending an hour thinking about proper levels of abstraction instead of YOLO'ing the first thing the AI spits out when they want to build a system that's going to be worked on by multiple developers for multiple years.

bcoates 8 days ago

I think the productivity gains of dynamic typed languages were real, and based on two things: dynamic typing (can) provide certain safety properties trivially, and dynamic typing neatly kills off the utterly inadequate type systems found in mainstream languages when they were launched (the 90s, mostly).

You'll notice the type systems being bolted onto dynamic languages or found in serious attempts at new languages are radically different than the type systems being rejected by the likes of javascript, python, ruby and perl.

nottorp 8 days ago

> those small, quick-to-make dynamic codebases ended up becoming unmaintainable monstrosities

In my experience, type checking / type hinting already starts to pay off when more than one person is working on an even small-ish code base. Just because it helps you keep in mind what comes/goes to the other guy's code.

lolinder 8 days ago

And in my experience "me 3 months later" counts as a whole second developer that needs accommodating. The only time I appreciate not having to think about types is on code that I know I will never, ever come back to—stuff like a one-off bash script.

wesselbindt 8 days ago

> "me 3 months later" counts as a whole second developer

A fairly incompetent one, in my experience. And don't even get me started on "me 3 months ago", that guy's even worse.

nottorp 8 days ago

"How has that shit ever worked?"

Me, looking at code 100% written by me last year.

baq 8 days ago

It gets worse with age and size of the project. I’m getting the same vibes, but for code written by me last month.

guskel 8 days ago

Yep, I've seen type hinting even be helpful without a type checker in python. Just as a way for devs to tell each other what they intend on passing. Even when a small percent of the hints are incorrect, having those hints there can still pay off.

dheera 8 days ago

> You're not wrong here, but there's a big difference in programming one-off tooling or prototype MVPs and programming things that need to be maintained for years and years.

Humans also worry about their jobs, especially in PIP-happy companies; they are very well known for writing intentionally over-complicated code that only they understand, so that they are irreplaceable.

XorNot 8 days ago

I'm not convinced this actually happens. Seems more like something people assume happens because they don't like whatever codebase is at the new job.

SkyBelow 7 days ago

The challenge is that sufficiently bad code could be intentional or it could be from a lack of skill.

For example, I've seen a C# application where every function takes in and outputs an array of objects, supposedly built that way so the internal code can be modified without ever having to worry about the contract breaking. It was just as bad as you are imagining, probably worse. Was that incompetence or building things to be so complicated that others would struggle to work on it?

mistrial9 7 days ago

but that is literally how the browser window DOM works, no? It depends on how diligent the maintenance is, IMHO

baq 8 days ago

If your TC is 500k-1M and you don’t feel like job hopping anymore, you’d certainly not want to get hit by a random layoff due to insufficient organizational masculinity or whatever. Maintaining a complex blob of mission critical code is one way of increasing your survival chances, though of course nothing is guaranteed.

LtWorf 8 days ago

People doing layoffs have no idea of who works and who's warming the chair.

baq 8 days ago

Depending on the layoff they may look into yearly reviews... or not.

LtWorf 8 days ago

Ah yes, those work /s

dheera 8 days ago

Oh, I'm convinced, I've seen it first hand.

mistrial9 7 days ago

hmm, I have seen a conda env with far too many packages and a lot of bumping to current versions, where the dev says "who cares" and it naturally grows a bit more... "Intentionally complicated" is more like an accusation of wrongdoing.

triyambakam 8 days ago

The ML world being nearly entirely in Python, much of it untyped (and that the Python type system is pretty weak) is really scary.

ramesh31 8 days ago

>The ML world being nearly entirely in Python, much of it untyped (and that the Python type system is pretty weak) is really scary

I think this has a ton to do with the mixed results from "vibe coding" we've seen as the codebase grows in scope and complexity. Agents seem to break down without a good type system. Same goes for JS.

I've just recently started on an Objective-C project using Cline, and it's like nirvana. I can code out an entire interface and have it implemented for me as I'm going. I see no reason it couldn't scale infinitely to massive LOC with good coding practices. The real killer feature is header files. Being able to have your entire projects headers in context at all time, along with a proper compiler for debugging, changes the game for how agents can reason on the whole codebase.

ManuelKiessling 8 days ago

I'm certainly extremely happy for having an extensive type system in my daily driver languages, especially when working with AI coding assistance — it's yet another very crucial guard rail that keeps the AI on track and makes a lot of fuckups downright impossible.

dilyevsky 8 days ago

what are you going to do when something suddenly doesn't work and cursor endlessly spins without progress no matter how many "please don't make mistakes" you add? delete the whole thing and try to one-shot it again?

nsonha 8 days ago

Why do you HAVE TO one-shot? No one says you have to code like those influencers. You are a software engineer, use AI like one, iteratively.

ramesh31 8 days ago

>No one says you have to code like those influencers. You are a software engineer, use AI like one, iteratively.

This is my issue with all the AI naysayers at this point. It seems to all boil down to "haha, stupid noob can't code so he uses AI" in their minds. It's like they are incapable of understanding that there could simultaneously be a bunch of junior devs pushing greenfield YouTube demos of vibe coding, while at the same time expert software engineers are legitimately seeing their productivity increase 10x on serious codebases through judicious use.

Go ahead and keep swinging that hammer, John Henry.

necovek 8 days ago

> expert software engineers are legitimately seeing their productivity increase 10x

It's funny you would say this, because we are really commenting on an article where a self-proclaimed "expert" has done that and the "10x" output is terrible.

ManuelKiessling 8 days ago

I have just checked my article — the word "expert" isn't in it, so not quite sure where you got this from.

I'm working in the field professionally since June 1998, and among other things, I was the tech lead on MyHammer.de, Germany's largest craftsman platform, and have built several other mid-scale online platforms over the decades.

How well I have done this, now that's for others to decide.

Quite objectively though, I do have some amount of experience — even a bad developer probably cannot help but pick up some learnings over so many years in relevant real-world projects.

However, and I think I stated this quite clearly, I am expressly not an expert in Python.

And yet, I could realize an actually working solution that solves an actual problem I had in a very real sense (and is nicely humming away for several weeks now).

And this is precisely where yes, I did experience a 10x productivity increase; it would have certainly taken me at least a week or two to realize the same solution myself.

necovek 7 days ago

Apologies for implying you are claiming to be an expert software engineer: I took the "senior" in the title and "25 years of experience" in the post to mean similar things as "expert".

I don't doubt this is doing something useful for you. It might even be mostly correct.

But it is not a positive advertisement for what AI can do: just like the code is objectively crap, you can't easily trust the output without a comprehensive review. And without doubting your expertise, I don't think you reviewed it, or you would have caught the same smells I did.

What this article tells me is that when the task is sufficiently non-critical that you can ignore being perfectly correct, you can steer AI coding assistants into producing some garbage code that very well might work or appear to work (when you are making stats, those are tricky even with utmost manual care).

Which is amazing, in my opinion!

But not what the premise seems to be (how a senior will make it do something very nice with decent quality code).

Out of curiosity why did you not build this tool in a language you generally use?

ManuelKiessling 7 days ago

Because I wanted exactly this experience: can I get to the desired result — functionality-wise, if not code-wise! — even if I choose the stack that makes sense in terms of technology, not the one that I happen to be proficient in?

And if I cannot bring language-proficiency to the table — which of my capabilities as a seasoned software&systems guy can I put to use?

In the brown-field projects where my team and I have the AI implement whole features, the resulting code quality — under our sharp and experienced eyes — tends to end up just fine.

I think I need to make the differences between both examples more clear…

necovek 7 days ago

Ok, I guess you shouldn't complain that you really got exactly what you wanted.

However, your writing style implied that the result was somehow better because you were otherwise an experienced engineer.

Even your clarification in the post sits right below your statement about how your experience made this very smooth, with no explanation that you were going to be happy with bad code as long as it works.

ManuelKiessling 7 days ago

I guess we are slowly but steadily approaching splitting-hairs-territory, so not sure if this is still worth it…

However, I'm not quite sure where I complained. Certainly not in the post.

And yes, I'm very convinced that the result turned out a lot better than it would have if an inexperienced "vibe coder" had tried to achieve the same end result.

Actually, I'm pretty sure that without my extensive and structured requirements and the guard rails, the AI coding session would have ended in a hot mess in the best case, and a non-functioning result in the worst case.

I'm 100% convinced that these two statements are true and relevant to the topic:

That a) someone lacking my level of experience and expertise is simply not capable of producing a document like https://github.com/dx-tooling/platform-problem-monitoring-co...

And that b) using said document as the basis for the agent-powered AI coding session had a significant impact on the process as well as the end result of the session.

achierius 8 days ago

I think some of the suspicion is that it's really not 10x in practice.

Macha 8 days ago

Like AI could write code perfectly as soon as I thought of it, and that would not improve my productivity 10x. Coding was never the slow part. Everything that goes around coding (like determining that the extra load here is not going to overload things, getting PMs to actually make their mind up what the feature is going to do, etc.), means that there's simply not that much time to be saved on coding activities.

nsonha 8 days ago

Same argument can be said for not using any tooling really. "Tech is the easy part". No difference typing code on notepad and having zero process/engineering infrastructure I guess. Because stakeholder management is the main engineering skill apparently.

Btw, AI doesn't just code, there are AIs for debugging, monitoring etc too.

achierius 7 days ago

There are two levels to this.

1. Tooling obviously does improve performance, but not so huge a margin. Yes, if AI could automate more elements of tooling, that would very much help. If I could tell an AI "bisect this bug, across all projects in our system, starting with this known-bad point", that would be very helpful -- sometimes. And I'm sure we'll get there soon enough. But there is fractal complexity here: what if isolating the bug requires stepping into LLDB, or dumping some object code, or running with certain stressors on certain hardware? So it's not clear that "LLM can produce code from specs, given tight oversight" will map (soon) to "LLM can independently assemble tools together and agentically do what I need done".

2. Even if all tooling were automated, there's still going to be stuff left over. Can the LLM draft architectural specs, reach out to other teams (or their LLMs), sit in meetings and piece together the big picture, suss out what the execs really want us to be working on, etc.? I do spend a significant (double-digit) percentage of my time working on that, so if you eliminate everything else -- then you could get 10x improvement, but going beyond that would start to run up against Amdahl's Law.

necovek 7 days ago

If you were to really measure speed improvement of notepad vs a tricked out IDE, it's probably not much. The problem would be the annoyance caused to an engineer who has to manually type out everything.

No, coding speed is really not the bottleneck to software engineer productivity.

nsonha 7 days ago

> coding speed > the annoyance caused to an engineer

No one said productivity is this one thing and not that one thing, only you say that because it's convenient for your argument. Productivity is a combination of many things, and again it's not just typing out code that's the only area AI can help.

necovek 7 days ago

The argument of "coding speed not a bottleneck to productivity" is not in contradiction to "productivity is a combination": it even implies it.

Again, the context here was that somebody discussed speed of coding and you raised the point of not using any tooling with Notepad.

nsonha 7 days ago

The context here is AI assisted engineering and you raised the point that non-engineering productivity is more important for engineers, which I think is absurd.

You can have a 10x engineering productivity boost and still complete work in the same amount of time, because of communication and human factors. Maybe it's a problem, maybe it's not. It's still a productivity gain that will make you work better nonetheless.

necovek 7 days ago

I did not raise it, but what was raised was "coding speed": as in, the speed to type code into an editor.

That's not "engineering", but "coding".

Engineering already assumes a lot more than just coding: most importantly, thinking through a problem, learning about it and considering a design that would solve it.

Nobody raised communication or the human factors.

Current LLMs can indisputably help with the learning part, with the same caveats (they will sometimes make shit up). Here we are looking at how much they help with the coding part.

LtWorf 8 days ago

Weren't you the guy who only writes HTML? Maybe let domain experts comment on their domain of expertise.

johnnyanmac 8 days ago

My grievances are simple: an expert programming utilizing AI will be a truly dangerous force.

But that's not what we get in this early stage of grifting. We get 10% marketing buzz on how cool this is with stuff that cannot be recreated in the tool alone, and 89% of lazy or inexperienced developers who just turn in slop with little or no iteration. The latter don't even understand the code they generated.

That 1% will be amazing; it's too bad the barrel is full of rotten apples hiding that potential. The experts also tend to keep to themselves, in my experience. The 89% includes a lot of Dunning-Kruger as well, which makes those outspoken experts questionable (maybe part of why real experts aren't commenting on their experience).

shove 8 days ago

“Maybe you didn’t hear me, I said ‘good morning steam driver, how are you?’”

dilyevsky 8 days ago

The point is that because it generally produces crap code, you have to one-shot, or else iteration becomes hard. Similar to how a junior would try to refactor their mess and just make a bigger mess.

nsonha 8 days ago

I find it hard to believe that when the AI generates crap code, there is absolutely nothing you can do (change the prompt, modify context, add examples) to make it do what you want. It has not been my experience either. I only use AI to make small modules and refactor instead of one-shotting.

Also I find "AI makes crap code so we should give it a bigger task" illogical.

mistrial9 7 days ago

It seems that there are really, really large differences between models in how well they do and what they respond to, even among the "best". The field does seem to be moving fast.

ManuelKiessling 8 days ago

Good insight, and indeed quite exactly my state of mind while creating this particular solution.

In this case, I did put in the guard rails to ensure that I reach my goal in hopefully a straight line and as quickly as possible, but to be honest, I did not give much thought to long-term maintainability or ease of extending it with more and more features, because I needed a very specific solution for a use case that doesn't change much.

I'm definitely working differently in my brown-field projects where I'm intimately familiar with the tech stack and architecture — I do very thorough code reviews afterwards.

necovek 8 days ago

I think this code is at least twice the size than it needs to be compared to nicer, manually produced Python code: a lot of it is really superfluous.

People have different definitions of "reasonably maintainable", but if code has extra stuff that provides no value, it always perplexes the reader (what is the point of this? what am I missing?), and increases cognitive load significantly.

But if AI coding tools were advertised as "get 10x the output of your least capable teammate", would they really go anywhere?

I love doing code reviews as an opportunity to teach people. Doing this one would suck.

stemlord 8 days ago

Right, and the reason why professional developers are writing worse code out there is most likely because they simply don't have the time/aren't paid to care more about it. The LLM is then mildly improving the output in this brand of common real world scenario

FeepingCreature 8 days ago

> there is a lot of obviously useful abstraction being missed, wasting lines of code that will all need to be maintained.

This is a human sentiment because we can fairly easily pick up abstractions during reading. AIs have a much harder time with this - they can do it, but it takes up very limited cognitive resources. In contrast, rewriting the entire software for a change is cheap and easy. So to a point, flat and redundant code is actually beneficial for a LLM.

Remember, the code is written primarily for AIs to read and only incidentally for humans to execute :)

fzeroracer 8 days ago

At the very least, if a professional human developer writes garbage code you can confidently blame them and either try to get them to improve or reduce the impact they have on the project.

With AI they can simply blame whatever model they used and continually shovel trash out there instantly.

Hojojo 7 days ago

I don't see the difference there. Whether I've written all the code myself or an AI wrote all of it, my name will be on the commit. I'll be the person people turn to when they question why code is the way it is. In a pull request for my commit, I'll be the one discussing it with my colleagues. I can't say "oh, the AI wrote it". I'm responsible for the code. Full stop.

If you're in a team where somebody can continuously commit trash without any repercussions, this isn't a problem caused by AI.

jstummbillig 8 days ago

> The scary thing is: I have seen professional human developers write worse code.

That's not the scary part. It's the honest part. Yes, we all have (vague) ideas of what good code looks like, and we might know it when we see it but we know what reality looks like.

I find the standard to which we hold AI in that regard slightly puzzling. If I can get the same meh-ish code for way less money and way less time, that's a stark improvement. If the premise is now "no, it also has to be something that I recognize as really good / excellent", then at least let us recognize that we have passed the question of whether it can produce useful code.

merrywhether 7 days ago

I think there’s a difference in that this is about as good as LLM code is going to get in terms of code quality (as opposed to capability a la agentic functionality). LLM output can only be as good as its training data, and the proliferation of public LLM-generated code will only serve as a further anchor in future training. Humans on the other hand ideally will learn and improve with each code review and if they don’t want to you can replace them (to put it harshly).

necovek 7 days ago

I do believe it's amazing what we can build with AI tools today.

But whenever someone advertises how an expert will benefit from it yet they end up with crap, it's a different discussion.

As an expert, I want AI to help me produce code of similar quality faster. Anyone can find a cheaper engineer (maybe five of them?) that can produce 5-10x the code I need at much worse quality.

I will sometimes produce crappy code when I lack the time to produce higher quality code: can AI step in and make me always produce high quality code?

That's a marked improvement I would sign up for, and some seem to tout, yet I have never seen it play out.

In a sense, the world is already full of crappy code used to build crappy products: I never felt we were lacking in that department.

And I can't really rejoice if we end up with even more of it :)

gerdesj 8 days ago

My current favourite LLM wankery example is this beauty: https://blog.fahadusman.com/proxmox-replacing-failed-drive-i...

Note how it has invented the faster parameter for the zpool command. It is possible that the blog writer hallucinated a faster parameter themselves without needing a LLM - who knows.

I think all developers should add a faster parameter to all commands to make them run faster. Perhaps a LLM could create the faster code.

I predict an increase in man page reading, and better quality documentation at authoritative sources. We will also improve our skills at finding authoritative sources of docs. My uBlacklist is getting quite long.

Henchman21 8 days ago

What makes you think this was created by an LLM?

I suspect they might actually have a pool named faster -- I know I've named pools similarly in the past. This is why I now name my pools after characters from the Matrix, as is tradition.

taurath 8 days ago

This really gets at an acceleration of enshittification. If you can't tell it's an LLM, and there's nobody to verify the information, humanity is architecting errors and mindfucks into everything. All of the markers of what is trustworthy have been co-opted by untrustworthy machines, so all of the ways we'd previously differentiated actors have stopped working. It feels like we're losing truth as rapidly as LLMs can generate mistakes. We've built a scoundrel's paradise.

How useful is a library of knowledge when n% of the information is suspect? We're all about to find out.

Henchman21 8 days ago

You know, things looked off to me, but thinking it was the output of an LLM just didn't seem obvious -- even though that was the claim! I feel ill-equipped to deal with this, and as the enshittification has progressed I've found myself using "the web" less and less. At this point, I'm not sure there's much left I value on the web. I wish the enshittification wasn't seemingly pervasive in life.

taurath 8 days ago

I believe in people, but I start to think that scrolling is the Fox News or AM radio of a new generation, it just happens to be the backbone of the economy because automation is so much cheaper than people.

lloeki 8 days ago

The pool is named backups according to zpool status and the paragraph right after.

But then again the old id doesn't match between the two commands.

Henchman21 7 days ago

Yep that’s the stuff I noticed that was off too

rotis 8 days ago

How can this article have been written by an LLM? Its date is November 2021. Not judging the article as a whole, but the command you pointed out seems to be correct: "faster" is the name of the pool.

gruez 7 days ago

>Its date is November 2021

The date can be spoofed. It first showed up on archive.org in December 2022, and there's no captures for the site before then, so I'm liable to believe the dates are spoofed.

bdhcuidbebe 7 days ago

There was a lot going on in the years before ChatGPT. Text generation was going strong with interactive fiction before anyone was talking about OpenAI.

victorbjorklund 8 days ago

I used LLMs for content generation in July 2021. Of course, that was when LLMs were pretty bad.

selcuka 7 days ago

GPT-2 was released in 2019. ChatGPT wasn't the first publicly available LLM.

rybosome 8 days ago

Ok - not wrong at all. Now take that feedback and put it in a prompt back to the LLM.

They’re very good at honing bad code into good code with good feedback. And when you can describe good code faster than you can write it - for instance it uses a library you’re not intimately familiar with - this kind of coding can be enormously productive.

imiric 8 days ago

> They’re very good at honing bad code into good code with good feedback.

And they're very bad at keeping other code good across iterations. So you might find that while they might've fixed the specific thing you asked for—in the best case scenario, assuming no hallucinations and such—they inadvertently broke something else. So this quickly becomes a game of whack-a-mole, at which point it's safer, quicker, and easier to fix it yourself. IME the chance of this happening is directly proportional to the length of the context.

bongodongobob 8 days ago

This typically happens when you run the chat too long. When it gives you a new codebase, fire up a new chat so the old stuff doesn't poison the context window.

achierius 8 days ago

But it rarely gives me a totally-new codebase unless I'm working on a very small project -- so I have to choose between ditching its understanding of some parts (e.g. "don't introduce this bug here, please") and avoiding confusion with others.

no_wizard 8 days ago

Why isn’t it smart enough to recognize new contexts that aren’t related to old ones?

bongodongobob 8 days ago

I don't know, I didn't invent transformers. I do however know how to work with them.

aunty_helen 8 days ago

Nah. This isn’t true. Every time you hit enter you’re not just getting a jr dev, you’re getting a randomly selected jr dev.

So, how did I end up with a logging.py, config.py, config in __init__.py and main.py? Well, I prompted it to fix the logging setup to use a specific format.

I use cursor, it can spit out code at an amazing rate and reduced the amount of docs I need to read to get something done. But after its second attempt at something you need to jump in and do it yourself and most likely debug what was written.

skydhash 8 days ago

Are you reading a whole encyclopedia each time you're assigned a task? The one thing about learning is that it compounds. You get faster the longer you use a specific technology. So unless you use a different platform for each task, I don't think you have to read that much documentation (understanding it is another matter).

achierius 8 days ago

This is an important distinction though. LLMs don't have any persistent 'state': they have their activations, their context, and that's it. They only know what's pre-trained, and what's in their context. Now, their ability to do in-context learning is impressive, but you're fundamentally still stuck with the deviations and, eventually, forgetting that characterizes these guys -- while a human, while less quick on the uptake, will nevertheless 'bake in' the lessons in a way that LLMs currently cannot.

In some ways this is even more impressive -- every prompt you make, your LLM is in effect re-reading (and re-comprehending) your whole codebase, from scratch!

necovek 8 days ago

I do plan on experimenting with the latest versions of coding assistants, but last I tried them (6 months ago), none could satisfy all of the requirements at the same time.

Perhaps there is simply too much crappy Python code around that they were trained on as Python is frequently used for "scripting".

Perhaps the field has moved on and I need to try again.

But looking at this, it would still be faster for me to type this out myself than go through multiple rounds of reviews and prompts.

Really, a senior has not reviewed this, no matter their language (raciness throughout, not just this file).

barrell 8 days ago

I would not say it is “very good” at that. Maybe it’s “capable,” but my (ample) experience has been the opposite. I have found the more exact I describe a solution, the less likely it is to succeed. And the more of a solution it has come up with, the less likely it is to change its mind about things.

Ever since the ~4o models, there seems to be a pretty decent chance that you ask it to change something specific, it says it will, and then it spits out line-for-line identical code to what you just asked it to change.

I have had some really cool success with AI finding optimizations in my code, but only when specifically asked, and even then I just read the response as theory and go write it myself, often in 1-15% of the LoC the LLM produced.

BikiniPrince 8 days ago

I’ve found AI tools extremely helpful in getting me up to speed with a library or defining an internal override not exposed by the help. However, if I’m not explicit in how to solve a problem the result looks like the bad code it’s been ingesting.

mjr00 8 days ago

I "love" this part:

  def ensure_dir_exists(path: str) -> None:
    """
    Ensure a directory exists.

    Args:
        path: Directory path
    """
An extremely useful and insightful comment. Then you look where it's actually used,

    # Ensure the directory exists and is writable
    ensure_dir_exists(work_dir)

    work_path = Path(work_dir)
    if not work_path.exists() or not os.access(work_dir, os.W_OK):
... so like, the entire function and its call (and its needlessly verbose comment) could be removed because the existence of the directory is being checked anyway by pathlib.

This might not matter here because it's a small, trivial example, but if you have 10, 50, 100, 500 developers working on a codebase, and they're all thoughtlessly slinging code like this in, you're going to have a dumpster fire soon enough.

I honestly think "vibe coding" is the best use case for AI coding, because at least then you're fully aware the code is throwaway shit and don't pretend otherwise.

edit: and actually looking deeper, `ensure_dir_exists` actually makes the directory, except it's already been made before the function is called so... sigh. Code reviews are going to be pretty tedious in the coming years, aren't they?
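For anyone following along, here's a sketch (not the repo's actual code) of what that whole dance collapses to with plain pathlib: the wrapper, its docstring, and the redundant existence check all go away.

```python
# Sketch: pathlib's mkdir already covers "ensure the directory exists"
# in one idempotent call, so a wrapper function adds nothing.
from pathlib import Path
import os
import tempfile

work_path = Path(tempfile.mkdtemp()) / "nested" / "out"
work_path.mkdir(parents=True, exist_ok=True)  # creates missing parents, no error if present
work_path.mkdir(parents=True, exist_ok=True)  # calling it again is harmless

writable = os.access(work_path, os.W_OK)  # the separate writability check still applies
```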

johnfn 8 days ago

Not all code needs to be written at a high level of quality. A good deal of code just needs to work. Shell scripts, one-offs, linter rules, etc.

jayd16 8 days ago

It'll be really interesting to see if the tech advances fast enough that future AI can deal with the tech debt of present day AI or if we'll see a generational die off of apps/companies.

bdhcuidbebe 7 days ago

I expect some of the big companies that went all in on relying on AI to fall in the coming years.

It will take some time tho, as decision makers will struggle to make up reasons why no one on the payroll is able to fix production.

jjice 8 days ago

You’re objectively correct in a business context, which is what most software is for. For me, seeing AI slop code more and more is just sad from a craft perspective.

Software that’s well designed and architected is a pleasure to read and write, even if a lower quality version would get the job done. I’m watching one of the things I love most in the world become more automated and having the craftsmanship stripped out of it. That’s a bit over dramatic from me, but it’s been sad to watch.

hjnilsson 8 days ago

It’s probably the same way monks copying books felt when the printing press came along. “Look at this mechanical, low-quality copy. It lacks all finesse and flourish of the pen!”

I agree with you that it is sad. And what is especially sad is that the result will probably be lower quality overall, but much cheaper. It’s the inevitable result of automation.

necovek 6 days ago

Many things have become higher quality with automation. Eg. consider CNC machines, metal machining etc.

deergomoo 8 days ago

I feel exactly the same way, it’s profoundly depressing.

Aperocky 8 days ago

Having seen my fair share of those, they tend to work either until they don't, or you need to somehow change it.

layoric 8 days ago

Also, somewhat strangely, I've found Python output has remained bad, especially for me with dataframe tasks/data analysis. For remembering matplotlib syntax I still find most of them pretty good, but for handling dataframes, very bad and extremely counterproductive.

Saying that, for typed languages like TypeScript and C#, they have gotten very good. I suspect this is related to the semantic information available in typed languages, versus hard-to-follow unstructured blobs like dataframes, which are therefore not well reproduced by LLMs.

datadrivenangel 8 days ago

Spark especially is brutal for some reason. Even Databricks' own AI is bad at Spark, which is very funny.

It's probably because Spark is largely, but not fully, backwards compatible with pandas.

nottorp 8 days ago

Here's a real-life example from today:

I asked $random_llm to give me code to recursively scan a directory and give me a list of file names relative to the top directory scanned and their sizes.

It gave me working code. On my test data directory it needed ... 6.8 seconds.

After 5 min of eliminating obvious inefficiencies the new code needed ... 1.4 seconds. And i didn't even read the docs for the used functions yet, just changed what seemed to generate too many filesystem calls for each file.

bongodongobob 8 days ago

Nice, sounds like it saved you some time.

nottorp 8 days ago

You "AI" enthusiasts always try to find a positive spin :)

What if I had trusted the code? It was working after all.

I'm guessing that if i asked for string manipulation code it would have done something worth posting on accidentally quadratic.

noisy_boy 8 days ago

Depends on how toxic the culture is in your workplace. This could have been an opportunity to "work" on another JIRA task showing 600% improvement over AI generated code.

nottorp 8 days ago

I'll write that down for reference in case I do ever join an organization like that in the future, thanks.

600% improvement is worth what, 3 days of billable work if it lasts 5 minutes?

noisy_boy 7 days ago

Series of such "improvements" could be fame and fortune in your team/group/vertical. In such places, the guy who toots the loudest wins the most.

nottorp 7 days ago

So THAT's why large organizations want "AI".

In such a place I should be a very loud advocate of LLMs, use them to generate 100% of my output for new tasks...

... and then "improve performance" by simply fixing all the obvious inefficiencies and brag about the 400% speedups.

Hmm. Next step: instruct the "AI" to use bubblesort.

FeepingCreature 8 days ago

> What if I had trusted the code? It was working after all.

Then you would have been done five minutes earlier? I mean, this sort of reads like a parody of microoptimization.

nottorp 8 days ago

No, it reads like "your precious AI generates first year junior code". Like the original article.

FeepingCreature 8 days ago

There is nothing wrong with first year junior code that does the job.

nottorp 8 days ago

Does not. Do you know my requirements? This is actually in a time critical path.

FeepingCreature 8 days ago

Well, that wasn't in your comment. :P

If you hadn't told me that I would also not have bothered optimizing syscalls.

Did you tell the AI the profiler results and ask for ways to make it faster?

nottorp 7 days ago

> Well, that wasn't in your comment. :P

Acting like a LLM now :P

> Did you tell the AI the profiler results and ask for ways to make it faster?

Looking for ways to turn a 10 minute job into a couple days?

FeepingCreature 7 days ago

AI actually doesn't really work for the "a couple days" scale yet. As a heavy AI user, this sort of iterative correction would usually be priced in in a 10-minute AI session. That said-

> Acting like a LLM now :P

Hey, if we're going to be like that, it sure sounds like you gave the employee an incomplete spec so you could then blame it for failing. So... at least I'm not acting like a PM :P

bongodongobob 8 days ago

Why would you blindly trust any code? Did you tell it to optimize for speed? If not, why are you surprised it didn't?

nottorp 8 days ago

So, most low-level functions that enumerate the files in a directory return a structure that contains the metadata for each file, including its size. You already have it in memory.

Your brilliant AI calls another low-level function to get the file size by file name. (It also did worse stuff, but let's not go into details.)

Do you call reading the file size from the in-memory structure that you already have a speed optimization? I call it common sense.
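To illustrate the point (this is a sketch, not the LLM's actual output): `os.scandir` hands back entries that already carry file metadata — on Windows the size comes straight from the directory enumeration, and elsewhere the first stat result is cached on the entry — so a separate `os.path.getsize` call per file name is a redundant syscall.

```python
# Sketch: recursively list (path relative to top, size in bytes)
# reusing the metadata the directory enumeration already fetched.
import os

def scan(top, base=None):
    base = base or top
    results = []
    with os.scandir(top) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                results.extend(scan(entry.path, base))
            elif entry.is_file(follow_symlinks=False):
                # entry.stat() is served from cached data, not a fresh lookup by name
                results.append((os.path.relpath(entry.path, base), entry.stat().st_size))
    return results
```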

miningape 8 days ago

Yep, exactly. LLMs blunder over the most simple nonsense and just leave a mess in their wake. This isn't a mistake you could make if you actually understood what the library is doing / is returning.

It's so funny how these AI bros make excuse after excuse for glaring issues rather than just accept AI doesn't actually understand what it's doing (not even considering it's faster to just write good quality code on the first try).

nottorp 8 days ago

The "AI"s are useful for one thing. I had no idea what functions to use to scan a directory in a native C++ Windows application, nor that C++17 introduced an abstraction for it. They all work the same (needless fs access should be avoided no matter the OS), but it did give me the names*.

Stuff that Google search from 10 years ago would have done without pretending it's "AI". But not Google search from this year.

* It wasn't able to simply list the fields of the returned structure that contained a directory entry. But since it gave me the name, I was able to look it up via plain search.

miningape 8 days ago

Yeah, I find myself doing that too: using the AI to generate a bunch of names I can put into Google to find a good answer. I also think if Google hadn't gotten as sh*t as it has, AI wouldn't be nearly as useful to most people.

bdhcuidbebe 7 days ago

> It's so funny how these AI bros make excuse after excuse for glaring issues rather than just accept AI doesn't actually understand what it's doing

Its less funny when you realize how few of these people even have experience reading and writing code.

They just see code on screen, trust the machine and proclaim victory.

johnnyanmac 8 days ago

>Why would you blindly trust any code?

because that is what the market is trying to sell?

raxxorraxor 7 days ago

In my opinion this isn't even too relevant. I am no Python expert, but I believe defining a logger at the top of the average one-file Python script is perfectly adequate, or even very sensible in many scenarios. Depends on what you expect the code to do. Ok, the file is named utils.py...

Worse by far is still the ability of AI to really integrate different problems and combine them into a solution. It also seems to depend on the language: in my opinion, Python and JS results especially are often very mixed, while other languages with presumably a smaller training set might even fare better. JS often seems fine with asynchronous operation like that file check, however.

Perhaps really vetting a training set would improve AIs, but it would be quite work-intensive to build something like that. It would require a lot of senior devs, who are hard to come by. And then they'd need to agree on code quality, which might be impossible.

necovek 6 days ago

This is a logging setup being done top-level in an auxiliary module "utils": you might import it into one command and not another, and end up surprised why is one getting the logging setup and the other isn't. Or you might attempt to configure it and the import would override it.

As for getting a lot of code that was vetted by senior engineers, that's not so hard: you just have to pay for it. Basically, any company could — for a price — consider sharing their codebase for training.

byproxy 8 days ago

As an actually unseasoned Python developer, would you be so kind as to explain why the problems you see are problems and their alternatives? Particularly the first two you note.

saila 8 days ago

The call to logging.basicConfig happens at import time, which could cause issues in certain scenarios. For a one-off script, it's probably fine, but for a production app, you'd probably want to set up logging during app startup from whatever your main entry point is.

The Python standard library has a configparser module, which should be used instead of custom code. It's safer and easier than manual parsing. The standard library also has a tomllib module, which would be an even better option IMO.
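To make that concrete (a sketch — the section/key names here are hypothetical, not the repo's actual config format), stdlib parsing replaces the hand-rolled string splitting entirely:

```python
# Sketch: parsing an INI-style config with the stdlib instead of custom code.
from configparser import ConfigParser

config_text = """
[elasticsearch]
host = localhost
port = 9200
"""

parser = ConfigParser()
parser.read_string(config_text)  # for a real file: parser.read("settings.ini")

host = parser.get("elasticsearch", "host")
port = parser.getint("elasticsearch", "port")  # typed accessors beat manual casting
```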

cinntaile 7 days ago

Regarding your first paragraph, we still don't understand what the issue actually is.

necovek 6 days ago

Logging configuration is done at import time for "utils" module.

Imagine code like this:

main.py:

  import logging
  logging.basicConfig(...)

  logging.info("foo") # uses above config
  
  if __name__ == "__main__":
      import utils  # utils calls basicConfig too; it's a no-op here since root is already configured
      logging.info("bar")  # still uses the config set at the top of main.py
      ...
Or two "commands", one importing utils and another not: they would non-obviously use different logging configuration.

It gets even crazier: you could import utils to set the configuration, override it, but a second import would not re-set it, as module imports are cached.

Basically, don't do it and no unexpected, confusing behaviour anywhere.
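A tiny demo of that caching/no-op behavior (my own sketch, not code from the repo): `basicConfig` does nothing once the root logger already has handlers, so whichever module's call runs first silently wins.

```python
# Sketch: the second basicConfig call is silently ignored.
import logging

root = logging.getLogger()
root.handlers.clear()  # clean slate so the demo is deterministic

logging.basicConfig(level=logging.INFO)   # first call: takes effect
logging.basicConfig(level=logging.DEBUG)  # second call: no-op, root already has a handler
```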

bumblehean 6 days ago

As a non Python developer, what would be the use-case(s) for importing a module inside of the main function instead of importing it at the top of main.py with the others?

necovek 6 days ago

Since the entire evaluation and running is dynamic, you don't need to import (and thus evaluate) a module in certain branches.

Eg. that `if __name__` trick is used to allow a module to be both a runnable script and importable module.

Top it off with plenty of common libraries being dog-slow to import because they are doing some of the anti-pattern stuff too, and you end up executing a lot of code when you just want to import a single module.

Eg. I've seen large Python projects that take 75s just importing all the modules because they are listing imports at the top, and many are executing code during import — imagine wanting to run a simple unit test, and your test runner takes 75s just to get to the point where it can run that 0.01s test for your "quick" TDD iteration.

You can also look at Instagram's approach to solving this over at their engineering blog.
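The deferred-import trick looks like this (a minimal sketch, using hashlib only as a stand-in for a genuinely heavy module):

```python
# Sketch: the imported module's top-level code runs on first call, not when
# this file is merely imported; sys.modules caching makes later calls cheap.
def checksum(payload: bytes) -> str:
    import hashlib  # resolved from the module cache after the first call
    return hashlib.sha256(payload).hexdigest()

digest = checksum(b"hello")  # first call triggers the import
```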

NewsaHackO 8 days ago

>to a raciness in load_json where it's checked for file existence with an if and then carrying on as if the file is certainly there...

Explain the issue with load_json to me more. From my reading it checks if the file exists, then raises an error if it does not. How is that carrying on as if the file is certainly there?

selcuka 8 days ago

There is a small amount of time between the `if` and the `with` where another process can delete the file, hence causing a race condition. Attempting to open the file and catching any exceptions raised is generally safer.
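For illustration (a sketch of the pattern, not the repo's actual function), the EAFP version attempts the open and handles the failure, instead of pre-checking existence and racing against deletion:

```python
# Sketch: EAFP file loading with no TOCTOU window.
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def load_json(path):
    try:
        with Path(path).open() as f:
            return json.load(f)
    except FileNotFoundError:
        logger.error("file not found: %s", path)  # keeps the custom log message
        return None
```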

taberiand 8 days ago

Won't it throw the same FileNotFoundError exception in that case? The issue being bothering to check if it exists in the first place, I suppose.

selcuka 8 days ago

Yes, but it won't log the error, which is clearly the intention of the first check.

NewsaHackO 8 days ago

OK, that does make sense. Thanks!

ManuelKiessling 8 days ago

Thanks for looking into it.

While I would have hoped for a better result, I'm not surprised. In this particular case, I really didn't care about the code at all; I cared about the end result at runtime, that is, can I create a working, stable solution that solves my problem, in a tech stack I'm not familiar with?

(While still taking care of well-structured requirements and guard rails — not to guarantee a specific level of code quality per se, but to ensure that the AI works towards my goals without the need to intervene as much as possible).

I will spin up another session where I ask it to improve the implementation, and report back.

necovek 6 days ago

I'd definitely be curious to see if another session provides higher quality code — good luck, and thanks for taking this amicably!

ManuelKiessling 5 days ago

I did another session with the sole focus being on code quality improvements.

The commit with all changes that Cursor/claude-3.7-sonnet(thinking) has done is at https://github.com/dx-tooling/platform-problem-monitoring-co....

As you can see, I've fed your feedback verbatim:

  I have received the following feedback regarding this codebase:

  "The premise might possibly be true, but as an actually seasoned Python developer, I've taken a look at one file: @utils.py. All of it smells of a (lousy) junior software engineer: from configuring root logger at the top, module level (which relies on module import caching not to be reapplied), over not using a stdlib config file parser and building one themselves, to a raciness in load_json where it's checked for file existence with an if and then carrying on as if the file is certainly there..."

  I therefore ask you to thoroughly improve the code quality of the implementation in @src   while staying in line with the requirements from @REQUIREMENTS.md, and while ensuring that the Quality Tools (see @makefile) won't fail. Also, make sure that the tests in folder @tests  don't break.

  See file @pyproject.toml for the general project setup. There is already a virtualenv at @venv.
You can watch a screen recording of the resulting Agent session at https://www.youtube.com/watch?v=zUSm1_NFKpA — I think it's an interesting watch because it nicely shows how the tool-based guard rails help the AI to keep on track and reach a "green" state eventually.

dheera 8 days ago

I disagree, I think it's absolutely astounding that they've gotten this good in such a short time, and I think we'll get better models in the near future.

By the way, prompting models properly helps a lot for generating good code. They get lazy if you don't explicitly ask for well-written code (or put that in the system prompt).

It also helps immensely to have two contexts, one that generates the code and one that reviews it (and has a different system prompt).

henrikschroder 8 days ago

> They get lazy if you don't explicitly ask for well-written code (or put that in the system prompt).

This is insane on so many levels.

globnomulous 7 days ago

Computer, enhance 15 to 23.

nunez 8 days ago

Makes sense, given that so much of the training data for these tools consists of hello-world examples where this kind of configuration is okay. Not like this will matter in a world where there are no juniors to replace aged-out seniors because AI was "good enough"...

gessha 7 days ago

> This is especially noteworthy because I don’t actually know Python.

> However, my broad understanding of software architecture, engineering best practices, system operations, and what makes for excellent software projects made this development process remarkably smooth.

If the seniors are going to write this sort of Python code and then talk about how knowledge and experience made it smooth or whatever, might as well hire a junior and let them learn through trials and tribulations.

Perizors 8 days ago

How do you properly configure a logger in an application like that?

necovek 8 days ago

Just imagine a callsite that configured a logger in another way and then imports the utils module for a single function, only to have its configuration overridden by the one in utils.

There are plenty of ways to structure code so this does not happen, but simply "do not do anything at the top module level" will ensure you don't hit these issues.

rcfox 8 days ago

Usually you would do it in your main function, or a code path starting from there. Executing code with non-local side effects during import is generally frowned upon. Maybe it's fine for a project-local module that won't be shared, but it's a bad habit and can make things hard to track down.
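The usual convention, sketched out (format string here is just an example): modules only create a named logger, with no side effects at import; the entry point alone configures logging, once, at startup.

```python
# Sketch: named logger per module, configuration only in the main code path.
import logging

logger = logging.getLogger(__name__)  # this is all a utils-style module should do

def do_work():
    logger.info("working")

def main():
    # configuration happens here, not at module import time
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
    do_work()

if __name__ == "__main__":
    main()
```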

tracker1 7 days ago

I can say it isn't any better for JS/Node/Deno/Bun projects that I've seen or tried. About the only case it's been helpful (GitHub CoPilot) is in creating boilerplate .sql files for schema creation, and in that it became kind of auto-complete on overdrive. It still made basic missteps though.

theteapot 8 days ago

> to a raciness in load_json where it's checked for file existence with an if and then carrying on as if the file is certainly there...

It's not a race. It's just redundant. If the file does not exist at the time you actually try to access it you get the same error with slightly better error message.

necovek 6 days ago

There is a log message that won't be output in that case: whether getting a full, "native" FileNotFoundError exception is better is beside the point, since the goal of the code was obviously to print a custom error message.

And it's trivial to achieve the desired effect sanely:

  try:
      with open(...) ...

  except FileNotFoundError:
      logger.error(...)
      raise
It'd even be fewer lines of code.

theteapot 5 days ago

Or even fewer by doing it in a global exception handler instead of every time you try to open a file, since all you're doing is piping the error through the logger.

cess11 8 days ago

wrap_long_lines shares those characteristics:

https://github.com/dx-tooling/platform-problem-monitoring-co...

Where things are placed in the project seems rather ad hoc too: a "put everything in the same place" kind of architecture. A better strategy might be to separate out the I and the O of IO. Maybe someone wants SMS or group chat notifications later on; instead of shifting the numbers in filenames from step11_ onwards, one could then add a directory in the O part and hook it into an actual application core.

thwarted 7 days ago

> instead of shifting the numbers in filenames step11_ onwards

There are idioms used when programming in BASIC on how to number the lines so you don't end up renumbering them all the time to make an internal change. It's interesting that such idioms are potentially applicable here also.

ilrwbwrkhv 8 days ago

Yup, this tracks with what I have seen as well. Most devs who use this daily are junior devs or JavaScript devs, who both write sloppy, questionable code.

spoonfeeder006 7 days ago

Perhaps that's partly because 90% of the training data used to teach LLMs to code was made by junior engineers?

inerte 7 days ago

100%!

But the alternative would be the tool doesn't get built because the author doesn't know enough Python to even produce crappy code, or doesn't have the money to hire an awesome Python coder to do that for them.

necovek 6 days ago

If you check elsewhere in this thread, the author decided on Python to test out AI capabilities — they could have built it quickly in a language of their choice. I am sure I could have built it quickly in Python to a higher standard of quality.

Perhaps they wouldn't have built it because they did not set the time aside for it, like they did for this experiment (+ the blog post).

abid786 8 days ago

Doesn’t load_json throw if the file doesn’t exist?

isoprophlex 8 days ago

Yes, but then why do the check in the first place?

globnomulous 7 days ago

Thanks for doing the footwork. These TED talk blog posts always stink of phony-baloney nonsense.