
This seems closely related to the problem of model collapse [1][2][3], where LLMs lose the tails of the distribution, and so when you recursively train on the output of an LLM, or otherwise feed the output back into the input in subsequent stages, you lose the precision and diversity that human authors bring to the work. Eventually everything regresses to the mean and anything that would've made the content unique, useful, and differentiated gets lost.
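To make the "losing the tails" mechanism concrete, here is a toy sketch, not taken from the linked papers: a Gaussian stands in for the model, and each generation is fitted only to samples drawn from the previous generation's fit. The function name, the sample size, and the generation count are all illustrative assumptions.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples, generation after generation.

    Returns the fitted standard deviation at each generation, starting
    with the original "human" distribution (stdev 1.0).
    """
    rng = random.Random(seed)
    mean, stdev = 0.0, 1.0  # generation 0: the original data distribution
    history = [stdev]
    for _ in range(generations):
        # "Train" the next model only on output of the current one.
        samples = [rng.gauss(mean, stdev) for _ in range(sample_size)]
        mean = statistics.fmean(samples)
        # Maximum-likelihood estimate; it slightly underestimates spread,
        # and that small bias compounds across generations.
        stdev = statistics.pstdev(samples)
        history.append(stdev)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    print(f"initial stdev: {history[0]:.3g}, final stdev: {history[-1]:.3g}")
```

In this toy setup the fitted spread shrinks toward zero, so later generations only ever produce values near the mean. Real LLM training is far more complicated, so treat this as an intuition pump rather than a model of it.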

My takeaway from this is that AI is a temporary phenomenon, the end stage of the Internet age. It's going to destroy the Internet as we know it as well as much of the technological knowledge of the developed world, and then we're going to have to start fresh and rebuild everything we know. So I'm trying to use AI to identify and download the remaining sources of facts on the Internet, the human-authored stuff that isn't generated for engagement but comes from the era when people were just putting useful stuff online to share information.

[1] https://en.wikipedia.org/wiki/Model_collapse

[2] https://www.nature.com/articles/s41586-024-07566-y

[3] https://cacm.acm.org/blogcacm/model-collapse-is-already-happ...

by davebren 16 hours ago

Yep, humans and civilization are subject to the same model-collapse phenomenon as they interact more with LLMs, but engineering knowledge has always been held by a small minority with certain personality characteristics. Maybe the minority will get smaller, but I'm not sure it will completely disappear. There are always people like yourself building archives.

by lukebuehler 4 hours ago

See A Canticle for Leibowitz

by the8472 17 hours ago

There are plenty of AIs that are immune to this because they're trained on something that won't be flooded with slop. E.g. robotics, self-driving cars (both trained on real camera/sensor inputs) or programming/proof-assistant stuff (trained on things that are verifiable).

by fiddlerwoaroof 18 hours ago

My experience mostly matches this. I think of a piece of development work as having three phases:

1. Prototype
2. Initial production implementation
3. Hardening

My experience with LLMs is that they solve “writer’s block” problems in the prototyping phase at the expense of making phases 2+3 slower because the system is less in your head. They also have a mixed effect on ongoing maintenance: small tasks are easier but you lose some of the feel of the system.

by isityettime 18 hours ago

I completely agree with all of these observations.

And indeed for me, the biggest productivity boost has nothing to do with my "typing speed" or any such nonsense, it's that it can help with writer's block and other kinds of unhelpful inertia.

It kind of reminds me of ADHD medication: it alleviates the "inability to direct attention at one thing" problem, but actually exacerbates the "time blindness" and "hyperfocus" problems.

I think probably a lot of complex tools have these characteristics: useful in some ways, liable to backfire in others, and ultimately context-sensitive (and maybe somewhat unpredictable) in their helpfulness.

Hopefully as LLMs are more widely experimented with by developers, the conversation can continue to move away from thinking about the effects of LLM use in terms of some uniform/fungible "productivity" and towards understanding where it hurts and where it helps, how to tell when it's time to put it away, what kinds of codebases are really hurt by that kind of detached engagement, and what kinds of projects leverage that sort of rapid prototyping the most effectively.

Plausible text generation is an almost magical trick, whether it's generating human language or computer code. But it turns out it's not a silver bullet, no matter how impressive the trick is. It's more interesting than a silver bullet, in fact: it's a system of surprising tradeoffs, even for different phases of the same overall task.

by nostrademons 17 hours ago

Usually you'll iterate several times on #1, which is where LLMs are really helpful. They let you get working code from stage #1 quite quickly, so you can check the output and behavior, and then oftentimes you'll find that you framed the problem incorrectly in the first place. Then you can fix your problem definition, have the LLM rewrite the code, try it again, and so on, until you get the results you want.

#1 -> #2 is a gap, but it also helps if you ask the LLM to explain its thinking and generate a human-readable design-doc of the approach it took and code organization it used. Then you read the design doc to gain the context, and pick up with #2.

by majormajor 19 hours ago

Yeah, a lot of "it doesn't matter how the code looks" convos seem to be ignoring that we know what happens over time when you just make tactical the-tests-still-pass changes over and over and over again. Slowly some of those tests get corrupted without anyone noticing. And you never had the ENTIRE spec (and all the edge-case but user-relied-on things) covered anyway. And then new dev gets way harder.

by originalvichy 18 hours ago

This is definitely most annoying when dealing with software or standards with slightly illogical or hard to grasp cases. Recently, I worked on one of the software community's favourite spaces, timezones, and kept getting myself and my LLM context polluted with the confusion that arises when using POSIX standard timezone notation and common human-readable formats.

This blog probably covers my exact headache [0]. In summary, "Etc/GMT+6" actually means UTC-6. I was developing a one-off helper script to bulk-create calendars in a web app via its API, and when trying to validate my CSV+Python script's results, I kept getting confused about whether the CSV rows or the web app UI had the correct data. The LLM probably wrote the Python script in a way that translated this on the fly, but my human-readable "Calendar name" column, which had "Etc/GMT+6", would generate a -6 in the web app. This probably would not have been a problem with explicit locations specified, but my use case would not allow for that.
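For anyone hitting the same sign confusion, here is a minimal sketch of the inversion. The helper names `etc_gmt_to_utc_offset` and `human_label` are made up for illustration and are not from the script described above. The sketch only handles names of the form "Etc/GMT+N" or "Etc/GMT-N" and does not check that the zone actually exists in the tz database.

```python
import re
from datetime import timedelta

# Matches names such as "Etc/GMT+6" or "Etc/GMT-3" (hypothetical helper).
_ETC_GMT = re.compile(r"^Etc/GMT([+-])(\d{1,2})$")


def etc_gmt_to_utc_offset(name: str) -> timedelta:
    """Return the real UTC offset for an "Etc/GMT+N" / "Etc/GMT-N" name.

    The sign in the name follows the POSIX convention (positive means
    west of Greenwich), which is the opposite of the usual "UTC+N" style.
    """
    match = _ETC_GMT.match(name)
    if match is None:
        raise ValueError(f"not an Etc/GMT name with an offset: {name!r}")
    sign, hours = match.group(1), int(match.group(2))
    # Flip the sign: "Etc/GMT+6" is six hours behind UTC.
    return timedelta(hours=-hours if sign == "+" else hours)


def human_label(name: str) -> str:
    """Render the offset in the familiar "UTC-6" / "UTC+3" style."""
    hours = int(etc_gmt_to_utc_offset(name).total_seconds() // 3600)
    return f"UTC{hours:+d}"


if __name__ == "__main__":
    for zone in ("Etc/GMT+6", "Etc/GMT-3"):
        print(zone, "->", human_label(zone))
```

Putting a label like this next to the raw zone name in the CSV is one way to make it obvious which convention each column uses. On systems with tz data installed, the standard library's `zoneinfo.ZoneInfo("Etc/GMT+6")` should report the same six-hours-behind offset.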

When trying to debug if something is wrong, the thinking trace was going into loops trying to figure out if the "problem" is coming from my directions, the code's bugs, or the CSV having incorrect data.

Learning: when facing problems like this, try using the well-known "notepad file" methods to track problems like this, so that if the over-eager LLM starts applying quick code fixes – although YOU were the "problem's" source – it will be easier to undo or clean up code that was added to the repository during a confusing debug session. For me, it has been difficult to separate "code generated due to more resilient improvements" vs. "code generated during debugging that sort of changed some specific step of the script".

(Do note that I am not an advanced software engineer, so my practices are probably obvious to others. My repos mainly consist of sysadmin-style shell/Python helper code! :-) )

[0] https://blacksheepcode.com/posts/til_etc_timezone_is_backwar...

by isityettime 18 hours ago

> when facing problems like this, try using the well-known "notepad file" methods to track problems like this, so that if the over-eager LLM starts applying quick code fixes – although YOU were the "problem's" source – it will be easier to undo or clean up code that was added to the repository during a confusing debug session. For me, it has been difficult to separate "code generated due to more resilient improvements" vs. "code generated during debugging that sort of changed some specific step of the script".

Yeah, I have definitely hit this as well. Sometimes I've named a function or variable in a way that misuses a term or concept, or I've changed what something does without fully thinking it through. The LLM sees that code, notices an inconsistency, and makes a guess about what I meant. But because I screwed up, only I know what I really meant (or what I "should have meant"). So the LLM ends up writing a fix that breaks assumptions made in other parts of the code: assumptions that fit with my overall original mental picture, but not the misnomer the LLM got snagged on. Or it writes a small-scoped fix, but the mistake of mine it stumbled upon actually merits rethinking and redesigning how some parts interact, so even if its fix is better than what I had before, I want to unwind that change so I can redefine my interfaces or whatever.

That's definitely worth calling out: it's not only the LLM's mistakes that make it more likely to commit future mistakes. Any mistakes in the codebase can compound like that. If you want an LLM to do useful work for you, it's more relevant than ever to "tidy first".
