
GPT literally generates perfect code for me in languages that do not exist anywhere in its training set, so I’m not sure how you’ve achieved this level of failure.

by jofer 22 hours ago


Try working in anything domain specific outside of common CRUD patterns. E.g. scientific software development where you describe a problem + give data. I have yet to see a single example of feeding in a problem in natural language involving a specific scientific domain that wasn't pretty catastrophically incorrect.

But yeah, if you want to feed it math and get code, it's reasonably okay with that. All LLMs I've used seem bad at understanding things that don't look like broad human knowledge. I've seen this same general issue across many different models. (And to be fair, geology, geophysics, and remote sensing are what I'm testing, and they're semi-rare niches.)

It's also quite dangerous because it's not obvious that what it's doing is complete hallucination unless you actually are a domain expert. Things _sound_ reasonable. E.g. "this is likely feature X" which _does_ exist, but is absolutely _not_ relevant to the problem or present in the input dataset.

But my current employer is pushing this exact thing (human language + scientific data + LLM -> advanced analysis of scientific data by LLM -> business decisions) and it _really_ worries me. It often gives the rough equivalent of "Start the procedure by severing the patient's aorta. Once they stop moving, you can deal with the hangnail". Just in very reasonable sounding language. And a lot of people don't know any better, because most users aren't domain experts.

by llmssuck 21 hours ago


Stuff it's not directly trained on is going to be flaky and sucky. It was like that with programming at first too and it still is sometimes. It's hard to imagine this won't improve with better, more focused training. They focus on improving "CRUD" for obvious reasons. The specialization era hasn't begun yet.

Your domain, while I'm sure it is very interesting and complex, if it proves economically interesting will be cracked as well.

by jofer 21 hours ago


Just for some context, the domain we're talking about is oil and gas and mineral exploration. E.g. at my previous job, I used to personally manage a >$400 million per year budget and that wasn't even considered significant. We had multiple >$10 billion per year projects ongoing. That was 10 years ago. The amounts are larger now.

The issue isn't a lack of economic interest.

It might be a lack of training data in addition to inherent complexity, but it's certainly not a lack of economic interest.

by llmssuck 20 hours ago


I have no idea how and why GenAI would be useful in your profession. I'm sure a lot of money is moved there (not sure about the profits though), but it's not clear to me how software itself is budging that needle. I suppose better algorithms and better understanding of geology will do it, but software itself seems just subservient to that goal.

I guess what I'm saying is that "domain knowledge" is taking software development for a ride here. The software is just the vehicle, the science is the engine here, and I can see why companies like OpenAI start going for the low-hanging fruit first instead.

Your specific company might be profitable, but does automating "mineral exploration" give you leverage over quite literally all other domains? My guess is not. For "CRUD" it is a resounding yes, it provides gigantic leverage. Once you automate basic software development you enter a new world. 10 billion, 10 trillion, all bets are off. You automate the creation of the next iteration of automation and on we go. Let's hope it takes a while for this to take off. I can't see us being ready for it.

My guess is it'll take a decade or so for real AI science to start taking off though - if that soon - so you're probably fine for now.

by jofer 20 hours ago


Yes. My point was that LLMs aren't currently good for everything. The original commenter literally said they were good at everything and I offered a counterpoint of something they're not good at: Most science.

(And yes, a lot of science is software. Analysis is software.)

by woeirua 20 hours ago


Skill issue. I've seen LLMs used in this domain to get mindblowing results. You won't see it published anywhere though.... =).

by calf 19 hours ago


Disagree. Someone like the other commenter, who points out that LLMs don't even understand the domain concepts correctly, and someone who uses them anyway for proprietary corporate results have very different standards for what is acceptable. If you wrangle an LLM with harnesses and clever prompts you could use it to get some amazing results, but that has more to do with trial and error and creativity, not some kind of fundamental skill of using LLMs.

by woeirua 16 hours ago


It definitely understands the concepts well enough if you give it the right context. I'm not the only one saying this either. Like I said, it's a skill issue.

by calf 13 hours ago


That's the Clever Hans argument, and the fact that you confidently use this unfalsifiable tactic ("Give it just the right context and it understands stuff!! It works!!" (Well, until the next iteration and then the next until the system paints itself into a corner)) tells me you are engaging in broscience / pseudoscience. Like I say, anti-scientific attitudes like yours are part of the problem, fanning the hype. It's bad faith to attribute people's criticisms of LLMs to some kind of lack of skill. People on here, many of whom are actual scientists and professional programmers, are very intelligent and highly trained; if they wanted to play around with LLMs they are very likely capable of getting impressive one-time results. But proper, sustained use in a non-"vibe-coding" manner, such as with guarantees for validity, consistency, replicability, extensibility, and so forth, is a completely open problem. Therefore it is out of proportion to reduce that to human skill. It's analogous to framing a bad design pattern as user error--disingenuous and bad faith. Ironically, with an intellectual standard like that, it then becomes easy to become overconfident about LLMs.

by reachableceo 17 hours ago


Provide an example please.

I keep hearing these “I work in some hard field and the LLM isn’t any good at it” claims. I keep asking for examples and no one can provide them.

by combyn8tor 15 hours ago


Rust kernel development.