undefined

points

[-]

Gemini seems to be an expert in mistaking its own terrible suggestions as written by you, if you keep going instead of pruning the context

by benhurmarcel7 hours ago|

parent|

[-]

In Gemini chat I find that you should avoid continuing a conversation if its answer was wrong or had a big shortcoming. It's better to edit the previous prompt so that it comes up with a better answer in the first place, instead of sending a new message.

by WarmWash7 hours ago|

parent|

[-]

The key with gemini is to migrate to a new chat once it makes a single dumb mistake. It's a very strong model, but once it steps in the mud, you'll lose your mind trying to recover it.

Delete the bad response, ask it for a summary or to update [context].md, then start a new instance.

by wildrhythms8 hours ago|

parent|

prev|

[-]

After just a handful of prompts everything breaks down

by jwrallie10 hours ago|

prev|

[-]

I think it’s good to play with smaller models to have a grasp of these kind of problems, since they happen more often and are much less subtle.

by ehnto8 hours ago|

parent|

[-]

Totally agree, these kinds of problems are really common in smaller models, and you build an intuition for when they're likely to happen.

The same issues are still happening in frontier models. Especially in long contexts or in the edges of the models training data.

by throw31082210 hours ago|

prev|

[-]

Makes me wonder if during training LLMs are asked to tell whether they've written something themselves or not. Should be quite easy: ask the LLM to produce many continuations of a prompt, then mix them with many other produced by humans, and then ask the LLM to tell them apart. This should be possible by introspecting on the hidden layers and comparing with the provided continuation. I believe Anthropic has already demonstrated that the models have already partially developed this capability, but should be trivial and useful to train it.

by 8organicbits8 hours ago|

parent|

[-]

Isn't that something different? If I prompt an LLM to identify the speaker, that's different from keeping track of speaker while processing a different prompt.

by j-bos10 hours ago|

prev|

[-]

At work where LLM based tooling is being pushed haaard, I'm amazed every day that developers don't know, let alone second nature intuit, this and other emergent behavior of LLMs. But seeing that lack here on hn with an article on the frontpage boggles my mind. The future really is unevenly distributed.

by sixhobbits11 hours ago|

prev|

[-]

author here, interesting to hear, I generally start a new chat for each interaction so I've never noticed this in the chat interfaces, and only with Claude using claude code, but I guess my sessions there do get much longer, so maybe I'm wrong that it's a harness bug

by kayodelycaon7 hours ago|

parent|

[-]

I’ve done long conversations with ChatGPT and it really does start losing context fast. You have to keep correcting it and refeeding instructions.

It seems to degenerate into the same patterns. It’s like context blurs and it begins to value training data more than context.

by 11 hours ago|

prev|

[-]

deleted

by scotty799 hours ago|

prev|

[-]

It makes sense. It's all probabilistic and it all gets fuzzy when garbage in context accumulates. User messages or system prompt got through the same network of math as model thinking and responses.

by throwaway6137468 hours ago|

prev|

[-]

[dead]