The models now waste a vast number of useless neurons memorising the character counts of the entire English language, just so that people can ask how many r's are in strawberry and tick a box on a benchmark.
The architecture cannot efficiently or consistently represent counting letters in words. We should never have force-trained them to do it.
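For a concrete look at why, here's a minimal sketch (assuming the tiktoken package and its cl100k_base encoding; any BPE tokenizer shows the same thing). The model never sees individual characters, only opaque token IDs, so letter counts have to be memorised rather than read off the input:

```python
# Show that a tokenizer hands the model multi-character chunks,
# not letters -- so counting r's means memorising, not reading.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-era BPE encoding
ids = enc.encode("strawberry")
pieces = [enc.decode([i]) for i in ids]

print(ids)     # a handful of integer token IDs
print(pieces)  # multi-letter fragments, e.g. ['str', 'aw', 'berry']
```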
This goes for other, more important "skills" that are unsuited to transformer models.
Most models can now do decent arithmetic. But if you knew how that ability is encoded in their neurons, you would never ever trust any arithmetic they output, even when they seem to "know" it (unless they called a calculator MCP to get it).
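As a sketch of what "a different tool" looks like in practice, here's a deterministic calculator of the kind you'd expose to a model as a tool. The function name and supported operations are my own illustration, not any particular MCP server's API:

```python
import ast
import operator

# Minimal safe arithmetic evaluator: the kind of deterministic
# "calculator" tool an LLM should call instead of doing math in weights.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calc(expr: str) -> float:
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

print(calc("12345 * 6789"))  # 83810205 -- exact, every time
```

Thirty lines of stdlib gets you answers that are exact every time, which no amount of scaling the weights guarantees.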
There are fundamental limitations, but we're currently brute-forcing our way through problems we could trivially solve with a different tool.
Are you only using frontier models gated behind the OpenAI/Anthropic/Google APIs? Those use tools to help them out behind the scenes. That remains no less impressive, but I think we should be clear about it.
Some limitations haven't been rigorously demonstrated to be fundamental, but they have been continuously present since the earliest LLMs. Shouldn't the burden of proof be on those who claim it can be done?
And some limitations are fundamental, and have been rigorously demonstrated, e.g.:
People hold an <opinion> which hasn't been rigorously proven, while <not rigorously proven counter-opinion>.
As such, I am not sure what you're trying to achieve here.
You can try this out locally with any mid-sized current-gen LLM. You’ll find that it can spell out most atomic tokens from its input just fine. It simply learned to do so.
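A minimal way to run that check, assuming a local model served through Hugging Face transformers (the model name here is just an example; any small instruct model will do):

```python
# Ask a small local model to spell a word letter by letter.
from transformers import pipeline

chat = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
messages = [{"role": "user",
             "content": "Spell the word 'strawberry' one letter at a time."}]
out = chat(messages, max_new_tokens=64)
# Recent transformers versions return the whole chat; the last
# message is the model's reply.
print(out[0]["generated_text"][-1]["content"])
```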
We have reduced hallucinations significantly, and yet it seems clear that they are inherent to the technology and so will always exist to some extent.
There are also limitations due to maths and/or physics that aren't fixable under any design. Outside science fiction, there is no technology whose limitations are all fixable.
Here's one: https://arxiv.org/abs/2401.11817