Bandwidth is the killer in distributed LLM training.
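A back-of-envelope calculation shows why. Assuming fp16 gradients (2 bytes/parameter) and a ring all-reduce, where each worker moves roughly twice the gradient size per step, the sync time over a consumer link versus a datacenter interconnect differs by orders of magnitude (the numbers below are my illustrative assumptions, not from the thread):

```python
# Back-of-envelope: how long one gradient all-reduce takes over a given link.
# Assumes fp16 gradients (2 bytes/param) and ring all-reduce traffic of
# roughly 2x the gradient size per worker.

def sync_seconds(n_params: float, bytes_per_grad: int = 2, link_gbps: float = 1.0) -> float:
    """Seconds to synchronize gradients once over a link of `link_gbps` Gbit/s."""
    payload_bits = 2 * n_params * bytes_per_grad * 8  # ring all-reduce ~2x traffic
    return payload_bits / (link_gbps * 1e9)

# A 70B-parameter model over a 1 Gbit/s home connection: ~37 minutes PER STEP.
print(sync_seconds(70e9, link_gbps=1.0))    # 2240.0 seconds
# The same sync over a 400 Gbit/s datacenter interconnect: a few seconds.
print(sync_seconds(70e9, link_gbps=400.0))  # 5.6 seconds
```

Gradient compression and infrequent syncing shrink these numbers, but the gap between home and datacenter bandwidth is the structural problem.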
reply
What’s the rush?
reply
It depends on the purpose of the model. AFAIK LLMs aren't particularly capable at researching answers, relying more on having 'truth' baked into their weights, so if it takes 12 months to train up a crowd-trained LLM, it'll be 12 months behind the times.

How serious a risk is poisoned weights?

Can we leverage the cryptobros into using LLM training as a proof of work?

reply
What? I use Qwen 3.5 35B-A3B and it definitely knows how and when to do web searches to fill in gaps in its knowledge.
reply
Does Qwen3.5 know it needs to do this because the API in question has had loads of churn and much of its training data is on obsolete versions, or do you need to prompt it? How well does it handle having an API reference with sample code in its context window?

Having an LLM use a web search tool isn't the same thing as researching a topic, IMO, because the result is ephemeral and needs constant reinforcement. LLMs aren't learning machines; they're static ones.
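The static-weights-vs-ephemeral-context distinction can be sketched in a few lines. Everything here (the `STATIC_WEIGHTS` dict, the `web_search` stub, the example facts) is a toy stand-in of my own, not any real model or API:

```python
# Toy illustration: knowledge baked into weights vs. knowledge fetched per-conversation.
# STATIC_WEIGHTS and web_search are hypothetical stand-ins, not a real model or API.

STATIC_WEIGHTS = {"python 2 EOL": "2020-01-01"}  # frozen when training finished

def web_search(query: str) -> dict:
    """Stub for a search tool that returns a post-training fact."""
    return {"qwen3 release": "2025"}

def chat(question: str) -> str:
    context = dict(STATIC_WEIGHTS)  # every conversation starts from the same frozen weights
    if question not in context:
        context.update(web_search(question))  # fresh fact lives only in THIS context window
    return context.get(question, "unknown")

print(chat("qwen3 release"))              # answered -- but only because the tool ran
print("qwen3 release" in STATIC_WEIGHTS)  # False: nothing was learned; the next chat starts over
```

Each call rebuilds its context from the same frozen weights, so the searched fact has to be re-fetched every time: that is the "constant reinforcement" in a nutshell.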

reply
How many facts actually change fast enough to make the training data obsolete? Unless you’re researching current events, I contend it’s a moot point.
reply