They don't need to provide any details at all. They just need to give people access to their model and charge them for it. That they don't do that and instead pay for external evaluations indicates that they believe people would be unimpressed if they could access the model directly. The only purpose of this press release seems to be making investors give them more money.

by giancarlostoro, 9 minutes ago

Could also be they don't have enough funding to sustain that many users, or even the infrastructure lined up.

by famouswaffles, 4 hours ago

Business-wise, it would make sense to hold off on details till they're at least ready to serve. Look at what happened with OpenAI and reasoning models. Everyone struggled with getting RL to work with LLMs for a good while. OpenAI figured it out, and a few months later everyone had their prototypes out in short order. Don't forget who these labs employ. They're some of the brightest people around. Sub-q aren't really in a position for that lol. If they'd shared details at the first announcement, for instance, the big labs might have had something out by now while they're still pulling resources to scale, and then what?

by cmogni1, 2 hours ago

I don't think it makes sense from a business perspective to hold off on details as a new lab. OpenAI will not implement new architectural changes unless they've tested the changes themselves internally. Even if someone claims some great innovation, they'd need to do scaling experiments at somewhere between the size of GPT-4 and GPT-5 before they'd decide it is worth implementing themselves. Plenty of mechanisms that seem to work at one scale do not translate to the next.

Because the cost to OpenAI to make an architectural shift is far greater than the cost to a new lab to try something different, providing details is usually a net benefit for recruiting, building trust, getting acquired, etc. The lack of details is a poor business decision because it makes them seem untrustworthy.

I'm not advocating that they should open source their model, but there is already so much noise in the space, and so many bad papers, that being cagey is a poor strategy for winning over talent, developers, etc.

by famouswaffles, 1 hour ago

>OpenAI will not implement new architectural changes unless they've tested the changes themselves internally.

OpenAI validating it can still happen faster than they can get the compute to serve the models themselves[1]. It doesn't make a lot of sense to give out details if they want to be a serious contender or even, as some have said, be acquired.

Yeah, there's noise, but if they have the real deal then it doesn't matter. The only thing they need to do is let people pay to use the models.

[1] I'm assuming this is the primary cause of the delay. That may not be the case of course.

by supern0va, 4 hours ago

You don't understand why the thing their entire company is valued upon is...not being given away freely? They literally are taking an open source model and then adapting it with this technique. If they disclose it, the frontier labs will immediately copy it and outperform them.

My guess is that they're angling for an acquisition.

by cmogni1, 1 hour ago

Ahh cf my comment above. The cost of failure at scale is too high for a major to just take a new architecture/mechanism and implement it, especially because a) most claims papers make aren't rigorously tested and b) plenty of things that work at one scale do not work at the scale on which the labs operate. If they want to get acquired, then they should show that they know what they're doing. Otherwise, it looks sketchy.

by supern0va, 1 hour ago

>The cost of failure at scale is too high for a major to just take a new architecture/mechanism and implement it,

Is it, though? This scrappy startup was able to take a large(-ish) open weights model and adapt it. Why can't the frontier labs do the same cost-effectively?

>If they want to get acquired, then they should show that they know what they're doing.

I'm sure they would do so under an appropriate NDA as part of negotiations. I'm not sure why you think a full public disclosure is necessary.

by GenerWork, 3 hours ago

>My guess is that they're angling for an acquisition.

This is what I've thought was going to happen ever since they publicized their efforts. They probably don't have the money to train large models themselves, might as well get a nice chunk of change by being acquired by someone who already has said large models running.

by giancarlostoro, 2 hours ago

They probably don't have the money to run the model at reasonable scale.

by jmward01, 3 hours ago

Well, I know this is possible because I have built things that work just like it is promising to do. The two key technologies needed are:

- guided window attn. Predict where to attend to, but in a fixed window. If you do this to just the token/vocab you can keep effectively unlimited context and perfect recall. (Yes, I can do that. There is a trick to teaching it how to predict position. This also immediately opens other crazy things like NN memory.)

- efficient fixed state size models. So not a recurrent mechanism, because that breaks training; parallelizable like transformers, but with a fixed-size state instead of unbounded attn. Pick a reasonable amount of state and it is amazingly good, since it doesn't need to keep separating wheat from chaff in context. (Yes, it is possible to build this, I have. It works. This also opens up real streamed models. I have a true infinite context streamed model I toy with locally that I am getting to be audio/text in and audio/text out in real time.)

Put those together and you have O(1) token gen, infinite context and perfect recall. It is a whole new world of models. You can interact with a model until you have it at the state you want and then save its state and use that as if it were your system prompt. Batches pack perfectly so inference is massively more efficient. Training is massively more efficient. Transformer and unlimited attn models are a dead end. But how do you make money on this as an independent researcher? If I release the Two Weird Tricks this is all based on, I get zip and the big players get even more tech for free. If I keep it all secret, I get zip and eventually the tricks will be figured out. (Yes, a little frustration here.) If anyone wants the model architecture of the future, make me an offer :)
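For readers who want something concrete, here is a minimal sketch of the general idea of attending only inside a fixed window around a predicted position. It is not the undisclosed trick from the comment above: the position predictor is not described anywhere in the thread, so `center` is simply passed in, and `windowed_attention` and all its parameters are hypothetical names chosen for illustration.

```python
import math

def windowed_attention(query, keys, values, center, window):
    """Softmax attention restricted to `window` positions around `center`.

    `center` stands in for the output of a learned position predictor
    (an assumption; the thread does not say how it is produced).
    """
    n = len(keys)
    half = window // 2
    # Clamp the window so it always stays inside the sequence.
    start = max(0, min(center - half, n - window))
    end = min(n, start + window)
    # Scaled dot-product scores over the window only, so the cost
    # depends on `window`, not on the total context length `n`.
    scale = 1.0 / math.sqrt(len(query))
    scores = [
        scale * sum(q * k for q, k in zip(query, keys[i]))
        for i in range(start, end)
    ]
    # Numerically stable softmax over the window.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the values inside the window.
    dim = len(values[0])
    out = [
        sum(w * values[start + j][d] for j, w in enumerate(weights))
        for d in range(dim)
    ]
    return out, (start, end)
```

The point of the sketch is only the cost profile: each generated token touches `window` keys regardless of how long the context is. Whether recall stays "perfect" depends entirely on how good the position prediction is, which this sketch does not address.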

by jmward01, 54 minutes ago

As a follow-up, I can see there is not a lot of belief, which is why it is also hard to find a company to partner with on this. So, how -do- you make money on something like this as an independent researcher? Maybe I release trick one and show how guided window attn (and NN memory and probably a lot of robotics) can be trained? Thoughts? I can do that pretty quickly. By itself that is a pretty great tech (combined with fixed windows of full attn it is pretty amazing). The second trick, I think, is a bit more powerful, although both are general purpose. If I do this, do you think people will believe trick two (and all the real-time multi-modal streaming stuff)?

by bratao, 2 hours ago

I'm super curious about those "Two Weird Tricks". I'd like you to release more. It reminds me of the MiniMax Sparse Attention https://arxiv.org/html/2606.13392v1

by jmward01, 47 minutes ago

Yeah, looks like fun stuff. You still need to preserve the entire KV cache though, right? So even if compute is drastically less, memory keeps growing. The system I described keeps memory constant (well, if you keep the entire token history you technically are gaining one long of data per token generated, but I think we can agree that is negligible and could be capped at something high like 1B or so with no meaningful impact). I think I will probably release trick one and see if people then believe trick two even without seeing it.
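A back-of-the-envelope version of that memory comparison, as a rough sketch. The model dimensions (32 layers, 8 KV heads, head dimension 128, 2-byte values) and the 64 MiB state size are made-up numbers for illustration, not figures from any model discussed here, and "one long" is assumed to mean an 8-byte integer per token.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    # Keys and values (the factor of 2) are stored per layer,
    # per KV head, per token, so this grows linearly with context.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

def fixed_state_bytes(tokens, state_bytes, bytes_per_token_id=8, keep_history=True):
    # A constant-size state, plus (optionally) one 8-byte integer
    # per token if the raw token history is kept around.
    history = bytes_per_token_id * tokens if keep_history else 0
    return state_bytes + history

if __name__ == "__main__":
    # Hypothetical configuration, chosen only to make the arithmetic concrete.
    cfg = dict(layers=32, kv_heads=8, head_dim=128)
    for tokens in (1_000, 1_000_000, 1_000_000_000):
        kv = kv_cache_bytes(tokens, **cfg)
        fixed = fixed_state_bytes(tokens, state_bytes=64 * 2**20)
        print(f"{tokens:>13,} tokens: KV cache {kv / 1e9:,.2f} GB, "
              f"fixed state + history {fixed / 1e9:,.2f} GB")
```

With these assumed numbers the KV cache costs 128 KiB per token, while the token history costs 8 bytes per token, a ratio of 16,384 to 1. The history still adds up to about 8 GB at 1B tokens, so "negligible" here is relative to the cache rather than absolute.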

by regularfry, 2 hours ago

It's not quite true to say that if you release it you get nothing. If it's worthwhile and picked up by the open-weights labs, you get much bigger and better models implementing it than you would have had access to or been able to train otherwise, quicker than if they had to figure it out de novo.

by jmward01, 2 hours ago

Yeah. I am about at the point of just releasing it all. I love the tech. It does amazing things. But I want to move on to the next big things I can see doing with it, and building the custom ops to get it to work efficiently is a pain. I am positive others would run with it and make it all way better, which would free me up to do more.

by eikenberry, 2 hours ago

Isn't the classic way of making money off an invention to patent it? So why not patent those "Two Weird Tricks"?

by giancarlostoro, 5 minutes ago

Expensive, and if someone figures out a slightly different way to do it you aren't really "covered". It's not a unique umbrella, plus you would sort of give away the secrets.