Similarly with AlphaGo, they claimed to do it "to advance Go" and help the Go community, but they played Lee Sedol, released a few curated self-play games, collected publicity, and abandoned Go with no artifacts like source or weights.
But in hindsight, their paper turned out to be almost 100% reproducible and resulted in a superhuman open-source alternative less than a year later.
So the story might repeat here, and they will achieve their stated goal without releasing anything.
If you look at the frenzy of activity that happened after Midjourney became accessible, that was awesome for everyone. Midjourney probably got help running their model efficiently, and a ton of progress was made quickly.
I'm pretty sympathetic to a company doing a windowing strategy: prepare the API as a sort of beta release timed with the announcement. Spend some time cleaning up the code for public release (at Google this means ripping out internal dependencies that aren't open source), and then release a reference inference implementation along with the weights.
That's pretty reasonable. I wanted to push back on this idea that "the reason Google isn't dropping model + weights is because the corporate screws are coming down hard"
Google isn't holding back the weights so that they can profit from this. It's essentially the first step in the process, and serving via an API gives them valuable usage data that they might not get if/when it's open-sourced.
But I can’t use this at all at work (a pharma company) because it would leak confidential information. So anything they learn from usage data is systematically excluding (the vast majority of?) people working on therapeutics.
But you couldn't use it for work anyway, because usage is restricted to non-commercial purposes. So you'd need to pay them to change the license regardless.
It might give them a bit, but AFAIK most institutions (especially non-American ones) aren't exactly happy about using closed American APIs to do science, especially because API usage isn't reproducible.
Sure, they might be able to play around with some toy data, but for Google to get genuinely valuable usage data, they need to let people use the thing for real work, and you cannot gate that behind an API; it isn't feasible in a real-world environment.
I think companies in the space should either totally open source or not publish at all.
I can see publishing like this as achieving one (or more) of several objectives:
1. Marketing software for sales / licensing
2. Marketing startup to investors
3. Crowdsourcing use cases or product features from academia
Now here are the problems with those:
1. Selling software (exclusively) to drug companies is a terrible business model. Very low ceiling there. You can make more from one drug.
2. Indicates company focus is producing models and not drugs. See point one.
3. Computational labs want to release open source, so not viable to build on restricted tooling. Experimental labs may just be using to algo-wash prior hypotheses / biases.
Now weigh that against the disadvantage of letting competitors know what you are working on, how far you have progressed, and what your methods are.
I’d argue that the product providing some monetary value for Google will help ensure that this team doesn’t get moved to some more profitable project instead. That way they can continue improving this tool and build more tools like it in the future.