Closed source but open weight. Let’s not ruin the definition of the term for the benefit of big companies.
The inference code and model architecture IS open source[0] and there are many other high-quality open source implementations of the model (in many cases contributed by Google engineers[1]). To your point: they do not publish the data used to train the model, so you can't re-create it from scratch.
[0] https://github.com/google-deepmind/gemma
[1] https://github.com/vllm-project/vllm/pull/2964
And even if you had the same data, there's no guarantee the random perturbations during training are driven by a PRNG and done in a way that is reproducible.
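To make the PRNG point concrete, here is a toy sketch (not Gemma's training code; `noisy_training_run` and its "training" loop are made up for illustration). When every random perturbation comes from one explicitly seeded generator, two runs with the same seed agree exactly. Real training also has other sources of nondeterminism, such as the order of floating-point reductions on accelerators, which a seed alone does not pin down.

```python
import random


def noisy_training_run(seed, steps=5):
    """Hypothetical stand-in for training: accumulate seeded Gaussian noise."""
    rng = random.Random(seed)  # private generator, independent of global state
    weight = 0.0
    for _ in range(steps):
        # Every "perturbation" is drawn from the seeded generator.
        weight += rng.gauss(0.0, 1.0)
    return weight


same_a = noisy_training_run(seed=1234)
same_b = noisy_training_run(seed=1234)
other = noisy_training_run(seed=4321)

print(same_a == same_b)  # same seed, same sequence of draws
print(same_a == other)   # a different seed gives different draws
```

If any of the noise came from an unseeded or hardware-dependent source instead, the first comparison would no longer be guaranteed to hold.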
Reproducibility does not make something open source. Reproducibility doesn't even necessarily make something free software (under the GNU interpretation). I mean hell, most Docker containers aren't even hash-reproducible.
DeepSeek published a lot of their work in this area earlier this year, and as a result the barrier isn’t as high as it used to be.
Their publications about producing Gemma are not detailed enough that, even with the data, you would get the same results.
Also, even if it were for fine-tuning, that would require an implementation of the model’s forward pass (which is all that’s necessary to run it).
Are you sure? On a quick look, it appears to use its own bespoke license, not the Apache 2.0 license. And that license appears to have field of use restrictions, which means it would not be classified as an open source license according to the common definitions (OSI, DFSG, FSF).
(Even then, releasing some source code under Apache-2 does not make a model "open source".)
Ah I found https://ai.google.dev/gemma/terms
> You must not use any of the Gemma Services:
>
> 1. for the restricted uses set forth in the Gemma Prohibited Use Policy at ai.google.dev/gemma/prohibited_use_policy ("Prohibited Use Policy"), which is hereby incorporated by reference into this Agreement; or
> 2. in violation of applicable laws and regulations.
https://ai.google.dev/gemma/prohibited_use_policy
Yeah, definitely not open source, even if they had released all the training data.