by jabroni_salad 1 day ago

by nicce 23 hours ago

> Gemma is open source and apache 2.0 licensed

Closed source but open weight. Let’s not ruin the definition of the term to the advantage of big companies.

by zackangelo 23 hours ago

Your reply adds more confusion, imo.

The inference code and model architecture ARE open source [0], and there are many other high-quality open source implementations of the model (in many cases contributed by Google engineers [1]). To your point: they do not publish the data used to train the model, so you can't re-create it from scratch.

[0] https://github.com/google-deepmind/gemma

[1] https://github.com/vllm-project/vllm/pull/2964

by candiddevmike 23 hours ago

If for some reason you had the training data, is it even possible to create an exact (possibly same hash?) copy of the model? Seems like there are a lot of other pieces missing like the training harness, hardware it was trained on, etc?

by OneDeuxTriSeiGo 22 hours ago

To be entirely fair, that's quite a high bar even for most "traditional" open source.

And even if you had the same data, there's no guarantee that the random perturbations during training are driven by a PRNG and applied in a way that is reproducible.

Reproducibility does not make something open source. Reproducibility doesn't even necessarily make something free software (under the GNU interpretation). I mean hell, most Docker containers aren't even hash-reproducible.
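To make the PRNG point concrete, here is a toy sketch, not anything taken from a real training stack. The names `train_toy_model` and `weights_hash` are made up for illustration. It assumes a single-threaded run where every random choice comes from one seeded generator, which is the best case; the last lines show why even that is not enough once the order of floating-point additions can change, as it can in parallel reductions.

```python
import hashlib
import random
import struct


def train_toy_model(seed, steps=200):
    """Fit y = 3x + 1 with SGD; all randomness comes from one seeded PRNG."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random initialisation
    lr = 0.05
    for _ in range(steps):
        x = rng.uniform(-1, 1)       # randomly drawn training example
        y = 3.0 * x + 1.0
        err = (w * x + b) - y
        w -= lr * err * x            # gradient step for w
        b -= lr * err                # gradient step for b
    return w, b


def weights_hash(weights):
    """SHA-256 over the exact IEEE-754 bytes of the weights."""
    raw = struct.pack(f"<{len(weights)}d", *weights)
    return hashlib.sha256(raw).hexdigest()


# Same seed, same code, same machine: identical bits, identical hash.
same_a = weights_hash(train_toy_model(seed=0))
same_b = weights_hash(train_toy_model(seed=0))

# A different seed gives a similar model but a different hash.
other = weights_hash(train_toy_model(seed=1))

# Floating-point addition is not associative, so merely changing the
# order of a sum changes the low bits of the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(same_a == same_b, same_a == other, left == right)
```

A hash match needs every one of these sources of variation pinned down at once, which is why "same data" alone would not get you a bit-identical model.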

by zackangelo 20 hours ago

Yes, this is true. A lot of the time labs will hold back necessary infrastructure pieces that allow them to train huge models reliably and on a practical time scale. For example, many have custom alternatives to Nvidia’s NCCL library for the fast inter-GPU communication that distributed matrix math depends on.

DeepSeek published a lot of their work in this area earlier this year, and as a result the barrier isn’t as high as it used to be.

by nicce 23 hours ago

I am not sure if this adds even more confusion. The linked library is about fine-tuning, which is a completely different process.

Their publications about producing Gemma are not detailed enough that you would get the same results even with the data.

by zackangelo 20 hours ago

In the README of the linked library they have a code snippet showing how to have a conversation with the model.

Also, even if it were for fine-tuning, that would require an implementation of the model’s forward pass (which is all that’s necessary to run it).
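A toy sketch of what I mean, with a one-parameter-pair "model" standing in for the real thing. None of this is Gemma's code; `forward`, `generate` and `fine_tune_step` are hypothetical names. The point is only that inference and fine-tuning both go through the same forward pass, and fine-tuning adds a gradient update on top.

```python
def forward(params, x):
    """Forward pass of a toy one-layer model: y = w * x + b."""
    return params["w"] * x + params["b"]


def generate(params, x):
    """Inference needs nothing beyond the forward pass."""
    return forward(params, x)


def fine_tune_step(params, x, target, lr=0.1):
    """Fine-tuning reuses the same forward pass, then updates the weights."""
    pred = forward(params, x)
    err = pred - target                    # gradient of 0.5 * err**2 w.r.t. pred
    return {
        "w": params["w"] - lr * err * x,   # chain rule through w * x
        "b": params["b"] - lr * err,
    }


params = {"w": 1.0, "b": 0.0}
before = generate(params, 2.0)             # 1.0 * 2.0 + 0.0
tuned = fine_tune_step(params, 2.0, target=5.0)
after = generate(tuned, 2.0)               # closer to the target of 5.0

print(before, after)
```

So a library that ships fine-tuning code necessarily ships enough to run the model as well.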

by nicce 19 hours ago

That is a completely different discussion. Otherwise, even Gemini 2.5 Pro would be open source by this logic, since the clients for interacting with the cloud APIs are open source.

by Imustaskforhelp 22 hours ago

Yes!! But I wonder how many models are truly, truly open source, since most people just confuse open source with open weights and the definition has really been changed, smh.

by cesarb 22 hours ago

> Gemma is open source and apache 2.0 licensed.

Are you sure? On a quick look, it appears to use its own bespoke license, not the Apache 2.0 license. And that license appears to have field of use restrictions, which means it would not be classified as an open source license according to the common definitions (OSI, DFSG, FSF).

by yencabulator 50 minutes ago

Wait, what files are you reading? https://github.com/google-deepmind/gemma/blob/main/LICENSE

(Even then, releasing some source code under Apache-2 does not make a model "open source".)

Ah I found https://ai.google.dev/gemma/terms

  > You must not use any of the Gemma Services:
  >
  > 1. for the restricted uses set forth in the Gemma Prohibited Use Policy at ai.google.dev/gemma/prohibited_use_policy ("Prohibited Use Policy"), which is hereby incorporated by reference into this Agreement; or
  > 2. in violation of applicable laws and regulations.

https://ai.google.dev/gemma/prohibited_use_policy

Yeah, definitely not open source, even if they had released all the training data.

by jabroni_salad 1 hour ago

Perhaps we could rephrase my statement to "there are a bunch of green checkmarks on GitHub that may or may not mean anything depending on who you ask."