undefined

points

[-]

What would be an acceptable amount of energy to spend on something that someone has done in a different manner before? Would you rather we stick with all of the current known ways to do things.

Does this boil down to a condemnation of all scientific endeavours if they use resources?

Would it change things if the people who did it enjoyed themselves? Would they have spent more energy playing a first person shooter to get the same degree of enjoyment?

How do you make the calculation of the worth of a human endeavour? Perhaps the greater question is why are you making a calculation of the worth of a human endeavour.

by mcdeltat21 hours ago|

parent|

[-]

Ok I don't really care either way but to play devil's advocate, what exactly is this specific challenge of adding numbers with a transformer model demonstrating/advancing? The pushpack from people, albeit a little aggressive, does have a grain of truth. We're demonstrating that a model which uses preexisting addition instructions can add numbers? I mean yeah you can do it with arbitrarily few parameters because you don't need a machine learning model at all. Not exactly groundbreaking so I reckon the debate is fair.

Now if you said this proof of addition opens up some other interesting avenue of research, sure.

by Lerc20 hours ago|

parent|

[-]

>what exactly is this specific challenge of adding numbers with a transformer model demonstrating/advancing?

Well for starters, it puts the lie to the argument that a transformer can only output examples it has seen before. Performing the calculation on examples that haven't been seen demonstrates generalisation of the principles and not regurgitation.

While this misconception persists in a large number of people, counterexamples can always serve a useful purpose.

by mcdeltat18 hours ago|

parent|

[-]

Are people usually claiming that it strictly cannot produce any output it hasn't seen before? I wouldn't agree, I mean clearly they are generating some form of new content. My argument would be that while they can learn to some extent, the power of their generalisation is still tragically weak, particularly in some domains.

by qsera19 hours ago|

parent|

prev|

[-]

>it puts the lie to the argument

But it does not, right? You can either show it something, or modify the parameters in a way that resemble the result of showing it something.

You can claim that the model didn't see the thing, but that would mean nothing, because you are making the same effect with parameter tweaks indirectly.

by Lerc16 hours ago|

parent|

[-]

That's a counterargument to a different thing.

Iteratively measuring loss is a way to reconstruct values. That's trivial to show for a single value If 5 gives you a loss of 2 and 9 gives you a loss of 2 then you know the missing value is 7.

A model with enough parameters can memorise the training set in a similar manner. Technically the model hasn't seen that data by direct input either, but the mechanism provides the means to determine the what the data was. In that respect it is reasonable to say the model has seen the data.

Performing well on examples not in the training set is doing something else.

Any attempt to characterise that as having been seen before negates any distinction between taking in data and reasoning about that data.

by qsera15 hours ago|

parent|

[-]

Yea, because "seeing" is also tweaking the parameters. Which this example is doing manually.

So I don't understand how any one can make the claim that the model as not seen it. Because the internal transformation is similar.

by Lerc15 hours ago|

parent|

[-]

You are going to have to be more specific, because that reads like nonsense.

By what mechanism do you propose the model observed the test set?

by qsera15 hours ago|

parent|

[-]

>By what mechanism do you propose the model observed the test set..

By explicitly setting the model parameters.

What happens when a model is trained? We tweak the model parameters by some feed back.

In both cases, you affect the model parameters. Only the method is different. So both are eqvialent to "model observing the test set".

by Lerc11 hours ago|

parent|

[-]

I still do no see any causal link from the test set. When was this observed, how and by whom?

Are you trying to say that the person who entered the parameters had access to the test set? I find it more likely that they encoded the generalising rule than observed every instance of its use.

by qsera11 hours ago|

parent|

[-]

>I find it more likely that they encoded the generalising rule..

Look, I am saying that during training the model ends up "learning" the generalising rule from training data, but here it was explicitly entered into it, with out any training.

by coolsunglasses21 hours ago|

prev|

[-]

>Hacker News

not any more, eh?

by userbinator19 hours ago|

prev|

[-]

Because it's fun. Life is meant to be enjoyed.

Those who worry about an imaginary risk and live their lives in constant fear have turned into nothing more than machines enslaved by propaganda.

by mapontosevenths19 hours ago|

prev|

[-]

> the sheer amount of energy and associated global warming risk

I think that's one very good reason to make them more efficient, and that's part of the point of contests like this one.

by tovej15 hours ago|

parent|

[-]

Making things more efficient in a market setting just means they're used more. Which means we eventually use more resources with efficient methods, not less.

by nradov21 hours ago|

prev|

[-]

Wait until you see the quantum computer that it takes to factor the integer 15.

by thereisnospork21 hours ago|

prev|

[-]

You need to recalibrate your sense of scale if you think that this is a geologically relevant usage of energy.