by minimaltom 13 hours ago

by yorwba 5 hours ago

What is classic about "skip updating parameters with high gradient/loss variance in multiple batches/samples"? Do you have a particular algorithm in mind that uses this heuristic?
