by wmf, 2 hours ago

by cyanydeez, 17 minutes ago

Not at the VRAM sizes that control how much context you can load; also, GPUs aren't as efficient as direct inference.
