
by kgeist 3 hours ago


by amelius 3 hours ago


How does this work with scaling?

I assume you can then somehow run several hundred prompts concurrently?

by CamperBob2 3 hours ago


You can get 1M context with the lukealonso NVFP4 quant on 8x RTX6000s, which remains coherent and useful through at least 400k tokens. No real need to run 8x H200s unless you just want to, or unless you need to serve many concurrent users or agents on a regular basis.