The abstract and method sections only mention updating the SSM state during "sleep" (i.e. the same vectors that change after each token in stock Mamba), not any of the actual weight matrices. AFAICT this is just another attention compaction paper with a misleading title? It is not very clearly written.
No, they're actually training weights based on context before compaction. Context is context; this is splitting the model into persistent weights and malleable ones that are periodically updated.