Modern tcmalloc uses per-CPU caches via rseq [0]. We use async Rust with multithreaded Tokio executors (sometimes several in the same application), so our thread counts are relatively high.
[0]: https://github.com/google/tcmalloc/blob/master/docs/design.m...