I’m surprised (unless they replaced the core tcmalloc algorithm but kept the name).

tcmalloc (thread caching malloc) assumes memory allocations have good thread locality. This is often a double win (less false sharing of cache lines, and most allocations hit thread-local data structures in the allocator).

Multithreaded async systems destroy that locality, so it constantly has to run through the exception case: thread A allocates a buffer, the task goes async, the request wakes up on thread B, which frees the buffer and has to synchronize with A to give it back.
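A minimal sketch of that pattern, using plain threads and a channel to stand in for an async executor migrating a task (the allocator behavior in the comments is implicit; this just shows the allocate-on-A / free-on-B shape):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();

    // "Thread A": allocates the request's buffer, filling A's
    // thread-local free lists in a thread-caching allocator.
    let a = thread::spawn(move || {
        let buf = vec![0u8; 64 * 1024];
        tx.send(buf).unwrap(); // the request "migrates" to another thread
    });

    // "Thread B": the request resumes here and drops the buffer.
    // For a per-thread-cache design this is the slow path: the memory
    // has to find its way back to a cache it didn't come from.
    let b = thread::spawn(move || {
        let buf = rx.recv().unwrap();
        drop(buf); // cross-thread free
    });

    a.join().unwrap();
    b.join().unwrap();
}
```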

Are you using async rust, or sync rust?

reply
Modern tcmalloc uses per-CPU caches via rseq [0]. We use async Rust with multithreaded tokio executors (sometimes multiple in the same application), so relatively high thread counts.

[0]: https://github.com/google/tcmalloc/blob/master/docs/design.m...

reply
How do you control which CPU your task resumes on? If you don't, then it's still the same problem described above, no?
reply
That’s a considerable regression for mimalloc between 2.1 and 2.2 – did you track it down or report it upstream?

Edit: I see mimalloc v3 is out – I missed that! That probably moots this discussion altogether.

reply
nope.
reply
This is similar to what I experienced when I tested mimalloc many years ago. When it was faster, it wasn't faster by much, and it had pretty bad worst cases.
reply