undefined

points

[-]

Incrementing or decrementing a shared counter is done with an atomic instruction, not with a locked critical section.

This has negligible overhead in most cases. For instance, if the shared counter is already in some cache memory the overhead is smaller than a normal non-atomic access to the main memory. The intrinsic overhead of an atomic instruction is typically about the same as that of a simple memory access to data that is stored in the L3 cache memory, e.g. of the order of 10 nanoseconds at most.

Moreover, many memory allocators use separate per-core memory heaps, so they avoid any accesses to shared memory that need atomic instructions or locking, except in the rare occasions when they interact with the operating system.

by usrnm8 hours ago|

parent|

[-]

Atomic operations, especially RMW operations are very expensive, though. Not as expensive as a syscall, of course, but still a lot more expensive than non-atomic ones. Exactly because they break things like caches

by cogman107 hours ago|

parent|

[-]

Not only that, they write back to main memory. There's limited bandwidth between the CPU and main memory and with multithreading you are looking at pretty significantly increasing the amount of data transferred between the CPU and memory.

This is such a problem that the JVM gives threads their own allocation pools to write to before flushing back to the main heap. All to reduce the number of atomic writes to the pointer tracking memory in the heap.

by zozbot2348 hours ago|

prev|

[-]

If you use an ownership/lifetime system under the hood you only pay that synchronization overhead when ownership truly changes, i.e. when a reference is added or removed that might actually impact the object's lifecycle. That's a rare case with most uses of reference counting; most of the time you're creating a "sub"-reference and its lifetime is strictly bounded by some existing owning reference.

by cogman106 hours ago|

parent|

[-]

There are 2 unavoidable atomic updates for RC, the allocation and the free event. That alone will significantly increase the amount of traffic per thread back to main memory.

A lifetime system could possibly eliminate those, but it'd be hard to add to the JVM at this point. The JVM sort of has it in terms of escape analysis, but that's notoriously easy to defeat with pretty typical java code.

by ridiculous_fish2 hours ago|

parent|

[-]

Why would an allocation require an atomic write for a reference count?

Swift routinely optimizes out reference count traffic.

by cogman106 minutes ago|

parent|

[-]

> Why would an allocation require an atomic write for a reference count?

It won't always require it, but it usually will because you have to ensure the memory containing the reference count is correctly set before handing off a pointer to the item. This has to be done almost first thing in the construction of the item.

It's not impossible that a smart compiler could see and remove that initialization and destruction if it can determine that the item never escapes the current scope. But if it does escape it by, for example, being added to a list or returned from a function, then those two atomic writes are required.

by gwbas1c5 hours ago|

prev|

[-]

That's why Rust has Rc<> for single-threaded structs, and Arc<> for thread-safe structs.