There are some fine points to the O(heap size), for example it's clearly unnecessary for the GC to scan objects that do not themselves contain pointers, and work is somewhat proportional to the total number of objects. Combining numerous small objects into manually managed slices, coming up with ways to make the most numerous items pointer-free, etc.
I learned a bit about this when an analytics workload I had ended up with unacceptable pauses (I think over 1 second), Go's GC is more sophisticated now but I think in any GC runtime you have the same considerations to some degree. Some of the best writing at the time was by Gil Tene, one of the principal authors of the C4 concurrent collector at Azul Systems, starting point here:
https://groups.google.com/g/golang-dev/c/GvA0DaCI2BU/m/SmEel...