It's really fascinating to read this, since I've run into similar memory issues in other languages (Ruby, Go, etc.). Debugging these issues is a pain.

Is there a way to make all this much easier to debug, and to prevent memory issues in the first place? Or is the abstraction level just not quite right?

So with CPython's reference counting, if you're careful not to build strong reference cycles, you really can avoid putting pressure on the garbage collector. It's not even that complicated: it's mostly a matter of putting a weak reference _somewhere_ along the chain. The ergonomics often aren't great, but Python's @property decorator helps here.

So, for example, take two classes:

class Request

class Session

request.session exists, and the session is "part" of the request. But session.request often exists too, as a convenience. That's a reference cycle, and it prevents the request (and anything it points to!) from being freed by reference counting when the request ends.
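A minimal sketch of that cycle, assuming CPython and using bare-bones stand-in Request and Session classes (not any real framework's). The cyclic collector is switched off for the demonstration so you can see that dropping the last outside reference is not enough: the request only goes away once gc.collect() runs.

```python
import gc
import weakref


class Request:
    def __init__(self):
        self.session = Session(self)  # the request owns its session


class Session:
    def __init__(self, request):
        self.request = request  # strong back-reference: this creates the cycle


gc.disable()  # keep the cyclic collector from running on its own during the demo
req = Request()
probe = weakref.ref(req)  # lets us check whether the request has been freed
del req

freed_by_refcount = probe() is None  # refcounting alone cannot free the cycle
gc.collect()                         # the cyclic GC finds and frees it
freed_by_gc = probe() is None
gc.enable()

print(freed_by_refcount, freed_by_gc)
```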

But in this case, you could easily do something like:

session._request = weakref.ref(request) # on session creation

and then have session.request call session._request() (and maybe assert that session._request() is not None if you want to be certain). If you're confident that the session is a "child" of the request, and that you will _never_ hold on to the session after the request is done, this is a cheap trick: session.request costs a little more to access, but not much.
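Here's a rough sketch of that pattern, again assuming CPython and the same hypothetical minimal classes. The session stores only a weakref.ref to its request, and a @property dereferences it (with the optional assert), so callers still write session.request as before.

```python
import weakref


class Request:
    def __init__(self):
        self.session = Session(self)  # strong reference: the request owns the session


class Session:
    def __init__(self, request):
        # Weak back-reference: does not keep the request alive, so there is no cycle.
        self._request = weakref.ref(request)

    @property
    def request(self):
        req = self._request()  # returns None once the request has been freed
        assert req is not None, "session outlived its request"
        return req


req = Request()
assert req.session.request is req  # reads like a normal attribute
probe = weakref.ref(req)
del req

freed = probe() is None  # on CPython, refcounting frees it right away
print(freed)
```

The cost is one extra call per access to session.request, and the assert turns "session outlived its request" into a loud error instead of a silent None.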

I think most Python libraries just don't do memory-performance analysis here, and also "believe" in the garbage collector. When the GC runs, both request and session will get deallocated, after all! But the long-term effect of everyone relying on the GC is that collection is expensive when it doesn't need to be, and when you're looking through memory you just have more stuff to dig through.
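On the debugging side, the standard gc module can at least show you what the collector is cleaning up. A sketch, assuming CPython and the same hypothetical strongly-linked Request/Session pair: with gc.DEBUG_SAVEALL set, unreachable objects are appended to gc.garbage instead of being freed, so you can see which types are ending up in cycles.

```python
import gc


class Request:
    def __init__(self):
        self.session = Session(self)


class Session:
    def __init__(self, request):
        self.request = request  # strong back-reference -> cycle


gc.disable()
gc.collect()                    # start from a clean slate
gc.set_debug(gc.DEBUG_SAVEALL)  # keep unreachable objects in gc.garbage for inspection

for _ in range(3):
    Request()  # each call leaves behind a Request/Session cycle

gc.collect()
leaked = sorted(type(o).__name__ for o in gc.garbage
                if isinstance(o, (Request, Session)))
print(leaked)

gc.set_debug(0)       # restore normal behaviour
gc.garbage.clear()    # drop the saved objects
gc.enable()
```

In a real application you would usually group gc.garbage by type name and look for the types that show up far more often than expected.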
