undefined

points

[-]

For (2): the problem is that the index would still need to keep the dead tuples, until Postgres is positively certain that no transaction holds them, and runs vacuum over them. It may speed up things a bit, but would still overflow the disk storage eventually. It may still prevent other tables from being vacuumed, too!

by cataflutter2 hours ago|

parent|

[-]

For disk usage, yes this doesn't address anything.

But for read performance (which is IMO what the section in the article was motivated by), it doesn't actually matter to have a bunch of entries corresponding to dead tuples in your index, provided Postgres doesn't need to actually consider the dead tuples as part of your query.

So if you have a monotonically increasing `job_id` and that's indexed, then so long as you process your jobs in increasing `job_id` order, you can use the index and guarantee you don't have to keep reconsidering the dead tuples corresponding to jobs that already completed (if that makes sense).

[This is because the index is a b-tree, which supports efficient (O(log n) page reads for n entries) seeking on (any prefix of) the columns in the index.]

by nine_k1 hours ago|

parent|

[-]

This is fair! This should as you descripbe work with a partial index, and with picking the lowest ID that has status = pending (via that index) which is not locked (via select ... for update skip locked). The query plan should be triple-checked though to actually use the index.