"Kamal runs blue-green deploys — it starts a new container, health-checks it, then stops the old one. During the switchover, both containers are running. Both mount ultrathink_storage. Both have the SQLite files open."
WAL mode requires shared access to System V IPC mapped memory. This is unlikely to work across containers.
In case anybody needs a refresher:
https://en.wikipedia.org/wiki/Shared_memory
https://en.wikipedia.org/wiki/CB_UNIX
https://www.ibm.com/docs/en/aix/7.1.0?topic=operations-syste...
I think you're exactly right about the WAL shared memory not crossing the container boundary. EDIT: It looks like WAL works fine across Docker boundaries, see https://news.ycombinator.com/item?id=47637353#47677163
I don't know much about Kamal, but I'd look into ways of "pausing" traffic during a deploy: the trick where a proxy makes a request look like it's taking an extra second to finish when it's actually being held in the proxy while the two containers switch over.
From https://kamal-deploy.org/docs/upgrading/proxy-changes/ it looks like Kamal 2's new proxy doesn't have this yet; they list "Pausing requests" as "coming soon".
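To make the pausing idea concrete, here is a minimal Python asyncio sketch of such a gate. It is not Kamal's proxy (which, per that page, doesn't offer this yet); `PauseGate`, `handle`, and the fake backend are hypothetical names, and a real proxy would also need a timeout so held requests can't wait forever.

```python
import asyncio
from typing import Awaitable, Callable

Backend = Callable[[str], Awaitable[str]]


class PauseGate:
    """Hypothetical proxy-side gate: while paused, requests wait instead of failing."""

    def __init__(self) -> None:
        self._open = asyncio.Event()
        self._open.set()  # start with traffic flowing

    def pause(self) -> None:
        self._open.clear()  # new requests will be held in handle()

    def resume(self) -> None:
        self._open.set()  # release every held request

    async def handle(self, request: str, backend: Backend) -> str:
        await self._open.wait()  # held here while the containers switch over
        return await backend(request)


async def demo() -> list[str]:
    gate = PauseGate()
    active = {"name": "old"}  # which container currently serves traffic

    async def backend(request: str) -> str:
        return f"{active['name']} container handled {request}"

    gate.pause()  # deploy starts
    held = asyncio.create_task(gate.handle("GET /", backend))
    await asyncio.sleep(0.05)
    assert not held.done()  # the request is held, not dropped

    active["name"] = "new"  # old container stopped, new one healthy
    gate.resume()
    return [await held]


print(asyncio.run(demo()))  # ['new container handled GET /']
```

Because the request only reaches a backend after `resume()`, the old and new containers never serve traffic at the same moment, which is exactly the overlap the blue-green switchover creates.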
The easiest approach is to stop the old container (and its SQLite connections) before starting the new one. I’d use a Unix lockfile as a last-resort safeguard (assuming the container environment doesn’t somehow break those).
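A minimal sketch of that lockfile idea in Python, assuming both containers bind-mount the same host directory (so they see the same file through the same kernel) and that a hypothetical `app.sqlite3.lock` sits next to the database. It uses `fcntl.flock` because those locks belong to the open file, so closing some other descriptor for the same file doesn't silently drop them the way POSIX `fcntl` record locks can.

```python
import fcntl
import os
import tempfile
from typing import IO, Optional


def try_acquire_db_lock(lock_path: str) -> Optional[IO[str]]:
    """Take an exclusive, non-blocking flock on a lockfile beside the database.

    Returns the open file (keep it open for as long as you use the DB),
    or None if another process already holds the lock.
    """
    f = open(lock_path, "a")
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()  # closing our own descriptor does not affect the holder's flock
        return None
    return f


lock_path = os.path.join(tempfile.mkdtemp(), "app.sqlite3.lock")  # hypothetical
old = try_acquire_db_lock(lock_path)  # old container owns the DB
new = try_acquire_db_lock(lock_path)  # new container is refused
print(old is not None, new is None)   # True True
old.close()                           # old container exits, lock released
new = try_acquire_db_lock(lock_path)
print(new is not None)                # True
new.close()
```

In a real deploy the new container would block (drop `LOCK_NB`) or retry until the old one exits. That serializes database access at the cost of a short gap, which is where request pausing in the proxy would help.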
I don't, fwiw (so long as all containers are bind mounting the same underlying fs).
Could the two containers in the OP have been running on separate filesystems, perhaps?
Although my tests were slamming the db with reads and writes, I didn't induce a bad read or write using WAL.
But I wouldn't use experimental results to override what the sqlite people are saying. I (and you) probably just didn't happen to hit the right access pattern.
The containers would need to use a path on a shared FS to set up the SHM handle, and even then, this sounds like the sort of thing you could probably break via arcane misconfiguration.
I agree shm should work in principle though.
> The wal-index is implemented using an ordinary file that is mmapped for robustness. Early (pre-release) implementations of WAL mode stored the wal-index in volatile shared-memory, such as files created in /dev/shm on Linux or /tmp on other unix systems. The problem with that approach is that processes with a different root directory (changed via chroot) will see different files and hence use different shared memory areas, leading to database corruption. Other methods for creating nameless shared memory blocks are not portable across the various flavors of unix. And we could not find any method to create nameless shared memory blocks on windows. The only way we have found to guarantee that all processes accessing the same database file use the same shared memory is to create the shared memory by mmapping a file in the same directory as the database itself.
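A quick way to see this for yourself: enable WAL on a throwaway database with Python's `sqlite3` module and check for the `-wal` and `-shm` files SQLite creates beside it. The database name is hypothetical. The point is that the wal-index is an ordinary file in the database's directory, so processes (or containers) that share that directory on the same host map the same file; the same SQLite page notes that WAL does not work over a network filesystem.

```python
import os
import sqlite3
import tempfile


def wal_sidecars(db_path: str) -> tuple[str, bool, bool]:
    """Enable WAL, write one row, and report which sidecar files exist."""
    conn = sqlite3.connect(db_path)
    try:
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        # While this connection is open, the WAL and the mmapped wal-index
        # exist as ordinary files next to the database file.
        return (
            mode,
            os.path.exists(db_path + "-wal"),
            os.path.exists(db_path + "-shm"),
        )
    finally:
        conn.close()


print(wal_sidecars(os.path.join(tempfile.mkdtemp(), "app.sqlite3")))
# ('wal', True, True)
```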
That would eliminate the need for shared memory.
See more: https://sqlite.org/wal.html#concurrency
Incorrect. It requires mmap() of an ordinary file, not System V IPC shared memory:
"The wal-index is implemented using an ordinary file that is mmapped for robustness. Early (pre-release) implementations of WAL mode stored the wal-index in volatile shared-memory, such as files created in /dev/shm on Linux or /tmp on other unix systems. The problem with that approach is that processes with a different root directory (changed via chroot) will see different files and hence use different shared memory areas, leading to database corruption."
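To illustrate the mmap point: two independent `MAP_SHARED` mappings of the same ordinary file see each other's writes, with no System V IPC involved. This Unix-only Python sketch keeps both mappings in one process for brevity, and the file name `demo-shm` is made up to mimic SQLite's `-shm` file. Across containers the same holds only when both map the same underlying file on the same host.

```python
import mmap
import os
import tempfile


def shared_mapping_roundtrip(message: bytes) -> bytes:
    """Write through one MAP_SHARED mapping of a file, read through another."""
    path = os.path.join(tempfile.mkdtemp(), "demo-shm")  # hypothetical name
    with open(path, "wb") as f:
        f.write(b"\x00" * mmap.PAGESIZE)  # size the file to one page

    # Two independent descriptors and mappings, standing in for two processes.
    with open(path, "r+b") as fa, open(path, "r+b") as fb:
        a = mmap.mmap(fa.fileno(), mmap.PAGESIZE)  # MAP_SHARED is the Unix default
        b = mmap.mmap(fb.fileno(), mmap.PAGESIZE)
        try:
            a[: len(message)] = message  # write via mapping a...
            return b[: len(message)]     # ...read the same bytes via mapping b
        finally:
            a.close()
            b.close()


print(shared_mapping_roundtrip(b"hello"))  # b'hello'
```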
> This is unlikely to work across containers.
I'd imagine the SQLite code would fail if that were the case. With k8s, at least, mounting the same storage into two containers causes K8s, in most configurations, to co-locate both pods on the same node, so it should be fine.
It is far more likely they just fucked up the code and lost data that way...
Why not?
Some that I used that are gone... Ultrix (MIPS), Clix, Irix, SunOS 4, SCO OpenServer, TI System V.
I did hold a copy for 486-class machines in my hands at the college bookstore.
Doctor: simply do not do that
Patient: but doctor,
"The constraint is real: one server, and careful deploy pacing."
Another strong LLM smell, "The <X> is real", nicely bookends an obviously generated blog post.
Yikes. Thank you; I'm not going to read “Lessons learned” by someone this careless.
The Meta dev model, where reviewed diffs merge into main (rebase style) after automated tests run, is pretty good.
Also, staging, canaries, and gradual (exponential) prod deployment and rollback help derisk changes.
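A toy sketch of the gradual, exponential rollout idea: the traffic share doubles each stage, and any failed health check stops the rollout and triggers a rollback. The function names and the doubling-from-1% schedule are illustrative assumptions, not Meta's or Kamal's actual process.

```python
from typing import Callable


def rollout_stages(start_pct: float = 1.0, factor: float = 2.0) -> list[float]:
    """Exponentially growing traffic percentages, ending at 100%.

    Assumes start_pct > 0 and factor > 1 so the loop terminates.
    """
    stages = []
    pct = start_pct
    while pct < 100.0:
        stages.append(pct)
        pct *= factor
    stages.append(100.0)
    return stages


def rollout(healthy: Callable[[float], bool]) -> tuple[str, list[float]]:
    """Advance stage by stage; stop and roll back on the first failed check."""
    reached: list[float] = []
    for pct in rollout_stages():
        if not healthy(pct):
            # Hypothetical hook: shift all traffic back to the previous release here.
            return "rolled back", reached
        reached.append(pct)
    return "complete", reached


print(rollout_stages())               # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 100.0]
print(rollout(lambda pct: pct < 10))  # ('rolled back', [1.0, 2.0, 4.0, 8.0])
```

Doubling keeps the early blast radius tiny while still reaching 100% in a handful of steps.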
Finally, have real, tested backups and restore processes (not replicated copies) and the ability to roll back.
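As a sketch of the "tested backups" point for a SQLite-backed app like the one in the post: Python's `sqlite3` `Connection.backup` uses SQLite's online backup API to take a consistent copy, and running `PRAGMA integrity_check` on the copy is a cheap first restore test. The paths are hypothetical, and a fuller restore test would also run the application against the copy.

```python
import os
import sqlite3
import tempfile


def backup_and_verify(src_path: str, dst_path: str) -> str:
    """Copy a live SQLite database with the online backup API, then check the copy."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # consistent snapshot of src written into dst
        # An unverified backup is not a tested backup: open the copy and check it.
        return dst.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        src.close()
        dst.close()


workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "app.sqlite3")  # hypothetical paths
conn = sqlite3.connect(src_path)
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (42)")
conn.commit()
conn.close()

print(backup_and_verify(src_path, os.path.join(workdir, "backup.sqlite3")))  # ok
```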