
[-]

Thanks for this, the anecdote with the lost data was very concerning to me.

I think you're exactly right about the WAL shared memory not crossing the container boundary. EDIT: It looks like WAL works fine across Docker boundaries, see https://news.ycombinator.com/item?id=47637353#47677163

I don't know much about Kamal but I'd look into ways of "pausing" traffic during a deploy - the trick where a proxy pretends that a request is taking another second to finish when it's actually held in the proxy while the two containers switch over.

From https://kamal-deploy.org/docs/upgrading/proxy-changes/ it looks like Kamal 2's new proxy doesn't have this yet, they list "Pausing requests" as "coming soon".

by hedora 6 hours ago

[-]

Pausing requests then running two sqlites momentarily probably won’t prevent corruption. It might make it less likely and harder to catch in testing.

The easiest approach is to kill sqlite, then start the new one. I’d use a unix lockfile as a last-resort mechanism (assuming the container environment doesn’t somehow break those).
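A minimal sketch of the last-resort lockfile idea, assuming both containers bind mount the same directory on one host so that `flock` is visible to both. The function name `acquire_deploy_lock` and the lock path are hypothetical, not something from the original post or from Kamal:

```python
import fcntl
import os
import tempfile


def acquire_deploy_lock(path):
    """Try to take an exclusive, non-blocking flock on `path`.

    Returns the open file object on success (keep it open for as long as
    the process should own the database), or None if someone else holds it.
    """
    f = open(path, "a+")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another open file description already holds the lock.
        f.close()
        return None
    return f


if __name__ == "__main__":
    # Hypothetical lock path; in practice it would sit next to the database.
    lock_path = os.path.join(tempfile.mkdtemp(), "app.db.lock")
    holder = acquire_deploy_lock(lock_path)
    late_comer = acquire_deploy_lock(lock_path)
    print("first got lock:", holder is not None)
    print("second refused:", late_comer is None)
```

A new container that gets `None` back would refuse to open the database and exit, rather than run alongside the old one. The lock is released when the holder closes the file or its process dies.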

[-]

I'm saying you pause requests, shut down one of the SQLite containers, start up the other one and un-pause.
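Roughly what I mean, as a toy asyncio sketch. This is not Kamal's proxy or any real proxy's API; `PauseGate`, `handle` and `backend` are made-up names for illustration:

```python
import asyncio


class PauseGate:
    """Holds incoming requests while paused, releases them on resume."""

    def __init__(self):
        self._open = asyncio.Event()
        self._open.set()  # start un-paused

    def pause(self):
        self._open.clear()

    def resume(self):
        self._open.set()

    async def handle(self, request, backend):
        # While paused, the request just waits here inside the proxy.
        await self._open.wait()
        return await backend(request)


async def demo():
    gate = PauseGate()
    events = []

    async def backend(request):
        events.append(f"served {request}")
        return request

    gate.pause()
    task = asyncio.create_task(gate.handle("r1", backend))
    await asyncio.sleep(0)  # let the request reach the gate and block
    events.append("swap containers")  # stop old container, start new one
    gate.resume()
    await task
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The client only sees a slightly slower response; the request is never handed to either container while the swap is in progress.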

[-]

> I think you're exactly right about the WAL shared memory not crossing the container boundary.

I don't, fwiw (so long as all containers are bind mounting the same underlying fs).

[-]

I just tried an experiment and you're right, WAL mode worked fine across two Docker containers running on the same (macOS) host: https://github.com/simonw/research/tree/main/sqlite-wal-dock...

Could the two containers in the OP have been running on separate filesystems, perhaps?

by jmull 5 hours ago

[-]

I dug into this limitation a bit around a year ago on AWS, using a sqlite db stored on an EFS volume (I think it was EFS -- relying on memory here) and lambda clients.

Although my tests were slamming the db with reads and writes, I didn't induce a bad read or write using WAL.

But I wouldn't use experimental results to override what the sqlite people are saying. I (and you) probably just didn't happen to hit the right access pattern.

by Retr0id 4 minutes ago

[-]

"the sqlite people" don't say anything that contradicts this

[-]

Perhaps they're using NFS or something - which would give them issues regardless of container boundaries.

by hedora 6 hours ago

https://sqlite.org/wal.html

[-]

It would explain the corruption:

The containers would need to use a path on a shared FS to set up the SHM handle, and, even then, this sounds like the sort of thing you could probably break via arcane misconfiguration.

I agree shm should work in principle though.

by PunchyHamster 3 hours ago

[-]

Not how SQLite works (any more)

> The wal-index is implemented using an ordinary file that is mmapped for robustness. Early (pre-release) implementations of WAL mode stored the wal-index in volatile shared-memory, such as files created in /dev/shm on Linux or /tmp on other unix systems. The problem with that approach is that processes with a different root directory (changed via chroot) will see different files and hence use different shared memory areas, leading to database corruption. Other methods for creating nameless shared memory blocks are not portable across the various flavors of unix. And we could not find any method to create nameless shared memory blocks on windows. The only way we have found to guarantee that all processes accessing the same database file use the same shared memory is to create the shared memory by mmapping a file in the same directory as the database itself.
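You can see the quoted behaviour with Python's built-in `sqlite3` module. A small sketch (the helper name `wal_sidecar_files` and the file name `app.db` are just for illustration) that switches a database to WAL and lists what ends up in the database's own directory:

```python
import os
import sqlite3
import tempfile


def wal_sidecar_files(db_path):
    """Enable WAL, do one write, and list the files next to the database."""
    db_dir = os.path.dirname(os.path.abspath(db_path))
    conn = sqlite3.connect(db_path)
    try:
        # The PRAGMA returns the journal mode actually in effect.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        # List while the connection is still open: the -wal and -shm files
        # live alongside the main database file.
        names = sorted(os.listdir(db_dir))
    finally:
        conn.close()
    return mode, names


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "app.db")
    print(wal_sidecar_files(path))
```

The `-shm` file is the mmapped wal-index. Two containers that bind mount this same directory are looking at the same file, which is the point of the design described in the quote.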

by chasil 6 hours ago

[-]

You might consider taking the database(s) out of WAL mode during a migration.

That would eliminate the need for shared memory.
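A rough sketch of that, using Python's `sqlite3` module. The helper names are hypothetical, and it assumes nothing else has the database open during the switch; the PRAGMA reports the mode actually in effect, so the code checks the returned value instead of assuming the change happened:

```python
import os
import sqlite3
import tempfile

ALLOWED_MODES = {"DELETE", "WAL"}


def set_journal_mode(db_path, mode):
    """Request a journal mode and return the mode SQLite reports back."""
    if mode.upper() not in ALLOWED_MODES:
        raise ValueError(f"unsupported journal mode: {mode}")
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(f"PRAGMA journal_mode={mode}").fetchone()[0]
    finally:
        conn.close()


def run_migration_outside_wal(db_path, migrate):
    """Leave WAL, run `migrate(conn)`, then switch back to WAL."""
    if set_journal_mode(db_path, "DELETE") != "delete":
        # Most likely another connection still has the database open.
        raise RuntimeError("could not leave WAL mode")
    conn = sqlite3.connect(db_path)
    try:
        migrate(conn)
        conn.commit()
    finally:
        conn.close()
    return set_journal_mode(db_path, "WAL")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "shop.db")
    set_journal_mode(path, "WAL")
    final = run_migration_outside_wal(
        path,
        lambda c: c.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)"),
    )
    print("journal mode after migration:", final)
```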

by gcr 6 hours ago

[-]

The SQLite documentation says in strong terms not to do this. https://sqlite.org/howtocorrupt.html#_filesystems_with_broke...

See more: https://sqlite.org/wal.html#concurrency

[-]

They tell you to use a proper FS, which is largely orthogonal to containerization.

by jmull 5 hours ago

[-]

WAL relies on shared memory, so while a proper FS is necessary, it isn't going to help in this case.

by fauigerzigerk 4 hours ago

[-]

Why does it not help if both containers can mmap the same -shm file?

by jmull 2 hours ago

[-]

Shared memory across containers is a property of a containerization environment, not a property of a file system, "proper" or not.

by Retr0id 3 minutes ago

[-]

It's a property of the filesystem, docker does not virtualize fs.

by merb 4 hours ago

[-]

btw nfs that is mentioned here is fine in sync mode. However that is slow.

by ncruces 35 minutes ago

[-]

This thread in the SQLite forum should be instructive: https://sqlite.org/forum/forumpost/90d6805c7cec827f

by PunchyHamster 3 hours ago

[-]

> WAL mode requires shared access to System V IPC mapped memory.

Incorrect. It requires access to mmap()

"The wal-index is implemented using an ordinary file that is mmapped for robustness. Early (pre-release) implementations of WAL mode stored the wal-index in volatile shared-memory, such as files created in /dev/shm on Linux or /tmp on other unix systems. The problem with that approach is that processes with a different root directory (changed via chroot) will see different files and hence use different shared memory areas, leading to database corruption."

> This is unlikely to work across containers.

I'd imagine the sqlite code would fail outright if that were the case. With k8s at least, mounting the same storage into 2 containers causes K8S, in most configurations, to co-locate both pods on the same node, so it should be fine.

It is far more likely they just fucked up the code and lost data that way...

[-]

> This is unlikely to work across containers.

Why not?

by voidfunc 3 hours ago

[-]

Ooh new historical Unix variant I had never heard of.. neat!

by chasil 3 hours ago

https://en.wikipedia.org/wiki/Ultrix

[-]

AIX is still supported and sold, so quite current?

Some that I used that are gone... Ultrix (MIPS), Clix, Irix, SunOS 4, SCO OpenServer, TI System V.

https://en.wikipedia.org/wiki/Intergraph

by nxobject 2 hours ago

[-]

NeXTstep? (Leaving aside fun spitballing about whether Tahoe is morally OPENSTEP 26, and whether it was NeXT that actually bought Apple for negative $400 million...)

by chasil 1 hour ago

[-]

Alas, I never had access to any of the Next environments, until PPC MacOS.

I did hold a copy in my hands for 486-class machines in the college bookstore.

by crabmusket 3 days ago

[-]

Patient: doctor, my app loses data when I deploy twice during a 10 minute interval!

Doctor: simply do not do that

by pavel_lishin 3 days ago

[-]

Doctor: solution is simple, stop letting that stupid clown Pagliacci define how you do your work!

Patient: but doctor,

by pjc50 7 hours ago

[-]

pAIgliacci: as a large language model, I am unable to experience live comedy.

by rcakebread 5 hours ago

[-]

Bob Newhart did it best https://www.youtube.com/watch?v=LhQGzeiYS_Q

by xnorswap 7 hours ago

[-]

I'm fairly confident they let it write the blog post too.

[-]

"Not as a proof of concept. Not for a side project with three users. A real store" - suggestion for human writers, don't use "not X, not Y" - it carries that LLM smell whether or not you used an LLM.

by xnorswap 6 hours ago

[-]

And that's just the opening paragraph, the full text is rounded off with:

"The constraint is real: one server, and careful deploy pacing."

Another strong LLM smell, "The <X> is real", nicely bookends an obviously generated blog-post.

by These335 2 hours ago

[-]

You're absolutely right, this was an AI post

by bombcar 7 hours ago

[-]

Hey, Apple still takes their store down during product launches!

by pstuart 6 hours ago

[-]

I assumed that it was to ensure that the announced products were revealed in a controlled manner rather than because they aren't able to do updates to their product listings as a regular thing.

by bombcar 6 hours ago

[-]

My reading of the tea leaves is it started out as the latter and continues as the former as part of the “mystique”.

by littlestymaar 4 hours ago

[-]

> Wait, you let _Claude_ push your e-commerce code straight to main which immediately results in a production deploy?

Yikes. Thank you I'm not going to read “Lessons learned” by someone this careless.

by 66yatman 4 hours ago

[-]

The issue wasn’t caused by the AI but by their lack of architectural knowledge.

by tensegrist 7 hours ago

[-]

i hate to be so blunt but look around the site and then tell me you're surprised

by burnt-resistor 3 hours ago