by russellthehippo 1 day ago

by andersmurphy 1 day ago

Is the main use case for this for languages that only have access to process based concurrency?

I'm struggling to see why you would otherwise need this in Java/Go/Clojure/C#. Your SQLite has a single writer, and your application manages that writer (with a language-level concurrent queue), so it knows when it's writing and what it has just written and can notify all threads that care about inserts/updates/changes. It has always felt simpler/cleaner to get notification semantics that way.

Still fun to see people abuse WAL in creative ways. Cool to see a notify mechanism that works for languages that only have process based concurrency python/JS/TS/ruby. Nice work!
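
To make the in-process alternative concrete, here is a minimal Python sketch of a single writer thread fed by a concurrent queue that notifies subscribers after each commit. The `events` table, the `run_writer` and `demo` names, and the use of per-subscriber queues are all assumptions made for illustration; nothing here comes from the project under discussion.

```python
import queue
import sqlite3
import threading

def run_writer(db_path, jobs, subscribers):
    """Own the only write connection and notify subscribers after each commit."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, body TEXT)")
    conn.commit()
    while True:
        body = jobs.get()
        if body is None:  # sentinel value: shut the writer down
            break
        cur = conn.execute("INSERT INTO events (body) VALUES (?)", (body,))
        conn.commit()
        # The writer knows exactly what it just committed, so nobody has to poll.
        for sub in subscribers:
            sub.put((cur.lastrowid, body))
    conn.close()

def demo(db_path=":memory:"):
    jobs = queue.Queue()   # all writes are funnelled through this queue
    sub = queue.Queue()    # one hypothetical subscriber
    writer = threading.Thread(target=run_writer, args=(db_path, jobs, [sub]))
    writer.start()
    jobs.put("hello")
    jobs.put("world")
    jobs.put(None)
    writer.join()
    return [sub.get_nowait() for _ in range(2)]
```

This only works while every writer lives in the same process, which is the limitation the rest of the thread is about.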

by zbentley 13 hours ago

There's more process-based concurrency than you'd expect in shops that use those languages.

Cron jobs might need to coordinate with webservers. Even heavily threaded webservers might have some subprocesses/forking to manage connection pools and hot reloads and whatnot. Suid programs are process-separated from non-suid programs. Plenty of places are in the "permanent middle" of a migration from e.g. Java 7 to Java 11 and migrate by splitting traffic to multiple copies of the same app running on different versions of the runtime.

If you're heavily using SQLite for your DB already, you probably are reluctant to replace those situations with multiple servers coordinating around a central DB.

Nit:

> languages that only have process based concurrency python/JS/TS/ruby

Not true. There are tons and tons of threaded Python web frameworks/server harnesses, and there were even before GIL-removal efforts started. Just because gunicorn/multiprocessing are popular doesn't mean there aren't loads of huge deployments running threads (and not suffering for it much, because most web stacks are IO bound). Ruby's similar, though threads are less heavily-used than in Python. JS/TS as well: https://nodejs.org/api/worker_threads.html

by russellthehippo 17 hours ago

I actually hadn’t thought about it this way. The killer app I was imagining was 1ms reactivity without SQL polling and messaging atomic with business commits, plus “one db” and no daemon.

But this is actually a great main benefit as well.

by infogulch 1 day ago

He mentions Litestream; maybe this also works for Litestream read-only replicas, which may be in completely different locations?

by russellthehippo 17 hours ago

Whoa I really hadn’t considered this. Do a litestream read replica, trigger across machines with S3 as the broker essentially. But you’re still stuck with the litestream sync interval. Maybe interesting for cross server notify?

by infogulch 17 hours ago

I guess the idea is to have all writes go through a central server with local read replicas for improved read perf. The default litestream sync interval is 1s. I bet many use-cases would be satisfied with a few seconds delay for cross-region notifications.

by russellthehippo 16 hours ago

It's good for pubsub but not for claim/ack workflow unless you do If-None-Match CAS semantics on a separate filesystem which, actually, yeah that's probably fine. Feels heavy on S3 ops. But! you do save on inter-AZ networking, the Warpstream hypothesis.

by ncruces 15 hours ago

Claims kill this, IMO.

Unless you have a single "reader", you don't mind the delay, and don't worry about redoing a bunch of notifications after a crash (and so, can delay claims significantly), concurrency will kill this.

by vrajat 4 hours ago

I wrote a simple queue implementation after reading the Turbopuffer blog on queues on S3. In my implementation, I wrote complete SQLite files to S3 on every enqueue/dequeue/ack. It used the previous ETag for compare-and-set.

The experiment and back-of-the-envelope calculations show that it can only support ~5 jobs/sec. The only major lever for increasing throughput is the size of group commits.

I don't think shipping CDC instead of whole SQLite files would change the calculations, as it was the number of writes that mattered in this experiment.

So yes, given the number of writes per job (min. of 3), this can only support very low throughput.
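
The back-of-the-envelope reasoning can be sketched as a one-line formula. This assumes the compare-and-set writes are fully serialized and that each write advances one step (enqueue, dequeue, or ack) for a whole batch of jobs. The latency values in the usage note are hypothetical, since the comment does not give one.

```python
def jobs_per_second(write_latency_s, writes_per_job=3, jobs_per_commit=1):
    """Upper bound on throughput when every object-store write is serialized.

    write_latency_s: seconds per compare-and-set write (hypothetical input).
    writes_per_job:  writes each job needs over its lifetime (3 in the comment).
    jobs_per_commit: jobs batched into one write (the group-commit size).
    """
    return jobs_per_commit / (writes_per_job * write_latency_s)
```

Under these assumptions a 50 ms write gives `jobs_per_second(0.05)`, about 6.7 jobs/sec, which is the same order of magnitude as the figure reported above, and batching 10 jobs per write multiplies the bound by 10.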

by russellthehippo 15 hours ago

exactly. then you're building distributed locking and it's probably time for a different tool

by grumbelbart 22 hours ago

Very cool!

Another maybe stupid question, would something like inotify(7) help to get rid of any active polling?

by arowthway 1 day ago

Nice, I had no idea that stat() every 1 ms is so affordable. Apparently it takes less than 1 μs per call on my hardware, so that's less than 0.1% CPU time for polling.
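
For anyone who wants to reproduce the estimate, here is a rough Python sketch. The helper names are made up for illustration, and the measurement includes Python's own call overhead, so it overstates the raw syscall cost.

```python
import os
import tempfile
import time

def stat_cost_seconds(path, calls=10_000):
    """Average wall-clock cost of one os.stat() call on `path`."""
    start = time.perf_counter()
    for _ in range(calls):
        os.stat(path)
    return (time.perf_counter() - start) / calls

def polling_cpu_fraction(per_call_seconds, interval_seconds=0.001):
    """Fraction of one core used if stat() runs once per polling interval."""
    return per_call_seconds / interval_seconds

with tempfile.NamedTemporaryFile() as f:
    per_call = stat_cost_seconds(f.name)
    print(f"{per_call * 1e6:.2f} us per stat, "
          f"{polling_cpu_fraction(per_call):.3%} of one core at 1 ms polling")
```

With the figure quoted above, `polling_cpu_fraction(1e-6)` is 0.001, i.e. 0.1% of one core.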

by WJW 23 hours ago

"Syscalls are slow" is only mostly true. They are slower than not having to cross the userspace <-> OS barrier at all, but they're not "slow" like cross-ocean network calls can be. For example, non-VDSO syscalls in linux are about 250 nanoseconds (see for example https://arkanis.de/weblog/2017-01-05-measurements-of-system-...), VDSO syscalls are roughly 10x faster. Slower than userspace function calls for sure, but more than affordable outside the hottest of loops.

by vlovich123 22 hours ago

Filesystem calls tend to be slower than the average syscall because of all the locks and complicated traversals needed. If this is using stat instead of fstat, then it’s also going through the VFS layer: repeated calls likely hit the cache fast path for path resolution, but they still have to access the stat structure. There are also hidden costs in that number, like atomic accesses that need to acquire cache-line locks, which cause hidden contention for other processes on the CPU, plus the cache dirtying from running kernel code and then having to repopulate the cache on the way out. All of that adds contended L3/RAM pressure.

In other words, there’s a lot of unmeasured performance degradation that’s a side effect of doing many syscalls above and beyond the CPU time to enter/leave the kernel which itself has shrunk to be negligible. But there’s a reason high performance code is switching to io_uring to avoid that.

by russellthehippo 17 hours ago

Oh cool, so using io_uring plus PRAGMA data_version would actually beat stat on Linux, holistically speaking? The stat choice was all about cross-platform consistency over inotify speed. But syscall overhead can be real.

by vlovich123 11 hours ago

“Beat” is all relative. It depends on load and how frequently you’re doing it, but generally yes. If you’re doing io_uring, though, you may as well use inotify, because you’re in platform-specific API territory anyway. That’s the biggest win: you move from polling to change detection, which has less overhead and lower latency. inotify can be accessed through io_uring, and there may even be cross-platform libraries for your language that give you a consistent file-watcher interface (although probably not optimally over io_uring). Whether it’s actually worth it is hard to say, as I don’t know what problem you’re trying to solve, but the lowest-overhead option looks like inotify + io_uring (it also has the lowest latency).

by xenadu02 12 hours ago

If you're interested you can use kqueue on FreeBSD and Darwin to watch the inode for changes. Faster than a syscall, especially if all you need is a wakeup when it changes.

by slashdev 23 hours ago

That’s ignoring the other costs of syscalls like evicting your stuff from the CPU caches.

But I agree with the conclusion, system calls are still pretty fast compared to a lot of other things.

by vlovich123 22 hours ago

Small correction on ambiguous wording - syscalls do not evict all your stuff from CPU caches. It just has to page in whatever is needed for kernel code/data accessed by the call, but that’s no different from if it was done in process as a normal function call.

by Polizeiposaune 19 hours ago

Depending on implementation details of your CPU and OS, the syscall path may need to flush various auxiliary caches (like one or more TLBs) to prevent speculation attacks, which may put additional "drag" on your program after syscall return.

by vlovich123 11 hours ago

Correct but you’d also still have that drag just from the kernel dirtying those caches in the first place.

But I was clarifying because the wording could be taken as data/instruction cache and there generally isn’t a full flush of that just to enter/leave kernel.

by ncruces 22 hours ago

I'm probably missing something: why is `stat(2)` better than `PRAGMA data_version`?

https://sqlite.org/pragma.html#pragma_data_version

Or for a C API that's even better, `SQLITE_FCNTL_DATA_VERSION`:

https://sqlite.org/c3ref/c_fcntl_begin_atomic_write.html#sql...
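
As a rough illustration of the pragma-based approach, the Python sketch below polls `PRAGMA data_version` on one connection while a second connection commits. The file name and the `watcher`/`writer` names are placeholders, and this is not how the project under discussion is implemented.

```python
import os
import sqlite3
import tempfile

def data_version(conn):
    """Return the current PRAGMA data_version value as seen by this connection."""
    return conn.execute("PRAGMA data_version").fetchone()[0]

def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # isolation_level=None puts both connections in autocommit mode.
    watcher = sqlite3.connect(path, isolation_level=None)
    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute("CREATE TABLE t (x INTEGER)")

    before = data_version(watcher)
    unchanged = data_version(watcher)           # no commits in between
    writer.execute("INSERT INTO t VALUES (1)")  # commit on a different connection
    after = data_version(watcher)

    watcher.close()
    writer.close()
    return before, unchanged, after
```

The value is only meaningful when compared with an earlier reading on the same connection: it stays the same while nothing else commits and differs once another connection has committed.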

by infogulch 21 hours ago

Yeah the C API seems like a perfect fit for this use-case:

> [SQLITE_FCNTL_DATA_VERSION] is the only mechanism to detect changes that happen either internally or externally and that are associated with a particular attached database.

Another user itt says the stat(2) approach takes less than 1 μs per call on their hardware.

I wonder how these approaches compare across compatibility & performance metrics.

by russellthehippo 6 hours ago

I just tested this out. PRAGMA data_version uses a shared counter that any connection can use while the C API appears to use a per-connection counter that does not see other connections' commits.

by psadri 22 hours ago

For one it seems to be deprecated.

by ncruces 22 hours ago

It's not.

by psadri 8 hours ago

You are correct. I apologize. I seem to have read the next pragma’s deprecation notice!

Aside from this - SQLite has tons of cool features, like the session extension.

by russellthehippo 17 hours ago

Yep, definitely still in use. Do y'all above have an opinion on whether the pragma is better than the syscall? What are the trade-offs there? Another comment thread mentioned this as well and pointed to io_uring. I was thinking that dism spam is worse than syscall spam.

by ncruces 15 hours ago

Depends on what you mean by better.

I may be wrong, but I think you wrote somewhere that you're looking at the WAL size increasing to know if something was committed. Well, the WAL can be truncated, what then? Or even, however unlikely, it could be truncated, then a transaction comes and appends just enough to it to make it the same size.

If SQLite has an API that it guarantees can notify you of changes, that seems better, in the sense that you're passing responsibility along to the experts. It should also work with rollback mode, another advantage. And I don't think it wakes you up if a large transaction rolls back (a transaction can hit the WAL and never commit).

That said, I'm not sure what's lighter on average. For a WAL mode database, I will say that something that has knowledge of the WAL index could potentially be cheaper? That file is mmapped. The syscalls involved are file locks, if any.
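
The truncate-then-regrow scenario is easy to show with an ordinary file standing in for the WAL. This is a toy illustration of why a size-only check can miss a change; it makes no claim about how SQLite actually manages its WAL file, and `size_changed` is a made-up helper.

```python
import os
import tempfile

def size_changed(path, last_size):
    """Size-only change check: the naive approach being questioned."""
    return os.stat(path).st_size != last_size

def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "wb") as f:
        f.write(b"a" * 4096)        # stand-in for frames from a first transaction
    last_size = os.stat(path).st_size

    with open(path, "wb") as f:     # "wb" truncates the file first
        f.write(b"b" * 4096)        # new content that regrows to the same size
    missed = not size_changed(path, last_size)
    os.remove(path)
    return missed
```

Here `demo()` returns `True`: the content changed but the size-only check did not notice.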

by russellthehippo 15 hours ago

Interesting, thank you for the response and explanation. Honker workers/listeners are holding an open connection anyway. I do trust SQLite guarantees more than cross-platform sys behavior. I will explore the C API angle.

by rich_sasha 22 hours ago

Pretty cool! I have a half baked version of something similar :)

Can you use it also as a lightweight Kafka - persistent message stream? With semantics like, replay all messages (historical+real time) from some timestamp for some topics?

As with pub/sub, you can reproduce this with some polling etc but as you say, that's not optimal.
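
To pin down what "replay from some timestamp for some topics" could mean on top of SQLite, here is a small Python sketch. The `messages` table, its columns, and the `replay` function are hypothetical and are not taken from the project.

```python
import sqlite3

def replay(conn, topics, since_ts, after_id=0):
    """Return messages for `topics` with ts >= since_ts and id > after_id, oldest first."""
    placeholders = ",".join("?" for _ in topics)
    sql = (
        "SELECT id, topic, ts, body FROM messages "
        f"WHERE topic IN ({placeholders}) AND ts >= ? AND id > ? "
        "ORDER BY id"
    )
    return conn.execute(sql, (*topics, since_ts, after_id)).fetchall()

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, topic TEXT, ts REAL, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO messages (topic, ts, body) VALUES (?, ?, ?)",
        [("orders", 100.0, "a"), ("audit", 150.0, "b"), ("orders", 200.0, "c")],
    )
    history = replay(conn, ["orders"], since_ts=150.0)
    # A live tail re-runs the same query with after_id set to the last id seen.
    conn.execute("INSERT INTO messages (topic, ts, body) VALUES ('orders', 250.0, 'd')")
    live = replay(conn, ["orders"], since_ts=150.0, after_id=history[-1][0])
    return history, live
```

The historical read and the live tail share one query, so switching from replay to real time is just a matter of remembering the last id.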

by russellthehippo 17 hours ago

Absolutely! That’s the durable pubsub angle for sure.

by infogulch 1 day ago

Neat idea!

Would it help if subscriber states were also stored? (read position, queue name, filters, etc) Then instead of waking all subscription threads to do their own N=1 SELECT when stat(2) changes, the polling thread could do Events INNER JOIN Subscribers and only wake the subscribers that match.
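
A minimal sketch of that join in Python, assuming hypothetical `events` and `subscribers` tables where each subscriber stores a queue name and the id of the last event it has read:

```python
import sqlite3

def subscribers_to_wake(conn):
    """Return {subscriber_id: [event_id, ...]} for subscribers with unread events."""
    rows = conn.execute(
        """
        SELECT s.id, e.id
        FROM events AS e
        INNER JOIN subscribers AS s
                ON s.queue = e.queue AND e.id > s.read_position
        ORDER BY s.id, e.id
        """
    ).fetchall()
    wake = {}
    for sub_id, event_id in rows:
        wake.setdefault(sub_id, []).append(event_id)
    return wake

def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (id INTEGER PRIMARY KEY, queue TEXT, body TEXT);
        CREATE TABLE subscribers (id INTEGER PRIMARY KEY, queue TEXT, read_position INTEGER);
        INSERT INTO events (queue, body) VALUES ('mail', 'a'), ('jobs', 'b'), ('mail', 'c');
        INSERT INTO subscribers (id, queue, read_position) VALUES
            (1, 'mail', 1),   -- has already read event 1
            (2, 'jobs', 2),   -- fully caught up
            (3, 'mail', 0);   -- has read nothing yet
        """
    )
    return subscribers_to_wake(conn)
```

In this toy data set only subscribers 1 and 3 would be woken; subscriber 2 is caught up and stays asleep.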

by noveltyaccount 1 day ago

This is really interesting. I'm building something on PostgreSQL with LISTEN/NOTIFY and PostGraphile. I'd love to (in theory) be able to have a swappable backend and not be so tightly coupled to the database server.

by hk1337 1 day ago

I love the name!

by russellthehippo 16 hours ago

honk
