We use it heavily and everyone who started on it doing simple scripting/automation all love it, everyone who built real production systems on top of it all hate it. Possibly operator error, but my experience hasn’t matched the rosy picture painted in these comments.
Determinism sucks, you do have to work hard and make everything idempotent in activities like we would for durable software anyway. The language we used was incorrect (Go) and has a lot of boilerplate compared to alternatives we later investigated (Python and TypeScript). Visibility can be slow and misses information. We needed to write our own APIs to work effectively with Agents for root-cause analysis of failures.
With all the caveats - Temporal is amazing, it feels much better than previous orchestrators I used like Prefect or Airflow. 100% would adopt again.
There are no free lunches in this space. I have no idea how good or bad Temporal is since my usage is pretty small and isolated, but software rarely just works and impresses me and Temporal for my local machine orchestrating genuinely did. I think Netflix's conductor is another cool option, but I ended up with Temporal due to license.
- Temporal itself is written in Go and we use Go for our backend so we expected this to be a natural fit. - Temporal makes writing activities in Go very explicit and boilerplatey - This in turn makes testing more difficult than it needs to be often - Temporal doesn't play well with Go's concurrency model at all (all stuff like goroutines needs to go through its special workflows.Go) a lot more often you have to write stuff that "appeases" temporal. - The whole workflows.ExecuteActivity(...).Get(...) is weird, having futures in a language explcitly designed to avoid that is weird. - All our compute isn't done on temporal workers anyway, its done (in another AWS account, owned by the customer) in batch compute (aws batch, lambda, ec2, whatever) so our temporal code isn't CPU heavy but is highly concurrent and needs a very high reliability guarantee. - Compare that to temporal with TypeScript, where it's simple and easy to use the same code inside or outside of temporal. Testing is trivial and the code looks like "regular code".
I see this as Temporal surfacing inherent complexity of the domain in a way that forces the developer to consider it, rather than introducing extra complexity.
If it didn't make workflow determinism a strict requirement, the requirement would still exist - it would just hurt much worse in production when it's broken.
See also: Rust borrowing
If you're fine with deploying several versions of workers (and are on a reasonably new version) you can just avoid the determinism issue altogether with their k8s controller.
If you do need to have some long workflows, there is an explicit hook for "what happens to existing workflows on version upgrade".
But to be fair - none of the other orchastrators I used (like AirFlow) made me write workflows.IsNewCode/IsOldCode like temporal does. On the other hand AirFlow doesn't even have the capability to do that in the first place (or at least it didn't last I used it).
That said, I appreciate this is hard in practice. We need to start small to manage the development rabbit hole risk, while also wanting to dream big. There is a tension there that I find hard to balance.
I also have restate.dev on my reseearch list, which on paper should scale well and be definitely more lightweight and simple to setup, worth having a look.
I wouldn't know, I've not done either, but I'd like to learn more from your or other's experience.
I think a genuine problem right now is people are building agentic work flows and learning the hard way highly reliable agentic work flows are hard. Agents are unreliable. They are both not deterministic and not the backing APIs have pretty high error rates. Temporal has solved that pain for me and made it easy to diagnose problems.
I don’t have anything really large scale running. But big enough that it takes billions of tokens and high reliability to finish.
ive been over here using claude relatively simply as of recent, just claude code and i might enter plan mode to do some bigger like scrap together a test suite of some sort, or i just have him scripting and refactoring/reformatting stuff under my direction. i wrote my own cli tool (needed to bake in the snowflake golang driver for external browser sso propagation) and added it as a skill so he can talk to our cloud dbms when im doing analytics things but for the most part its all pretty simple. feel like my productivity is 50x but after over a year with claude ive really backed off on asking him to do insane stuff and mostly keep him churning stuff out for me in domains i know very well.
so i read all this workflow stuff that needs durability and logging and im kind of astounded how many people have their AI stuff just running on their own round the clock. i didn't realize how much of peoples day to days needed to be automated, i don't seem to find myself surrounded by much that should be automated. jira is probably the only thing i need to sit down and automate because its such a translation tax on developers just so business people can feel involved. but outside of that... guess im behind the times, but i dont know if its that. i see the big grand things people use llms for ("im creating the ultimate knowledge base" or "ive automated everything under the sun and im making 10k a week" etc) and i am feeling either too tired, not ambitious enough, or unenthused by the creative and grand ways people are working with AI. seems like everyone has their own "perfect way to use AI" but I can't seem to find the oomph to go beyond using claude as a utility anymore. a year ago (maybe more cant remember anymore its all a blur) with claude in the sonnet era i was so amazed the first thing i did was try to reverse engineer a game using ghidra. had him building test suites to verify the math was correct. we were at this for weeks. my nearby datacenter probably drained 10 lakes. that was just one of _many_ over-ambitious projects i selected because of claude that never saw a finish line.
yesterday i opened beej.us and just started reading. im young and i feel like i somehow went from 'damn this claude shit is pretty cool' to 'AI is whatever its fine' in a year. like the bell curve meme.
I've tried clever tricks to get AI produce unsupervised stuff and came back from it. The slop and loss of cognitive knowledge about what it did was uncomfortable to me... I cannot understand how you would hand off critical job to it.
They ship Helm charts so reality is somewhere between "helm deploy" and "substantial ops burden". I don't have to touch it very frequently, but that is not to say I don't have to touch it. There's occasional releases and there have been times where (probably due to my inexperience with helm) I botched an upgrade and lost some data. And I've been on this journey for years; when I first started, they didn't have a Python SDK and it was one of my (many) excuses to learn Go. But anyway to your point, yes, if you're comfortable with k8s and Helm then you shouldn't have much of a problem running hundreds of thousands of workflows; if you want to really push the throughput and optimize cost you probably need to get creative the individual services and look into cassandra (maybe? idk).
My devops coworker just shrugs, pumps out some yaml and helm and away it goes.
It really depends on your experience and tolerance for a lot of things.
Usually maintenance burden doesent start to make itself known till you get off the happy path or something breaks. Sometimes it can be a long while before that happens, sometimes it happens right away.
Other orgs have never heard of alerts or error reporting and naturally will not catch issues until they are catastrophic (for example services that crash frequently in the background go unnoticed until the crash frequency causes a catastrophic failure). In my experience a lot of issues are pretty simple such as running out of memory, CPU throttling, crashes caused by simple bugs (nil panics). If you have good observability you can catch those issues early.
For example: people rag on Ceph that their cluster somehow got into a broken state, but that really only occurs when abuse of the ceph cluster has went on long enough that the cluster finally reaches the tipping point where it is unrecoverable. If you set ceph up, follow the correct replication rules so components are spread across failure domains, and use the metrics and alerts that are distributed with ceph it is actually quite hard to break the cluster.
As best I can tell it doesn't do any batching of it's writes/reads, and it's update heavy in places rather than append (I suspect their cloud version might do some of these things)
It's pretty close to "let's make every function call serialise it's parameters/return value, go through a postgres table and several network hops"
That said it can be very useful, but it's a heavy tool that's best suited for high value/risk workflows where you're earning enough from the execution that you can afford the overhead (for example an Uber trip with several dollars of service fees is probably a good fit, unsurprisingly since it's roots are from Uber)
Self-hosting is very easy in my experience, I've done it for 2 years but management wanted to move to Temporal Cloud. They have a helm chart which just works including upgrades. This does assume you have the whole k8s shebang set up and working in your company. I never had to touch is outside upgrades which took maybe 30m including validation.
I rewrote the pipeline in Python (a correct version of it) with state management in SQLite and logs in plain old flat files, and everything has been running smoothly ever since. In fact this is the only data flow that has worked without errors or interruptions in the last six months.
Instead of replicating the db file with Litestream I do a remote backup with Restic before and after each run; it's not an exact replacement of Litestream as we could possibly lose a whole run if the machine died / disappeared at the end of a run, but it lets one restore any day very easily. In an ideal world I think we should have both (live replica + backups).
1. A robust durability implementation 2. A library of high performance data structure and algorithms
The fact this it's SQL is nice, but those two attributes are what make it great.
For example, I'm implement an in-process event log that I want to be durable. I started simple, but soon saw some edge cases and instead of playing whackamole I just swapped to using sqlite as an ordered kv store that gives me ACID.
Another example: ingesting multiple inter related datasets. Instead of a dozen hash maps in memory, I load them up into sqlite (no persistence) and then slice and dice as I need to.
It's a super useful tool.
The underlying database isn't the most important thing. Just use SQL. Its namespacing (eg, through CTEs) is good and you're more likely to have colleagues who know SQL compared to jq.
As an occasional consumer of JSON/CSV, that's why I really like DuckDB, it's just SQL for such file formats. And it manages to be super fast at it too.
Usually we end up writing a script to incrementally refresh a data-set I'm analyzing (or have someone send me a copy after they pull it).
I've been using sqlite for anything which needs an UPDATE - modifying a row deep inside the data-set with jsonl is a pain.
My github is full of java programs which update sqlite3 files with threadpools and a single big lock around the UPDATE (& then I write or have an agent write code to analyze it).
DuckDB is slowly replacing it in the context of python, simply because of the ease of pushing a UDF into the SQL.
Also because I really like expressing things as LEAD/LAG with a UDF on top.
On a modern processor, that's about GBs of data typically, right?
Just wanted to make sure no one missed this point in your comment because eventually users will be paying the full cost for tokens instead of VC's paying, with GitHub Copilot's pricing realignment leading the way.