I recently wrote my own zero allocation HTTP server and while the above statement is possible to achieve, at some point you need to make a decision on how you handle pipelined requests that aren't resolved synchronously. Depending on your appetite for memory consumption per connection, this often leads to allocations in the general case, though custom memory pools can alleviate some of the burden.
I didn't see anything in the article about that case specifically, which would of been interesting to hear given it's one of the challenges I've faced.
In OxCaml, it has support for the effect system that we added in OCaml 5.0 onwards, which allows for a fiber to suspend itself and be restarted via a one-shot continuation. So it's possible to have a pipelined connection stash away a continuation for a response calculation and be woken up later on when it's ready.
All continuations have to be either discarded explicitly or resumed exactly once; this can lead to memory leaks in OCaml 5, but OxCaml has an emerging lifetime system that guarantees this is safe: https://oxcaml.org/documentation/parallelism/01-intro/ or https://gavinleroy.com/oxcaml-tutorial-icfp25/ for a taste of that. Beware though; it's cutting edge stuff and the interfaces are still emerging, but it's great fun if you don't mind some pretty hardcore ML typing ;-) When it all settles down it should be very ergonomic to use, but right now you do get some interesting type errors.
Ahh, that's interesting. I think you still run into the issue where you have a case like this:
1. You get 10 pipelined requests from a single connection with a post body to update some record in a Postgres table.
2. All 10 requests are independent and can be resolved at the same time, so you should make use of Postgres pipelining and send them all as you receive them.
3. When finishing the requests, you likely need the information provided in the request object. Lets assume it's a lot of data in the body, to the point where you've reached you per connection buffer limit. You either allocate here to unblock the read, or you block new reads, impacting response latency, until all requests are completed. The allocation is the better choice at that point but that heuristic decision engine with the goal of peak performance is definitely nuanced, if not complicated.
Its a cool problem space though, so always interested in learning how others attack it.
On my TODO list next is to hook up the various O(x)Caml memory profiling tools: we have statmemprof which does statistical sampling, and then the runtime events buffer, and (hopefully) stack activity in OxCaml's case from the compiler.
This provides a pretty good automation loop for a performance optimising coding agent: it can choose between heap vs local, or copy vs reference, or fixed layout (for SIMD) vs fragmentation (for multicore NUMA) depending on the tasks at hand.
Some references:
- Statmemprof in OCaml : https://tarides.com/blog/2025-03-06-feature-parity-series-st...
- "The saga of multicore OCaml" by Ron Minsky about how Jane Street viewed performance optimisation from the launch of OCaml 5.0 to where they are today with OxCaml https://www.youtube.com/watch?v=XGGSPpk1IB0
That's exactly what substructural logic/type systems allows you to do. Affine and linear types are one example of substructural type systems, but you can also go further in limiting moves, exchanges/swaps etc. which helps model scenarios where allocation and deallocation must be made explicit.
Experimental and of course one can debate whether Haskell is mainstream but I figured it merits a mention.
https://www.reddit.com/r/rust/comments/vo31dw/comment/ieao7v...
- it will use O(n) space where n is some measure of one of the parameters (instead of n you could have some sort of function of multiple measures of multiple parameters)
- same but time use instead of space use
- same but number of copies
- the size of an output will be the size of an input, or less than it
- the allocated memory after the function runs is less than allocated memory before the function runs
- given the body of a function, and given that all the functions used in the body have well defined complexities, the complexity of the function being defined with them is known or at least has a good upper bound that is provably true
> I am also deeply sick and tired of maintaining large Python scripts recently, and crave the modularity and type safety of OCaml.
I can totally relate. Switching from Python to a purely functional language can feel like a rebirth.
Without strict settings, it will let you pass this value as of any other type and introduce a bug.
But with strict settings, it will prevent you from recovering the actual type dynamically with type guards, because it flags the existence of the untyped expression itself, even if used in a sound way, which defeats the point of using a gradual checker.
Gradual type systems can and should keep the typed fragment sound, not just give up or (figuratively) panic.
In my experience "almost" is doing a lot of heavy lifting here. Typing in python certainly helps, but you can never quite trust it (or that the checker detects things correctly). And you can't trust that another developer didn't just write `dict` instead of `dict[int, string]` somewhere, which thus defaults to Any for both key and value. And that will type check (at least with mypy) and now you lost safety.
Using a statically typed language like C++ is way better, and moving to a language with an advanced type system like that of Rust is yet another massive improvement.
Although I've found that much of the pain of static type checks in Python is really that a lot of popular modules expose incorrect type hints that need to be worked around, which really isn't a pleasant way to spend one's finite time on Earth.
Meanwhile, in Python, I just haven't figured out how to effectively do the same (even with uv ruff and other affordances) without writing a ton of tests. I'm sure it's possible, but OCaml's spoilt me enough that I don't want to have to learn it any more :-)
In my experience, "purely functional" always means "you can express pure functions on the type level" (thus guaranteeing that it is referentially transparent and has no side effects) -- see https://en.wikipedia.org/wiki/Pure_function
The scoping rules of Python are not lexical
Lambdas in Python are not multiline
Recursion is not a practical way to write code due to stack overflows
Monkey patching
It's too difficult right now to directly jump to the functional version since I don't understand the flambda2 compiler well enough to predict whta optimisations will work! OxCaml is stabilising more this year so that should get easier in time.
When I learnt FP, the choice was between Lisp, Scheme, Miranda, Caml Light and Standard ML, depending on the assignment.
Nowadays some folks consider FP === Haskell.
let parse_int64 (local_ buf) (sp : span) : int64# =
let mutable acc : int64# = #0L in
let mutable i = 0 in
let mutable valid = true in
while valid && i < I16.to_int sp.#len do
let c = Bytes.get buf (I16.to_int sp.#off + i) in
match c with
| '0' .. '9' ->
acc <- I64.add (I64.mul acc #10L) (I64.of_int (Char.code c - 48));
i <- i + 1
| _ -> valid <- false
done;
acc