For example, I usually like using the `Map` data structure. It's a pretty neat immutable structure and is fine for most stuff, but when performance becomes critical, it's easy enough to drop into a boring imperative loop with a regular hash map. If I keep everything contained in one function, I can usually avoid feeling super dirty about it.
There are basically two points to programming with immutable-first data. One, it eliminates certain classes of data-race concurrency bugs. Two, less mutable state in a given context makes the code easier to reason about.
So, if you're inside a function scope and you aren't launching any concurrent operations from inside that function, you don't have to worry about benefit #1. If you're inside a function (and you're not reaching out for global mutable state), then the context you need to keep in your working memory is likely fairly small, so a few local mutable variables don't significantly harm the "understandability" of the implementation (in most cases). So, you really don't have to worry about #2, either. Make your functions black boxes with solid "APIs" (type signatures), and let the inside do whatever it needs to do to work best.
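To make the "black box" idea concrete, here is a minimal F# sketch. The word-counting task and the names `countWordsPure` and `countWordsFast` are made up for illustration; nothing here is measured, and whether the second version is actually faster depends on the workload. Both functions have the same type signature, and the mutable `Dictionary` never leaves the function that creates it.

```fsharp
open System.Collections.Generic

// Immutable-first version: fold over the input, building a persistent Map.
let countWordsPure (words: seq<string>) : Map<string, int> =
    let bump count = Some (defaultArg count 0 + 1)
    words |> Seq.fold (fun acc w -> Map.change w bump acc) Map.empty

// Same signature, but the body uses a local mutable Dictionary.
// No concurrency is started here and the Dictionary never escapes,
// so callers cannot observe the mutation.
let countWordsFast (words: seq<string>) : Map<string, int> =
    let counts = Dictionary<string, int>()
    for w in words do
        match counts.TryGetValue w with
        | true, c -> counts.[w] <- c + 1
        | _ -> counts.[w] <- 1
    // Convert back to an immutable Map at the function boundary.
    counts |> Seq.map (fun kv -> kv.Key, kv.Value) |> Map.ofSeq
```

Converting back to a `Map` at the end has its own cost, so in a real hot path you might return a read-only view or an array instead; the point is only that the caller sees the same signature either way.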
Just because premature optimization is the root of all evil doesn't mean we need to jump right to premature pessimization...
I will personally almost always prefer the pretty functional versions of things, and that's almost always what I start with. I like immutable data structures, and they are usually more than fast enough. Occasionally, though, you hit a bottleneck of some kind (usually in some form of loop), and you have to give up all the beautiful functional stuff and go back to sad imperative stuff. When I do that, I usually try to keep it scoped to one function. Even within one function, I do find the persistent structures easier to reason about, but as you said, it's a small enough surface area not to be too irritating.
There are exceptions to this, of course. Sometimes for caching/memoizing I will make a global `ConcurrentDictionary`, and I'll sometimes use `Interlocked` for global counters.
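As a rough sketch of what those two exceptions can look like in F#: the `memoize` helper and the `cacheMisses` counter below are hypothetical names for illustration, not anything from a specific project.

```fsharp
open System
open System.Collections.Concurrent
open System.Threading

// Global counter cell; it is only ever updated through Interlocked.
let cacheMisses = ref 0L

// Hypothetical helper: wraps a function with a thread-safe cache.
let memoize (f: 'a -> 'b) : 'a -> 'b =
    let cache = ConcurrentDictionary<'a, 'b>()
    let compute x =
        // Atomic increment, safe to call from multiple threads.
        Interlocked.Increment(&cacheMisses.contents) |> ignore
        f x
    let factory = Func<'a, 'b>(compute)
    // GetOrAdd may run the factory more than once for the same key
    // under contention, so f should be free of side effects.
    fun x -> cache.GetOrAdd(x, factory)

// Usage: the second call with the same argument is served from the cache.
let square = memoize (fun (n: int) -> n * n)
```

The mutable state is global here, but it sits behind types that are designed for concurrent access, which is why it feels like a reasonable exception to the "keep it inside one function" rule.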
One thing I do want to try out is publishing it with Native AOT. I had a lot of luck with that on one of my other F# projects; I got something like a 75% speedup out of it. I understand the JIT is supposed to outperform Native AOT in the long run, but I haven't seen it reach that speed.
And sorry for the paranoia. I find a lot of people tried F# or even C# back in the .NET Framework 4.x era and think it hasn't changed.