The very stable control flow of numerical programs is what enables GPU parallelization and the loop optimizations in the long tradition of Fortran compilers. Optimizations like loop tiling, interchange, strip mining, etc. aren't going to be easy to do with trace compilation.
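To make the tiling point concrete, here's a rough sketch in plain Python (the function names and the `tile` parameter are made up for illustration, not taken from any compiler). The loop bounds and access pattern are fully known before the loop runs, which is what lets a static compiler reorder the iteration space without changing the result:

```python
def matmul_naive(a, b):
    """Textbook i-j-k matrix multiply over lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same computation, with the iteration space split into tile x tile blocks.

    The outer three loops walk over blocks; the inner three walk inside one
    block. The i-k-j order inside a block is a loop interchange of the naive
    version, so the innermost loop scans a row of b and a row of c.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += aik * b[k][j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
assert matmul_tiled(a, b) == matmul_naive(a, b)
```

Pure Python won't show any speedup from this, of course; the point is only that the rewrite is legal because nothing about the loop structure depends on the data.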
Anyway, my comment was more directed toward trace compilation in the context of dynamic languages, and there I think it's pretty well established that it only works well for small programs.
ML frameworks even take advantage of this control-flow stability to compute, ahead of time, how much memory will be used at different points in the program graph, and then schedule memcpys to make space as necessary. Of course this only works for well-behaved program classes, but e.g. most LLM architectures fit into that category. Interestingly, MoE models don't, since they require data-dependent control flow, hence the recent push towards accommodating dynamism in frameworks (like JAX, which until ~recently couldn't handle it at all).
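A toy version of the ahead-of-time memory computation, just to show the idea. This is my own simplified sketch, not how any particular framework does it: `plan_memory` and the `(name, inputs, size)` op format are invented here, and it only computes the memory profile; it doesn't schedule any copies. It assumes the op list is fixed and in execution order, which is exactly the assumption that data-dependent control flow breaks:

```python
def plan_memory(ops, outputs=()):
    """Compute live bytes after each op of a static graph, plus the peak.

    ops: list of (name, input_names, output_size_bytes) in execution order.
         Every tensor, including graph inputs, is produced by some op.
    outputs: names of tensors that must stay alive until the end.
    """
    # Find the last step at which each tensor is read.
    last_use = {}
    for step, (name, inputs, _) in enumerate(ops):
        last_use.setdefault(name, step)  # an unused tensor dies immediately
        for inp in inputs:
            last_use[inp] = step
    for name in outputs:
        last_use[name] = len(ops)  # never freed during the run

    sizes = {name: size for name, _, size in ops}
    live = 0
    timeline = []
    for step, (name, inputs, size) in enumerate(ops):
        live += size  # the output is allocated while the inputs are still live
        timeline.append(live)
        # Free every tensor whose last read was this step.
        for t in set(inputs) | {name}:
            if last_use[t] == step:
                live -= sizes[t]
    return timeline, max(timeline)


ops = [
    ("x", [], 4),
    ("h1", ["x"], 8),
    ("h2", ["h1"], 8),
    ("y", ["h2", "x"], 2),
]
timeline, peak = plan_memory(ops, outputs=("y",))
print(timeline, peak)  # [4, 12, 20, 14] 20
```

With the whole timeline known before anything runs, a framework can see where the peak is and decide in advance what to move out of the way. If which ops run depends on the data, there's no single op list to analyze.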