That specific advice isn't terribly transferable (you might choose to hack up systemd or some other components instead, maybe even the problem definition itself), but the general idea of measuring and tuning the system running your code is solid.
What else could be improved? Would like to learn :)
Maybe using huge pages?
Disabling C-states, pinning network interfaces to dedicated cores (and isolating your application from those cores), and running under `SCHED_FIFO` (`chrt -f 99 <prog>`) all help a lot.
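For anyone who wants to do the pinning and `SCHED_FIFO` part from inside the program rather than via `chrt`, here is a rough sketch using Python's `os` module on Linux. The helper names `pin_to_cpus` and `try_sched_fifo` are made up for illustration, and switching to a real-time policy normally needs root or `CAP_SYS_NICE`, so the sketch just reports failure instead of raising.

```python
import os


def pin_to_cpus(cpus):
    """Pin the calling process to `cpus` (hypothetical helper).

    Returns the resulting affinity as a sorted list, or None when the
    platform has no sched_setaffinity (i.e. not Linux).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)
    wanted = set(cpus) & allowed  # only request CPUs we are allowed to use
    if wanted:
        os.sched_setaffinity(0, wanted)
    return sorted(os.sched_getaffinity(0))


def try_sched_fifo(priority=99):
    """Try to move the calling process to SCHED_FIFO (hypothetical helper).

    Returns True on success, False if unsupported or not permitted.
    """
    if not hasattr(os, "sched_setscheduler"):
        return False
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except OSError:  # includes PermissionError when lacking privileges
        return False
```

Something like `pin_to_cpus([2, 3])` followed by `try_sched_fifo()` at startup would be the typical use, assuming cores 2 and 3 are the ones you have kept free of other work.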
Transparent hugepages add latency at moments you can't predict or easily observe, so I usually disable them.
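If you want to check what a box is currently doing, the active THP mode is the bracketed word in `/sys/kernel/mm/transparent_hugepage/enabled`. A small sketch for reading it follows; the function names are hypothetical, and actually disabling it is a separate step (writing `never` to that file as root, or booting with `transparent_hugepage=never`).

```python
from pathlib import Path

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Return the active mode from a line like 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def current_thp_mode(path=THP_ENABLED):
    """Read the active THP mode, or None if the file is unavailable."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None  # not Linux, or THP not available on this kernel


if __name__ == "__main__":
    print(current_thp_mode())
```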
Idk, there's a bunch, but they all depend on your use case. For example, I always disable hyperthreading because I care more about latency than processing power, and I don't want a sibling thread randomly stealing cache from my workload. But some people have more I/O-bound workloads, and hyperthreading is just a strict improvement in those situations.
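To see whether a core actually has a hyperthread sibling before deciding anything, Linux exposes the topology under sysfs. A minimal sketch, with hypothetical helper names, that parses `thread_siblings_list` for a given CPU:

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0-3,8' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def siblings_of(cpu, root=Path("/sys/devices/system/cpu")):
    """Return the hardware threads sharing a core with `cpu`, or None."""
    path = root / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    try:
        return parse_cpu_list(path.read_text())
    except OSError:
        return None  # not Linux, or topology info not exposed


if __name__ == "__main__":
    print(siblings_of(0))
```

If `siblings_of(0)` comes back with more than one entry, CPU 0 shares its core with another hardware thread.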
In prod most trading companies do disable it; not sure about best practices for generic benchmarks.