You want low deterministic latency with sharp tails.
If all you care about is throughput then deep pipelines + lots of threads will get you there at the cost of latency.
You have to optimize your memory usage patterns to fit in CPU cache as much as possible which is something typical Java develops don't consider. I have a background in assembly and C.
I'd say it's slightly harder since there is a little bit of abstraction but most of the time the JIT will produce code as good as C compilers. It's also an niche that often considers any application running on a general purpose CPU to be slow. If you want industry leading speed you start building custom FPGAs.
how exactly you are passing data? You can pass some primitives without allocating them on heap. You can use some tiny subset of Java+standard library to write high performance code, but why would you do this instead of using Rust or C++?
Strangely this is one of the areas where I want to use project panama so I might re-implement some of the ring buffers constructs.
You allocate off heap memory and dump data into it. With modern Java classes like Arena, MemoryLayout, and VarHandle it's honestly a lot like C structs.
I answered "why" in another post in this thread.
Then things like the jit, by default, doing run time profiling and adaptation.
In terms of speed, memory usage, runtime characteristics... sure there are better options. But if java is good enough, or can be made good enough by writing the code correctly, why add another toolchain?
"writing code correctly" here means stripping 95% of lang capabilities, and writing in some other language which looks like C without structs (because they will be heap allocated with cross thread synchronization and GC overhead) and standard lib.
Its good enough for some tiny algo, but not good enough for anything serious.
those have low bar of performance, also they mostly became popular because of investments from Java hype, and rust didn't exist or had weak ecosystem at that time.