As usual, this depends heavily on what you do. I had written a program where Arc reference counting was 25 % of the runtime. All from a single instance as well. I refactored to use borrows and annotated relevant structs with lifetimes. This also enabled additional optimisation where I could also avoid some copies, and in total I saved around 36% of the total runtime compared to before.
The reason Arc was so expensive here is likely that the reference count was contended, so the cacheline was bouncing back and forth between the cores that ran the threads in the threadpool working on the task.
In conclusion, neither extreme is correct. Often Arc is fine, sometimes it really isn't. And the only way to know is to profile. Always profile, humans are terrible at predicting performance.
(And to quite a few people, coming up with ways to avoid Arc/clone/Box, etc can be fun, so there is that too. If you don't enjoy that, then don't participate in that hobby.)
I think it's fair to say Java's "syntax ergonomics" are a little below the rest / somewhat manual like rust or C++ by default.