Residual connections resemble the forward Euler method (this observation led to Neural ODEs, IIRC), which isn't known for being a particularly accurate integrator. If the model was trained with a fixed number of layers, adding extra layers at inference time will also introduce a lot of noise.
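To make the analogy concrete, here's a minimal sketch (with a toy stand-in `f` for a residual block) showing that a residual update is exactly a forward Euler step with step size 1 for dx/dt = f(x):

```python
import numpy as np

def f(x):
    # Toy stand-in for a residual block's transformation.
    return np.tanh(x)

def residual_step(x):
    # Residual connection: x_{l+1} = x_l + f(x_l)
    return x + f(x)

def euler_step(x, h):
    # Forward Euler for dx/dt = f(x): x_{t+h} = x_t + h * f(x_t)
    return x + h * f(x)

x = np.array([0.5, -1.0])
# With h = 1 the Euler step is exactly the residual update.
assert np.allclose(residual_step(x), euler_step(x, 1.0))
```

Forward Euler's local error grows with the step size, which is one way to see why naively repeating or inserting layers is not "clean": the network learned its dynamics for one specific step count.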
Ultimately, the LLM will need to be fine-tuned with the loops, or a looped architecture trained from scratch, such as <https://ouro-llm.github.io>. Unfortunately, they made the mistake of looping the entire LLM rather than just the center portion.
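For clarity, looping "just the center portion" means keeping a single-pass prelude and coda around a recurrent middle stack. A toy numpy sketch (hypothetical layer names, not Ouro's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden dimension

def make_block():
    # A tiny residual block: x -> x + tanh(x @ W)
    W = rng.standard_normal((d, d)) * 0.1
    return lambda x: x + np.tanh(x @ W)

prelude = [make_block() for _ in range(2)]  # run once (e.g. early layers)
core    = [make_block() for _ in range(2)]  # the looped middle stack
coda    = [make_block() for _ in range(2)]  # run once (e.g. late layers)

def forward(x, loops):
    for blk in prelude:
        x = blk(x)
    for _ in range(loops):      # recurrence applied only to the middle
        for blk in core:
            x = blk(x)
    for blk in coda:
        x = blk(x)
    return x

x = rng.standard_normal(d)
out = forward(x, loops=4)
```

Looping the full model instead would repeat the prelude and coda too, which (on this view) wastes compute on layers whose job is just mapping in and out of the "reasoning" representation.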