I haven't found any satisfying solution to it all; collecting information for logging vs information that a caller would want... I've been meaning to investigate tracing_error to see if it brings it all together.
edit: I've just finished debugging a multi system chain - FE -> SNS -> SQS -> Lambda -> DynamoDB -> Lambda -> Webhook -> My poor code
My code has multiple layers - and I was trying to find where in the very long chain of calls the data was being mangled
It turned out that there was an unlogged error, which was mismanaged by a caller - there's no shade here - the caller was handling the error how it was designed to, but by not logging that there was an error - it took a minute to understand.