Build a deterministic query set and automate it for monthly or daily reporting reconcilliation.
Leave AI out of it.
How do you verify that all the tariffs are properly allocated to the correct GL code without going through the invoices and checking for each tariff on the list? How do you make sure none were accidentally assigned to other GL codes? All you have is pdfs, you dont know what the AI did or didnt do with the info on the pdf, there are not many use-cases to catch its errors without doing the work yourself.
If anything, it's going to add a step to these "kids" work where they have to use the AI to do the work and then redo 90% of the work anyway just to verify the output and then AI is going to get the credit anyway.
Or the overworked people are going to use AI and not verify it, which means not catching any errors or hallucinations, which apparently is fine because someone claims it's a solved problem for the black box of infinite possibility and inconsistent output.
When management signs off on work (SOX requires CEOs and CFOs to personally certify the accuracy of financial reports), they do not personally 'verify that all the tariffs are properly allocated to the correct GL code' or nearly any other hard numbers. The world works with human-level best effort, and management of that risk. I'm sure additional checks will be developed to categorize that risk, but the entire field of finance is about analyzing and pricing in risk so I think it'll work just fine.
For anything math, it’s much more reliable to give agents tools. So if you want to verify that your real estate offer is in the 90–95th percentile of offerings in the past three months, don’t give Claude that data and ask it to calculate. Offload to a tool that can query Postgres.
Similar with things needing data from an external source of truth. For example, what payers (insurance companies) reimburse for a specific CPT code (medical procedure) can change at any time and may be different between today and when the service was provided two months ago. Have a tool that farms out the calculation, which itself uses a database or whatever to pull the rate data.
The LLM can orchestrate and figure out what needs to be done, like a human would, but anything else is either scary (math) or expensive (it using context to constantly pull documentation.)