undefined

points

[-]

Yeah I mean we have a mechanism that can bypass AI models for log lines where we are pretty sure no PII is in there (kind of like smart caching using fuzzy template matching to identify things that we have seen before many times, as logs tend to contain the same stuff over and over with tiny variations e.g. different timestamps), so we only need to pass the lines where we cannot be sure there's nothing to the AI for inspection. And we can of course parallelize. Currently we use a homebrew CFR model with lots of tweaks and it's quite good but an LLM would of course be much better still and capture a lof of cases that would evade the simpler model.