> Prioritize recall over precision.
Have you tried stemming your regex? That would help you catch messages where a different form of your word appears. For example, instead of “story” you look for “stor”, which catches “stories” as well. It also catches “store”, but that's the recall-over-precision trade.
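A quick sketch of what I mean, in Python. The messages and both patterns are made up for illustration; I'm just assuming a whole-word match on “story” as the starting point.

```python
import re

# Hypothetical message history, for illustration only.
messages = [
    "Tell me a story about a dragon.",
    "I love bedtime stories.",
    "The store closes at nine.",
    "What is the weather today?",
]

# Exact whole-word match vs. a crude "stemmed" prefix match.
exact = re.compile(r"\bstory\b", re.IGNORECASE)
stemmed = re.compile(r"\bstor\w*", re.IGNORECASE)

exact_hits = [m for m in messages if exact.search(m)]
stemmed_hits = [m for m in messages if stemmed.search(m)]

print(len(exact_hits), "exact hits")
print(len(stemmed_hits), "stemmed hits")
```

The stemmed pattern picks up “stories” but also the “store” message, so you gain recall and pay for it in precision.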
Then you might think, could we do an even better job by figuring out the general semantic intent of the query and history? Let’s project them into a semantic vector space! That’s an embedding.
Then you want to query that, which means you need a vector database. So now we can take the query, embed it, query the vector DB with that embedding and retrieve the N closest history documents. You can use that to augment the generation of the response to your prompt.
This is RAG.
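Roughly, the embed / store / retrieve / augment loop looks like the sketch below. Big caveats: `embed` here is a toy bag-of-words counter standing in for a real embedding model (it does not capture semantics at all), and `TinyVectorStore` is a hypothetical in-memory list standing in for a vector DB. The names and the prompt format are my own assumptions, not any particular library's API.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, n=2):
        # Brute-force nearest neighbours: rank every item by similarity.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:n]]


def build_prompt(query, store, n=2):
    """Augment the user's query with the n closest history documents."""
    context = "\n".join(f"- {doc}" for doc in store.search(query, n))
    return f"Relevant history:\n{context}\n\nUser: {query}"


store = TinyVectorStore()
for doc in [
    "Tell me a story about a dragon.",
    "The store closes at nine.",
    "What is the weather today?",
]:
    store.add(doc)

top = store.search("dragon story please", n=1)
prompt = build_prompt("dragon story please", store, n=1)
print(prompt)
```

Swap `embed` for a real model and the list for a real index and the shape stays the same; the brute-force sort is the part a vector DB replaces with an approximate nearest-neighbour search.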
Anyway, it's interesting to see the different degrees of sophistication here. A handful of naive regexes is certainly very fast.
There’s probably a hybrid approach where you use sophisticated NLP and embedding techniques to robustly define topics, then train a regex to approximate that well.