I'd love to see a modern-day attempt at something like BioWare's Neverwinter Nights - which was designed so that someone could create a campaign, and then the game would provide the behavior, pathfinding, assets, and everything else with a virtual (or human) DM behind the scenes. You could still tell a human-driven story, but the engine would do a lot of the heavy lifting.
I think a lot of those attempts you mentioned try to brute-force the problem or trust the AI too much with what to generate.
A lot of the same problems that AI coding agents run into apply here too. You have to really manage context (avoid throwing a novel at the model) and enforce strict rules in the "engine". The hard part is building a world that stays consistent without railroading the player into specific paths. I have an agent (for lack of a better term) that manages arcs at each tier: world arcs (nations, factions), player character arcs, NPC arcs, individual scene arcs, and location arcs (towns, cities, dungeons, etc). By prompting all of these as tight, individual arcs with flavor and context peppered in as needed, you end up with stuff that is more compelling. The arcs still have to be loose enough to react to the player. When you decline that NPC's quest, that might change the overall arc for a town in a meaningful way down the road.
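To make the tiered-arc idea concrete, here is a minimal Python sketch of assembling a small prompt context from arcs. Everything in it is an assumption for illustration: the `Arc` class, the tag matching, the tier ordering, and the character budget are made up and are not the actual implementation.

```python
from dataclasses import dataclass

# Tier names come from the list above; this ordering (broadest first) is an assumption.
TIERS = ("world", "location", "npc", "player", "scene")

@dataclass
class Arc:
    tier: str
    name: str
    summary: str                    # tight one-or-two sentence state of the arc
    tags: frozenset = frozenset()   # hypothetical: places/people the arc touches

def build_context(arcs, scene_tags, max_chars=600):
    """Keep only arcs that touch the current scene, broadest tier first,
    and stop adding lines once the size budget is used up."""
    relevant = [a for a in arcs if a.tags & scene_tags]
    relevant.sort(key=lambda a: TIERS.index(a.tier))
    lines, used = [], 0
    for arc in relevant:
        line = f"[{arc.tier}] {arc.name}: {arc.summary}"
        if used + len(line) > max_chars:
            continue  # skip arcs that do not fit instead of blowing the budget
        lines.append(line)
        used += len(line)
    return "\n".join(lines)

arcs = [
    Arc("world", "Border war", "Two kingdoms are one insult away from war.",
        frozenset({"millbrook", "capital"})),
    Arc("npc", "Mara the miller", "Asked the player for help and was declined.",
        frozenset({"millbrook"})),
    Arc("location", "Sunken crypt", "Sealed and undisturbed.",
        frozenset({"crypt"})),
]
context = build_context(arcs, frozenset({"millbrook"}))
print(context)
```

The point is just that the model sees a few short, relevant arc summaries for the current scene, not the whole world state.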
I won't pretend that I've perfected anything but I have definitely noticed a spark in its writing and world building that I personally have really enjoyed.
OTOH, that means that the underlying story is that much more important. I think a lot of people mistake coherence for novelty. Biggest offender is puzzles - oh god do LLMs absolutely blow dire wolf chunks at coming up with organic and interesting puzzles.
I have a private vs. public flag for assets, set at the AI GM's discretion, for the ones I consider more unique or sensitive. From there I use embeddings to check whether an asset already exists in the public pool, and reuse it if possible. The thinking is that eventually I will have pretty decent asset coverage for most standard campaigns. I can't account for people going way off book though.
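The public-pool lookup could look roughly like the sketch below. It assumes the embeddings already exist as plain lists of floats (how they are produced is out of scope here), that private assets are simply never added to the pool, and that a cosine-similarity cutoff decides reuse. The function names, the toy vectors, and the 0.9 threshold are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_reusable(query_vec, public_pool, threshold=0.9):
    """Return the id of the most similar public asset if it clears the
    threshold, otherwise None (meaning: generate a new asset)."""
    best_id, best_score = None, threshold
    for asset_id, vec in public_pool.items():
        score = cosine(query_vec, vec)
        if score >= best_score:
            best_id, best_score = asset_id, score
    return best_id

# Toy 3-dimensional "embeddings"; real ones would be much larger.
public_pool = {
    "tavern_interior": [0.9, 0.1, 0.0],
    "forest_road": [0.0, 0.2, 0.95],
}

print(find_reusable([0.88, 0.15, 0.02], public_pool))  # close to the tavern vector
print(find_reusable([0.5, 0.5, 0.5], public_pool))     # nothing similar enough
```

The threshold is the knob that matters: too low and players get an obviously recycled tavern, too high and nothing ever gets reused.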
I have an asset pipeline that tries to determine player intent and pre-generate assets before they're needed. That way we can attempt to hide the "load screens" like retro games did with elevators. I have a kind of sliding scale for player coherency: if the player racks up too many "misses" on the pre-generation pipeline, it raises the bar for when it starts generating.
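One way to picture that sliding scale is as an adaptive confidence threshold, sketched below. This is only a guess at the shape of it: the `PregenGate` class, the idea of a 0-to-1 intent confidence score, the step sizes, and the choice to relax the bar again on hits are all assumptions.

```python
class PregenGate:
    """Decides whether a predicted player intent is confident enough to
    justify pre-generating assets. Misses raise the bar; hits lower it."""

    def __init__(self, threshold=0.5, step=0.1, floor=0.3, ceiling=0.95):
        self.threshold = threshold
        self.step = step
        self.floor = floor
        self.ceiling = ceiling

    def should_pregenerate(self, confidence):
        # confidence: hypothetical 0..1 score from the intent predictor
        return confidence >= self.threshold

    def record(self, hit):
        if hit:
            # Pre-generated asset was actually used: relax a little.
            self.threshold = max(self.floor, self.threshold - self.step / 2)
        else:
            # Asset was generated but never needed: demand more confidence.
            self.threshold = min(self.ceiling, self.threshold + self.step)

gate = PregenGate()
print(gate.should_pregenerate(0.6))  # passes the default bar
gate.record(hit=False)
gate.record(hit=False)
print(gate.should_pregenerate(0.6))  # same guess no longer clears the raised bar
```

Raising the bar faster than it drops means an unpredictable player quickly stops costing wasted generations, while a predictable one slowly earns back the aggressive pre-loading.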
I may have wildly over-engineered this but I love it. =)