undefined

points

[-]

Excellent project! I see that the agent modifies the google docs using an interesting technique: convert doc to html, AI operates over the HTML and then diff the original html with ai-modified html, send the diff as batchUpdate to gdocs.

IMO, this is a better approach than the one used by Anthropic docx editing skill.

1. Did you compare this one with other document editing agents? Did you have any other ideas on how to make AI see and make edits to documents?

2. What happens if the document is a big book? How do you manage context when loading big documents?

PS:I'm working on an AI agent for Zoho Writer(gdocs alternative) and I've landed on a similar html based approach. The difference is I ask the AI to use my minimal commands (addnode, replacenode, removenode) to operate over the HTML and convert them into ops.

This works pretty well for me.

by sothatsit15 hours ago|

prev|

[-]

We have been using something similar for editing Confluence pages. Download XML, edit, upload. It is very effective, much better than direct edit commands. It’s a great pattern.

by holmb14 hours ago|

parent|

[-]

I would be very interested in this if you could share? Maintaining a Knowledge Base without a Git workflow is a pain currently.

by Jagerbizzle10 hours ago|

parent|

[-]

You can use the Copilot CLI with the atlassian mcp to super easily edit/create confluence pages. After having the agent complete a meaningful amount of work, I have it go create a confluence page documenting what has been done. Super useful.

by sothatsit13 hours ago|

parent|

prev|

[-]

I'm afraid I can't easily share this, as we have embedded a lot of company-specific information in our setup, particularly for cross-linking between confluence/jira/zendesk and other systems. I can try explain it though, and then Claude Code is great at implementing these simple CLI tools and writing the skills.

We wrote CLIs for Confluence, Jira, and Zendesk, with skills to match. We use a simple OAuth flow for users to login (e.g., they would run jira login). Then confluence/jira/zendesk each have REST APIs to query pages/issues/tickets and submit changes, which is what our CLIs would use. Claude Code was exceptional at finding the documentation for these and implementing them. Only took a couple days to set these up and Claude Code is now remarkably good at loading the skills and using the CLIs. We use the skills to embed a lot of domain-specific information about projects, organisation of pages, conventions, standard workflows, etc.

Being able to embed company-specific links between services has been remarkably useful. For example, we look for specific patterns in pages like AIT-553 or zd124132 and then can provide richer cross-links to Jira or Zendesk that help agents navigate between services. This has made agents really efficient at finding information, and it makes them much more likely to actually read from multiple systems. Before we made changes like this, they would often rabbit-hole only looking at confluence pages, or only looking at jira issues, even when there was a lot of very relevant information in other systems.

My favourite is the confluence integration though, as I like to record a lot of worklog-style information in there that I would previously write down as markdown files. It's nicer to have these in Confluence as then they are accessible no matter what repo I am working in, what region I am working in, or what branch or feature I'm working on. I've been meaning to try to set something similar up for my personal projects using the new Obsidian CLI.

by holmb9 hours ago|

parent|

[-]

Thanks for the insights!

We have been doing something similar but it sounds like you have come further along this way of working. We (with help from Claude) have built a similar tool that you describe to interface with our task- and project management system, and use it together with the Gitlab and Github CLI tools to allow agents to read tickets, formulate a plan and create solutions and create MR/PR to the relevant repos. For most of our knowledge base we use Markdown but some of it is tied up in Confluence, that's why I have an interest in that part. And, some is even in workflows are in Google Docs which makes the OP tool interesting as well -- currently our tool output Markdown and we just "paste from markdown" into Gdocs. We might be able to revise and improve that too.

by graeme11 hours ago|

parent|

prev|

[-]

Thank you! Sounds like a fantastic setup. Are the claude code agents acting autonomously from any trigger conditions or is this all manual work with them? And how do you manage write permissions for documents amongst team members/agents, presumably multiple people have access to this system?

(Not OP, but have been looking into setting up a system for a similar use case)

by sothatsit2 hours ago|

parent|

[-]

This is all manual, so people ask their agent to load Jira issues, edit Confluence pages, etc. Users sign-in using their own accounts using the CLIs, so the agents inherit their own permissions. Then we have the permissions in Claude Code setup so any write commands are in Ask, so it always prompts the user if it wants to run them.

by reachableceo4 hours ago|

parent|

prev|

[-]

Edit the markdown using GitHub workflow. Then insert markup (pick markdown) into the confluence page.

Works wonderfully!

by neuronexmachina9 hours ago|

parent|

prev|

[-]

I've found that usually works ok, but currently tends to timeout with the Atlassian MCP when trying to do updates on large Confluence pages: https://github.com/atlassian/atlassian-mcp-server/issues/59

by sothatsit2 hours ago|

parent|

[-]

Yeah, we wrote our own CLIs for Jira/Confluence/Zendesk using their REST APIs instead. Works well, although a bit more work.