I'm actually reviewing a PR right now that enables the MCP server to have a library of common robots, with a specification file for each robot containing important context for the language model.

The demo video with the Unitree Go (robot dog) uses this approach to give the LLM additional context about the custom poses available to it.
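
Roughly speaking, a spec file plus a loader might look something like the sketch below (the field names, the pose list, and the helper functions are made up for illustration, not the actual format from the PR):

    # Hypothetical sketch: a per-robot spec the MCP server could load from its
    # library and hand to the language model as extra context. Field names and
    # the pose list are illustrative only.
    import json
    from pathlib import Path

    UNITREE_GO_SPEC = {
        "name": "unitree_go2",
        "description": "Quadruped robot (robot dog) with preset body poses.",
        "custom_poses": ["stand", "sit", "lie_down", "stretch", "wave"],
        "notes": "Pose names map to firmware-level motion primitives.",
    }

    def load_spec(path: Path) -> dict:
        """Read one robot's specification file from the spec library."""
        return json.loads(path.read_text())

    def spec_to_context(spec: dict) -> str:
        """Render a spec into a compact text block for the model's context."""
        poses = ", ".join(spec["custom_poses"])
        return (
            f"Robot: {spec['name']}\n"
            f"{spec['description']}\n"
            f"Available custom poses: {poses}\n"
            f"Notes: {spec['notes']}"
        )

    if __name__ == "__main__":
        print(spec_to_context(UNITREE_GO_SPEC))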

reply
Not starting from a clean slate improves the interaction significantly. I started from a clean slate in the industrial robot video in order to highlight how much is possible even when starting from one.
reply
That makes this all the more impressive!! What happens when you get an incorrect interpretation, though? That is now in the "previous context bucket". Assuming the user addresses the issue through the LLM layer by talking it through with the model, do you think the subsequent interactions could compound the error?

I sometimes face issues with LLMs running out of tokens or only using partial context from previous conversations - thereby repeating or compounding previous incorrect responses.

Any thoughts on how to tackle that? Or is that too abstract a problem/beyond the scope to address at the moment?

reply
(Mentioning beforehand that we're still very early when it comes to the exact behavior of each language model)

So far, I've observed that for Claude and Gemini, which are what we've been testing with most, the language model has been pretty good at recognizing a faulty initial interpretation once it queries more information from the system.

Running out of tokens is a more significant issue. We saw it a lot when we queried image topics, which led us to try writing better image interpreters within the MCP server itself (credit to my collaborators at Hanyang University in Korea) to defend the context window. Free tiers of the language models also run out of tokens quite quickly.
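
To sketch the idea of defending the context window (the heuristics and function names below are made up for illustration, not our actual interpreter): instead of passing a raw camera frame to the model, the server reduces it to a few lines of text.

    # Illustrative only: summarize an image topic into short text so the model
    # spends a handful of tokens instead of ingesting a full base64 image.
    import numpy as np

    def summarize_image(image: np.ndarray) -> str:
        """Reduce an HxWx3 uint8 frame to a short textual description."""
        h, w, _ = image.shape
        brightness = float(image.mean())
        # Split the frame into a 3x3 grid and report the brightest cell as a
        # crude stand-in for "where the interesting content is".
        cells = [
            (r, c, float(image[r * h // 3:(r + 1) * h // 3,
                               c * w // 3:(c + 1) * w // 3].mean()))
            for r in range(3) for c in range(3)
        ]
        r, c, _ = max(cells, key=lambda cell: cell[2])
        region = ["top", "middle", "bottom"][r] + "-" + ["left", "center", "right"][c]
        return (
            f"Camera frame {w}x{h}, mean brightness {brightness:.0f}/255, "
            f"brightest region: {region}."
        )

    if __name__ == "__main__":
        fake_frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
        print(summarize_image(fake_frame))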

PS - Thank you for the questions; I'm enjoying talking about this here on HN with people who look at it critically and challenge us!

reply