Show HN: Robot MCP Server – Connect Any Language Model and ROS Robots Using MCP

upvote

Show HN: Robot MCP Server – Connect Any Language Model and ROS Robots Using MCP

(github.com)

23 points

by r-johnv10 hours ago |

upvote

by Shrikrishna_Gad9 hours ago|

[-]

Really impressive work! I love that you can set this up without touching any existing robot code—just start rosbridge and the MCP server, and the LLM can both control and observe the ROS system. It’s like having a conversational ros2cli. The KUKA arm demo is particularly striking—the LLM can call gripper services in real time, all from natural language. One thing I’m curious about—could this setup handle coordinating multiple robots simultaneously, or is it still limited to a single robot per session?”

reply

upvote

by r-johnv8 hours ago|

[-]

Thank you!

The underlying stack definitely can connect to multiple robots simultaneously. Our current implementation is sequential, where the language model can connect from one robot to the next (and then back).

But it is definitely possible for us to write it to be simultaneous as well.

reply

upvote

by awanacode8 hours ago|

[-]

Really cool, the multi-robot angle got me thinking. For simultaneous setups, would that mean spinning separate MCP clients (one per robot), or could a single client handle multiple connections in parallel, almost like an operator shell aware of multiple robot contexts?

On the flip side, how would you handle conflicting commands from multiple clients? Is it last-writer-wins, or do you envision some arbitration layer? It feels like orchestration + conflict resolution will be key if MCP is to scale beyond single-robot demos into fleet-level use.

reply

upvote

by r-johnv6 hours ago|

[-]

Conflicting commands from multiple clients is one of the open questions at the moment.

We're also building out a technical steering committee to help guide our direction on topics like this. Safety is a big category where having direction from across the community will be important.

reply

upvote

by r-johnv6 hours ago|

[-]

It should be possible to do it from one client. The MCP server would handle the parallel connections.

We're using websockets as the interface between the server and the robot itself, which to the best of my exploration does have the ability for simultaneous connections.

reply

upvote

by sayid_islam4 hours ago|

[-]

Fascinating work!

Does this also entail industrial (sector-agnostic) applications where mitigating actions, based on vision or other sensor data based leading indicators, can proactively be taken using LLM-directed mitigation protocols? Does it allow for non-technical users to perhaps drive debugging or other similar mitigation actions?

reply

upvote

by wengmister1 hours ago|

[-]

Really fascinating stuff - I've tried it out and it works like a charm. Really gets one to think if natural language would be the language to program and control robots in the future (just with higher level of abstraction)

reply

upvote

by r-johnv1 hours ago|

[-]

Thank you for trying it out. Do you have any feedback on how it worked for you, and things that you would hope to see in it?

reply

upvote

by 1 hours ago|

[-]

deleted

reply

upvote

by k-warburton4 hours ago|

[-]

This is a very intriguing application of physical AI. It is astounding to see demos of how simple, human instructions can produce complex machine actions. What can we do to improve the safety of protecting assets or people in the case of misuse? I see some code contributions regarding permission controls, but are there other steps we can take to ensure this technology "understands" when the physical motion being requested is not appropriate because it might endanger people or expensive hardware that is difficult to replace?

reply

upvote

by sdallagasperina56 minutes ago|

[-]

Great point, I believe safety has to be layered. The real challenge is deciding which agent is responsible for judging whether a command is safe to execute. For instance, MCP could enforce permissions, rate limits, and safe defaults, while the ROS stack could add motion constraints, watchdogs, and velocity/force caps, all backed by physical interlocks as the final safeguard.

reply

upvote

by sdallagasperina55 minutes ago|

[-]

I see the LLM not as the one giving direct commands, but as suggesting a path. An arbitration layer should always check whether that suggestion is safe, and if it isn’t, the system should fall back to a deterministic, well-tested behavior. That way you get flexibility without ever compromising safety.

reply

upvote

by avasan10 hours ago|

[-]

LLMs sometimes hallucinate while stating one thing and outputting disconnected commands/code on the back end in many superficial/general models (Claude included) - does the MCP try to correct/account for that? Or is user oversight necessary to ensure that actions match the LLM output? Is the only way to stop aberrant operation by hitting a physical emergency stop or can the LLM interface be used to for that?

reply

upvote

by sdallagasperina8 hours ago|

[-]

Great question! I’m one of the collaborators on the project. Right now, the MCP server doesn’t “correct” hallucinations itself, but it enforces a strict tool interface: the LLM can only call valid ROS topics, services, or actions that actually exist and that are explicitly exposed as safe to use. This information is provided through the MCP, so if the model hallucinates a command, the call simply fails gracefully rather than executing something unintended.

For more advanced use cases, we’re also thinking about adding validation layers and safety constraints before execution — so the MCP acts not just as a bridge, but also as a safeguard.

reply

upvote

by r-johnv7 hours ago|

[-]

Adding direct links to two of the videos of the MCP server in action.

This is the video of interacting with and debugging an industrial robot. (A few of the other comments here have been talking about this, that we see some amount of what looks like emergent behavior) https://www.youtube.com/watch?v=SrHzC5InJDA

This is a video from a collaborator research lab controlling a Unitree Go (robot dog) https://youtu.be/RW9_FgfxWzs?si=o7tIHs5eChEy9glI

reply

upvote

by awanacode8 hours ago|

[-]

Hey guys, nice work! Finally someone is taking the bull by the horns.

What excites me most is the potential for MCP to help with diagnostics and deployment for non-developers. A lot of lab techs or operators don’t want to dive into ros2 topic hz or parse logs — they just want to ask simple questions like “why isn’t the arm responding?” or “is this topic publishing?”.

A natural language layer over ROS could make debugging and deployment way easier for non-technical users — almost like having a conversational ros2 doctor or ros2 launch.

reply

upvote

by sdallagasperina1 hours ago|

[-]

Exactly! With MCP, we’ve started to imagine a workflow where instead of digging through logs, you just ask “why isn’t the robot responding?” and get guided through the diagnostics. No need to memorize every ROS command.

This isn’t just a bridge between LLMs and robots, it can also be a bridge between non-developer operators and the ROS ecosystem.

reply

upvote

by querist97 hours ago|

[-]

great. thanks for the nice comment.

reply

upvote

by avasan10 hours ago|

[-]

This looks like very exciting work! I'm curious - how much pre-context did you need to provide Claude to operate the industrial robot? That looks like a very complex environment.

reply

upvote

by r-johnv10 hours ago|

[-]

Thank you! This has been an exciting project to work on!

For the industrial robot (in the video on the main readme page) I intentionally gave Claude no context beforehand. All of the inferences that you see there are based on information that it got through the MCP tools that let it analyze the ROS topics and services on the robot.

In fact, I had a starting prompt to ignore all context from previous conversations because this looked to be like an example of emergent behavior and I wanted to confirm that it was not picking things from my earlier conversations!

reply

upvote

by avasan7 hours ago|

[-]

If you don't start from a clean-slate, does the behavior change drastically? Does the interaction between the LLM layer and the MCP/ROS become more efficient if it already has some context? Would that be something you'd want to toggle or do previous conversation contexts cause issues with new command implementations?

reply

upvote

by r-johnv7 hours ago|

[-]

I'm actually reviewing a PR right now that enables the MCP server to have a library of common robots with a specification file for each robot containing important context to the language model.

The demo video with the Unitree Go (robot dog) uses this approach to give the LLM additional context of the custom poses available to it.

reply

upvote

by r-johnv7 hours ago|

[-]

Not starting from clean slate improves the interaction significantly. I started from a clean slate on the industrial robot video in order to highlight how much is possible even when starting from one.

reply

upvote

by avasan7 hours ago|

[-]

That makes this all the more impressive!! What happens when you get an incorrect interpretation though? That is now in the "previous context bucket". Assuming the user addresses the issue through the LLM layer by talking to through it, do you think that the subsequent interactions could compound the error?

I sometimes face issues with LLMs running out of tokens or only using partial contacts from previous conversations - thereby repeating or compounding on previous incorrect responses.

Any thoughts on how to tackle that? Or is that too abstract a problem/beyond the scope to address at the moment?

reply

upvote

by r-johnv7 hours ago|

[-]

(Mentioning beforehand that we're still very early when it comes to the exact behavior of each language model)

So far, I've observed that for Claude and Gemini, which are what we've been testing most with, the Language model has been pretty good at recognizing its a faulty initial interpretation when it queries more information from the system.

Running out of tokens is a more significant issue. We saw it a lot when we queried imaged topics, which led us to try writing better image interpreters within the MCP server itself (credit to my collaborators at the Hanyang university in Korea) to defend the context window. Free tiers of the language models also run out of tokens quite quickly.

PS - Thank you for the questions, I'm enjoying talking about this here on HN with people who look at it critically and challenge us!

reply

upvote

by lpigeon7 hours ago|

[-]

I believe this project will play a significant role in helping to control robots using natural language!

reply

upvote

by sdallagasperina1 hours ago|

[-]

Thanks for the comment! We see the future of human–robot collaboration as being closely tied to how LLMs can translate verbal instructions into higher-level, longer-horizon commands. The goal here isn’t to “code faster,” but to make things like diagnostics and behavior tree design more intuitive and accessible — both for developers and for operators who don’t want to dive deep into ROS internals.

reply

upvote

by suninderminion7 hours ago|

[-]

Great Work!

reply

upvote

by gauravbot1 hours ago|

[-]

[dead]

reply