Thanks. Yeah, Cursor / Claude code + MCP is powerful. We differentiate on two fronts, mainly:

1) Greater accuracy with our specialized tools: Most MCP tools let agents query data or run *ql queries, which overwhelms context windows given the scale of telemetry data. Raw data is also not great for reasoning. We've designed our tools so that models get data in the right format, enriched with statistical summaries, baselines, and correlation data, so LLMs can focus on reasoning.

2) Product UX: You'll also find that text-based outputs from general-purpose agents are not sufficient for this task. Our notebook UX offers a great way to visualize the underlying data so you can review the AI's work and build trust in it.
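To make point 1 concrete, here is a rough sketch of what "statistical summaries plus a baseline" could look like in place of a raw series. This is not Relvy's implementation: the function name, the field names, and the 3-sigma threshold are all assumptions chosen for illustration, and comparing a window mean against the baseline's per-point standard deviation is a deliberate simplification.

```python
from statistics import mean, stdev

def summarize_series(name, values, baseline, z_threshold=3.0):
    """Condense a raw metric window into a few fields a model can reason over.

    Hypothetical helper: `values` is the window under investigation and
    `baseline` is a reference window from a period considered healthy.
    """
    base_mean = mean(baseline)
    base_std = stdev(baseline)  # needs at least two baseline points
    cur_mean = mean(values)
    # How far the current window's mean sits from the baseline mean,
    # measured in baseline standard deviations (0.0 if the baseline is flat).
    z = (cur_mean - base_mean) / base_std if base_std else 0.0
    return {
        "metric": name,
        "points": len(values),
        "min": min(values),
        "max": max(values),
        "mean": round(cur_mean, 2),
        "baseline_mean": round(base_mean, 2),
        "z_vs_baseline": round(z, 2),
        "anomalous": abs(z) >= z_threshold,
    }

# Example with made-up latency numbers (ms).
summary = summarize_series(
    "p95_latency_ms",
    values=[150, 160, 155],
    baseline=[100, 102, 98, 101, 99],
)
print(summary)
```

The idea is that the agent sees a handful of fields per metric rather than thousands of raw points, which keeps the context window small regardless of how much telemetry sits behind the summary.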

by hrimfaxi 9 hours ago

To be clear, are the main differentiators basically better built-in MCPs and better UX? Not knocking, just trying to understand the differences.

I have had incredible success debugging issues by just hooking up Datadog MCP and giving agents access to it. Claude/Cursor don't seem to have any issues pulling in the raw data they need in amounts that don't overload their context.

Do you consider this a tool to be used in addition to something like Cursor cloud agents, or to replace them?

by behat 8 hours ago

For the debugging workflow you described, we would be a standalone replacement for Cursor or other agents. We don't yet write code, so we can't replace your Cursor agents entirely.

Re: differentiation - yes, faster, more accurate and more consistent. Partially because of better tools and UX, and partially because we anchor on runbooks. On-call engineers can quickly see which steps the AI ran, what it found for each, and the time series graph that supports each finding.

Interesting that you have had great success with Datadog MCP. Do you mainly look at logs?

by verdverm 6 hours ago

> For the X workflow, we would be a standalone replacement for other agents.

Imo, this is not what users want. They want extension to their agent. If a project tells me I have to use their interface or agentic setup, it's 95% not going to happen. Consider how many SaaS tools we already have to deal with. That many agents is not desirable: they all have their little quirks and take time to "get to know".

Instead, build extensions, skills, and subagents that fit into my agentic workflow and tooling. This will also simplify what you need to do, so you can focus on your core competency. For example, you should be able to create a chat participant in VS Code / Copilot, take advantage of the native notebook and diff rendering, and share the MCPs (et al.) the user already has set up for their agents and internal systems.

by behat 6 hours ago

> They want extension to their agent. If a project tells me I have to use their interface or agentic setup, it's 95% not going to happen

Yes, there's definitely friction there. It may be that the right form factor is that you trigger Relvy's debugging agent via Claude Code / Cursor.

Our early users rely heavily on looking at the raw data to review the AI's RCA, so a standalone setup makes sense. Also, the dominant usage pattern is background agentic execution triggered by alerts, not manual invocation.

by verdverm 6 hours ago

Yup, we are moving up the ladders of abstraction and will have our agentic team interfaces that include agents triggered outside of human input. It does not change things. As soon as I need to go into the code or to the agent to fix the problem, I'm back to copying and pasting, or switching views between multiple interfaces. That's the kind of stuff we loathe.

Runbooks are great and all, but actions need to be taken, and I'm not going to give all the vendor interfaces access to the internal systems. They can be subagents in my system, which already has the tools and permission gates needed, access to code and git for IaC changes, etc...

A standalone product seems like the way to go now, since it's easier to get moving and show off an experience and the vision, but it's definitely not the operational way in prod for a lot of reasons, security being a paramount one.

I also do not discount that your SaaS could easily be replaced by an open-source subagent team in the next couple of years.

by esafak 8 hours ago

They claim a 12-percentage-point lead (from 36% to 48%) over Opus 4.6 in an RCA benchmark: https://www.relvy.ai/blog/relvy-improves-claude-accuracy-by-...
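For anyone comparing numbers, a quick sanity check on the two figures quoted here (36% and 48%): the gap reads differently depending on whether you quote it in absolute or relative terms. The variable names below are just for illustration.

```python
baseline_acc = 0.36  # accuracy quoted for the baseline model
improved_acc = 0.48  # accuracy quoted with the vendor's agent

# Absolute gain, in percentage points.
absolute_gain_pts = round((improved_acc - baseline_acc) * 100, 1)

# Relative gain, as a percentage of the baseline accuracy.
relative_gain_pct = round((improved_acc - baseline_acc) / baseline_acc * 100, 1)

print(absolute_gain_pts, relative_gain_pct)  # 12.0 33.3
```

So the same result is a 12-point absolute gain or roughly a 33% relative improvement.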

by behat 8 hours ago

Heh, I was just about to post the following on your previous comment re: reproducible benchmark results. Thanks for posting the blog.

With the docker images that we offer, in theory, people can re-run the benchmark themselves with our agent. But we should document that and make it easier.

In the end, you really would have to evaluate on your own production alerts. Hopefully the easy install + setup helps.