The Steadybit MCP Server enables organizations to use AI to reveal reliability gaps and surface new insights on experiment results
SOLINGEN, Germany–(BUSINESS WIRE)–#AI–Steadybit GmbH, the leader in chaos engineering and reliability testing, today announced the launch of the new Steadybit MCP (Model Context Protocol) Server, the first AI-extensible solution for chaos engineering.
This MCP Server is a standardized way to connect Steadybit data to LLMs and AI workflows, enabling SRE teams to rapidly run analysis and generate insights about their system reliability and resilience. Recent high-profile outages across major cloud and security platforms highlight the tremendous cost of unexpected system failures.
As SRE teams work to improve their system reliability in an increasingly complex world, chaos engineering is the go-to strategy for making proactive improvements. AWS describes chaos engineering as a strategic necessity that is “essential for improving resilient systems”, and Gartner recommends chaos engineering for organizations as a critical resilience practice.
Bringing Chaos Engineering Into the AI Era
By running chaos experiments with Steadybit, teams can test and define the limits of their system resilience before incidents occur, so they can mitigate risks and validate redundancies. With the new MCP Server, teams can easily pull data from their chaos experiments into their LLM workflows.
“Every team and tech stack works a little differently. We believe it’s important for a chaos engineering tool to be as easy to deploy and customize as possible, while maintaining the best-in-class features that make adoption across an enterprise seamless,” said Benjamin Wilms, CEO and Co-founder of Steadybit.
“With our new MCP, we are providing a new way for teams to work with their experiments to learn about their systems and improve their overall system resilience.”
By drawing on data from past incidents, post-mortems, and completed experiments, the Steadybit MCP Server can help SRE teams uncover reliability lessons and take informed action to improve their systems.
Prompt Examples Featuring the Steadybit MCP
With simple prompts, organizations using Steadybit for chaos engineering can now use LLM workflows in Claude, Gemini, or ChatGPT to get answers to questions like:
- “We’ve been running experiments with Steadybit for a few months now. Can you create a report to summarize the experiment results since then for each team?”
- “Review the types of experiments we have been running so far. Can you recommend a prioritized list of experiment types relevant to our systems that we have not yet run?”
When the Steadybit MCP Server is combined with MCP servers from observability and incident response tools, teams can enter even more meaningful prompts, like:
- “Since we have started running chaos experiments, please use metrics in PagerDuty to report the difference it has made on our MTTR and incidents.”
- “Review recent incidents for Service A in Datadog. Can you suggest a few experiments we could run with Steadybit that would help us test and improve the service’s reliability?”
Introducing New Reliability Workflows for Teams
“As our teams test out different AI use cases, we can now directly connect data from Steadybit into any LLM workflows,” commented Krishna Palati, Director of Software Engineering at Salesforce. “This MCP will enable us to just type a prompt to pull custom reports, analyze reliability testing gaps, and get insights on what experiments to run next.”
Steadybit is on a mission to make it easier for teams to adopt and roll out chaos engineering at scale. With this latest release, Steadybit is making chaos engineering more accessible and empowering teams to innovate and learn with every experiment.
About Steadybit
Steadybit is the chaos engineering platform that makes it easy for organizations to proactively reveal reliability issues and train their operational resilience. With Steadybit, reliability and platform teams can quickly build, customize, and deploy experiments across their full tech stack using an intuitive no-code editor, flexible open source framework, and extensive automation capabilities.
With strong observability integrations, Steadybit enables teams to seamlessly optimize alerts, discover reliability gaps, and establish continuous verification of their systems. With this proactive approach to reliability, enterprises can confidently achieve service availability objectives, mitigate incidents, and deliver best-in-class services at scale.
To learn more, visit steadybit.com to request a demo or start a free trial today.
Contacts
Media:
Patrick Londa
Head of Marketing, Steadybit
[email protected]