Introduction to Guardrails

Your AI is protected with MagicBlocks' groundbreaking Guardrail models.

This means our models detect and block attempts to trick or break your AI Agents (jailbreaking), moderate incoming messages, and redact sensitive information (such as social security numbers or financial details).

If a Guardrail model detects anything amiss, the AI will respond with a message along the lines of, 'Sorry, I'm having trouble with that one. Would you mind rephrasing?'

Why is protection important?

Jailbreaking attempts can lead to compromised security, legal issues, unreliable AI performance, and damaged customer trust. Our Guardrail models help you protect your reputation and your customers' data. Plus, they provide a much better experience for users doing the right thing.

The Guardrails cover four areas:

Principles Monitor: Ensures that the AI’s responses meet your established principles. If a response doesn’t align with these principles, the AI will attempt to rephrase it. If it can’t, it will display a fallback message.

Redaction: The AI can remove sensitive or confidential info sent in by an end user before it reaches our platform. Things like identification, personal data, and financial details are commonly redacted to protect privacy.

Jailbreak Prevention: We detect and stop anyone trying to trick or break your AI.

Moderation: We detect and block bad or hateful messages before they reach your AI Agents.
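To make the flow above easier to picture, here is a minimal sketch of how the four checks could fit together around a single message. It is not MagicBlocks' implementation: the function `run_guardrails`, the `GuardrailResult` type, the order of the checks, and the `max_rephrases` setting are all assumptions made for illustration, and the detectors are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Fallback text taken from the example message quoted earlier in this article.
FALLBACK = "Sorry, I'm having trouble with that one. Would you mind rephrasing?"


@dataclass
class GuardrailResult:
    reply: str
    triggered: Optional[str]  # name of the guardrail that fired, or None


def run_guardrails(
    message: str,
    is_jailbreak: Callable[[str], bool],      # hypothetical jailbreak detector
    is_harmful: Callable[[str], bool],        # hypothetical moderation detector
    redact: Callable[[str], str],             # hypothetical redaction step
    generate: Callable[[str], str],           # hypothetical call to the AI Agent
    meets_principles: Callable[[str], bool],  # hypothetical principles check
    max_rephrases: int = 1,                   # assumed retry budget
) -> GuardrailResult:
    # Input-side checks: stop the message before the agent sees it.
    if is_jailbreak(message):
        return GuardrailResult(FALLBACK, "jailbreak")
    if is_harmful(message):
        return GuardrailResult(FALLBACK, "moderation")

    # Redaction: strip sensitive details before generating a reply.
    safe_message = redact(message)

    # Output-side check: rephrase if the reply breaks the principles.
    reply = generate(safe_message)
    for _ in range(max_rephrases):
        if meets_principles(reply):
            return GuardrailResult(reply, None)
        reply = generate(safe_message)

    if meets_principles(reply):
        return GuardrailResult(reply, None)
    return GuardrailResult(FALLBACK, "principles")
```

In this sketch, a message flagged by either input check never reaches the agent, and the fallback message is only used on the output side once the rephrase attempts are exhausted.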

Set up Guardrails in MagicBlocks:

 

In your agent workspace, go to General.

Go to 'Guardrails'.

Here, in Basic Mode you can

and in Advanced Mode you can see various options, such as setting up Redaction, Jailbreak Prevention, and Moderation.

These features allow you to configure specific protections for your AI.

Choose the option 'On.'

Under Redaction, you can choose the sensitive info you want redacted, such as:

  • Date of Birth
  • Email Address
  • Phone Number
  • Tax Identifier
  • Password
  • Bank Account
  • Credit Card Number
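As a rough illustration of what selecting redaction types means in practice, the sketch below replaces a few of the categories listed above with placeholders. It only shows the idea and does not describe how MagicBlocks detects sensitive info: the regular expressions, the `redact` function, and the `[LABEL]` placeholder format are assumptions, and real detectors need to be far more robust than these patterns.

```python
import re

# Illustrative patterns only (assumptions, not MagicBlocks' detectors).
# Order matters: card numbers are checked before phone numbers because
# both are long runs of digits.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CREDIT_CARD_NUMBER": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str, enabled: set[str]) -> str:
    """Replace each enabled category with a placeholder such as [EMAIL_ADDRESS]."""
    for label, pattern in PATTERNS.items():
        if label in enabled:
            text = pattern.sub(f"[{label}]", text)
    return text


message = "Reach me at jane@example.com or +1 415-555-0132, card 4111 1111 1111 1111."
print(redact(message, {"EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD_NUMBER"}))
# prints: Reach me at [EMAIL_ADDRESS] or [PHONE_NUMBER], card [CREDIT_CARD_NUMBER].
```

Passing a smaller set, for example `{"EMAIL_ADDRESS"}`, leaves the other details untouched, which mirrors choosing only some of the options in the list.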

For the message shown when a Guardrail is triggered, you can either choose one from your saved 'Snippets' or type a custom message of your own.

Examples of messages

  • Oh no! Looks like something's gone wrong. Please try again in a few minutes.
  • I can't help you with that. Try asking something else.
  • Hmm... I can't help you with that. Looks like it goes against the overarching AI policy, so we've flagged it with our team (and please ask something else next time).
  • I can't answer that. Is there something else I can help with?
