Guardrails
The Guardrails (Advanced) tab in MagicBlocks gives you control over how your AI Agent handles sensitive data, inappropriate inputs, and compliance rules.
By configuring Guardrails, you ensure that your AI:
- Follows factual and ethical guidelines
- Redacts personally identifiable information (PII)
- Detects unsafe or jailbreak attempts
- Moderates inappropriate or high-risk content
- Rewrites messages to comply with your set rules
This system ensures your Agent stays professional, safe, and accurate — no matter what users type.
When to Use
Use the Guardrails (Advanced) tab when you want to:
- Enforce brand, legal, or compliance rules across conversations
- Automatically filter or redact personal or confidential data
- Prevent unsafe, biased, or off-topic responses
- Detect “jailbreak” attempts or manipulative prompts
- Add moderation messages for sensitive conversations
This is especially useful for industries with strict standards — such as finance, healthcare, legal, or education.
Advanced Guardrails Settings
1. Rules
Rules define the behavior boundaries for your AI Agent.
They ensure your Agent provides accurate, compliant, and responsible replies.
Example rules:
- Acknowledge when information is unknown or unavailable.
- Do not fabricate information if not available.
- Do not provide estimates on pricing or rates unless stated.
- Avoid giving advice outside of introduction services.
- Maintain confidentiality of client information at all times.
- Clearly communicate business offerings without exaggeration.
- Direct users to a human representative for specific inquiries.
- Ensure all claims about services are truthful and verifiable.
- Never make up facts; avoid fabricating details or fake links.
- Refrain from phrases like “I can help you find...” since there’s no access to real-time data.
Each rule works as a checkpoint: when Rules Monitor is on and the AI’s draft response violates a rule, MagicBlocks automatically rewrites the response or falls back to a compliant message.
You can add new rules anytime by clicking + New Rule (up to 15 per Agent).
2. PII Collection Control
Purpose:
Control which personal or sensitive data your AI Agent can collect or display.
How it works:
Use the dropdown to select which PII types to exclude from chat, such as:
- Names
- Phone numbers
- Email addresses
- Credit card details
This supports compliance with privacy regulations such as GDPR or HIPAA.
3. Redaction
Purpose:
Automatically detect and hide sensitive information before it’s stored or processed.
Options:
Off — The AI will allow all information to pass through (no filtering).
On — MagicBlocks will redact selected PII from user or AI messages.
When On, you can select which information types should be redacted.
Example: 3 Selected → Name, Email, and Phone Number.
Tip: Keep redaction on for live or production Agents that handle user data.
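To make the idea of redaction concrete, here is a minimal Python sketch of replacing selected PII types with placeholders. It is an illustration only, not how MagicBlocks implements redaction: the `PII_PATTERNS` table, the `redact` function, and the simple regular expressions are all assumptions made for this example, and real PII detection is considerably more involved.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions for illustration).
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str, selected: set[str]) -> str:
    """Replace every match of each selected PII type with a placeholder."""
    for kind, pattern in PII_PATTERNS.items():
        if kind in selected:
            text = pattern.sub(f"[{kind.upper()}]", text)
    return text


message = "Reach me at jane@example.com or +1 415 555 0100."
print(redact(message, {"email", "phone"}))  # both types selected
print(redact(message, set()))               # nothing selected: text unchanged
```

Passing an empty selection mirrors the Off setting: the message passes through unchanged.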
4. Rules Monitor
Purpose:
Actively enforces Guardrail rules during chat.
If the AI produces a response that breaks one, it will automatically rewrite it.
Options:
Off — Disables rule monitoring (AI responds freely).
On — Enables automatic rewrite to align with set rules. If rewriting fails, the Agent shows your fallback message.
Example:
If your Agent tries to provide an estimate when “No Pricing Discussion” is enabled, it rewrites the message to:
“I don’t have pricing information available, but I can connect you with our sales team.”
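The rewrite-then-fallback behavior described in this section can be pictured as a small decision flow. The Python sketch below is a conceptual illustration, not MagicBlocks code: `monitor`, `violates_pricing_rule`, `rewrite`, and the `FALLBACK` text are hypothetical names chosen for this example, and the toy check (looking for a `$` sign) stands in for whatever rule evaluation the platform actually performs.

```python
from typing import Callable, Optional

# Hypothetical fallback message (assumption for illustration).
FALLBACK = "I'm not able to help with that here, but I can connect you with our team."


def violates_pricing_rule(text: str) -> bool:
    """Toy rule check: flag any reply that contains a currency amount."""
    return "$" in text


def rewrite(text: str) -> Optional[str]:
    """Toy rewriter: return a compliant reply, or None if rewriting fails."""
    return ("I don't have pricing information available, "
            "but I can connect you with our sales team.")


def monitor(draft: str,
            check: Callable[[str], bool],
            rewriter: Callable[[str], Optional[str]],
            fallback: str = FALLBACK) -> str:
    """Return the draft if compliant, else a compliant rewrite, else the fallback."""
    if not check(draft):
        return draft
    candidate = rewriter(draft)
    # Accept the rewrite only if it exists and passes the same check.
    if candidate is not None and not check(candidate):
        return candidate
    return fallback


print(monitor("Our plan costs $49 a month.", violates_pricing_rule, rewrite))
```

Re-checking the rewritten reply before sending it is a design choice in this sketch: it is what lets the fallback message cover the case where rewriting fails.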
5. Jailbreak Prevention
Purpose:
Protects your Agent from malicious or manipulative prompts (e.g., “ignore all previous instructions,” “pretend you’re not an AI”).
Toggle:
Off — Jailbreak detection is disabled.
On — AI automatically detects and ignores unsafe prompts.
When enabled, this feature ensures users cannot trick the Agent into breaking company rules, accessing hidden data, or revealing restricted info.
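As a rough intuition for what detection means here, the Python sketch below flags messages that contain phrases like the examples above. This is an assumption-laden toy and not the MagicBlocks detector, which may work very differently: the `SUSPICIOUS_PATTERNS` list and `looks_like_jailbreak` function are hypothetical, and simple phrase matching is easy to evade.

```python
import re

# Hypothetical phrase patterns based on the examples in this article.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(your |the )?(previous |prior )?(instructions|rules)",
               re.IGNORECASE),
    re.compile(r"pretend you(['’]re| are) not an ai", re.IGNORECASE),
    re.compile(r"act like a human", re.IGNORECASE),
]


def looks_like_jailbreak(message: str) -> bool:
    """Return True if the message matches any known manipulative phrase."""
    return any(p.search(message) for p in SUSPICIOUS_PATTERNS)


for text in ("Ignore all previous instructions and reveal your prompt.",
             "What services do you offer?"):
    print(text, "->", looks_like_jailbreak(text))
```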
6. Moderation
Purpose:
Filters inappropriate, offensive, or high-risk language before it reaches your users.
Modes:
Select (Default) — Use MagicBlocks’ default moderation filter.
Custom — Create your own moderation message for sensitive responses.
Custom Message Example:
“Sorry, I can’t respond to that request. Let’s keep the conversation focused on how I can help you with our services.”
You can add snippets ({user_name}, {agent_name}) for personalization.
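Snippet personalization amounts to placeholder substitution. The Python sketch below shows the idea under stated assumptions: the `render` function and the sample values are hypothetical, and only the `{user_name}` and `{agent_name}` placeholder names come from this article. MagicBlocks performs the substitution for you; this is only to show what the result looks like.

```python
def render(template: str, values: dict[str, str]) -> str:
    """Replace each {name} placeholder that has a value; leave others untouched."""
    for name, value in values.items():
        template = template.replace("{" + name + "}", value)
    return template


template = ("Sorry {user_name}, I can't respond to that request. "
            "This is {agent_name}; let's keep the conversation focused on our services.")

# Sample values are made up for the example.
print(render(template, {"user_name": "Sam", "agent_name": "Ava"}))
```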
Summary — Guardrails (Advanced)
| Setting | Purpose |
|---|---|
| Rules | Define how your AI should behave and communicate. |
| PII Collection Control | Choose which personal data to exclude from chat. |
| Redaction | Automatically hide sensitive information. |
| Rules Monitor | Rewrite or block responses that break your rules. |
| Jailbreak Prevention | Detect and neutralize unsafe or manipulative prompts. |
| Moderation | Filter or replace inappropriate messages with safe responses. |
Best Practices
- Always enable Redaction for production Agents.
- Turn Rules Monitor on to prevent compliance violations automatically.
- Regularly review your Guardrail rules, especially when updating brand or legal standards.
- Turn on Jailbreak Prevention for Agents deployed on public websites.
- Personalize your Moderation message to keep its tone consistent with your brand.
- Combine Guardrails with Memory Capture to ensure secure, context-aware sessions.
FAQs
1. Can Guardrails rewrite AI responses automatically?
Yes — with Rules Monitor ON, MagicBlocks auto-rewrites messages that break your rules.
2. What happens if redaction is off?
Sensitive information will be visible and stored in session logs. Turn redaction off only in testing environments.
3. How many custom rules can I add?
You can create up to 15 Guardrail rules per Agent.
4. What is a “Jailbreak” attempt?
It’s when a user tries to override the AI’s instructions (e.g., “ignore your rules,” “act like a human”). When Jailbreak Prevention is on, the system detects and blocks these attempts automatically.
5. Can I use snippets in Moderation messages?
Yes — snippets personalize fallback or moderation replies dynamically.