Meet Orbit

An SRE agent built on AWS Bedrock AgentCore. It lives in Slack, thinks with Claude Opus 4.6, and keeps your infrastructure in check — with human-in-the-loop safety for every dangerous action.

Slack Integration
Step Functions
Bedrock AgentCore
DynamoDB
Lambda (Python)
12
Lambdas
8
Skills
66
Trusted Domains
8hr
Max Runtime
17
Auto-Deny Rules
How It Works

Three steps. Zero overhead.

@mention Orbit in any Slack channel. It processes your request through a serverless pipeline with built-in safety rails.

1

Slack Trigger

User @mentions Orbit in a Slack thread. The message hits API Gateway, gets signature-verified, deduplicated, and kicks off a Step Functions workflow.

2

Agent Processing

Step Functions invokes the Orbit agent on AgentCore via the callback pattern. Claude Opus 4.6 processes the request with access to CloudWatch, Datadog, Jira, Confluence, and more.

3

Safe Response

Every tool call passes through a four-tier permission guard. Structural shell bypasses and catastrophic commands are auto-denied, dangerous actions require Slack approval, and safe commands auto-allow. Responses are chunked and posted back to the thread.
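
The chunking step can be pictured with a small sketch. The function name `chunk_message` and the 3,000-character default are assumptions for illustration; the source does not state the chunk size Orbit uses.

```python
def chunk_message(text: str, max_chars: int = 3000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring line breaks.

    Illustrative only: the real post_to_slack chunk size is not documented here.
    """
    chunks: list[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit has to be hard-split.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when this line would overflow the current one.
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Joining the chunks in order reproduces the original text, so each one can be posted to the thread as a separate message.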

Architecture

Main Request Flow

From @mention to response — follow the path of a Slack message through the entire serverless pipeline.

Slack / API Gateway
Lambda Functions
Step Functions
AgentCore Runtime
DynamoDB
Slack Workspace
@mention Orbit: API Gateway receives events
Approve / Reject: API Gateway receives actions
Two API Gateway HTTP routes receive all Slack traffic. Every request is verified with HMAC-SHA256 before any processing occurs. The Events route handles @mentions; the Actions route handles interactive button clicks from the HITL approval flow.
Verification Lambda
1. HMAC-SHA256 signature check
2. Dedup via DynamoDB (1h TTL)
3. Start Step Functions
4. Return 200 within 3s
Validates the request signature, deduplicates via DynamoDB with a TTL, then starts async processing via Step Functions. Must acknowledge within Slack's 3-second window, or Slack retries the event.
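
One way to get TTL-based deduplication is a conditional write that fails when the event ID already exists. The sketch below only builds the arguments for boto3's low-level `put_item` call; the attribute names `event_id` and `expires_at` are hypothetical, while the table name and the one-hour TTL come from the description above.

```python
def build_dedup_put(event_id: str, now_epoch: int, ttl_seconds: int = 3600) -> dict:
    """Build put_item kwargs that succeed only for a first-seen event.

    Attribute names are assumptions for illustration.
    """
    return {
        "TableName": "event-dedup-table",
        "Item": {
            "event_id": {"S": event_id},
            # DynamoDB TTL expects a Number attribute holding epoch seconds.
            "expires_at": {"N": str(now_epoch + ttl_seconds)},
        },
        # The write is rejected if an item with this key is already present.
        "ConditionExpression": "attribute_not_exists(event_id)",
    }
```

Passing these kwargs to `client.put_item(**kwargs)` raises a `ConditionalCheckFailedException` for a duplicate, which the handler can treat as "already processed" and still return 200.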
Handle Interactivity Lambda
1. HMAC-SHA256 signature check
2. Atomic DynamoDB update
    (prevents double-click race)
3. Update Slack message with decision
Handles approval button clicks with atomic DynamoDB updates to prevent race conditions. Supports both tool-level and workflow-level approval modes.
Step Functions (callback pattern)
PostThinking — post "Thinking…" to Slack
InvokeAgentWithCallback (waitForTaskToken)
PostResult — update thread with response
Error handlers — 4 catch states
Callback pattern: Step Functions generates a unique task token and pauses at zero cost. The agent processes asynchronously and calls SendTaskSuccess when done.

Includes configurable retry with exponential backoff and jitter, execution timeouts, and multiple error handler states that post specific error messages back to the Slack thread.
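
A callback task state with retries, timeouts, and error handlers might look roughly like the following, expressed as a Python dict that serializes to Amazon States Language. The state names `PostTimeoutError` and `PostGenericError`, the retry numbers, and the 45-minute heartbeat timeout are assumptions; only the 8-hour maximum runtime and the 30-minute heartbeat cadence come from this page.

```python
import json


def build_invoke_agent_state(function_name: str) -> dict:
    """Sketch of a waitForTaskToken task state (values are illustrative)."""
    return {
        "Type": "Task",
        # The .waitForTaskToken suffix makes the state pause until a callback.
        "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
        "Parameters": {
            "FunctionName": function_name,
            "Payload": {
                "taskToken.$": "$$.Task.Token",  # token from the context object
                "event.$": "$",
            },
        },
        "TimeoutSeconds": 8 * 60 * 60,   # 8hr max runtime
        "HeartbeatSeconds": 45 * 60,     # must exceed the 30 min heartbeat cadence
        "Retry": [{
            "ErrorEquals": ["Lambda.TooManyRequestsException"],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
            "JitterStrategy": "FULL",
        }],
        "Catch": [
            {"ErrorEquals": ["States.Timeout", "States.HeartbeatTimeout"],
             "Next": "PostTimeoutError"},
            {"ErrorEquals": ["States.ALL"], "Next": "PostGenericError"},
        ],
        "Next": "PostResult",
    }


if __name__ == "__main__":
    print(json.dumps(build_invoke_agent_state("invoke_agent"), indent=2))
```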
invoke_agent Lambda
Generate deterministic session ID from Slack thread
Invoke AgentCore with task token + prompt
Generates a deterministic session ID from the Slack thread context, ensuring all messages in the same thread share a session for multi-turn conversation. Fetches thread history for context injection.
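
A deterministic session ID can be derived by hashing the identifiers that define a Slack thread. The exact inputs and format Orbit uses are not given here, so the `team_id`, `channel_id`, and `thread_ts` fields and the `slack-` prefix below are assumptions.

```python
import hashlib


def session_id_for_thread(team_id: str, channel_id: str, thread_ts: str) -> str:
    """Map a Slack thread to a stable session ID (hypothetical format)."""
    key = f"{team_id}:{channel_id}:{thread_ts}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # Same thread in, same ID out; different threads get different IDs.
    return f"slack-{digest}"
```

Because the ID depends only on the thread, every retry or follow-up message in that thread lands in the same agent session without any lookup table.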
AgentCore Runtime (Orbit)
Spawns background thread, returns ACK
Claude Opus 4.6 processes the request
Sends SFN heartbeats every 30 min
Calls SendTaskSuccess when done
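
The four steps above could be sketched as follows. `sfn` stands for a boto3 Step Functions client (or any object exposing `send_task_heartbeat`, `send_task_success`, and `send_task_failure`); the function names and the `work` callable are hypothetical stand-ins for the agent run.

```python
import json
import threading


def run_with_heartbeats(sfn, task_token: str, work, heartbeat_seconds: float = 1800.0) -> None:
    """Run work() while sending periodic heartbeats, then report the outcome."""
    done = threading.Event()

    def beat() -> None:
        # Event.wait returns False on timeout and True once done is set.
        while not done.wait(heartbeat_seconds):
            sfn.send_task_heartbeat(taskToken=task_token)

    beater = threading.Thread(target=beat, daemon=True)
    beater.start()
    result, error = None, None
    try:
        result = work()
    except Exception as exc:  # report any failure back to the workflow
        error = exc
    # Stop heartbeats before the final callback so nothing is sent after it.
    done.set()
    beater.join()
    if error is None:
        sfn.send_task_success(taskToken=task_token, output=json.dumps(result))
    else:
        sfn.send_task_failure(taskToken=task_token,
                              error=type(error).__name__, cause=str(error)[:256])


def handle_invocation(sfn, task_token: str, work):
    """Start the work in a background thread and acknowledge immediately."""
    worker = threading.Thread(target=run_with_heartbeats,
                              args=(sfn, task_token, work), daemon=True)
    worker.start()
    return {"status": "accepted"}, worker
```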
Tool Permission Guard (tool_guard_hook)
SAFE auto-allow — Read, Grep, CloudWatch, Lumigo, etc.
STRUCTURAL auto-deny — $(...), eval, | bash, exec, netcat
CATASTROPHIC auto-deny — fork bomb, mkfs, dd to device
DANGEROUS HITL approval — rm -rf, kill -9, untrusted URLs
Skills: cloudwatch-guide, datadog-guide, lumigo-guide, jira-guide, confluence-guide, embrace-guide, tacobell-store-api, tacobell-menu-api
MCP servers: CloudWatch, Jira, Confluence, Lumigo, Datadog, Embrace
Session persistence: Claude session ID stored locally for conversation continuity across invocations.
Thread context: Injects prior Slack messages into prompt (full, missed, or none based on session freshness). Truncated to 2,000 chars/message, 80,000 chars total.
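
The two truncation limits could be applied as in this sketch. The limits come from the description above; the choice to keep the newest messages when the total cap is hit is an assumption, as is the function name.

```python
def build_thread_context(messages: list[str],
                         per_message_limit: int = 2000,
                         total_limit: int = 80000) -> list[str]:
    """Clip each message, then keep as many recent messages as fit the total cap."""
    kept: list[str] = []
    used = 0
    # Walk newest-first so the most recent messages survive (an assumption).
    for text in reversed(messages):
        clipped = text[:per_message_limit]
        if used + len(clipped) > total_limit:
            break
        kept.append(clipped)
        used += len(clipped)
    kept.reverse()  # restore chronological order for the prompt
    return kept
```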
DynamoDB
event-dedup-table
approval-tokens-table
Event dedup table: Prevents duplicate Slack event processing using TTL-based expiration.
Approval tokens table: Stores HITL approval state and tool context with automatic TTL cleanup.
Safety

Human-in-the-Loop Approval

When the tool guard classifies a command as dangerous, the agent pauses and asks a human reviewer via Slack buttons. Fail-closed on timeout.

Agent detects danger
Tool classified as DANGEROUS tier
The tool_guard_hook runs before every tool call. When a bash command matches dangerous patterns (rm -rf, kill -9, etc.) or a WebFetch targets an untrusted domain, the agent initiates the approval flow.
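
A trusted-domain check for WebFetch might be sketched like this. The two domains in `TRUSTED_DOMAINS` are placeholders (the actual list of 66 is not shown here), and requiring `https` is an extra assumption.

```python
from urllib.parse import urlsplit

# Placeholder entries; the real trusted-domain list is not part of this page.
TRUSTED_DOMAINS = frozenset({"docs.aws.amazon.com", "atlassian.net"})


def is_trusted_url(url: str, trusted: frozenset[str] = TRUSTED_DOMAINS) -> bool:
    """Return True if the URL's host is a trusted domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme != "https":  # assumption: plain http is never auto-allowed
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    # Match on a dot boundary so "evil-atlassian.net" does not pass.
    return any(host == d or host.endswith("." + d) for d in trusted)
```

A URL that fails this check would go to the approval flow instead of being fetched automatically.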
post_approval_request
Post Slack buttons
Store approval_id in DynamoDB
Generates a unique approval_id, stores the tool call context (command, arguments, reason) in DynamoDB, and posts a Slack message with [Approve] and [Reject] buttons to the thread.
Slack Buttons
Approve Reject
Reviewer clicks to decide
handle_interactivity
Atomic DynamoDB update
Prevents double-click
Uses DynamoDB ConditionExpression: only succeeds if status = PENDING. If two reviewers click simultaneously, only the first write wins. Updates the Slack message to show who approved/rejected and when.
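
The first-write-wins update could be expressed with arguments like these for boto3's low-level `update_item`. The `decided_by` and `decided_at` attributes are hypothetical; `approval_id`, the `status` attribute, and the PENDING condition follow the description above.

```python
def build_decision_update(approval_id: str, decision: str,
                          reviewer: str, decided_at_epoch: int) -> dict:
    """Build update_item kwargs that only apply while the approval is PENDING."""
    if decision not in ("APPROVED", "REJECTED"):
        raise ValueError(f"unexpected decision: {decision!r}")
    return {
        "TableName": "approval-tokens-table",
        "Key": {"approval_id": {"S": approval_id}},
        "UpdateExpression": "SET #s = :decision, decided_by = :who, decided_at = :when",
        # A second click finds status already changed, so its write is rejected.
        "ConditionExpression": "#s = :pending",
        # "status" is a DynamoDB reserved word, so it needs a name placeholder.
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {
            ":decision": {"S": decision},
            ":pending": {"S": "PENDING"},
            ":who": {"S": reviewer},
            ":when": {"N": str(decided_at_epoch)},
        },
    }
```

The losing writer receives a `ConditionalCheckFailedException`, which the handler can turn into a "someone already decided" reply.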
DynamoDB
approval-tokens-table
Stores approval decision
Agent polls
Periodic polling with timeout
Fail-closed on timeout
APPROVED: tool executes
REJECTED: tool denied, agent informed
TIMEOUT: tool denied (fail-closed)
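
A minimal sketch of the polling loop, assuming a `get_status` callable that reads the approval record and returns "PENDING", "APPROVED", or "REJECTED". The timeout and interval defaults are placeholders, since the real values are not stated.

```python
import time
from typing import Callable


def wait_for_decision(get_status: Callable[[], str],
                      timeout_seconds: float = 300.0,
                      poll_seconds: float = 5.0,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until a decision arrives; return "TIMEOUT" if none does in time."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in ("APPROVED", "REJECTED"):
            return status
        if clock() >= deadline:
            # Fail closed: no decision is treated the same as a denial.
            return "TIMEOUT"
        sleep(poll_seconds)


def may_execute(outcome: str) -> bool:
    """Only an explicit approval lets the tool call proceed."""
    return outcome == "APPROVED"
```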
Interactive

Tool Guard Playground

The four-tier permission guard classifies every bash command in real time. Structural shell bypasses and catastrophic commands are auto-denied, dangerous commands require HITL approval, and safe commands auto-allow.

Try these examples:
ls -la /var/log
cat /etc/hosts
rm -rf /tmp/cache
kill -9 1234
:(){ :|:& };:
mkfs.ext4 /dev/sda1
chmod 777 /etc/passwd
systemctl stop nginx
dd if=/dev/zero of=/dev/sda
python3 -c "import os"
kubectl get pods
sed -i 's/foo/bar/' config
xargs rm *.log
shutdown -h now
rm -rf /
echo test | bash
eval "rm -rf /"
bash -c "whoami"
nc -l 4444
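
A toy version of the four-tier classification might look like the sketch below. It covers only the patterns named on this page ($(...), eval, | bash, exec, netcat, fork bomb, mkfs, dd to a device, rm -rf, kill -9); the regular expressions are assumptions, and the real guard's rule set is not reproduced here. Treating everything unmatched as SAFE is a simplification for the sketch, not a claim about how Orbit handles unknown commands.

```python
import re

# Checked in order: structural bypasses, then catastrophic, then dangerous.
STRUCTURAL = [r"\$\(", r"\beval\b", r"\|\s*(?:ba)?sh\b", r"\bexec\b", r"\b(?:nc|netcat)\b"]
CATASTROPHIC = [r":\(\)\s*\{.*\}\s*;\s*:", r"\bmkfs\b", r"\bdd\b.*\bof=/dev/"]
DANGEROUS = [r"\brm\s+-rf\b", r"\bkill\s+-9\b"]

TIERS = [
    ("STRUCTURAL", STRUCTURAL),      # auto-deny
    ("CATASTROPHIC", CATASTROPHIC),  # auto-deny
    ("DANGEROUS", DANGEROUS),        # HITL approval
]


def classify(command: str) -> str:
    """Return the first tier whose pattern matches, else SAFE (simplified)."""
    for tier, patterns in TIERS:
        if any(re.search(p, command) for p in patterns):
            return tier
    return "SAFE"


if __name__ == "__main__":
    for cmd in ["ls -la /var/log", "rm -rf /tmp/cache", "echo test | bash"]:
        print(f"{cmd!r}: {classify(cmd)}")
```

Checking the structural tier first means a wrapper such as eval is denied outright, regardless of what it wraps.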
Infrastructure

Lambda Functions

Twelve Lambda functions power the pipeline. Those that need slack_sdk share a Lambda Layer.

Function | Purpose | Timeout
verification | Signature verify, dedup, start Step Functions | fast
invoke_agent | Generate session ID, invoke AgentCore | medium
post_to_slack | Post/update Slack messages, rate limit retry, chunking | medium
post_approval_request | Post Slack approval buttons, store token | medium
handle_interactivity | Handle button clicks, atomic update, callback | fast
resume_agent | Send approval decision to agent (workflow-level HITL) | medium
scheduled_trigger | Start proactive health check workflows on schedule | fast
jira | Jira REST API integration (search, CRUD, transitions) | medium
confluence | Confluence REST API integration (search, CRUD, comments) | medium
lumigo | Lumigo Log API integration (search, aggregate, investigate) | extended
datadog | Datadog REST API (monitors, metrics, logs, incidents) | medium
embrace | Embrace Metrics API integration (crash data, session analytics) | medium
Capabilities

Agent Tools

65 tools across 9 categories. Every tool is classified by the permission guard.

55
Auto-Allow
10
HITL Required
6
MCP Servers
8
Skills
🛠
Built-in Claude Tools 10
Read
Read file contents from the filesystem
AUTO
Write
Write or create files on disk
AUTO
Edit
Edit existing file contents in-place
AUTO
Glob
Find files by pattern matching
AUTO
Bash
Execute shell commands (4-tier classification)
SMART
WebSearch
Search the web for information
AUTO
WebFetch
Fetch URL content (trusted domains auto-allow, others HITL)
SMART
Skill
Load skill reference docs for guided tool usage
AUTO
Task / TaskList / TaskGet
Task management and progress tracking
AUTO
Notebook / NotebookEdit
Create and edit Jupyter-style notebooks
AUTO
📊
CloudWatch MCP 9 tools · all auto-allow
get_metric_data
Query raw metric datapoints and timeseries
AUTO
analyze_metric
Statistical analysis: avg, p50, p90, p99
AUTO
get_active_alarms
List currently firing CloudWatch alarms
AUTO
get_alarm_history
Retrieve alarm state-change history
AUTO
describe_log_groups
List and search CloudWatch log groups
AUTO
analyze_log_group
Summarize recent activity in a log group
AUTO
execute_log_insights_query
Run CloudWatch Logs Insights queries
AUTO
get_logs_insight_query_results
Retrieve Logs Insights query results
AUTO
get_recommended_metric_alarms
Get alarm recommendations for resources
AUTO
🐝
Datadog MCP 14 tools · 3 HITL
search_monitors
Search monitors by status, tags, or name
AUTO
get_monitor
Get full details for a specific monitor
AUTO
query_metrics
Query AWS API Gateway metric timeseries
AUTO
search_metrics
Search available metric names by prefix
AUTO
search_logs
Search and retrieve Datadog log entries
AUTO
search_events
Search Datadog events by time and tags
AUTO
list_incidents
List Datadog incidents with filters
AUTO
get_incident
Get incident details by ID
AUTO
list_dashboards
List dashboards, optionally filtered by title
AUTO
get_dashboard
Get dashboard details and widget summary
AUTO
list_downtimes
List currently scheduled downtimes
AUTO
mute_monitor
Mute a Datadog monitor
HITL
unmute_monitor
Unmute a Datadog monitor
HITL
schedule_downtime
Schedule a Datadog downtime window
HITL
🎯
Jira MCP 8 tools · 4 HITL
jira_search
Search Jira issues using JQL queries
AUTO
jira_get_issue
Get issue details by key (includes recent comments)
AUTO
jira_get_transitions
Get available status transitions for an issue
AUTO
jira_get_issue_sla
Get SLA information for JSM request issues
AUTO
jira_create_issue
Create a new Jira issue
HITL
jira_update_issue
Update fields on an existing issue
HITL
jira_transition_issue
Transition issue to a new status
HITL
jira_add_comment
Add a comment to a Jira issue
HITL
📖
Confluence MCP 8 tools · 3 HITL
confluence_search
Search Confluence pages using CQL queries
AUTO
confluence_get_page
Get page content by ID
AUTO
confluence_get_page_views
Get page view analytics
AUTO
confluence_get_comments
Get comments on a Confluence page
AUTO
confluence_get_page_children
Get child pages of a parent page
AUTO
confluence_create_page
Create a new Confluence page
HITL
confluence_update_page
Update an existing Confluence page
HITL
confluence_add_comment
Add a comment to a page
HITL
🔎
Lumigo MCP 3 tools · all auto-allow
lumigo_search_logs
Search Lambda logs by severity, resource, or free text
AUTO
lumigo_aggregate_logs
Aggregate log data: count, avg, p95, p99, timeseries
AUTO
lumigo_get_issue_details
Investigate issues with root cause analysis and stack traces
AUTO
📱
Embrace MCP 3 tools · all auto-allow
embrace_list_metrics
List available metric names from Embrace, optionally filtered by substring
AUTO
embrace_query_instant
Execute a PromQL instant query for current metric values
AUTO
embrace_query_range
Execute a PromQL range query for time-series metric data
AUTO
🌮
Taco Bell API Tools 2 scripts via Bash
store_lookup.py
Taco Bell store locator — search by lat/lng, ZIP, or address. Returns store details, hours, and capabilities.
AUTO
menu_lookup.py
Taco Bell menu catalog API — search menu items by name, get item details, nutrition info, and pricing by store.
AUTO
📚
Skills (Reference Guides) 8 skills
cloudwatch-guide
CloudWatch metric queries, Log Insights syntax, alarm investigation playbooks
GUIDE
datadog-guide
Monitor search, metric query format, dashboard lookup, troubleshooting
GUIDE
lumigo-guide
Log search syntax, aggregation patterns, issue investigation workflows
GUIDE
jira-guide
JQL search, issue CRUD, project board conventions
GUIDE
confluence-guide
CQL search, page CRUD, space conventions
GUIDE
embrace-guide
Embrace Metrics API, crash analytics, session investigation
GUIDE
tacobell-store-api
Store locator API endpoints, response schema, search examples
GUIDE
tacobell-menu-api
Menu catalog API endpoints, item search, nutrition data schema
GUIDE
Technology

Tech Stack

The building blocks behind Orbit.

🧠

Claude Opus 4.6

Frontier reasoning model powering all agent decisions

☁️

Bedrock AgentCore

AWS-managed agent runtime with session persistence

AWS Lambda

Serverless Python functions with shared layers

🔄

Step Functions

Callback pattern orchestration with zero-cost waits

🗀

DynamoDB

Event dedup + HITL approval state with TTL cleanup

💬

Slack API

Events + Interactivity with HMAC-SHA256 verification

📊

CloudWatch MCP

Metrics, logs, alarms via MCP server integration

🐝

Datadog

Monitors, metrics, logs, incidents via REST API Lambda

🔎

Lumigo

Log search, aggregation, and trace investigation

🎯

Jira

Issue search, CRUD, transitions, board management

📖

Confluence

Page search, CRUD, comments, space management

📱

Embrace

Mobile crash analytics, session metrics via PromQL

🌎

Terragrunt

Infrastructure as code for all AWS resources