Silmaril

The world's first self-healing prompt injection defense

Book a Demo
Backed by Y Combinator

SILMARIL HACKED

Microsoft · OpenAI · Anthropic · Google · Perplexity · Dropbox

Problem

Attacks on AI are compounding in complexity faster than defenses can adapt, and the gap is widening.

Six months ago, a prompt injection was a single hidden instruction to fool the model. If the guardrail or the model caught the pattern, the attack failed.

Today's attacks are chains that manipulate agents. A poisoned input such as a calendar invite triggers agent behavior which exfiltrates data, escalates privilege, and causes real damage. Guardrails are too static to block these chains.

This complexity used to be theoretical because no human would manually sequence these chains at scale. Now attackers use AI to create thousands of multi-step attack paths and converge on the ones that work, causing $30B in damages in 2025 alone.

THIS IS HAPPENING NOW
01

EchoLeak hijacked Microsoft 365 Copilot through a single unopened email, silently exfiltrating enterprise data from OneDrive, SharePoint, and Teams without any user interaction (CVE-2025-32711 · CVSS 9.3).

02

GitHub Copilot's YOLO Mode RCE let hidden instructions in README files auto-approve shell commands on developer machines. Meanwhile, the Morris II worm proved these attacks can self-replicate across AI ecosystems using natural language alone, infecting 20 new hosts per compromised client.

03

In March 2026, researchers chained a crafted Google Calendar invite through Perplexity's Comet browser to achieve full 1Password vault takeover. The fix took four months and multiple bypasses to land.

Solution

Silmaril wraps your inference calls to evaluate whether an execution sequence is heading toward a harmful outcome.

Existing guardrails filter inputs. Silmaril's multihead classifier inspects user intent, application context, and execution states together to detect harmful outcomes before they materialize. The model retrains continuously on exploits our threat hunting agents discover in your environment. Your defense gets stronger before attackers even have a chance to probe it.

Five lines of code, zero overhead. Silmaril operates at the application layer and supports every major agentic SDK and inference provider. It is available as a managed SaaS for teams that want to move quickly, or as a self-hosted deployment for environments with strict data residency and compliance requirements. Blocking is configurable by workflow node type.

import os

from langchain_anthropic import ChatAnthropic
from silmaril_security import FirewallHandler

# Attach the firewall as a callback on every inference call.
firewall = FirewallHandler(
    api_key=os.environ["SILMARIL_API_KEY"],
    api_url=os.environ["SILMARIL_API_URL"],
)

llm = ChatAnthropic(model="claude-sonnet-4-6")
messages = [("human", "Summarize today's calendar invites.")]
result = llm.invoke(messages, config={"callbacks": [firewall]})
Claude Code
Run thousands of powerful coding agents in parallel, each one protected from exploits at every step.
OpenClaw
Deploy fleets of autonomous agents connected to critical services, fully defended from the first request.

Approach

01 // ATTACK

Finding vulnerabilities before attackers do

Autonomous agents probe the defender’s system through the UI, determining trust boundaries and hacking from first principles. They chain AI risks such as prompt injection, tool abuse, context poisoning, and more. Silmaril has found chains resulting in self-replicating worms and cross-user remote code execution.

02 // PROTECT

Blocking attacks in real-time

The classifier model in the firewall is based on the deberta-v3-base architecture. It is trained on execution traces from autonomous threat research runs against your application, so it learns your specific decision boundaries, tool calls, and data flows. At inference time, it evaluates the current execution sequence (user intent, application context, and accumulated state) and outputs a binary pass/block decision with a p90 latency of 20ms. Integration takes five lines of code with zero overhead.
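
The decision flow described above can be sketched as follows. Here serialize_trace mirrors the three described inputs, while score_trace is a hypothetical stand-in for the fine-tuned deberta-v3-base classifier, which this page does not publish — the keyword check exists only to make the sketch runnable.

```python
def serialize_trace(user_intent, app_context, state_events):
    # Flatten the execution sequence (intent, context, accumulated
    # state) into a single trace string for the classifier.
    parts = [f"[INTENT] {user_intent}", f"[CONTEXT] {app_context}"]
    parts += [f"[STATE] {event}" for event in state_events]
    return "\n".join(parts)

def score_trace(trace):
    # Hypothetical stand-in for the real model, which outputs
    # P(harmful) for the serialized trace.
    suspicious = ("exfiltrate", "curl http", "ignore previous")
    return 1.0 if any(s in trace.lower() for s in suspicious) else 0.0

def decide(user_intent, app_context, state_events, threshold=0.5):
    # Binary pass/block decision, as described in the text.
    trace = serialize_trace(user_intent, app_context, state_events)
    return "block" if score_trace(trace) >= threshold else "pass"
```

The key design point is that the score is computed over the whole accumulated sequence, not over any single input in isolation.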

When the classifier blocks, it raises an error to the orchestration layer, letting you decide how to handle the threat.
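
A minimal sketch of that handling pattern, assuming the SDK raises an exception on block — FirewallBlockedError and invoke_guarded are hypothetical stand-ins, not the real Silmaril API:

```python
class FirewallBlockedError(Exception):
    """Hypothetical stand-in for the error the firewall raises on block."""
    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason

def invoke_guarded(step):
    # Stand-in for an llm.invoke(...) call wrapped by the firewall
    # callback; raises when the classifier blocks the step.
    if step == "exfiltrate":
        raise FirewallBlockedError("harmful execution sequence detected")
    return f"completed: {step}"

def run_workflow(steps):
    # Orchestration layer: decide how a blocked step is handled
    # (here: record the block and halt the agent).
    results = []
    for step in steps:
        try:
            results.append(invoke_guarded(step))
        except FirewallBlockedError as err:
            results.append(f"blocked: {err.reason}")
            break
    return results
```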

03 // RETRAIN

Turning every attack into a deployed defense

Every attack discovered generates synthetic training data. The firewall retrains and deploys updated weights automatically, from discovery to active defense in under an hour. When a novel technique is blocked at one deployment, it is anonymized and propagated to every other firewall deployment.

Performance

Accuracy

[Accuracy chart comparing Silmaril Firewall, BrowseSafe, Lakera Guard, GPT Safeguard, and Model Armor; the figures appear in the Metrics table below.]

Metrics

System             Precision  Recall  F1     Latency
Silmaril Firewall  0.932      0.979   0.955  20ms
Lakera Guard       0.699      0.807   0.749  114ms
BrowseSafe         0.909      0.626   0.741  102ms
GPT Safeguard      1.000      0.261   0.413  537ms
Model Armor        0.778      0.265   0.395  220ms
BENCHMARKED ON PRODUCTION ATTACK DATA
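
As a sanity check, the F1 column above can be recomputed from the precision and recall columns with the standard harmonic-mean formula; the results agree with the table to within rounding:

```python
def f1(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs from the Metrics table.
rows = {
    "Silmaril Firewall": (0.932, 0.979),
    "Lakera Guard": (0.699, 0.807),
    "BrowseSafe": (0.909, 0.626),
    "GPT Safeguard": (1.000, 0.261),
    "Model Armor": (0.778, 0.265),
}
for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1(p, r):.3f}")
```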

Threats Blocked

15 critical vulnerabilities disclosed to OpenAI, Anthropic, Google, and Microsoft in two weeks.

CASE STUDY // CRITICAL

#1 AI-native productivity app

Silmaril discovered exploits and retrained the firewall, preventing $68M in damages over a month, spanning:

  • Self-replicating worm propagation via document poisoning
  • Agent-to-agent supply chain compromise
  • Sandbox credential theft leading to cross-user remote code execution
  • Zero-click data exfiltration through calendar injection
  • Silent document and message harvesting via email injection
CASE STUDY // HIGH

#1 AI-native analytics platform

Silmaril discovered exploits and retrained the firewall, preventing $20M in damages over a month, spanning:

  • Entity injection via feedback fields into agent context
  • Unauthorized workflow execution through tool-manipulation payloads
REPORT // CRITICAL
OpenAI

Silmaril hacked the ChatGPT agent by chaining a prompt injection into escalated root access, then moved laterally across containers to access internal source code and secrets. The exploit took under 5 minutes to execute and under 5 hours for our agents to ideate.

REPORT // CRITICAL
Microsoft

Silmaril found critical prompt injection vulnerabilities using email as the entry vector, achieving data exfiltration through SSRF in Copilot. Microsoft patched the vulnerability for millions of users.

FAQ

How is Silmaril different from existing guardrails?

Guardrails pattern-match against known attack signatures and fall behind as new techniques emerge. They evaluate inputs in isolation, so they cannot see attacks that emerge from the interaction between an agent, its tools, and its context. Silmaril's multihead classifier inspects user intent, application context, and execution states together, so it catches indirect injection, multi-turn chains, context poisoning, and confused-deputy attacks that pattern-matching approaches miss entirely.

Win the Arms Race

Only adaptive defenses can outpace AI-augmented attackers.

Book a Demo