Silmaril

The world's first self-healing prompt injection defense

Book a Demo
Backed by Y Combinator

SILMARIL HACKED

MicrosoftOpenAIAnthropicGooglePerplexityDropbox

Problem

Attacks on AI are compounding in complexity faster than defenses can adapt, and the gap is widening.

6 months ago, a prompt injection was a single hidden instruction to fool the model. If the guardrail or the model caught the pattern, the attack failed.

Today's attacks are chains that manipulate agents. A poisoned input such as a calendar invite triggers agent behavior which exfiltrates data, escalates privilege, and causes real damage. Guardrails are too static to block these chains.

This complexity used to be theoretical because no human would manually sequence these chains at scale. Now attackers use AI to create thousands of multi-step attack paths and converge on the ones that work, causing $30B in damages in 2025 alone.

THIS IS HAPPENING NOW
01

ShareLeak and PipeLeak turned public form fields into agent hijack paths across Microsoft Copilot Studio and Salesforce Agentforce, exfiltrating SharePoint and CRM data through legitimate Outlook and email actions. CVE-2026-21520 · CVSS 7.5

02

CurseChain showed hidden README comments in Cursor can steal developer SSH keys, then poison future unrelated projects with regenerated exfiltration code even when the attacker ships no malicious code.

03

Forcepoint and Google found indirect prompt injection deployed across live websites, with payloads for API-key theft, financial fraud, data destruction, and agent denial of service. Google measured a 32% rise in malicious prompt-injection content.

Solution

Silmaril wraps your inference calls to evaluate whether an execution sequence is heading toward a harmful outcome.

Existing guardrails filter inputs. Silmaril's multihead classifier inspects user intent, application context, and execution states together to detect harmful outcomes before they materialize. The model retrains continuously on exploits our threat hunting agents discover in your environment. Your defense gets stronger before attackers even have a chance to probe it.

Five lines of code, zero overhead. Silmaril operates at the application layer and supports every major agentic SDK and inference provider. It is available as a managed SaaS for teams that want to move quickly, or as a self-hosted deployment for environments with strict data residency and compliance requirements. Blocking is configurable by workflow node type.

from langchain_anthropic import ChatAnthropic
from silmaril_security import FirewallHandler

firewall = FirewallHandler(
    api_key=SILMARIL_API_KEY,
    api_url=SILMARIL_API_URL,
)

llm = ChatAnthropic(model="claude-sonnet-4-6")
result = llm.invoke(messages, config={"callbacks": [firewall]})
Claude CodeClaude Code
Run thousands of powerful coding agents in parallel, each one protected from exploits at every step.
OpenClawOpenClaw
Deploy fleets of autonomous agents connected to critical services, fully defended from the first request.

Approach

01 // ATTACK

Finding vulnerabilities before attackers do

Autonomous agents probe the defender’s system through the UI, determining trust boundaries and hacking from first principles. They chain AI risks such as prompt injection, tool abuse, context poisoning, and more. Silmaril has found chains resulting in self-replicating worms and cross-user remote code execution.

02 // PROTECT

Blocking attacks in real-time

Classifier model in the firewall is based on a ModernBERT variant with Flash Attention. It is trained on execution traces from autonomous threat research runs against your application, so it learns your specific decision boundaries, tool calls, and data flows. At inference time, it evaluates the current execution sequence (user intent, application context, and accumulated state) and outputs a binary pass/block decision with a p90 latency of 20ms. Integration is 5 lines of code, zero overhead.

When the classifier blocks, it throws an error to the orchestration layer, allowing you to determine threat handling.

03 // RETRAIN

Turning every attack into a deployed defense

Every attack discovered generates synthetic training data. The firewall retrains and deploys updated weights automatically, from discovery to active defense in under an hour. When a novel technique is blocked at one deployment, it is anonymized and propagated to every other firewall deployment.

Performance

Accuracy

5060708090100
Silmaril Firewall
0.0%
BrowseSafe
0.0%
Lakera Guard
0.0%
GPT Safeguard
0.0%
Model Armor
0.0%

Metrics

SystemPrecisionRecallF1Latency
Silmaril Firewall0.9320.9790.95520ms
Lakera Guard0.6990.8070.749114ms
BrowseSafe0.9090.6260.741102ms
GPT Safeguard1.0000.2610.413537ms
Model Armor0.7780.2650.395220ms
BENCHMARKED ON PRODUCTION ATTACK DATA

Threats Blocked

15 critical vulnerabilities disclosed to OpenAI, Anthropic, Google, and Microsoft in two weeks.

CASE STUDY

#1 AI-native productivity app

Silmaril found the exploits, retrained the firewall, and prevented $68M in damages, spanning:

  • //Self-replicating worm propagation via document poisoning
  • //Agent-to-agent supply chain compromise
  • //Sandbox credential theft leading to cross-user remote code execution
  • //Zero-click data exfiltration through calendar injection
  • //Silent document and message harvesting via email injection
CASE STUDY // HIGH

#1 AI-native analytics platform

Silmaril found the exploits, retrained the firewall, and prevented $20M in damages, spanning:

  • //Entity injection via feedback fields into agent context
  • //Unauthorized workflow execution through tool-manipulation payloads
REPORT // CRITICAL
Open AI

Open AI

Silmaril hacked the ChatGPT agent by chaining a prompt injection into escalated root access and moved laterally across containers. Silmaril accessed internal source code and secrets. The exploit took <5 minutes to execute and <5 hours to ideate with our agents.

REPORT // CRITICAL
Microsoft

Microsoft

Critical prompt injection vulnerabilities using email as the entry vector, achieving data exfiltration through SSRF in Copilot. Microsoft patched the vulnerability for millions of users.

FAQ

Guardrails pattern match against known attack signatures and fall behind as new techniques emerge. They evaluate inputs in isolation and cannot see attacks that emerge from the interaction between an agent, its tools, and its context. Silmaril's multihead classifier inspects user intent, application context, and execution states together, so it catches indirect injection, multi-turn chains, context poisoning, and confused deputy attacks that pattern-matching approaches miss entirely.

Win the Arms Race

Only adaptive defenses outpace AI augmented attackers.

Book a Demo