SILMARIL HACKED

Problem
Attacks on AI are compounding in complexity faster than defenses can adapt, and the gap is widening.
Six months ago, a prompt injection was a single hidden instruction meant to fool the model. If the guardrail or the model caught the pattern, the attack failed.
Today's attacks are chains that manipulate agents. A poisoned input such as a calendar invite triggers agent behavior which exfiltrates data, escalates privilege, and causes real damage. Guardrails are too static to block these chains.
This complexity used to be theoretical because no human would manually sequence these chains at scale. Now attackers use AI to create thousands of multi-step attack paths and converge on the ones that work, causing $30B in damages in 2025 alone.
EchoLeak hijacked Microsoft 365 Copilot through a single unopened email, silently exfiltrating enterprise data from OneDrive, SharePoint, and Teams without any user interaction (CVE-2025-32711, CVSS 9.3).
GitHub Copilot YOLO Mode RCE let hidden instructions in README files auto-approve shell commands on developer machines, while the Morris II worm proved these attacks can now self-replicate across AI ecosystems using natural language alone, infecting 20 new hosts per compromised client.
In March 2026, researchers chained a crafted Google Calendar invite through Perplexity's Comet browser to achieve full 1Password vault takeover. The fix took four months and multiple bypasses to land.
Solution
Silmaril wraps your inference calls to evaluate whether an execution sequence is heading toward a harmful outcome.
Existing guardrails filter inputs. Silmaril's multihead classifier inspects user intent, application context, and execution states together to detect harmful outcomes before they materialize. The model retrains continuously on exploits our threat hunting agents discover in your environment. Your defense gets stronger before attackers even have a chance to probe it.
Five lines of code, zero overhead. Silmaril operates at the application layer and supports every major agentic SDK and inference provider. It is available as a managed SaaS for teams that want to move quickly, or as a self-hosted deployment for environments with strict data residency and compliance requirements. Blocking is configurable by workflow node type.
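To make the integration model concrete, here is a minimal sketch of what wrapping an inference call at the application layer can look like. Everything here is illustrative: `FirewallClassifier`, `BlockedByFirewall`, and `guarded_call` are hypothetical stand-ins, not Silmaril's actual SDK, and the toy marker check stands in for the trained multihead classifier.

```python
# Hypothetical sketch of an application-layer firewall wrapper.
# Names and the toy rule are illustrative; the real SDK's API may differ.
from dataclasses import dataclass


class BlockedByFirewall(Exception):
    """Raised when the classifier judges the execution sequence harmful."""


@dataclass
class FirewallClassifier:
    # Toy rule standing in for the trained classifier: flag sequences
    # whose accumulated context contains a known-bad marker.
    bad_markers: tuple = ("IGNORE PREVIOUS INSTRUCTIONS",)

    def evaluate(self, intent: str, context: str, state: list) -> bool:
        blob = " ".join([intent, context, *state]).upper()
        return not any(m in blob for m in self.bad_markers)  # True = pass


def guarded_call(classifier, model_fn, intent, context, state):
    """Classify the execution sequence, then run the model call if it passes."""
    if not classifier.evaluate(intent, context, state):
        raise BlockedByFirewall("execution sequence blocked")
    return model_fn(intent)
```

The wrapper sits between the orchestrator and the model call: construct a classifier, wrap the call, catch the block error.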
Approach
Finding vulnerabilities before attackers do
Autonomous agents probe the defender’s system through the UI, mapping trust boundaries and hacking from first principles. They chain AI risks such as prompt injection, tool abuse, and context poisoning. Silmaril has found chains resulting in self-replicating worms and cross-user remote code execution.
Blocking attacks in real time
The classifier model in the firewall is based on the DeBERTa-v3-base architecture. It is trained on execution traces from autonomous threat research runs against your application, so it learns your specific decision boundaries, tool calls, and data flows. At inference time, it evaluates the current execution sequence (user intent, application context, and accumulated state) and outputs a binary pass/block decision with a p90 latency of 20ms.
When the classifier blocks, it throws an error to the orchestration layer, allowing you to determine threat handling.
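Since the firewall surfaces blocks as errors rather than silently swallowing requests, the orchestration layer decides what happens next. The sketch below is a hypothetical handler showing per-node-type policies (echoing "blocking is configurable by workflow node type"); the exception class, node-type names, and policy actions are all illustrative assumptions, not Silmaril's API.

```python
# Hypothetical orchestration-layer handling of a firewall block.
# Class, node-type, and policy names are illustrative assumptions.

class FirewallBlocked(Exception):
    def __init__(self, node_type: str, reason: str):
        super().__init__(reason)
        self.node_type = node_type
        self.reason = reason


# Per-node-type policy: hard-fail sensitive nodes, degrade gracefully elsewhere.
POLICIES = {
    "tool_call": "abort",       # never run a flagged tool invocation
    "retrieval": "strip",       # drop the poisoned document, continue
    "chat_turn": "safe_reply",  # answer with a refusal message
}


def handle_block(err: FirewallBlocked) -> str:
    """Map a blocked node to the orchestrator's configured response."""
    action = POLICIES.get(err.node_type, "abort")
    if action == "abort":
        return f"aborted: {err.reason}"
    if action == "strip":
        return "continued without flagged context"
    return "I can't help with that request."
```

The design choice here is that the firewall never decides recovery on its own; the application keeps full control over abort-versus-degrade behavior per workflow node.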
Turning every attack into a deployed defense
Every attack discovered generates synthetic training data. The firewall retrains and deploys updated weights automatically, from discovery to active defense in under an hour. When a novel technique is blocked at one deployment, it is anonymized and propagated to every other firewall deployment.
Performance
Accuracy
| System | Precision | Recall | F1 | Latency |
|---|---|---|---|---|
| Silmaril Firewall | 0.932 | 0.979 | 0.955 | 20ms |
| Lakera Guard | 0.699 | 0.807 | 0.749 | 114ms |
| BrowseSafe | 0.909 | 0.626 | 0.741 | 102ms |
| GPT Safeguard | 1.000 | 0.261 | 0.413 | 537ms |
| Model Armor | 0.778 | 0.265 | 0.395 | 220ms |
Threats Blocked
15 critical vulnerabilities disclosed to OpenAI, Anthropic, Google, and Microsoft in two weeks.
#1 AI-native productivity app
Silmaril discovered exploits and retrained the firewall, preventing $68M in damages over a month, spanning:
- Self-replicating worm propagation via document poisoning
- Agent-to-agent supply chain compromise
- Sandbox credential theft leading to cross-user remote code execution
- Zero-click data exfiltration through calendar injection
- Silent document and message harvesting via email injection
#1 AI-native analytics platform
Silmaril discovered exploits and retrained the firewall, preventing $20M in damages over a month, spanning:
- Entity injection via feedback fields into agent context
- Unauthorized workflow execution through tool-manipulation payloads
OpenAI
Silmaril hacked the ChatGPT agent by chaining a prompt injection into escalated root access, then moved laterally across containers and accessed internal source code and secrets. The exploit took under 5 minutes to execute and under 5 hours for our agents to ideate.
Microsoft
Silmaril discovered critical prompt injection vulnerabilities in Copilot that used email as the entry vector, achieving data exfiltration through SSRF. Microsoft patched the vulnerability for millions of users.