Technical deep dive

Shield under the hood.

A technical view of how Shield protects web inputs, existing AI chat, LLM proxy and MCP tools: architecture, decision flow, audit and comparison with WAF / reCAPTCHA.

What Shield is

First-layer protection — not a SIEM, not an analytics tool.

Shield is an active and passive protection layer for forms, login, checkout, uploads, AI chat, MCP tools and APIs. The dashboard shows decisions and events so you can tune the defense; it is not meant as a log explorer. Events are operational signal, not analytics inventory.

Shield is:

✓Inline protection for web / e-shop / chat / MCP / backend
✓Per-request decision: allow / monitor / challenge / block
✓Tamper-evident, signed audit log of every decision
✓Mobile native SDK — iOS + Android — on the roadmap (Sprint M)

Shield is not:

×A SIEM or log aggregator
×A replacement for your WAF or CDN — it sits one layer deeper
×A reporting / business-analytics tool
×A certified compliance product — controls align with frameworks, audit is your own

Architecture

Three protection paths depending on what you need to protect.

Shield can sit in front of your web, AI chat or MCP tools. Each path has a clear decision point and an auditable result.

Layer 1

Web and forms

The JS widget and backend SDK protect contact forms, login, checkout and uploads. They collect security signals, attach an HMAC token and enforce an allow / challenge / block decision before a request reaches sensitive logic.

Layer 2

AI chat and LLM proxy

The prompt and response pass through an LLM firewall. Shield can anonymize sensitive data, block prompt injection, check for system-instruction leakage and work in front of the chat you already use.

Layer 3

MCP tools, policy and audit

MCP calls are evaluated by schema, permissions and action risk. Destructive or sensitive tool calls can require an approval gate. Every decision lands in a tenant-scoped audit log.

Comparison

What Shield covers vs. WAF vs. CAPTCHA.

Shield does not replace your existing perimeter protection. It sits one layer deeper.

Capability	Shield	WAF	reCAPTCHA / Turnstile
Headless bot detection	Yes — multi-signal scoring	Partial (IP reputation)	Yes — at the edge
Prompt injection against LLMs	Yes — semantic firewall	No	No
MCP agent abuse	Yes — policy engine	No	No
Form spam / disposable email	Yes — 5 languages	No	Partial
Upload malware scan	Yes — quarantine	Partial	No
SQL injection payloads	Yes — AST validation	Yes — regex	No
Credential stuffing (distributed)	Yes — per-account lockout	Partial (per-IP)	Partial
Tamper-evident audit log	Yes — exportable	Varies	No

Reduced control matrix

40+ concrete capabilities across 9 categories.

Complete categorized matrix. Exact thresholds, signal weights and detection internals are available to customers in the portal.

Keystroke dynamics, mouse trajectory R², scroll patterns, touch events, form-fill timing, page-dwell — multi-signal inputs fed into the local scorer and backend scoring pipeline.

Protects against

Form-fill bots, headless automation, scripted submissions.

Canvas, WebGL, audio context, font detection, navigator fingerprinting fused into a SHA-256 device hash. Detects headless browsers and anti-detect tools.

Protects against

Headless browser frameworks, anti-detect automation tooling.

Short-lived cache snapshot of device_hash, webgl_renderer, user_agent, timezone, screen_resolution at session start. Sensitive events (login, form submit, checkout) compare the live fingerprint against that snapshot; drift adds significant risk signals.

Protects against

Session hijacking, token replay, stolen-cookie attacks, mid-session device swaps.

OpenAI- and Anthropic-compatible base URL. Shield scans every prompt before forwarding and every completion before returning, blocks on policy hit, strips PII / secrets on stream.

Protects against

Prompt injection, jailbreak, PII / secret exfiltration from LLM apps.

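Embedding-based detection across many attack categories. "Disregard earlier directives" and "Ignore previous instructions" land close together under cosine similarity, so a paraphrase matches the same attack pattern as the original wording. Embeddings are computed on a local Ollama instance, so there is zero per-request API cost.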

Protects against

Paraphrased prompt injection, synonym jailbreaks, obfuscated attacks, cross-language variants.
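
To make the cosine-similarity idea concrete, here is a minimal sketch. It is not Shield's implementation: the toy 3-dimensional vectors, the function names and the 0.85 threshold are all assumptions chosen for illustration, and real embeddings would come from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def is_semantic_match(prompt_vec, attack_vecs, threshold=0.85):
    """True if the prompt embedding is close to any known attack embedding.

    The threshold is a hypothetical value, not a Shield default.
    """
    return any(cosine_similarity(prompt_vec, v) >= threshold for v in attack_vecs)

# Toy "embeddings" (made-up numbers, for illustration only).
ignore_previous = [0.9, 0.1, 0.4]       # "Ignore previous instructions"
disregard_earlier = [0.85, 0.15, 0.45]  # "Disregard earlier directives"
weather_question = [0.05, 0.95, 0.1]    # an unrelated, benign prompt
```

With these toy vectors the paraphrase scores far above the threshold against the known attack, while the unrelated prompt does not.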

Tool-call interception for Claude / Cursor / IDE agents. JSON Schema validation of arguments, chain-step limit, domain allowlist, explicit approval gates on destructive tools. Inspects every invocation against agent-protection rules before execution.

Protects against

Malicious tool abuse, file / shell exfiltration, supply-chain agents, runaway agent loops.

40+ patterns scanning input + output + tool calls before / after the model runs. Runs alongside the Semantic Firewall for layered defence.

Protects against

Prompt injection, DAN-style jailbreaks, memory poisoning, tool abuse, data exfiltration.

5 tools exposed via MCP: shield_get_stats, shield_get_threats, shield_add_rule, shield_get_events, shield_verify_token. Let your Claude / Cursor agent investigate and act on incidents without leaving the chat.

Protects against

Blind admin response — agents can investigate and act on incidents programmatically.

AST-parsed SQL validation. Blocks UNION, INTO OUTFILE, pg_sleep, information_schema. LIMIT capped. Sensitive columns (password, api_key, ssn) auto-redacted. Query fingerprinting and honeytoken trap tables.

Protects against

SQL exfiltration, schema enumeration, pagination abuse, sensitive-data leaks.

Wallet detection: BTC (P2PKH/Bech32), ETH, SOL, TRX, XRP, LTC, DOGE. BIP-39 seed phrase scanning (12/24 word). Signing prompts (EIP-712). Mining domains blocked. Payment redirect patterns.

Protects against

Wallet theft, seed-phrase leakage, mining script injection, payment redirects.

Bigram gibberish detection (EN / DE / CS / SK / ES), 100+ disposable email domains, spam patterns (repeated chars, ALL CAPS, URL flood), suspicious name detection. Phishing and bad-content corpus covers 9 languages (see Phishing card). Additive scoring with cluster bonuses.

Protects against

Form spam, fake registrations, throwaway accounts, gibberish submissions.

Multi-layered email + attachment scanner. Detects Slovak/Czech/Polish/German/French/Spanish/Serbian bodies stripped of diacritics (the strongest real-world phishing signal), password-hint social engineering across 9 languages, mainframe-mimicry filenames, and password-protected PDF / Office files. Brand-agnostic cluster catches the same shape with any impersonated company name.

Protects against

Phishing drops, credential harvesting, password-protected malware droppers, attachment-based social engineering.

check_upload() accepts form_fields. When a file upload is accompanied by form data (title, description, name, message), Content Quality Scoring runs on those fields too. A clean PDF with gibberish metadata is still rejected when the score reaches high confidence.

Protects against

Fake account registrations, low-quality form spam with attachments, bot-filled support tickets.

Every file passes a quarantine gate — extension allowlist, magic-byte MIME sniffing, Office macro detection, PDF JavaScript / Launch / OpenAction, SVG / HTML script injection. Per-tenant max size and extension list.

Protects against

Malware drop, macro viruses, PDF-borne JS, SVG-XSS, polyglot files.
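
The extension allowlist and magic-byte check can be sketched as follows. This is a simplified illustration, not Shield's quarantine gate: `quarantine_gate`, `sniff_mime`, the allowlist contents and the size limit are hypothetical, and macro, PDF JavaScript and SVG checks are left out.

```python
import os

# Leading "magic bytes" for a few well-known file formats.
MAGIC_SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"PK\x03\x04": "application/zip",  # also the container for .docx / .xlsx
}

# Hypothetical per-tenant allowlist: extension -> type the content must sniff as.
ALLOWED_EXTENSIONS = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".docx": "application/zip",
}

def sniff_mime(data):
    """Guess the content type from the first bytes, ignoring the filename."""
    for magic, mime in MAGIC_SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return None

def quarantine_gate(filename, data, max_bytes=5_000_000):
    """Return (accepted, reason) for an uploaded file."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, "extension_not_allowed"
    if len(data) > max_bytes:
        return False, "file_too_large"
    if sniff_mime(data) != ALLOWED_EXTENSIONS[ext]:
        return False, "content_type_mismatch"
    return True, "ok"
```

Comparing the sniffed type with the declared extension is what catches a renamed executable: the filename says PDF, the first bytes say otherwise.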

Python (FastAPI / Django / Flask), Node.js (Express / Next.js), PHP (WordPress / Laravel). Validates X-Shield-Token on every request. No token → 403. HMAC verification results are held in a short-lived cache keyed by (token, path).

Protects against

Requests bypassing the JS widget (curl, Postman, Python requests, raw HTTP).
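
A minimal sketch of server-side HMAC-SHA256 token checking, under stated assumptions: the token layout (`<expiry>.<hex signature>`), the signed message (`path|expiry`) and both function names are hypothetical, since the real X-Shield-Token format is not described here.

```python
import hashlib
import hmac
import time

def _sign(secret, path, expires_at):
    """HMAC-SHA256 over the path and expiry, hex-encoded."""
    message = f"{path}|{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def mint_token(secret, path, expires_at):
    """Server side: issue a token bound to one path and an expiry time."""
    return f"{expires_at}.{_sign(secret, path, expires_at)}"

def verify_token(secret, path, token, now=None):
    """Backend side: recompute the signature and compare in constant time."""
    now = int(time.time()) if now is None else now
    expires_str, sep, signature = token.partition(".")
    if not sep:
        return False  # malformed token
    try:
        expires_at = int(expires_str)
    except ValueError:
        return False
    if expires_at < now:
        return False  # expired
    expected = _sign(secret, path, expires_at)
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

A middleware built on this would answer 403 whenever the header is missing or `verify_token` returns False. Because the secret never leaves the server, a client that only knows the public site key cannot produce a valid signature.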

3-state breaker (closed / open / half_open) in all three backend SDKs. After consecutive transport errors → OPEN for a brief interval → 1 HALF_OPEN probe. 4xx doesn't trip the breaker. PHP uses APCu for cross-FPM-worker state. No more timeouts on every request during an upstream incident.

Protects against

Cascading timeouts, retry storms, request pile-up during Shield-API outages.
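
The three-state breaker can be sketched like this. It is an illustrative, single-process version: the class name, the failure threshold and the open interval are assumptions, and the SDKs' real values and shared-state handling (such as APCu in PHP) are not shown.

```python
import time

class CircuitBreaker:
    """closed -> open after N consecutive transport errors -> half_open probe."""

    def __init__(self, failure_threshold=3, open_seconds=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # hypothetical default
        self.open_seconds = open_seconds            # hypothetical default
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def _trip(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0

    def allow_request(self):
        """Should the caller contact the upstream API right now?"""
        if self.state == "open":
            if self.clock() - self.opened_at >= self.open_seconds:
                self.state = "half_open"  # let exactly one probe through
                return True
            return False
        if self.state == "half_open":
            return False  # a probe is already in flight
        return True

    def record_success(self):
        self.state = "closed"
        self.failures = 0

    def record_transport_error(self):
        """Call for timeouts and connection errors only, not for 4xx replies."""
        if self.state == "half_open":
            self._trip()  # the probe failed, stay open for another interval
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._trip()
```

While the breaker is open, callers skip the upstream call immediately instead of waiting for a timeout on every request.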

Reason → (machine_code, human_hint) map. /shield/verify and all 3 SDK 403 bodies return remediation + remediation_code. Legit false-positive users see "Your session expired — please reload" instead of a silent 403.

Protects against

Bad UX on false-positive, support ticket load, silent-fail confusion.

Drop-in PHP plugin: auto-injects the widget, ships middleware that validates Shield tokens on /wp-login.php and admin endpoints. Fail-closed by default, configurable.

Protects against

WordPress brute force, xmlrpc abuse, wp-admin enumeration on EU SMB sites.

Multi-dimensional rate limiting: per-IP, per-device, per-endpoint, with progressive escalation. Server-side counters with sliding windows.

Protects against

Brute force, credential stuffing, scraping, API enumeration.
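
As an illustration of sliding-window counters across several dimensions, here is a small in-memory sketch. The class and function names, limits and window sizes are hypothetical, and progressive escalation is left out.

```python
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` hits per key within the last `window_seconds`."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # key -> timestamps of accepted hits

    def allow(self, key, now):
        q = self.hits[key]
        # Drop timestamps that have slid out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

def check_request(limiters, dimensions, now):
    """Evaluate every dimension (per-IP, per-device, ...); all must allow.

    `dimensions` maps a limiter name to this request's key in that dimension.
    """
    results = [limiters[name].allow(key, now) for name, key in dimensions.items()]
    return all(results)
```

A request is allowed only if every dimension is under its own limit, so rotating IPs does not help a client whose device counter is already exhausted.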

IP geolocation via ip-api.com (short-lived cache). Per-site blocked / allowed country lists. Datacenter and proxy / Tor score modifiers. Page-load hard block with access-denied overlay before widget initialises.

Protects against

Traffic from disallowed regions, anonymising infrastructure, compliance-driven restrictions.

Widget prevents form submission at high-confidence score. Red overlay: "Blocked by Corpilus Shield". Server-signed HMAC-SHA256 tokens auto-attached to fetch() via interceptor.

Protects against

High-confidence bot submissions reaching the backend.

278 compiled detection patterns scanned automatically on every event — covers all OWASP Top 10 2025 categories. Payload-level inspection happens before scoring.

Protects against

SQL injection, XSS, path traversal, command injection, SSRF, SSTI, LDAP injection, XXE, NoSQL injection, log4j JNDI, security misconfig probes, supply-chain typosquats, stack-trace leakage.

The AI analyzer analyses events continuously. RAG context grounded in a curated security knowledge base. Auto-creates threats and rules from real observations.

Protects against

Novel / unseen attack patterns missed by static rules.

Pre-built threat-intel context (mini-CAG). Bot signatures, attack patterns, OWASP samples baked in — new sites are protected from the first page view.

Protects against

Cold-start blindness — new sites are protected immediately.

Shield's Security Knowledge collection ships with curated docs (OWASP Top 10, bot detection, incident response). Admins can upload their own company playbooks, post-mortem reports, or domain-specific threat intel. Every upload runs through a multi-layer scan. Clean docs land as trust_state='pending' until an admin explicitly promotes them to 'active'. Only active docs reach the AI analyzer's RAG context.

Protects against

Tenant-specific attack patterns that generic training data never sees — internal fraud schemes, industry-specific account takeover, post-M&A integration attacks. Scan + canary gate prevent poisoning of the learning pipeline.

Anonymised pattern sharing — IPs reduced to /24, PII stripped, maturity gating (experimental → candidate → confirmed). One tenant's confirmed attacker becomes everyone's known threat within minutes.

Protects against

Distributed campaigns hitting multiple Shield-protected sites.
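
Reducing an address to its /24 network before sharing can be sketched with the standard library. The function name is hypothetical, and the /48 prefix for IPv6 is an assumption, since only the IPv4 /24 reduction is stated above.

```python
import ipaddress

def anonymise_ip(ip):
    """Return the enclosing network instead of the exact address."""
    addr = ipaddress.ip_address(ip)
    # /24 for IPv4 as described above; /48 for IPv6 is an assumed equivalent.
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network)
```

For example, `anonymise_ip("203.0.113.77")` yields `203.0.113.0/24`, so the shared pattern identifies the network range without exposing the individual host.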

Widget MutationObserver snapshots all <script> tags at boot. Any subsequently injected script is reported as script_integrity_violation telemetry with src, external/same-origin, content length, stable hash. Capped per page-load. Tenant allowlist for trusted CDNs.

Protects against

Supply-chain attacks, malicious browser extensions, XSS token theft, ad-fraud overlays.

Redis counter per SHA-256(account_id). Each failure over the cap adds significant risk score. A distributed attack that spreads many attempts over thousands of IPs still lands on the same account bucket — the attempt on victim@corp.com triggers challenge regardless of which IP sent it. Counter resets on a successful login.

Protects against

Distributed credential-stuffing, residential-proxy brute force, low-and-slow password guessing.
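
The per-account bucket can be illustrated with an in-memory stand-in for the Redis counter. The class name, the cap and the lowercase normalisation of the account identifier are assumptions made for this sketch.

```python
import hashlib

class AccountFailureCounter:
    """Counts failed logins per account, independent of the source IP."""

    def __init__(self, cap=5):
        self.cap = cap    # hypothetical cap, not a Shield default
        self.counts = {}  # stand-in for a Redis hash

    @staticmethod
    def bucket(account_id):
        # Key on a hash so the raw identifier is not stored in the counter.
        normalised = account_id.strip().lower()
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def record_failure(self, account_id):
        """Return True once failures exceed the cap (challenge the attempt)."""
        key = self.bucket(account_id)
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] > self.cap

    def record_success(self, account_id):
        """A successful login resets the counter."""
        self.counts.pop(self.bucket(account_id), None)
```

The source IP is deliberately not part of the key, which is why attempts spread over thousands of addresses still accumulate in one bucket.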

GET /shield/password/breach-range/{prefix} — client computes SHA-1(password) locally in the browser, sends only the 5-char hex prefix, Shield proxies to api.pwnedpasswords.com and streams back the suffix+count list. Client compares its own suffix locally. Server never sees plaintext OR the full hash.

Protects against

Credential reuse, signup with known-breached passwords, quiet exposure via paste-bin dumps.
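
The k-anonymity flow is shown below in Python for readability (in the flow described above, this step runs in the browser). Both function names are hypothetical. The response format of one `SUFFIX:COUNT` pair per line follows the Pwned Passwords range API.

```python
import hashlib

def split_password_hash(password):
    """Return (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def breach_count(suffix, range_body):
    """Look up our suffix in the 'SUFFIX:COUNT' lines returned for the prefix."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix and count.isdigit():
            return int(count)
    return 0
```

Only the prefix is sent over the network. The comparison against the suffix happens locally, so neither the plaintext nor the full hash leaves the client.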

A/AAAA + MX record check on signup. Fail-open on timeout. Short-lived per-domain cache so rapid signup waves from the same throwaway domain don't re-hammer DNS.

Protects against

Throwaway signup domains, typo-squats that have no hosting, shortly-after-registration attacker domains.

25+ protected brands (Google, Microsoft, Apple, PayPal, Stripe, Meta, LinkedIn, Revolut, SK/CZ banks & insurers). Three-tier detector: 1) normalised exact match via homoglyph map, 2) Levenshtein distance for long brands, 3) brand-substring + decorative suffix (secure/login/support/verify/auth/signin/account/official/help).

Protects against

Brand-impersonation signups, phishing-infrastructure registrations, fake 'support' domains.
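
A compact sketch of the three tiers follows. The brand list, homoglyph map, suffix list, the "long brand" length of 8 and the edit distance of 1 are illustrative assumptions, not Shield's actual tables or thresholds.

```python
# Hypothetical, heavily shortened tables.
BRANDS = ("paypal", "microsoft", "google")
SUFFIXES = ("secure", "login", "support", "verify")
HOMOGLYPHS = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s", "@": "a"})

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]

def classify_domain_label(label):
    """Return the tier that fired, or None if the label looks unrelated."""
    raw = label.lower()
    if raw in BRANDS:
        return None  # the genuine brand label itself
    normalised = raw.translate(HOMOGLYPHS)
    for brand in BRANDS:
        # Tier 1: exact match after homoglyph normalisation.
        if normalised == brand:
            return "homoglyph"
        # Tier 2: small edit distance, only for long brands to limit noise.
        if len(brand) >= 8 and levenshtein(normalised, brand) == 1:
            return "edit-distance"
        # Tier 3: brand followed by a decorative suffix.
        stripped = normalised.replace("-", "")
        if stripped.startswith(brand) and stripped[len(brand):] in SUFFIXES:
            return "brand-suffix"
    return None
```

Restricting the edit-distance tier to long brands is a common way to avoid flagging short, ordinary words that happen to sit one edit away from a short brand name.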

Velocity counters per IP and per device. Recent-login requirement: no successful login from this device recently → significant risk signal. Session Continuity: password_change is now in the SENSITIVE event set, so full fingerprint drift blocks immediately. The classic 'attacker grabs session → changes password → locks out user' chain needs to survive all three gates.

Protects against

Account-takeover lockout chain, session-replay password reset, mass bulk-takeover via stolen cookies.

Email (HTML), Slack, Discord, generic JSON webhooks. Weekly security report with stats, top threats, block rate. Per-webhook severity gate (low / medium / high / critical).

Protects against

Late incident detection — admins notified within seconds.

Every rule change, site config edit, manual block, AI decision is recorded with actor, timestamp, before/after diff. Hash-chained, signed, and exportable as auditor-ready evidence bundle.

Protects against

Silent tampering — and provides a complete paper trail when your auditor (ISO, SOC 2, internal) asks. Shield itself is not currently externally certified.

HMAC-SHA256 tokens are minted server-side from the per-site secret and returned via /shield/events. The widget never holds the signing secret — a leaked site_key cannot be used to forge valid tokens.

Protects against

Token forgery from a stolen public site_key.

PostgreSQL Row-Level Security forced on all shield_* tables. Each request runs under a tenant-scoped role — no application-layer bypass possible even if the API has a bug.

Protects against

Cross-tenant data leaks, broken-access-control bugs in app code.

Tracks attempts per card BIN across rolling windows. Burst patterns consistent with card-testing activate progressive challenge or block. Thresholds are tenant-tunable; defaults are conservative.

Protects against

Card-testing campaigns, BIN enumeration, stolen-card validation bursts.

When the same PSP-provided card fingerprint appears across multiple devices, sessions or tenants in a short window, attempts are correlated and scored as a coordinated attack. Raw PAN never leaves your PSP.

Protects against

Distributed card-testing, velocity evasion via rotating IP / device.

Tenant-scoped baseline of issuer-country distribution. A sudden concentration of attempts against issuers from a small number of countries — well above baseline — flags probable carding traffic.

Protects against

Targeted issuer attacks, stolen-card dump campaigns, geo-clustered fraud.

Aggregates multiple signals — diverse BIN spread, same device or session, high failure ratio — into a named carding verdict. Upgrades decision severity when confirmed by post-charge PSP feedback.

Protects against

Coordinated card-testing campaigns, fraud-validation traffic, PSP penalty avoidance.

Slow-burn attacks no longer slip through. Shield watches the whole conversation arc, not just one message at a time. An attacker who chats innocuously for many turns and only then pivots to data extraction or credential phishing is caught at the moment the pattern emerges.

Protects against

Multi-turn jailbreaks, slow-pivot social engineering, AI agents that start friendly and shift to extraction over a long session.

Before your agent runs a tool, Shield asks: is the user's actual intent consistent with calling this tool? A request to summarise a document should not trigger a database export. A travel-booking chat should not be calling a payments tool. Mismatches are gated for review.

Protects against

Agents calling sensitive tools under benign-looking prompts, prompt-injected tool abuse, accidental destructive operations.

Compromised agents and curious LLMs typically scan the environment before acting — listing directories, reading config paths, enumerating environment variables. Shield flags this reconnaissance pattern early, before any data leaves the box.

Protects against

Sandbox-escape attempts, container reconnaissance, env-secret enumeration, agent-stage probing before exfiltration.

A single conversation can never quietly burn your whole monthly AI budget. Shield enforces a per-session ceiling on tokens, tool calls and elapsed time. When the cap is reached the session is paused or terminated and the operator is notified.

Protects against

Cost-explosion attacks, infinite-loop agent failures, denial-of-wallet, accidental runaway prompts.
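
A per-session ceiling on tokens, tool calls and elapsed time might look like the sketch below. The class, its field names and the limits used in the example are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SessionBudget:
    max_tokens: int
    max_tool_calls: int
    max_seconds: float
    started_at: float
    tokens_used: int = 0
    tool_calls_used: int = 0

    def charge(self, now, tokens=0, tool_calls=0):
        """Record usage; return the name of the exceeded limit, or None."""
        self.tokens_used += tokens
        self.tool_calls_used += tool_calls
        if self.tokens_used > self.max_tokens:
            return "tokens"
        if self.tool_calls_used > self.max_tool_calls:
            return "tool_calls"
        if now - self.started_at > self.max_seconds:
            return "elapsed_time"
        return None
```

A caller would pause or terminate the session and notify the operator as soon as `charge` returns a limit name.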

Shield learns what normal looks like for each user — typical hours, typical actions, typical pace — and quietly flags the day that pattern breaks. A logged-in session that suddenly behaves nothing like the real user is treated as a possible takeover.

Protects against

Compromised accounts, identity hijack after credential theft, insider-mode account misuse, post-phishing session reuse.

Decoy records, files and credentials are planted in places only an attacker would dig. Real users never see them. The moment one is touched, accessed or used, Shield has a high-confidence breach signal with effectively zero false positives.

Protects against

Silent breaches that bypass other detections, insider data theft, post-compromise lateral movement.

Attackers hide malicious payloads inside layered encodings — base64, hex, percent-encoding, unicode escapes — to slip past simple string filters. Shield unwraps these layers before scoring, so the underlying attack is matched against the same protections as a plain-text version.

Protects against

Base64 / hex / percent-encoded smuggling, multi-layer payload obfuscation, encoding-based filter evasion.
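
Unwrapping layered encodings before scoring can be sketched as a bounded decode loop. This is a simplified illustration: the function names, the minimum lengths and the layer limit are assumptions, and unicode escapes are not handled.

```python
import base64
import binascii
import re
from urllib.parse import unquote

_HEX_RE = re.compile(r"^(?:[0-9a-fA-F]{2})+$")
_B64_RE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")

def _decode_once(text):
    """Try to peel one encoding layer; return None if nothing applies."""
    if "%" in text:
        decoded = unquote(text)
        if decoded != text:
            return decoded
    if len(text) >= 8 and _HEX_RE.match(text):
        try:
            return bytes.fromhex(text).decode("utf-8")
        except UnicodeDecodeError:
            pass
    if len(text) >= 8 and len(text) % 4 == 0 and _B64_RE.match(text):
        try:
            return base64.b64decode(text, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            pass
    return None

def unwrap(text, max_layers=5):
    """Return every layer from the raw input down to the innermost text."""
    layers = [text]
    for _ in range(max_layers):
        decoded = _decode_once(layers[-1])
        if decoded is None:
            break
        layers.append(decoded)
    return layers
```

Scoring every returned layer, not only the innermost one, keeps the check robust when a guess at an encoding turns out to be wrong. The layer limit bounds the work an attacker can force with deeply nested input.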

Before any rule, model or scorer update ships, it is run against a continuously growing corpus of real-world attack scenarios. If a release accidentally weakens detection on a known threat shape, the change is blocked at CI — not after a customer is breached.

Protects against

Silent detection regressions, accidental false-negative drift during releases, security debt accumulating across versions.

Every security decision and config change is written to a tamper-evident chain. Edits and deletions are mathematically detectable. Auditors, regulators and incident responders get a trustworthy timeline even in the worst-case scenario where an attacker reaches admin credentials.

Protects against

Insider rewriting history, forensic tampering, regulatory dispute over what happened and when, post-incident attribution gaps.

When something happens, you do not want to spend hours collecting logs. One click produces an encrypted, time-stamped bundle of the relevant tenant state — events, rules, decisions, recent traffic — ready to hand to your security team, lawyer or regulator.

Protects against

Slow incident response, lost forensic state between detection and review, breach disclosures that miss the regulator window.

Shield does not lock you into one AI vendor. Bring your own OpenAI / Anthropic / Google key, point at a dedicated Ollama instance, or run fully local. Set hard cost caps and routing rules. Your data flows only to providers you explicitly approve.

Protects against

Vendor lock-in, surprise cost overruns, data-residency gaps, regulatory restrictions on cross-border AI processing.

For your highest-risk actions Shield can require a hardware-rooted gesture: Touch ID, Windows Hello, a hardware security key. These are physical-presence checks that an LLM-powered agent or remote attacker cannot solve, no matter how clever the prompt.

Protects against

LLM-solvable CAPTCHAs, remote-only account takeover, agent-driven privileged actions, password-only step-up flows.

For regulated, classified or disconnected environments Shield ships as a self-hosted package with signed release artifacts and a fully offline install path. Nothing has to talk to the public internet, but you still get rule, model and intel updates on your own schedule.

Protects against

Compliance-restricted environments, classified networks, regulated data-sovereignty zones, supply-chain attacks on the install path.

Shield can flag form, message and document submissions that look machine-generated rather than human-typed. Combined with behaviour and timing signals, this gives operators a clear answer to "is this real?" on application forms, CVs, support tickets and reviews.

Protects against

AI form spam, AI-written CV / application fraud, AI-generated support ticket floods, AI-written fake reviews.

The widget snapshots fetch, XHR, navigator and userAgent at boot and re-checks periodically. If a browser extension, injected script or third-party tag flips navigator.webdriver, wraps fetch, replaces XHR or mutates navigator descriptors, Shield reports the tampering and can refuse to issue a token. Per-attribute form.action / hidden-input change tracking is roadmap, not wired today.

Protects against

Malicious browser extensions, ad-injected form takeovers, evil third-party tag managers, client-side payment-form hijacking.

Every request is checked in O(1) against 48,000+ real-time threat indicators refreshed frequently. No customer setup — platform-funded. Adds score boost on match.

Protects against

Botnet C2 callbacks, scrapers, anonymizing infrastructure, active attacker IP ranges, hijacked netblocks.

Premium reputation services lookup on suspicious events only. Per-tenant Fernet-encrypted keys; no platform-shared keys, lookups happen on your quota.

Protects against

Targeted attacker IPs flagged by commercial threat intelligence vendors, beyond what public feeds catch.

All ten 2025 OWASP categories addressed — A01 access control, A02 misconfig, A03 supply chain, A04 crypto, A05 injection, A06 design, A07 auth, A08 integrity, A09 logging, A10 exception handling. Pattern set sourced from OWASP CRS v4, nuclei templates, PayloadsAllTheThings.

Protects against

Every web-app threat in the OWASP 2025 catalog, from classic injection through the new Supply Chain and Mishandling-of-Exceptional-Conditions categories.

Identifies bots from OpenAI, Anthropic, Google-Extended, Perplexity, ByteDance, CommonCrawl, Meta, Apple, Cohere, Mistral, AllenAI, You.com and more. Tenant chooses block / monitor / allow per vendor.

Protects against

Unauthorized scraping of your content for LLM training, while still allowing legitimate search engines (Bingbot, classic Googlebot) through.

log4j JNDI gadgets (${jndi:ldap://...}), LDAP injection, XML External Entity, MongoDB-style NoSQL operator injection — all blocked at the /shield/events ingest before reaching your backend.

Protects against

The 2021-era Log4Shell-class attacks, NoSQL bypass, XML entity exfiltration and LDAP query escapes. Most WAFs only added these categories recently.

Read-only view of all 278 patterns Shield runs on every request, grouped by category. Customers see exactly what's protecting them — no marketing claims to verify.

Protects against

Transparency gap — auditors and security teams can match Shield's actual detection set against their own risk register.


Audit log — Shield v2.6

Tamper-evident, hash-chained, cryptographically signed.

The audit log is the legal and forensic backbone of Shield. Designed so even a compromised admin cannot rewrite history without it being visible.

SHA-256 hash chain

Every audit record carries the SHA-256 hash of the previous record combined with the canonicalized current event. Removing or modifying any past event breaks every downstream hash, and chain verification fails.
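
The chaining rule can be sketched in a few lines. The record layout, the all-zero genesis value and the JSON canonicalization are assumptions made for illustration; signatures and time anchoring are not shown.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical "previous hash" for the first record

def canonicalize(event):
    """Stable byte form: sorted keys, no insignificant whitespace."""
    return json.dumps(event, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def _link(prev_hash, event):
    return hashlib.sha256(prev_hash.encode("ascii") + canonicalize(event)).hexdigest()

def append_record(chain, event):
    """Append an event whose hash covers the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    record = {"prev_hash": prev_hash, "event": event, "hash": _link(prev_hash, event)}
    chain.append(record)
    return record

def verify_chain(chain):
    """Re-walk the chain; any edit or deletion breaks a link."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _link(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```

Changing one field of an old event changes its hash, which no longer matches the `prev_hash` stored in the next record, so verification fails from that point onward.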

Per-tenant Ed25519 signing

Each tenant has its own Ed25519 keypair. The public key is exposed for independent verification; the private key signs every audit record at write time. An attacker holding a leaked DB dump cannot forge or alter signed records without the private key.

RFC 3161 time anchoring

Chain heads are periodically anchored against an external RFC 3161 Time Stamp Authority. This binds the log to absolute wall-clock time and proves the chain existed in that form before the timestamp.

DB-role append-only hygiene

The shield_app role has INSERT only — UPDATE and DELETE are REVOKEd at the PostgreSQL level. Even an attacker with full app-context credentials cannot rewrite rows; they would need to escalate to a superuser DB role, which is itself audited at the platform layer.

Verify and export endpoints

GET /shield/audit/verify re-walks the hash chain and validates every signature; GET /shield/audit/export streams a signed, line-delimited JSON archive for internal or external auditors. Both are tenant-scoped and rate-limited.

Compliance mapping

Provides technical evidence common security frameworks ask for (immutable logging, signed integrity, monitoring, GDPR Art. 32 alignment). Shield itself is not externally certified — the export supports your audit, it does not replace one.

Forensic snapshot — Shield v2.6

One-click incident snapshot, encrypted, off-site.

When something goes wrong, you need an immutable picture of the moment. Shield produces it in under a minute and seals it so only your private key can open it.

What is inside a snapshot

•Security events (raw + decisions + reason codes)
•Audit log slice with chain head + signature
•Active sessions and session-level signals
•Active rules and rule-version history
•Threat intelligence cache and recent threat events
•Protected sites and their HMAC configuration
•Tenant settings and feature flags
•Platform events (deploy / backup / drift)
•Container metadata (image, version, host fingerprint)

Hybrid envelope encryption

A single-use AES-256-GCM data key encrypts the payload. The data key is wrapped with RSA-OAEP-SHA256 against the tenant’s public key. Only the holder of the private key can recover the AES key and decrypt the bundle. Shield infrastructure cannot read past snapshots after they leave the producing container.

Storage and operations

Snapshots upload to any S3-compatible store (AWS, Wasabi, MinIO, on-prem). Optional weekly cron auto-archives a fresh snapshot for continuous forensic readiness. The target time from trigger to sealed, uploaded archive is ~60 seconds.

API surface

POST /shield/forensic/snapshot triggers a new snapshot; GET /shield/forensic/snapshots lists existing ones with metadata, size and sealed status. Both are admin-scoped and produce platform-level audit events.

Security posture

Compliance and auditability without overclaiming.

Shield provides technical evidence, logs and mappings. Formal certifications or customer attestations depend on the specific deployment scope.

Auditor-ready audit trail

Shield produces audit logs, decision reasons and exports that can support security and compliance review. Formal certification or attestation is confirmed individually based on customer scope.

Auditor evidence support

The technical controls can be mapped to your internal audit framework. Shield itself is not externally certified.

MITRE ATT&CK

Detections can be mapped to relevant MITRE ATT&CK techniques, especially Initial Access, Credential Access, Exfiltration and Command & Control.

OWASP OAT

Shield covers multiple classes of automated threats from OWASP OAT. Detailed mapping is provided to customers in technical materials.

Denial-of-Wallet

Inference-cost attacks (RA-ICA): the wallet is now a target

A peer-reviewed 2026 attack class (Hong Kong Polytechnic University) that doesn't touch your data or uptime — it silently multiplies your AI bill.

An attacker plants a poisoned document on the public web. Your RAG assistant fetches it for an ordinary question and the model burns far more tokens — the answer stays correct, so nothing looks wrong until the invoice arrives.

13.12×

more tokens consumed

>90%

chance the poisoned doc is retrieved

100%

correct answers — ordinary filters catch nothing

Three tactics (CREEP framework)

Decoy injection

Hidden math / logic / planning puzzles the model unknowingly solves mid-reasoning, burning tokens.

Contradiction injection

Mutually contradictory facts push the model into overthinking and long generations.

Task-oriented manipulation

An attacker AI optimizes the text for maximum cost while staying inconspicuous and evading detection.

How Shield stops it — at every phase

Attack phase	Shield defense	Effect
1 · The malicious doc must be retrieved	Relevance gate + thresholds keep weakly-relevant or force-injected documents out of context.	Filters opportunistic poisoning.
2 · Hidden instructions in the document	Retrieved-content sanitization (data-tagged 'treat as text only') + untrusted-source separation.	The model ignores embedded tasks.
3 · Output-token inflation (the core)	Hard per-request output-token cap + context-budget management (retrieval-share limit, dedup).	The 13× amplification is capped.
4 · Cumulative cost across requests (DoW)	Progressive rate-limiting + per-session budgets (tokens / cost / time) + agent trust-score telemetry.	A bot can't scale the attack.

The core of RA-ICA is output inflation (in the study, ~100 → 2,048 tokens). Shield's hard per-request output cap is exactly what breaks the amplification that makes the attack profitable.

Coverage

One defense layer, the whole LLM threat family (OWASP LLM Top 10)

RA-ICA sits at the intersection of two attack families Shield already covers — the same layer protects against the full spectrum.

Attack class	How Shield responds
Inference-cost / Denial-of-Wallet	Output-token cap, context budget, rate-limiting, per-session budgets, resource-exhaustion patterns.
Knowledge-base / RAG poisoning	Retrieved-content sanitization, relevance gate, trusted/untrusted source separation.
Prompt injection (direct & indirect)	Pattern rules + semantic firewall; indirect injections hidden in documents are neutralized.
Jailbreak (DAN, developer mode, roleplay)	Rules + semantic firewall catch obfuscated and paraphrased variants by meaning.
System-prompt / config leakage	Input and output scanning; extraction attempts blocked, leaks redacted on output.
Data / PII exfiltration	Output scanning — passwords, API keys, JWTs, private keys, IDs and contacts caught and redacted.
Tool / action abuse	Tool-argument inspection and a dangerous-operation list (code execution, deletion, file access).
Obfuscation / encoding (Base64, ROT13, hex)	Detection of encoded payloads and token-smuggling attempts.
Authority spoofing	Detects 'I'm your developer / admin / official test' manipulations.

Defense in depth

Independent layers: if one fails, the others still hold.

Protection isn't a single filter. It's several independent layers.

Input guard

Rule patterns in 6 families (prompt injection, jailbreak, tool abuse, memory poisoning, data leakage, resource exhaustion). Runs before the model.

Semantic firewall

100+ curated attack patterns in 16 types (incl. explicit Denial-of-Wallet), at the meaning level (embeddings) — catches paraphrases and obfuscation.

RAG guardrails

Retrieved-content sanitization, relevance gate, untrusted-source separation, dedup and context budget.

Output-token cap

Hard generation-length limit per request, tuned to the task and model.

Output guard

Leak redaction: credentials, API keys, JWTs, private keys, PII — works on streamed responses too.

Per-session budgets

Caps on tokens, cost, time and tool calls within a single session.

Rate limiting

Per IP / device / endpoint, progressive (monitor → slow → challenge → block).

Telemetry & audit

Token-spend and agent trust-score tracking, threat webhooks, and a tamper-evident audit log.

Deployment

Integrate Shield the way you need to.

Deploy by what you need to protect.

LLM Proxy (drop-in)

Swap a single base_url for OpenAI / Anthropic. Shield scans input (blocks before the provider) and output (redacts leaks). Keeps your API key, streaming and tool calls.

Scan API

Endpoints to scan input, output and tool calls for custom integration into your existing pipeline.

RAG protection

Sanitization, relevance gate and source separation directly in your chat / RAG pipeline.

Web widget

Protects forms, logins and APIs from bots and abuse (classic WAF + bot layer).

Technical whitepaper

Detailed technical document including integration diagrams, threat modeling and performance benchmarks.


Protection scope notice. Corpilus Shield is a real-time AI protection layer designed to extend standard security mechanisms for websites, e-shops and LLM applications, not replace them. It does not replace antivirus, firewall, penetration testing or a formal security audit. For comprehensive protection, we recommend combining several layers.

40+ concrete capabilities across 9 categories.

Behavior Analysis

Device Fingerprinting

Session Continuity

LLM Proxy (Drop-in)

Semantic Firewall

MCP Guard

Agent / LLM Protection

MCP Tools (Corpilus AI)

Data Shield (SQL Protection)

Crypto Abuse Pack

Content Quality Scoring

Phishing & Brand Impersonation

Form + Upload Quality

Upload Shield

Backend SDK / Middleware

Circuit Breaker

Auto-Remediation

WordPress Plugin

Smart Rate Limiting

Geo-blocking + IP Intel

Real Form Blocking

OWASP Detection

AI Self-Learning

Knowledge Packs

Bring-Your-Own Security Playbook

Cross-Tenant Learning

Script Integrity Monitor

Per-account brute-force lockout

HIBP password breach check

DNS MX / A record check

Typosquat email detection

Password-change hardening

Alerts + Weekly Reports

Security Audit Log

Server-Only Token Signing

Tenant Isolation (RLS)

BIN Velocity

Card Fingerprint Linking

Issuer Anomaly

Carding Pattern Classifier

Long-Horizon Attack Detection

Agent Tool Intent Classification

Sandbox Recon Detection

Session Compute Cap

Per-User Behavioral Baseline

Honeytokens

Obfuscated Payload Detection

Adversarial Regression Bench

Tamper-Evident Audit

One-Click Forensic Snapshot

Choose Your AI Provider

Hardware-Rooted Challenge

Enterprise / Air-Gap Deployment

AI-Generated Content Detection

Runtime Integrity Monitor

Threat Intelligence Network

Bring-Your-Own-Key Premium Intel

OWASP Top 10 2025 — full coverage

AI Crawler Detection

log4j / LDAP / XXE / NoSQL coverage

Pattern Catalog in Dashboard
