A new innovation in AI security has emerged with the introduction of Arc Gate, a robust prompt injection proxy designed to protect OpenAI-compatible endpoints from malicious prompts. This cutting-edge solution can be tested live without requiring signup, code, or setup, allowing users to experience its capabilities firsthand.
Arc Gate features a multi-layered detection system, with its primary layer utilizing a behavioral SVM on sentence-transformer embeddings. This approach enables the identification of semantic intent behind prompts, rather than merely relying on pattern matches. The system consists of four layers in total, ensuring a comprehensive defense mechanism.
Benchmark tests conducted on 40 out-of-distribution (OOD) prompts, including indirect, roleplay, and hypothetical framings, have demonstrated Arc Gate’s superior performance. The results show Arc Gate achieving a recall of 0.90 and an F1 score of 0.947, outperforming OpenAI Moderation (recall 0.75, F1 0.86) and LlamaGuard 3 8B (recall 0.55, F1 0.71). Notably, Arc Gate has maintained zero false positives on benign prompts, including security discussions and safe roleplay, with a block latency of 329ms.
For developers looking to integrate Arc Gate into their projects, the process is straightforward, requiring only a single URL change: base_url="https://web-production-6e47f.up.railway.app/v1". The project is open-sourced and available on GitHub, where users can star it if they find it useful.
Photo by elif s. on Pexels
Photos provided by Pexels
