Huge AI News

Revolutionizing AI Efficiency: Cutting Token Costs by Up to 90%

Long, complex AI agent sessions can drive token costs up sharply, yet accuracy does not reliably improve with spending. A recent study found that the context fills with repeated history, tool schemas, and subagent handoffs, and that the resulting degradation follows a consistent pattern across tools.

In one notable case, the context reached 450,000 tokens, at which point the agent dropped early constraints, re-queried sources already in its history, and needed a manual reset. Several controls were introduced to address this:

External knowledge bases (PLAN.md and INVARIANTS.md) that are refreshed at each major turn
A 2,000-line read budget gate per turn, where the agent states its intent before retrieval
Out-of-band notes for subagent coordination, ensuring side traffic never enters the main transcript
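
The read budget gate can be illustrated with a small sketch. This is not the harness from the study. The class name `ReadBudget`, its methods, and the log format are hypothetical, and only the 2,000-line limit and the "state intent before retrieval" rule come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class ReadBudget:
    """Per-turn line budget for reads (hypothetical helper, not a real API)."""

    limit: int = 2000  # lines allowed per turn, as described in the article
    used: int = 0
    log: list = field(default_factory=list)

    def request(self, intent: str, path: str, lines: int) -> bool:
        """Grant a read only if an intent is stated and the budget allows it."""
        if not intent.strip():
            raise ValueError("state an intent before retrieval")
        if lines < 0:
            raise ValueError("lines must be non-negative")
        if self.used + lines > self.limit:
            # Denied reads are logged so the agent can narrow its request.
            self.log.append((intent, path, lines, "denied"))
            return False
        self.used += lines
        self.log.append((intent, path, lines, "granted"))
        return True

    def new_turn(self) -> None:
        """Reset the counter at the start of each turn; the log is kept."""
        self.used = 0


budget = ReadBudget()
first = budget.request("check retry logic", "client.py", 1500)
second = budget.request("scan related tests", "test_client.py", 600)
```

In this sketch the first request is granted and the second is denied, because 1,500 + 600 exceeds the 2,000-line limit. Calling `new_turn()` restores the full budget.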

With these controls in place, the same class of task peaked near 85,000 tokens, roughly a fifth of the earlier 450,000-token peak. Dynamic tool discovery produced similar ratios: one harness reduced input tokens by 96% and total spend by 90% by loading schemas only for tools the agent actually selects.
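
Loading schemas only for selected tools might look like the sketch below. It assumes a compact index of tool names and descriptions stays in context, while full parameter schemas are attached only after the agent picks a tool. The registry contents and the functions `tool_index` and `schemas_for` are illustrative assumptions, not the API of the harness mentioned above.

```python
# Hypothetical registry of full tool schemas, kept outside the prompt.
TOOL_SCHEMAS = {
    "read_file": {
        "name": "read_file",
        "description": "Read a range of lines from a file.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "start": {"type": "integer"},
                "end": {"type": "integer"},
            },
            "required": ["path"],
        },
    },
    "search_code": {
        "name": "search_code",
        "description": "Search the repository for a pattern.",
        "parameters": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string"},
                "max_results": {"type": "integer"},
            },
            "required": ["pattern"],
        },
    },
}


def tool_index(schemas: dict) -> list:
    """Compact listing (name and description only) that stays in context."""
    return [
        {"name": name, "description": schema["description"]}
        for name, schema in schemas.items()
    ]


def schemas_for(selected: list, schemas: dict) -> list:
    """Full schemas for the tools the agent selected; unknown names are skipped."""
    return [schemas[name] for name in selected if name in schemas]


index = tool_index(TOOL_SCHEMAS)
loaded = schemas_for(["search_code", "no_such_tool"], TOOL_SCHEMAS)
```

Here `index` lists both tools without their parameter schemas, and `loaded` contains only the full `search_code` schema. Any savings in a real harness depend on how large the schemas are and how many tools go unselected.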

A full write-up, including paper analysis, tree-sitter extraction patterns, and an implementation checklist, is linked from the original post.
