Token costs climb quickly in long, complex AI agent sessions, but research shows that accuracy does not always rise with spending. A recent study found that context fills with repeated history, tool schemas, and subagent handoffs, and that the resulting degradation follows a consistent pattern across tools.
In one case, the context reached 450,000 tokens. The agent dropped early constraints, re-queried sources already in its history, and needed a manual reset. Several controls were implemented to address this, including:
- External knowledge bases (PLAN.md and INVARIANTS.md) that are refreshed at each major turn
- A 2,000-line read budget gate per turn, where the agent states its intent before retrieval
- Out-of-band notes for subagent coordination, ensuring side traffic never enters the main transcript
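The read budget gate can be sketched as follows. This is a minimal illustration, not the harness from the write-up: the `ReadBudget` class, its method names, and the choice to truncate rather than reject over-budget reads are all assumptions made for the example. Only the 2,000-line limit and the "state intent first" rule come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class ReadBudget:
    """Hypothetical per-turn gate: intent first, then at most `limit` lines."""
    limit: int = 2000          # line budget per turn (figure from the text)
    used: int = 0              # lines granted so far this turn
    log: list = field(default_factory=list)  # (intent, lines_granted) pairs

    def request(self, intent: str, lines: list[str]) -> list[str]:
        # The agent must say why it wants the data before it gets any.
        if not intent.strip():
            raise ValueError("state an intent before retrieval")
        remaining = self.limit - self.used
        granted = lines[:remaining]  # assumption: truncate instead of reject
        self.used += len(granted)
        self.log.append((intent, len(granted)))
        return granted

    def new_turn(self) -> None:
        # Reset the counter and the intent log at the start of each turn.
        self.used = 0
        self.log.clear()


budget = ReadBudget()
chunk = budget.request("check retry logic in client", ["line"] * 1500)
rest = budget.request("confirm timeout constant", ["line"] * 1000)
print(len(chunk), len(rest), budget.used)
```

Keeping the intent log next to the counter makes it easy to audit afterward which reads consumed the budget.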
These controls cut token usage substantially: the same class of task peaked near 85,000 tokens, roughly 81% below the 450,000-token case. Dynamic tool discovery produced similar ratios. One harness reduced input tokens by 96% and total spend by 90% by loading schemas only for tools the agent actually selects.
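To make the schema-loading idea concrete, here is a rough sketch that compares sending every tool schema up front against sending one-line summaries plus the schema of the selected tool only. The catalog, tool names, and helper functions are hypothetical, and serialized character count stands in as a crude proxy for tokens; the sketch does not attempt to reproduce the 96% figure.

```python
import json

# Hypothetical catalog: tool name -> (one-line summary, full JSON schema).
CATALOG = {
    "search_code": ("Search the repository", {
        "type": "object",
        "properties": {"query": {"type": "string"},
                       "path": {"type": "string"},
                       "max_results": {"type": "integer"}},
        "required": ["query"]}),
    "read_file": ("Read a file by path", {
        "type": "object",
        "properties": {"path": {"type": "string"},
                       "start_line": {"type": "integer"},
                       "end_line": {"type": "integer"}},
        "required": ["path"]}),
    "run_tests": ("Run the test suite", {
        "type": "object",
        "properties": {"target": {"type": "string"},
                       "verbose": {"type": "boolean"}},
        "required": []}),
}


def eager_context(catalog):
    # Baseline: every schema is placed in the context on every call.
    return [{"name": n, "summary": s, "schema": sch}
            for n, (s, sch) in catalog.items()]


def lazy_context(catalog, selected):
    # Summaries for all tools; full schema only for tools the agent selected.
    ctx = []
    for name, (summary, schema) in catalog.items():
        entry = {"name": name, "summary": summary}
        if name in selected:
            entry["schema"] = schema
        ctx.append(entry)
    return ctx


def size(ctx):
    # Serialized length in characters, used here as a rough token proxy.
    return len(json.dumps(ctx))


eager = eager_context(CATALOG)
lazy = lazy_context(CATALOG, {"search_code"})
print(size(eager), size(lazy))
```

The gap widens as the catalog grows, since the eager cost scales with the number of schemas while the lazy cost scales mostly with the number of short summaries.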
The full write-up goes deeper, with paper analysis, tree-sitter extraction patterns, and an implementation checklist.
