Revolutionizing Job Search Automation: 85% Reduction in Token Usage

An experiment in optimizing Claude usage within a job search automation pipeline cut token consumption by roughly 85%. The naive flow consumed around 16,000 tokens per application, which was unsustainable at scale; after reworking the pipeline with a focus on token efficiency, usage dropped to around 900 tokens per application.

The key takeaways from this experiment include:

  • Prompt caching: caching the system prompt and profile context cut token usage in repeated operations by roughly 40%.
  • Model routing: sending simple tasks to lightweight models like Haiku and reserving expensive models like Opus for heavy reasoning.
  • Precomputing reusable data: building an answer bank of standard responses and reusing them across applications eliminated around 94% of LLM calls during form filling.
  • Avoiding duplicate work: filtering near-duplicate job listings with TF-IDF similarity before evaluation, so tokens aren't burned on the same content repeatedly.
  • Reducing over-intelligence: running a lightweight classifier before heavy reasoning and escalating to deeper models only when needed.
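
The caching idea can be sketched without any provider-specific API. This is a minimal client-side memoization layer, not Anthropic's server-side prompt caching; all names here (`ResponseCache`, `fake_llm`) are hypothetical, and the real pipeline's cache design may differ:

```python
import hashlib

class ResponseCache:
    """Reuse a model response when the exact (model, prompt) pair has
    been seen before, skipping the API call entirely."""
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model, prompt):
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get_or_call(self, model, prompt, call_fn):
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call_fn(model, prompt)   # only pay tokens on a miss
        self._store[key] = result
        return result

cache = ResponseCache()
fake_llm = lambda model, prompt: f"summary of {len(prompt)} chars"
profile = "10 years of Python experience ..."
for _ in range(3):
    answer = cache.get_or_call("haiku", profile, fake_llm)
print(cache.hits, cache.misses)  # → 2 1
```

Because the profile context is identical across applications, repeated calls resolve from the cache, which is where the ~40% savings on repeated operations comes from.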
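
Model routing and the lightweight pre-classifier combine naturally into one gate. A minimal sketch, assuming a keyword pre-filter and a length threshold as the escalation heuristic (the function names, keywords, and threshold are illustrative, not the article's actual rules):

```python
def is_promising(listing: str, must_have=("python",)) -> bool:
    """Cheap keyword pre-filter: costs zero tokens, rejects obvious
    mismatches before any model is invoked."""
    text = listing.lower()
    return any(kw in text for kw in must_have)

def pick_model(listing: str) -> str:
    """Route: skip rejected listings, send short promising ones to a
    lightweight model, escalate long ones to a heavyweight model."""
    if not is_promising(listing):
        return "skip"
    return "haiku" if len(listing) < 1000 else "opus"

print(pick_model("Senior Python Engineer, remote"))  # → haiku
print(pick_model("Frontend designer role"))          # → skip
```

Most listings never reach the expensive model: they are either discarded by the free pre-filter or handled by the cheap one, so Opus-class pricing applies only to the few cases that genuinely need deep evaluation.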
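
The answer bank amounts to a lookup table consulted before any LLM call. A minimal sketch, assuming questions normalize to stable keys (the bank entries, key normalization, and `llm_fallback` are hypothetical):

```python
ANSWER_BANK = {
    "years_of_experience": "7",
    "authorized_to_work": "Yes",
    "requires_sponsorship": "No",
}

def fill_field(question: str, llm_fallback):
    """Serve standard form answers from the bank; call the LLM only
    for questions the bank does not cover."""
    key = question.lower().rstrip("?").replace(" ", "_")
    if key in ANSWER_BANK:
        return ANSWER_BANK[key], 0          # zero tokens spent
    return llm_fallback(question), 1        # one LLM call

fallback = lambda q: "generated answer"
print(fill_field("Years of experience?", fallback))      # → ('7', 0)
print(fill_field("Describe your ideal team", fallback))  # → ('generated answer', 1)
```

Since most application forms ask the same handful of standard questions, nearly all fields hit the bank, which is how the pipeline eliminated ~94% of LLM calls during form filling.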
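
The deduplication step can be sketched with a small pure-Python TF-IDF implementation and cosine similarity; the real pipeline likely uses a library vectorizer, and the 0.9 threshold here is an assumed value:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute a sparse TF-IDF vector (dict of token -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))                      # document frequency per token
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: (c / len(toks)) * idf[t] for t, c in tf.items()})
    return vecs

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def dedupe(listings, threshold=0.9):
    """Keep a listing only if it is not near-identical to one already kept."""
    vecs = tfidf_vectors(listings)
    kept = []
    for i, v in enumerate(vecs):
        if all(cosine(v, vecs[j]) < threshold for j in kept):
            kept.append(i)
    return [listings[i] for i in kept]

listings = [
    "senior python engineer remote",
    "senior python engineer remote",   # repost of the first listing
    "junior java developer onsite",
]
print(len(dedupe(listings)))  # → 2
```

Each listing is evaluated at most once: reposts and cross-posted duplicates are dropped before they ever reach a model.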

These optimizations have made the pipeline more stable and efficient, and the code is available on GitHub for others to learn from and build upon.

Photo by Wolfgang Weiser on Pexels