How to Reduce Token Consumption with Claude Code and Codex

Token usage with AI coding agents is easy to ignore until you get a surprise bill or hit a context limit mid-task. After using both Claude Code and Codex daily, I have picked up habits that keep consumption lean without slowing anything down. Most of these apply to both tools since they share the same underlying pattern: everything in your session becomes context, and context costs tokens.
What burns tokens fast
Large file reads. Every time the agent reads a file, the contents go into context. Reading a 500-line file when you only needed one function is expensive. In Claude Code you can ask for specific line ranges. In Codex, keeping your working files small and well-scoped helps the model stay focused.
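When the full file is not needed, a targeted read can stand in for it. A minimal shell sketch of the idea (the file here is generated as a stand-in for a large source file):

```shell
# Stand-in for a large source file: 500 numbered lines
seq 1 500 | sed 's/^/line /' > /tmp/big_file.txt

# Pull only lines 40-60 instead of feeding all 500 lines into context
sed -n '40,60p' /tmp/big_file.txt
```

The same idea applies inside a session: asking Claude Code to read only a line range keeps the rest of the file out of context.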
Long conversation threads. Claude Code compacts older messages as the context window fills, but a session running for hours still accumulates a lot of context. Codex chats in VS Code have the same problem. Starting a fresh session for a new, unrelated task is cheaper than continuing an old one.
Verbose tool output. Commands that print large amounts of output (long git logs, full test suites, large JSON blobs) all feed into context. Pipe through head, tail, or grep where possible.
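A few concrete ways to trim output before it reaches the agent, sketched against a generated log file standing in for a verbose command:

```shell
# Simulate a verbose command: 500 lines of output
seq 1 500 > /tmp/verbose.log

# Only the first chunk, not the whole thing
head -20 /tmp/verbose.log

# A count is often enough when the question is "how many"
grep -c '42' /tmp/verbose.log

# Just the tail end, e.g. the summary lines of a test run
tail -3 /tmp/verbose.log
```

In practice this looks like git log --oneline -20 instead of a bare git log, or piping a test run through tail to keep only the summary.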
Frequent re-reads. Asking the agent to check the same file multiple times in a session adds up. A well-scoped session that reads once and acts is more efficient.
Broad searches. Asking the agent to search an entire codebase for something that could have been found with a targeted pattern costs more than it should. Be specific about where to look.
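The difference between a broad and a targeted search, sketched in shell (the directory layout and the formatDate name are invented for illustration):

```shell
# Toy repo layout: real source plus a dependency directory
mkdir -p /tmp/repo/src/lib /tmp/repo/node_modules/dep
echo 'export function formatDate() {}' > /tmp/repo/src/lib/date.ts
echo 'function formatDate() {}' > /tmp/repo/node_modules/dep/index.js

# Broad: walks everything, including node_modules
grep -rn 'formatDate' /tmp/repo

# Targeted: one directory, one file type -- far less output to carry as context
grep -rn --include='*.ts' 'formatDate' /tmp/repo/src
```

Telling the agent "search src/ for formatDate, TypeScript files only" gets the same scoping effect as the second command.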
Using a large model for small tasks. In Claude Code, using Opus for a one-liner edit is overkill. Sonnet handles most coding tasks well, and Haiku is fine for quick lookups. In Codex, GPT-4o is heavier than GPT-4o-mini for straightforward completions.
Habits that reduce usage
Be specific in your requests. "Fix the bug on line 42 of src/lib/mdx.ts" burns fewer tokens than "there is something wrong with the MDX rendering, can you investigate". The more precise the prompt, the less the agent needs to explore.
Front-load context with config files. In Claude Code, a well-written CLAUDE.md means you do not have to re-explain the project every session. Codex picks up initial context from AGENTS.md. Use these to capture architecture decisions, conventions, and key file paths once rather than repeating them in every chat.
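A CLAUDE.md does not need to be long to pay for itself. A minimal sketch — every project detail below is invented, as an example of the shape rather than the content:

```markdown
# Project notes for the agent

## Stack
- Next.js app, content in MDX, rendered via src/lib/mdx.ts

## Conventions
- TypeScript strict mode; no default exports
- Tests live next to source files as *.test.ts

## Key paths
- src/lib/       shared utilities
- content/posts/ MDX articles
```

Since Codex reads the same kind of material from AGENTS.md, the same content can serve both tools.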
Scope sessions tightly. One session per task is cleaner than one session for everything. A focused session with a clear goal uses context efficiently in both tools.
Avoid re-explaining. If you find yourself repeating the same context across sessions, it belongs in a config file, not the chat.
Pick the right model for the job. In Claude Code, Haiku for simple tasks, Sonnet for day-to-day work, Opus for architecture or complex debugging. In Codex/Copilot, prefer lighter models for autocomplete and inline suggestions, and heavier ones only for multi-step agent tasks.
Reference files, do not paste them. Point the agent at a file path rather than pasting contents into the prompt. Pasted text sits in context for the rest of the session, while a referenced file can be read once, partially, and only when actually needed.
The underlying principle
Tokens are context. The more context the agent carries, the more it costs, regardless of whether you are using Claude or Codex. Keeping sessions focused, prompts specific, and file reads targeted is not just about saving money. It also makes the agent faster and more accurate because it is not wading through noise to find the signal.
Work with the agent the way you would pair with a sharp colleague: give clear briefs, point to the right files, and wrap up the session when the task is done.