Disclosure: This article includes a link to Cursor. AgentPlix may earn a commission if you sign up via that link. All views are the author’s own.

I Built a “Pulse” for Claude Code So I Stop Guessing Tokens and Cost

Three weeks into using Claude Code as my daily driver, I got a surprise: a $47 bill for a single afternoon of refactoring. I had no idea I was burning that many tokens. Claude Code is genuinely great at its job, which is exactly the problem. It’s so fluid that you stop thinking of it as “calling an API.” You think of it as typing. And typing, it turns out, can get expensive fast.

So I built a small tool I’m calling pulse: a lightweight Python script that tails Claude Code’s session log in real time, parses every message exchange, and prints a running token count and cost estimate directly in your terminal. No dashboards, no SaaS subscriptions. Just a second terminal pane that tells you exactly where you stand.

This article walks through why the problem is harder than it looks, how to build pulse from scratch, and a few extensions that make it genuinely useful for daily work.


Why Claude Code’s Token Blindspot Hurts More Than You Think

Most API wrappers at least surface token counts in their response objects. You call client.messages.create(), you get back usage.input_tokens and usage.output_tokens, and you can log them yourself. Claude Code abstracts all of that. It manages context, handles tool calls, injects system prompts, and streams output. By design, you interact with it like an IDE assistant, not like an API client.

The consequence is that you have no idea how many tokens are in the active context window at any moment. That matters for two reasons:

  1. Cost compounds with context size. Every turn of the conversation includes the full prior context as input tokens. A session that starts from a 10,000-token codebase might be sending 80,000 input tokens per message after an hour of chatting. Per-message input grows with every turn, so the session total grows much faster than the turn count.

  2. Context limits affect output quality. When you’re approaching the 200K context limit, Claude starts compressing earlier context. Knowing you’re at 150K lets you decide to start a fresh session instead of getting degraded output.
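
To make the first point concrete, here is a rough back-of-the-envelope sketch. It assumes a 10,000-token starting context and a hypothetical 2,000 tokens added per turn, and it ignores caching entirely. The function name and all of the numbers are illustrative, not measured.

```python
# Illustrative only: how input tokens pile up when every turn re-sends the context.
# base_context and added_per_turn are made-up numbers, and caching is ignored.

def cumulative_input_tokens(base_context, added_per_turn, turns):
    """Return (input tokens on the final turn, total input tokens across all turns)."""
    total = 0
    per_turn = 0
    for turn in range(turns):
        # Each turn re-sends the starting context plus everything added so far
        per_turn = base_context + added_per_turn * turn
        total += per_turn
    return per_turn, total

last, total = cumulative_input_tokens(base_context=10_000, added_per_turn=2_000, turns=36)
print(f"final turn input: {last:,} tokens")
print(f"session total:    {total:,} tokens")
```

Under these assumptions the per-turn input grows in a straight line, but the session total grows roughly with the square of the number of turns, which is why long sessions feel disproportionately expensive.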

💡 Key Insight
Input cost accumulates in a way output cost does not: every turn re-sends the context, so the same tokens are billed again and again. A two-hour session where you never clear context can cost 5x more than the same work split into focused sub-sessions. Visibility changes behavior.

Claude Code writes a detailed JSONL session log to disk for every session. That log is your data source. Pulse reads it.


Finding the Session Log

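Claude Code stores session data at a predictable path on macOS and Linux (the exact layout can vary between versions, so check ~/.claude/projects on your machine if nothing turns up):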

~/.claude/projects/<project-hash>/sessions/<session-id>.jsonl

Each line in the file is a JSON object representing a single event in the conversation: a user message, an assistant message, a tool call, a tool result, or metadata. The structure looks roughly like this:

{
  "type": "assistant",
  "message": {
    "usage": {
      "input_tokens": 4821,
      "output_tokens": 312,
      "cache_read_input_tokens": 0,
      "cache_creation_input_tokens": 0
    },
    "content": [...]
  },
  "timestamp": "2026-04-23T09:14:22.441Z"
}

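The usage block on assistant messages is the gold mine. It gives you input tokens (what Claude read at the standard input rate to produce this response), output tokens (what it wrote back), and two cache-related fields. Tokens read from or written to the prompt cache are reported in those cache fields rather than in input_tokens, so all three input-side numbers matter for cost calculation.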

⚠️ Note on Cache Tokens
Claude's prompt caching reduces cost on repeated context. Cache read tokens cost roughly 10% of standard input tokens. Cache creation tokens cost 25% more. Pulse accounts for both so your cost estimate stays accurate.
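
As a small sketch of how those two multipliers translate into prices, the snippet below derives cache rates from a base input rate. The $3.00 base rate is only an example, and cache_rates is a hypothetical helper name, not part of any library.

```python
# Sketch: derive cache prices from a base input price using the multipliers above.
# The $3.00 base rate is an example; substitute the rate for your model.

CACHE_READ_MULTIPLIER = 0.10    # cache reads: roughly 10% of the input price
CACHE_WRITE_MULTIPLIER = 1.25   # cache writes: 25% more than the input price

def cache_rates(input_per_mtok):
    """Return (cache_read, cache_write) prices in USD per million tokens."""
    return (
        round(input_per_mtok * CACHE_READ_MULTIPLIER, 4),
        round(input_per_mtok * CACHE_WRITE_MULTIPLIER, 4),
    )

read_rate, write_rate = cache_rates(3.00)
print(f"cache read:  ${read_rate:.2f} per million tokens")
print(f"cache write: ${write_rate:.2f} per million tokens")
```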

The latest session file is always the one with the most recent modification time. You don’t need to know the session ID. A quick glob + max(key=os.path.getmtime) finds it automatically.


Building Pulse: The Core Script

Here’s the full script. Save it as pulse.py anywhere on your system.

#!/usr/bin/env python3
"""
pulse.py — Real-time token and cost monitor for Claude Code sessions.
Usage: python3 pulse.py [--alert TOKENS]
"""

import os
import json
import time
import glob
import argparse
from datetime import datetime

# Claude pricing as of April 2026 (claude-3-5-sonnet, claude-3-7-sonnet)
# Update these if Anthropic changes pricing.
PRICING = {
    "input_per_mtok": 3.00,          # $3.00 per million input tokens
    "output_per_mtok": 15.00,         # $15.00 per million output tokens
    "cache_read_per_mtok": 0.30,      # $0.30 per million cache read tokens
    "cache_write_per_mtok": 3.75,     # $3.75 per million cache write tokens
}

def find_latest_session():
    pattern = os.path.expanduser("~/.claude/projects/*/sessions/*.jsonl")
    files = glob.glob(pattern)
    if not files:
        return None
    return max(files, key=os.path.getmtime)

def parse_session(filepath):
    totals = {
        "input_tokens": 0,
        "output_tokens": 0,
        "cache_read_input_tokens": 0,
        "cache_creation_input_tokens": 0,
        "turns": 0,
    }
    try:
        with open(filepath, "r") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    event = json.loads(line)
                except json.JSONDecodeError:
                    continue
                if event.get("type") == "assistant":
                    usage = event.get("message", {}).get("usage", {})
                    totals["input_tokens"] += usage.get("input_tokens", 0)
                    totals["output_tokens"] += usage.get("output_tokens", 0)
                    totals["cache_read_input_tokens"] += usage.get("cache_read_input_tokens", 0)
                    totals["cache_creation_input_tokens"] += usage.get("cache_creation_input_tokens", 0)
                    totals["turns"] += 1
    except FileNotFoundError:
        pass
    return totals

def calculate_cost(totals):
    cost = (
        (totals["input_tokens"] / 1_000_000) * PRICING["input_per_mtok"]
        + (totals["output_tokens"] / 1_000_000) * PRICING["output_per_mtok"]
        + (totals["cache_read_input_tokens"] / 1_000_000) * PRICING["cache_read_per_mtok"]
        + (totals["cache_creation_input_tokens"] / 1_000_000) * PRICING["cache_write_per_mtok"]
    )
    return cost

def format_tokens(n):
    if n >= 1_000_000:
        return f"{n/1_000_000:.2f}M"
    if n >= 1_000:
        return f"{n/1_000:.1f}K"
    return str(n)

def render(totals, cost, filepath, alert_threshold):
    os.system("clear")
    session_name = filepath.split("/sessions/")[-1].replace(".jsonl", "")[:16]
    now = datetime.now().strftime("%H:%M:%S")

    print(f"  ⚡ pulse  [{now}]  session: {session_name}...")
    print(f"  {'─' * 46}")
    print(f"  Turns          {totals['turns']:>10}")
    print(f"  Input tokens   {format_tokens(totals['input_tokens']):>10}")
    print(f"  Output tokens  {format_tokens(totals['output_tokens']):>10}")
    print(f"  Cache read     {format_tokens(totals['cache_read_input_tokens']):>10}")
    print(f"  Cache write    {format_tokens(totals['cache_creation_input_tokens']):>10}")
    print(f"  {'─' * 46}")
    print(f"  Est. cost      {'${:.4f}'.format(cost):>10}")

    if alert_threshold and totals["input_tokens"] >= alert_threshold:
        print(f"\n  ⚠️  ALERT: Input tokens exceeded {format_tokens(alert_threshold)}")

    print(f"\n  Refreshing every 3s. Ctrl+C to exit.")

def main():
    parser = argparse.ArgumentParser(description="Real-time Claude Code token monitor")
    parser.add_argument("--alert", type=int, default=None,
                        help="Alert when input tokens exceed this threshold")
    args = parser.parse_args()

    print("⚡ pulse — searching for active Claude Code session...")
    time.sleep(1)

    while True:
        filepath = find_latest_session()
        if not filepath:
            print("No Claude Code session found yet. Waiting for one to start...")
            time.sleep(5)
            continue

        totals = parse_session(filepath)
        cost = calculate_cost(totals)
        render(totals, cost, filepath, args.alert)
        time.sleep(3)

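if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        # Exit quietly on Ctrl+C instead of printing a traceback
        print("\n  pulse stopped.")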

Run it in a split terminal pane alongside Claude Code:

python3 pulse.py --alert 50000

That’s it. No dependencies beyond the Python standard library.
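
One caveat: the script re-reads the whole log on every refresh, which is fine for typical sessions but wasteful for very long ones. If that becomes a problem, one option is to remember a byte offset and parse only what was appended. The sketch below is a minimal version of that idea; read_new_events is a hypothetical helper, not part of the script above, and the caller would need to add each batch of events to its running totals.

```python
import json

def read_new_events(filepath, offset):
    """Parse complete JSONL lines added since byte `offset`.

    Returns (events, new_offset). A trailing line without a newline is
    treated as still being written and is left for the next poll.
    """
    events = []
    with open(filepath, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial line; retry on the next poll
            offset += len(raw)
            line = raw.strip()
            if not line:
                continue
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # skip malformed lines, as the main script does
    return events, offset
```

Start with an offset of 0, keep the returned offset between polls, and reset it to 0 whenever the latest session file changes.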


What You See and Why Each Number Matters

Pulse renders a clean, minimal display that refreshes every 3 seconds:

  ⚡ pulse  [09:31:44]  session: a3f8b21c9d04e7...
  ──────────────────────────────────────────────
  Turns                  14
  Input tokens        87.3K
  Output tokens        9.1K
  Cache read          41.2K
  Cache write         22.8K
  ──────────────────────────────────────────────
  Est. cost         $0.4963

  Refreshing every 3s. Ctrl+C to exit.

A few things worth calling out:

Input tokens are a running total across turns, not the size of the current context. At turn 14 with 87.3K input tokens, the average works out to roughly 6,200 input tokens per turn. That’s about $0.019 per turn at current pricing, before counting output or cache traffic.

Cache read tokens are your friend. Claude Code uses prompt caching aggressively on system prompts and stable context. Cache read tokens cost about a tenth as much as standard input tokens. When you see a high cache-read ratio, you’re working efficiently.

Output usually costs less in total than input. At 5x the per-token price of input ($15 versus $3 per million), output sounds expensive. But output volume is usually 10 to 20 percent of input volume in typical coding sessions, so it rarely dominates the total.
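
To put numbers on the cache-read ratio and the per-turn average, here is a minimal sketch. It assumes a totals dictionary shaped like the one parse_session returns, and it defines the ratio as cache reads divided by all input-side tokens, which is my own working definition rather than an official metric. The summarize name is hypothetical.

```python
def summarize(totals):
    """Return (cache-read share of input-side tokens, average uncached input per turn)."""
    input_side = (
        totals["input_tokens"]
        + totals["cache_read_input_tokens"]
        + totals["cache_creation_input_tokens"]
    )
    # Guard against division by zero on an empty session
    cache_share = totals["cache_read_input_tokens"] / input_side if input_side else 0.0
    avg_input = totals["input_tokens"] / totals["turns"] if totals["turns"] else 0.0
    return cache_share, avg_input

# Example totals, using the figures from the sample display
example = {
    "input_tokens": 87_300,
    "output_tokens": 9_100,
    "cache_read_input_tokens": 41_200,
    "cache_creation_input_tokens": 22_800,
    "turns": 14,
}
share, avg = summarize(example)
print(f"cache read share:   {share:.1%}")
print(f"avg input per turn: {avg:,.0f} tokens")
```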

What Pulse Tells You

  • Running cost estimate so you can set a session budget
  • Input token total so you can judge context bloat
  • Cache read and write totals to spot whether caching is actually working
  • Turn count to correlate cost per conversation turn
  • Threshold alerts before a session blows past your budget

What Pulse Cannot Tell You

  • Which specific message caused a spike (requires log correlation)
  • Model version being used (affects pricing tiers)
  • Tool call token overhead broken out separately
  • Cross-session totals for the day (requires aggregation layer)

Extending Pulse: Daily Budget Tracking

The single-session view is useful. A daily aggregate is more useful. Add these functions to log each session’s final totals to a simple JSON file when you exit:

import signal
import sys

DAILY_LOG = os.path.expanduser("~/.pulse_daily.json")

def load_daily():
    if os.path.exists(DAILY_LOG):
        with open(DAILY_LOG) as f:
            return json.load(f)
    today = datetime.now().strftime("%Y-%m-%d")
    return {"date": today, "total_cost": 0.0, "sessions": []}

def save_daily(daily, session_cost, session_id):
    today = datetime.now().strftime("%Y-%m-%d")
    if daily.get("date") != today:
        daily = {"date": today, "total_cost": 0.0, "sessions": []}
    daily["total_cost"] = round(daily["total_cost"] + session_cost, 6)
    daily["sessions"].append({"id": session_id, "cost": round(session_cost, 6)})
    with open(DAILY_LOG, "w") as f:
        json.dump(daily, f, indent=2)
    print(f"\n  Session ended. Today's total: ${daily['total_cost']:.4f}")

Hook this into the main loop using a signal.signal(signal.SIGINT, ...) handler so the daily log updates cleanly when you Ctrl+C. Over a week, ~/.pulse_daily.json becomes a useful audit trail.
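
One way to wire that up is sketched below. make_exit_handler is a hypothetical helper, and the commented wiring assumes main() keeps the latest cost and session ID in a small dictionary; adapt it to however you track state.

```python
import signal
import sys

def make_exit_handler(on_exit):
    """Build a SIGINT handler that runs on_exit() once, then exits cleanly."""
    def handler(signum, frame):
        on_exit()
        sys.exit(0)
    return handler

# Hypothetical wiring inside main(), using load_daily and save_daily from above:
#
#   state = {"cost": 0.0, "session_id": "unknown"}
#   signal.signal(
#       signal.SIGINT,
#       make_exit_handler(
#           lambda: save_daily(load_daily(), state["cost"], state["session_id"])
#       ),
#   )
#
# Then update state["cost"] and state["session_id"] on every refresh so the
# handler always sees the latest values.
```

Catching KeyboardInterrupt around the main loop would work just as well; the handler approach simply keeps the save logic in one place.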


Correlating High-Token Prompts After a Session

The session log is also useful for post-mortem analysis. This short script finds the three most expensive turns in your last session:

python3 - <<'EOF'
import glob, json, os

# Most recently modified session log
f = max(glob.glob(os.path.expanduser("~/.claude/projects/*/sessions/*.jsonl")),
        key=os.path.getmtime)

turns = []
with open(f) as log:
    for line in log:
        try:
            e = json.loads(line)
        except json.JSONDecodeError:
            continue
        if e.get("type") == "assistant":
            u = e.get("message", {}).get("usage", {})
            turns.append((u.get("input_tokens", 0), u.get("output_tokens", 0)))

# Rank turns by input tokens, largest first
turns.sort(reverse=True)
for rank, (inp, out) in enumerate(turns[:3], 1):
    print(f"#{rank}: {inp:,} input / {out:,} output")
EOF

Sample output from a recent refactoring session:

#1: 94,821 input / 2,341 output
#2: 88,104 input / 1,882 output
#3: 71,339 input / 4,201 output

Note that the numbers are ranks, not positions in the conversation. The most expensive turn had nearly 95K input tokens because I had pasted a large schema early in the session and never cleared it. That single architectural decision cost more than the rest of the session combined.

💡 Practical Rule
If you paste large files or schemas into Claude Code, start a new session once that specific task is done. Carrying 50K tokens of irrelevant context forward into unrelated questions is the single easiest cost to eliminate.

Updating Pricing When Anthropic Changes Rates

Anthropic adjusts pricing periodically. The PRICING dictionary at the top of pulse.py is the only thing you need to update. Current rates for Claude 3.5 Sonnet and Claude 3.7 Sonnet (as of April 2026) are reflected in the script above. Check Anthropic’s pricing page after any model announcement.

If you want pulse to detect the model automatically, you can parse the model field from the assistant message events (it’s there in the JSONL) and look up the correct pricing tier from a dictionary. That’s a useful 20-line extension if you switch between models frequently.
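
A minimal sketch of that extension follows. It assumes the model name sits at message.model in each assistant event, as described above. The prefix in the table and its prices are placeholders I made up for illustration, so replace them with real model names and rates from Anthropic’s pricing page.

```python
# Default rates match the PRICING dictionary in pulse.py (USD per million tokens).
DEFAULT_PRICING = {
    "input_per_mtok": 3.00,
    "output_per_mtok": 15.00,
    "cache_read_per_mtok": 0.30,
    "cache_write_per_mtok": 3.75,
}

# Placeholder table keyed by model-name prefix. Both the prefix and the
# prices below are invented for illustration only.
PRICING_BY_MODEL_PREFIX = {
    "example-large": {
        "input_per_mtok": 6.00,
        "output_per_mtok": 30.00,
        "cache_read_per_mtok": 0.60,
        "cache_write_per_mtok": 7.50,
    },
}

def pricing_for(event):
    """Pick a price table from the event's model name, falling back to the default."""
    model = event.get("message", {}).get("model") or ""
    for prefix, prices in PRICING_BY_MODEL_PREFIX.items():
        if model.startswith(prefix):
            return prices
    return DEFAULT_PRICING

def event_cost(event):
    """Estimate the cost in USD of a single assistant event."""
    usage = event.get("message", {}).get("usage", {})
    p = pricing_for(event)
    return (
        usage.get("input_tokens", 0) * p["input_per_mtok"]
        + usage.get("output_tokens", 0) * p["output_per_mtok"]
        + usage.get("cache_read_input_tokens", 0) * p["cache_read_per_mtok"]
        + usage.get("cache_creation_input_tokens", 0) * p["cache_write_per_mtok"]
    ) / 1_000_000
```

With this in place, parse_session would sum event_cost per event instead of multiplying session totals by a single rate, which keeps the estimate sensible when a session mixes models.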


Running Pulse Automatically with Every Claude Code Session

On macOS, you can wire pulse into a shell function so it launches automatically when you open Claude Code. Add this to your ~/.zshrc:

claude-with-pulse() {
  python3 ~/pulse.py --alert 80000 &
  PULSE_PID=$!
  claude "$@"
  kill $PULSE_PID 2>/dev/null
}
alias cc="claude-with-pulse"

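Now cc starts Claude Code in the current directory with the pulse monitor running in the background. When you exit Claude Code, the monitor process is killed automatically. Two caveats: pulse clears the screen on every refresh, so a backgrounded copy sharing the same terminal will draw over Claude Code’s interface unless you run it in its own pane or window, and cc is also the conventional name for the system C compiler, so pick a different alias if you compile C from this shell.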


What This Changes in Practice

After two weeks of running pulse, I made three concrete changes to how I use Claude Code:

  1. I start fresh sessions more aggressively. When I finish a discrete task (write tests, refactor a module, debug a specific error), I start a new session rather than continuing. Session startup costs almost nothing. Carrying 80K tokens of stale context costs real money.

  2. I paste less. Instead of dumping entire files into the conversation, I reference them by path and let Claude Code’s file reading tools handle it. The tool result tokens are usually a fraction of what a manual paste would cost.

  3. I use the --alert flag as a soft per-session budget. At 80K input tokens, I pause and ask: is what I’m working on worth another 80K tokens? Sometimes yes. Often I realize I should summarize and restart.

These aren’t revelations. But without visibility, I wasn’t making these decisions at all. Pulse made the cost concrete enough to change behavior.

Bottom Line

Pulse is about 100 lines of Python and zero dependencies. Build it once, and you’ll never fly blind in a Claude Code session again.


Where to Go From Here

Pulse is intentionally minimal. If you want to go further, a few natural extensions:

  • Web dashboard: Replace the terminal render with an http.server endpoint and poll it from a browser tab. Useful if you want the monitor visible without a split terminal.
  • Cost alerts via Slack or ntfy.sh: Post to a webhook when the threshold is crossed. Useful for long autonomous agent runs where you’re not watching the terminal. (Related: check out how to set up Claude Code for autonomous agent tasks using multi-session workflows.)
  • Weekly cost report: Aggregate ~/.pulse_daily.json files and email yourself a weekly summary. Useful for understanding which projects are actually expensive.
  • Multi-tool support: If you’re also using Cursor or other tools that log JSONL, the same parsing pattern applies with adjusted pricing dictionaries.
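
For the alert extension in the list above, a minimal sketch using only the standard library might look like this. It assumes an ntfy.sh topic, which accepts a plain POST with the message as the request body. The topic URL is a placeholder, and build_alert_request and send_alert are hypothetical helper names.

```python
import urllib.request

def build_alert_request(topic_url, input_tokens, threshold):
    """Build (but do not send) a POST request announcing a crossed threshold."""
    body = f"pulse: input tokens at {input_tokens:,} (threshold {threshold:,})"
    return urllib.request.Request(
        topic_url,
        data=body.encode("utf-8"),
        headers={"Title": "Claude Code token alert"},
        method="POST",
    )

def send_alert(request, timeout=5):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status

# Example usage (placeholder topic; not executed here):
#   req = build_alert_request("https://ntfy.sh/my-pulse-topic", 81_250, 80_000)
#   send_alert(req)
```

To avoid a notification on every refresh, call send_alert only the first time the threshold is crossed in a session.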

If you want to go deeper on Claude’s token architecture before building on top of it, the AgentPlix guide to Claude’s context window covers how input, output, and cache tokens interact in detail.

The core idea here scales. Once you have the session log as a data source, you can build any kind of observability layer you want. Pulse is just the starting point.


Have questions about the script or want to share how you’ve extended it? Drop a comment below. And if you’re looking for a hosted alternative while you’re getting started, Anthropic’s Claude.ai usage dashboard gives you aggregate monthly token counts, though it lacks the real-time session-level granularity that makes pulse useful for active development work.