Cloud bills get scrutinized to the cent, but AI coding tool spend rarely does: token-based pricing, overlapping subscriptions, and quick pilots hide the total cost until finance asks about it.
This tutorial closes that gap.
By the end, you'll have a CSV inventory with named owners and a defensible metric to bring to budget reviews.
What You’ll Build
A CSV report that details each AI software development tool used by teams, including ownership, monthly cost, usage level, and the amount of code surviving after sixty days. The first six columns come from a Python tool you run; the rest are added manually.
Success looks like being able to answer “what does AI cost the engineering org this month, and what are they getting for it?” at a glance.
What You’ll Learn
- How to run a Python tool that pulls cost, seat, and token data from major AI software development tool vendors.
- How to attribute spend to a single named owner per tool.
- How to compute a token-cost-per-merged-pull-request (PR) metric that ignores throwaway code.
- How to run the first monthly review so it produces decisions.
Time estimate: 1 to 2 hours. Most of that involves collecting admin API keys from vendors.
Difficulty: Intermediate. Assumes comfort with the command line, Python, and reading vendor admin docs.
Prerequisites
Required:
- Admin (or billing-admin) access to every AI vendor account the org pays for: Anthropic, OpenAI, GitHub Copilot, Cursor, AWS, Google.
- `git` and a Unix-like shell.
- `uv` (Python package manager). Install with `brew install uv` or per the uv install guide.
- About 15 minutes of the finance partner’s time.
Not required:
- A FinOps platform. The tool writes a CSV that opens in any spreadsheet.
- A custom MLOps stack. The tool calls vendor admin endpoints directly.
- Buy-in from leadership before starting. Numbers earn buy-in.
Setup
Step 1: Generate the cost data
The ai-software-tool-usage-report pulls monthly cost, seat, and token data from every major AI coding vendor and writes a CSV. Its README walks through cloning, installing with uv sync, configuring vendor-specific credentials (Anthropic and OpenAI admin keys, Copilot PAT, Cursor team key, Gemini ADC, AWS profile), and running the report. Follow it end-to-end before continuing.
The output is ai-tool-usage-YYYY-MM.csv for the current calendar month, with one row per vendor.
Checkpoint: the CSV exists. Configured vendors have a non-empty monthly_cost_usd; skipped vendors say so in the notes column.
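If you'd rather script that check than open the file, here is a minimal sketch; the vendor column name is an assumption, so match it to the headers the tool actually writes:

```python
# skipped_vendors.py -- list vendors with no cost data and the reason from the notes column.
# Usage: uv run python skipped_vendors.py ai-tool-usage-YYYY-MM.csv
import csv
import sys

with open(sys.argv[1], newline="") as f:
    for row in csv.DictReader(f):
        if not (row.get("monthly_cost_usd") or "").strip():
            # "vendor" is an assumed column name; "notes" comes from the tool's output.
            print(f"{row.get('vendor', '?')}: {row.get('notes', '')}")
```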
Tutorial Steps
Step 1: Review the CSV
Open ai-tool-usage-YYYY-MM.csv in a spreadsheet. The tool’s README explains each column. The key thing to look for is the seat-versus-usage delta on rows where pricing_model = seat: a 200-seat license with 60 active users is paying for 140 ghost seats, which makes the cancel-or-resize decision straightforward.
Checkpoint: The total of monthly_cost_usd matches finance’s number within five percent. A larger gap indicates missing vendors, often pilot accounts, marketplace add-ons, or personal API keys charged to a corporate card.
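To script the two checks in this step (total versus finance, ghost seats on seat-priced rows), here is a minimal sketch; the vendor and seat-count column names are assumptions, so match them to the headers in your file:

```python
# check_totals.py -- sum monthly_cost_usd and flag ghost seats on seat-priced rows.
# Usage: uv run python check_totals.py ai-tool-usage-YYYY-MM.csv 42000
# The second argument is the total from your finance partner, in USD.
import csv
import sys

csv_path, finance_total = sys.argv[1], float(sys.argv[2])

total = 0.0
with open(csv_path, newline="") as f:
    for row in csv.DictReader(f):
        cost = (row.get("monthly_cost_usd") or "").strip()
        if not cost:
            continue  # skipped vendor; the reason is in the notes column
        total += float(cost)

        if row.get("pricing_model") == "seat":
            seats = int(float(row.get("seats") or 0))        # assumed column name
            active = int(float(row.get("active_users") or 0))
            if seats > active:
                print(f"{row.get('vendor', '?')}: {seats - active} paid seats with no active user")

delta_pct = abs(total - finance_total) / finance_total * 100
print(f"CSV total ${total:,.2f} vs finance ${finance_total:,.2f} ({delta_pct:.1f}% apart)")
```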
Step 2: Assign one human owner per tool
The CSV does not include an owner column, so add one manually. Assign a single named human per row: no team alias, no Slack channel, no “TBD.” The owner updates the CSV monthly and reports the numbers at the review scheduled in Step 4.
If a tool has no plausible owner, that is the answer for that tool. Cancel it at the next renewal. Write CANCEL in the notes column and move on.
You should see: every row has a person’s name in Owner. Some rows have CANCEL in notes.
What just happened: AI spend balloons most often because no one owns a single line item. Adding a name to each row is the simplest, most effective change in this tutorial.
Checkpoint: Every row has an owner you could DM today to confirm last month’s spend.
Step 3: Build a token-cost-per-merged-PR baseline
Velocity numbers are gameable. The goal here is a metric tied to code that has proven to survive.
The formula:
$$ CPD = \frac{C}{N} $$
Where:
- $CPD$ = cost per durable PR
- $C$ = monthly tool cost (USD)
- $N$ = durable PR count (merged 30-60 days ago, excluding reverts)
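To make the formula concrete with illustrative numbers: a tool costing $3,000 a month against 60 durable PRs in the window works out to $50 per durable PR.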
The 30-to-60-day window matters: a PR that gets reverted within its first month was noise, not output, so it should not count.
Use gh to gather the merged PR list, then filter out reverts and follow-up patches. Run it from the repo root:
```sh
# Merged PRs from the window 30 to 60 days back, excluding obvious reverts.
# Note: date -v-60d is BSD/macOS syntax; on GNU/Linux use date -d '60 days ago' +%Y-%m-%d.
GH_DEBUG= gh pr list \
  --state merged \
  --search "merged:$(date -v-60d +%Y-%m-%d)..$(date -v-30d +%Y-%m-%d)" \
  --limit 1000 \
  --json number,title,mergeCommit \
  2>/dev/null \
  | jq '[.[] | select(.title | test("^Revert|^revert"; "i") | not)] | length'
```

Divide each tool’s monthly_cost_usd by that count and write the result in a new column called Cost per durable PR (USD).
You should see: a per-tool number you can defend to finance. It will be unflattering for at least one tool. That is the point.
What just happened: You built the metric the finance partner can actually read. The 60-day survival rule is what makes the number defensible: it counts code that survived, not code that shipped and then got reverted.
Checkpoint: The reasoning for excluding reverts can be explained in two sentences without jargon.
Step 4: Schedule the first monthly review
Create a recurring 30-minute meeting on the first Tuesday of every month. The agenda is one page:
- Each `Owner` reads their row’s `monthly_cost_usd`, `active_users`, and `Cost per durable PR`.
- Anything that moved more than 20 percent gets a one-sentence reason.
- Anything still marked `CANCEL` gets a cancellation date.
- New tools added that month get an owner before the meeting ends.
If a senior leader skips two months in a row, pause that tool’s spend at the next renewal. Write that rule into the meeting invite description so everyone can see it.
You should see: a recurring calendar event with the agenda in the body and every tool owner invited.
Checkpoint: The first meeting is on the calendar with a written agenda. Start this month, not next quarter.
Verification
Done when all of these are true:
- `uv run python index.py` writes a CSV with one row per vendor the org pays.
- Every row has an `Owner` name and a `Cost per durable PR` value.
- Sum of `monthly_cost_usd` matches finance’s number within five percent.
- The first monthly review is on the calendar with all owners invited.
Finish every item before sharing the dashboard. Half-built dashboards lose credibility fast.
A Starting Point, Not the Finish Line
A mature team would automate every step here: scheduled CSV pulls, owner attribution from an HRIS, durable-PR counts streamed from GitHub webhooks, alerts when Cost per durable PR crosses a threshold, and a Grafana board replacing the spreadsheet. That is the right destination.
This tutorial is the starting point. A manual CSV with named owners and a defendable metric is enough to win the first budget review, surface the loudest waste, and earn the political capital to automate the rest. Build the pipeline once the workflow has proven its worth.
Troubleshooting
For tool-side problems (a row reads SKIPPED, AWS Kiro reports zero, Gemini cost stays blank), see the tool README’s Caveats. The cases below are about the workflow, not the tool.
Problem: The token-cost-per-PR metric looks great, but the team feels slower
Symptoms: The metric drops month over month, but PR review time, defect escape rate, or staging incidents climb.
Solution: add a Code breakage rate column. Count PRs from the same 30-to-60-day window that triggered an incident, a hotfix, or a revert after 30 days, and divide by total durable PRs. A speed gain that ships defects to QA shifts costs downstream.
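In the same notation as Step 3, with $N_{broken}$ as the count of those incident-linked PRs:

$$ B = \frac{N_{broken}}{N} $$

where $B$ is the code breakage rate and $N$ is the durable PR count.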
If that does not work: cross-reference incident reports with Author and look for a correlation between AI-heavy PRs and incidents. Adding ai-suggested:high|low PR labels (via a pre-commit hook that scans for AI signatures) makes this much easier.
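A heuristic sketch of that labeling idea, assuming the “AI signature” you scan for is a Co-authored-by trailer naming the assistant; the patterns, script name, and base branch are assumptions, and because commit trailers are not visible to a literal pre-commit hook, this version is meant to run in CI or before opening the PR, with the printed label fed to gh pr edit --add-label:

```python
# label_ai_prs.py -- heuristic: does this branch look AI-heavy?
# Patterns and base branch are assumptions; adjust to what your tooling leaves behind.
import re
import subprocess
import sys

SIGNATURES = [
    r"co-authored-by:.*(copilot|claude|cursor|gemini)",  # assumed trailer patterns
]

def branch_commit_messages(base: str = "origin/main") -> str:
    # Commit messages on this branch that are not on the base branch.
    result = subprocess.run(
        ["git", "log", f"{base}..HEAD", "--format=%B"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def main() -> int:
    text = branch_commit_messages()
    hits = sum(len(re.findall(p, text, re.IGNORECASE)) for p in SIGNATURES)
    print("ai-suggested:high" if hits else "ai-suggested:low")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```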
Problem: Owners stop pulling their numbers after two months
Symptoms: The monthly review turns into one person reading the dashboard aloud while the owners stay quiet.
Solution: escalate at the third miss. The pattern usually breaks after one enforcement. Soft enforcement does not work; the calendar entry requires a written consequence.
If that does not work: move ownership of the row to whoever is willing to advocate for it. An unowned tool is a tool that can be canceled.
Problem: The cost numbers swing wildly week to week
Symptoms: Weekly spend doubles, then halves, then doubles again.
Solution: switch to a rolling 4-week mean. Vendor pricing changes, model upgrades, and surge usage make any single week noisy. A rolling window smooths noise and preserves real trends.
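A minimal sketch of that rolling window, with placeholder weekly totals standing in for your own series:

```python
# rolling_spend.py -- 4-week rolling mean over weekly spend totals.
from collections import deque

def rolling_mean(values, window=4):
    buf, out = deque(maxlen=window), []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out

weekly_spend = [3200.0, 6100.0, 2900.0, 6400.0, 3100.0]  # placeholder numbers
print([round(x, 2) for x in rolling_mean(weekly_spend)])
```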
If that does not work: pin model versions in the tool config so silent model upgrades stay out of the cost comparison.
Next Steps
To learn more:
- Read the FinOps Foundation’s framework for cloud financial management. The same patterns apply to AI tool spend, with shorter feedback loops.
- Read the post on this blog about measuring software ROI, which frames the durable-output question for non-AI work.
To extend this project:
- Add a vendor adapter to the tool; its README’s Development section walks through the `VendorAdapter` pattern.
- Add an LLM gateway with semantic caching in front of the chattiest tool. A 30-60% token reduction for duplicate questions is a reasonable target.
- Add `ai-suggested` PR labels via a pre-commit hook that scans for AI-authored hunks, and recompute `Cost per durable PR` split by AI-heavy versus AI-light PRs.
- Add this interview question to the staff and principal engineer hiring loop: “How would you measure whether an AI coding tool is paying for itself on your team?” Candidates who think in cost-per-outcome are rare and worth the seat.
References
- AI Software Tool Usage Report, the Python tool used in this tutorial. Its README covers vendor credentials, the run command, and CSV column reference.
- Measuring software ROI, a related post on this blog that frames the durable-output question for non-AI work.
- FinOps Framework, the FinOps Foundation’s reference model for cloud financial management. The principles port cleanly to AI tool spend.
- GitHub CLI: pr list, the command used in Step 3 to count merged PRs over a date window.
- uv install guide, the package manager used in this tutorial.
