LLM Vendor Cost Comparison – API vs Self-Hosted

Vendor prices and self-hosted inputs are fully user-defined assumptions. Defaults are examples only.

Requests per second

Prompt tokens per request

Completion tokens per request

Active hours per day

Active days per month

API Vendors (2 to 4 rows)

Self-Hosted Inputs

GPUs

Tokens/sec per GPU

GPU price per hour (USD)

About This Calculator

This calculator applies one fixed set of formulas to your inputs, so the same assumptions always produce the same monthly figures and you avoid hand-arithmetic mistakes.

Compare the monthly cost of running the same LLM workload across multiple API vendors versus self-hosting open-source models on GPUs.

If your workflow expands, pair this calculator with AI Infrastructure Total Cost of Ownership (On-Prem vs Cloud GPU) Calculator and LLM Token & Cloud GPU Cost Estimator to cross-check assumptions and build a stronger analysis chain.

Formula

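Monthly requests: R_m = r * 3600 * h_d * d_m
Tokens per request: T = T_p + T_c
Monthly tokens: tokens_m = R_m * T
API cost for vendor j: C_api_j = (tokens_m / 1000) * p_j
Self-hosted capacity (tokens/sec): capacity = q * N
Self-hosted hours: GPUhours = tokens_m / (capacity * 3600)
Self-hosted cost: C_self = GPUhours * p_gpu

Here r is requests per second, h_d is active hours per day, d_m is active days per month, T_p and T_c are prompt and completion tokens per request, p_j is vendor j's price per 1,000 tokens, q is tokens/sec per GPU, N is the number of GPUs, and p_gpu is the GPU price per hour (USD).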
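The formula above can be sketched in Python as follows. This is a minimal illustration, not the calculator's actual implementation; the class and function names (`Workload`, `monthly_tokens`, `api_cost`, `self_hosted_cost`) are hypothetical, and the vendor price is assumed to be quoted per 1,000 tokens, as the division by 1000 suggests.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Workload inputs; field names are illustrative, symbols follow the formula."""
    requests_per_second: float    # r
    prompt_tokens: float          # T_p
    completion_tokens: float      # T_c
    active_hours_per_day: float   # h_d
    active_days_per_month: float  # d_m


def monthly_tokens(w: Workload) -> float:
    """tokens_m = R_m * T, with R_m = r * 3600 * h_d * d_m and T = T_p + T_c."""
    requests_per_month = (
        w.requests_per_second
        * 3600
        * w.active_hours_per_day
        * w.active_days_per_month
    )
    return requests_per_month * (w.prompt_tokens + w.completion_tokens)


def api_cost(tokens_m: float, price_per_1k_tokens: float) -> float:
    """C_api_j = (tokens_m / 1000) * p_j."""
    return tokens_m / 1000 * price_per_1k_tokens


def self_hosted_cost(
    tokens_m: float,
    tokens_per_sec_per_gpu: float,  # q
    gpus: int,                      # N
    gpu_price_per_hour: float,      # p_gpu
) -> float:
    """C_self = GPUhours * p_gpu, with GPUhours = tokens_m / ((q * N) * 3600)."""
    capacity = tokens_per_sec_per_gpu * gpus  # tokens/sec across all GPUs
    gpu_hours = tokens_m / (capacity * 3600)
    return gpu_hours * gpu_price_per_hour
```

One thing to watch when reading the result: as written, GPUhours is the time the whole N-GPU pool needs to process the monthly tokens, so adding GPUs lowers C_self. If your provider bills p_gpu for each GPU separately, you may want to multiply the result by N.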

Example Calculation

The worked example below demonstrates how the input fields translate into the final output. Use it as a quick validation pass before entering your own numbers.

requestsPerSecond: 1
promptTokensPerRequest: 500
completionTokensPerRequest: 300
activeHoursPerDay: 24
activeDaysPerMonth: 30
apiVendors: VendorA, VendorB
selfHosted: 400 tokens/sec per GPU, GPU price 3.50 per hour

Explanation of Results

Result Interpretation

For this workload (about 2.07 billion tokens per month), VendorA costs about 20.7k USD/month and VendorB about 41.5k USD/month. A self-hosted setup that can serve the same load at 400 tokens/sec per GPU would cost roughly 1.3k USD/month in GPU time at 3.50 USD/hour.
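The figures above can be checked step by step. The example does not show the vendor prices or the GPU count, so the sketch below assumes values that happen to reproduce the stated results: 0.01 and 0.02 per 1,000 tokens for VendorA and VendorB, and 4 GPUs. Treat those three numbers as guesses, not as the calculator's defaults.

```python
# Inputs taken from the worked example.
requests_per_second = 1
tokens_per_request = 500 + 300        # prompt + completion
seconds_per_month = 3600 * 24 * 30    # 24 active hours, 30 active days
tokens_per_sec_per_gpu = 400
gpu_price_per_hour = 3.50

# ASSUMED values (not shown in the example); chosen to match the stated results.
vendor_price_per_1k = {"VendorA": 0.01, "VendorB": 0.02}
gpus = 4

tokens_m = requests_per_second * seconds_per_month * tokens_per_request

# API side: (tokens_m / 1000) * price per 1,000 tokens.
api_costs = {
    name: tokens_m / 1000 * price for name, price in vendor_price_per_1k.items()
}

# Self-hosted side: hours the GPU pool needs, times the hourly GPU price.
gpu_hours = tokens_m / (tokens_per_sec_per_gpu * gpus * 3600)
self_hosted = gpu_hours * gpu_price_per_hour

print(f"tokens per month: {tokens_m:,}")
for name, cost in api_costs.items():
    print(f"{name}: {cost:,.0f} per month")
print(f"self-hosted: {self_hosted:,.0f} per month")
```

Under these assumptions the script reports 2,073,600,000 tokens per month, 20,736 for VendorA, 41,472 for VendorB, and 1,260 for self-hosting, which round to the 20.7k, 41.5k, and 1.3k quoted above.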

FAQ

What about storage, engineering time, and other self-hosting costs?

This model compares workload-serving cost only; include staffing, storage, and platform operations separately in your full TCO view.

How do I estimate realistic tokens-per-second for my model?

Use measured throughput from your own inference stack and hardware profile, then enter that observed value as the input assumption.
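One way to obtain that measured value is a small timing harness around your own inference call. The sketch below is an assumption-laden illustration: `generate` is a hypothetical callable you would write to wrap your stack, and it is expected to return the number of tokens it processed for one request. Because the formula counts both prompt and completion tokens against capacity, the returned count should include both.

```python
import time
from typing import Callable, Sequence


def measure_tokens_per_second(
    generate: Callable[[str], int],
    prompts: Sequence[str],
) -> float:
    """Return tokens processed per wall-clock second over all prompts.

    `generate` is a hypothetical wrapper around your inference stack. It must
    run one request and return the tokens it processed (prompt + completion).
    """
    if not prompts:
        raise ValueError("need at least one prompt to measure throughput")
    start = time.perf_counter()
    total_tokens = sum(generate(p) for p in prompts)
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed
```

This version sends requests one at a time, so it will likely understate what a batched or concurrent server achieves. For a closer estimate, drive the stack with the concurrency you expect in production and divide the measured tokens/sec by the number of GPUs involved.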

Does this compare quality or only price?

It compares cost only; model quality, latency, and reliability need separate evaluation.
