KillToken™ API
Build against the LLM optimization gateway.
KillToken™ sits between your app and your model providers (OpenAI, Anthropic, Gemini, Mistral, DeepSeek, OpenRouter, Together AI, Perplexity, xAI, Azure OpenAI, any self-hosted OpenAI-compatible endpoint, AWS Bedrock, and Google Vertex AI). Send server-side LLM traffic through one gateway to measure prompt waste, track cost, enable safe optimization, reuse repeat-safe responses, and export tenant-level ROI data.
Start in minutes
Mint an API key, call `/v1/chat`, and inspect KillToken™ metrics.
Gateway reference
Request fields, response shape, optimization modes, and wrapper endpoints.
Analytics APIs
Requests, summaries, exports, ROI reports, and dashboard-backed metrics.
Production checks
Health, readiness, body limits, Redis cache, and Mongo persistence notes.
Quickstart
Call KillToken™ from a backend, worker, or secure codespace process. Do not expose tenant API keys in browser or mobile client code.
curl http://localhost:3000/v1/chat \
  -H "Authorization: Bearer kt_..." \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4.1",
    "optimizationMode": "measure_only",
    "messages": [
      { "role": "user", "content": "Write a concise project update." }
    ],
    "metadata": { "feature": "weekly-summary" }
  }'
metadata.feature tags the request for the dashboard's per-feature savings breakdown — see Feature tags.
Authentication
Create a tenant API key in the dashboard, then send it as a Bearer token. The full key is shown once and only a hash plus preview are stored.
Authorization: Bearer kt_...
Provider credentials (strict BYOK)
KillToken™ is strict BYOK: every provider call uses the authenticated tenant's own stored, encrypted credential. There is no env/platform fallback — no OpenAI/Anthropic env keys, no AWS environment credentials, and no Google ADC, gcloud, metadata-server, or platform service accounts. If a tenant has no active credential for the requested provider, the call returns 400 provider_credential_required before any cache, idempotency, or provider call. Manage credentials in the dashboard or via POST/GET/PATCH/DELETE /v1/provider-credentials.
- Single-key providers (`openai`, `anthropic`, `gemini`, `mistral`, `deepseek`, `openrouter`, `together`, `perplexity`, `xai`) supply an `apiKey`. `azure_openai` and `openai_compatible` also supply an `apiKey` plus non-secret `config` (endpoint/baseUrl + default model/deployment).
- Multi-secret providers (`aws_bedrock`, `google_vertex`) supply a `secrets` bundle instead of an `apiKey`: `aws_bedrock` uses `{ accessKeyId, secretAccessKey, sessionToken? }`; `google_vertex` uses `{ clientEmail, privateKey, privateKeyId? }`. Mixing `apiKey` with `secrets` returns `400 invalid_provider_secrets`.
- `google_vertex` config requires `projectId`, `location`, and `defaultModel` (optional https-origin `endpointUrl`). The stored service-account key signs a short-lived OAuth2 JWT. Secrets are encrypted at rest and never returned, logged, or echoed in responses or errors.
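The credential shapes above can be checked locally before a credential is written. The following is a minimal sketch, not the gateway's own validation: the helper name `buildCredentialPayload` is hypothetical, and the top-level body keys (`provider`, `apiKey`, `secrets`, `config`) are assumed from the field names in this section rather than from a published request schema.

```javascript
// Hypothetical helper: builds a body for POST /v1/provider-credentials.
// The body layout is an assumption based on the field names documented above.
const MULTI_SECRET_PROVIDERS = new Set(["aws_bedrock", "google_vertex"]);

function buildCredentialPayload({ provider, apiKey, secrets, config }) {
  if (apiKey !== undefined && secrets !== undefined) {
    // Mirrors the documented 400 invalid_provider_secrets rule.
    throw new Error("invalid_provider_secrets: send apiKey or secrets, not both");
  }
  const multiSecret = MULTI_SECRET_PROVIDERS.has(provider);
  if (multiSecret && secrets === undefined) {
    throw new Error(`${provider} requires a secrets bundle`);
  }
  if (!multiSecret && !apiKey) {
    throw new Error(`${provider} requires an apiKey`);
  }
  if (provider === "google_vertex") {
    // Documented required config keys for google_vertex.
    for (const key of ["projectId", "location", "defaultModel"]) {
      if (!config || !config[key]) {
        throw new Error(`google_vertex config requires ${key}`);
      }
    }
  }
  const payload = { provider };
  if (apiKey !== undefined) payload.apiKey = apiKey;
  if (secrets !== undefined) payload.secrets = secrets;
  if (config !== undefined) payload.config = config;
  return payload;
}
```

The returned object could then be sent as the JSON body of `POST /v1/provider-credentials` with the tenant Bearer key; failing fast locally only saves a round trip and does not replace the server-side checks.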
Integrating KillToken™ into your app
KillToken™ is a server-side gateway. Call it from a backend, worker, cron, or codespace — never from a browser or mobile client, because the request carries your tenant API key. Your code never holds provider keys; strict BYOK uses your tenant's stored credential.
- Mint a tenant API key in the dashboard (sent as `Authorization: Bearer kt_...`).
- Add a provider credential in the dashboard or via `POST /v1/provider-credentials`.
- Call `/v1/chat` (or a compatible wrapper) from your backend.
Runnable copies live in the repo's examples/ folder (sdk-chat.mjs, node-fetch-chat.mjs, openai-sdk-compatible.mjs, anthropic-messages-wrapper.mjs); each reads KILLTOKEN_BASE_URL and KILLTOKEN_API_KEY from the environment and contains no provider keys.
Integrating with an AI coding agent?
Point your assistant (Claude Code, Cursor, Copilot) at killtoken.io/llms-full.txt — a complete plain-markdown integration guide with every endpoint, header, error code, and example, written so an agent can do the whole integration in one pass. A short index lives at /llms.txt.
Official SDK (recommended)
The first-party @killtoken/sdk package is the recommended backend path — server-side only, strict BYOK-safe, and dependency-light. It throws a KillTokenAPIError (status/code/safe message) on non-2xx and never includes keys, secrets, or headers in errors.
import { KillTokenClient } from "@killtoken/sdk";
// apiKey is your KillToken tenant key (kt_...), NOT a provider key.
const client = new KillTokenClient({ baseUrl: process.env.KILLTOKEN_BASE_URL, apiKey: process.env.KILLTOKEN_API_KEY });
const { response, metrics } = await client.chat({ provider: "openai", model: "gpt-4.1-mini",
messages: [{ role: "user", content: "Hello" }], idempotencyKey: "req-123", cachePolicy: { exactCache: "read_write" } });
// Also: client.providerCredentials.list() / create() / update() / delete() / test()
Plain fetch
const res = await fetch(`${process.env.KILLTOKEN_BASE_URL}/v1/chat`, {
  method: "POST",
  headers: { "content-type": "application/json", authorization: `Bearer ${process.env.KILLTOKEN_API_KEY}` },
  body: JSON.stringify({ provider: "openai", model: "gpt-4.1-mini", optimizationMode: "measure_only",
    messages: [{ role: "user", content: "Hello" }], idempotencyKey: "req-123", cachePolicy: { exactCache: "read_write" } })
});
// Fail loudly on non-2xx instead of destructuring an error body.
if (!res.ok) throw new Error(`KillToken request failed: ${res.status}`);
const { response, metrics } = await res.json();
Official OpenAI SDK (baseURL pointed at the wrapper)
import OpenAI from "openai";
// baseURL ends with /v1/openai; apiKey is your KillToken tenant key (NOT an OpenAI key).
const client = new OpenAI({ baseURL: `${process.env.KILLTOKEN_BASE_URL}/v1/openai`, apiKey: process.env.KILLTOKEN_API_KEY,
defaultHeaders: { "x-killtoken-feature": "support-bot" } }); // optional: tags every call for the per-feature savings breakdown
const completion = await client.chat.completions.create({ model: "gpt-4.1-mini", messages: [{ role: "user", content: "Hello" }] });
const metrics = completion.killtoken?.metrics; // also in the x-killtoken-metrics header
Anthropic Messages wrapper
await fetch(`${process.env.KILLTOKEN_BASE_URL}/v1/anthropic/messages`, {
  method: "POST",
  headers: { "content-type": "application/json", authorization: `Bearer ${process.env.KILLTOKEN_API_KEY}` },
  body: JSON.stringify({ model: "claude-3-5-haiku-latest", max_tokens: 256, messages: [{ role: "user", content: "Hello" }] })
});
Strict BYOK error handling
- `provider_credential_required`: add a stored credential for that provider; KillToken never falls back to env/platform keys.
- `invalid_api_key`: the tenant Bearer key is missing/invalid (or a supplied provider key was empty on a credential write).
- `provider_not_supported`: `provider` is not a supported value.
- `streaming_not_supported`: the wrappers reject `stream: true`; send a non-streaming request.
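The error codes above can be turned into actionable messages in calling code. This is a small sketch under stated assumptions: `describeKillTokenError` is a hypothetical helper, and how the code string is extracted from an error response (or from the SDK's `KillTokenAPIError`) is left to the caller because the error body layout is not specified here.

```javascript
// Hypothetical lookup from documented error codes to a suggested next step.
const ERROR_ACTIONS = new Map([
  ["provider_credential_required", "Add a stored credential for the requested provider."],
  ["invalid_api_key", "Check that the tenant Bearer key (kt_...) is present and valid."],
  ["provider_not_supported", "Use one of the supported provider values."],
  ["streaming_not_supported", "Resend the request without stream: true."],
]);

function describeKillTokenError(code) {
  const action = ERROR_ACTIONS.get(code);
  if (action === undefined) {
    // Unknown codes fall through to a generic hint instead of throwing.
    return { code, known: false, action: "Inspect the HTTP status and response body." };
  }
  return { code, known: true, action };
}
```

A caller might log `describeKillTokenError(code).action` next to the HTTP status so that on-call engineers see the fix, not just the failure.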
idempotencyKey & cachePolicy
idempotencyKey is a string you choose per request; repeating it returns the stored result without re-calling or re-billing the provider — safe to retry on timeouts. cachePolicy.exactCache (read_write / read_only / write_only / bypass) reuses byte-identical prior responses; a hit skips the provider call and shows in metrics.cacheStatus. Both require a cache backend (KILLTOKEN_CACHE_ENABLED=true).
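Because a repeated `idempotencyKey` returns the stored result, a retry loop can reuse one key across attempts. The sketch below illustrates that idea; `chatWithRetry` and its options are hypothetical names, and treating network errors and 5xx responses as retryable (and 4xx as final) is an assumption of this example, not documented gateway behavior.

```javascript
// Sketch: retry /v1/chat with the same idempotencyKey on every attempt.
// fetchImpl is injectable so the helper can be exercised without a network.
async function chatWithRetry(body, { baseUrl, apiKey, attempts = 3, fetchImpl = fetch }) {
  if (!body.idempotencyKey) {
    throw new Error("idempotencyKey is required for safe retries");
  }
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const res = await fetchImpl(`${baseUrl}/v1/chat`, {
        method: "POST",
        headers: { "content-type": "application/json", authorization: `Bearer ${apiKey}` },
        body: JSON.stringify(body), // identical body, identical key
      });
      if (res.ok) return res.json();
      const err = new Error(`KillToken request failed: ${res.status}`);
      // Assumption: 4xx will not succeed on retry, 5xx might.
      err.fatal = res.status < 500;
      throw err;
    } catch (err) {
      if (err.fatal) throw err;
      lastError = err; // network error or 5xx: try again
    }
  }
  throw lastError;
}
```

The sketch retries immediately; a production caller would likely add a backoff delay between attempts.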
POST /v1/chat
Primary gateway endpoint. The provider response is returned unchanged alongside KillToken™ metrics.
| Field | Required | Notes |
|---|---|---|
| provider | yes | One of `openai`, `anthropic`, `gemini`, `mistral`, `deepseek`, `openrouter`, `together`, `perplexity`, `xai`, `azure_openai`, `openai_compatible`, `aws_bedrock`, `google_vertex`. Requires an active tenant BYOK credential for that provider (strict BYOK — no env/platform fallback). |
| model | yes | Provider model name. KillToken™ does not route to a different model. |
| messages | yes | Chat messages sent through the gateway. |
| optimizationMode | no | Use `measure_only` or `safe` for the MVP. Defaults to `measure_only`. |
| metadata | no | Tenant-owned trace/search context. Set metadata.feature to tag the request for the per-feature savings breakdown (see Feature tags). |
| providerOptions | no | Provider-specific options, forwarded when supported. |
| cachePolicy | no | Exact-cache behavior. Defaults to bypass. |
| idempotencyKey | no | Replay-safe key for retries. Max 255 characters. |
export async function callKillToken(messages) {
  const res = await fetch(`${process.env.KILLTOKEN_BASE_URL}/v1/chat`, {
    method: "POST",
    headers: {
      "authorization": `Bearer ${process.env.KILLTOKEN_API_KEY}`,
      "content-type": "application/json"
    },
    body: JSON.stringify({
      provider: "openai",
      model: "gpt-4.1",
      optimizationMode: "measure_only",
      messages
    })
  });
  if (!res.ok) throw new Error(`KillToken request failed: ${res.status}`);
  return res.json();
}
Cache & idempotency
Caching is server-enabled, then request-opt-in. Use exact cache only when the same request should return the same answer.
- `bypass`: Default. Do not read or write exact cache.
- `read_only`: Return a hit if present; do not write misses.
- `write_only`: Skip lookup; write the provider result.
- `read_write`: Read first and write on miss.
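The four modes above reduce to two independent switches, read and write. The following sketch models that mapping on the client side; `resolveExactCache` is a hypothetical helper for reasoning about the modes, not part of the gateway or SDK.

```javascript
// Sketch: the documented exactCache modes expressed as read/write flags.
const EXACT_CACHE_MODES = new Map([
  ["bypass", { read: false, write: false }],
  ["read_only", { read: true, write: false }],
  ["write_only", { read: false, write: true }],
  ["read_write", { read: true, write: true }],
]);

function resolveExactCache(cachePolicy) {
  // A missing policy falls back to the documented default, bypass.
  const mode = cachePolicy?.exactCache ?? "bypass";
  const flags = EXACT_CACHE_MODES.get(mode);
  if (!flags) {
    throw new Error(`invalid_cache_policy: unknown exactCache mode "${mode}"`);
  }
  return { mode, ...flags };
}
```

For example, `resolveExactCache({ exactCache: "read_only" })` yields a policy that can serve hits but never stores misses, which suits traffic where older cached answers are acceptable and new ones should not be retained.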
Tag your traffic by feature
Tag each request with the part of your product it serves, and the dashboard's Savings by feature table breaks down requests, savings, and cache reuse per feature, turning analytics into a business report. Untagged traffic groups under (untagged).
x-killtoken-feature header
Set it once as a default header on your client and every call from that client is tagged. Best when one client instance serves one app feature — e.g. defaultHeaders: { "x-killtoken-feature": "content-moderation" } on the OpenAI SDK.
metadata.feature body field
Per-request control on /v1/chat and both wrappers. If both are present, the body value wins over the header.
- Values are trimmed and capped at 80 characters. Use stable, kebab-case names (`content-moderation`, `weekly-summary`).
- Tags are labels, not isolation. Every request still belongs to one tenant, one provider credential, one quota, one bill. Tags answer "which part of my app costs what"; tenants answer "whose key and whose bill". If you serve end-customers who each bring their own provider key, give each one its own KillToken™ workspace; don't encode customers as tags.
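The tagging rules above (body wins over header, values trimmed and capped at 80 characters, untagged traffic grouped together) can be mirrored in client code to predict how a request will be labeled. This is an illustrative sketch only: `resolveFeatureTag` is a hypothetical name, and trimming before capping and treating a blank value as untagged are assumptions about the order of operations.

```javascript
// Sketch: predict the feature label a request would receive.
function resolveFeatureTag({ headerValue, bodyValue } = {}) {
  // The metadata.feature body field wins over the x-killtoken-feature header.
  const raw = bodyValue ?? headerValue;
  if (typeof raw !== "string") return "(untagged)";
  // Assumed order: trim first, then cap at 80 characters.
  const tag = raw.trim().slice(0, 80);
  return tag === "" ? "(untagged)" : tag;
}
```

Running candidate tag names through a helper like this in tests can catch names that would be silently truncated in the dashboard.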
OpenAI-compatible wrapper
Point OpenAI-style chat-completions clients at KillToken™. Metrics are returned under `killtoken.metrics` and in response headers.
Streaming is not implemented in the MVP. `stream: true` returns `422 streaming_not_supported`.
Anthropic Messages wrapper
Anthropic-style requests use top-level `system` plus `messages`. Unsupported OpenAI-style tool payloads are rejected before provider execution.
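When existing code builds OpenAI-style message arrays, the system prompt has to move to the top level before calling the Anthropic wrapper. The sketch below shows one way to do that conversion; `toAnthropicBody` is a hypothetical helper, and joining multiple system messages with blank lines is a choice made for this example.

```javascript
// Sketch: convert OpenAI-style messages into an Anthropic Messages body.
function toAnthropicBody({ model, maxTokens, messages }) {
  // Collect system messages into the top-level `system` string.
  const system = messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n\n");
  // Everything else stays in `messages`, in the original order.
  const rest = messages.filter((m) => m.role !== "system");
  const body = { model, max_tokens: maxTokens, messages: rest };
  if (system) body.system = system;
  return body;
}
```

The result can be passed to `JSON.stringify` as the body of a `POST /v1/anthropic/messages` request.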
Metrics, exports, and reports
Read APIs are tenant-scoped and privacy-safe by default. Request lists and exports omit raw prompt content.
Paginated request trace list with provider/model/mode/cache filters.
Single tenant-owned request trace.
Aggregate totals, savings, cache hit rate, and top templates.
CSV request export with fixed privacy-safe columns.
JSON analytics export with filters and timestamp.
Structured ROI report for estimated, verified, potential, and cache savings.
Operations
GET /health
Lightweight liveness check. No auth.
GET /ready
Readiness checks for persistence, cache, and dashboard auth. No secrets returned.
Self-hosting on Render? The repo's docs/render-deploy.md covers the service blueprint, MongoDB Atlas, Upstash Redis, domain mapping, and rollback.
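A deploy script or worker can poll `GET /ready` before sending traffic. The following is a minimal sketch: `waitUntilReady` and its options are hypothetical, and it assumes that a non-2xx status (or a connection error) means the gateway is not ready yet, which this page does not state explicitly.

```javascript
// Sketch: poll /ready until it answers with a 2xx status or attempts run out.
// fetchImpl is injectable so the helper can be exercised without a network.
async function waitUntilReady(baseUrl, { attempts = 10, delayMs = 1000, fetchImpl = fetch } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const res = await fetchImpl(`${baseUrl}/ready`);
      if (res.ok) return true;
    } catch {
      // Connection refused or similar: treat as not ready yet.
    }
    if (attempt < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

A caller would typically abort startup when the function resolves to `false`, for example `if (!(await waitUntilReady(process.env.KILLTOKEN_BASE_URL))) process.exit(1);`.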
Common errors
| Status | Error | Meaning |
|---|---|---|
| 400 | invalid_messages | Messages are missing or malformed. |
| 400 | invalid_cache_policy | Cache policy is malformed. |
| 401 | invalid_api_key | Missing, unknown, or revoked Bearer key. |
| 422 | provider_not_supported | Provider is not one of `openai`, `anthropic`, `gemini`, `mistral`, `deepseek`, `openrouter`, `together`, `perplexity`, `xai`, `azure_openai`, `openai_compatible`, `aws_bedrock`, `google_vertex`. |
| 422 | streaming_not_supported | Streaming proxy support is not in the MVP. |