KillToken™ API

Build against the LLM optimization gateway.

KillToken™ sits between your app and your model providers (OpenAI, Anthropic, Gemini, Mistral, DeepSeek, OpenRouter, Together AI, Perplexity, xAI, Azure OpenAI, any self-hosted OpenAI-compatible endpoint, AWS Bedrock, and Google Vertex AI). Send server-side LLM traffic through one gateway to measure prompt waste, track cost, enable safe optimization, reuse repeat-safe responses, and export tenant-level ROI data.

Quickstart

Call KillToken™ from a backend, worker, or secure codespace process. Do not expose tenant API keys in browser or mobile client code.

curl
curl http://localhost:3000/v1/chat \
  -H "Authorization: Bearer kt_..." \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4.1",
    "optimizationMode": "measure_only",
    "messages": [
      { "role": "user", "content": "Write a concise project update." }
    ],
    "metadata": { "feature": "weekly-summary" }
  }'

metadata.feature tags the request for the dashboard's per-feature savings breakdown — see Feature tags.

Authentication

Create a tenant API key in the dashboard, then send it as a Bearer token. The full key is shown only once; after that, only a hash and a preview are stored.

Authorization: Bearer kt_...

Provider credentials (strict BYOK)

KillToken™ is strict BYOK: every provider call uses the authenticated tenant's own stored, encrypted credential. There is no env/platform fallback — no OpenAI/Anthropic env keys, no AWS environment credentials, and no Google ADC, gcloud, metadata-server, or platform service accounts. If a tenant has no active credential for the requested provider, the call returns 400 provider_credential_required before any cache, idempotency, or provider call. Manage credentials in the dashboard or via POST/GET/PATCH/DELETE /v1/provider-credentials.

  • Single-key providers (openai, anthropic, gemini, mistral, deepseek, openrouter, together, perplexity, xai) supply an apiKey. azure_openai and openai_compatible also supply an apiKey plus non-secret config (endpoint/baseUrl + default model/deployment).
  • Multi-secret providers (aws_bedrock, google_vertex) supply a secrets bundle instead of an apiKey: aws_bedrock uses { accessKeyId, secretAccessKey, sessionToken? }; google_vertex uses { clientEmail, privateKey, privateKeyId? }. Mixing apiKey with secrets returns 400 invalid_provider_secrets.
  • google_vertex config requires projectId, location, and defaultModel (optional https-origin endpointUrl). The stored service-account key signs a short-lived OAuth2 JWT — secrets are encrypted at rest and never returned, logged, or echoed in responses or errors.
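
The credential rules above can be checked locally before calling POST /v1/provider-credentials. The following is a minimal sketch, not the gateway's validation logic: the top-level field names (provider, apiKey, secrets, config) are inferred from this guide rather than from a published schema, the helper name checkCredentialDraft is hypothetical, and every result string beginning with local_ is invented for illustration. Only invalid_provider_secrets is a documented code.

```typescript
// Field names are inferred from this guide, not from a published schema.
// Result strings starting with "local_" are made up for this sketch.
type CredentialDraft = {
  provider: string;
  apiKey?: string;
  secrets?: Record<string, string>;
  config?: Record<string, string>;
};

const MULTI_SECRET_PROVIDERS = ["aws_bedrock", "google_vertex"];
const VERTEX_REQUIRED_CONFIG = ["projectId", "location", "defaultModel"];

// Returns null when the draft passes these local checks, otherwise a short code.
function checkCredentialDraft(draft: CredentialDraft): string | null {
  // Documented rule: apiKey and secrets must not be combined.
  if (draft.apiKey !== undefined && draft.secrets !== undefined) {
    return "invalid_provider_secrets";
  }
  const multiSecret = MULTI_SECRET_PROVIDERS.indexOf(draft.provider) !== -1;
  if (multiSecret && draft.secrets === undefined) return "local_missing_secrets";
  if (!multiSecret && !(draft.apiKey ?? "").trim()) return "local_missing_api_key";
  if (draft.provider === "google_vertex") {
    // Documented rule: google_vertex config needs these three fields.
    const config = draft.config ?? {};
    for (const field of VERTEX_REQUIRED_CONFIG) {
      if (!config[field]) return "local_missing_config_" + field;
    }
  }
  return null; // the gateway may still reject the credential
}
```

A check like this only catches shape mistakes early; the gateway remains the authority on whether a credential is accepted.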

Integrating KillToken™ into your app

KillToken™ is a server-side gateway. Call it from a backend, worker, cron, or codespace — never from a browser or mobile client, because the request carries your tenant API key. Your code never holds provider keys; strict BYOK uses your tenant's stored credential.

  1. Mint a tenant API key in the dashboard (sent as Authorization: Bearer kt_...).
  2. Add a provider credential in the dashboard or via POST /v1/provider-credentials.
  3. Call /v1/chat (or a compatible wrapper) from your backend.

Runnable copies live in the repo's examples/ folder (sdk-chat.mjs, node-fetch-chat.mjs, openai-sdk-compatible.mjs, anthropic-messages-wrapper.mjs); each reads KILLTOKEN_BASE_URL and KILLTOKEN_API_KEY from the environment and contains no provider keys.

Integrating with an AI coding agent?

Point your assistant (Claude Code, Cursor, Copilot) at killtoken.io/llms-full.txt — a complete plain-markdown integration guide with every endpoint, header, error code, and example, written so an agent can do the whole integration in one pass. A short index lives at /llms.txt.

Official SDK (recommended)

The first-party @killtoken/sdk package is the recommended backend path — server-side only, strict BYOK-safe, and dependency-light. It throws a KillTokenAPIError (status/code/safe message) on non-2xx and never includes keys, secrets, or headers in errors.

import { KillTokenClient } from "@killtoken/sdk";
// apiKey is your KillToken tenant key (kt_...), NOT a provider key.
const client = new KillTokenClient({ baseUrl: process.env.KILLTOKEN_BASE_URL, apiKey: process.env.KILLTOKEN_API_KEY });
const { response, metrics } = await client.chat({ provider: "openai", model: "gpt-4.1-mini",
  messages: [{ role: "user", content: "Hello" }], idempotencyKey: "req-123", cachePolicy: { exactCache: "read_write" } });
// Also: client.providerCredentials.list() / create() / update() / delete() / test()

Plain fetch

const res = await fetch(`${process.env.KILLTOKEN_BASE_URL}/v1/chat`, {
  method: "POST",
  headers: { "content-type": "application/json", authorization: `Bearer ${process.env.KILLTOKEN_API_KEY}` },
  body: JSON.stringify({ provider: "openai", model: "gpt-4.1-mini", optimizationMode: "measure_only",
    messages: [{ role: "user", content: "Hello" }], idempotencyKey: "req-123", cachePolicy: { exactCache: "read_write" } })
});
const { response, metrics } = await res.json();

Official OpenAI SDK (baseURL pointed at the wrapper)

import OpenAI from "openai";
// baseURL ends with /v1/openai; apiKey is your KillToken tenant key (NOT an OpenAI key).
const client = new OpenAI({ baseURL: `${process.env.KILLTOKEN_BASE_URL}/v1/openai`, apiKey: process.env.KILLTOKEN_API_KEY,
  defaultHeaders: { "x-killtoken-feature": "support-bot" } }); // optional: tags every call for the per-feature savings breakdown
const completion = await client.chat.completions.create({ model: "gpt-4.1-mini", messages: [{ role: "user", content: "Hello" }] });
const metrics = completion.killtoken?.metrics; // also in the x-killtoken-metrics header

Anthropic Messages wrapper

await fetch(`${process.env.KILLTOKEN_BASE_URL}/v1/anthropic/messages`, {
  method: "POST",
  headers: { "content-type": "application/json", authorization: `Bearer ${process.env.KILLTOKEN_API_KEY}` },
  body: JSON.stringify({ model: "claude-3-5-haiku-latest", max_tokens: 256, messages: [{ role: "user", content: "Hello" }] })
});

Strict BYOK error handling

  • provider_credential_required: add a stored credential for that provider; KillToken never falls back to env/platform keys.
  • invalid_api_key: the tenant Bearer key is missing/invalid (or a supplied provider key was empty on a credential write).
  • provider_not_supported: provider is not a supported value.
  • streaming_not_supported: the wrappers reject stream: true; send a non-streaming request.
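
One way to act on these codes in backend code is a small lookup that turns each one into a next step. This is a sketch under assumptions: the function name nextStepFor and the step labels are hypothetical, and how you obtain the code string (for example from a field on the SDK's KillTokenAPIError or from the JSON error body) depends on response details this guide does not spell out.

```typescript
// Step labels are invented for this sketch; the four case values are the
// error codes listed above.
type NextStep =
  | "add_provider_credential"
  | "check_tenant_key"
  | "change_provider_value"
  | "disable_streaming"
  | "unknown";

function nextStepFor(code: string): NextStep {
  switch (code) {
    case "provider_credential_required":
      return "add_provider_credential"; // no env/platform fallback exists
    case "invalid_api_key":
      return "check_tenant_key"; // or an empty provider key on a credential write
    case "provider_not_supported":
      return "change_provider_value";
    case "streaming_not_supported":
      return "disable_streaming"; // resend without stream: true
    default:
      return "unknown";
  }
}
```

None of these four conditions is fixed by retrying the same request unchanged, so a caller would typically surface the step to an operator or adjust the request first.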

idempotencyKey & cachePolicy

idempotencyKey is a string you choose per request; repeating it returns the stored result without re-calling or re-billing the provider, so it is safe to retry on timeouts. cachePolicy.exactCache (read_write / read_only / write_only / bypass) reuses the stored response for a request that is byte-identical to an earlier one; a hit skips the provider call and shows in metrics.cacheStatus. Both require a cache backend (KILLTOKEN_CACHE_ENABLED=true).
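
The retry pattern this enables can be sketched as a helper that reuses one key across attempts. The helper name sendWithRetry and the send callback are hypothetical; in real code send would wrap your /v1/chat call and pass the key as idempotencyKey. For brevity this sketch retries on any thrown error, whereas production code would more likely retry only on timeouts and network failures.

```typescript
// `send` is a caller-supplied function that performs the request with the
// given idempotency key. It is a placeholder, not part of any SDK.
async function sendWithRetry<T>(
  send: (idempotencyKey: string) => Promise<T>,
  idempotencyKey: string,
  maxAttempts: number
): Promise<T> {
  if (maxAttempts < 1) throw new Error("maxAttempts must be at least 1");
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // The same key is sent on every attempt, so a retry can be answered
      // from the stored result instead of triggering a second provider call.
      return await send(idempotencyKey);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

The key is chosen once, outside the loop; generating a fresh key per attempt would defeat the replay protection.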

POST /v1/chat

Primary gateway endpoint. The provider response is returned unchanged alongside KillToken™ metrics.

| Field | Required | Notes |
| --- | --- | --- |
| provider | yes | One of `openai`, `anthropic`, `gemini`, `mistral`, `deepseek`, `openrouter`, `together`, `perplexity`, `xai`, `azure_openai`, `openai_compatible`, `aws_bedrock`, `google_vertex`. Requires an active tenant BYOK credential for that provider (strict BYOK: no env/platform fallback). |
| model | yes | Provider model name. KillToken™ does not route to a different model. |
| messages | yes | Chat messages sent through the gateway. |
| optimizationMode | no | Use `measure_only` or `safe` for the MVP. Defaults to `measure_only`. |
| metadata | no | Tenant-owned trace/search context. Set metadata.feature to tag the request for the per-feature savings breakdown (see Feature tags). |
| providerOptions | no | Provider-specific options, forwarded when supported. |
| cachePolicy | no | Exact-cache behavior. Defaults to bypass. |
| idempotencyKey | no | Replay-safe key for retries. Max 255 characters. |

TypeScript backend example
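// Minimal message shape for this example; extend it if you send richer content.
type ChatMessage = { role: string; content: string };

// Sends one chat request through the gateway and returns the parsed JSON body.
export async function callKillToken(messages: ChatMessage[]) {
  const res = await fetch(`${process.env.KILLTOKEN_BASE_URL}/v1/chat`, {
    method: "POST",
    headers: {
      // Tenant key (kt_...), never a provider key.
      "authorization": `Bearer ${process.env.KILLTOKEN_API_KEY}`,
      "content-type": "application/json"
    },
    body: JSON.stringify({
      provider: "openai",
      model: "gpt-4.1",
      optimizationMode: "measure_only",
      messages
    })
  });

  // Non-2xx responses (for example 400 provider_credential_required) become exceptions.
  if (!res.ok) throw new Error(`KillToken request failed: ${res.status}`);
  return res.json();
}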
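
The field rules in the table above can also be checked before a request leaves your backend. The following is a rough sketch: validateChatRequest and the ChatRequestDraft type are hypothetical names, the problem strings are invented, and treating an empty messages array as invalid is an assumption (the guide only says messages must not be missing or malformed).

```typescript
type ChatMessage = { role: string; content: string };

// Loose draft type so that incomplete input can be inspected.
type ChatRequestDraft = {
  provider?: string;
  model?: string;
  messages?: ChatMessage[];
  optimizationMode?: string;
  idempotencyKey?: string;
};

const PROVIDERS = [
  "openai", "anthropic", "gemini", "mistral", "deepseek", "openrouter",
  "together", "perplexity", "xai", "azure_openai", "openai_compatible",
  "aws_bedrock", "google_vertex",
];
const MODES = ["measure_only", "safe"];

// Returns a list of human-readable problems; an empty list means the draft
// passed these local checks.
function validateChatRequest(req: ChatRequestDraft): string[] {
  const problems: string[] = [];
  if (!req.provider || PROVIDERS.indexOf(req.provider) === -1) {
    problems.push("provider must be one of the supported values");
  }
  if (!req.model) problems.push("model is required");
  if (!Array.isArray(req.messages) || req.messages.length === 0) {
    problems.push("messages must be a non-empty array"); // assumption, see lead-in
  }
  if (req.optimizationMode !== undefined && MODES.indexOf(req.optimizationMode) === -1) {
    problems.push("optimizationMode must be measure_only or safe");
  }
  if (req.idempotencyKey !== undefined && req.idempotencyKey.length > 255) {
    problems.push("idempotencyKey must be at most 255 characters");
  }
  return problems;
}
```

Local validation does not replace the gateway's own checks; it mainly gives faster feedback during development.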

Cache & idempotency

Caching is server-enabled, then request-opt-in. Use exact cache only when the same request should return the same answer.

bypass

Default. Do not read or write exact cache.

read_only

Return a hit if present; do not write misses.

write_only

Skip lookup; write the provider result.

read_write

Read first and write on miss.
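
To make the four modes concrete, here is a small in-memory model of the read and write decisions described above. It is an illustration of the semantics only, not the gateway's implementation: the store is a plain object, callProvider stands in for the real provider call, and the names cacheActions and lookupOrCall are hypothetical.

```typescript
type ExactCacheMode = "bypass" | "read_only" | "write_only" | "read_write";

// Which cache operations each mode permits.
function cacheActions(mode: ExactCacheMode): { read: boolean; write: boolean } {
  return {
    read: mode === "read_only" || mode === "read_write",
    write: mode === "write_only" || mode === "read_write",
  };
}

// Toy lookup: returns a cached value when the mode allows reading and the key
// is present; otherwise calls the provider and stores the result only when
// the mode allows writing.
function lookupOrCall(
  store: Record<string, string>,
  key: string,
  mode: ExactCacheMode,
  callProvider: () => string
): { value: string; fromCache: boolean } {
  const { read, write } = cacheActions(mode);
  if (read && Object.prototype.hasOwnProperty.call(store, key)) {
    return { value: store[key], fromCache: true };
  }
  const value = callProvider();
  if (write) store[key] = value;
  return { value, fromCache: false };
}
```

In this model read_only never populates the cache and write_only never serves from it, which matches the descriptions above; bypass touches the store in neither direction.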

Tag your traffic by feature

Tag each request with the part of your product it serves, and the dashboard's Savings by feature table breaks down requests, savings, and cache reuse per feature, turning analytics into a business report. Untagged traffic groups under (untagged).

x-killtoken-feature header

Set it once as a default header on your client and every call from that client is tagged. Best when one client instance serves one app feature — e.g. defaultHeaders: { "x-killtoken-feature": "content-moderation" } on the OpenAI SDK.

metadata.feature body field

Per-request control on /v1/chat and both wrappers. If both the header and the body field are present, the body value wins.

  • Values are trimmed and capped at 80 characters. Use stable, kebab-case names (content-moderation, weekly-summary).
  • Tags are labels, not isolation. Every request still belongs to one tenant, one provider credential, one quota, one bill. Tags answer "which part of my app costs what"; tenants answer "whose key and whose bill". If you serve end-customers who each bring their own provider key, give each one its own KillToken™ workspace — don't encode customers as tags.
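
The precedence and normalization rules above can be mirrored client-side, for example to preview how a request will be grouped. This sketch makes several assumptions that the guide does not confirm: that "capped" means truncation rather than rejection, that trimming happens before the cap, and that an empty body value falls back to the header. The function name resolveFeatureTag is hypothetical.

```typescript
const MAX_TAG_LENGTH = 80;

// Trim a candidate value and treat empty strings as absent.
function cleanTag(value?: string): string | undefined {
  const trimmed = (value ?? "").trim();
  return trimmed === "" ? undefined : trimmed;
}

// Body value (metadata.feature) wins over the x-killtoken-feature header.
function resolveFeatureTag(headerValue?: string, bodyValue?: string): string {
  const tag = cleanTag(bodyValue) ?? cleanTag(headerValue);
  if (tag === undefined) return "(untagged)";
  return tag.slice(0, MAX_TAG_LENGTH); // assumed: over-long tags are truncated
}
```

For instance, a client with a default header of support-bot that sends metadata.feature set to weekly-summary on one call would, under these assumptions, see that call grouped under weekly-summary.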

OpenAI-compatible wrapper

Point OpenAI-style chat-completions clients at KillToken™. Metrics are returned under `killtoken.metrics` and in response headers.

POST /v1/openai/chat/completions

Streaming is not implemented in the MVP. `stream: true` returns `422 streaming_not_supported`.

Anthropic Messages wrapper

Anthropic-style requests use top-level `system` plus `messages`. Unsupported OpenAI-style tool payloads are rejected before provider execution.

POST /v1/anthropic/messages
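
If your code already builds OpenAI-style message arrays, the system prompt has to move to the top level before calling this wrapper. The following is a minimal sketch of that conversion; toAnthropicBody is a hypothetical helper, it handles plain string content only, and joining multiple system messages with a blank line is a choice made for this example.

```typescript
type ChatMessage = { role: string; content: string };

type AnthropicBody = {
  model: string;
  max_tokens: number;
  messages: ChatMessage[];
  system?: string;
};

// Moves system-role messages into the top-level `system` field and keeps the
// remaining messages in order.
function toAnthropicBody(model: string, maxTokens: number, messages: ChatMessage[]): AnthropicBody {
  const systemParts = messages.filter((m) => m.role === "system").map((m) => m.content);
  const body: AnthropicBody = {
    model,
    max_tokens: maxTokens,
    messages: messages.filter((m) => m.role !== "system"),
  };
  if (systemParts.length > 0) body.system = systemParts.join("\n\n");
  return body;
}
```

The resulting object can be passed to JSON.stringify and sent as the request body, as in the Anthropic Messages wrapper example earlier in this guide.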

Metrics, exports, and reports

Read APIs are tenant-scoped and privacy-safe by default. Request lists and exports omit raw prompt content.

GET /v1/requests

Paginated request trace list with provider/model/mode/cache filters.

GET /v1/requests/:requestId

Single tenant-owned request trace.

GET /v1/analytics/summary

Aggregate totals, savings, cache hit rate, and top templates.

GET /v1/exports/requests.csv

CSV request export with fixed privacy-safe columns.

GET /v1/exports/analytics.json

JSON analytics export with filters and timestamp.

GET /v1/reports/roi

Structured ROI report for estimated, verified, potential, and cache savings.

Operations

GET /health

Lightweight liveness check. No auth.

GET /ready

Readiness checks for persistence, cache, and dashboard auth. No secrets returned.
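
A deploy script or worker can poll /ready before sending traffic. The sketch below assumes that /ready answers with a non-2xx status while a dependency is unavailable, which this guide does not state explicitly. The helper name waitUntilReady and its parameters are hypothetical, and the probe function is injected so that the helper can be exercised without a network.

```typescript
// Minimal shape of what the helper needs from a fetch-like function.
type Probe = (url: string) => Promise<{ ok: boolean }>;

// Polls `${baseUrl}/ready` until it reports success or attempts run out.
async function waitUntilReady(
  baseUrl: string,
  probe: Probe,
  attempts: number = 5,
  delayMs: number = 1000
): Promise<boolean> {
  const url = baseUrl.replace(/\/+$/, "") + "/ready";
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await probe(url);
      if (res.ok) return true;
    } catch (err) {
      // Connection errors are treated the same as "not ready yet".
    }
    if (i < attempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

In Node 18 or later, the global fetch can be passed as the probe, for example waitUntilReady(process.env.KILLTOKEN_BASE_URL ?? "", fetch).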

Self-hosting on Render? The repo's docs/render-deploy.md covers the service blueprint, MongoDB Atlas, Upstash Redis, domain mapping, and rollback.

Common errors

| Status | Error | Meaning |
| --- | --- | --- |
| 400 | invalid_messages | Messages are missing or malformed. |
| 400 | invalid_cache_policy | Cache policy is malformed. |
| 401 | invalid_api_key | Missing, unknown, or revoked Bearer key. |
| 422 | provider_not_supported | Provider is not one of `openai`, `anthropic`, `gemini`, `mistral`, `deepseek`, `openrouter`, `together`, `perplexity`, `xai`, `azure_openai`, `openai_compatible`, `aws_bedrock`, `google_vertex`. |
| 422 | streaming_not_supported | Streaming proxy support is not in the MVP. |