# KillToken — complete integration guide

This document is the complete, self-contained reference for integrating KillToken into an application. It is written so that an AI coding agent can perform the integration end to end without reading any other page. Last verified against the production API in June 2026.

## What KillToken is

KillToken is an adaptive LLM optimization gateway. Your application sends its chat-completion traffic to KillToken instead of directly to the provider. KillToken forwards each call to the provider using YOUR stored provider credential (strict BYOK), measures tokens and cost, optionally optimizes prompts and serves repeats from cache, and reports estimated/verified savings on a dashboard — broken down per app feature if you tag your traffic.

- Base URL: `https://killtoken.io` (all API routes under `/v1`)
- The provider (OpenAI, Anthropic, …) still bills you directly on your own account. KillToken never resells model usage.
- Chat completions only. No streaming (`stream: true` returns `422 streaming_not_supported`). Audio/Whisper, embeddings, images, and the OpenAI Responses API are NOT proxied — keep direct provider clients for those.

## Authentication

Every API request needs a KillToken tenant API key sent as a Bearer token:

```
Authorization: Bearer kt_...
```

- Create the key in the dashboard (https://killtoken.io/dashboard → API keys). It is shown once at creation.
- The `kt_` key is NOT a provider key. Never put an OpenAI/Anthropic key in the Authorization header.
- Server-side only: call KillToken from a backend, worker, or cron job — never from browser or mobile code, because the request carries your tenant key.
- Recommended environment variables in your app: `KILLTOKEN_API_KEY` (the `kt_` key) and `KILLTOKEN_BASE_URL` (`https://killtoken.io` for the native endpoint and the first-party SDK, or `https://killtoken.io/v1/openai` for the OpenAI SDK drop-in).
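
A minimal sketch of loading these variables at startup on a Node backend. The helper name `loadKillTokenConfig` is hypothetical and not part of any KillToken package; the prefix check relies only on the `kt_` key format described above.

```js
// Hypothetical helper: reads the two recommended variables and fails fast
// if the key is missing or looks like a provider key rather than a tenant key.
function loadKillTokenConfig(env) {
  const apiKey = (env.KILLTOKEN_API_KEY || "").trim();
  if (!apiKey) {
    throw new Error("KILLTOKEN_API_KEY is not set");
  }
  if (!apiKey.startsWith("kt_")) {
    throw new Error("KILLTOKEN_API_KEY must be a kt_ tenant key, not a provider key");
  }
  // Fall back to the documented base URL; strip trailing slashes so paths join cleanly.
  const baseUrl = (env.KILLTOKEN_BASE_URL || "https://killtoken.io").replace(/\/+$/, "");
  return {
    apiKey,
    baseUrl,
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}
```

Typical use is a single call such as `const config = loadKillTokenConfig(process.env);` during server startup, so a misconfigured key surfaces before the first request.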

## Strict BYOK (your provider keys)

KillToken makes the actual model call with a provider credential stored encrypted in your tenant — there is no platform fallback key of any kind.

- Store a credential once via the dashboard (Provider Credentials panel) or the API (below). After that, your application code never touches the provider key.
- If a request names a provider with no active stored credential, the call fails fast with `400 provider_credential_required` — before any cache, idempotency, or provider work.
- Supported providers: `openai`, `anthropic`, `gemini`, `mistral`, `deepseek`, `openrouter`, `together`, `perplexity`, `xai` (single `apiKey`); `azure_openai` and `openai_compatible` (apiKey + non-secret `config` with endpoint/baseUrl and default model/deployment); `aws_bedrock` (`secrets: { accessKeyId, secretAccessKey, sessionToken? }`); `google_vertex` (`secrets: { clientEmail, privateKey, privateKeyId? }` + `config: { projectId, location, defaultModel }`).
- Secrets are encrypted at rest and never returned, logged, or echoed in errors.
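
The per-provider credential shapes above can be checked locally before calling the credentials API. This is only a sketch: the helper name is hypothetical, the top-level `provider` field is an assumption (this guide does not show the full request body), and fields listed without a `?` are treated as required. It validates shape only and sends nothing.

```js
// Providers that take a single apiKey, per the list above.
const SINGLE_KEY_PROVIDERS = new Set([
  "openai", "anthropic", "gemini", "mistral", "deepseek",
  "openrouter", "together", "perplexity", "xai",
]);

// Hypothetical helper: returns the names of required fields that are missing.
// ASSUMPTION: the payload carries a top-level `provider` field.
function missingCredentialFields(payload) {
  const missing = [];
  const need = (value, name) => { if (!value) missing.push(name); };
  const { provider, apiKey, secrets = {}, config = {} } = payload;

  if (SINGLE_KEY_PROVIDERS.has(provider)) {
    need(apiKey, "apiKey");
  } else if (provider === "azure_openai" || provider === "openai_compatible") {
    need(apiKey, "apiKey");
    need(payload.config, "config");
  } else if (provider === "aws_bedrock") {
    need(secrets.accessKeyId, "secrets.accessKeyId");
    need(secrets.secretAccessKey, "secrets.secretAccessKey");
  } else if (provider === "google_vertex") {
    need(secrets.clientEmail, "secrets.clientEmail");
    need(secrets.privateKey, "secrets.privateKey");
    need(config.projectId, "config.projectId");
    need(config.location, "config.location");
    need(config.defaultModel, "config.defaultModel");
  } else {
    // Unknown or absent provider name.
    missing.push("provider");
  }
  return missing;
}
```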

## Integration paths (pick one)

### Path 1 — OpenAI SDK drop-in (easiest if you already use the OpenAI SDK)

Change the client construction; every existing `chat.completions.create` call works unchanged:

```js
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.KILLTOKEN_API_KEY,          // kt_..., NOT sk-...
  baseURL: "https://killtoken.io/v1/openai",
  defaultHeaders: { "x-killtoken-feature": "my-feature-name" } // optional but recommended
});

const completion = await openai.chat.completions.create({
  model: "gpt-4.1-mini",
  messages: [{ role: "user", content: "Hello" }]
});
```

Notes: chat completions only; `stream: true` is rejected; `openai.responses.*` returns `501 not_implemented`. The response is the normal OpenAI response plus a `killtoken.metrics` object.

### Path 2 — Native endpoint (any provider, most control)

```
POST https://killtoken.io/v1/chat
Authorization: Bearer kt_...
Content-Type: application/json

{
  "provider": "openai",
  "model": "gpt-4.1-mini",
  "optimizationMode": "measure_only",
  "messages": [{ "role": "user", "content": "Hello" }],
  "metadata": { "feature": "my-feature-name" },
  "cachePolicy": { "exactCache": "read_write" },
  "idempotencyKey": "req-123"
}
```

Required: `provider`, `model`, `messages` (array of `{ role, content }`). Everything else is optional.
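
A sketch of calling the native endpoint from Node 18+ with the built-in `fetch`. The function names `buildChatRequest` and `sendChat`, and the `{ baseUrl, apiKey }` config shape, are hypothetical; the URL, headers, and required fields come from the request shown above.

```js
// Builds the fetch arguments for POST /v1/chat and checks the required fields.
function buildChatRequest(config, body) {
  for (const field of ["provider", "model", "messages"]) {
    if (body[field] === undefined) {
      throw new Error(`missing required field: ${field}`);
    }
  }
  if (!Array.isArray(body.messages)) {
    throw new Error("messages must be an array of { role, content }");
  }
  return {
    url: `${config.baseUrl}/v1/chat`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${config.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request; fetchImpl is injectable so the call can be stubbed in tests.
async function sendChat(config, body, fetchImpl = fetch) {
  const { url, init } = buildChatRequest(config, body);
  const res = await fetchImpl(url, init);
  const payload = await res.json();
  if (!res.ok) {
    // Error bodies are JSON with `error` and `message` fields.
    throw new Error(`${res.status} ${payload.error}: ${payload.message}`);
  }
  return payload;
}
```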

### Path 3 — Anthropic-style wrapper

```
POST https://killtoken.io/v1/anthropic/messages
```

Accepts an Anthropic Messages-shaped body (top-level `system` string + `messages`); routes through `provider: "anthropic"`. Same optional headers as the other paths. No streaming, no OpenAI-style tool payloads.
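
If the codebase holds OpenAI-style message arrays, one way to produce the shape described above is to lift system-role messages into the top-level `system` string. The helper `toAnthropicShape` is hypothetical, and joining multiple system messages with blank lines is an assumption of this sketch rather than documented gateway behavior.

```js
// Hypothetical helper: moves system-role messages into a top-level `system`
// string and leaves the remaining turns in `messages`.
function toAnthropicShape(openAiMessages) {
  const systemParts = [];
  const messages = [];
  for (const msg of openAiMessages) {
    if (msg.role === "system") {
      systemParts.push(msg.content);
    } else {
      messages.push({ role: msg.role, content: msg.content });
    }
  }
  const shaped = { messages };
  // Only include `system` when there is something to send.
  if (systemParts.length > 0) {
    shaped.system = systemParts.join("\n\n");
  }
  return shaped;
}
```

The result would be spread into the request body next to `model`. Anthropic's own Messages API also requires `max_tokens`; whether this wrapper requires it is not stated in this guide.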

### Path 4 — First-party SDK (Node backends)

```js
import { KillTokenClient } from "@killtoken/sdk";
const client = new KillTokenClient({
  baseUrl: process.env.KILLTOKEN_BASE_URL,  // https://killtoken.io
  apiKey: process.env.KILLTOKEN_API_KEY
});
const { response, metrics } = await client.chat({
  provider: "openai",
  model: "gpt-4.1-mini",
  messages: [{ role: "user", content: "Hello" }],
  metadata: { feature: "my-feature-name" },
  cachePolicy: { exactCache: "read_write" },
  idempotencyKey: "req-123"
});
```

## Request options (all paths)

### Optimization mode

Set the mode with the `optimizationMode` body field or the `x-killtoken-optimization-mode` header. One of:

- `measure_only` (default) — passes prompts through untouched; measures tokens/cost. Start here.
- `safe` — conservative prompt optimizations only.
- `balanced` — moderate optimizations.
- `aggressive` — maximum reduction.
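
One way to roll modes out gradually is a per-feature map that defaults to `measure_only`. This is a sketch: `optimizationHeaders` and the `modeByFeature` map are hypothetical names, and only the header name and the four mode values come from this guide.

```js
const OPTIMIZATION_MODES = ["measure_only", "safe", "balanced", "aggressive"];

// Hypothetical helper: returns the header for a feature's configured mode,
// falling back to measure_only and rejecting values not listed above.
function optimizationHeaders(feature, modeByFeature = {}) {
  const mode = modeByFeature[feature] ?? "measure_only";
  if (!OPTIMIZATION_MODES.includes(mode)) {
    throw new Error(`unknown optimization mode: ${mode}`);
  }
  return { "x-killtoken-optimization-mode": mode };
}
```

For example, `optimizationHeaders("weekly-summary", { "weekly-summary": "safe" })` moves one feature to `safe` while every other feature stays on `measure_only`.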

### Feature tags (powers the per-feature savings breakdown)

Tag each request with the part of your product it serves. Two mechanisms:

1. `x-killtoken-feature` request header — set it once as a default header on your client and every call from that client is tagged. Best when one client instance serves one feature (e.g. one per service/module).
2. `metadata.feature` in the request body — per-request control. If both are present, the body value wins.

Rules: values are trimmed and capped at 80 characters; untagged traffic groups under "(untagged)" on the dashboard. Use stable, kebab-case names (`content-moderation`, `weekly-summary`).
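
The tagging rules can be mirrored client-side so the names you send match what the dashboard will show. This is a sketch with hypothetical helper names; treating an empty body value as absent (so the header applies) is an assumption, not documented behavior.

```js
// Mirrors the documented rules: trim, then cap at 80 characters.
function normalizeFeatureTag(value) {
  if (typeof value !== "string") return undefined;
  const trimmed = value.trim().slice(0, 80);
  return trimmed === "" ? undefined : trimmed;
}

// Body value wins over the header; no tag at all lands in "(untagged)".
function effectiveFeatureTag({ headerValue, bodyValue }) {
  return (
    normalizeFeatureTag(bodyValue) ??
    normalizeFeatureTag(headerValue) ??
    "(untagged)"
  );
}

// Checks the recommended naming style, e.g. "content-moderation".
function isKebabCase(tag) {
  return /^[a-z0-9]+(-[a-z0-9]+)*$/.test(tag);
}
```

A check like `isKebabCase` fits well in a unit test over the list of feature names an app uses, so renamed or misspelled tags are caught before they split the dashboard breakdown.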

Tags are analytics labels, not isolation: every request still belongs to one tenant, one provider credential, one quota, one bill. Tags answer "which part of my app costs what"; tenants answer "whose key and whose bill". If you serve multiple end-customers who each bring their own provider key, give each one its own KillToken tenant — do not encode customers as tags.

### Caching

Off by default. Enable per request on repeat-safe calls (identical prompts where reusing the response is correct):

```json
"cachePolicy": { "exactCache": "read_write", "ttlSeconds": 3600 }
```

`exactCache` modes: `bypass` (default), `read_only`, `write_only`, `read_write`. `ttlSeconds` is optional and must be a positive number. Cache hits cost $0 at the provider and are reported as verified savings. The dashboard's "Cache opportunity" panel shows how much money repeat traffic would save before you enable it.
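
A small builder can catch an invalid policy locally instead of waiting for a `400 invalid_cache_policy`. The helper name `buildCachePolicy` is hypothetical; the mode names and the positive-number rule come from the text above, and rejecting non-finite values is an extra assumption of this sketch.

```js
const EXACT_CACHE_MODES = ["bypass", "read_only", "write_only", "read_write"];

// Hypothetical helper: validates and returns a cachePolicy object.
function buildCachePolicy(exactCache = "bypass", ttlSeconds) {
  if (!EXACT_CACHE_MODES.includes(exactCache)) {
    throw new Error(`unknown exactCache mode: ${exactCache}`);
  }
  const policy = { exactCache };
  if (ttlSeconds !== undefined) {
    if (typeof ttlSeconds !== "number" || !Number.isFinite(ttlSeconds) || ttlSeconds <= 0) {
      throw new Error("ttlSeconds must be a positive number");
    }
    policy.ttlSeconds = ttlSeconds;
  }
  return policy;
}
```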

### Idempotency

`idempotencyKey` body field or `x-killtoken-idempotency-key` header. Replays of the same key return the original response without re-calling the provider and without counting against your plan.
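
The key is only useful if every retry of the same logical request reuses it. A sketch under that assumption: `idempotencyKeyFor` and `sendWithRetry` are hypothetical names, and `send` stands for whatever function posts the body to the gateway. This version retries on any error for brevity; a real caller would retry only transient failures.

```js
// Hypothetical helper: derives a stable key from identifiers the job already has.
function idempotencyKeyFor(jobName, recordId) {
  return `${jobName}:${recordId}`;
}

// Retries `send`, reusing one key across attempts so a replay returns the
// original response instead of triggering a second provider call.
async function sendWithRetry(send, body, idempotencyKey, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send({ ...body, idempotencyKey });
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

Deriving the key from stable business identifiers, rather than generating a random value per attempt, is what makes a retry after a timeout or a process restart count as a replay.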

## Response shape

Success responses include the provider response plus a `killtoken.metrics` object (request id, token counts, estimated/verified savings, cache status, latency). Two response headers accompany every gateway call: `x-killtoken-request-id` and `x-killtoken-metrics` (JSON).
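
A sketch of reading the two response headers from a `fetch` response. The helper name is hypothetical, and because this guide does not list the field names inside the metrics JSON, the parsed object is returned as-is rather than destructured.

```js
// Reads the two documented headers from any object with a get(name) method,
// such as the `headers` of a fetch Response.
function readGatewayHeaders(headers) {
  const requestId = headers.get("x-killtoken-request-id") ?? null;
  const raw = headers.get("x-killtoken-metrics");
  let metrics = null;
  if (raw) {
    try {
      metrics = JSON.parse(raw);
    } catch {
      // Leave metrics as null if the header is not valid JSON.
      metrics = null;
    }
  }
  return { requestId, metrics };
}
```

Logging `requestId` alongside your own trace id makes it straightforward to look up a specific call later via `GET /v1/requests/:requestId`.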

## Error contract

| Status | `error` code | Meaning / fix |
| --- | --- | --- |
| 400 | `provider_credential_required` | No stored credential for the requested provider. Add one in the dashboard or via the credentials API. |
| 400 | `invalid_model` / `invalid_messages` / `invalid_cache_policy` / `invalid_optimization_mode` | Malformed request field; message explains which. |
| 401 | `tenant_not_resolved` | Missing/invalid `Authorization: Bearer kt_...`. |
| 402 | `plan_limit_exceeded` | Monthly gateway-request quota exhausted. Headers: `x-killtoken-plan`, `x-killtoken-plan-limit`, `x-killtoken-plan-used`, `x-killtoken-plan-reset`. Upgrade at https://killtoken.io/pricing. Idempotent replays still work at the limit. |
| 403 | `tenant_suspended` | Workspace suspended; contact support@killtoken.io. |
| 422 | `streaming_not_supported` | Remove `stream: true`. |
| 429 | `rate_limited` | Abuse guardrail (default 120 chat requests/min per tenant). Back off and retry; this is separate from the monthly plan quota. |
| 501 | `not_implemented` | Endpoint scaffolded for a future phase (e.g. OpenAI Responses API). |

Error bodies are JSON: `{ "error": "<code>", "message": "<human explanation>" }`.
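
The table above can be turned into a coarse client-side decision. This is a sketch: the action labels and function names are hypothetical, and treating unlisted 5xx responses as retryable is an assumption rather than part of the contract.

```js
// Maps a gateway error to a coarse action, based on the error table.
function classifyGatewayError(status, body = {}) {
  switch (status) {
    case 429:
      return "retry_with_backoff";
    case 402:
      return "quota_exhausted";
    case 401:
    case 403:
      return "fix_credentials_or_account";
    case 400:
      return body.error === "provider_credential_required"
        ? "add_provider_credential"
        : "fix_request";
    case 422:
      return "fix_request";
    case 501:
      return "use_direct_provider";
    default:
      // ASSUMPTION: other 5xx responses are transient.
      return status >= 500 ? "retry_with_backoff" : "fix_request";
  }
}

// Exponential backoff delay in milliseconds, capped; attempt starts at 1.
function backoffDelayMs(attempt, baseMs = 500, capMs = 8000) {
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}
```

Only `retry_with_backoff` should loop; the other actions need a configuration or code change, so retrying them just spends rate-limit budget.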

## Other API endpoints (same Bearer auth)

- Provider credentials: `POST /v1/provider-credentials`, `GET /v1/provider-credentials`, `PATCH /v1/provider-credentials/:id`, `DELETE /v1/provider-credentials/:id`, `POST /v1/provider-credentials/:id/test` (live round-trip test).
- Request history: `GET /v1/requests`, `GET /v1/requests/:requestId`.
- Analytics: `GET /v1/analytics/summary` (totals, savings, per-feature breakdown, cache opportunity).
- Exports: `GET /v1/exports/requests.csv`, `GET /v1/exports/analytics.json`, `GET /v1/reports/roi`.
- Prompt templates: `POST/GET/PATCH /v1/templates`.
- Health (no auth): `GET /health`, `GET /ready`.

## Recommended integration checklist for an AI agent

1. Confirm a KillToken tenant API key exists (ask the operator; it starts with `kt_`). Put it in `KILLTOKEN_API_KEY`.
2. Confirm a provider credential is stored in KillToken for each provider the app uses (the operator does this in the dashboard, or via `POST /v1/provider-credentials`). The app's own provider key stays where it is for anything KillToken does not proxy (Whisper, embeddings, streaming).
3. Route chat-completion traffic through KillToken via the path that matches the codebase (OpenAI SDK drop-in is usually the smallest diff). Make the routing conditional on `KILLTOKEN_API_KEY` being set, falling back to the direct provider client, so the integration is reversible by environment alone.
4. Keep streaming, audio, embeddings, and Responses-API calls on direct provider clients.
5. Tag every client with `x-killtoken-feature` using a stable kebab-case name per app feature.
6. Start in `measure_only`. After traffic accumulates, check the dashboard's Cache opportunity panel and enable `cachePolicy: { "exactCache": "read_write" }` on repeat-safe call sites.
7. Verify: make one request, confirm a 200 with a `killtoken.metrics` object and an `x-killtoken-request-id` header, and confirm the request appears on the dashboard (https://killtoken.io/dashboard) under the right feature name.
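
Step 3's environment-controlled routing can be sketched as a function that returns the options for `new OpenAI(...)`. The helper name is hypothetical, and `OPENAI_API_KEY` is assumed to be where the app already keeps its direct provider key.

```js
// Hypothetical helper: returns constructor options for the OpenAI SDK client.
// With KILLTOKEN_API_KEY set, traffic goes through the gateway; without it,
// the app falls back to the direct provider client.
function resolveOpenAIClientOptions(env, feature) {
  if (env.KILLTOKEN_API_KEY) {
    return {
      apiKey: env.KILLTOKEN_API_KEY, // kt_..., not sk-...
      baseURL: "https://killtoken.io/v1/openai",
      defaultHeaders: { "x-killtoken-feature": feature },
    };
  }
  // ASSUMPTION: the app's own provider key lives in OPENAI_API_KEY.
  return { apiKey: env.OPENAI_API_KEY };
}
```

A call site would then read `new OpenAI(resolveOpenAIClientOptions(process.env, "weekly-summary"))`, and removing `KILLTOKEN_API_KEY` from the environment reverts the integration without a code change.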

## Support

support@killtoken.io · https://killtoken.io/contact
KillToken™ is operated as a DBA of LenderFuel.Media, LLC · Tucson, Arizona, USA