LLM & Image API

Chat, reasoning, and image generation through a single OpenAI-compatible API. Claude, GPT-5, Gemini, Grok, DeepSeek, Nano Banana, and more — one key, one balance.

Overview

GPUniq provides a single API surface for 90+ language models and image generators from Anthropic, OpenAI, Google, xAI, DeepSeek, MiniMax, and others. One API key, one balance, and one usage dashboard cover chat completions, reasoning, text-to-image, and video generation.

You can use GPUniq LLMs two ways:

  1. Native GPUniq API (/v1/llm/*) — wrapped responses, persistent chat sessions, terminal-command generator, SDK helpers.
  2. OpenAI-compatible API (/v1/openai/*) — drop-in replacement for api.openai.com/v1. Works with Claude Code, Cursor, Continue.dev, Aider, LiteLLM, and the official OpenAI Python/JS SDKs without code changes.

Available Models

Chat & Reasoning

| Provider | Models | Best for |
|---|---|---|
| Anthropic | Claude Opus 4.7 / 4.6 / 4.5, Sonnet 4.6 / 4.5, Haiku 4.5 | General reasoning, coding, agents |
| OpenAI | GPT-5.5, GPT-5.2 Pro / Codex, GPT-5, o3, o3-mini, GPT-4o, GPT-4.1 | Reasoning, structured output, vision |
| Google | Gemini 3 Pro / Flash, Gemini 2.5 | Long context, fast batch work |
| xAI | Grok 4, Grok 4.1 Thinking, Grok 4 Fast | Real-time knowledge, low latency |
| DeepSeek | V4 Pro / V4 Flash, V3.2 / V3.2 Thinking, V3.1 / Terminus, R1 / R1 (May 2025), Reasoner, Chat, OCR | Cost-efficient reasoning, OCR, conversational |
| MiniMax | M2.7 / M2.5 / M2.1 / M2 | Long-context Chinese & multilingual, balanced cost |

DeepSeek pricing (USD per 1M tokens, already discounted −20%)

| Slug | Input | Output | Category | Notes |
|---|---|---|---|---|
| deepseek-v4-pro | $2.40 | $4.00 | flagship | V4 family flagship |
| deepseek-v4-flash | $0.18 | $0.30 | fast | V4 family fast tier |
| deepseek-v3.2 | $2.16 | $3.24 | flagship | Latest flagship general model |
| deepseek-v3.2-thinking | $0.30 | $0.45 | reasoning | Reasoning-tuned V3.2 (very cheap) |
| deepseek-v3.1 | $4.32 | $12.96 | flagship | Previous flagship |
| deepseek-v3.1-terminus | $0.15 | $0.30 | balanced | Updated V3.1, very cheap |
| deepseek-v3 | $2.16 | $8.64 | balanced | Original V3 (Dec 2024) |
| deepseek-r1 | $4.32 | $17.28 | reasoning | First reasoning model |
| deepseek-r1-0528 | $0.59 | $1.81 | reasoning | Updated R1, May 2025 |
| deepseek-reasoner | $0.30 | $0.45 | reasoning | Reasoning-focused alias |
| deepseek-chat | $0.29 | $1.17 | balanced | Conversational alias |
| deepseek-ocr | $0.23 | $0.23 | fast | OCR model |

MiniMax pricing (USD per 1M tokens)

| Slug | Input | Output | Category | Public discount |
|---|---|---|---|---|
| MiniMax-M2.7 | $0.33 | $1.32 | balanced | −10% off API |
| MiniMax-M2.5 | $0.33 | $1.32 | balanced | −10% off API |
| MiniMax-M2.1 | $0.297 | $1.188 | balanced | −20% off API |
| MiniMax-M2 | $2.079 | $8.316 | flagship | −20% off API |

Image Generation

Image models are billed per returned image, not per token.

| Model | Slug | Price / image | Notes |
|---|---|---|---|
| Nano Banana | nano-banana | $0.0312 | Fast text-to-image & image-to-image, 1K |
| Nano Banana 2 | nano-banana-2 | $0.0500 | Quality-value generation up to 2K |
| Nano Banana Pro | nano-banana-pro | $0.1072 | Higher quality, ~1K resolution |
| Nano Banana Pro 4K | nano-banana-pro-4k | $0.192 | 4K resolution |
| Grok 4 Image | grok-4-image | $0.0352 | xAI image generator |
| GPT Image 2 | gpt-image-2 | $0.0464 | OpenAI image, default 1K |
| GPT Image 1.5 | gpt-image-1-5 | $0.020 | OpenAI image (cheaper tier) |
| GPT-4o Image | gpt-4o-image | $0.040 | OpenAI 4o image |
| FLUX.2 Pro | flux-2-pro | $0.060 | Black Forest Labs FLUX.2 Pro 1K |
| FLUX.2 Flex | flux-2-flex | $0.180 | Premium quality 1K |
| Flux Kontext Pro | flux-kontext-pro | $0.080 | Text-to-image & edit |
| Flux Kontext Max | flux-kontext-max | $0.160 | Premium edit / generation |
| Seedream 4 | seedream-4 | $0.050 | ByteDance Seedream 4 |
| Seedream 4.5 | seedream-4-5 | $0.040 | ByteDance Seedream 4.5 |
| Seedream 5.0 Lite | seedream-5-0-lite | $0.035 | ByteDance Seedream 5.0 Lite |
| Z-Image | z-image | $0.020 | Alibaba Z-Image |

The synchronous POST /v1/llm/images/generations (and its OpenAI-compat twin POST /v1/openai/images/generations) holds the connection open for the full 5-minute upstream budget, which is plenty for every model in the catalog. If you sit behind a CDN with a strict idle-read limit (Cloudflare's free tier caps responses at ~100 s) and call from a browser, prefer the job-based API below.

Job-based image generation

POST /v1/llm/images/jobs returns a job_id in under a second, and you poll GET /v1/llm/images/jobs/{job_id} every 2-3 seconds until the status is terminal. You are charged only when the completion poll returns; a timed-out or failed job costs nothing. Server-side, polls that arrive within 2 seconds of each other are coalesced via Redis, so hammering the endpoint will not be billed as repeated upstream calls.

import time, requests

BASE = "https://api.gpuniq.com/v1/llm"
HEADERS = {"X-API-Key": "gpuniq_your_key"}

# 1. Kickoff
start = requests.post(
    f"{BASE}/images/jobs",
    headers=HEADERS,
    json={"model": "nano-banana-pro", "prompt": "a cozy cabin at sunrise", "n": 1},
).json()
job_id = start["data"]["job_id"]

# 2. Poll — 5-minute budget covers the slowest Pro / 4K runs
deadline = time.time() + 300
while time.time() < deadline:
    time.sleep(2.5)
    r = requests.get(f"{BASE}/images/jobs/{job_id}", headers=HEADERS).json()
    d = r["data"]
    if d["status"] == "completed":
        image_b64 = d["image"]["b64_json"]
        print(f"Cost: ${d['cost_usd']}, balance: ${d['balance_usd']}")
        break
    if d["status"] == "failed":
        print("failed:", d.get("error"))
        break

Only Nano Banana slugs are accepted on this surface. n must be 1 — issue separate jobs in parallel for batches.
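
Because each job yields a single image, a batch has to be fanned out client-side. The sketch below shows one possible way to do that with a thread pool. `generate_batch` and `run_job` are hypothetical names: `run_job` stands in for the kickoff-and-poll loop shown above (it takes a prompt and returns the finished job's data dict) and is not part of any GPUniq SDK.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def generate_batch(
    prompts: List[str],
    run_job: Callable[[str], Dict],
    max_workers: int = 4,
) -> List[Dict]:
    """Run one image job per prompt in parallel; results keep prompt order.

    `run_job` is a hypothetical helper wrapping the kickoff + poll loop
    (POST /images/jobs, then GET /images/jobs/{job_id}) for a single prompt.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order even if jobs finish out of order.
        return list(pool.map(run_job, prompts))


def fake_run_job(prompt: str) -> Dict:
    # Stand-in for the demo; a real implementation would call the API.
    return {"status": "completed", "prompt": prompt}


results = generate_batch(["a red fox", "a blue whale"], fake_run_job)
print([r["prompt"] for r in results])
```

With the 2.5-second poll interval used above, each in-flight job issues roughly 24 polls per minute, so four workers should stay below the 120 req/min per-key limit.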

Generating an image inside a chat session

When you want the image to appear as a turn in an existing chat (so the prompt and result both land in the chat history), POST to /v1/llm/chats/{chat_id}/messages with an image model. The response returns immediately with type: "image_pending" plus a job_id and the dialogue_id of a placeholder row that already lives in the chat history. Poll GET /v1/llm/chats/{chat_id}/image-jobs/{job_id} until the status is completed (placeholder is rewritten with the image and the balance is debited) or failed (placeholder is marked, nothing charged). The polling endpoint 404s once the job is terminal — the final dialogue is the source of truth from then on.

import time, requests

BASE = "https://api.gpuniq.com/v1/llm"
HEADERS = {"X-API-Key": "gpuniq_your_key"}
chat_id = 42  # existing chat created via POST /v1/llm/chats

# 1. Kickoff (POST /chats/{id}/messages with an image model)
start = requests.post(
    f"{BASE}/chats/{chat_id}/messages",
    headers=HEADERS,
    json={"model": "nano-banana-pro", "message": "a cozy cabin at sunrise"},
).json()
job_id = start["data"]["job_id"]
dialogue_id = start["data"]["dialogue_id"]

# 2. Poll — same 5-minute budget as the standalone /images/jobs flow
deadline = time.time() + 300
while time.time() < deadline:
    time.sleep(2.5)
    r = requests.get(
        f"{BASE}/chats/{chat_id}/image-jobs/{job_id}", headers=HEADERS,
    ).json()
    d = r["data"]
    if d["status"] == "completed":
        image_b64 = d["image"]["b64_json"]
        print(f"Cost: ${d['cost_usd']}, balance: ${d['balance_usd']}")
        break
    if d["status"] == "failed":
        print("failed:", d.get("error"))
        break

Use this surface when the image should be part of a multi-turn conversation. Use the standalone /images/jobs surface when you don't need persistence — it has the same job semantics without creating a chat row.
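
Since the chat polling endpoint answers 404 once the job is terminal, a poller that is restarted or retried late may want to treat 404 as "already finished" rather than as an error. A minimal sketch of that decision follows. `poll_chat_image_job` and `fetch` are hypothetical names; `fetch` is assumed to return an (HTTP status code, parsed JSON body) pair for GET /v1/llm/chats/{chat_id}/image-jobs/{job_id}.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def poll_chat_image_job(
    fetch: Callable[[], Tuple[int, Dict]],
    budget_s: float = 300.0,
    interval_s: float = 2.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, Optional[Dict]]:
    """Poll until the job is terminal.

    Returns (outcome, data) where outcome is one of:
      "completed" / "failed" : terminal status seen in the poll response
      "gone"                 : endpoint answered 404; read the chat history
      "timeout"              : budget exhausted without a terminal status
    """
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        status_code, body = fetch()
        if status_code == 404:
            # Job is already terminal: the dialogue row is the source of truth.
            return "gone", None
        data = body.get("data", {})
        if data.get("status") in ("completed", "failed"):
            return data["status"], data
        sleep(interval_s)
    return "timeout", None


# Demo with canned responses instead of real HTTP calls.
canned = iter([
    (200, {"data": {"status": "pending"}}),
    (200, {"data": {"status": "completed", "cost_usd": 0.1072}}),
])
outcome, data = poll_chat_image_job(lambda: next(canned), sleep=lambda s: None)
print(outcome, data)
```

On a "gone" outcome the caller would fetch the chat itself and read the dialogue identified by dialogue_id.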

Video Generation

Video models are billed per delivered video, not per token. Every generation is asynchronous — POST to /v1/llm/videos/jobs to kick off a job, then poll GET /v1/llm/videos/jobs/{job_id} until the status is terminal. You are charged only when the completion poll returns a video.url — a failed or timed-out job costs nothing.

| Family | Slug | Headline price / video | Notes |
|---|---|---|---|
| OpenAI Sora 2 | sora-2-video | $0.060 | Sora 2, default 10s |
| OpenAI Sora 2 Pro | sora-2-pro-video | $1.000 | Premium quality, 10s |
| Sora 2 Official | sora-2-official | $0.480 | 8s, official API |
| Sora 2 Pro Official | sora-2-pro-official | $0.560 | 8s 1080p, official API |
| Google Veo 3.1 Lite | veo-3-1-lite | $0.100 | 720p / 1080p |
| Google Veo 3.1 Fast | veo-3-1-fast | $0.200 | Balanced 720p / 1080p |
| Google Veo 3.1 Quality | veo-3-1-quality | $1.200 | Flagship Google video |
| Kling 2.1 Pro | kling-2-1 | $0.405 | Standard / Pro / Master tiers, 5s or 10s, i2v |
| Kling 2.5 Turbo Pro | kling-2-5-turbo-pro | $0.315 | 5s or 10s, t2v / i2v |
| Kling 2.6 | kling-2-6 | $0.315 | Optional audio, 5s or 10s, t2v / i2v |
| Kling 3.0 | kling-3-0 | $0.504 | 720p / 1080p / 4K, audio, multi-shot to 15s |
| Kling O3 (Video) | kling-o3-video | $0.150 | Premium audio video, 5s |
| Kling 2.6 Motion Control | kling-2-6-motion-control | $0.504 | 720p / 1080p video-to-video |
| Kling 3.0 Motion Control | kling-3-0-motion-control | $0.756 | 720p / 1080p video-to-video |
| Kling AI Avatar Pro | kling-avatar-pro | $1.035 | 1080p lip-sync, up to 15s |
| Kling AI Avatar Standard | kling-avatar-standard | $0.506 | 720p lip-sync, up to 15s |
| Hailuo 02 | hailuo-02 | $0.200 | 768p, 6s default |
| Hailuo 2.3 | hailuo-2-3 | $0.350 | 768p 6s |
| Seedance 1.0 Pro | seedance-1-0-pro | $0.210 | ByteDance 720p 5s |
| Seedance 1.5 Pro | seedance-1-5-pro | $0.160 | ByteDance 720p 5s |
| Seedance 2 | seedance-2 | $0.200 | ByteDance 720p 5s |
| Alibaba Wan 2.2 Fast | wan-2-2-fast | $0.120 | 720p fast tier |
| Alibaba Wan 2.5 | wan-2-5 | $0.600 | 720p 5s |
| Alibaba Wan 2.6 | wan-2-6 | $0.800 | 720p 5s flagship |
| Wan Animate | wan-animate | $0.150 | 720p animation |
| Happy Horse | happy-horse | $0.160 | 720p |
| Grok Imagine Video | grok-imagine-video | $0.300 | xAI video, 6s |
| Runway Gen-4.5 | runway-gen-4-5 | $0.750 | Runway flagship 5s |

Kling SKUs are billed at −10% off the official public price. The headline above is the cheapest default configuration (1080p / no audio / 5s / Pro tier). Audio, longer duration, 4K, and Master tier scale the price linearly off the underlying reference rate × 0.9 — the exact cost is returned in the cost_usd field of the completion response. A 10% margin floor against the upstream supplier guarantees we never bill below source cost, so on a provider fallback the price may rise by 1-3%.
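
As a rough illustration of how the headline price might scale, the sketch below estimates a Kling price from the headline figure. It rests on two assumptions that the pricing table does not spell out in full: that cost is proportional to duration relative to the 5-second headline configuration, and that audio doubles the price (the latter is stated for Kling 2.6 / 3.0). Resolution and tier multipliers are left out because they are not listed here. `estimate_kling_price` is a hypothetical helper; the billed amount is whatever cost_usd reports.

```python
def estimate_kling_price(
    headline_usd: float,
    duration_s: int = 5,
    audio: bool = False,
    base_duration_s: int = 5,
) -> float:
    """Rough client-side estimate; cost_usd in the response is authoritative.

    Assumptions (illustrative, not confirmed by the pricing table):
      - price is proportional to duration relative to the 5 s headline config
      - audio doubles the price (stated for Kling 2.6 / 3.0)
    """
    price = headline_usd * (duration_s / base_duration_s)
    if audio:
        price *= 2
    return round(price, 4)


# Kling 2.6 headline price is $0.315 (1080p, no audio, 5 s).
print(estimate_kling_price(0.315))
print(estimate_kling_price(0.315, duration_s=10, audio=True))
```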

Job-based video generation

Same kickoff-then-poll shape as the image-jobs API. The catalog covers text-to-video (t2v), image-to-video (i2v, pass image_url), and video-to-video / motion-control (v2v, pass video_url + image_url for the conditioning frame). Avatar SKUs accept an audio reference URL in the prompt body — see the model-specific docs for the schema.

import time, requests

BASE = "https://api.gpuniq.com/v1/llm"
HEADERS = {"X-API-Key": "gpuniq_your_key"}

# 1. Kickoff
start = requests.post(
    f"{BASE}/videos/jobs",
    headers=HEADERS,
    json={
        "model": "kling-2-6",
        "prompt": "A small black cat slowly turns toward the camera at golden hour",
        "duration": 5,
        "audio": False,          # opt-in, doubles price on Kling 2.6 / 3.0
        "resolution": "1080p",   # 720p | 1080p | 4k (where supported)
    },
).json()
job_id = start["data"]["job_id"]
print(f"job: {job_id}, est cost: ${start['data']['estimated_cost_usd']}")

# 2. Poll — video models deliver in 30-90s; budget 5 minutes for the slowest variants
deadline = time.time() + 300
while time.time() < deadline:
    time.sleep(3)
    r = requests.get(f"{BASE}/videos/jobs/{job_id}", headers=HEADERS).json()
    d = r["data"]
    if d["status"] == "completed":
        print(f"video: {d['video']['url']}")
        print(f"cost: ${d['cost_usd']}, balance: ${d['balance_usd']}")
        break
    if d["status"] == "failed":
        print("failed:", d.get("error"))
        break
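
Before kicking off a job it can help to sanity-check the payload client-side. The sketch below checks only rules stated in the Request body table (required fields, the 4000-character prompt limit, video_url for motion-control variants, boolean audio). `check_video_payload` is a hypothetical helper, and detecting motion-control models by a substring of the slug is an assumption made for illustration; the server remains the authority.

```python
from typing import Any, Dict, List

MAX_PROMPT_CHARS = 4000  # from the "prompt" row of the request-body table


def check_video_payload(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems: List[str] = []
    if not payload.get("model"):
        problems.append("model is required")
    prompt = payload.get("prompt")
    if not prompt:
        problems.append("prompt is required")
    elif len(prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    # Assumption: motion-control (video-to-video) slugs contain "motion-control".
    is_motion_control = "motion-control" in str(payload.get("model", ""))
    if is_motion_control and not payload.get("video_url"):
        problems.append("video_url is required for motion-control models")
    if "audio" in payload and not isinstance(payload["audio"], bool):
        problems.append("audio must be a boolean")
    return problems


print(check_video_payload({"model": "kling-2-6", "prompt": "a cat", "audio": False}))
print(check_video_payload({"model": "kling-2-6-motion-control", "prompt": "a cat"}))
```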

Request body

| Field | Type | Required | Notes |
|---|---|---|---|
| model | string | yes | Slug from the table above. |
| prompt | string | yes | Up to 4000 characters. |
| duration | int | no | Seconds; valid range depends on model (5 / 10 for most Kling, 1-15 for Avatar). |
| aspect_ratio | string | no | 16:9 (default), 9:16, 1:1 where supported. |
| image_url | string | no | https URL or data URI; enables image-to-video. |
| video_url | string | no | https URL; required for motion-control v2v variants. |
| resolution | string | no | 720p (default for some SKUs), 1080p (default for Kling), 4k (Kling 3.0 only). |
| audio | bool | no | Default false. Kling 2.6 / 3.0 double the price when true. |
| mode | string | no | standard / pro (default) / master for Kling 2.1; turbo for 2.5 Turbo Pro. |

Response

The kickoff returns immediately with the GPUniq job id, the resolved parameter snapshot, and the cost estimate. The Python example above reads these fields from the data key of the response wrapper.

// POST /v1/llm/videos/jobs
{
  "job_id": "vid_e93e98c7ca5e4982876b",
  "status": "pending",
  "model": "kling-2-6",
  "estimated_cost_usd": 0.315,
  "config": { "resolution": "1080p", "audio": false, "duration": 5, "task": "t2v", "mode": null }
}

// GET /v1/llm/videos/jobs/{job_id} — completed
{
  "job_id": "vid_e93e98c7ca5e4982876b",
  "status": "completed",
  "model": "kling-2-6",
  "video": { "url": "https://cdn.example.com/.../output.mp4" },
  "cost_usd": 0.315,
  "balance_usd": 9.17825791,
  "config": { "resolution": "1080p", "audio": false, "duration": 5, "task": "t2v", "mode": null }
}

The polling endpoint transparently falls back across internal routes if the first attempt fails — your job_id and the user-facing price stay stable across fallbacks. Internal route identifiers are deliberately omitted from the public response; they live only in admin/operator logs.

Chat models are sold at 20% below vendor list price.

Fetch the live catalog at any time:

# `client` is an already-initialized GPUniq SDK client
models = client.llm.models()
for model in models["models"]:
    print(model)

The default model is claude-haiku-4-5 — fast, cheap, strong at code.

Long generations & streaming

The edge proxy closes inbound connections after ~100 seconds of streaming silence. A non-streaming request asking for max_tokens > 4096 is therefore rejected up-front with HTTP 400 streaming_required, because buffered responses of that length routinely run past the cap. For long replies, set "stream": true or use the job-based long-poll API.

| Your request | What to do |
|---|---|
| ≤ 4096 output tokens, fast model | Plain POST /chat/completions works. |
| > 4096 output tokens OR slow / reasoning model | Set "stream": true. |
| Client can't speak SSE | Use POST /v1/llm/chat/jobs (long-poll). |

Reasoning models (Gemini 3 Pro, DeepSeek R1, o3, Claude Opus thinking) burn tokens on hidden chain-of-thought before the visible reply, so they need extra max_tokens headroom — see the Long generations guide for the full streaming / job-based / reasoning-token recipe.
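
The decision table above can be expressed as a small helper. This is only a sketch: `pick_transport` and its return labels are hypothetical names, and whether a model counts as "slow or reasoning" is left to the caller.

```python
NON_STREAMING_LIMIT = 4096  # max_tokens above this is rejected without streaming


def pick_transport(max_tokens: int, slow_or_reasoning: bool, client_speaks_sse: bool) -> str:
    """Map a request to "plain", "stream", or "job" following the table above."""
    needs_long_path = max_tokens > NON_STREAMING_LIMIT or slow_or_reasoning
    if not needs_long_path:
        return "plain"   # plain POST /chat/completions
    if client_speaks_sse:
        return "stream"  # set "stream": true
    return "job"         # POST /v1/llm/chat/jobs (long-poll)


print(pick_transport(1000, slow_or_reasoning=False, client_speaks_sse=True))
print(pick_transport(8000, slow_or_reasoning=False, client_speaks_sse=True))
print(pick_transport(8000, slow_or_reasoning=False, client_speaks_sse=False))
```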

Errors

Every failure returns a stable OpenAI error envelope with a structured code you can branch on — streaming_required, insufficient_balance, model_not_found, rate_limit_per_key, etc. See the Error reference for the complete catalog (29 codes), recovery strategies, and the native vs. OpenAI-compat envelope shapes.

{
  "error": {
    "message": "…human-readable description…",
    "type": "invalid_request_error",
    "code": "streaming_required",
    "doc_url": "https://docs.gpuniq.com/llm/long-generations",
    "meta": { "max_tokens": 8000, "limit": 4096 }
  },
  "status_code": 400,
  "request_id": "…"
}
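
Because the code field is stable, client code can branch on it directly. The sketch below maps the codes named above to a recovery action. The action labels and the `action_for` helper are illustrative assumptions, not taken from the Error reference; only the code names come from this page.

```python
from typing import Any, Dict

# Suggested reactions are illustrative assumptions; only the code names
# (streaming_required, rate_limit_per_key, ...) come from the docs.
ACTIONS = {
    "streaming_required": "retry_with_stream",
    "rate_limit_per_key": "backoff_and_retry",
    "insufficient_balance": "top_up",
    "model_not_found": "fix_model_slug",
}


def action_for(error_body: Dict[str, Any]) -> str:
    """Pick a recovery action from the structured error code."""
    code = (error_body.get("error") or {}).get("code")
    return ACTIONS.get(code, "raise")


sample = {
    "error": {
        "type": "invalid_request_error",
        "code": "streaming_required",
        "meta": {"max_tokens": 8000, "limit": 4096},
    },
    "status_code": 400,
}
print(action_for(sample))
```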

OpenAI-Compatible Endpoint

Point any OpenAI-compatible tool at GPUniq by setting two environment variables:

OPENAI_API_KEY=gpuniq_your_key
OPENAI_BASE_URL=https://api.gpuniq.com/v1/openai

Every field of the OpenAI Chat Completions protocol is forwarded unchanged: tools, tool_choice, response_format, logprobs, seed, stream, stream_options, etc.

Official OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key="gpuniq_your_key",
    base_url="https://api.gpuniq.com/v1/openai",
)

resp = client.chat.completions.create(
    model="claude-opus-4-7",
    messages=[{"role": "user", "content": "Write a binary search in Rust."}],
)
print(resp.choices[0].message.content)

Streaming

Set stream: true — GPUniq returns a text/event-stream with byte-identical OpenAI SSE framing:

stream = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Explain MoE in one paragraph."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Image Generation

Both API surfaces expose a /images/generations endpoint that matches OpenAI's images.generate protocol. Pass any image slug from the catalog above (e.g. nano-banana-pro, gpt-image-2, flux-2-pro, seedream-4). Billing is flat per returned image — no token accounting.

Image requests route through a multi-tier reliability chain behind the scenes: a per-model priority gateway, two cost-optimised intermediaries, then a generic OpenAI-compatible fallback for safety. The chain is selected automatically per slug, so SDK callers never pick a backend themselves. If the primary fails or returns no image, the next tier is tried within the same HTTP request — you still see one synchronous POST /images/generations and pay for delivered images only.

Heavy generations (Pro / 4K, multi-image batches, high-quality preset) can run up to 5 minutes end-to-end; the connection is held open for that whole budget so SDKs never need to re-poll. For interactive UIs that cannot keep an HTTP connection open that long, prefer the job-based API.

import base64

from openai import OpenAI

client = OpenAI(
    api_key="gpuniq_your_key",
    base_url="https://api.gpuniq.com/v1/openai",
)

resp = client.images.generate(
    model="nano-banana-pro",
    prompt="A cozy mountain cabin at sunrise, cinematic lighting",
    n=2,
    size="1024x1024",
    response_format="b64_json",
)

# Decode each returned image and write it to disk
for i, img in enumerate(resp.data):
    with open(f"out_{i}.png", "wb") as f:
        f.write(base64.b64decode(img.b64_json))

Parameters

body
model

Any image slug from the catalog: the Nano Banana family, grok-4-image, gpt-image-2, gpt-image-1-5, gpt-4o-image, flux-2-pro, flux-2-flex, flux-kontext-pro, flux-kontext-max, seedream-4, seedream-4-5, seedream-5-0-lite, or z-image.

body
prompt

Text description of the image you want. Up to 4000 characters.

body
n

Number of images to generate. 1–4.

body
size

Output resolution hint forwarded to the upstream, e.g. 1024x1024, 2048x2048, 4096x4096. Nano Banana Pro 4K defaults to 4096.

body
quality

Optional upstream quality hint (e.g. standard, hd). Models that don't recognise the value silently fall back to their default.

body
response_format

b64_json returns inline PNG base64 (browser-renderable). url returns a short-lived upstream URL.

body
output_format

Re-encode every delivered image into this format on the server before returning, so the client doesn't need a Pillow / Sharp pipeline. One of:

  • png (default if omitted) — pass-through, lossless.
  • jpeg (alias jpg) — ~10× smaller payload, alpha is flattened onto white because JPEG has no transparency.
  • webp — ~5× smaller at comparable quality, alpha preserved.

Quality for the lossy formats is fixed at 92 — visually indistinguishable from the source PNG. Conversion failures degrade to "return source PNG unchanged" so you always get an image, never a 502 after the upstream has done the expensive work. The MIME type of the converted bytes is echoed back in data[i].mime_type.

body
input_images

Optional reference photos for image-to-image / editing. Each entry is a data: URL, https:// URL, or bare base64 string. Supported by Nano Banana family, GPT Image, FLUX Kontext, Seedream and Nano Banana Pro edit slugs.

If the upstream returns fewer images than requested (content-policy rejects, partial failures, etc.), you are billed only for what was delivered.
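
Two of the points above (per-delivered-image billing and the echoed mime_type) can be combined into a small post-processing step. The sketch below is an assumption-laden illustration: `summarize_delivery` is a hypothetical helper, `items` is taken to be the response's data list as plain dicts, and entries without mime_type are treated as PNG.

```python
from typing import Dict, List

EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def summarize_delivery(items: List[Dict], requested: int, price_per_image: float) -> Dict:
    """Work out the expected charge and file names from a generations response.

    Only delivered images are counted, matching the per-delivered-image billing.
    The extension follows mime_type rather than the requested output_format,
    since a failed conversion falls back to the source PNG.
    """
    names = [
        f"out_{i}{EXTENSIONS.get(item.get('mime_type', 'image/png'), '.bin')}"
        for i, item in enumerate(items)
    ]
    return {
        "delivered": len(items),
        "missing": requested - len(items),
        "expected_cost_usd": round(len(items) * price_per_image, 4),
        "filenames": names,
    }


# Two images requested from nano-banana-pro ($0.1072 each), one delivered as WebP.
summary = summarize_delivery([{"mime_type": "image/webp"}], requested=2, price_per_image=0.1072)
print(summary)
```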

Claude Code

Claude Code can route through GPUniq via a LiteLLM proxy. Run LiteLLM locally as an Anthropic-compatible front-end for the GPUniq OpenAI endpoint:

# ~/litellm.yaml
model_list:
  - model_name: claude-opus-4-7
    litellm_params:
      model: openai/claude-opus-4-7
      api_base: https://api.gpuniq.com/v1/openai
      api_key: os.environ/GPUNIQ_API_KEY

# In a shell: start the LiteLLM proxy
export GPUNIQ_API_KEY=gpuniq_your_key
litellm --config ~/litellm.yaml --port 4000

# In another shell — point Claude Code at the proxy
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=sk-litellm-anything
claude

All tokens are billed against your GPUniq balance — no separate Anthropic account required.

Cursor

Settings → Models → Override OpenAI Base URL:

Base URL:  https://api.gpuniq.com/v1/openai
API Key:   gpuniq_your_key
Model:     claude-opus-4-7   # or any slug from /v1/openai/models

Continue.dev / Aider / LiteLLM

Any tool that accepts an OPENAI_BASE_URL works the same way:

export OPENAI_API_KEY=gpuniq_your_key
export OPENAI_BASE_URL=https://api.gpuniq.com/v1/openai

aider --model claude-sonnet-4-6

The OpenAI-compat endpoint returns raw OpenAI response objects (not wrapped in GPUniq's ResponseSchema). Errors use OpenAI's {"error": {"message", "type", "code"}} envelope so SDK retry logic works unchanged.

Native GPUniq SDK

For the fullest feature set — persistent chat sessions, USD balance conversion, usage history — use the native API.

Simple Chat

response = client.llm.chat("claude-haiku-4-5", "Explain how transformers work")
print(response)

Chat Completion (Full)

data = client.llm.chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "What is gradient descent?"},
    ],
    model="claude-sonnet-4-6",
    temperature=0.7,
    max_tokens=1000,
    top_p=0.9,
)

print(data["content"])
print(f"Tokens used: {data['tokens_used']}  cost: ${data['cost_usd']:.6f}")

Parameters

body
messages

List of message objects with role ("system", "user", "assistant") and content.

body
model

Model slug (e.g., claude-opus-4-7, gpt-5.2, gemini-3-pro). Defaults to claude-haiku-4-5.

body
max_tokens

Maximum tokens in the response.

body
temperature

Sampling temperature (0.0-2.0). Higher = more creative.

body
top_p

Top-p nucleus sampling parameter.

Account Balance

Chat and image requests are billed directly against your GPUniq account balance in USD — there is no separate "token pool" anymore. Each call deducts the model's blended retail rate × the tokens it actually consumed (or per-image flat rate for image models). Prepaid token packages and ruble-to-token conversions are no longer required and the corresponding endpoints have been retired.

balance = client.llm.balance()
print(f"Available: ${balance['balance_usd']:.4f} USD")

Top up the balance from the web dashboard → Billing (Stripe / YooKassa / crypto). The balance is shared with every other GPUniq surface — GPU rentals, volume storage, image generations — so a single deposit covers the whole platform.
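
To get a feel for the per-token deduction, the charge for one call can be approximated from the per-1M-token rates in the pricing tables. This is a sketch under stated assumptions: it uses separate input and output rates and ignores cached or reasoning tokens, so the "blended retail rate" actually applied may differ. `estimate_chat_cost` is a hypothetical helper, and the cost_usd returned by the API is authoritative.

```python
def estimate_chat_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_usd_per_1m: float,
    output_usd_per_1m: float,
) -> float:
    """Estimate the USD charge for one call from per-1M-token rates."""
    cost = (
        prompt_tokens * input_usd_per_1m + completion_tokens * output_usd_per_1m
    ) / 1_000_000
    return round(cost, 6)


# deepseek-v4-flash rates from the pricing table: $0.18 in / $0.30 out per 1M tokens.
print(estimate_chat_cost(12_000, 3_000, 0.18, 0.30))
```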

Usage History

Per-request detail with prompt / completion / cached / reasoning tokens and the USD cost charged at retail. Backed by the /v1/llm/usage/history endpoint; pair it with /v1/llm/usage/breakdown for daily / weekly aggregates.

history = client.llm.usage_history(limit=50, offset=0)
for log in history["logs"]:
    print(f"{log['model']}: {log['total_tokens']} tokens — ${log['cost_usd']:.6f}")
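
The per-request logs can also be rolled up client-side, for example to see which model drives spend. A minimal sketch, assuming each log entry carries the model, total_tokens, and cost_usd fields used in the loop above; `cost_by_model` is a hypothetical helper and the sample entries are made up for the demo.

```python
from collections import defaultdict
from typing import Dict, Iterable


def cost_by_model(logs: Iterable[Dict]) -> Dict[str, Dict[str, float]]:
    """Sum tokens and USD cost per model from usage-history log entries."""
    totals = defaultdict(lambda: {"total_tokens": 0, "cost_usd": 0.0})
    for log in logs:
        bucket = totals[log["model"]]
        bucket["total_tokens"] += log["total_tokens"]
        bucket["cost_usd"] += log["cost_usd"]
    return dict(totals)


# Made-up entries shaped like history["logs"].
sample_logs = [
    {"model": "claude-haiku-4-5", "total_tokens": 1200, "cost_usd": 0.5},
    {"model": "deepseek-chat", "total_tokens": 800, "cost_usd": 0.125},
    {"model": "claude-haiku-4-5", "total_tokens": 300, "cost_usd": 0.25},
]
totals = cost_by_model(sample_logs)
for model, t in totals.items():
    print(f"{model}: {t['total_tokens']} tokens, ${t['cost_usd']:.6f}")
```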

Chat Sessions

Persistent conversations stored server-side — the model sees the full history on every call:

# Create a session
session = client.llm.create_chat_session(
    model="claude-sonnet-4-6",
    title="Research Assistant",
)

# Send messages within the session
reply = client.llm.send_message(
    chat_id=session["id"],
    message="What are the key papers on attention mechanisms?",
    temperature=0.5,
)

# List all sessions
sessions = client.llm.list_chat_sessions(limit=50)

# Get a session with full message history
full = client.llm.get_chat_session(chat_id=session["id"])

# Update title
client.llm.update_chat_session(chat_id=session["id"], title="New Title")

# Delete
client.llm.delete_chat_session(chat_id=session["id"])

Generate Terminal Commands

Convert natural language to a ranked list of shell commands with danger annotations:

cmds = client.llm.generate_commands(
    prompt="find all Python files larger than 1MB and sort by size",
    max_commands=5,
)
for c in cmds["commands"]:
    print(f"[{c['danger']}] {c['command']}  # {c['description']}")

API Key Management

API keys are created from the web dashboard (LLM API Keys) and sent as Authorization: Bearer gpuniq_... on OpenAI-compat routes, or X-API-Key: gpuniq_... on native routes.
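
Since the two surfaces expect different headers, a tiny helper can keep them from being mixed up. This is a sketch only; `auth_headers` and the "native" / "openai" surface labels are hypothetical names chosen for illustration.

```python
from typing import Dict


def auth_headers(api_key: str, surface: str) -> Dict[str, str]:
    """Build the auth header for the "native" or "openai" surface."""
    if surface == "openai":
        # OpenAI-compat routes (/v1/openai/*)
        return {"Authorization": f"Bearer {api_key}"}
    if surface == "native":
        # Native routes (/v1/llm/*)
        return {"X-API-Key": api_key}
    raise ValueError(f"unknown surface: {surface!r}")


print(auth_headers("gpuniq_your_key", "native"))
print(auth_headers("gpuniq_your_key", "openai"))
```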

Rate limit: 120 req/min per key, sliding window.
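
A client that polls several jobs at once can reach this limit quickly, so a client-side guard may be useful. The sketch below mirrors a sliding window of 120 requests per 60 seconds. `SlidingWindowThrottle` is a hypothetical class, and the server's exact window accounting is not documented here, so treat it as an approximation rather than a guarantee against rate_limit_per_key errors.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowThrottle:
    """Client-side guard that mirrors an "N requests per window" sliding limit."""

    def __init__(
        self,
        limit: int = 120,
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._sent: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        return self.window_s - (now - self._sent[0])

    def record(self) -> None:
        """Call right after sending a request."""
        self._sent.append(self.clock())


# Demo with a fake clock and a tiny limit so the behaviour is easy to follow.
fake_now = [0.0]
throttle = SlidingWindowThrottle(limit=2, window_s=60.0, clock=lambda: fake_now[0])
throttle.record()
throttle.record()
print(throttle.wait_time())  # window is full, so a wait is required
fake_now[0] = 60.0
print(throttle.wait_time())  # both timestamps have expired
```

In real use the caller would sleep for wait_time() seconds before each request and call record() after sending it.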