
Changelog

Track the latest updates and improvements to GPUniq.

Follow us on Telegram for real-time announcements.

2026-05-14 Kling video: multi-provider auto-fallback, parametric −10% off official pricing, Avatar SKUs
Tags: feature, video, kling, pricing

Production Kling video — cheapest-of-two routing with transparent fallback

Kling video generation now ranks multiple upstreams per request by their parametric source price for the exact (slug, resolution, audio, duration, mode, task) combination, and falls back through the chain transparently if the active upstream errors out mid-flight. The GPUniq job_id and the user-facing price stay stable across fallbacks: clients just keep polling and receive either a completed status with the video URL, or a final failed status if every upstream rejects the job.

The response now exposes a config snapshot of the resolved generation parameters:

{
  "job_id": "vid_e93e98c7ca5e4982876b",
  "status": "completed",
  "model": "kling-2-6",
  "video": { "url": "https://cdn.example.com/.../output.mp4" },
  "cost_usd": 0.315,
  "balance_usd": 9.17825791,
  "config": { "resolution": "1080p", "audio": false, "duration": 5, "task": "t2v", "mode": null }
}
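
A client might interpret the terminal states of such a response roughly as follows. This is a minimal sketch that relies only on the status, job_id and video.url fields shown above; the helper name video_url_or_none and the VideoJobFailed class are hypothetical, and treating every other status as "still running" is an assumption.

```python
from typing import Optional


class VideoJobFailed(Exception):
    """Raised when a job reports the final "failed" status (hypothetical helper)."""


def video_url_or_none(job: dict) -> Optional[str]:
    """Return the video URL once the job has completed, or None while it runs."""
    status = job.get("status")
    if status == "completed":
        return job["video"]["url"]
    if status == "failed":
        # Every upstream in the fallback chain rejected the job.
        raise VideoJobFailed(job.get("job_id", "<unknown job>"))
    # Assumption: any other status means the job is still in progress.
    return None


response = {
    "job_id": "vid_e93e98c7ca5e4982876b",
    "status": "completed",
    "video": {"url": "https://cdn.example.com/.../output.mp4"},
    "config": {"resolution": "1080p", "audio": False, "duration": 5,
               "task": "t2v", "mode": None},
}
url = video_url_or_none(response)
```

Because the job_id stays stable across fallbacks, the same check can be applied to every poll result without tracking which upstream is serving the job.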

Kling retail is now anchored at the official reference price × 0.90

Every Kling SKU with a published official reference listing is billed at exactly 10% below the official public price for the requested configuration, the same headline regardless of which upstream actually served the job. A 10% margin floor against the chosen supplier guarantees we never bill below source cost, so on a provider fallback the user may see a 1-3% price bump but never a freebie. Catalog SKUs surface the badge as discount_percent_label: 10.0 so the frontend can render the "−10% off official" tag without hard-coding model lists.

Parametric examples:

| SKU | Config | Retail |
| --- | --- | --- |
| kling-2-6 | 5s no-audio | $0.315 |
| kling-2-6 | 5s with audio | $0.630 |
| kling-3-0 | 1080p+audio 5s | $0.756 |
| kling-3-0 | 4K 5s | $1.890 |
| kling-2-1 | Pro 5s i2v | $0.405 |
| kling-2-6-motion-control | 1080p 5s v2v | $0.504 |
| kling-avatar-pro | 10s 1080p | $1.035 |
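
The pricing rule can be sketched as "official price × 0.90, but never below the margin floor". The snippet below is an illustration only: the function name is hypothetical, the floor is assumed to mean source cost × 1.10 (the changelog does not spell out the formula), the $0.35 official price is simply the value that reproduces the $0.315 row, and the source costs are made up.

```python
from decimal import Decimal, ROUND_HALF_UP

OFFICIAL_DISCOUNT = Decimal("0.90")  # retail = official reference price x 0.90
MARGIN_FLOOR = Decimal("1.10")       # assumption: floor = source cost x 1.10


def kling_retail_price(official_price: str, source_cost: str) -> Decimal:
    """Return the retail price in USD, rounded to three decimals."""
    discounted = Decimal(official_price) * OFFICIAL_DISCOUNT
    floor = Decimal(source_cost) * MARGIN_FLOOR
    # The floor only wins when the serving upstream is unusually expensive.
    return max(discounted, floor).quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)


normal = kling_retail_price("0.35", "0.25")    # discount applies: 0.315
fallback = kling_retail_price("0.35", "0.29")  # floor applies: 0.319
```

In the second call the pricier upstream lifts the price from $0.315 to $0.319, a bump of roughly 1.3%, which is the kind of small increase the margin floor is described as producing.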

Two new Kling-Avatar SKUs (lip-sync)

kling-avatar-pro (1080p) and kling-avatar-standard (720p) join the catalog for lip-sync avatars up to 15 seconds long. Both are billed per second, and lip-sync is currently served by a single upstream only.

Request shape gained resolution, audio, mode, video_url

POST /v1/llm/videos/jobs accepts four new optional fields used by the Kling parametric pricing path:

  • resolution: 720p / 1080p (default) / 4k
  • audio: true to enable audio on Kling 2.6 / 3.0 (default false)
  • mode: standard / pro / master for Kling 2.1; turbo for 2.5 Turbo Pro
  • video_url: reference video for motion-control v2v variants

Non-Kling SKUs (Sora, Veo, Wan, Hailuo, Seedance) ignore the new fields — their flat-price catalog path is unchanged, and existing callers see no regression.
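
A request body using the new fields could be assembled as in the sketch below. Only resolution, audio, mode and video_url and their listed values come from this entry; the model and prompt field names and the build_video_job helper are assumptions, and the helper does not check which mode is valid for which Kling version.

```python
from typing import Optional

RESOLUTIONS = {"720p", "1080p", "4k"}
MODES = {"standard", "pro", "master", "turbo"}


def build_video_job(model: str, prompt: str, *, resolution: str = "1080p",
                    audio: bool = False, mode: Optional[str] = None,
                    video_url: Optional[str] = None) -> dict:
    """Build a JSON body for POST /v1/llm/videos/jobs (field names partly assumed)."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if mode is not None and mode not in MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    body = {"model": model, "prompt": prompt,
            "resolution": resolution, "audio": audio}
    # Optional fields are left out entirely when unset.
    if mode is not None:
        body["mode"] = mode
    if video_url is not None:
        body["video_url"] = video_url
    return body


body = build_video_job("kling-2-6", "a fox running through snow", audio=True)
```

Validating on the client keeps obviously malformed requests from reaching the parametric pricing path at all.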

Full reference: LLM API → Video Generation.

2026-05-11 Hardened error envelope, max_tokens > 4096 requires streaming, full provider failover refactor
Tags: feature, llm, errors, breaking

Stable error catalog with 29 codes

Every /v1/openai/* and /v1/llm/* failure now returns a structured error.code you can branch on — streaming_required, insufficient_balance, model_not_found, rate_limit_per_key, upstream_timeout, and 24 more. Full reference, recovery strategies, and the native vs. OpenAI-compat envelope shapes are documented at LLM API → Error reference.

The OpenAI envelope is now byte-identical to the spec — earlier deployments emitted a double-wrapped {"error":{"error":{…}}} body that broke the OpenAI SDK's typed exception parser. That regression is fixed: the wire format is {"error":{message,type,code,doc_url?,meta?}, status_code, request_id}, which lets OpenAI clients raise BadRequestError, RateLimitError, AuthenticationError etc. without special-casing.
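
Branching on error.code might look like the sketch below. The codes are the ones named in this changelog; the mapping from code to recovery action is an illustrative assumption, not the documented recovery strategy, and the request_id value is a placeholder.

```python
# Codes that usually clear up on their own after a pause (assumed grouping).
RETRY_LATER = {"rate_limit_per_key", "rate_limit_per_user", "upstream_timeout"}


def classify_error(body: dict) -> str:
    """Map a parsed error envelope to a coarse client-side action."""
    error = body.get("error") or {}
    code = error.get("code")
    if code == "streaming_required":
        return "retry_with_stream"
    if code in RETRY_LATER:
        return "retry_later"
    if code == "insufficient_balance":
        return "top_up"
    # model_not_found and any unrecognised code are treated as non-retryable.
    return "fail"


envelope = {
    "error": {
        "message": "Requested max_tokens=8000 exceeds the non-streaming limit of 4096.",
        "type": "invalid_request_error",
        "code": "streaming_required",
    },
    "status_code": 400,
    "request_id": "req_demo",  # placeholder value
}
action = classify_error(envelope)
```

Keying on error.code rather than on the message text means a reworded message does not affect the client's behaviour.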

max_tokens > 4096 without stream: true is now rejected up-front (BREAKING)

A non-streaming request asking for more than 4096 output tokens now returns HTTP 400 with error.code = "streaming_required". Earlier deployments silently upgraded the upstream call to streaming and reassembled the SSE chunks into a non-stream response — that worked inside SDKs but ate connections on every Cloudflare-fronted client (browsers, mobile, behind corporate proxies). The new behaviour fails fast with a clear hint:

{
  "error": {
    "message": "Requested max_tokens=8000 exceeds the non-streaming limit of 4096. Long responses must use streaming.",
    "type": "invalid_request_error",
    "code": "streaming_required",
    "meta": { "max_tokens": 8000, "limit": 4096, "hint": "stream=true" },
    "doc_url": "https://docs.gpuniq.com/llm/long-generations"
  }
}

Migration: add "stream": true to the request body. If your client can't speak SSE, use the job-based long-poll API. The threshold is configurable per-deployment; the 4096 default is the clean intersection of "fits in the 100s edge-proxy window" and "below every model's stream-chunk delivery rate".
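
The migration can be applied defensively on the client, as in this sketch. The helper name ensure_streaming is hypothetical, and 4096 is the default threshold quoted above; a deployment may configure a different value.

```python
NON_STREAM_LIMIT = 4096  # default threshold; configurable per deployment


def ensure_streaming(body: dict, limit: int = NON_STREAM_LIMIT) -> dict:
    """Return a copy of the request body with stream=True when max_tokens exceeds the limit."""
    request = dict(body)  # shallow copy so the caller's dict is not mutated
    max_tokens = request.get("max_tokens") or 0
    if max_tokens > limit and not request.get("stream"):
        request["stream"] = True
    return request


long_request = ensure_streaming({"model": "demo-model", "max_tokens": 8000})
short_request = ensure_streaming({"model": "demo-model", "max_tokens": 4096})
```

A request at exactly 4096 tokens is left untouched, since only values above the limit are rejected. The caller still has to consume the response as SSE once stream is set.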

Full provider failover chain on non-stream

The non-streaming chat path now iterates the cost-sorted provider chain top-to-bottom on transient failures (network errors, upstream 5xx, model_not_found / insufficient_user_quota patterns, vendor maintenance envelopes). Previously, only the cheapest provider was tried before falling straight to the safety-net upstream — the middle entries of the chain were skipped on failover, costing margin and reliability.

The streaming path now uses the same chain iterator. Both paths emit exactly one operator alert when the chain is fully exhausted (previously a single request could sometimes trigger two).
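
The chain iteration described above can be pictured with the following sketch. It is not the router's actual code: Provider, TransientUpstreamError, ChainExhausted and complete_with_failover are hypothetical names, and the stub providers stand in for real upstream calls.

```python
from dataclasses import dataclass
from typing import Callable, List


class TransientUpstreamError(Exception):
    """Network error, upstream 5xx, or another retryable failure (hypothetical)."""


class ChainExhausted(Exception):
    """Raised when every provider in the chain has failed."""


@dataclass
class Provider:
    name: str
    cost: float
    call: Callable[[dict], dict]


def complete_with_failover(providers: List[Provider], request: dict,
                           alert: Callable[[str], None]) -> dict:
    """Try providers cheapest-first; alert once if all of them fail."""
    errors = []
    for provider in sorted(providers, key=lambda p: p.cost):
        try:
            return provider.call(request)
        except TransientUpstreamError as exc:
            errors.append(f"{provider.name}: {exc}")
    # Exactly one alert per request, emitted only after the whole chain failed.
    alert("provider chain exhausted: " + "; ".join(errors))
    raise ChainExhausted(errors)


def down(request: dict) -> dict:
    raise TransientUpstreamError("503 from upstream")


def healthy(name: str) -> Callable[[dict], dict]:
    return lambda request: {"served_by": name}


alerts: List[str] = []
chain = [Provider("safety-net", 3.0, healthy("safety-net")),
         Provider("cheapest", 1.0, down),
         Provider("mid-priced", 2.0, healthy("mid-priced"))]
result = complete_with_failover(chain, {"model": "demo"}, alerts.append)
```

Here the cheapest provider fails and the mid-priced one serves the request, so the safety-net upstream is never touched and no alert is raised.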

Claude now streams via an Anthropic-native endpoint

For Claude haiku/sonnet/opus 4-5/4-6, requests are now served through an Anthropic-native endpoint with real SSE event streaming (verified end-to-end on opus 4.6 / sonnet 4.6). A redundant upstream was added at the same price point, so if one upstream throttles, another picks up without a price change.

Sliding-window rate limiter + per-user gate

core/rate_limiter.py is now a true Redis ZSET sliding window — the previous fixed-bucket implementation let 2× the limit through at the minute boundary. A per-user aggregate cap (default 600 req/min) fires independently of the per-key cap and surfaces as error.code = "rate_limit_per_user" so clients can branch on which gate triggered.
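
The difference between a fixed bucket and a sliding window is easiest to see in a small in-memory model. The sketch below is not core/rate_limiter.py and does not use Redis; it keeps timestamps in a deque, whereas a ZSET-based version would typically trim old entries with ZREMRANGEBYSCORE and count with ZCARD. The class name and the tiny limit of 2 are illustrative.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any rolling window of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self._hits = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        cutoff = now - self.window
        # Drop timestamps that have slid out of the window.
        while self._hits and self._hits[0] <= cutoff:
            self._hits.popleft()
        if len(self._hits) >= self.limit:
            return False
        self._hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2, window_seconds=60.0)
# Two requests just before the minute boundary, two just after, one much later.
decisions = [limiter.allow(t) for t in (59.0, 59.5, 60.5, 61.0, 119.1)]
```

A fixed per-minute bucket would accept all of the first four requests, because the counter resets at t=60; that is the 2× burst at the boundary. The sliding window rejects the third and fourth and only admits the fifth once the first timestamp has aged out.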

Admin → Streaming Providers tab

A new admin tab under Unit Economics → Streaming Providers shows the live per-model eligibility table: cost / balance / enabled / stream-capable flag / reason for every provider in the chain, plus the actual stream and non-stream fallback chains the router would build right now. Useful when a customer reports an "expected provider X, got provider Y" mismatch.

2026-05-07 Image format conversion, 5-min timeouts, expanded image catalog
Tags: feature, llm, images

Server-side image format conversion

POST /v1/llm/images/generations and POST /v1/openai/images/generations now accept an optional output_format field. The server re-encodes delivered images into the chosen format before they go over the wire, so clients no longer need a Pillow / Sharp pipeline of their own.

  • png — pass-through (default if omitted), lossless.
  • jpeg (alias jpg) — ~10× smaller payload; alpha is flattened onto white because JPEG has no transparency.
  • webp — ~5× smaller at comparable quality, alpha preserved.

The chosen format is echoed back in usage.output_format and each delivered image entry carries its mime_type. Conversion failures degrade to "return source PNG unchanged" so a corrupt encode never loses an image the upstream already produced.
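
On the client side, the format names and the returned mime_type could be handled as in this sketch. The helper names are hypothetical, and lowercasing and alias handling on the client are a convenience assumption rather than documented server behaviour.

```python
from typing import Optional

MIME_TYPES = {"png": "image/png", "jpeg": "image/jpeg", "webp": "image/webp"}
ALIASES = {"jpg": "jpeg"}


def normalize_output_format(value: Optional[str] = None) -> str:
    """Resolve the alias and default so the request carries a canonical name."""
    fmt = (value or "png").strip().lower()
    fmt = ALIASES.get(fmt, fmt)
    if fmt not in MIME_TYPES:
        raise ValueError(f"unsupported output_format: {value!r}")
    return fmt


def extension_for(mime_type: str) -> str:
    """Pick a file extension from the mime_type carried by a delivered image."""
    for fmt, known in MIME_TYPES.items():
        if known == mime_type:
            return "." + fmt
    return ".bin"  # unknown type: keep the bytes but do not guess a format


requested = normalize_output_format("JPG")
saved_as = extension_for("image/png")
```

Choosing the extension from the mime_type of each delivered image, rather than from the requested format, matters because a failed conversion returns the source PNG unchanged.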

5-minute upstream timeout across the chain

Long-running models (Gemini 3 Pro thinking, GPT-5.2 Pro, Claude Opus 4.7 thinking, Sora-2 video) that legitimately run for several minutes were occasionally cut off mid-generation by a tight 60-80 s upstream budget and silently re-routed at full official rate. The whole LLM stack now holds the connection open for up to 5 minutes end-to-end (10 minutes worst-case if the primary fully exhausts and the reliability fallback also takes the full window). In practice we never see anything past ~3 minutes, so a slow but legitimate generation should no longer be cut off.

A dead gateway still fast-fails on TCP connect (10 s) and re-routes inside the same request: the 5-minute upper bound is a budget for progress, not a forced wait.

Expanded image model catalog

The image-generation endpoint now serves 11 additional models on top of the Nano Banana family and Grok 4 Image:

| Slug | Price / image | Notes |
| --- | --- | --- |
| gpt-image-2 | $0.0464 | OpenAI image, 1K default |
| gpt-image-1-5 | $0.020 | OpenAI cheaper tier |
| gpt-4o-image | $0.040 | OpenAI 4o image |
| flux-2-pro | $0.060 | Black Forest Labs FLUX.2 Pro 1K |
| flux-2-flex | $0.180 | Premium quality 1K |
| flux-kontext-pro | $0.080 | Text-to-image & edit |
| flux-kontext-max | $0.160 | Premium edit / generation |
| seedream-4 | $0.050 | ByteDance Seedream 4 |
| seedream-4-5 | $0.040 | ByteDance Seedream 4.5 |
| seedream-5-0-lite | $0.035 | ByteDance Seedream 5.0 Lite |
| z-image | $0.020 | Alibaba Z-Image |

Image-to-image (input_images) is supported on the FLUX Kontext, Seedream, GPT Image and Nano Banana edit slugs.

Gemini 3.1 Flash Lite at the standard discount tier

gemini-3.1-flash-lite joined the Gemini −20% retail tier:

| | Before | Now |
| --- | --- | --- |
| Retail $/MTok | $0.25 / $1.50 | $0.20 / $1.20 |

Existing API keys see the lower price automatically — no migration needed.

Gemini 3 Pro fallback works again

gemini-3-pro requests now have a working fallback chain (the upstream alias was previously routing to a missing model and 404-ing). Recovery is transparent — same slug, same retail price, no client-side change.

2026-04-26 Chat-image jobs + 5-minute polling budget
Tags: feature, llm, images

In-chat image generation moves to the job-based pattern

POST /v1/llm/chats/{chat_id}/messages with an image model now returns in under 1 second with type: "image_pending", a job_id, and the dialogue_id of a placeholder dialogue that already lives in the chat history.

  • New endpoint: GET /v1/llm/chats/{chat_id}/image-jobs/{job_id} — poll every 2-3s until completed (placeholder is rewritten with the image and balance is debited) or failed (placeholder is marked, nothing charged).
  • Pro / 4K turns no longer hit a Cloudflare 524 when they cross 100 seconds — the request itself is short, the image arrives over polling.
  • Server-side, polls are coalesced via Redis (2-second TTL), so a client polling every 250ms still costs one upstream call every 2s.
  • Both this and the existing standalone /v1/llm/images/jobs flow share the same kickoff / poll / charge code path — pricing, ownership checks, and balance debit are guaranteed to stay in lockstep.
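
The poll coalescing mentioned in the list above can be modelled with a small TTL cache. This is an in-memory illustration, not the Redis-backed implementation; CoalescingCache and fetch_status are hypothetical names, and the fake clock exists only to make the example deterministic.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class CoalescingCache:
    """Serve repeated polls for the same job from a short-lived cached result."""

    def __init__(self, fetch: Callable[[str], Any], ttl: float = 2.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, job_id: str) -> Any:
        now = self._clock()
        cached = self._entries.get(job_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: no upstream call
        value = self._fetch(job_id)
        self._entries[job_id] = (now, value)
        return value


upstream_calls: List[str] = []


def fetch_status(job_id: str) -> dict:
    upstream_calls.append(job_id)
    return {"job_id": job_id, "status": "processing"}


now = [0.0]  # fake clock value, advanced manually below
cache = CoalescingCache(fetch_status, ttl=2.0, clock=lambda: now[0])
for tick in range(9):  # client polls at 0.00, 0.25, ..., 2.00 seconds
    now[0] = tick * 0.25
    cache.get("job_1")
```

Nine polls at 250 ms intervals result in two upstream calls: one at t=0 and one when the cached entry expires at t=2.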

The recommended client-side polling deadline is now 5 minutes for either surface. The examples in the LLM API docs have been updated accordingly.
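
A polling loop that respects the 2-3 s interval and the 5-minute deadline might be structured as below. This is a sketch: fetch_job is whatever callable performs the GET against the image-jobs endpoint, the exception classes are hypothetical, and any status other than completed or failed is assumed to mean the job is still running.

```python
import time
from typing import Callable


class JobFailed(Exception):
    """The job reached the final "failed" status."""


class PollTimeout(Exception):
    """The client-side deadline passed before the job finished."""


def poll_until_done(fetch_job: Callable[[], dict], *, interval: float = 2.5,
                    deadline: float = 300.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job completes, fails, or the deadline would be exceeded."""
    start = clock()
    while True:
        job = fetch_job()
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise JobFailed(job)
        # Stop if the next poll would land on or after the deadline.
        if clock() - start + interval >= deadline:
            raise PollTimeout(f"job not finished after {deadline} seconds")
        sleep(interval)


# Demo with canned responses instead of real HTTP calls.
_responses = iter([{"status": "processing"}, {"status": "processing"},
                   {"status": "completed", "job_id": "demo"}])
result = poll_until_done(lambda: next(_responses), interval=0.0)
```

Injecting clock and sleep keeps the loop testable without waiting in real time; in production the defaults apply.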

New chat models

  • gpt-5.5: OpenAI flagship, 4.00 / 24.00 USD per MTok (20% below the official API price)
  • deepseek-v4-pro: flagship, 2.40 / 4.00 USD per MTok
  • deepseek-v4-flash: fast tier, 0.18 / 0.30 USD per MTok

2026-04-25 v3.5.4: gpuniq SDK
Tags: feature, sdk, cli, llm, images

LLM chat + image generation in the SDK and CLI

The gpuniq Python package (pip install -U gpuniq) now ships chat and image-generation helpers, plus matching gg subcommands so you never need to leave the terminal.

Python SDK

  • client.llm.generate_image(prompt, model, n, size, quality, input_images, save_to) — synchronous text-to-image / image-to-image.
  • client.llm.start_image_job / get_image_job / generate_image_async — job-based surface for Nano Banana (polls automatically, emits on_progress callbacks, streams past proxy read-timeouts).
  • input_images accepts local paths, data: URLs, https:// URLs, raw bytes, or bare base64 — the SDK inlines local files as data URLs for you.
  • save_to accepts a filename (single image) or a directory (many), decodes b64_json and writes PNG(s). The list of written paths is returned as saved_paths.
  • client.llm.default_model() and model_catalog() expose the platform default and pricing metadata.
  • Dropped stale purchase_tokens / convert_rubles_to_tokens / packages — LLM and image usage are billed directly in USD from user.balance.
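
For readers curious what "inlining a local file as a data URL" involves, the standard-library sketch below shows the general technique. It is an illustration, not the SDK's own implementation, and both function names are hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(data: bytes, mime_type: str = "image/png") -> str:
    """Encode raw bytes as a base64 data: URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_url(path: str) -> str:
    """Read a local file and inline it, guessing the MIME type from its name."""
    mime_type, _ = mimetypes.guess_type(path)
    return to_data_url(Path(path).read_bytes(),
                       mime_type or "application/octet-stream")


example = to_data_url(b"hello")
```

With the SDK none of this is needed, since input_images accepts local paths directly; the sketch only matters when calling the HTTP API without it.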

CLI

  • gg llm "prompt" — one-shot chat, prints the answer plus tokens / cost / balance.
  • gg llm — interactive REPL with /exit, /clear.
  • gg image "prompt" — generates image(s), saves PNG(s) to disk, prints paths + cost + balance. Auto-uses the async-poll path for Nano Banana slugs.

2026-04-25 v3.5.2 / 3.5.3: gpuniq CLI polish
Tags: cli, ux

CLI polish pass

  • gg help now works (alias for gg --help). Typos like gg oders print a clear Error: unknown command '<x>'. + full help instead of the confusing "gg not initialized" message — the shell-fallback only kicks in when a GPU-side gg init config actually exists.
  • 2D GPU picker — arrow-key navigation through a matrix laid out by generation (Datacenter · 50XX · 40XX · 30XX · 20XX · 1660) with Any GPU / Other… meta rows.
  • Templates — docker image presets (PyTorch, ComfyUI, vLLM, Ubuntu VM, Custom). gg rent picks PyTorch by default; gg replace defaults to the old instance's image so on-disk data keeps working.
  • gg open always goes via ssh.gpuniq.com — the CLI calls a new POST /v1/instances/{id}/ssh-proxy/ensure to allocate a proxy port on demand for older orders whose allocation failed at order time.
  • gg replace fully destroys the old instance (DELETE, not just stop) so the provider machine and SSH proxy port are released before the new one is placed.
  • Billing plans trimmed to week (default), month, and minute. Hourly/daily billing is no longer exposed in the rental UI.

2026-04-24 v3.3.0 / 3.4.0 / 3.5.0 / 3.5.1: gpuniq CLI
Tags: feature, cli

gg rent and gg replace

  • gg rent — interactive GPU rental from the terminal. Filter wizard (GPU model picker, min count, max price, verified, sort), full-width marketplace table with n / p / f / s controls, template picker, volume picker (pick existing / create new / skip), confirm, place order. On HTTP 410 (offer taken mid-flow) the picker loops back without losing plan / volume choices.
  • gg replace <id> — swap the GPU on a running instance. Same picker, same filters. Preserves the original billing plan and volume; defaults the Docker image to whatever the old instance was running.
  • Adaptive table — columns resize with the terminal. Narrow: GPU / VRAM / RAM / DISK / LOCATION / RELIA / PRICE / VER. Wide: adds CPU, NET ↓/↑, AVAIL, CPU MODEL, HOSTING. GPU and LOCATION flex-share leftover width.
  • gg status recognises both gg login (client) and gg init (GPU) configs and shows a combined status view.
  • --image, --disk, --gpu, --count, --max-price, --sort, --pricing, --volume-id, --no-volume, --verified flags on gg rent / gg replace for non-interactive use.
  • OrderOfferGone typed exception and richer FastAPI error parsing (dict / list / string detail shapes) — you see order failed (400): docker_image is required instead of 400 Client Error: Bad Request for url.
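
The three detail shapes mentioned in the last item can be flattened into one readable message along these lines. This is a sketch rather than the CLI's parser: describe_error is a hypothetical name, the list branch follows the loc / msg layout of FastAPI validation errors, and the message key in the dict branch is an assumption.

```python
from typing import Any


def describe_error(status_code: int, payload: Any) -> str:
    """Turn a FastAPI-style error payload into a one-line message."""
    detail = payload.get("detail") if isinstance(payload, dict) else payload
    if isinstance(detail, dict):
        # Assumption: dict-shaped details carry their text under "message".
        message = detail.get("message") or detail.get("msg") or str(detail)
    elif isinstance(detail, list):
        parts = []
        for item in detail:
            if isinstance(item, dict):
                loc = ".".join(str(p) for p in item.get("loc", []))
                msg = item.get("msg", "")
                parts.append(f"{loc}: {msg}" if loc else msg)
            else:
                parts.append(str(item))
        message = "; ".join(parts)
    else:
        message = str(detail)
    return f"order failed ({status_code}): {message}"


plain = describe_error(400, {"detail": "docker_image is required"})
validation = describe_error(422, {"detail": [
    {"loc": ["body", "docker_image"], "msg": "field required",
     "type": "value_error.missing"},
]})
```

The first call reproduces the message quoted above; the second shows how a validation-error list collapses to the offending field path and its message.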

2026-04-21 v3.2.0
Tags: feature, llm, images

Image Generation

  • Text-to-image and image-to-image via the OpenAI-compatible /v1/openai/images/generations and native /v1/llm/images/generations endpoints.
  • Four new models: nano-banana ($0.0312/img), nano-banana-pro ($0.1072/img), nano-banana-pro-4k ($0.192/img), grok-4-image ($0.0352/img).
  • Reference photos: attach up to 4 photos with any prompt — Nano Banana handles the rest for image editing and style transfer.
  • Flat per-image billing: you pay only for delivered images. If the upstream content-policy rejects a frame you're charged only for what arrived.
  • New in-app studio at /chat — pick any image model and the composer switches to a text+photos panel with a gallery view.

2026-04-21 v3.1.0
Tags: feature, llm, api

OpenAI-Compatible LLM Endpoint

  • Drop-in OpenAI API at /v1/openai/chat/completions and /v1/openai/models — every field of the OpenAI Chat Completions protocol is forwarded unchanged (tools, tool_choice, response_format, logprobs, seed, streaming).
  • Works with Claude Code (via LiteLLM proxy), Cursor, Continue.dev, Aider, LiteLLM, and the official OpenAI Python / JS SDKs — no code changes required.
  • Byte-identical SSE streaming — plug directly into OpenAI SDK's streaming parser.
  • Authenticates with your existing GPUniq API key via Authorization: Bearer gpuniq_....

Expanded Model Catalog

  • Anthropic Claude — Opus 4.7 / 4.6 / 4.5, Sonnet 4.6 / 4.5, Haiku 4.5 (now the platform default).
  • OpenAI GPT-5 family — GPT-5.2 Pro / Codex, GPT-5.1 Codex Max, GPT-5, o3 / o3-mini / o4-mini, GPT-4o, GPT-4.1.
  • Google Gemini — Gemini 3 Pro / Flash, Gemini 2.5, Nano Banana.
  • xAI Grok — Grok 4, Grok 4.1 Thinking, Grok 4 Fast.
  • Premium models priced 20% below vendor list price; 30+ free-tier community models remain available.

2026-02-28 v3.0.0
Tags: feature, cli

CLI Tool (gg)

  • Command checkpointing: Every command run via gg is saved with full output, exit code, and timing
  • Replay on restart: gg replay re-runs interrupted commands after instance restarts
  • PTY support: Full terminal colors, progress bars, and interactive prompts
  • Backend sync: Checkpoints are synced to the GPUniq backend automatically
  • 6 commands: gg init, gg run, gg list, gg logs, gg replay, gg status
  • Shorthand syntax: gg python train.py works the same as gg run python train.py

2026-02-23 v2.0.0
Tags: feature, sdk, api

Python SDK v2.0

  • Full Python SDK: pip install GPUniq — all endpoints accessible via GPUniq client
  • Universal API Key Auth: API keys now work across all endpoints (marketplace, instances, volumes, LLM, payments, settings)
  • Rate Limiting: 120 req/min per API key with auto-retry in SDK
  • 8 SDK Modules: marketplace, gpu_cloud, burst, instances, volumes, llm, payments, settings
  • Backward compatible: v1.x gpuniq.init() and client.request() still work

GPU Dex-Cloud

  • Deploy by GPU type: Pick a GPU model, the platform finds the best server automatically
  • One-call deployment: client.gpu_cloud.deploy(gpu_name="RTX_4090")
  • Pricing API: Check pricing before deploying

GPU Burst

  • Multi-GPU burst orders: Scale to 100 GPUs in a single order
  • Fallback GPUs: Define alternative GPU types with price caps
  • Cost estimation: Estimate burst cost before committing
  • Per-order billing: Transaction and run history per order

Persistent Volumes

  • Persistent storage: Create volumes that survive instance restarts
  • File management: Upload, download, list, and delete files via API/SDK
  • Attach to any instance: Mount volumes at creation time across all deployment modes
  • Sync logs: Monitor volume sync operations

LLM API

  • Unified LLM access: OpenAI, Qwen, DeepSeek, Llama, Mistral, and more
  • Chat sessions: Persistent conversations with message history
  • Token management: Balance, packages, usage history
  • Terminal commands: Generate shell commands from natural language

Settings

  • SSH key management: Add, update, toggle, delete SSH keys via API
  • Per-instance SSH keys: Attach/detach keys to individual instances
  • Telegram notifications: Link Telegram for status alerts

2026-01-09 v0.5.1
Tags: feature, improvement

New Features

  • GPU Analytics Dashboard: Market analytics with price tracking and availability trends
  • Enhanced AI Recommendations: Improved GPU suggestions based on usage history
  • Async Order Creation: Reliable order creation with job status polling
  • SLA Monitoring: Real-time uptime tracking per instance

Improvements

  • Faster marketplace loading with optimized queries
  • Improved SSH connection reliability
  • Enhanced provider verification with reliability scoring

2025-11-15 v0.4.0
Tags: feature, crypto

New Features

  • USDT Payments (TRC20): Deposit and withdraw using Tether on Tron network
  • Flexible Pricing Types: Hourly, daily, weekly, and monthly billing
  • Provider Dashboard: GPU owners can manage listings and track earnings

2025-09-20 v0.3.0
Tags: feature, api

New Features

  • AI GPU Recommendations: Natural language GPU suggestions
  • Task History API: Rental history with detailed statistics
  • Session Management: View and manage active sessions

2025-07-10 v0.2.0
Tags: feature, provider

New Features

  • Multi-GPU Support: Rent machines with multiple GPUs
  • Provider Agent: Connect your GPUs to the marketplace
  • Heartbeat Monitoring: Real-time provider availability tracking

2025-05-01 v0.1.0
Tags: launch

Initial Launch

  • GPU Marketplace: Browse and rent GPUs from providers worldwide
  • Instant SSH Access: Get credentials immediately after rental
  • YooKassa Payments: Russian payment methods
  • User Dashboard: Manage rentals, balance, and usage
  • REST API: Full API access for automation

Upgrade Instructions

Python SDK: pip install --upgrade GPUniq

The API is backward compatible. Existing integrations continue to work.